Why Migrations Fail
Most cloud migrations fail in predictable ways: underestimating data transfer costs, ignoring legacy dependencies, treating lift-and-shift as a strategy, and lacking rollback plans. The successful ones treat migration as a product delivery, not an infrastructure project.
The most expensive migration is the one you have to do twice.
The most common failure pattern we see is the "big bang" migration. An organisation decides to move everything at once, underestimates the complexity, hits unexpected issues, and ends up with a partially migrated system that is less reliable than the original. We have seen companies spend 18 months and £2 million on a migration that could have been completed in 6 months with a phased approach.
Another major failure is neglecting data gravity. Large datasets are expensive and slow to move. We have seen migrations stall for months because data transfer bandwidth was insufficient. Data gravity pulls compute toward it: moving a cluster without its data creates latency, cost, and complexity.
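Bandwidth problems are easier to catch if you do the arithmetic before the migration starts. The sketch below estimates transfer time from dataset size, link speed and the share of the link that migration traffic can actually use. The function name and the 70% default utilisation are illustrative assumptions, not measured figures, and the result ignores protocol overhead and retries, so it is optimistic.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(dataset_tb: float, link_gbps: float, utilisation: float = 0.7) -> float:
    """Estimate how many days it takes to copy a dataset over a network link.

    Assumes decimal units (1 TB = 10**12 bytes, 1 Gbps = 10**9 bits per second)
    and that only a fraction `utilisation` of the link carries migration traffic.
    Overhead, retries and throttling are ignored, so real transfers take longer.
    """
    if dataset_tb < 0 or link_gbps <= 0:
        raise ValueError("dataset size must be >= 0 and bandwidth must be > 0")
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    bits = dataset_tb * 10**12 * 8
    effective_bits_per_second = link_gbps * 10**9 * utilisation
    return bits / effective_bits_per_second / SECONDS_PER_DAY


# 500 TB over a 1 Gbps link, of which migration traffic can use 70%.
print(f"{transfer_days(500, 1, 0.7):.1f} days")  # prints "66.1 days"
```

Roughly two months for a single dataset on a modest link is the kind of number that should surface during discovery, not halfway through a wave.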
The Six-Phase Framework
1. Discovery and Assessment
Map every application, dependency, data flow, and integration. Classify workloads: retire, retain, rehost, replatform, refactor, or rebuild. Most organisations discover 30% more dependencies than expected. This phase typically takes 4-6 weeks for a medium-sized estate.
The 6R framework provides a structured approach: Retire (shut down unused systems), Retain (keep on-premises), Rehost (lift-and-shift), Replatform (move with minor optimisation), Refactor (re-architect for cloud-native), and Rebuild (start from scratch).
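Recording the 6R classification as structured data makes it easier to review and report on across a large estate. The sketch below is a hypothetical first-pass triage: the `Workload` fields and the order of the rules are illustrative assumptions, not a standard decision tree, and every suggestion still needs review with the application owner.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    RETIRE = "retire"
    RETAIN = "retain"
    REHOST = "rehost"
    REPLATFORM = "replatform"
    REFACTOR = "refactor"
    REBUILD = "rebuild"


@dataclass
class Workload:
    # All fields are hypothetical inputs gathered during discovery.
    name: str
    in_use: bool = True
    hardware_bound: bool = False          # tied to physical on-premises hardware
    near_end_of_life: bool = False
    needs_cloud_native: bool = False      # business case needs elasticity or managed services
    codebase_maintainable: bool = True
    minor_changes_suffice: bool = False   # e.g. moving to a managed database


def first_pass_strategy(w: Workload) -> Strategy:
    """Suggest a starting 6R category; the rules run in priority order."""
    if not w.in_use:
        return Strategy.RETIRE
    if w.hardware_bound or w.near_end_of_life:
        return Strategy.RETAIN
    if w.needs_cloud_native:
        return Strategy.REFACTOR if w.codebase_maintainable else Strategy.REBUILD
    if w.minor_changes_suffice:
        return Strategy.REPLATFORM
    return Strategy.REHOST


print(first_pass_strategy(Workload("legacy-reports", in_use=False)).value)  # retire
```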
Dependency mapping is the most critical activity. We use automated tools (AWS Application Discovery Service, Azure Migrate, or CloudQuery) to scan the estate and build dependency graphs. Then we validate with application teams, who often know about "shadow dependencies" that the tools miss.
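Validation with application teams amounts to comparing two sets of dependency edges. A minimal sketch, assuming the scan output and the team interviews have both been reduced to `(application, dependency)` pairs; the application names are hypothetical and this is not the export format of any particular discovery tool.

```python
from collections import defaultdict

# An edge (a, b) means "a depends on b".
Edge = tuple[str, str]


def reconcile(discovered: set[Edge], reported: set[Edge]) -> dict[str, set[Edge]]:
    """Compare tool-discovered dependencies with those reported by application teams."""
    return {
        "confirmed": discovered & reported,    # seen by the scan and confirmed by the team
        "shadow": reported - discovered,       # known to the team, missed by the scan
        "unconfirmed": discovered - reported,  # seen by the scan, still to be validated
    }


def build_graph(edges: set[Edge]) -> dict[str, set[str]]:
    """Turn edges into an adjacency map: application -> the things it depends on."""
    graph: dict[str, set[str]] = defaultdict(set)
    for app, dependency in edges:
        graph[app].add(dependency)
        graph.setdefault(dependency, set())  # keep leaf nodes in the graph
    return dict(graph)


discovered = {("web-shop", "orders-db"), ("web-shop", "cache")}
reported = {("web-shop", "orders-db"), ("nightly-batch", "partner-sftp")}
print(reconcile(discovered, reported)["shadow"])  # {('nightly-batch', 'partner-sftp')}
```

The merged graph (`build_graph(discovered | reported)`) is what feeds wave planning; the "unconfirmed" set is the list of questions to take back to each team.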
2. Landing Zone Design
Build the cloud foundation before migrating workloads. Network topology, identity management, security baselines, and cost controls must be in place. The landing zone is your production environment — treat it with the same rigour.
The landing zone comprises several layers: network (VPCs, subnets, routing, DNS), identity (SSO, RBAC, service accounts), security (firewalls, encryption, logging, compliance), and operations (monitoring, alerting, backup, DR).
Cost controls are particularly important. Cloud bills can spiral without governance. We implement budget alerts at 50%, 80%, and 100% of allocated spend, with automatic escalation. Resource tagging policies attribute costs to teams, projects, and environments.
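In practice these rules are configured in each provider's native budgeting and tag-policy tooling; the sketch below only illustrates the logic. It assumes a hypothetical set of required tag keys (`team`, `project`, `environment`) matching the attribution described above, and tracks which thresholds have already fired so each alert is sent once.

```python
BUDGET_THRESHOLDS = (0.5, 0.8, 1.0)  # 50%, 80% and 100% of allocated spend
REQUIRED_TAGS = frozenset({"team", "project", "environment"})  # hypothetical tag keys


def crossed_thresholds(spend: float, budget: float, already_alerted: set[float]) -> list[float]:
    """Return thresholds newly crossed by current spend, lowest first."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = spend / budget
    return [t for t in BUDGET_THRESHOLDS if ratio >= t and t not in already_alerted]


def missing_tags(resource_tags: dict[str, str]) -> set[str]:
    """Return required tag keys that are absent or empty on a resource."""
    return {key for key in REQUIRED_TAGS if not resource_tags.get(key)}


print(crossed_thresholds(850, 1000, already_alerted={0.5}))  # [0.8]
```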
3. Pilot Migration
Select 2-3 non-critical applications that represent your estate's diversity. Migrate them completely, including monitoring, backup, and DR. Document every issue. The pilot reveals the real complexity that assessments miss.
The pilot is not a proof of concept; it is a rehearsal. It should use the same tools, processes, and runbooks that the main migration will use.
We recommend selecting pilots that cover different complexity profiles: one simple lift-and-shift (low risk), one replatforming effort (medium risk), and one refactoring project (high risk). The simple pilot builds confidence; the hard pilot reveals true complexity.
4. Wave Planning
Group applications by dependency, risk, and business criticality. Never migrate a system before its dependencies. We typically plan 4-6 week waves with 2-week buffers between them. Rushing waves together causes cascade failures.
The wave structure should reflect the dependency graph. Start with foundational services (databases, message queues, shared libraries), then move to business services, then user-facing applications. In particular, never migrate a service before its database.
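Deriving waves from the dependency graph is a topological layering: each wave contains everything whose dependencies have already moved. A minimal sketch using Python's standard-library `graphlib`, assuming a dependency map like the one produced during discovery; the application names are hypothetical, and a real plan would also split waves by size, risk and business criticality.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def plan_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group applications into waves so every dependency lands in an earlier wave.

    `dependencies` maps each application to the set of things it depends on.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies have moved
        waves.append(ready)
        sorter.done(*ready)
    return waves


def timeline_weeks(n_waves: int, wave_weeks: int = 6, buffer_weeks: int = 2) -> int:
    """Total duration when every wave except the last is followed by a buffer."""
    return n_waves * wave_weeks + max(n_waves - 1, 0) * buffer_weeks


deps = {
    "web-shop": {"orders-api"},
    "orders-api": {"orders-db", "message-queue"},
    "reporting": {"orders-db"},
}
waves = plan_waves(deps)
for number, wave in enumerate(waves, start=1):
    print(f"Wave {number}: {', '.join(wave)}")
print(f"{timeline_weeks(len(waves))} weeks")  # 22 weeks for 3 six-week waves
```

Wave 1 here contains `message-queue` and `orders-db`, wave 2 `orders-api` and `reporting`, and wave 3 `web-shop`. A `CycleError` flags services that depend on each other, which have to be untangled or planned as a single unit.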
5. Execution
Each migration needs: a runbook, a rollback procedure, a communication plan, and a war room. Runbooks should be tested in a staging environment that mirrors production exactly. Untested runbooks fail at 3 AM.
The runbook should be a step-by-step guide that a competent engineer can follow without domain expertise. Every step should have an owner, a duration estimate, and a verification method.
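If runbooks are kept as structured data rather than free text, the owner, duration and verification requirements can be checked automatically before a migration window. A minimal sketch; `RunbookStep` and its fields are hypothetical, not the schema of any particular runbook tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunbookStep:
    description: str
    owner: str
    duration_min: int
    verification: str  # how to confirm the step worked, e.g. a command or a check


def validate_runbook(steps: list[RunbookStep]) -> list[str]:
    """List every step that lacks an owner, a duration estimate or a verification method."""
    problems = []
    for i, step in enumerate(steps, start=1):
        if not step.owner.strip():
            problems.append(f"step {i}: no owner")
        if step.duration_min <= 0:
            problems.append(f"step {i}: no duration estimate")
        if not step.verification.strip():
            problems.append(f"step {i}: no verification method")
    return problems


def total_minutes(steps: list[RunbookStep]) -> int:
    return sum(step.duration_min for step in steps)


steps = [
    RunbookStep("Stop writes to source database", "dba-oncall", 5, "write probe fails"),
    RunbookStep("Confirm replica is in sync", "dba-oncall", 10, ""),
]
print(validate_runbook(steps))  # ['step 2: no verification method']
```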
6. Optimisation
Post-migration, 40-60% of cloud spend is typically waste. Rightsize instances, evaluate reserved capacity, and implement auto-scaling. Optimisation should start 30 days after migration, not six months later.
The optimisation phase has three stages. First, rightsizing: analyse Amazon CloudWatch, Azure Monitor, or Google Cloud Monitoring metrics to identify over-provisioned instances. An instance averaging 15% CPU utilisation is over-provisioned; downsize it or consolidate workloads. Second, reserved capacity: for predictable workloads, commit to 1- or 3-year reserved instances, which typically save 40-60% compared with on-demand pricing. Third, auto-scaling: implement dynamic scaling for variable workloads, with scale-up triggers at 70% CPU and scale-down triggers at 30%.
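The rightsizing and auto-scaling rules above reduce to a few comparisons. The sketch below shows that logic with hypothetical function names; in production the scaling policy would be configured in the provider's auto-scaling service rather than hand-rolled, and the instance limits are assumptions. The gap between the 70% and 30% triggers is deliberate: it stops a group from flapping between sizes when load sits near a single threshold.

```python
from statistics import mean


def is_overprovisioned(cpu_samples: list[float], threshold: float = 15.0) -> bool:
    """Flag an instance whose average CPU utilisation (in %) is at or below the threshold."""
    return mean(cpu_samples) <= threshold


def scaling_action(avg_cpu: float, current: int, min_instances: int, max_instances: int,
                   scale_up_at: float = 70.0, scale_down_at: float = 30.0) -> int:
    """Return the desired instance count after one evaluation period."""
    if avg_cpu >= scale_up_at and current < max_instances:
        return current + 1
    if avg_cpu <= scale_down_at and current > min_instances:
        return current - 1
    return current  # between the triggers, or already at a limit: hold steady


print(is_overprovisioned([10.0, 12.0, 20.0]))  # True (average is 14%)
print(scaling_action(82.0, current=3, min_instances=2, max_instances=10))  # 4
```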
We typically schedule optimisation reviews at 30, 60, and 90 days post-migration. The 30-day review focuses on obvious waste: oversized instances, unused storage, and unattached resources. The 60-day review analyses usage patterns and implements reserved capacity. The 90-day review evaluates auto-scaling policies and refines them based on observed behaviour.
The Retain Decision
Not everything should move. Some systems are too risky, too tightly coupled to physical hardware, or too close to end-of-life to justify migration cost. Be explicit about what stays and why.
The retain decision is not a failure. It is a strategic choice.
Rollback Strategy
Every migration must have a rollback plan that can execute in under 30 minutes. This means maintaining data synchronisation between old and new environments during the migration window, and having DNS cutover tested and ready.
The rollback plan is not a theoretical document. It is a tested procedure, and the 30-minute limit is measured from the first alert to full service restoration.
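One way to keep the 30-minute limit honest is a go/no-go check before each cutover. A minimal sketch, assuming rollback step timings come from a rehearsal and that replication lag between old and new environments can be measured; the function name and the 60-second lag limit are hypothetical choices for illustration.

```python
ROLLBACK_BUDGET_MIN = 30


def rollback_ready(rehearsed_step_minutes: list[int], replication_lag_s: float,
                   max_lag_s: float = 60.0) -> tuple[bool, list[str]]:
    """Return (go, reasons): go is False if rollback would overrun or data is out of sync."""
    reasons = []
    total = sum(rehearsed_step_minutes)
    if total > ROLLBACK_BUDGET_MIN:
        reasons.append(f"rollback takes {total} min, budget is {ROLLBACK_BUDGET_MIN} min")
    if replication_lag_s > max_lag_s:
        reasons.append(f"replication lag {replication_lag_s:.0f}s exceeds {max_lag_s:.0f}s")
    return (not reasons, reasons)


print(rollback_ready([5, 10, 8], replication_lag_s=12.0))  # (True, [])
```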
Our Recommendation
Start with assessment, not migration. Spend time understanding dependencies. Pilot aggressively. Plan waves conservatively. And always have a tested rollback path.
Do not migrate alone. Cloud providers offer migration programmes, funding, and expertise. Partners like us provide frameworks, tooling, and experience. The cost of professional help is a fraction of the cost of a failed migration.
The organisations that succeed treat cloud migration as a strategic transformation, not a technical relocation. They invest in discovery, build robust landing zones, run disciplined pilots, and optimise continuously after migration.