DB Migration
Migration strategies for Azure Cosmos DB involve choosing the right tools based on the API, data volume, and downtime requirements.
1. Migration Types
| Type | Description | Tools |
|---|---|---|
| Offline | Stop application, migrate all data, then restart application pointing to Cosmos DB. Highest downtime. | ADF, Spark, mongoexport/mongoimport, AzCopy |
| Online | Initial snapshot migration followed by incremental sync of changes. Minimal downtime. | DMS, ADF with CDC, Kafka Connector |
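
The online pattern in the table (snapshot first, then incremental sync) can be sketched with plain dictionaries. This is only an illustration of the two phases; the change tuple format and the function names `snapshot_copy` and `apply_changes` are assumptions made for this example, not the API of DMS or any other tool.

```python
from typing import Dict, List, Tuple

Doc = Dict[str, object]
Change = Tuple[str, str, Doc]  # (operation, document id, document body)


def snapshot_copy(source: Dict[str, Doc]) -> Dict[str, Doc]:
    """Phase 1: bulk-copy a point-in-time snapshot of the source."""
    return {doc_id: dict(doc) for doc_id, doc in source.items()}


def apply_changes(target: Dict[str, Doc], changes: List[Change]) -> None:
    """Phase 2: replay changes captured after the snapshot, in order."""
    for op, doc_id, doc in changes:
        if op == "upsert":
            target[doc_id] = dict(doc)
        elif op == "delete":
            target.pop(doc_id, None)
        else:
            raise ValueError(f"unknown operation: {op}")


source = {"1": {"id": "1", "name": "a"}, "2": {"id": "2", "name": "b"}}
target = snapshot_copy(source)

# Writes that reached the source while the snapshot was being copied
# (hypothetical sample data).
pending = [
    ("upsert", "3", {"id": "3", "name": "c"}),
    ("upsert", "1", {"id": "1", "name": "a2"}),
    ("delete", "2", {}),
]
apply_changes(target, pending)
```

Once the pending change list is drained and stays empty, the application can be cut over to the target, which is why downtime is limited to the cutover itself.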
2. Core Migration Tools
Azure Data Factory (ADF)
- Best for: Scale-out migrations, multi-source ingestion, and low-code ETL.
- Performance: Integrates with the Bulk Executor Library for high-throughput writes.
- Incremental Load: Supports Change Data Capture (CDC) for SQL and NoSQL sources.
Azure Database Migration Service (DMS)
- Best for: Online migration with minimal downtime, specifically for Cosmos DB for MongoDB.
- Process: Fully managed service; handles initial load and ongoing replication until cutover.
Azure Cosmos DB Live Data Migrator
- Best for: NoSQL API migrations.
- Capability: Supports live migration of data between containers or accounts.
Spark Connector (v3)
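- Best for: Extremely high-volume migrations (terabytes) that benefit from Spark's distributed parallelism.
- Mechanism: Writes in bulk mode to maximize RU usage. The Spark 3 connector is built on the Azure Cosmos DB Java SDK v4; the older Spark 2 connector used the Bulk Executor Library.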
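
A bulk write through the Spark connector is driven by a set of string options. The sketch below only builds that option map; the option names are the ones I understand the Spark 3 connector to use, so check them against the configuration reference for your connector version. The helper `cosmos_write_config`, the placeholder endpoint and key, and the database and container names are hypothetical.

```python
def cosmos_write_config(endpoint: str, key: str, database: str, container: str) -> dict:
    """Build write options for the Cosmos DB Spark connector (all values are strings)."""
    return {
        "spark.cosmos.accountEndpoint": endpoint,
        "spark.cosmos.accountKey": key,
        "spark.cosmos.database": database,
        "spark.cosmos.container": container,
        # Upsert semantics, so a retried batch does not fail on existing ids.
        "spark.cosmos.write.strategy": "ItemOverwrite",
        # Bulk mode batches writes to use the provisioned RUs efficiently.
        "spark.cosmos.write.bulk.enabled": "true",
    }


cfg = cosmos_write_config(
    "https://<account>.documents.azure.com:443/",  # placeholder endpoint
    "<key>",                                       # placeholder key
    "targetdb",                                    # hypothetical database
    "orders",                                      # hypothetical container
)

# With a SparkSession and a DataFrame `df` whose rows carry an `id` column:
# df.write.format("cosmos.oltp").options(**cfg).mode("APPEND").save()
```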
Azure Cosmos DB Data Migration Tool
- Best for: Smaller, one-time offline migrations from JSON, CSV, SQL Server, or MongoDB.
- Type: Open-source desktop/command-line tool.
3. Migration Strategy (DP-420 Key Concepts)
- Preparation:
  - Partition Key: Ensure the partition key in the target container is optimized for the new workload.
  - Data Modeling: Denormalize data if moving from Relational to NoSQL.
  - RU Provisioning: Scale up RUs on the target container (temporarily) during the migration to avoid 429 (Rate Limiting) errors.
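
To decide how far to scale up, a rough estimate relates document count, RU cost per write, and provisioned throughput. The sketch below is back-of-the-envelope arithmetic only. The `utilization` factor (the share of provisioned RU/s the migration actually sustains) and the per-write RU cost are assumptions; measure the request charge of a representative write to get a real figure.

```python
import math


def migration_seconds(doc_count: int, ru_per_write: float,
                      provisioned_ru_per_sec: int, utilization: float = 0.8) -> float:
    """Estimated duration of the load at a given provisioned throughput."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    total_ru = doc_count * ru_per_write
    return total_ru / (provisioned_ru_per_sec * utilization)


def required_ru_per_sec(doc_count: int, ru_per_write: float,
                        target_seconds: float, utilization: float = 0.8) -> int:
    """Throughput needed to finish within target_seconds, rounded up to 100 RU/s."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    raw = doc_count * ru_per_write / (target_seconds * utilization)
    return math.ceil(raw / 100) * 100


# Hypothetical workload: 10 million documents at 10 RU per write.
duration = migration_seconds(10_000_000, 10, 50_000)
needed = required_ru_per_sec(10_000_000, 10, target_seconds=3600)
```

If the estimated duration does not fit the migration window, raise the target RU/s for the load and scale back down afterwards.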
- Validation:
  - Start with a subset of data to validate partition key performance and query patterns.
  - Verify document integrity and count after migration.
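
One way to check count and integrity is to compare per-document fingerprints between source and target. This is a minimal sketch that assumes both sides can be read as lists of dictionaries keyed by `id`. The function names are hypothetical, and the ignored fields are the system properties Cosmos DB adds to stored documents.

```python
import hashlib
import json

SYSTEM_FIELDS = ("_rid", "_self", "_etag", "_attachments", "_ts")


def doc_fingerprint(doc: dict) -> str:
    """Hash a document's user fields in a key-order-independent way."""
    cleaned = {k: v for k, v in doc.items() if k not in SYSTEM_FIELDS}
    payload = json.dumps(cleaned, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def compare(source_docs: list, target_docs: list, id_field: str = "id") -> dict:
    """Report count differences and documents whose content changed."""
    src = {d[id_field]: doc_fingerprint(d) for d in source_docs}
    tgt = {d[id_field]: doc_fingerprint(d) for d in target_docs}
    return {
        "source_count": len(src),
        "target_count": len(tgt),
        "missing": sorted(src.keys() - tgt.keys()),
        "unexpected": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


# Hypothetical sample data: document 3 was not copied, document 2 was altered.
source_docs = [{"id": "1", "v": 1}, {"id": "2", "v": 2}, {"id": "3", "v": 3}]
target_docs = [{"id": "1", "v": 1, "_ts": 123}, {"id": "2", "v": 20}]
report = compare(source_docs, target_docs)
```

For large datasets the same comparison would normally run per partition or on a sample rather than on the full set in memory.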
- Post-Migration:
  - Scale RUs back down to normal operational levels.
  - Configure TTL, Indexing, and consistency levels as required.
4. Specific Scenarios
- Migrating from MongoDB: Use DMS (Online) or native tools like mongodump/mongorestore (Offline).
- Migrating from SQL Server: Use ADF or the Data Migration Tool.
- Migrating Large Blobs/JSON: Use ADF with a focus on partitioning source files.
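
Partitioning source files usually means splitting them into groups of similar total size so that parallel copy workers finish at about the same time. The sketch below shows one simple way to do that with a greedy assignment. It does not model ADF's own parallelism settings, and the function name and file sizes are hypothetical.

```python
import heapq
from typing import Dict, List


def group_files_by_size(files: Dict[str, int], groups: int) -> List[List[str]]:
    """Assign files (name -> size in bytes) to groups, largest first,
    always adding to the group with the smallest running total."""
    if groups < 1:
        raise ValueError("groups must be at least 1")
    heap = [(0, i) for i in range(groups)]  # (running total, group index)
    buckets: List[List[str]] = [[] for _ in range(groups)]
    for name, size in sorted(files.items(), key=lambda kv: kv[1], reverse=True):
        total, idx = heapq.heappop(heap)
        buckets[idx].append(name)
        heapq.heappush(heap, (total + size, idx))
    return buckets


# Hypothetical source files and sizes.
files = {"a.json": 900, "b.json": 500, "c.json": 400, "d.json": 100}
batches = group_files_by_size(files, groups=2)
```

Each resulting group can then be handed to a separate copy activity or worker.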