A 12TB production database in a HIPAA-regulated environment cannot be migrated with a maintenance window. The stakes for PHI data continuity, insurance claim processing, and real-time scheduling across thousands of practices made downtime commercially and legally unacceptable. The existing MariaDB Enterprise cluster on EC2 had no managed failover, no automated point-in-time recovery, and a replication topology that had grown organically over the years. Before we wrote a single line of Terraform, we needed to understand exactly what we were working with and where the failure modes were.
- Infrastructure and replication audit: We mapped the full topology of the existing MariaDB Enterprise cluster, including replication lag behavior, failover procedures, backup schedule, and the specific EC2 instance configurations that could affect RDS parity. This audit surfaced several undocumented dependencies in the application's connection handling that would have caused silent failures if we had proceeded without them. We pushed back on the initial timeline as a result of this audit, and the client agreed.
- Phased migration design with HAProxy as the control layer: Rather than a direct cutover, we designed a staged approach. HAProxy was introduced as a routing intermediary between the application layer and the database, giving us precise control over traffic during the transition. We defined RPO/RTO targets up front, modeled the replication lag thresholds that would trigger a go/no-go decision, and documented the rollback procedure before beginning execution.

