EMR Platform

Breaking the Terraform monolith

A single Terraform state managing your entire infrastructure feels manageable until it isn't. Every apply touches everything. Manual changes accumulate outside IaC. Drift becomes the norm, not the exception. How do you regain control of infrastructure that has grown beyond the boundaries of any single state file without rebuilding it from scratch?


Turning infrastructure chaos into a manageable platform

The EMR platform, operating under HIPAA, had reached a point where its Terraform configuration had become a liability. A single monolithic state managed VPC, RDS, EKS, and shared services together, meaning every infrastructure change, regardless of scope, required traversing the entire dependency graph. Manual modifications in AWS had accumulated outside IaC, creating drift that no one could fully account for. Following an acquisition, the gap between how the infrastructure was managed and what enterprise governance and audit readiness actually required had become impossible to ignore. The platform needed domain-level control, enforced IaC discipline, and an infrastructure posture that could be demonstrated to auditors with confidence.

Quick facts

EHR Platform

Private medical practices in the USA

Cloud-based EMR platform for private medical practices in the USA, handling PHI data under HIPAA across scheduling, charting, billing, and telehealth.

0% drift

Zero drift IaC enforced

By decomposing the Terraform monolith and enforcing an IaC-only change policy, we eliminated the manual AWS modifications that had been accumulating outside state. Every infrastructure change became reviewable, auditable, and traceable to a specific pipeline run.

Terraform + CI/CD

Domain-isolated state files, combined with scoped plan/apply approvals in CI/CD, mean that infrastructure changes are targeted, reviewed, and logged rather than broad, manual, and opaque.

"For the first time, we could look at our infrastructure and trust that what Terraform described was what was actually running. That confidence changed how the entire engineering organization operated."

Michael Torres

CTO, US Healthcare EHR Platform

What we did for the EMR Platform

Decomposing the Terraform monolith

A monolithic Terraform state is not just a technical inconvenience. It is an organizational risk. When every domain shares a single state, the blast radius of any apply spans the entire infrastructure, and the incentive to make quick manual changes in the AWS console rather than wait for a full plan cycle becomes hard to resist. That incentive is exactly what had driven the drift this client was dealing with. Before we could enforce any governance policy, we needed to understand the full dependency structure and design a decomposition that teams could operate safely within.

State audit and domain segmentation: We audited the existing Terraform state structure and dependency graph in full, mapping every resource, identifying tightly coupled domains, and surfacing the undocumented manual changes in AWS that had diverged from state. The audit revealed drift across multiple resource types that had been introduced incrementally and never reconciled. We used this map to design a domain-based segmentation strategy, isolating VPC, RDS, EKS, and shared services into independent state files with clearly defined ownership boundaries. Decomposition was executed using Terraform state move operations, with zero resource recreation; the infrastructure did not change, only how it was managed did.
Environment-specific execution pipelines: With state segmented by domain, we created environment-specific execution pipelines for each isolated module. Engineers modifying the RDS domain planned and applied against that state alone, with no risk of inadvertently touching VPC or EKS configuration in the same operation. Each pipeline produced a plan scoped to its domain, and that plan had to be reviewed before any apply was permitted, making every proposed change visible and deliberate. The blast radius of any single infrastructure change dropped from the entire platform to a bounded, well-understood domain.
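
The segmentation step can be sketched in Python. This is a minimal illustration, not the tooling used on the engagement: the prefix-to-domain mapping, the state file names, and the helper names are all hypothetical. It assumes the monolithic state has been pulled to a local file and that resource addresses are simple (no dots inside module or resource keys). It only builds terraform state mv commands, using the -state and -state-out options, so they can be reviewed before anyone runs them.

```python
import shlex

# Hypothetical mapping from resource-type prefix to owning domain.
# First match wins; a real mapping would come out of the state audit.
DOMAIN_PREFIXES = [
    ("aws_db_", "rds"),
    ("aws_rds_", "rds"),
    ("aws_eks_", "eks"),
    ("aws_vpc", "vpc"),
    ("aws_subnet", "vpc"),
    ("aws_route_table", "vpc"),
]
DEFAULT_DOMAIN = "shared"  # anything unmatched lands in shared services


def resource_type(address: str) -> str:
    """Extract the resource type from an address,
    e.g. 'module.db.aws_db_instance.main' -> 'aws_db_instance'."""
    parts = address.split(".")
    i = 0
    # Skip leading 'module.<name>' pairs.
    while i + 1 < len(parts) and parts[i] == "module":
        i += 2
    return parts[i]


def domain_for(address: str) -> str:
    """Pick the target domain for a resource address."""
    rtype = resource_type(address)
    for prefix, domain in DOMAIN_PREFIXES:
        if rtype.startswith(prefix):
            return domain
    return DEFAULT_DOMAIN


def mv_commands(addresses, source_state="monolith.tfstate"):
    """Build one 'terraform state mv' command per address (not executed)."""
    cmds = []
    for addr in addresses:
        target = f"{domain_for(addr)}.tfstate"
        cmds.append(
            "terraform state mv "
            f"-state={shlex.quote(source_state)} -state-out={shlex.quote(target)} "
            f"{shlex.quote(addr)} {shlex.quote(addr)}"
        )
    return cmds


if __name__ == "__main__":
    for cmd in mv_commands(["aws_vpc.main", "module.db.aws_db_instance.primary"]):
        print(cmd)
```

Generating the commands instead of running them keeps the move list itself reviewable, which matches the audit-first approach described above.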

Enforcing IaC governance and audit readiness

Decomposing the state solves the structural problem. Eliminating drift requires enforcing a cultural and process change: manual modifications in the AWS console are no longer acceptable, and every infrastructure change must originate from a reviewed, pipeline-executed Terraform run. This is the part of infrastructure governance that tooling alone cannot deliver; it requires clear policy, enforced workflow, and documented ownership.

IaC-only change policy and manual modification removal: We audited all existing manual AWS modifications identified during the state review and reconciled them into Terraform, either formalizing them as intentional configuration or removing them as undocumented drift. IAM policies were tightened to restrict direct console modifications on infrastructure managed by Terraform, removing the path of least resistance that had allowed drift to accumulate. Terraform runs were integrated into the CI/CD workflow, making the pipeline the only sanctioned route for infrastructure changes. We pushed back on a request to maintain console access for "emergency" modifications and instead worked with the client to define a documented break-glass procedure that preserved auditability even during incidents.
Scoped approvals, ownership documentation, and audit posture: Plan and apply steps were separated in the CI/CD pipeline, with scoped approval gates that require a named reviewer before any apply is executed in a production environment. Environment ownership and domain responsibilities were documented, formally establishing who was accountable for each state file and who could approve changes to it. This documentation, combined with the pipeline audit trail, gave the compliance team a complete picture of infrastructure change history that could be presented directly in an audit context. Infrastructure governance moved from an aspiration to a demonstrable, evidenced practice.
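
One way to express the IAM restriction described above is an explicit Deny on mutating actions for every principal except the pipeline role. The sketch below builds such a policy document in Python. The role ARN, the action patterns, and the function name are illustrative assumptions, not the client's actual policy; a real policy would be scoped to the resources Terraform manages, and the break-glass path would need its own handling.

```python
import json

# Hypothetical ARN of the role assumed by the CI/CD pipeline.
PIPELINE_ROLE_ARN = "arn:aws:iam::111122223333:role/terraform-pipeline"

# Illustrative action patterns only; a real list would be derived
# from the resource types each domain state actually manages.
MUTATING_ACTIONS = [
    "ec2:Create*", "ec2:Delete*", "ec2:Modify*",
    "rds:Create*", "rds:Delete*", "rds:Modify*",
    "eks:Create*", "eks:Delete*", "eks:Update*",
]


def deny_manual_changes_policy(pipeline_role_arn, actions):
    """Return an IAM policy document that denies the given actions
    to any principal whose ARN does not match the pipeline role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyChangesOutsidePipeline",
                "Effect": "Deny",
                "Action": list(actions),
                "Resource": "*",
                "Condition": {
                    "ArnNotLike": {"aws:PrincipalArn": pipeline_role_arn}
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = deny_manual_changes_policy(PIPELINE_ROLE_ARN, MUTATING_ACTIONS)
    print(json.dumps(policy, indent=2))
```

An explicit Deny is used here because it overrides any Allow an engineer may hold through other policies, so read-only console access can remain in place.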

Terraform administration and infrastructure control: FAQ

Why decompose a Terraform monolith rather than just clean it up?

Because the monolith structure is the root cause, cleaning it up without decomposing it recreates the same conditions that lead to drift and risk.

A single state file managing unrelated infrastructure domains means every apply carries risk across the entire platform, every change requires understanding the full dependency graph, and the incentive to bypass IaC for quick fixes remains. Domain-isolated states structurally eliminate each of these problems. Engineers work within bounded contexts, applies are scoped to targeted domains, and the organizational incentive to make manual changes is removed because pipeline-based changes become faster and safer than console modifications.

How do you decompose Terraform state without recreating infrastructure?

Using Terraform state move operations (the terraform state mv command), which migrate resources between state files without destroying and recreating them.

Each resource is moved from the monolithic state to its target domain state file, with a plan validation confirming that no resources change before any move is applied. The infrastructure continues to run throughout the process, with no downtime, service interruptions, or resource recreation. The decomposition is invisible to the running platform.

We executed this for the client incrementally, validating each domain module in isolation before retiring the corresponding section of the original state.
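
The plan validation mentioned above can be automated. A minimal sketch, assuming the plan has been saved to a file and exported with terraform show -json: the function below reads the resource_changes array from that JSON and reports every address whose planned actions go beyond no-op or read. The function name and the sample data are hypothetical.

```python
import json

# Planned actions that leave real infrastructure untouched.
SAFE_ACTIONS = {"no-op", "read"}


def changed_addresses(plan_json: str) -> list[str]:
    """Return the addresses the plan would create, update, or delete."""
    plan = json.loads(plan_json)
    changed = []
    for rc in plan.get("resource_changes", []):
        actions = set(rc["change"]["actions"])
        if not actions <= SAFE_ACTIONS:
            changed.append(rc["address"])
    return changed


if __name__ == "__main__":
    # Hypothetical plan export: one untouched resource, one replacement.
    sample = json.dumps({
        "resource_changes": [
            {"address": "aws_vpc.main",
             "change": {"actions": ["no-op"]}},
            {"address": "aws_db_instance.primary",
             "change": {"actions": ["delete", "create"]}},
        ]
    })
    print(changed_addresses(sample))
```

An empty result for a domain is the signal that its move changed only where the resources are tracked, not the resources themselves.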

What does enforcing an IaC-only change policy actually require?

IAM restrictions, pipeline enforcement, and a documented break-glass procedure for genuine emergencies.

IAM policies are tightened to remove the engineering team's direct console modification rights on Terraform-managed resources. All infrastructure changes are routed through the CI/CD pipeline, where the plan output is reviewed and the apply is gated on a named approval.

Emergency scenarios where speed genuinely outweighs process are handled via a documented break-glass procedure that requires post-incident reconciliation to Terraform state.

Without that procedure, teams will bypass the policy at the first incident, and the governance model collapses.
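
Post-incident reconciliation needs a way to confirm that configuration and reality agree again. One option, sketched here under assumptions, is to run terraform plan with the -detailed-exitcode flag against the affected domain: exit code 0 means no changes, 1 means an error, and 2 means the plan contains changes. The function names, the status labels, and the idea of one working directory per domain are hypothetical.

```python
import subprocess


def plan_exit_code(workdir: str) -> int:
    """Run a plan in the given domain directory and return its exit code.
    Requires the terraform binary and an initialized working directory."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=workdir,
        check=False,  # exit code 2 is expected when changes are pending
    )
    return result.returncode


def reconciliation_status(exit_code: int) -> str:
    """Translate a -detailed-exitcode result into a status label."""
    if exit_code == 0:
        return "reconciled"        # configuration matches infrastructure
    if exit_code == 2:
        return "drift-remaining"   # the plan still shows pending changes
    return "plan-error"            # the plan itself failed


if __name__ == "__main__":
    # Status labels for the three documented exit codes.
    for code in (0, 1, 2):
        print(code, reconciliation_status(code))
```

A break-glass ticket could then stay open until the affected domain reports "reconciled".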

How do scoped plan/apply approvals improve infrastructure governance?

They create a review gate between intent and execution, making every infrastructure change a deliberate, attributed decision.

Without approval gates, a Terraform apply is a single operation that can be executed unilaterally. With separate plan and apply steps, the proposed change is visible to a reviewer before it is executed. The reviewer can confirm the scope, check for unintended changes, and approve or reject with their identity attached to the audit trail.

For this client, this meant every production infrastructure change had a named approver and a pipeline execution record. This governance standard had not previously existed, and the compliance team could reference it directly.
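
The gate logic itself is simple enough to sketch. The Python below is an illustration under assumptions, not the client's pipeline configuration: the ApplyRequest type, the per-domain approver lists, the rule that only production is gated, and the ban on self-approval are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ApplyRequest:
    domain: str                      # e.g. "rds"
    environment: str                 # e.g. "production"
    author: str                      # who proposed the change
    approver: Optional[str] = None   # named reviewer, if any


def may_apply(req: ApplyRequest, domain_approvers: dict[str, set[str]]) -> bool:
    """Decide whether the apply step is allowed to run."""
    if req.environment != "production":
        return True   # assumption: only production applies are gated
    if not req.approver:
        return False  # production requires a named reviewer
    if req.approver == req.author:
        return False  # assumption: authors cannot approve their own change
    # The reviewer must be a documented approver for this domain.
    return req.approver in domain_approvers.get(req.domain, set())


if __name__ == "__main__":
    approvers = {"rds": {"alice"}}
    print(may_apply(ApplyRequest("rds", "production", "bob", "alice"), approvers))
```

Keying approvers by domain ties the gate to the ownership documentation, so the list of who may approve a change is the same list that says who is accountable for that state file.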

How does Terraform governance align with HIPAA infrastructure requirements?

HIPAA doesn't mandate specific IaC tooling, but it does require that changes to systems that handle PHI be controlled, documented, and auditable.

Terraform with CI/CD-enforced pipelines and approval gates produces exactly that: a change management trail where every infrastructure modification is attributable, reviewable, and timestamped. Scoped domain states mean that security-relevant infrastructure (VPC configuration, RDS encryption settings, and IAM policies) is managed and reviewed independently of unrelated changes.

For this client, the Terraform governance model directly addressed findings from a pre-acquisition infrastructure review that had flagged change management controls as a gap.
