EHR Platform

Building a scalable and compliant data pipeline with Fivetran

Analytics workloads and production databases make poor neighbors, especially when the database stores 12TB of regulated PHI. Direct reporting queries degrade application performance, create uncontrolled data access patterns, and introduce compliance exposure that auditors will find. How do you build the analytics capability your business needs without putting production stability or data governance at risk?


Analytics without production risk

An EMR/EHR platform aggregating operational, billing, and patient interaction data across thousands of medical practices needed to scale its analytics capabilities. The existing approach of running reporting jobs directly against the production database was creating measurable performance pressure on a system with strict uptime requirements. In a HIPAA-regulated environment, it also meant PHI was being accessed through channels that were difficult to audit or control. The platform needed a modern ingestion architecture that isolated BI workloads entirely from production systems and gave the data and compliance teams a foundation they could actually govern.

Quick facts

EHR Platform

Private medical practices in the USA

Cloud-based EMR/EHR platform for private medical practices in the USA, handling PHI data under HIPAA across scheduling, charting, billing, and telehealth.

0% load

Analytics load on production reduced to zero

BI workloads were fully isolated from the operational database. Analytics queries no longer compete with application workloads; ingestion runs on a dedicated read replica through a private network path, with no exposure to the production primary.

Fivetran + AWS VPC

Structured incremental sync via a private VPC endpoint means data moves from source to destination without traversing the public internet or touching production write capacity. Controlled ingestion replaces ad-hoc querying entirely.

"The new data pipeline allowed us to scale analytics without touching production performance."

Sarah Mitchell

VP Engineering, US Healthcare EHR Platform

What we did for the EHR Platform

Designing a secure data ingestion architecture

Running analytics queries against a production database that stores PHI is a compounding problem: every new dashboard, every new report, every new stakeholder request adds load to a system that exists to serve live clinical workloads. It also creates an access-pattern problem: direct production queries are difficult to scope, log, and evidence under audit. Before we configured a single connector, we needed to understand the full picture of how data was moving and where the compliance and performance boundaries were being crossed.

Analytics flow audit and architecture design: We audited the existing reporting and BI setup, mapping every query pattern, scheduled job, and ad-hoc access path that touched the production database. The audit confirmed what the performance data suggested: analytics workloads were creating contention during peak clinical hours, and the access patterns were inconsistent with what a HIPAA audit would expect to see. We selected Fivetran as the ingestion layer based on its structured incremental sync model, clear state tracking, and predictable ingestion behavior, all properties that matter in a regulated environment where data lineage needs to be demonstrable.
Read replica and VPC endpoint architecture: Rather than allowing Fivetran to connect to the production primary, we designed the ingestion architecture around a dedicated read replica as the data source. Private connectivity was implemented via AWS VPC Endpoint, ensuring data never traversed the public internet between source and destination. This architecture removed analytics queries from the production primary entirely, so they no longer contend with application workloads, and established a network boundary that could be documented and evidenced for compliance purposes. We proactively recommended this topology over a simpler direct-connection approach: the added configuration cost was minimal, and the compliance and performance benefits were not.
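The two rules in this topology (the source is a replica, and the path is private) are simple enough to check automatically. The sketch below is a minimal illustration in Python, not the tooling used in this engagement: the `SourceEndpoint` type, the role labels, the hostnames, and the IP addresses are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceEndpoint:
    """Hypothetical description of the database endpoint a connector targets."""
    hostname: str
    role: str          # "primary" or "replica" (labels assumed for this sketch)
    resolved_ip: str   # address the hostname resolves to from the connector side


def check_source(endpoint: SourceEndpoint) -> list[str]:
    """Return a list of policy violations; an empty list means the endpoint passes."""
    violations = []
    if endpoint.role != "replica":
        violations.append(
            f"{endpoint.hostname}: source must be a read replica, got {endpoint.role!r}"
        )
    # is_private is True for RFC 1918 ranges (and some other reserved ranges).
    if not ipaddress.ip_address(endpoint.resolved_ip).is_private:
        violations.append(
            f"{endpoint.hostname}: {endpoint.resolved_ip} is not a private address"
        )
    return violations


# Made-up endpoints for illustration only.
replica = SourceEndpoint("ehr-replica.internal.example", "replica", "10.20.3.15")
primary = SourceEndpoint("ehr-primary.example", "primary", "52.95.110.1")

print(check_source(replica))   # no violations
print(check_source(primary))   # two violations: wrong role, public address
```

A check like this could run whenever a connector configuration changes, giving the compliance team a repeatable record that the source boundary still holds.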

Implementing scalable data infrastructure

A secure ingestion architecture is only useful if it delivers data reliably, predictably, and at the scale the business actually needs. The platform's analytics requirements were growing, with more dashboards, more internal reporting, and more operational visibility across billing and scheduling. The infrastructure needed to support that growth without requiring manual intervention or architectural changes each time a new data source or consumer was added.

Fivetran configuration and controlled sync schedules: We configured Fivetran to pull from the read replica on defined sync schedules, replacing ad-hoc query patterns with structured, incremental ingestion cycles. Sync schedules were tuned to balance data freshness requirements against replica load, so consumers received near-current data without sync jobs creating their own performance pressure. Data landed in the client's existing Redshift warehouse and S3 data lake, with encryption enforced in transit via TLS and at rest via AWS KMS across both destinations.
PHI handling validation and compliance alignment: With ingestion running through a controlled path, we validated the full data flow against HIPAA PHI-handling requirements, confirming encryption coverage, access control boundaries, and audit log completeness at each stage, from replica to Redshift. The structured ingestion model that Fivetran provided replaced an uncontrolled pattern of direct queries with a documented, auditable pipeline. Data governance moved from a gap in the platform's compliance posture to a demonstrable control that the compliance team could reference directly in audit documentation.
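The freshness-versus-load trade-off behind sync scheduling can be reasoned about with simple arithmetic. The sketch below is one way to frame it, under stated assumptions: the candidate intervals, the freshness targets, and the sync durations are illustrative values, not the settings used for this client or a list of what Fivetran offers.

```python
from datetime import timedelta

# Candidate sync intervals; assumed values for this sketch.
CANDIDATE_INTERVALS = [
    timedelta(minutes=m) for m in (15, 30, 60, 180, 360, 720, 1440)
]


def pick_sync_interval(
    freshness_target: timedelta, typical_sync_duration: timedelta
) -> timedelta:
    """Pick the longest interval whose worst-case staleness meets the target.

    Worst-case staleness is approximated as one full interval (a row changes
    just after a sync has read it) plus the time the next sync takes to land it.
    """
    fitting = [
        interval
        for interval in CANDIDATE_INTERVALS
        if interval + typical_sync_duration <= freshness_target
    ]
    if not fitting:
        raise ValueError("no candidate interval meets the freshness target")
    return max(fitting)


# Hypothetical consumers: a billing dashboard that tolerates 4 hours of
# staleness, and a scheduling view that needs data within 1 hour.
billing = pick_sync_interval(timedelta(hours=4), timedelta(minutes=20))
scheduling = pick_sync_interval(timedelta(hours=1), timedelta(minutes=10))
print(billing, scheduling)
```

Choosing the longest interval that still meets the target keeps replica load as low as the freshness requirement allows.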

Healthcare data pipeline: FAQ

Why is querying a production database directly a compliance risk in a HIPAA environment?

Because uncontrolled access patterns are difficult to scope, log, and evidence under audit.

HIPAA requires that access to PHI is purposeful, logged, and limited to the minimum necessary. Ad-hoc analytics queries against a production database rarely meet that bar; they're often broad, inconsistently logged, and run by users whose access was granted for operational rather than analytical purposes.

A dedicated ingestion pipeline through a controlled read replica creates a defined, auditable access path that maps cleanly to HIPAA's minimum necessary and audit control requirements.

Why Fivetran over a custom ingestion pipeline?

Because structured incremental sync with clear state tracking reduces operational risk in regulated environments.

A custom pipeline can ingest data, but it requires ongoing maintenance, failure handling, and state management, all of which create operational overhead and failure modes that need to be monitored.

Fivetran provides incremental sync that resumes correctly after interruption, connector-level observability, and predictable ingestion behavior that can be scheduled and evidenced. For a team that needed analytics infrastructure without adding data engineering headcount, it was the right fit.
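To make "incremental sync that resumes correctly after interruption" concrete, here is a minimal sketch of the general cursor-and-checkpoint technique. It is not Fivetran's implementation: the row shape, the `SyncState` type, and the simulated failure are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SyncState:
    """Persisted cursor: the highest (updated_at, id) pair already delivered."""
    cursor: tuple[int, int] = (0, 0)


def incremental_sync(source_rows, destination, state,
                     batch_size=2, fail_after_batches=None):
    """Copy rows newer than state.cursor in batches, checkpointing after each."""
    pending = sorted(
        (row for row in source_rows
         if (row["updated_at"], row["id"]) > state.cursor),
        key=lambda row: (row["updated_at"], row["id"]),
    )
    for batch_number, start in enumerate(range(0, len(pending), batch_size)):
        if fail_after_batches is not None and batch_number >= fail_after_batches:
            raise ConnectionError("simulated interruption")
        batch = pending[start:start + batch_size]
        for row in batch:
            # Upsert by primary key, so replaying a batch is harmless.
            destination[row["id"]] = row
        last = batch[-1]
        # Advance the checkpoint only after the whole batch has landed.
        state.cursor = (last["updated_at"], last["id"])


rows = [{"id": i, "updated_at": 100 + i} for i in range(1, 6)]
destination, state = {}, SyncState()

try:
    incremental_sync(rows, destination, state, fail_after_batches=1)
except ConnectionError:
    pass  # the first batch landed and was checkpointed before the failure

incremental_sync(rows, destination, state)  # resumes from the saved cursor
print(sorted(destination), state.cursor)
```

The maintenance burden of a custom pipeline is largely in getting details like this right for every table and every failure mode, which is the overhead a managed connector takes on.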

What is a VPC endpoint and why does it matter for PHI data pipelines?

It creates a private network path between AWS services that never touches the public internet.

Without a VPC endpoint, the ingestion layer typically reaches your database through a publicly addressable endpoint, which means the database must accept connections from outside your private network even if both sides run on AWS. For PHI data, that exposure is what HIPAA's transmission security requirements specifically address.

A VPC endpoint keeps the data path entirely within the AWS network, removing a category of transit risk and simplifying the network controls you need to document for compliance.

How do read replicas isolate analytics workloads from production performance?

By serving analytics queries from a separate database instance that receives changes asynchronously from the primary.

A read replica maintains a copy of the production database that stays current through replication, but serves none of the application's write or query traffic. Directing Fivetran at a read replica means ingestion jobs, which can be resource-intensive at 12TB scale, run against an instance that exists specifically to absorb that load. Production query performance is unaffected regardless of ingestion schedule or data volume.

What does a compliant healthcare data pipeline look like end-to-end?

Source isolation, private transit, encrypted storage, controlled access, and an audit trail at every stage.

For this client, that meant: Fivetran pulling from a dedicated read replica rather than the production primary; private connectivity via AWS VPC Endpoint; TLS encryption in transit; AWS KMS encryption at rest in both Redshift and S3; IAM-scoped access to the destination data stores; and Fivetran's connector logs feeding into the platform's existing audit infrastructure.

Each of these controls addresses a specific HIPAA requirement and produces an artifact that the compliance team can reference. The architecture was designed so that every data movement decision could be explained and evidenced, not just implemented.
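One lightweight way to keep controls and their evidence aligned is to track them as data. The sketch below is illustrative only: the control names follow the list above, while the `Control` type and the evidence descriptions are hypothetical placeholders rather than the client's actual audit artifacts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Control:
    name: str
    evidence: Optional[str]  # artifact the compliance team can point to, if any


def missing_evidence(controls: list[Control]) -> list[str]:
    """Return the names of controls that have no evidence artifact recorded."""
    return [control.name for control in controls if not control.evidence]


# Evidence strings are placeholders for this sketch.
pipeline_controls = [
    Control("Source isolation (read replica)", "connector source configuration"),
    Control("Private transit (VPC endpoint)", "VPC endpoint configuration"),
    Control("Encryption in transit (TLS)", "connection TLS settings"),
    Control("Encryption at rest (KMS)", "Redshift and S3 encryption settings"),
    Control("Scoped access (IAM)", "IAM policy documents"),
    Control("Audit trail", None),  # deliberately left empty to show a gap
]

print(missing_evidence(pipeline_controls))  # ['Audit trail']
```

Any control that shows up in the output is implemented but not yet evidenced, which is the gap an auditor is most likely to ask about.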
