Executive Summary
A Fortune 500 pharmaceutical and life sciences company with a pipeline spanning oncology, immunology, and rare disease therapeutics faced a defining platform decision: renew a multi-million-dollar SAS enterprise agreement covering 3,100 programs used across clinical statistics, pharmacokinetics, regulatory submissions, and commercial analytics — or invest in a migration to a modern cloud-native architecture. The stakes were uniquely high: many of these programs directly supported FDA New Drug Application (NDA) submissions, meaning any migration required not only technical fidelity but documented, auditable validation evidence. MigryX completed the full migration of all 3,100 SAS programs — encompassing 2.8 million lines of SAS/BASE, SAS/STAT, SAS/GRAPH, and SAS macro code — to Snowpark Python and Snowflake SQL over 12 months. The results demonstrated significant improvements across performance, cost, and regulatory compliance: a 7X improvement in analytical pipeline performance, complete elimination of the SAS enterprise license, and $6.8 million in documented savings over three years. Every migrated program was accompanied by a machine-generated validation report traceable to FDA 21 CFR Part 11 requirements.
Client Overview
The client is one of the world's largest pharmaceutical companies, with substantial R&D investment across dozens of active clinical programs. Their biostatistics and data management organization employed several hundred statisticians, programmers, and data managers worldwide, the majority of whom used SAS as their primary analytical platform. The SAS estate had grown organically over 18 years, beginning with clinical trial analysis and expanding to encompass commercial sales analytics, supply chain optimization, and post-market pharmacovigilance reporting.
The SAS platform was hosted on a hybrid infrastructure: on-premise SAS Grid Computing environments for validated clinical programs, and cloud-based SAS Viya installations for commercial analytics. This dual-footprint model created significant operational complexity, as the two environments had diverged in version compatibility, macro library versions, and format catalog management. The decision to migrate to Snowflake was driven by the organization's cloud-first strategy, the approaching end of an 18-year SAS enterprise agreement, and the recognition that Snowflake's elastic compute model was far better suited to the episodic, burst-heavy nature of clinical trial analysis workloads.
Business Challenge
The migration planning team identified the following critical technical and regulatory challenges before engaging MigryX:
- SAS/STAT and SAS/GRAPH procedure complexity: The estate contained 847 programs using advanced statistical procedures including PROC MIXED, PROC GENMOD, PROC LIFETEST, PROC PHREG, and PROC SGPLOT. Each procedure carried extensive option sets, BY-group processing semantics, and ODS output configuration that required precise mapping to Snowflake-compatible analytical libraries.
- Clinical trial reporting macros: The organization had developed a library of 1,200 custom SAS macros over 18 years, covering CDISC SDTM derivation, ADaM dataset construction, TLF (Tables, Listings, Figures) generation, and protocol deviation flagging. These macros were deeply nested, often calling other macros four to five levels deep, with conditional logic that varied by therapeutic area, study phase, and regulatory region.
- FDA validation requirements: Clinical programs used in regulatory submissions are subject to Computer System Validation (CSV) under 21 CFR Part 11. Any migration had to produce documented evidence that the converted programs produced results statistically equivalent to the validated SAS originals, with every validation test traced to a formal test script and reviewed by a qualified statistician.
- SAS data formats and informats: The estate used 94 custom SAS formats and informats embedded in format catalogs that were referenced across hundreds of programs. These catalogs were not consistently documented, and several had evolved over years without version control, making their current behavior difficult to reconstruct.
- Macro variable scoping and %SYSFUNC dependencies: Many programs relied on SAS macro variable scoping behaviors, %SYSFUNC calls for system date and environment values, and %INCLUDE chains that assembled programs dynamically from fragments. These dynamic assembly patterns required static resolution before conversion could proceed.
- SAS/CONNECT and grid processing: 280 programs used SAS/CONNECT for distributed processing across the SAS Grid, submitting remote work units and synchronizing results. Translating this parallel execution model to Snowflake's native parallelism required redesigning job orchestration without breaking analytical equivalence.
The MigryX Approach
MigryX deployed its pharmaceutical-specialized migration methodology, which combines automated AST-based code conversion with a structured validation framework designed explicitly for FDA CSV compliance. The engagement began with a four-week discovery and inventory phase that produced a complete program inventory, macro dependency tree, format catalog reconstruction, and a complexity tier classification for all 3,100 programs.
The MigryX SAS parser operates at the AST level, parsing SAS programs into a rich intermediate representation that captures DATA step operations, PROC calls, macro definitions, macro invocations, and ODS output specifications as discrete, analyzable nodes. This intermediate representation enabled the conversion engine to resolve macro call chains statically — unrolling macro parameters and conditional macro logic into equivalent Python function calls with explicit parameter passing — rather than relying on runtime macro expansion as SAS does.
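To illustrate the static macro-resolution pattern described above, consider how a small parameterized SAS macro might become a Python function with explicit parameters. The macro, function name, and dataset are hypothetical examples, not MigryX converter output — a minimal sketch of the idea:

```python
# Hypothetical illustration of static macro unrolling. A SAS macro such as:
#   %macro flag_high(ds, var, cutoff);
#     data &ds.; set &ds.; high_flag = (&var. > &cutoff.); run;
#   %mend;
# becomes a Python function with explicit parameter passing, resolved at
# conversion time rather than expanded at runtime as SAS does.

def flag_high(rows, var, cutoff):
    """Add a 0/1 high_flag column where row[var] > cutoff (None = SAS missing)."""
    out = []
    for row in rows:
        new = dict(row)
        value = row.get(var)
        # SAS sorts missing below any number, so a missing value never flags
        new["high_flag"] = 1 if (value is not None and value > cutoff) else 0
        out.append(new)
    return out

adsl = [{"USUBJID": "001", "AGE": 71}, {"USUBJID": "002", "AGE": None}]
flagged = flag_high(adsl, "AGE", 65)
```

The same unrolling applies recursively: a macro that calls another macro becomes a function that calls another function, which is what makes four- and five-level macro nesting tractable to analyze.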
For DATA step conversion, MigryX mapped SAS's row-by-row PDV (Program Data Vector) processing model to equivalent Snowpark DataFrame operations wherever possible, and to Python row iteration for programs with irreducibly sequential logic such as retained variables, first./last. processing, and POINT= random access. The converter preserved SAS missing value semantics, distinguishing numeric missing (.) from character missing (' ') and handling SAS's arithmetic-on-missing propagation rules in the converted Python code.
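Two of the sequential behaviors named above — first./last. BY-group flags and arithmetic-on-missing propagation — can be reproduced in plain Python. This is a simplified, dependency-free sketch of the semantics, not the production converter's output:

```python
from itertools import groupby

def with_first_last(rows, by):
    """Emulate SAS first.X / last.X flags. Like SAS BY-group processing,
    the input must already be sorted by the BY variable."""
    out = []
    for _, grp in groupby(rows, key=lambda r: r[by]):
        grp = list(grp)
        for i, row in enumerate(grp):
            new = dict(row)
            new["first"] = (i == 0)
            new["last"] = (i == len(grp) - 1)
            out.append(new)
    return out

def sas_add(a, b):
    """Emulate SAS arithmetic on missing: any missing operand -> missing."""
    return None if a is None or b is None else a + b

rows = [{"SUBJ": "A", "VAL": 1}, {"SUBJ": "A", "VAL": 2}, {"SUBJ": "B", "VAL": None}]
flagged = with_first_last(rows, "SUBJ")
```

Where the logic is vectorizable, the same flags can be expressed as Snowpark window functions (`row_number()` over a partition); the row-iteration form above is the fallback for irreducibly sequential programs.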
Statistical procedure conversion was handled by a library of validated PROC mappings. PROC MIXED was mapped to statsmodels.MixedLM with documented parameter equivalence; PROC LIFETEST to lifelines.KaplanMeierFitter; PROC PHREG to lifelines.CoxPHFitter; and PROC SGPLOT to parameterized Matplotlib/Plotly rendering functions. Each mapping was accompanied by a statistical equivalence validation test run against reference datasets from the FDA's publicly available clinical trial data repository.
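The equivalence tests behind each PROC mapping reduce to comparing outputs within a tolerance. The sketch below hand-rolls a Kaplan-Meier estimate so it is dependency-free and self-checking; the migration itself mapped PROC LIFETEST to `lifelines.KaplanMeierFitter`, and the reference values here are illustrative, not drawn from a real study:

```python
# Illustrative equivalence check for a PROC LIFETEST -> Kaplan-Meier mapping.
def kaplan_meier(durations, events):
    """Return {event_time: S(t)} for right-censored data (event=1, censored=0)."""
    n_at_risk = len(durations)
    surv, s = {}, 1.0
    for t in sorted(set(durations)):
        deaths = sum(1 for dur, e in zip(durations, events) if dur == t and e == 1)
        if deaths:
            s *= 1.0 - deaths / n_at_risk          # KM product-limit step
            surv[t] = s
        n_at_risk -= sum(1 for dur in durations if dur == t)
    return surv

# Hypothetical reference survival estimates exported from the validated SAS run
sas_reference = {1: 0.75, 2: 0.50, 3: 0.0}
converted = kaplan_meier([1, 2, 2, 3], [1, 1, 0, 1])
equivalent = all(abs(converted[t] - v) < 1e-9 for t, v in sas_reference.items())
```

In the actual validation harness the same pattern compares PROC output datasets column by column, with procedure-appropriate tolerances documented per mapping.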
The 1,200-macro library was converted through a three-tier strategy. Macros with a purely data manipulation function were converted to Snowpark Python functions with equivalent signatures. Macros implementing CDISC derivation logic were converted to parameterized Snowflake Stored Procedures that could be called from Snowflake Tasks. Macros used only for TLF generation were converted to Python report generation functions using existing validated output templates. The custom SAS formats and informats were converted to Snowflake lookup tables and Python dictionaries, with the conversion documented in a format catalog migration log reviewed by the biostatistics team.
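A custom SAS format maps naturally onto a Python dictionary (or a Snowflake lookup table joined at query time). The format name and values below are hypothetical, and the treatment of missing and unmapped values follows SAS's usual defaults — a minimal sketch:

```python
# Hypothetical conversion of a SAS format:
#   proc format; value sevfmt 1='MILD' 2='MODERATE' 3='SEVERE'; run;
SEVFMT = {1: "MILD", 2: "MODERATE", 3: "SEVERE"}

def put_sevfmt(value):
    """Mirror PUT(value, sevfmt.): numeric missing renders as '.',
    and unmapped values fall through to their literal representation."""
    if value is None:
        return "."
    return SEVFMT.get(value, str(value))
```

Keeping the mapping in one shared module (or one lookup table) restores the single source of truth that the original, unversioned format catalogs had lost.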
Validation was built into every step. MigryX generated a formal Validation Summary Report (VSR) for each converted program containing: the original SAS program hash, the converted Python program hash, the test dataset used, a statistical comparison of all output variables (row counts, means, standard deviations, min/max values, and distribution tests where applicable), the comparison result, and the reviewer signature field. These VSRs were assembled into a master validation package submitted to the organization's regulatory affairs team for review and approval before each wave of programs was promoted to the validated production environment.
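The per-program comparison inside a VSR reduces to deterministic checks. The sketch below shows the shape of such a record for a single numeric output column; field names are hypothetical, and the real reports also carry distribution tests and the reviewer signature field:

```python
import hashlib
import statistics

def vsr_compare(sas_source, py_source, sas_col, py_col, tol=1e-9):
    """Build a minimal validation record: program hashes plus
    row-count, mean, and min/max comparisons for one output column."""
    return {
        "sas_hash": hashlib.sha256(sas_source.encode()).hexdigest(),
        "py_hash": hashlib.sha256(py_source.encode()).hexdigest(),
        "row_counts_match": len(sas_col) == len(py_col),
        "means_match": abs(statistics.mean(sas_col) - statistics.mean(py_col)) < tol,
        "minmax_match": (min(sas_col), max(sas_col)) == (min(py_col), max(py_col)),
    }

record = vsr_compare("data a; set b; run;", "df = b", [1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
passed = all(v for k, v in record.items() if k.endswith("_match"))
```

Hashing both the original and converted sources ties each comparison result to exact program versions, which is what makes the evidence chain auditable.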
Migration Architecture
| Dimension | Before (SAS) | After (Snowflake + Snowpark) |
|---|---|---|
| Analytics runtime | SAS Grid (on-premise, 24 nodes) + SAS Viya (cloud) | Snowflake virtual warehouses (auto-scaling, serverless) |
| Statistical procedures | SAS/STAT, SAS/GRAPH, SAS/ETS procedures | Snowpark Python + statsmodels + lifelines + Matplotlib |
| Macro library | 1,200 custom SAS macros (global catalog) | Python function library + Snowflake Stored Procedures |
| Data formats | 94 custom format catalogs (binary .sas7bcat files) | Snowflake lookup tables + Python dictionaries |
| Orchestration | SAS Grid + LSF scheduler + shell scripts | Snowflake Tasks DAG + Airflow (for cross-system flows) |
| Output / reporting | ODS RTF/PDF, SAS datasets (.sas7bdat) | Snowflake tables + Python report renderer (RTF/PDF) |
| Validation framework | Manual double-programmer verification | MigryX automated VSR + statistical equivalence tests |
| Licensing cost | $2.1M/year SAS enterprise agreement | $0 (Snowflake compute consumption only) |
Key Migration Highlights
- 3,100 SAS programs converted: The complete clinical, commercial, and pharmacovigilance SAS estate migrated in 12 months across six domain-based waves.
- 2.8 million lines of SAS code: MigryX processed 2.8M LOC spanning SAS/BASE, SAS/STAT, SAS/GRAPH, SAS/ETS, and SAS macro language — all with an 86% fully automated conversion rate.
- 1,200-macro library fully migrated: Every macro in the organization's 18-year macro library was converted, documented, and validated before decommissioning the SAS macro catalog.
- FDA-traceable validation on 100% of programs: Machine-generated Validation Summary Reports were produced for all 3,100 programs, reviewed and approved by the biostatistics QA team, and archived in the company's regulatory document management system.
- SAS Grid decommissioned on schedule: The 24-node on-premise SAS Grid was powered down at month 13, freeing significant data center floor space and eliminating the associated infrastructure maintenance costs.
- Zero regulatory submission delays: Three NDA submissions scheduled during the migration were completed on time using migrated Snowflake programs, with validation evidence that met the organization's regulatory affairs requirements.
Security & Compliance
The pharmaceutical company operates under FDA 21 CFR Part 11 (electronic records and signatures), ICH E6(R2) Good Clinical Practice guidelines, GDPR for European patient data, and HIPAA for US clinical data. Each of these frameworks imposed specific requirements on the migrated Snowflake environment that MigryX addressed systematically during the migration design phase.
For 21 CFR Part 11 compliance, Snowflake's audit trail capabilities were configured to log all data access, modification, and deletion events to an immutable Snowflake event table. User authentication was enforced via SAML SSO with MFA, and Snowflake's time-travel and fail-safe features were configured to retain a full 90-day history of all clinical data tables, enabling reconstruction of any analytical dataset to the exact state used in an NDA submission. Snowflake's column-level security was used to enforce data access segregation by study, therapeutic area, and regulatory region.
Patient data de-identification was implemented using Snowflake Dynamic Data Masking policies, ensuring that production patient identifiers are never exposed to analytical processing environments. Tokenized patient IDs, derived during the SDTM build process, flow through all downstream ADaM and TLF pipelines without ever re-exposing the original source identifiers. The organization's data governance team reviewed and approved the full access control matrix before the first production wave was promoted.
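Tokenized, non-reversible patient IDs of the kind described above are commonly produced with keyed deterministic hashing, so the same source ID always yields the same token. The sketch below is illustrative only — the key handling and token format are assumptions, and the case study does not describe the organization's actual tokenization scheme:

```python
import hmac
import hashlib

# Illustrative deterministic tokenization: same input + key -> same token,
# and the source identifier cannot be recovered from the token alone.
SECRET_KEY = b"study-scoped-key"  # hypothetical; held in a secrets vault in practice

def tokenize_patient_id(patient_id):
    """Derive a stable, non-reversible token for a source patient identifier."""
    digest = hmac.new(SECRET_KEY, patient_id.encode(), hashlib.sha256).hexdigest()
    return "TOK-" + digest[:16].upper()

t1 = tokenize_patient_id("PAT-000123")
t2 = tokenize_patient_id("PAT-000123")
```

Determinism is what lets the token flow through SDTM, ADaM, and TLF pipelines as a stable join key while the original identifiers stay inside the governed source zone.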
Results & Business Impact
Measured over the six months following completion of the final migration wave, the following outcomes were documented and reviewed by the organization's executive technology steering committee:
The 7X performance improvement had concrete clinical implications. The end-of-study statistical analysis package for a pivotal Phase III oncology trial, which previously required 19 hours of SAS Grid processing, now completes in under 3 hours on Snowflake. This acceleration compresses the critical path between database lock and NDA submission by days — days that have direct value when the submission deadline is driven by FDA PDUFA dates. The pharmacovigilance signal detection pipeline, which aggregates adverse event data from multiple global safety databases, now runs every 4 hours versus its previous 48-hour batch cycle, giving the drug safety team substantially earlier visibility into emerging safety signals.
Organizationally, the migration has begun to reshape the talent profile of the biostatistics and data management function. Statistical programmers who previously spent the majority of their time maintaining legacy SAS infrastructure now contribute to new capabilities on Snowflake — automated real-world evidence studies, Bayesian adaptive trial simulations, and patient subgroup identification using Snowflake Cortex ML — work that was impractical on the legacy platform.
"Migrating clinical SAS programs is not like migrating a typical analytics workload. The bar for statistical equivalence is set by the FDA, not by the engineering team. MigryX's validation framework was the only approach we evaluated that could produce the documented evidence chain our regulatory affairs team required. We now have a Snowflake platform that runs our trials faster than our SAS Grid ever did, and an audit trail that is actually stronger than what we had before."
— Executive Director, Biostatistics & Data Management, Fortune 500 Pharma
Ready to Modernize Your SAS Estate?
See how MigryX can accelerate your migration to Snowflake.
Explore Snowflake Migration →