An article published in the American Journal of Medical Genetics explores how clinical genetic data can be leveraged to estimate the prevalence of genetic disorders and map variants to local geographies. As genetic testing has expanded, healthcare systems have accumulated large volumes of genetic data that often go unused beyond individual patient diagnosis. This research demonstrates methods for extracting population-level insights from these data.
In recent years, the use of genetic testing has grown rapidly, driven by falling sequencing costs, expanded screening recommendations, and the growth of commercial testing options. This growth generates large amounts of genetic data, typically stored as unstructured reports in electronic health records (EHRs), where it cannot readily be analyzed in aggregate. While each test informs an individual diagnosis, the cumulative data offer untapped potential for population health analysis of disease prevalence, treatment patterns, and health disparities.
The geographic distribution of genetic disorders is highly relevant to population health management. Certain conditions cluster regionally because of founder effects in ancestral populations and cultural endogamy. For example, Zellweger Spectrum Disorder (ZSD), an autosomal recessive disorder causing neurological impairment, is enriched in Hispanic farmworker communities in California because of a single damaging founder mutation. Knowing which genetic diseases are present in a region, and how common they are, equips healthcare systems to serve their specific patient populations through tailored screening programs, diagnostic capabilities, specialty care access, and community education partnerships.
Promise of Accumulated Genetic Data
While the growth in genetic testing output is apparent, healthcare systems often lack the capability to synthesize these data into public health insights. Genetic data overwhelmingly reside in EHRs without structured fields amenable to analysis, which severely limits their utility. In addition, the fragmentation of commercial testing across different send-out laboratories impedes data aggregation. Unlocking the potential of accumulated genetic data represents the next frontier in realizing the promise of precision medicine.
Researchers at Valley Children's Hospital (VCH) analyzed five years of clinical genetic testing data from their patient population in California's Central Valley and Central Coast, using AI and data visualization methods to quantify local disease prevalence and map variants geographically. The case study describes methods that could be replicated across healthcare settings.
Researchers then matched variants to EHR demographic data for 98% of patients. Linking molecular data to clinical metadata enables analysis by inheritance pattern and geography, which test results alone do not support.
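The linkage step can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each variant record and each EHR demographic record share a patient identifier; the record types, field names, and the `link_variants` helper are illustrative and are not taken from the study. The returned match rate corresponds to the share of tested patients who could be linked to demographic data.

```python
from dataclasses import dataclass

# Hypothetical record types; field names are illustrative, not the study's schema.
@dataclass
class VariantRecord:
    patient_id: str   # identifier assumed to be shared by lab reports and the EHR
    gene: str
    zygosity: str     # e.g. "heterozygous", "homozygous", "hemizygous"

@dataclass
class Demographics:
    patient_id: str
    zip_code: str

def link_variants(variants, demographics):
    """Attach a ZIP code to each variant whose patient appears in the EHR extract.

    Returns (linked, match_rate): linked is a list of (VariantRecord, zip_code)
    pairs; match_rate is the fraction of distinct tested patients that were linked.
    """
    by_id = {d.patient_id: d for d in demographics}
    linked = [(v, by_id[v.patient_id].zip_code) for v in variants if v.patient_id in by_id]
    patients = {v.patient_id for v in variants}
    matched = {v.patient_id for v, _ in linked}
    match_rate = len(matched) / len(patients) if patients else 0.0
    return linked, match_rate

# Toy example with made-up identifiers.
variants = [
    VariantRecord("p1", "GENE_A", "homozygous"),
    VariantRecord("p2", "GENE_B", "heterozygous"),
    VariantRecord("p3", "GENE_C", "heterozygous"),
]
demographics = [Demographics("p1", "ZIP_A"), Demographics("p2", "ZIP_B")]
linked, rate = link_variants(variants, demographics)
print(len(linked), round(rate, 2))  # prints: 2 0.67
```

Unlinked patients (here `p3`) simply drop out of the geographic analysis, which is why a high match rate matters.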
Microsoft Power BI is used to map variants in an interactive choropleth that links genetic data to ZIP codes. User filters display variants by gene, zygosity, inheritance pattern, and other attributes, enabling ad hoc segmentation for specific research questions. Normalizing homozygous variant counts by population highlights localized areas with increased risk of recessive disease.
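Power BI handles the mapping itself, but the normalization behind a per-ZIP choropleth can be sketched in a few lines of Python. This is a hypothetical illustration rather than the study's implementation: it assumes linked (patient, zygosity, ZIP) rows and a resident population count per ZIP code from some external source (census data, for example), and it reports homozygous carriers per 100,000 residents.

```python
from collections import Counter

def homozygous_rate_by_zip(rows, population_by_zip, per=100_000):
    """Homozygous variant carriers per `per` residents for each ZIP code.

    rows: iterable of (patient_id, zygosity, zip_code) tuples from the linked data.
    population_by_zip: ZIP code -> resident population (assumed external source).
    """
    # Count each patient at most once per ZIP so that one person carrying
    # several homozygous variants does not inflate the area's rate.
    carriers = {(pid, z) for pid, zyg, z in rows if zyg == "homozygous"}
    counts = Counter(z for _, z in carriers)
    return {z: counts[z] * per / pop for z, pop in population_by_zip.items() if pop > 0}

rows = [
    ("p1", "homozygous", "ZIP_A"),
    ("p1", "homozygous", "ZIP_A"),   # a second homozygous variant in the same patient
    ("p2", "heterozygous", "ZIP_A"),
    ("p3", "homozygous", "ZIP_B"),
]
population = {"ZIP_A": 20_000, "ZIP_B": 5_000, "ZIP_C": 1_000}
print(homozygous_rate_by_zip(rows, population))
# prints: {'ZIP_A': 5.0, 'ZIP_B': 20.0, 'ZIP_C': 0.0}
```

Normalizing matters because raw counts mostly track population size: in the toy data, ZIP_B has the same number of carriers as ZIP_A but a four-times-higher rate.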
Researchers identified the minimum prevalence of 739 genetic diseases locally, providing the first granular disease estimates tailored to a healthcare system's patient population. They also identified patients eligible for recently approved therapies and found substantial reinterpretation of variant pathogenicity. The database updates in near real time as more genetic testing accrues.
Methodology with Valley Children's Data
The VCH dataset comprises over 3,000 variants from 3,065 patients over five years. Testing modalities include single-gene tests, gene panels, exome sequencing, and chromosomal microarrays. Nearly a quarter of patients have more than one report. About 80% of variants are single-nucleotide variants or small insertions/deletions (indels); the remaining 20% are structural or copy-number changes.
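Descriptive figures like these can be tallied directly once reports and variants are stored in structured form. The sketch below assumes, hypothetically, a list of (report, patient) pairs and a class label for each variant; the labels `snv`, `indel`, `structural`, and `cnv` and the `summarize` helper are illustrative.

```python
from collections import Counter

SMALL = {"snv", "indel"}         # sequence-level changes
LARGE = {"structural", "cnv"}    # structural or copy-number changes

def summarize(reports, variant_classes):
    """reports: (report_id, patient_id) pairs; variant_classes: one label per variant."""
    reports_per_patient = Counter(pid for _, pid in reports)
    n_patients = len(reports_per_patient)
    multi = sum(1 for n in reports_per_patient.values() if n > 1)
    classes = Counter(variant_classes)
    small = sum(classes[c] for c in SMALL)
    large = sum(classes[c] for c in LARGE)
    total = small + large
    return {
        "patients": n_patients,
        "frac_multi_report": multi / n_patients if n_patients else 0.0,
        "frac_snv_indel": small / total if total else 0.0,
        "frac_structural_cnv": large / total if total else 0.0,
    }

# Toy data: patient p1 has two reports.
reports = [("r1", "p1"), ("r2", "p1"), ("r3", "p2"), ("r4", "p3"), ("r5", "p4")]
classes = ["snv", "snv", "snv", "indel", "cnv"]
print(summarize(reports, classes))
# prints: {'patients': 4, 'frac_multi_report': 0.25, 'frac_snv_indel': 0.8, 'frac_structural_cnv': 0.2}
```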
Franklin AI, a variant interpretation platform, reinterprets 88% of the variants, changing the classification of 24%. Over half of the reclassified variants are downgraded from a pathogenic classification, demonstrating the importance of routine reanalysis as genetic knowledge evolves.
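A reclassification summary of this kind comes from comparing each variant's original classification with the reinterpreted one. The sketch below assumes the five-tier ACMG/AMP scale and treats a move from pathogenic or likely pathogenic to any less pathogenic tier as a downgrade. It also measures the changed fraction against reinterpreted variants only. Both are assumptions, since the article does not state its denominators, and `reclassification_summary` is a hypothetical helper.

```python
# Five-tier ACMG/AMP scale, ordered from most to least pathogenic.
TIERS = ["pathogenic", "likely_pathogenic", "vus", "likely_benign", "benign"]
RANK = {name: i for i, name in enumerate(TIERS)}
PATHOGENIC_GROUP = {"pathogenic", "likely_pathogenic"}

def reclassification_summary(pairs):
    """pairs: (original_class, new_class) per variant; new_class is None when the
    reinterpretation tool returned no classification."""
    reinterpreted = [(o, n) for o, n in pairs if n is not None]
    changed = [(o, n) for o, n in reinterpreted if o != n]
    # Downgrade (assumed definition): started as (likely) pathogenic and moved
    # to a less pathogenic tier.
    downgraded = [(o, n) for o, n in changed if o in PATHOGENIC_GROUP and RANK[n] > RANK[o]]
    return {
        "frac_reinterpreted": len(reinterpreted) / len(pairs) if pairs else 0.0,
        "frac_changed": len(changed) / len(reinterpreted) if reinterpreted else 0.0,
        "frac_of_changes_downgraded": len(downgraded) / len(changed) if changed else 0.0,
    }

pairs = [
    ("pathogenic", "vus"),
    ("likely_pathogenic", "likely_benign"),
    ("vus", "likely_pathogenic"),
    ("benign", "benign"),
    ("vus", None),          # not reinterpreted
]
print(reclassification_summary(pairs))
```

In the toy data, four of five variants are reinterpreted, three of those four change class, and two of the three changes are downgrades.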
Researchers matched molecular variants to patient EHRs containing demographic information for 98% of cases. Linking molecular profiles to clinical metadata enables population health insights that test reports alone cannot provide.
Key Findings
The analysis yields a minimum prevalence estimate of one in 625 VCH patients with a genetic condition over five years, enabling the first quantification of disease burden tailored to a health system's regional population. Further segmentation by inheritance pattern identifies over 350 patients with autosomal dominant disorders, over 100 with autosomal recessive conditions, and dozens with X-linked diseases. This framework can be replicated in other healthcare settings to describe the landscape of genetic disease specific to the populations they serve, information that national statistics cannot provide.
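The estimate is presumably a minimum because only patients with a confirmed genetic diagnosis are counted, so affected patients who were never tested are missed. In general terms, the figure is the number of patients served divided by the number of diagnosed patients, expressed as "one in N". A minimal sketch, assuming a hypothetical list of (patient, inheritance mode) diagnoses and a known denominator, counts each patient once overall and once per inheritance mode:

```python
from collections import defaultdict

def minimum_prevalence(diagnoses, patients_served):
    """diagnoses: (patient_id, inheritance) pairs for confirmed genetic diagnoses,
    with inheritance such as "AD", "AR", or "XL". patients_served: size of the
    population over the same period (assumed denominator).

    Returns (N, counts): prevalence expressed as "one in N", and the number of
    distinct patients per inheritance mode. A patient with diagnoses in two modes
    is counted in both modes but only once in N.
    """
    affected = {pid for pid, _ in diagnoses}
    by_mode = defaultdict(set)
    for pid, mode in diagnoses:
        by_mode[mode].add(pid)
    one_in_n = patients_served / len(affected) if affected else float("inf")
    return one_in_n, {mode: len(pids) for mode, pids in by_mode.items()}

# Toy data: p3 has two reports of the same recessive diagnosis.
diagnoses = [("p1", "AD"), ("p2", "AD"), ("p3", "AR"), ("p3", "AR"), ("p4", "XL")]
print(minimum_prevalence(diagnoses, patients_served=2_000))
# prints: (500.0, {'AD': 2, 'AR': 1, 'XL': 1})
```

Deduplicating by patient rather than by report matters here, because nearly a quarter of patients in the dataset have more than one report.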
In addition to quantifying disease cases, mapping molecular data together with clinical metadata reveals geographic disease clusters caused by founder mutations. The analysis uncovered several patients, all within a 25-mile radius, who are homozygous for the same pathogenic PEX6 variant causing ZSD, reflecting enrichment in the local Hispanic farmworker community. Researchers collaborated with community partners to offer expanded prenatal screening for this known founder mutation. Heatmaps also identify hotspots of recessive disease risk, including agricultural areas with high consanguinity. Characterizing the genetic disease profile of local communities provides critical input for targeted population health interventions to diagnose and prevent disease.
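Flagging clusters like the PEX6 example can be approximated with great-circle distances. The sketch below is a hypothetical illustration, not the study's method. It groups patients homozygous for the same variant, takes each group's mean latitude and longitude as its center (a simplification that holds only over small areas), and reports groups whose members all lie within 25 miles of that center. Coordinates are assumed to be ZIP-code centroids rather than home addresses.

```python
import math
from collections import defaultdict

EARTH_RADIUS_MILES = 3958.8

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def homozygous_clusters(records, radius_miles=25.0, min_size=2):
    """records: (patient_id, variant_id, zygosity, lat, lon) tuples.

    Returns variant_id -> sorted patient_ids for variants carried homozygously by
    at least min_size patients who all lie within radius_miles of the group's
    mean coordinate (an approximation suitable only for small areas).
    """
    groups = defaultdict(list)
    for pid, variant, zygosity, lat, lon in records:
        if zygosity == "homozygous":
            groups[variant].append((pid, lat, lon))
    clusters = {}
    for variant, members in groups.items():
        if len(members) < min_size:
            continue
        center_lat = sum(m[1] for m in members) / len(members)
        center_lon = sum(m[2] for m in members) / len(members)
        if all(haversine_miles(center_lat, center_lon, lat, lon) <= radius_miles
               for _, lat, lon in members):
            clusters[variant] = sorted(pid for pid, _, _ in members)
    return clusters

records = [
    ("p1", "VAR_A", "homozygous", 36.70, -119.80),
    ("p2", "VAR_A", "homozygous", 36.80, -119.70),
    ("p3", "VAR_A", "heterozygous", 40.00, -120.00),   # carrier only; ignored
    ("p4", "VAR_B", "homozygous", 36.00, -119.00),
    ("p5", "VAR_B", "homozygous", 38.00, -121.00),     # too far apart to cluster
]
print(homozygous_clusters(records))
# prints: {'VAR_A': ['p1', 'p2']}
```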
The structured reanalysis methodology also makes it possible to identify patients who have been lost to follow-up but whose diagnosed diseases are now potentially treatable with newly available precision therapies. Researchers found several patients with rare autism-related neurodevelopmental disorders, for which the Food and Drug Administration (FDA) has recently approved disease-modifying medications, who had received no recent specialty care. Mining the genetic database reconnected these patients with care so they could start therapies that may reduce their disease burden. Maintaining and analyzing structured genetic data helps health systems match patients to timely interventions.
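Identifying such patients amounts to joining diagnoses against a list of genes with newly approved therapies and checking when each patient was last seen. In this sketch, the gene list, the visit dates, and the one-year gap threshold are hypothetical inputs, and `flag_for_outreach` is an illustrative helper. A gene match only flags a patient for clinical review.

```python
from datetime import date

def flag_for_outreach(findings, therapy_genes, last_specialty_visit, today, max_gap_days=365):
    """findings: patient_id -> set of genes with a (likely) pathogenic finding.
    therapy_genes: genes with a newly approved targeted therapy (hypothetical list).
    last_specialty_visit: patient_id -> date of the most recent specialty visit.

    Returns sorted patient_ids who match a therapy gene and have no specialty
    visit within max_gap_days (or none on record).
    """
    flagged = []
    for pid, genes in findings.items():
        if not genes & therapy_genes:
            continue
        last = last_specialty_visit.get(pid)
        if last is None or (today - last).days > max_gap_days:
            flagged.append(pid)
    return sorted(flagged)

findings = {"p1": {"GENE_A"}, "p2": {"GENE_A", "GENE_B"}, "p3": {"GENE_C"}, "p4": {"GENE_B"}}
visits = {"p1": date(2023, 6, 1), "p2": date(2020, 1, 1)}
print(flag_for_outreach(findings, {"GENE_A", "GENE_B"}, visits, today=date(2024, 1, 1)))
# prints: ['p2', 'p4']
```

Here `p1` was seen recently, `p3` has no therapy-linked gene, and `p2` and `p4` are flagged for a long gap and no recorded visit, respectively.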
The automated pipeline analyzes accumulated historical testing data and updates clinical recommendations for patients and families based on evolving evidence. Franklin AI reanalysis changes interpretation for 23% of variants, with over half downgraded from initially pathogenic classifications. Twenty-eight incidental finding variants were reclassified as pathogenic or likely pathogenic, prompting updated screening cascades for those patients. As genetic knowledge rapidly accrues, structured reinterpretation ensures that historical testing data translates to current clinical decision-making so that patients receive contemporary, evidence-based recommendations.
Future Outlook
The potential of accumulated clinical genetic data has only begun to be unlocked. As testing volumes and resultant data continue expanding, methodologies piloted at VCH could be replicated across healthcare settings to empower population health management. Efforts are underway to facilitate automatic data feeds from genetic testing laboratories into aggregated clinical data warehouses equipped with AI interpretation pipelines. Configuring clinical decision support tools within EHRs based on genetic indicators can trigger personalized care recommendations tailored to a patient's molecular profile.
Looking beyond individual institutions, statewide or national consortia pooling genetic data could reveal geographic and demographic trends at an even larger scale. Databanks combining gene variant data with deep clinical and claims metadata could provide valuable surveillance capability to track the uptake of new therapies, monitor disease incidence, and guide research into health disparities.
Realizing precision medicine requires realizing the potential of precision data. VCH's approach provides a blueprint for healthcare organizations to turn genetic testing output into actionable insights to better detect, manage, and prevent disease in their communities. What is often said of genomic medicine, that data sharing is crucial to maximizing its potential, applies equally to the data science that unlocks these clinical genomics resources. Collaborative, thoughtful analysis helps ensure that genetic data fulfill their public health promise.