Screening Methodology
Complete documentation of how Helena prioritizes classified variants for clinical review. After ACMG classification determines what each variant IS, screening determines which variants to REVIEW FIRST based on clinical relevance to the specific patient.
The screening algorithm evaluates each variant across seven independent dimensions, applies patient-specific clinical profile boosts, and produces a four-tier priority ranking. Every score component is transparent and visible in the results. Screening completes in under one second for typical cases.
Overview
Classification and screening solve different problems. Both are necessary for efficient clinical review.
Classification
Determines what a variant IS. Assigns one of five ACMG categories based on evidence criteria. Documented on the Classification Methodology page.
Screening
Determines which variants to REVIEW FIRST. Ranks all classified variants by clinical relevance to this specific patient using a multi-dimensional scoring algorithm adapted to patient demographics and clinical context.
Why Screening Is Necessary
After classification, a clinician typically faces 50-200+ variants requiring review. Manual review of every variant is impractical. Screening reduces this to 3-20 high-priority candidates (Tier 1) by incorporating clinical context that classification alone does not consider: patient age, sex, ethnicity, family history, clinical phenotype, and the specific screening strategy.
A VUS in a highly constrained gene with strong phenotype match and family history of genetic disease may be more clinically relevant than a Pathogenic variant in an unrelated gene. Screening captures these contextual relationships.
Screening Pipeline
Single-pass pipeline over classified variants. Total processing time under one second for typical cases (100-500 variants).
Load Classified Variants
Pre-classified variants (Pathogenic, Likely Pathogenic, VUS) are loaded with full annotation data. Phenotype tier data is joined when available. Gene panel filters are applied if selected.
Calculate Base Component Scores
Each variant is scored across seven independent dimensions (constraint, deleteriousness, phenotype, dosage, consequence, compound heterozygote, age relevance). Each score is normalized to [0.0, 1.0].
Apply Scoring Weights
Component scores are combined using weights appropriate to the patient's age group and screening mode. All weight sets sum to exactly 1.0 and are validated at runtime.
Calculate Clinical Boosts
Patient-specific context (ACMG class, phenotype match, ethnicity, family history, sex, consanguinity, trio/duo sample type, pregnancy or family planning, gene panel) adds priority on top of the weighted score. The total score is capped at 1.0.
Assign Priority Tiers
Each variant is assigned to one of four tiers based on boosted score and clinical context. All Pathogenic/Likely Pathogenic variants are guaranteed Tier 1 regardless of base scores.
Export Results
Tiered results are persisted for downstream consumption. Gene-level summaries are exported for summary-first clinical review.
Scoring Components
Each variant is evaluated across seven independent dimensions. All scores are normalized to [0.0, 1.0] for direct comparability.
Gene Constraint
Measures the gene's intolerance to variation using gnomAD constraint metrics. Strategy varies by consequence type: loss-of-function variants are scored using pLI and LOEUF jointly, missense variants use mis_z combined with pLI, and non-coding variants receive a heavily discounted score regardless of gene constraint.
Loss-of-function variants in genes with high pLI and low LOEUF receive the maximum score
Missense variants in genes with high missense constraint (mis_z) and high pLI receive elevated scores
Non-coding consequences (intron, upstream, downstream, synonymous) receive minimal scores - gene constraint is clinically irrelevant for variants that do not affect protein function
Data sources: gnomAD v4.1.0 (pLI, LOEUF, mis_z)
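The consequence-dependent strategy described above can be sketched as follows. This is a minimal illustration, not Helena's implementation: the function name, every threshold (pLI 0.9, LOEUF 0.35, mis_z 3.09), the blend weights, and the non-coding score are assumptions chosen only to make the three branches concrete.

```python
LOF = {"frameshift", "stop_gained", "splice_donor", "splice_acceptor"}
NON_CODING = {"intron", "upstream", "downstream", "synonymous"}

# All numeric values below are illustrative placeholders (assumptions).
NON_CODING_SCORE = 0.05
PLI_HIGH, LOEUF_LOW, MIS_Z_HIGH = 0.9, 0.35, 3.09


def clamp(x, lo=0.0, hi=1.0):
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def constraint_score(consequence, pli=None, loeuf=None, mis_z=None):
    """Return a gene-constraint score in [0, 1] for one variant."""
    if consequence in NON_CODING:
        # Heavily discounted regardless of how constrained the gene is.
        return NON_CODING_SCORE
    pli = 0.0 if pli is None else pli
    if consequence in LOF:
        # pLI and LOEUF are evaluated jointly for loss-of-function variants.
        if pli >= PLI_HIGH and loeuf is not None and loeuf <= LOEUF_LOW:
            return 1.0
        loeuf_part = 0.0 if loeuf is None else 1.0 - clamp(loeuf)
        return clamp(0.5 * pli + 0.5 * loeuf_part)
    if consequence == "missense":
        # mis_z combined with pLI for missense variants.
        z_part = 0.0 if mis_z is None else clamp(mis_z / MIS_Z_HIGH)
        return clamp(0.7 * z_part + 0.3 * pli)
    # Other coding consequences: fallback chosen for this sketch only.
    return clamp(0.5 * pli)
```

Keeping the non-coding branch first makes the discount independent of the gene's metrics, which matches the stated rule that constraint is not informative for variants that leave the protein unchanged.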
Deleteriousness
Weighted aggregate of eight computational predictors assessing variant deleteriousness. BayesDel_noAF serves as the primary predictor with ClinGen SVI calibrated signal, supplemented by SpliceAI for orthogonal splice impact and six additional predictors for coverage gaps.
BayesDel_noAF is the primary signal, normalized from its native range to [0, 1]
SpliceAI provides orthogonal splice impact prediction independent of missense predictors
AlphaMissense contributes independent protein structure signal derived from AlphaFold
When BayesDel is unavailable, its weight is redistributed proportionally among remaining predictors
Conservation scores (PhyloP, GERP) are normalized from their native ranges to [0, 1]
Data sources: dbNSFP 4.9c (BayesDel_noAF, DANN, SIFT, AlphaMissense, MetaSVM, PhyloP, GERP), Ensembl (SpliceAI)
Phenotype Relevance
Evaluates how well the variant's gene matches the patient's clinical presentation. Operates in two modes depending on whether patient HPO terms are available.
Diagnostic mode (HPO terms provided): computes overlap between patient HPO terms and gene HPO associations
Screening mode (no phenotype): uses gene-disease burden as a proxy, capped at 0.5 (the full 1.0 is reserved for an actual HPO phenotype match). Non-coding variants receive a further discount.
Data sources: HPO gene-phenotype associations
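The two modes can be illustrated with a short sketch. The 0.5 cap comes from the description above; everything else is an assumption made for illustration: the overlap metric (fraction of patient terms found among the gene's terms), a `disease_burden` input already scaled to [0, 1], and the 0.5 discount factor for non-coding variants.

```python
SCREENING_CAP = 0.5          # from the methodology text
NON_CODING_DISCOUNT = 0.5    # hypothetical discount factor


def phenotype_score(gene_hpo, patient_hpo, disease_burden, is_coding=True):
    """Score phenotype relevance in diagnostic or screening mode."""
    if patient_hpo:
        # Diagnostic mode: share of patient HPO terms associated with the gene.
        return len(patient_hpo & gene_hpo) / len(patient_hpo)
    # Screening mode: gene-disease burden as a proxy, capped at 0.5.
    score = min(disease_burden, SCREENING_CAP)
    if not is_coding:
        score *= NON_CODING_DISCOUNT
    return score
```

With this shape, a gene can only exceed 0.5 when the patient's own HPO terms match it, which is the intent of the cap.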
Dosage Sensitivity
Based on ClinGen haploinsufficiency evidence, applied only to loss-of-function variants.
Only evaluated for loss-of-function consequences (frameshift, stop_gained, splice_donor, splice_acceptor)
Three-tier scoring based on ClinGen evidence level: sufficient, emerging, or limited evidence
Non-LoF variants and genes without dosage data receive zero
Data sources: ClinGen dosage sensitivity (haploinsufficiency_score)
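A minimal sketch of the three-tier lookup follows. It assumes the ClinGen numeric haploinsufficiency scores 3, 2, and 1 correspond to the sufficient, emerging, and limited tiers named above; the score assigned to each tier is a hypothetical placeholder.

```python
LOF_CONSEQUENCES = {"frameshift", "stop_gained", "splice_donor", "splice_acceptor"}

# Assumed mapping: ClinGen haploinsufficiency_score -> component score.
# The component values are hypothetical.
HI_TIER_SCORE = {3: 1.0, 2: 0.6, 1: 0.3}


def dosage_score(consequence, haploinsufficiency_score):
    """Return the dosage-sensitivity component for one variant."""
    if consequence not in LOF_CONSEQUENCES:
        return 0.0  # non-LoF variants always receive zero
    # Missing data and any other ClinGen code fall through to zero.
    return HI_TIER_SCORE.get(haploinsufficiency_score, 0.0)
```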
Consequence Severity
Hierarchical severity ranking based on VEP consequence type and transcript biotype.
Non-protein-coding biotypes receive minimal scores regardless of consequence
Hierarchy: frameshift/stop_gained/splice_donor/splice_acceptor > start_lost/stop_lost > inframe > missense > splice_region > UTR > synonymous > intron
Data sources: Ensembl VEP (consequence, biotype)
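The hierarchy can be expressed as an ordered list of consequence groups. In this sketch the ordering is taken from the hierarchy above, while the evenly spaced scores, the 0.05 value for non-protein-coding biotypes, and the function name are assumptions.

```python
# Most severe group first, following the documented hierarchy.
SEVERITY_ORDER = [
    {"frameshift", "stop_gained", "splice_donor", "splice_acceptor"},
    {"start_lost", "stop_lost"},
    {"inframe"},
    {"missense"},
    {"splice_region"},
    {"UTR"},
    {"synonymous"},
    {"intron"},
]
NON_CODING_BIOTYPE_SCORE = 0.05  # hypothetical minimal score


def consequence_score(consequence, biotype="protein_coding"):
    """Return a severity score in [0, 1]; higher means more severe."""
    if biotype != "protein_coding":
        return NON_CODING_BIOTYPE_SCORE
    for rank, group in enumerate(SEVERITY_ORDER):
        if consequence in group:
            # Evenly spaced steps: rank 0 -> 1.0, last rank -> 0.125.
            return 1.0 - rank / len(SEVERITY_ORDER)
    return 0.0  # unrecognized consequence
```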
Compound Heterozygote
Detects potential compound heterozygotes for autosomal recessive conditions by identifying multiple heterozygous coding variants in the same gene.
Only true coding/splicing consequences qualify - intronic, synonymous, and splice_region+intron_variant combinations are excluded (splice_region alone without coding impact is not a true coding consequence)
Variants pre-flagged by the upstream classification pipeline receive the maximum score
Same-gene heterozygous coding variant pairs are detected via efficient pre-grouped gene lookup
Data sources: Pipeline-internal genotype analysis
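The pre-grouped gene lookup can be sketched with a single pass that buckets qualifying heterozygous variants by gene. The record layout (`id`, `gene`, `genotype`, `consequences`, `upstream_flag`), the qualifying-consequence set, and the 0.8 score for detected pairs are assumptions for illustration; only the maximum score for pre-flagged variants comes from the text above.

```python
from collections import defaultdict

# Assumed set of "true coding/splicing" consequences.
QUALIFYING = {"frameshift", "stop_gained", "splice_donor", "splice_acceptor",
              "start_lost", "stop_lost", "inframe", "missense"}
PAIR_SCORE = 0.8  # hypothetical score for an unphased same-gene pair


def compound_het_scores(variants):
    """Map each variant id to its compound-heterozygote score."""
    by_gene = defaultdict(list)
    for v in variants:
        # splice_region + intron never intersects QUALIFYING, so it is excluded.
        if v["genotype"] == "het" and QUALIFYING & set(v["consequences"]):
            by_gene[v["gene"]].append(v["id"])

    # A gene needs at least two qualifying het variants to form a candidate pair.
    candidates = {vid for ids in by_gene.values() if len(ids) >= 2 for vid in ids}

    scores = {}
    for v in variants:
        if v.get("upstream_flag"):
            scores[v["id"]] = 1.0  # pre-flagged by the classification pipeline
        elif v["id"] in candidates:
            scores[v["id"]] = PAIR_SCORE
        else:
            scores[v["id"]] = 0.0
    return scores
```

Grouping once and then testing membership keeps the work linear in the number of variants, rather than comparing every pair.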
Age Relevance
Prioritizes age-appropriate disease genes using curated gene lists. Different gene categories are emphasized depending on the patient's age group.
Neonatal: early-onset disease genes and treatable metabolic conditions receive highest priority
Pediatric: childhood-onset genes receive highest priority, early-onset genes slightly lower
Adult: cancer predisposition and cardiac genes receive highest priority
Elderly: cardiac genes receive highest priority; cancer gene priority is reduced
ACMG Secondary Findings genes receive elevated priority across all age groups
Panel genes with age_group_relevance="all" receive minimum 0.6 score regardless of patient age
ClinGen gene-disease validity modulates panel gene scores: Definitive/Strong=1.0, Moderate=0.85, Limited=0.7
Data sources: ACMG Secondary Findings v3.2, curated pediatric and adult gene lists, gene panel metadata (when panel selected)
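The two panel rules can be combined as shown below. The validity values and the 0.6 floor are taken from the list above; treating the validity values as multipliers, applying the floor after modulation, and leaving genes without ClinGen validity unmodulated are all assumptions of this sketch.

```python
VALIDITY_FACTOR = {"Definitive": 1.0, "Strong": 1.0, "Moderate": 0.85, "Limited": 0.7}
PANEL_ALL_AGES_FLOOR = 0.6


def panel_age_score(base_score, age_group_relevance, validity=None):
    """Adjust an age-relevance score for a gene on the selected panel."""
    # Assumption: validity acts as a multiplier; unknown validity is unmodulated.
    score = base_score * VALIDITY_FACTOR.get(validity, 1.0)
    if age_group_relevance == "all":
        # Genes relevant at every age never drop below the floor.
        score = max(score, PANEL_ALL_AGES_FLOOR)
    return score
```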
Deleteriousness Ensemble
Eight-predictor weighted ensemble with BayesDel_noAF as the primary signal. Approved via clinical review HELIX-CR-2026-002.
| Predictor | Signal Type | Contribution | Source |
|---|---|---|---|
| BayesDel_noAF | ClinGen SVI calibrated missense | Primary | dbNSFP 4.9c |
| SpliceAI | Splice impact (4 delta scores) | Major (orthogonal) | Ensembl MANE |
| AlphaMissense | Protein structure (AlphaFold) | Significant | dbNSFP 4.9c |
| DANN | Deep learning pathogenicity | Supporting | dbNSFP 4.9c |
| SIFT | Sequence homology | Supporting | dbNSFP 4.9c |
| MetaSVM | Ensemble meta-predictor | Supporting | dbNSFP 4.9c |
| PhyloP (100-way) | Per-site conservation | Minor | dbNSFP 4.9c |
| GERP++ | Per-element conservation | Minor | dbNSFP 4.9c |
BayesDel_noAF Normalization
BayesDel_noAF scores are linearly normalized from their observed range to [0, 1] before weighting. ClinGen SVI evidence tiers are used for justification text but do not directly determine the screening score - the normalized continuous value is used instead for maximum discrimination.
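Linear normalization of this kind is a min-max rescale. The sketch below assumes placeholder range bounds, since the observed range is not stated here, and clamps values that fall outside the range; both choices are illustrative.

```python
import math

# Placeholder bounds (assumption); substitute the observed BayesDel_noAF range.
BAYESDEL_MIN, BAYESDEL_MAX = -1.3, 0.8


def min_max_normalize(value, lo, hi):
    """Rescale value from [lo, hi] to [0, 1]; return None for missing input."""
    if value is None or math.isnan(value):
        return None
    scaled = (value - lo) / (hi - lo)
    return min(1.0, max(0.0, scaled))  # clamp out-of-range values
```

For example, `min_max_normalize(score, BAYESDEL_MIN, BAYESDEL_MAX)` keeps the continuous ordering of the raw scores, which is what allows finer discrimination than a handful of evidence tiers.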
NULL Handling
When BayesDel_noAF is unavailable for a variant (NULL or NaN), its weight is redistributed proportionally among the remaining seven predictors. This ensures consistent scoring regardless of predictor coverage gaps. Each individual predictor uses safe type conversion with NaN detection.
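Proportional redistribution is equivalent to renormalizing the weights of the predictors that are present. The sketch below uses hypothetical weights (the actual values are not given here) and, as a further assumption, applies the same rule to any missing predictor rather than to BayesDel_noAF alone.

```python
import math

# Hypothetical ensemble weights that sum to 1.0.
WEIGHTS = {
    "BayesDel_noAF": 0.30, "SpliceAI": 0.20, "AlphaMissense": 0.15,
    "DANN": 0.10, "SIFT": 0.10, "MetaSVM": 0.05, "PhyloP": 0.05, "GERP": 0.05,
}


def is_missing(value):
    """True for None or NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def ensemble_score(normalized_scores):
    """Weighted mean of the available predictors, each already in [0, 1]."""
    available = {name: score for name, score in normalized_scores.items()
                 if name in WEIGHTS and not is_missing(score)}
    total_weight = sum(WEIGHTS[name] for name in available)
    if total_weight == 0:
        return 0.0  # no predictor available
    # Dividing by total_weight spreads the missing weight proportionally.
    return sum(WEIGHTS[name] / total_weight * score
               for name, score in available.items())
```

Because each remaining weight is scaled by the same factor, the relative influence of the remaining predictors is unchanged when one is missing.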
Weight System
Different clinical contexts require different scoring emphasis. The weight system selects appropriate component weights based on patient age group, screening mode, and phenotype availability. All weight sets sum to exactly 1.0.
Diagnostic
Patient has HPO phenotype terms (any age)
Constraint: Standard
Deleteriousness: Standard
Phenotype: Dominant
Dosage: Low
Consequence: Minimal
Compound Het: Minimal
Age Relevance: None
Rationale: Phenotype matching drives prioritization when clinical presentation is available. Age relevance is unnecessary because the phenotype itself guides variant selection.
Neonatal / Pediatric
0-18 years, no phenotype
Constraint: High
Deleteriousness: Standard
Phenotype: Low
Dosage: Elevated
Consequence: Low
Compound Het: Minimal
Age Relevance: Elevated
Rationale: Gene constraint is critical in neonates and children. Dosage sensitivity is elevated because LoF in haploinsufficient genes is urgent. Age relevance captures actionable early-onset conditions.
Adult Proactive
18-65 years, no phenotype
Constraint: Standard
Deleteriousness: Elevated
Phenotype: Low
Dosage: Low
Consequence: Low
Compound Het: Minimal
Age Relevance: Elevated
Rationale: Deleteriousness predictions are elevated because missense predictions are critical for cancer and cardiac risk. Age relevance captures adult-actionable conditions.
Elderly
65+ years, no phenotype
Constraint: Reduced
Deleteriousness: Standard
Phenotype: Low
Dosage: Low
Consequence: Low
Compound Het: Minimal
Age Relevance: Highest
Rationale: Age relevance receives the highest weight because older patients benefit most from narrowly actionable findings. Constraint is reduced because a patient who has reached this age without a clinical phenotype is less likely to be affected by a variant in a highly constrained gene.
Design Principle
Age relevance weight does not decrease with patient age in the screening profiles: it is Elevated for neonatal/pediatric and adult patients and Highest for elderly patients. This reflects the clinical reality that older patients benefit most from narrowly actionable findings, while younger patients warrant broader screening.
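The runtime check that every weight set sums to 1.0, and the weighted combination itself, might look like the sketch below. The numeric weights are hypothetical values chosen only to be consistent with the qualitative levels listed for the Diagnostic and Elderly profiles; the profile keys and function names are likewise assumptions.

```python
import math

COMPONENTS = ("constraint", "deleteriousness", "phenotype", "dosage",
              "consequence", "compound_het", "age_relevance")

# Hypothetical numbers; only the qualitative levels come from the methodology.
WEIGHT_PROFILES = {
    "diagnostic": dict(constraint=0.15, deleteriousness=0.15, phenotype=0.50,
                       dosage=0.10, consequence=0.05, compound_het=0.05,
                       age_relevance=0.00),
    "elderly": dict(constraint=0.10, deleteriousness=0.15, phenotype=0.10,
                    dosage=0.10, consequence=0.10, compound_het=0.05,
                    age_relevance=0.40),
}


def validate_profiles(profiles):
    """Raise ValueError unless each profile covers all components and sums to 1.0."""
    for name, weights in profiles.items():
        if set(weights) != set(COMPONENTS):
            raise ValueError(f"{name}: unexpected or missing components")
        if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
            raise ValueError(f"{name}: weights sum to {sum(weights.values())}")


def weighted_score(component_scores, weights):
    """Combine the seven normalized component scores into one base score."""
    return sum(weights[c] * component_scores[c] for c in COMPONENTS)


validate_profiles(WEIGHT_PROFILES)
```

Since every component lies in [0, 1] and the weights sum to 1.0, the base score also stays in [0, 1] before any boosts are added.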
Clinical Profile Boosts
Nine boost categories leverage the complete clinical profile submitted with each screening request. Boosts are added to the base weighted score and can promote variants to higher tiers. Total score is always capped at 1.0.
ACMG Classification
Pathogenic and Likely Pathogenic variants receive priority boosts. VUS variants with strong null variant evidence (PVS1 criterion) also receive an elevated boost.
Activated when: ACMG class is P, LP, or VUS with PVS1
Phenotype Match Tier
Variants in genes with strong phenotype correlation receive priority boosts proportional to the phenotype matching tier.
Activated when: Phenotype Matching Service has been executed
Ethnicity
Population-specific founder mutations receive elevated priority. Supports Ashkenazi Jewish, African, East/South Asian, and European founder variant lists.
Activated when: Patient ethnicity is provided
Family History
Cancer and cardiac predisposition genes receive priority boosts when family history is reported. Additional boost when indication specifically references family history.
Activated when: Family history flag is set
Sex-Linked Inheritance
X-linked disease genes receive priority boosts based on patient sex. Males receive a larger boost due to hemizygous expression.
Activated when: Variant on chromosome X in a recognized X-linked gene
Consanguinity
Homozygous variants receive elevated priority in consanguineous families, reflecting increased likelihood of identical-by-descent inheritance.
Activated when: Consanguinity flag is set
De Novo Proxy
Variants in highly constrained genes receive a proxy de novo boost when trio/duo data is available. This is a prioritization heuristic, not confirmed de novo status.
Activated when: Sample type is trio or duo
Pregnancy / Family Planning
Prenatal actionable genes receive elevated priority for pregnant patients. Carrier screening genes receive priority for family planning.
Activated when: Pregnancy or family planning flag is set
Gene Panel
Panel genes receive priority boosts modulated by ClinGen gene-disease validity level and age-group relevance matching.
Activated when: A gene panel is selected
Boost Interaction
Multiple boosts can apply simultaneously. A variant in BRCA1 for an Ashkenazi Jewish patient with family history of cancer will receive ACMG classification boost, ethnicity boost, and family history boost concurrently. The total boosted score is capped at 1.0 to prevent score inflation.
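The additive interaction and the cap can be written in one line. In the sketch below the boost magnitudes for the BRCA1 example are hypothetical; only the additive combination and the 1.0 cap come from the description above.

```python
def boosted_score(base_score, boosts):
    """Add every active boost to the base score and cap the total at 1.0."""
    return min(1.0, base_score + sum(boosts.values()))


# Hypothetical magnitudes for the BRCA1 / Ashkenazi Jewish / family history case.
brca1_boosts = {"acmg_classification": 0.20, "ethnicity": 0.10, "family_history": 0.10}
```

With these placeholder values, a base score of 0.8 would reach the cap, while a base score of 0.5 would rise to roughly 0.9.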
Tier System
Four-tier priority ranking with two-stage assignment: base tier from component scores, then final tier after applying all clinical boosts.
Tier 1: High Priority - Immediate Review
Variants requiring immediate clinical attention. Includes all Pathogenic and Likely Pathogenic variants regardless of base score, strong phenotype matches, and variants exceeding the high-priority score threshold. Capped at a configurable maximum (default: 20 variants).
Tier 2: Moderate Priority - Monitor
Variants with moderate clinical relevance. Includes Tier 2 phenotype matches and variants with intermediate boosted scores. Warrant review but are less urgent than Tier 1.
Tier 3: Low Priority - Future Consideration
Variants with low but non-trivial clinical relevance. May become significant with additional clinical information or future variant reclassification.
Tier 4: Very Low Priority - Likely Benign
Variants with minimal clinical relevance under current evidence. Excluded from results by default; included only when explicitly requested.
Base Tier Assignment
The base tier uses both the total weighted score and individual component peaks. A variant with an exceptional signal in a single component (e.g., a highly constrained gene or very high deleteriousness) can be promoted to Tier 1 even if other components are moderate. This prevents clinically significant variants from being buried by low scores in irrelevant dimensions.
Final Tier Determination (Post-Boost)
After all clinical boosts are applied, the final tier may differ from the base tier. Key guarantee: all Pathogenic and Likely Pathogenic variants, all VUS with strong null variant evidence (PVS1), and all strong phenotype matches always appear in Tier 1 regardless of their base component scores. ACMG classification takes priority over component-level scoring.
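The guarantee can be sketched as a check that runs before any score threshold is consulted. The threshold values (0.7, 0.5, 0.3) and the argument names are hypothetical, and the Tier 1 size cap is left out of this sketch.

```python
# Hypothetical score thresholds for Tiers 1, 2, and 3.
TIER_THRESHOLDS = ((1, 0.7), (2, 0.5), (3, 0.3))


def final_tier(boosted_score, acmg_class, has_pvs1=False, strong_phenotype_match=False):
    """Return the post-boost tier (1 = highest priority, 4 = lowest)."""
    guaranteed = (
        acmg_class in {"P", "LP"}
        or (acmg_class == "VUS" and has_pvs1)
        or strong_phenotype_match
    )
    if guaranteed:
        return 1  # classification and phenotype guarantees override scores
    for tier, threshold in TIER_THRESHOLDS:
        if boosted_score >= threshold:
            return tier
    return 4
```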
Gene Lists
Curated gene lists drive age relevance scoring, clinical actionability assessment, and ethnicity-specific prioritization.
ACMG Secondary Findings v3.2 (81 genes)
Three categories: Cancer Predisposition (25 genes including APC, BRCA1, BRCA2, MLH1, MSH2, TP53, VHL), Cardiac (34 genes including KCNH2, KCNQ1, MYBPC3, MYH7, SCN5A, LMNA), and Metabolic (8 genes including LDLR, BTD, RPE65). These genes receive elevated age relevance scores across all age groups and are used for clinical actionability assessment.
Source: Miller DT et al. Genet Med. 2023;25(1):100726
Pediatric Gene Lists
Three curated categories: Early-Onset Disease Genes (CFTR, SMN1, GAA, and other genes causing conditions diagnosable at birth), Treatable Metabolic Conditions (PAH, GALT, and other genes where early intervention changes outcomes), and Childhood-Onset Genes (NF1, PKD1, and other genes causing conditions typically presenting in childhood).
Adult-Onset Gene Lists
Two curated categories: Cancer High-Risk (BRCA1, BRCA2, MLH1, MSH2, TP53, PTEN, and other high-penetrance cancer predisposition genes) and Cardiac (KCNH2, MYBPC3, SCN5A, LMNA, and other genes associated with sudden cardiac death or cardiomyopathy).
Population-Specific Founder Variants
Curated founder mutation gene lists for ethnicity-aware prioritization: Ashkenazi Jewish (BRCA1/2, GBA, HEXA, FANCC, BLM, MSH2, MSH6), African ancestry (HBB, G6PD), East/South Asian (ALDH2, CYP2C19, HBA1, HBA2, HBB), and European (BRCA1/2, CFTR).
Screening Modes
Six screening modes determine the weight profile and prioritization strategy.
Diagnostic
Patient has HPO phenotype terms. Phenotype matching dominates scoring. Used when clinical presentation guides interpretation.
Neonatal Screening
Newborn screening (0-28 days). Prioritizes early-onset disease genes, treatable metabolic conditions, and haploinsufficient genes.
Pediatric Screening
Child and adolescent screening (1-18 years). Prioritizes childhood-onset conditions with emphasis on gene constraint and age-relevant genes.
Proactive Adult
Adult health screening (18-65 years). Emphasizes cancer predisposition, cardiac risk genes, and computational deleteriousness predictions.
Carrier Screening
Recessive carrier identification for reproductive risk assessment.
Pharmacogenomics
Drug response screening for medication safety and efficacy.
Age Group Determination
Patient age is converted to one of six age groups: Neonatal (0-28 days), Infant (29 days - 1 year), Child (1-12 years), Adolescent (12-18 years), Adult (18-65 years), Elderly (65+ years). Day-level precision is used for neonatal/infant boundary. Age can be provided in days, years, or both.
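The conversion can be sketched as below. The group boundaries are those listed above; the remaining choices are assumptions: shared boundaries (12, 18, 65 years) belong to the older group, a day count takes precedence when both inputs are given, and a year is taken as 365.25 days.

```python
DAYS_PER_YEAR = 365.25  # assumption: average year length


def age_group(age_years=None, age_days=None):
    """Map a patient age to one of the six age groups."""
    if age_years is None and age_days is None:
        raise ValueError("provide age_years, age_days, or both")
    # Assumption: when both are given, the day count is the more precise value.
    days = age_days if age_days is not None else age_years * DAYS_PER_YEAR
    if days < 0:
        raise ValueError("age cannot be negative")
    if days <= 28:
        return "Neonatal"  # day-level precision at the neonatal/infant boundary
    years = days / DAYS_PER_YEAR
    if years < 1:
        return "Infant"
    if years < 12:
        return "Child"
    if years < 18:
        return "Adolescent"
    if years < 65:
        return "Adult"
    return "Elderly"
```

Note that a years-only input below about 0.08 falls in the neonatal group under this conversion, so callers that care about the 28-day boundary should pass `age_days`.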
Limitations
Screening is a prioritization tool, not a diagnostic tool. It determines review order, not variant pathogenicity. ACMG classification determines pathogenicity.
Compound heterozygote detection is inferred from genotype data without formal phasing. Trio data or long-read sequencing provides definitive confirmation.
De novo boost is a proxy based on gene constraint, not confirmed de novo status. Parental genotype comparison is required for confirmation.
Ethnicity-based boosts use curated founder mutation lists. Population-specific variants outside these lists do not receive ethnicity boosts.
Phenotype scoring without HPO terms uses gene-disease burden as a proxy, which favors well-characterized genes over recently described disease associations.
Age relevance gene lists are curated and may not include all relevant disease genes for each age group. Lists are updated periodically.
Carrier screening and pharmacogenomics modes are not yet weight-differentiated from adult screening. Dedicated weight profiles are planned.
Gene panel boosts are modulated by ClinGen gene-disease validity, which may not be available for all panel genes.
All scores are normalized to [0.0, 1.0]. Raw metric magnitudes are compressed into a relative scale.
Screening results should always be interpreted by a qualified clinical geneticist in the context of the patient's clinical presentation and family history.
Version History
Every methodology change is versioned and documented.
Panel-aware age relevance scoring: panel genes with age_group_relevance="all" receive minimum 0.6 score regardless of patient age
ClinGen gene-disease validity modulates panel gene scores: Definitive/Strong=1.0, Moderate=0.85, Limited=0.7
Compound het detection: splice_region + intron_variant combinations excluded from coding consequences (splice_region alone is not true coding impact)
Phenotype scoring: gene-disease burden capped at 0.5 in screening mode (full 1.0 reserved for actual HPO phenotype match)
Non-coding variants receive discounted phenotype scores proportional to their inability to exploit gene-disease burden
Panel metadata corrections: INS/NEUROD1/LMNA age_group_relevance set to "all", GATA4 ClinGen status corrected to Strong
New sub-panels added: 5 diabetes sub-panels + Comprehensive Diabetes Panel (47 genes)
Initial production release with 7-component scoring system
BayesDel_noAF as primary deleteriousness predictor with ClinGen SVI calibration
8-predictor weighted ensemble for deleteriousness scoring
Age-aware weight profiles: Diagnostic, Neonatal, Pediatric, Adult, Elderly
Clinical profile boosts: ethnicity, family history, sex-linked, consanguinity, pregnancy
Four-tier priority ranking with ACMG classification guarantees
Gene panel boost with ClinGen modulation
References
Richards S, Aziz N, Bale S, et al. Standards and guidelines for the interpretation of sequence variants. Genet Med. 2015;17(5):405-424. PMID: 25741868
Pejaver V, Byrne AB, Feng BJ, et al. Calibration of computational tools for missense variant pathogenicity classification and ClinGen recommendations for PP3/BP4 criteria. Am J Hum Genet. 2022;109(12):2163-2177. PMID: 36413997
Karczewski KJ, Francioli LC, Tiao G, et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581(7809):434-443. PMID: 32461654
Cheng J, Novati G, Pan J, et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science. 2023;381(6664):eadg7492. PMID: 37733863
Jaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, et al. Predicting Splicing from Primary Sequence with Deep Learning. Cell. 2019;176(3):535-548.e24. PMID: 30661751
Miller DT, Lee K, Abul-Husn NS, et al. ACMG SF v3.2: Reducing noise and improving clinical actionability. Genet Med. 2023;25(1):100726. PMID: 36344267
Köhler S, Gargano M, Matentzoglu N, et al. The Human Phenotype Ontology in 2021. Nucleic Acids Res. 2021;49(D1):D1207-D1217. PMID: 33264411
Questions About Our Screening Methodology?
We welcome technical questions from clinical geneticists and laboratory directors. Transparency is foundational to clinical trust.