Helena

Screening Methodology

Screening Engine v2.1.0 | Updated March 2026

Complete documentation of how Helena prioritizes classified variants for clinical review. After ACMG classification determines what each variant IS, screening determines which variants to REVIEW FIRST based on clinical relevance to the specific patient.

The screening algorithm evaluates each variant across seven independent dimensions, applies patient-specific clinical profile boosts, and produces a four-tier priority ranking. Every score component is transparent and visible in the results. Screening completes in under one second for typical cases.

Overview

Classification and screening solve different problems. Both are necessary for efficient clinical review.

Classification

Determines what a variant IS. Assigns one of five ACMG categories based on evidence criteria. Documented on the Classification Methodology page.

Screening

Determines which variants to REVIEW FIRST. Ranks all classified variants by clinical relevance to this specific patient using a multi-dimensional scoring algorithm adapted to patient demographics and clinical context.

Why Screening Is Necessary

After classification, a clinician typically faces 50-200+ variants requiring review. Manual review of every variant is impractical. Screening reduces this to 3-20 high-priority candidates (Tier 1) by incorporating clinical context that classification alone does not consider: patient age, sex, ethnicity, family history, clinical phenotype, and the specific screening strategy.

A VUS in a highly constrained gene with strong phenotype match and family history of genetic disease may be more clinically relevant than a Pathogenic variant in an unrelated gene. Screening captures these contextual relationships.

Screening Pipeline

Single-pass pipeline over classified variants. Total processing time under one second for typical cases (100-500 variants).

1. Load Classified Variants

Pre-classified variants (Pathogenic, Likely Pathogenic, VUS) are loaded with full annotation data. Phenotype tier data is joined when available. Gene panel filters are applied if selected.

2. Calculate Base Component Scores

Each variant is scored across seven independent dimensions (constraint, deleteriousness, phenotype, dosage, consequence, compound heterozygote, age relevance). Each score is normalized to [0.0, 1.0].

3. Apply Scoring Weights

Component scores are combined using weights appropriate to the patient's age group and screening mode. All weight sets sum to exactly 1.0 and are validated at runtime.

4. Calculate Clinical Boosts

Patient-specific context (ACMG class, phenotype match, ethnicity, family history, sex, consanguinity, de novo proxy, pregnancy, gene panel) adds priority on top of the weighted score. The total score is capped at 1.0.

5. Assign Priority Tiers

Each variant is assigned to one of four tiers based on boosted score and clinical context. All Pathogenic/Likely Pathogenic variants are guaranteed Tier 1 regardless of base scores.

6. Export Results

Tiered results are persisted for downstream consumption. Gene-level summaries are exported for summary-first clinical review.

Scoring Components

Each variant is evaluated across seven independent dimensions. All scores are normalized to [0.0, 1.0] for direct comparability.

Gene Constraint

Measures the gene's intolerance to variation using gnomAD constraint metrics. Strategy varies by consequence type: loss-of-function variants are scored using pLI and LOEUF jointly, missense variants use mis_z combined with pLI, and non-coding variants receive a heavily discounted score regardless of gene constraint.

Loss-of-function variants in genes with high pLI and low LOEUF receive the maximum score

Missense variants in genes with high missense constraint (mis_z) and high pLI receive elevated scores

Non-coding consequences (intron, upstream, downstream, synonymous) receive minimal scores - gene constraint is clinically irrelevant for variants that do not affect protein function

Data sources: gnomAD v4.1.0 (pLI, LOEUF, mis_z)
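The consequence-dependent strategy described above can be sketched as follows. This is a minimal illustration, not Helena's implementation: the pLI/LOEUF cutoffs, the blending coefficients, the non-coding score, and the function name `constraint_score` are all assumptions, and consequence names use this page's shorthand rather than full VEP terms.

```python
# All thresholds and score values below are illustrative assumptions;
# the document does not publish the real ones.
LOF_CONSEQUENCES = {"frameshift", "stop_gained", "splice_donor", "splice_acceptor"}
NON_CODING_CONSEQUENCES = {"intron", "upstream", "downstream", "synonymous"}
NON_CODING_SCORE = 0.05  # hypothetical "minimal" score


def clamp01(x: float) -> float:
    """Clamp a value into [0.0, 1.0]."""
    return max(0.0, min(1.0, x))


def constraint_score(consequence: str, pli: float, loeuf: float, mis_z: float) -> float:
    """Return a gene-constraint score in [0, 1] that depends on consequence type."""
    if consequence in NON_CODING_CONSEQUENCES:
        # Gene constraint is heavily discounted for non-coding variants.
        return NON_CODING_SCORE
    if consequence in LOF_CONSEQUENCES:
        # High pLI together with low LOEUF earns the maximum score.
        if pli >= 0.9 and loeuf <= 0.35:
            return 1.0
        loeuf_part = clamp01(1.0 - loeuf / 2.0)
        return clamp01(0.5 * pli + 0.5 * loeuf_part)
    if consequence == "missense":
        # Missense variants combine missense Z-score with pLI.
        z_part = clamp01(mis_z / 5.0)
        return clamp01(0.6 * z_part + 0.4 * pli)
    # Other coding consequences: fall back to a discounted pLI signal.
    return 0.5 * clamp01(pli)
```

Branching on consequence first keeps a highly constrained gene from inflating the score of an intronic or synonymous variant.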

Deleteriousness

Weighted aggregate of eight computational predictors assessing variant deleteriousness. BayesDel_noAF serves as the primary predictor because it provides a ClinGen SVI-calibrated signal. It is supplemented by SpliceAI for orthogonal splice impact and by six additional predictors that fill coverage gaps.

BayesDel_noAF is the primary signal, normalized from its native range to [0, 1]

SpliceAI provides orthogonal splice impact prediction independent of missense predictors

AlphaMissense contributes independent protein structure signal derived from AlphaFold

When BayesDel is unavailable, its weight is redistributed proportionally among remaining predictors

Conservation scores (PhyloP, GERP) are normalized from their native ranges to [0, 1]

Data sources: dbNSFP 4.9c (BayesDel_noAF, DANN, SIFT, AlphaMissense, MetaSVM, PhyloP, GERP), Ensembl (SpliceAI)

Phenotype Relevance

Evaluates how well the variant's gene matches the patient's clinical presentation. Operates in two modes depending on whether patient HPO terms are available.

Diagnostic mode (HPO terms provided): computes overlap between patient HPO terms and gene HPO associations

Screening mode (no phenotype): uses gene-disease burden as proxy, capped at 0.5 (full 1.0 is reserved for actual HPO phenotype match). Non-coding variants receive additionally discounted scores.

Data sources: HPO gene-phenotype associations
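The two modes can be illustrated with a short sketch. The 0.5 cap comes from the text above; the overlap metric (fraction of patient terms matched), the non-coding discount factor, and the function name are assumptions made for illustration only.

```python
from typing import Set

SCREENING_CAP = 0.5        # from the text: burden proxy is capped at 0.5
NON_CODING_DISCOUNT = 0.5  # hypothetical discount factor


def phenotype_score(
    patient_hpo: Set[str],
    gene_hpo: Set[str],
    disease_burden: float,
    is_non_coding: bool,
) -> float:
    """Return a phenotype relevance score in [0, 1]."""
    if patient_hpo:
        # Diagnostic mode: share of the patient's HPO terms linked to the gene
        # (an assumed overlap metric; the real one is not specified here).
        return len(patient_hpo & gene_hpo) / len(patient_hpo)
    # Screening mode: gene-disease burden stands in for phenotype, capped
    # so that a full 1.0 stays reserved for a real HPO match.
    score = min(SCREENING_CAP, max(0.0, disease_burden))
    if is_non_coding:
        score *= NON_CODING_DISCOUNT
    return score
```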

Dosage Sensitivity

Based on ClinGen haploinsufficiency evidence, applied only to loss-of-function variants.

Only evaluated for loss-of-function consequences (frameshift, stop_gained, splice_donor, splice_acceptor)

Three-tier scoring based on ClinGen evidence level: sufficient, emerging, or limited evidence

Non-LoF variants and genes without dosage data receive zero

Data sources: ClinGen dosage sensitivity (haploinsufficiency_score)
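A minimal sketch of the three-tier rule follows. The evidence labels mirror the text above, but the numeric values assigned to each tier and the function name are hypothetical.

```python
from typing import Optional

LOF_CONSEQUENCES = {"frameshift", "stop_gained", "splice_donor", "splice_acceptor"}

# Hypothetical score per ClinGen haploinsufficiency evidence level.
EVIDENCE_SCORE = {"sufficient": 1.0, "emerging": 0.7, "limited": 0.4}


def dosage_score(consequence: str, hi_evidence: Optional[str]) -> float:
    """Score dosage sensitivity; only loss-of-function variants are eligible."""
    if consequence not in LOF_CONSEQUENCES:
        return 0.0
    # Genes without dosage data (None or an unknown label) receive zero.
    return EVIDENCE_SCORE.get(hi_evidence, 0.0)
```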

Consequence Severity

Hierarchical severity ranking based on VEP consequence type and transcript biotype.

Non-protein-coding biotypes receive minimal scores regardless of consequence

Hierarchy: frameshift/stop_gained/splice_donor/splice_acceptor > start_lost/stop_lost > inframe > missense > splice_region > UTR > synonymous > intron

Data sources: Ensembl VEP (consequence, biotype)
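One way to turn the hierarchy above into a score is to rank the levels and space them evenly. The ordering is taken from the text; the even spacing, the minimal score for non-protein-coding biotypes, and the function name are assumptions.

```python
# Severity levels in the order given above, most severe first.
SEVERITY_LEVELS = [
    ("frameshift", "stop_gained", "splice_donor", "splice_acceptor"),
    ("start_lost", "stop_lost"),
    ("inframe",),
    ("missense",),
    ("splice_region",),
    ("UTR",),
    ("synonymous",),
    ("intron",),
]
NON_CODING_BIOTYPE_SCORE = 0.05  # hypothetical "minimal" score


def consequence_score(consequence: str, biotype: str) -> float:
    """Map a consequence to [0, 1]; non-protein-coding biotypes score minimally."""
    if biotype != "protein_coding":
        return NON_CODING_BIOTYPE_SCORE
    n = len(SEVERITY_LEVELS)
    for rank, level in enumerate(SEVERITY_LEVELS):
        if consequence in level:
            # Evenly spaced: the top level scores 1.0, the lowest scores 1/n.
            return (n - rank) / n
    return 0.0  # unrecognized consequence
```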

Compound Heterozygote

Detects potential compound heterozygotes for autosomal recessive conditions by identifying multiple heterozygous coding variants in the same gene.

Only true coding/splicing consequences qualify - intronic, synonymous, and splice_region+intron_variant combinations are excluded (splice_region alone without coding impact is not a true coding consequence)

Variants pre-flagged by the upstream classification pipeline receive the maximum score

Same-gene heterozygous coding variant pairs are detected via efficient pre-grouped gene lookup

Data sources: Pipeline-internal genotype analysis
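The pre-grouped lookup can be sketched as a single grouping pass followed by a constant-time check per variant. The `Variant` record, the set of qualifying VEP terms, and the score given to detected pairs are assumptions; only the maximum score for pre-flagged variants comes from the text.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List

# Consequences treated as true coding/splicing impact (assumed list of VEP terms).
CODING_CONSEQUENCES = {
    "frameshift_variant", "stop_gained", "splice_donor_variant",
    "splice_acceptor_variant", "start_lost", "stop_lost",
    "inframe_insertion", "inframe_deletion", "missense_variant",
}
DETECTED_PAIR_SCORE = 0.8  # hypothetical score for pairs found here


@dataclass(frozen=True)
class Variant:  # hypothetical record type
    variant_id: str
    gene: str
    genotype: str  # "het" or "hom"
    consequences: FrozenSet[str]
    preflagged: bool = False


def qualifies(v: Variant) -> bool:
    """Heterozygous with at least one coding consequence.

    splice_region_variant + intron_variant has no coding term, so it is excluded.
    """
    return v.genotype == "het" and any(c in CODING_CONSEQUENCES for c in v.consequences)


def compound_het_scores(variants: Iterable[Variant]) -> Dict[str, float]:
    variants = list(variants)
    by_gene: Dict[str, List[Variant]] = defaultdict(list)
    for v in variants:  # group once, so each later lookup is O(1)
        if qualifies(v):
            by_gene[v.gene].append(v)
    scores: Dict[str, float] = {}
    for v in variants:
        if v.preflagged:
            scores[v.variant_id] = 1.0  # upstream flag earns the maximum
        elif qualifies(v) and len(by_gene[v.gene]) >= 2:
            scores[v.variant_id] = DETECTED_PAIR_SCORE
        else:
            scores[v.variant_id] = 0.0
    return scores
```

Because phasing is not available, a pair detected this way is only a candidate; both variants may sit on the same haplotype.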

Age Relevance

Prioritizes age-appropriate disease genes using curated gene lists. Different gene categories are emphasized depending on the patient's age group.

Neonatal: early-onset disease genes and treatable metabolic conditions receive highest priority

Pediatric: childhood-onset genes receive highest priority, early-onset genes slightly lower

Adult: cancer predisposition and cardiac genes receive highest priority

Elderly: cardiac genes receive highest priority; cancer gene priority is reduced

ACMG Secondary Findings genes receive elevated priority across all age groups

Panel genes with age_group_relevance="all" receive minimum 0.6 score regardless of patient age

ClinGen gene-disease validity modulates panel gene scores: Definitive/Strong=1.0, Moderate=0.85, Limited=0.7

Data sources: ACMG Secondary Findings v3.2, curated pediatric and adult gene lists, gene panel metadata (when panel selected)
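The panel-specific rules can be sketched as below. The validity multipliers and the 0.6 floor are the values stated above; treating modulation as a multiplication, applying the floor after it, and leaving genes without ClinGen validity unmodulated are assumptions, as is the function name.

```python
from typing import Optional

# Values stated in the text above.
VALIDITY_MULTIPLIER = {"Definitive": 1.0, "Strong": 1.0, "Moderate": 0.85, "Limited": 0.7}
PANEL_ALL_AGES_FLOOR = 0.6


def panel_age_relevance(
    base_score: float,
    clingen_validity: Optional[str],
    age_group_relevance: str,
) -> float:
    """Adjust a panel gene's age relevance score (assumed combination order)."""
    # Assumption: genes without ClinGen validity are left unmodulated.
    multiplier = VALIDITY_MULTIPLIER.get(clingen_validity, 1.0)
    score = base_score * multiplier
    if age_group_relevance == "all":
        # Floor applied last so the 0.6 minimum holds regardless of patient age.
        score = max(score, PANEL_ALL_AGES_FLOOR)
    return min(1.0, score)
```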

Deleteriousness Ensemble

Eight-predictor weighted ensemble with BayesDel_noAF as the primary signal. Approved via clinical review HELIX-CR-2026-002.

Predictor | Signal Type | Contribution | Source
BayesDel_noAF | ClinGen SVI calibrated missense | Primary | dbNSFP 4.9c
SpliceAI | Splice impact (4 delta scores) | Major (orthogonal) | Ensembl MANE
AlphaMissense | Protein structure (AlphaFold) | Significant | dbNSFP 4.9c
DANN | Deep learning pathogenicity | Supporting | dbNSFP 4.9c
SIFT | Sequence homology | Supporting | dbNSFP 4.9c
MetaSVM | Ensemble meta-predictor | Supporting | dbNSFP 4.9c
PhyloP (100-way) | Per-site conservation | Minor | dbNSFP 4.9c
GERP++ | Per-element conservation | Minor | dbNSFP 4.9c

BayesDel_noAF Normalization

BayesDel_noAF scores are linearly normalized from their observed range to [0, 1] before weighting. ClinGen SVI evidence tiers are used for justification text but do not directly determine the screening score - the normalized continuous value is used instead for maximum discrimination.
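Linear normalization of this kind is a min-max rescaling. The sketch below uses placeholder bounds, since the observed range used by Helena is not stated here; clamping out-of-range values is also an assumption.

```python
# Placeholder bounds (assumption): substitute the observed BayesDel_noAF range.
BAYESDEL_LO = -1.3
BAYESDEL_HI = 0.8


def normalize_linear(value: float, lo: float = BAYESDEL_LO, hi: float = BAYESDEL_HI) -> float:
    """Rescale value from [lo, hi] to [0, 1], clamping anything outside the range."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scaled = (value - lo) / (hi - lo)
    return max(0.0, min(1.0, scaled))
```

Keeping the continuous value, rather than collapsing it into evidence tiers, preserves the ordering between two variants that fall in the same tier.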

NULL Handling

When BayesDel_noAF is unavailable for a variant (NULL or NaN), its weight is redistributed proportionally among the remaining seven predictors. This ensures consistent scoring regardless of predictor coverage gaps. Each individual predictor uses safe type conversion with NaN detection.
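Proportional redistribution amounts to dropping the missing predictor and rescaling the remaining weights so they sum to 1 again. The weights below are invented for illustration (the real values are not published on this page), and treating other missing predictors as contributing zero is an assumption.

```python
import math
from typing import Dict, Mapping, Optional

# Hypothetical ensemble weights that sum to 1.0.
EXAMPLE_WEIGHTS: Dict[str, float] = {
    "BayesDel_noAF": 0.30, "SpliceAI": 0.20, "AlphaMissense": 0.15,
    "DANN": 0.08, "SIFT": 0.08, "MetaSVM": 0.09, "PhyloP": 0.05, "GERP": 0.05,
}


def is_missing(value: Optional[float]) -> bool:
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def redistribute_primary(weights: Mapping[str, float], primary: str = "BayesDel_noAF") -> Dict[str, float]:
    """Drop the primary predictor and rescale the rest proportionally."""
    remaining = {name: w for name, w in weights.items() if name != primary}
    total = sum(remaining.values())
    return {name: w / total for name, w in remaining.items()}


def ensemble_score(scores: Mapping[str, Optional[float]], weights: Mapping[str, float] = EXAMPLE_WEIGHTS) -> float:
    """Weighted sum of normalized predictor scores, each expected in [0, 1]."""
    active = redistribute_primary(weights) if is_missing(scores.get("BayesDel_noAF")) else weights
    total = 0.0
    for name, weight in active.items():
        value = scores.get(name)
        # Assumption: a missing secondary predictor contributes nothing.
        total += weight * (0.0 if is_missing(value) else float(value))
    return total
```

Rescaling preserves the relative importance of the remaining seven predictors, so a variant without a BayesDel score is not penalized simply for the coverage gap.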

Weight System

Different clinical contexts require different scoring emphasis. The weight system selects appropriate component weights based on patient age group, screening mode, and phenotype availability. All weight sets sum to exactly 1.0.
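As a rough illustration of the runtime check and the weighted combination, consider the sketch below. The numeric weights are hypothetical values chosen to mirror the qualitative Diagnostic profile; this page gives only qualitative levels, and the function names are likewise assumptions.

```python
import math
from typing import Mapping

COMPONENTS = (
    "constraint", "deleteriousness", "phenotype", "dosage",
    "consequence", "compound_het", "age_relevance",
)

# Hypothetical weights echoing the Diagnostic profile (phenotype dominant,
# age relevance unused). The real values are not published here.
EXAMPLE_DIAGNOSTIC = {
    "constraint": 0.20, "deleteriousness": 0.20, "phenotype": 0.40,
    "dosage": 0.10, "consequence": 0.05, "compound_het": 0.05,
    "age_relevance": 0.0,
}


def validate_weights(weights: Mapping[str, float]) -> None:
    """Raise ValueError unless weights cover all components and sum to 1.0."""
    if set(weights) != set(COMPONENTS):
        raise ValueError("weights must cover exactly the seven components")
    if any(w < 0.0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1.0")


def weighted_score(component_scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Combine normalized component scores into one base score in [0, 1]."""
    validate_weights(weights)
    for name in COMPONENTS:
        if not 0.0 <= component_scores[name] <= 1.0:
            raise ValueError(f"{name} score must be in [0, 1]")
    return sum(weights[name] * component_scores[name] for name in COMPONENTS)
```

Because every component lies in [0, 1] and the weights sum to 1, the base score also stays in [0, 1] before any boosts are added.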

Diagnostic

Patient has HPO phenotype terms (any age)

Constraint: Standard
Deleteriousness: Standard
Phenotype: Dominant
Dosage: Low
Consequence: Minimal
Compound Het: Minimal
Age Relevance: None

Rationale: Phenotype matching drives prioritization when clinical presentation is available. Age relevance is unnecessary because the phenotype itself guides variant selection.

Neonatal / Pediatric

0-18 years, no phenotype

Constraint: High
Deleteriousness: Standard
Phenotype: Low
Dosage: Elevated
Consequence: Low
Compound Het: Minimal
Age Relevance: Elevated

Rationale: Gene constraint is critical in neonates and children. Dosage sensitivity is elevated because LoF in haploinsufficient genes is urgent. Age relevance captures actionable early-onset conditions.

Adult Proactive

18-65 years, no phenotype

Constraint: Standard
Deleteriousness: Elevated
Phenotype: Low
Dosage: Low
Consequence: Low
Compound Het: Minimal
Age Relevance: Elevated

Rationale: Deleteriousness predictions are elevated because missense predictions are critical for cancer and cardiac risk. Age relevance captures adult-actionable conditions.

Elderly

65+ years, no phenotype

Constraint: Reduced
Deleteriousness: Standard
Phenotype: Low
Dosage: Low
Consequence: Low
Compound Het: Minimal
Age Relevance: Highest

Rationale: Age relevance receives the highest weight because older patients benefit most from narrowly actionable findings. Constraint is reduced because an older patient has already carried variants in many constrained genes for decades without developing a clinical phenotype.

Design Principle

Age relevance weight never decreases with patient age: it is Elevated for the neonatal/pediatric and adult profiles and Highest for the elderly profile. This reflects the clinical reality that older patients benefit most from narrowly actionable findings, while younger patients warrant broader screening.

Clinical Profile Boosts

Nine boost categories leverage the complete clinical profile submitted with each screening request. Boosts are added to the base weighted score and can promote variants to higher tiers. Total score is always capped at 1.0.

ACMG Classification

Pathogenic and Likely Pathogenic variants receive priority boosts. VUS variants with strong null variant evidence (PVS1 criterion) also receive an elevated boost.

Activated when: ACMG class is P, LP, or VUS with PVS1

Phenotype Match Tier

Variants in genes with strong phenotype correlation receive priority boosts proportional to the phenotype matching tier.

Activated when: Phenotype Matching Service has been executed

Ethnicity

Population-specific founder mutations receive elevated priority. Supports Ashkenazi Jewish, African, East/South Asian, and European founder variant lists.

Activated when: Patient ethnicity is provided

Family History

Cancer and cardiac predisposition genes receive priority boosts when family history is reported. Additional boost when indication specifically references family history.

Activated when: Family history flag is set

Sex-Linked Inheritance

X-linked disease genes receive priority boosts based on patient sex. Males receive a larger boost due to hemizygous expression.

Activated when: Variant on chromosome X in a recognized X-linked gene

Consanguinity

Homozygous variants receive elevated priority in consanguineous families, reflecting increased likelihood of identical-by-descent inheritance.

Activated when: Consanguinity flag is set

De Novo Proxy

Variants in highly constrained genes receive a proxy de novo boost when trio/duo data is available. This is a prioritization heuristic, not confirmed de novo status.

Activated when: Sample type is trio or duo

Pregnancy / Family Planning

Prenatal actionable genes receive elevated priority for pregnant patients. Carrier screening genes receive priority for family planning.

Activated when: Pregnancy or family planning flag is set

Gene Panel

Panel genes receive priority boosts modulated by ClinGen gene-disease validity level and age-group relevance matching.

Activated when: A gene panel is selected

Boost Interaction

Multiple boosts can apply simultaneously. A variant in BRCA1 for an Ashkenazi Jewish patient with family history of cancer will receive ACMG classification boost, ethnicity boost, and family history boost concurrently. The total boosted score is capped at 1.0 to prevent score inflation.
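The additive-then-cap behavior can be sketched in a few lines. The boost magnitudes in the usage example are invented; only the rule that boosts add to the base score and the total is capped at 1.0 comes from the text.

```python
from typing import Mapping

SCORE_CAP = 1.0


def apply_boosts(base_score: float, boosts: Mapping[str, float]) -> float:
    """Add every active boost to the base score and cap the total at 1.0."""
    return min(SCORE_CAP, base_score + sum(boosts.values()))


# Hypothetical BRCA1 example: three boosts apply at once.
brca1_boosts = {"acmg_classification": 0.20, "ethnicity": 0.15, "family_history": 0.15}
boosted = apply_boosts(0.55, brca1_boosts)  # 0.55 + 0.50 exceeds the cap, so 1.0
```

A consequence of the cap is that heavily boosted variants can tie at 1.0, so a secondary sort key may be needed to order them within a tier.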

Tier System

Four-tier priority ranking with two-stage assignment: base tier from component scores, then final tier after applying all clinical boosts.

Tier 1: High Priority - Immediate Review

Variants requiring immediate clinical attention. Includes all Pathogenic and Likely Pathogenic variants regardless of base score, strong phenotype matches, and variants exceeding the high-priority score threshold. Capped at a configurable maximum (default: 20 variants).

Tier 2: Moderate Priority - Monitor

Variants with moderate clinical relevance. Includes Tier 2 phenotype matches and variants with intermediate boosted scores. Warrant review but are less urgent than Tier 1.

Tier 3: Low Priority - Future Consideration

Variants with low but non-trivial clinical relevance. May become significant with additional clinical information or future variant reclassification.

Tier 4: Very Low Priority - Likely Benign

Variants with minimal clinical relevance under current evidence. Excluded from results by default; included only when explicitly requested.

Base Tier Assignment

The base tier uses both the total weighted score and individual component peaks. A variant with an exceptional signal in a single component (e.g., a highly constrained gene or very high deleteriousness) can be promoted to Tier 1 even if other components are moderate. This prevents clinically significant variants from being buried by low scores in irrelevant dimensions.

Final Tier Determination (Post-Boost)

After all clinical boosts are applied, the final tier may differ from the base tier. Key guarantee: all Pathogenic and Likely Pathogenic variants, all VUS with strong null variant evidence (PVS1), and all strong phenotype matches always appear in Tier 1 regardless of their base component scores. ACMG classification takes priority over component-level scoring.
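The guarantee logic can be sketched as a set of overrides checked before any score threshold. The threshold values, the reading of a "strong phenotype match" as phenotype tier 1, and the function name are assumptions; the configurable Tier 1 cap is not modeled here.

```python
from typing import Optional

# Hypothetical score thresholds; the real values are not published here.
TIER1_THRESHOLD = 0.70
TIER2_THRESHOLD = 0.45
TIER3_THRESHOLD = 0.20


def final_tier(
    boosted_score: float,
    acmg_class: str,
    has_pvs1: bool,
    phenotype_tier: Optional[int],
) -> int:
    """Return the final priority tier (1 = highest) after boosts."""
    # Guarantees: these always land in Tier 1, whatever the component scores.
    if acmg_class in ("Pathogenic", "Likely Pathogenic"):
        return 1
    if acmg_class == "VUS" and has_pvs1:
        return 1
    if phenotype_tier == 1:  # assumed meaning of "strong phenotype match"
        return 1
    # Otherwise the boosted score decides.
    if boosted_score >= TIER1_THRESHOLD:
        return 1
    if phenotype_tier == 2 or boosted_score >= TIER2_THRESHOLD:
        return 2
    if boosted_score >= TIER3_THRESHOLD:
        return 3
    return 4
```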

Gene Lists

Curated gene lists drive age relevance scoring, clinical actionability assessment, and ethnicity-specific prioritization.

ACMG Secondary Findings v3.2 (81 genes)

Three categories: Cancer Predisposition (25 genes including APC, BRCA1, BRCA2, MLH1, MSH2, TP53, VHL), Cardiac (34 genes including KCNH2, KCNQ1, MYBPC3, MYH7, SCN5A, LMNA), and Metabolic (8 genes including LDLR, BTD, RPE65). These genes receive elevated age relevance scores across all age groups and are used for clinical actionability assessment.

Source: Miller DT et al. Genet Med. 2023;25(8):100866

Pediatric Gene Lists

Three curated categories: Early-Onset Disease Genes (CFTR, SMN1, GAA, and other genes causing conditions diagnosable at birth), Treatable Metabolic Conditions (PAH, GALT, and other genes where early intervention changes outcomes), and Childhood-Onset Genes (NF1, PKD1, and other genes causing conditions typically presenting in childhood).

Adult-Onset Gene Lists

Two curated categories: Cancer High-Risk (BRCA1, BRCA2, MLH1, MSH2, TP53, PTEN, and other high-penetrance cancer predisposition genes) and Cardiac (KCNH2, MYBPC3, SCN5A, LMNA, and other genes associated with sudden cardiac death or cardiomyopathy).

Population-Specific Founder Variants

Curated founder mutation gene lists for ethnicity-aware prioritization: Ashkenazi Jewish (BRCA1/2, GBA, HEXA, FANCC, BLM, MSH2, MSH6), African ancestry (HBB, G6PD), East/South Asian (ALDH2, CYP2C19, HBA1, HBA2, HBB), and European (BRCA1/2, CFTR).

Screening Modes

Six screening modes determine the weight profile and prioritization strategy.

Diagnostic

Patient has HPO phenotype terms. Phenotype matching dominates scoring. Used when clinical presentation guides interpretation.

Neonatal Screening

Newborn screening (0-28 days). Prioritizes early-onset disease genes, treatable metabolic conditions, and haploinsufficient genes.

Pediatric Screening

Child and adolescent screening (1-18 years). Prioritizes childhood-onset conditions with emphasis on gene constraint and age-relevant genes.

Proactive Adult

Adult health screening (18-65 years). Emphasizes cancer predisposition, cardiac risk genes, and computational deleteriousness predictions.

Carrier Screening

Recessive carrier identification for reproductive risk assessment.

Pharmacogenomics

Drug response screening for medication safety and efficacy.

Age Group Determination

Patient age is converted to one of six age groups: Neonatal (0-28 days), Infant (29 days - 1 year), Child (1-12 years), Adolescent (12-18 years), Adult (18-65 years), Elderly (65+ years). Day-level precision is used for neonatal/infant boundary. Age can be provided in days, years, or both.
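A sketch of the age-to-group conversion follows. The group boundaries are those listed above; treating each boundary age as belonging to the older group (12 is Adolescent, 18 is Adult, 65 is Elderly), letting days take precedence when both inputs are given, and using 365.25 days per year are assumptions.

```python
from typing import Optional

DAYS_PER_YEAR = 365.25  # assumed conversion factor


def age_group(age_years: Optional[float] = None, age_days: Optional[float] = None) -> str:
    """Map a patient age to one of the six age groups."""
    if age_days is None:
        if age_years is None:
            raise ValueError("provide age_years, age_days, or both")
        age_days = age_years * DAYS_PER_YEAR
    if age_days < 0:
        raise ValueError("age must be non-negative")
    # Day-level precision matters at the neonatal/infant boundary.
    if age_days <= 28:
        return "Neonatal"
    if age_days < 1 * DAYS_PER_YEAR:
        return "Infant"
    if age_days < 12 * DAYS_PER_YEAR:
        return "Child"
    if age_days < 18 * DAYS_PER_YEAR:
        return "Adolescent"
    if age_days < 65 * DAYS_PER_YEAR:
        return "Adult"
    return "Elderly"
```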

Limitations

Screening is a prioritization tool, not a diagnostic tool. It determines review order, not variant pathogenicity. ACMG classification determines pathogenicity.

Compound heterozygote detection is inferred from genotype data without formal phasing. Trio data or long-read sequencing provides definitive confirmation.

De novo boost is a proxy based on gene constraint, not confirmed de novo status. Parental genotype comparison is required for confirmation.

Ethnicity-based boosts use curated founder mutation lists. Population-specific variants outside these lists do not receive ethnicity boosts.

Phenotype scoring without HPO terms uses gene-disease burden as a proxy, which favors well-characterized genes over recently described disease associations.

Age relevance gene lists are curated and may not include all relevant disease genes for each age group. Lists are updated periodically.

Carrier screening and pharmacogenomics modes are not yet weight-differentiated from adult screening. Dedicated weight profiles are planned.

Gene panel boosts are modulated by ClinGen gene-disease validity, which may not be available for all panel genes.

All scores are normalized to [0.0, 1.0]. Raw metric magnitudes are compressed into a relative scale.

Screening results should always be interpreted by a qualified clinical geneticist in the context of the patient's clinical presentation and family history.

Version History

Every methodology change is versioned and documented.

v2.1.0 | Current | March 2026

Panel-aware age relevance scoring: panel genes with age_group_relevance="all" receive minimum 0.6 score regardless of patient age

ClinGen gene-disease validity modulates panel gene scores: Definitive/Strong=1.0, Moderate=0.85, Limited=0.7

Compound het detection: splice_region + intron_variant combinations excluded from coding consequences (splice_region alone is not true coding impact)

Phenotype scoring: gene-disease burden capped at 0.5 in screening mode (full 1.0 reserved for actual HPO phenotype match)

Non-coding variants receive discounted phenotype scores, because gene-disease burden says little about variants that are unlikely to affect protein function

Panel metadata corrections: INS/NEUROD1/LMNA age_group_relevance set to "all", GATA4 ClinGen status corrected to Strong

New sub-panels added: 5 diabetes sub-panels + Comprehensive Diabetes Panel (47 genes)

v2.0.0 | February 2026

Initial production release with 7-component scoring system

BayesDel_noAF as primary deleteriousness predictor with ClinGen SVI calibration

8-predictor weighted ensemble for deleteriousness scoring

Age-aware weight profiles: Diagnostic, Neonatal, Pediatric, Adult, Elderly

Clinical profile boosts: ethnicity, family history, sex-linked, consanguinity, pregnancy

Four-tier priority ranking with ACMG classification guarantees

Gene panel boost with ClinGen modulation

References

Richards S, Aziz N, Bale S, et al. Standards and guidelines for the interpretation of sequence variants.

Genetics in Medicine. 2015;17(5):405-424.

PMID: 25741868

Pejaver V, Byrne AB, Feng BJ, et al. Calibration of computational tools for missense variant pathogenicity classification and ClinGen recommendations for PP3/BP4 criteria.

Am J Hum Genet. 2022;109(12):2163-2177.

PMID: 36413997

Karczewski KJ, Francioli LC, Tiao G, et al. The mutational constraint spectrum quantified from variation in 141,456 humans.

Nature. 2020;581(7809):434-443.

PMID: 32461654

Cheng J, Novati G, Pan J, et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense.

Science. 2023;381(6664):eadg7492.

PMID: 37733863

Jaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, et al. Predicting Splicing from Primary Sequence with Deep Learning.

Cell. 2019;176(3):535-548.e24.

PMID: 30661751

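Miller DT, Lee K, Abul-Husn NS, et al. ACMG SF v3.2 list for reporting of secondary findings in clinical exome and genome sequencing: A policy statement of the American College of Medical Genetics and Genomics (ACMG).

Genet Med. 2023;25(8):100866.

PMID: 37347242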

Kohler S, Gargano M, Matentzoglu N, et al. The Human Phenotype Ontology in 2021.

Nucleic Acids Research. 2021;49(D1):D1207-D1217.

PMID: 33264411
