Helena


Changelog

Version history and release notes for the Helena platform.

v1.6.0 | April 2026 | Classification Engine v3.28.0

Subtractive sophistication guards (v3.25.3 - v3.27.2)

Subtractive sophistication is the deliberate downgrade of Likely Pathogenic to VUS when the evidence profile is computational-only or biologically inconsistent with the gene mechanism. Six independent guards introduced across versions v3.25.3 through v3.27.2 reduce false positive LP classifications without ever creating new P/LP. The combined effect is more conservative classification in genes where the ACMG framework alone would over-call pathogenicity.

ar_biallelic_missense_guard (v3.25.3, HELIX-CR-2026-061): PP2 and LP4/LP5/LP6 blocked for het missense variants in AR/biallelic-only LoF genes without compound het partner and without ClinVar P/LP missense evidence. Richards 2015 Table 3: PP2 requires "missense variants are a common mechanism of disease" -- objectively false for AR LoF genes without documented missense pathogenicity.

computational_only_lp_guard (v3.25.4, HELIX-CR-2026-062): LP4/LP5/LP6 blocked when ALL pathogenic criteria are computational/annotation-based (PM1, PM2, PP2, PP3, PP3_splice). At least one observed/clinical criterion (PS1, PM3, PM4, PP4, PP5) required. De novo projection NOT gated -- PS2 is observed evidence.

ar_biallelic_het_lof_guard (v3.25.5, HELIX-CR-2026-063): all LP rules blocked for het LoF (HIGH impact) variants in AR/biallelic-only LoF genes without compound het partner. Het LoF in AR-only gene = carrier state, not pathogenic event. P rules NOT blocked: P1 with ClinVar PS1 is legitimate override.

LP2 PP2 bypass fix (v3.25.6, HELIX-CR-2026-064): PP2 (gene-level constraint) no longer bypasses the PP3_Strong+PM2 computational-only LP guard. PP4 (HPO match) and PP5 (ClinVar 1-star) still legitimately override.

LP2 PP3_splice_Strong extension (v3.25.9, HELIX-CR-2026-067): computational-only guard extended to block PP3_splice_Strong + PM2 without observed evidence, same as PP3_Strong + PM2.

LP2 splice bypass for AD/XL LoF genes (v3.27.2, HELIX-CR-2026-081): PP3_splice_Strong (SpliceAI >= 0.8) in AD/XL LoF-intolerant genes with disease association and splice proximity bypasses the computational-only guard. High-confidence splice disruption in established LoF-mechanism gene approaches functional LoF proxy. Six mandatory bypass conditions including MANE Select positional guard and GoF/DN exclusion.
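
The core test of computational_only_lp_guard (v3.25.4) can be sketched as follows. This is a minimal illustration, not the engine's code: the two criteria sets come from the entry above, while the function names, the list-of-strings input and the suffix handling are assumptions. The v3.27.2 splice bypass is deliberately not modeled.

```python
# Criteria sets as listed in the v3.25.4 entry.
COMPUTATIONAL = {"PM1", "PM2", "PP2", "PP3", "PP3_splice"}
OBSERVED = {"PS1", "PM3", "PM4", "PP4", "PP5"}


def base_criterion(label: str) -> str:
    """Strip a strength suffix, e.g. 'PP3_splice_Strong' -> 'PP3_splice'."""
    for suffix in ("_Strong", "_Moderate", "_Supporting"):
        if label.endswith(suffix):
            return label[: -len(suffix)]
    return label


def computational_only(pathogenic_criteria: list[str]) -> bool:
    """True when every pathogenic criterion is computational/annotation-based.

    In that case this sketch would block LP4/LP5/LP6. Any criterion outside
    COMPUTATIONAL (for example one from OBSERVED, or PS2) lifts the block.
    """
    bases = {base_criterion(c) for c in pathogenic_criteria}
    return bool(bases) and bases <= COMPUTATIONAL
```

For example, PM2 + PP3_Strong is blocked, while PM2 + PP3_Strong + PP5 is not, because PP5 is an observed criterion.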

PP3 / PP3_splice mutual exclusion (v3.25.7 - v3.25.8)

PP3_splice missense-tolerated guard (v3.25.7, HELIX-CR-2026-065): PP3_splice (all levels) blocked for pure missense variants without VEP splice consequence when BayesDel noAF data is available. ClinGen SVI Walker 2023 Section 3.2: PP3 is a single criterion and must not be applied twice via different tools.

PP3 + PP3_splice mutual exclusion (v3.25.8, HELIX-CR-2026-066): PP3 BayesDel blocked for dual-consequence variants (missense + splice_region/donor/acceptor) where PP3_splice applies. SpliceAI is the more specific tool when VEP confirms splice proximity.

Combined effect: exactly one PP3 path per variant -- never both. Resolves the double-counting that previously inflated evidence count for dual-consequence variants.
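
The "exactly one PP3 path" outcome can be illustrated with a small selector. This is a sketch under assumptions: the function name, the argument shapes and the returned labels ("bayesdel", "spliceai") are hypothetical, and the substring test for splice proximity only approximates the consequence list in the v3.25.8 entry.

```python
from typing import Optional

# Consequence fragments named in the v3.25.8 entry.
SPLICE_TERMS = ("splice_region", "splice_donor", "splice_acceptor")


def pp3_path(consequences: set[str], has_bayesdel: bool,
             pp3_splice_applies: bool) -> Optional[str]:
    """Return the single tool allowed to contribute PP3 evidence, or None."""
    is_missense = "missense_variant" in consequences
    near_splice = any(t in c for c in consequences for t in SPLICE_TERMS)

    # v3.25.8: dual-consequence variant where PP3_splice applies -> SpliceAI only.
    if is_missense and near_splice and pp3_splice_applies:
        return "spliceai"
    # v3.25.7: pure missense with BayesDel data -> BayesDel only.
    if is_missense and not near_splice and has_bayesdel:
        return "bayesdel"
    # Remaining cases: whichever single path is available.
    if pp3_splice_applies:
        return "spliceai"
    if is_missense and has_bayesdel:
        return "bayesdel"
    return None
```

Because the function returns at most one label, a variant can never collect both PP3 and PP3_splice in this sketch.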

PM3 ClinVar partner validation (v3.25.1)

PM3 ClinVar partner validation guard (HELIX-CR-2026-058): compound_het_candidate alone no longer satisfies PM3. The trans partner must be ClinVar P/LP (clinical_significance LIKE 'Pathogenic%' OR LIKE 'Likely_pathogenic%', and NOT LIKE '%Conflicting%') with review_stars >= 2. Aligns with Richards 2015 Table 3: PM3 explicitly requires detection in trans with a pathogenic variant.

ClinVar-only guard (no acmg_class dependency): avoids circular reference where partner classification depends on PM3 which depends on partner classification.

Subtractive change: PM3 can only be removed, not added. 241 LP -> VUS expected. 5 LP unchanged (validated partner).
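
The partner check can be expressed in a few lines. The sketch below mirrors the LIKE patterns from the entry with Python string tests. The function name and signature are hypothetical, and the default of 2 stars reflects the value stated above.

```python
def partner_validates_pm3(clinical_significance: str, review_stars: int,
                          min_stars: int = 2) -> bool:
    """True when a trans partner's ClinVar record is strong enough for PM3."""
    sig = clinical_significance
    # LIKE 'Pathogenic%' OR LIKE 'Likely_pathogenic%'
    is_plp = sig.startswith("Pathogenic") or sig.startswith("Likely_pathogenic")
    # NOT LIKE '%Conflicting%'
    is_conflicting = "Conflicting" in sig
    return is_plp and not is_conflicting and review_stars >= min_stars
```

The check reads only ClinVar fields, which is what keeps it free of the circular dependency on the partner's own ACMG class.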

BS2 inheritance-aware AR threshold (v3.26.8)

BS2 inheritance-aware AR threshold (HELIX-CR-2026-074): AR-only genes (Orphanet has_ar=TRUE, has_ad=FALSE, or HI=30) with early-onset-only diseases use a threshold of 10 homozygotes instead of the generic 15.

Onset guard from refdb.orphanet_disease_onset: AR threshold activated only when gene has no Adult/Elderly/All ages onset disease. Aligns with Richards 2015 Table 3: BS2 "with full penetrance expected at an early age".

Criteria string marker: BS2[AR] when AR threshold path was used.
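
A minimal sketch of the threshold selection follows. The function name and parameters are hypothetical. Reading the AR-only condition as "(has_ar and not has_ad) or HI score 30" is an assumption about how the entry's condition groups, and a gene with no onset annotations is treated here as having no late-onset disease.

```python
from typing import Iterable, Optional

# Onset labels that disable the AR threshold, per the onset guard above.
LATE_OR_ANY_ONSET = {"Adult", "Elderly", "All ages"}


def bs2_homozygote_threshold(has_ar: bool, has_ad: bool,
                             hi_score: Optional[int],
                             onset_labels: Iterable[str]) -> int:
    """Return the homozygote count required for BS2 in this sketch."""
    ar_only = (has_ar and not has_ad) or hi_score == 30  # assumed grouping
    early_onset_only = not (set(onset_labels) & LATE_OR_ANY_ONSET)
    # AR path (marked BS2[AR] in the criteria string) versus generic path.
    return 10 if ar_only and early_onset_only else 15
```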

Dual-mechanism AD/AR LoF bypass (v3.27.0)

Dual-mechanism guard bypass (HELIX-CR-2026-079): ar_biallelic_het_lof_guard (CR-063) extended with bypass for genes with documented monoallelic LoF mechanism. Genes with both biallelic LoF AND monoallelic LoF have an AD pathway where het LoF is pathogenic, not carrier state.

Two bypass paths: gene_disease_mechanism monoallelic LoF (definitive/strong/moderate) OR CLINVAR_LOF_AD_GENES (IFT140, GCM2). ClinGen AD Definitive/Strong path removed in v2 review: the ClinGen AD mode of inheritance does not distinguish LoF from GoF/DN.

Additive change: VUS to LP for het LoF in dual-mechanism AD/AR LoF genes. Affected gene: IFT140 (3rd ADPKD gene per Senum 2022 PMID:34890546).
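
The two bypass paths reduce to a short predicate. This is an illustrative sketch: the gene set and confidence levels are taken from the entry, while the function name and the idea of passing the confidence of a monoallelic LoF record as an optional string are assumptions.

```python
from typing import Optional

CLINVAR_LOF_AD_GENES = {"IFT140", "GCM2"}
ACCEPTED_CONFIDENCE = {"definitive", "strong", "moderate"}


def het_lof_guard_bypassed(gene: str,
                           monoallelic_lof_confidence: Optional[str]) -> bool:
    """True when ar_biallelic_het_lof_guard should not block LP for this gene.

    monoallelic_lof_confidence is the confidence of a monoallelic LoF record
    in gene_disease_mechanism, or None when no such record exists.
    """
    if gene in CLINVAR_LOF_AD_GENES:
        return True
    return monoallelic_lof_confidence in ACCEPTED_CONFIDENCE
```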

gnomAD LoF tolerance signal + BP_regional (v3.24.0 - v3.25.0)

gnomAD LoF population tolerance signal (v3.24.0, HELIX-CR-2026-057 Phase 1): boolean gnomad_lof_tolerant flags genes that carry high-confidence (HC) LoF variants outside the MANE CDS with homozygous or hemizygous carriers among healthy gnomAD individuals. Phase 1 is informational only.

BP_regional Supporting benign (v3.25.0, HELIX-CR-2026-057 Phase 3): gnomad_lof_tolerant signal converted into Supporting benign criterion. Applied to HIGH impact variants outside MANE CDS in genes with documented LoF tolerance. AD genes excluded -- het LoF tolerance in AD is a different clinical argument. Compound het candidates excluded.

Reference data: gnomad_lof_tolerant_gene_summary table aggregates max_nhomalt, max_ac_hemi, lof_variant_count per gene with example_variant audit trail. Criteria string marker: BP_regional[gnomAD_LoF].
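
The applicability conditions for BP_regional can be collected into one function. The sketch below assumes a hypothetical Variant record with precomputed boolean fields. How gnomad_lof_tolerant is derived from the summary table is not shown, because the entries above do not specify it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    impact: str                   # VEP impact, e.g. "HIGH"
    in_mane_cds: bool             # variant lies inside the MANE CDS
    gene_lof_tolerant: bool       # gnomad_lof_tolerant flag for the gene
    gene_is_ad: bool              # gene has AD inheritance
    compound_het_candidate: bool  # variant is a compound het candidate


def bp_regional_marker(v: Variant) -> Optional[str]:
    """Return the criteria string marker when BP_regional applies, else None."""
    if v.gene_is_ad or v.compound_het_candidate:
        return None  # both exclusions named in the v3.25.0 entry
    if v.impact == "HIGH" and not v.in_mane_cds and v.gene_lof_tolerant:
        return "BP_regional[gnomAD_LoF]"
    return None
```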

BP4_splice_benign for non-canonical splice (v3.25.2)

BP4_splice_benign (HELIX-CR-2026-060): Supporting benign for non-canonical splice region variants (splice_donor_region, splice_donor_5th_base, splice_acceptor_5th_base, splice_region, splice_polypyrimidine_tract) with SpliceAI <= 0.1. Excludes synonymous (BP7 handles) and missense (BP4 BayesDel handles).

PVS1 guard: not applied when PVS1 condition is satisfied. Threshold consistent with BP7. Adds benign evidence only -- cannot create new P/LP.
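
As an illustration, the BP4_splice_benign conditions could be combined like this. The consequence names are used as written in the entry (without the "_variant" suffix, which the sketch strips from its input). The function name and parameters are hypothetical.

```python
NON_CANONICAL_SPLICE = {
    "splice_donor_region",
    "splice_donor_5th_base",
    "splice_acceptor_5th_base",
    "splice_region",
    "splice_polypyrimidine_tract",
}
SPLICEAI_BENIGN_MAX = 0.1


def bp4_splice_benign(consequences: set[str], spliceai_max: float,
                      pvs1_satisfied: bool) -> bool:
    """True when the Supporting benign splice criterion applies in this sketch."""
    terms = {c.removesuffix("_variant") for c in consequences}
    if pvs1_satisfied:
        return False  # PVS1 guard
    if terms & {"synonymous", "missense"}:
        return False  # handled by BP7 and BP4 BayesDel respectively
    return bool(terms & NON_CANONICAL_SPLICE) and spliceai_max <= SPLICEAI_BENIGN_MAX
```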

ClinGen negative evidence guard (v3.26.9)

ClinGen negative evidence guard (HELIX-CR-2026-078): genes with ClinGen Gene-Disease Validity classifications limited to Limited/Disputed/Refuted (no Definitive/Strong/Moderate) no longer accepted as having disease association via Orphanet entry alone.

clingen_negative_genes CTE: ~480 genes with ClinGen negative-only evaluation. Affects ClinVar 1-star override gate, LP4/LP5/LP6 disease gate, PVS1 disease gate, PP3_Strong disease gate, de novo projection.

Subtractive change: 205 affected genes with Orphanet entry + ClinGen negative-only evaluation no longer satisfy disease association gate.
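
The negative-only test behind clingen_negative_genes can be sketched with set operations. The function names are hypothetical, and the sketch covers only the Orphanet path into the disease association gate; other evidence sources are out of scope here.

```python
NEGATIVE_CLASSIFICATIONS = {"Limited", "Disputed", "Refuted"}


def clingen_negative_only(classifications: set[str]) -> bool:
    """True when a gene has ClinGen evaluations and all of them are negative."""
    return bool(classifications) and classifications <= NEGATIVE_CLASSIFICATIONS


def orphanet_satisfies_disease_gate(has_orphanet_entry: bool,
                                    classifications: set[str]) -> bool:
    """An Orphanet entry alone no longer counts for negative-only genes."""
    return has_orphanet_entry and not clingen_negative_only(classifications)
```

A gene with both a Limited and a Moderate classification is not negative-only, so its Orphanet entry still counts.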

PS1 minimum review stars correction (v3.26.7)

PS1 minimum review stars (HELIX-CR-2026-073): ps1_min_stars corrected from 1 to 2 in processing.yaml. Code default was already 2; YAML override allowed PS1 (Strong) at ClinVar 1-star, bypassing v3.10.0 disease association gate.

PP5 restoration: review_stars = 1 ClinVar P/LP now correctly receives PP5 (Supporting) instead of PS1 (Strong). PP5 range no longer empty set.

PM3 partner validation: partner.review_stars >= ps1_min_stars (shared config).
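
The effect of the ps1_min_stars value on the PS1/PP5 split can be seen in a small sketch. The function name is hypothetical, and the separate disease association gate for 1-star records (v3.10.0) is not modeled.

```python
from typing import Optional


def clinvar_plp_criterion(review_stars: int,
                          ps1_min_stars: int = 2) -> Optional[str]:
    """Map a ClinVar P/LP record's review stars to PS1, PP5 or nothing."""
    if review_stars >= ps1_min_stars:
        return "PS1"  # Strong
    if review_stars == 1:
        return "PP5"  # Supporting
    return None
```

With ps1_min_stars=1, the first branch already captures 1-star records, so the PP5 branch is never reached. That is the "empty set" described above.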

Combining rule corrections via Bug Bounty (v3.26.3 - v3.26.7)

P8 combining rule fix (v3.26.3, BB-001): ps_count >= 1 AND pm_count >= 4 corrected to ps_count >= 1 AND pm_count >= 1 AND pp_count >= 4 per Richards 2015 Table 5 rule (viii). Previous implementation was dead code (subsumed by P6) and encoded a non-existent rule.

De novo projection compound het + hemizygous (v3.26.4, BB-003): compound_het_candidate guard removed from de_novo_projection outer CASE. PS2 (origin) and PM3 (biallelic configuration) are orthogonal evidence types. Hemizygous genotype support added with chromosome guard (X/Y).

PM2 af_grpmax = 0.0 fix (v3.26.5, BB-006): af_grpmax = 0.0 treated as absent from controls, consistent with global_af path. Affects RASopathy genes with pm2_threshold=0.0.

B2 combining rule BP4_Moderate exclusion (v3.26.6, BB-007): B2 (>= 2 Strong benign) replaced with explicit (has_bs1 + has_bs2) >= 2. BP4_Moderate (Moderate benign per Pejaver 2022) excluded from B2. LB1b fallthrough added: BS1 + BP4_Moderate = LB.
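
The P8 and B2 corrections can be written out as predicates. This is a sketch with hypothetical function names. The form of P6 (1 Strong and >= 3 Moderate) is an assumption based on Richards 2015 Table 5 and is included only to show why the old P8 condition could never fire on its own.

```python
def p8_before_fix(ps_count: int, pm_count: int) -> bool:
    return ps_count >= 1 and pm_count >= 4


def p8_after_fix(ps_count: int, pm_count: int, pp_count: int) -> bool:
    # 1 Strong AND 1 Moderate AND >= 4 Supporting
    return ps_count >= 1 and pm_count >= 1 and pp_count >= 4


def p6_assumed(ps_count: int, pm_count: int) -> bool:
    # Assumed form of P6: 1 Strong AND >= 3 Moderate.
    return ps_count >= 1 and pm_count >= 3


def b2(has_bs1: bool, has_bs2: bool) -> bool:
    # Only true Strong benign criteria count; BP4_Moderate is excluded.
    return int(has_bs1) + int(has_bs2) >= 2


def lb1b(has_bs1: bool, has_bp4_moderate: bool) -> bool:
    # Fallthrough: BS1 + BP4_Moderate yields Likely Benign, not Benign.
    return has_bs1 and has_bp4_moderate
```

Every input that satisfies p8_before_fix also satisfies p6_assumed, which is consistent with the old rule being dead code.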

De novo projection refinements (v3.26.0 - v3.26.4)

De novo AF ceiling + post-projection conflict guard (v3.26.0, HELIX-CR-2026-071): de_novo_projection CTE filters by AF (global_af > 0.001 OR af_grpmax > 0.001 blocks projection). Post-projection conflict guard: bs_count > 0 blocks de novo projection unconditionally.

Compound het + hemizygous compatibility (v3.26.4, BB-003): de novo and compound het are biologically compatible. Hemizygous males on X/Y eligible for de novo projection.

Subtractive change: removes false de novo candidates. Cannot create new P/LP or new de novo candidates.
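
The AF ceiling and the post-projection conflict guard from v3.26.0 can be combined into one eligibility check. The function name is hypothetical, and treating a missing frequency as non-blocking is an assumption.

```python
from typing import Optional

AF_CEILING = 0.001


def de_novo_projection_allowed(global_af: Optional[float],
                               af_grpmax: Optional[float],
                               bs_count: int) -> bool:
    """True when neither the AF ceiling nor the conflict guard blocks projection."""
    if bs_count > 0:
        return False  # any Strong benign evidence blocks unconditionally
    for af in (global_af, af_grpmax):
        # Assumption: a missing frequency (None) does not block.
        if af is not None and af > AF_CEILING:
            return False
    return True
```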

Processing notes refactor (v3.27.1, v3.28.0)

Processing notes CASE priority fix (v3.27.1, HELIX-CR-2026-080): single first-match-wins CASE structure replaced with independent CASE WHEN concatenation. Variants matching multiple conditions now receive all applicable notes separated by semicolons.

Processing notes sentinel-guarded dedup (v3.28.0, HELIX-CR-HEL-VA-2026-001): all 14 ACMG note categories prefixed with [ACMG] sentinel. Symmetric refactor in artifact detection stage with [ARTIFACT] sentinel. Eliminates stale note preservation across reclassification runs and N-fold note duplication on repeated classifier runs.

Zero classification logic changes in either CR. Operational improvement only.
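
One way a sentinel prefix can make note generation idempotent is sketched below. This illustrates the general technique and is not the pipeline's actual code: the function name is hypothetical, and it assumes that notes are joined with semicolons and never contain a semicolon themselves.

```python
SENTINEL = "[ACMG]"


def refresh_acmg_notes(existing: str, new_notes: list[str]) -> str:
    """Replace all sentinel-tagged notes with a fresh set, keeping other notes."""
    parts = [p.strip() for p in existing.split(";")]
    # Drop stale classifier notes; keep anything written by other stages.
    kept = [p for p in parts if p and not p.startswith(SENTINEL)]
    fresh = [f"{SENTINEL} {note}" for note in new_notes]
    return "; ".join(kept + fresh)
```

Running the function twice with the same input notes yields the same string, so repeated classifier runs neither duplicate notes nor preserve ones that no longer apply.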

Reference databases

gnomAD pext (v4.1, GTEx v10): per-exon expression proportion across 49 GTEx tissues. 189,856 exon records for 18,923 MANE Select genes. Used by PVS1 expression-aware guard (v3.23.0).

MANE Select transcripts (v1.4): 19,354 transcripts (19,288 Select + 66 Plus Clinical) with CDS coordinates. Used by PVS1 positional guard (v3.22.0) and PM4.

gnomad_lof_tolerant_gene_summary: aggregated max_nhomalt, max_ac_hemi, lof_variant_count per gene. Source of gnomAD LoF tolerance signal (v3.24.0) and BP_regional criterion (v3.25.0).

orphanet_disease_onset (HELIX-REF-005): per-disease onset annotations (Adult/Elderly/All ages flags) from Orphadata. Used by BS2 inheritance-aware AR threshold (v3.26.8).

clinvar_missense_genes: 2,136 genes with ClinVar P/LP missense evidence (>= 2 at 2+ stars OR >= 3 at 1+ star). Used by PP3 missense relevance guard Gate 4a (v3.19.0) and BP1 reference-based ClinVar guard (v3.20.9).

gene_disease_mechanism (HELIX-REF-003): unified molecular mechanism table from G2P + GoFCards + manual curation. 2,617 records. Source for gof_genes_unified view (347 GoF/DN/GoE genes) and gof_genes_exclusive view (excludes 55 dual-mechanism genes).
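
The inclusion rule quoted for clinvar_missense_genes (>= 2 at 2+ stars OR >= 3 at 1+ star) can be checked per gene as follows. The function name and the input, a list with one review-star value per ClinVar P/LP missense variant in the gene, are assumptions made for this illustration.

```python
def qualifies_as_missense_gene(review_stars_per_variant: list[int]) -> bool:
    """Apply the '>= 2 at 2+ stars OR >= 3 at 1+ star' inclusion rule."""
    two_plus = sum(1 for s in review_stars_per_variant if s >= 2)
    one_plus = sum(1 for s in review_stars_per_variant if s >= 1)
    return two_plus >= 2 or one_plus >= 3
```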

v1.5.0 | March 2026 | Classification Engine v3.17.0

Classification (HELIX-CR-2026-023)

PVS1: G2P molecular mechanism integration. DECIPHER Gene2Phenotype (G2P) database now provides primary GoF/DN guard for PVS1. gof_genes_g2p view contains 198 pure GoF/DN monoallelic genes where PVS1 is blocked.

PVS1: GOF_AD_GENES reduced from 36 to 14 fallback genes (genes not in G2P: 6 neurodegeneration/toxic aggregation, 3 somatic/neomorphic, 5 absent from G2P 2026-02-28 release). PRSS1 added to fallback.

PVS1: GNAS removed from GOF_AD_GENES (dual-mechanism: McCune-Albright = GoF, pseudohypoparathyroidism 1A = LoF). GNAS LoF variants now correctly receive PVS1.

PVS1: 49 dual-mechanism genes (including SCN5A, LMNA, KCNH2, KCNQ1, FGFR1) excluded from GoF view, preserving PVS1 for legitimate LoF phenotypes.

PVS1: G2P confidence filter: only definitive, strong, moderate included. Limited confidence excluded.

Subtractive change: can only remove PVS1 from GoF/DN genes. Cannot create false P/LP.

Clinical trigger: PD2025_090 PRSS1 p.Gly177Ter (stop_gained in GoF gene).

Reference Databases

DECIPHER G2P: Molecular mechanism data (g2p_gene_mechanism table, 2,372 records) added to reference_db alongside existing HPO enrichment data.

gof_genes_g2p view: 198 pure GoF/DN monoallelic genes for PVS1 guard. Dual-mechanism genes excluded.

New loader script: scripts/load_g2p_mechanism.py with dry-run, force flags, and 9-point validation.

v1.4.0 | March 2026 | Classification Engine v3.16.8

Classification

PVS1: GoF AD gene exclusion (HELIX-CR-2026-022). GOF_AD_GENES curated list (36 genes) blocks PVS1 for AD genes where disease mechanism is gain-of-function, dominant-negative, or toxic aggregation.

PVS1: ClinGen AD Definitive/Strong genes bypass pLI/LOEUF constraint gate (v3.16.6). Genes like TUBB1 with ClinGen AD Definitive now qualify for PVS1.

LP classification: Disease association gate for LP rules LP4, LP5, LP6 (v3.16.7, HELIX-CR-2026-021). Prevents LP in genes without known Mendelian disease mechanism.

ClinGen Gene-Disease Validity integration (v3.16.0): ar_lof_genes_clingen and disease_associated_genes_clingen reference tables replace Python constants.

Clinical triggers: PD2025_082 RAC1 frameshift (GoF gene), PD2025_090 discordance analysis.

Reference Databases

ClinGen Gene-Disease Validity: ar_lof_genes_clingen and disease_associated_genes_clingen tables added to reference_db.

Total reference databases: 14 (previously 13).

v1.3.0 | March 2026 | Classification Engine v3.15.0

Classification

PM1: Critical functional domains migrated from Python constant to reference database table (refdb.interpro_pfam_domains). HELIX-CR-2026-012.

PM1: Full InterPro Pfam catalog (27,481 entries) loaded. 49 domains marked critical across 14 categories.

PM1: CRITICAL_PFAM_DOMAINS converted from short names to InterPro-verified accession numbers (v3.14.0, HELIX-CR-2026-011). Fixes a bug that had left PM1 with 0 legitimate applications since the curated list was introduced.

PM1: Post-fix validation on HG2023_206: PM1 count rose from 2 (false positives) to 1,139 (legitimate).

PVS1: ClinVar LOF evidence as fifth constraint gate path (v3.13.0, HELIX-CR-2026-010). Small AD genes with ClinVar P/LP LOF but uninformative gnomAD constraint now qualify.

PP3/BP4: Missense consequence guard (v3.12.1, HELIX-CR-2026-009). BayesDel path now requires missense_variant consequence.

Clinical triggers: HG2023_206 RYR1 (PM1 fix), GCM2 p.Arg131Ter (PVS1 + PP3 guard).

Reference Databases

InterPro / Pfam added as 13th reference database (27,481 Pfam entries).

Loader script: scripts/load_interpro_pfam.py with InterPro API fetch and offline fallback.

Total reference databases: 13 (previously 12).

v1.2.0 | March 2026 | Classification Engine v3.12.0

HPO Enrichment (HELIX-REF-001)

Multi-Source HPO Enrichment Pipeline: gene-phenotype annotations expanded from 1 source (5,173 genes) to 6 sources (5,688 genes, +10%)

HPO Consortium: 320K records, 5,173 genes (primary source, curated gene-phenotype associations)

Orphanet disease-to-HPO: 168K records, 3,176 genes with clinical frequency data (via Orphadata en_product4.xml + en_product6.xml)

DECIPHER Gene2Phenotype (G2P): 43K records, 2,125 genes from 7 clinical panels (DD, Eye, Cardiac, Skin, Skeletal, Cancer, Ear)

Monarch Initiative: 151K records, 4,791 genes (HPO Consortium redistribution via Monarch KG)

ClinVar-MedGen: 245K records, 5,258 genes (P/LP variant -> MedGen CUI -> HPO chain mapping, largest contributor of new genes)

Manual clinical curation: common-disease genes outside rare-disease HPO scope (TBC1D4 with PMID evidence)

3,292 genes with increased HPO term coverage from cross-source aggregation

Source priority ordering: Orphanet > G2P > HPO Consortium > Monarch > ClinVar-MedGen

Low-confidence filtering: Orphanet modifying/candidate and G2P limited entries excluded from clinical view

Classification

Classifier version bumped to v3.12.0

PP4 criterion evaluates against enriched HPO set (more genes trigger PP4)

No ACMG criteria logic changes -- only annotation data expanded

Reference Databases

DECIPHER G2P added as new reference database (7 clinical panels)

Monarch Initiative added as new reference database

MedGen HPO-OMIM mapping added as new reference database

HPO database entry updated to reflect multi-source enriched view

Total reference databases: 12 (previously 9)

v1.1.0 | March 2026 | Classification Engine v3.11.4

Classification

PVS1 disease association gate: requires established disease association before applying PVS1 (ClinVar P/LP, Orphanet, ClinGen HI, AR LoF list, or VCEP coverage)

BS1 constraint-implied AD fallback: uses AD threshold (0.1%) for LoF-constrained genes without inheritance data

BS1 cascade expanded from 5 to 8 levels with explicit Orphanet AR and AR_LOF_GENES priorities

PVS1 non-canonical splice exclusion: splice_donor_5th_base_variant, splice_donor_region_variant, splice_acceptor_5th_base_variant blocked from PVS1 (v3.11.4)

PVS1 NMD_transcript_variant exclusion fix: NMD_transcript_variant no longer blocks PVS1; NMD_escaping_variant blocks it instead (v3.11.3)

De novo projection: prospective PS2 upgrade computation for VUS variants to guide trio testing (v3.11.0)

ClinVar low-confidence disease association gate: 1-star P/LP requires gene disease evidence (v3.10.0)

disease_genes_clinvar CTE circular reference fix: review_stars >= 2 filter (v3.10.1)

PVS1 last-exon NMD downgrade: last-exon truncating variants receive PVS1_Strong instead of PVS1 Very Strong (v3.9.0)

BP1 ClinVar pathogenic missense guard and PP3_Supporting label (v3.8.2)

PP3_Strong disease association gate (v3.8.1)

Reference Databases

Orphanet/Orphadata documented as separate reference database (gene-disease-inheritance for 3,200 genes)

VCEP gene-specific specifications documented as separate reference database (~50-60 genes)

Total reference databases: 9 (previously 7)

v1.0.0 | February 2026 | Initial Release

Classification

ACMG/AMP 2015 classification with Bayesian point-based framework (Tavtigian 2018/2020)

19 of 28 ACMG criteria automated

BayesDel_noAF with ClinGen SVI-calibrated thresholds for PP3/BP4 (Pejaver 2022)

SpliceAI integration for PP3_splice with PVS1 double-counting prevention

ClinVar override logic with review star quality filtering

Gene-specific VCEP threshold support

Reference Databases

gnomAD v4.1 (759M variants, 807K individuals)

gnomAD Constraint v4.1 (18.2K genes)

ClinVar 2025-01

dbNSFP 4.9c

SpliceAI precomputed (Ensembl MANE Release 113)

HPO gene-phenotype associations

ClinGen dosage sensitivity

Ensembl VEP Release 113 with local offline cache

Phenotype Matching

Lin semantic similarity with HPO ontology graph

Five-tier clinical priority system

Gene-level deduplication and aggregation

Automatic HPO term extraction from free-text clinical descriptions

Screening

Seven-component scoring algorithm (constraint, deleteriousness, phenotype, dosage, consequence, compound het, age relevance)

Six screening modes (diagnostic, neonatal, pediatric, proactive adult, carrier, pharmacogenomics)

Age-aware prioritization with curated gene lists

Clinical boosts for ethnicity, family history, sex-linked inheritance, consanguinity, pregnancy

Four-tier priority ranking with clinical actionability labels

AI Clinical Assistant

Conversational variant analysis with natural language database queries

Biomedical literature search (1M+ publications, local PubMed mirror)

Four-level adaptive clinical interpretation generation

PDF and DOCX report export with Helena branding

Genomics-aware visualization suggestions

On-premise LLM inference within EU infrastructure

Infrastructure

EU-based processing (Helsinki, Finland)

All databases stored and queried locally

Zero external API calls during variant processing

GDPR-compliant data handling

DuckDB-based analytical pipeline