Real-World Performance Study
Cohort 2: Multi-Source Performance Characterization
A Real-World Performance Study conducted under HEL-RWP-FRAMEWORK-v1.0. Twenty cases were evaluated under a three-layer performance model with multi-source ground-truth construction, and every divergence was methodologically characterized. This is not a binary pass/fail study; it provides foundational scientific evidence at the current organizational stage.
Study specification
Layer 1 - Analytical performance
A single non-detection was attributed to an upstream sequencing pipeline failure through a 7-step audit trail. Helena platform integrity was verified via 21:21 input-to-database record correspondence. Helena attribution: CLEARED.
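The record-correspondence check can be sketched as a set comparison between the records read from the input VCF and the records persisted in the platform database, followed by a check of whether the target coordinate ever reached the input. This is a minimal illustration under stated assumptions, not Helena's audit procedure. Keying records on (chrom, pos, ref, alt) is an assumption, and so are the helper names and the three attribution labels.

```python
from typing import Iterable, NamedTuple


class VariantKey(NamedTuple):
    """Hypothetical record key: genomic coordinate plus alleles (assumption)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def record_correspondence(input_records: Iterable[VariantKey],
                          db_records: Iterable[VariantKey]) -> dict:
    """Compare input VCF records with database records, reporting an n_input:n_db ratio."""
    inp, db = set(input_records), set(db_records)
    return {
        "ratio": f"{len(inp)}:{len(db)}",
        "missing_in_db": sorted(inp - db),      # lost during ingestion
        "unexpected_in_db": sorted(db - inp),   # not present in the input
        "intact": inp == db,
    }


def target_present(records: Iterable[VariantKey], chrom: str, pos: int) -> bool:
    """True if any record sits at the target coordinate."""
    return any(r.chrom == chrom and r.pos == pos for r in records)


def attribute_non_detection(input_records: Iterable[VariantKey],
                            db_records: Iterable[VariantKey],
                            chrom: str, pos: int) -> str:
    """Hypothetical attribution rule for a missed target variant."""
    inp = list(input_records)
    if not target_present(inp, chrom, pos):
        return "upstream"        # target never reached the platform input
    if not record_correspondence(inp, db_records)["intact"]:
        return "platform"        # records lost or altered during ingestion
    return "classification"      # present and persisted; investigate downstream
```

Under these assumptions, intact correspondence combined with an absent target points upstream of the platform, which is the pattern reported for the non-detected case.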
Layer 2 - Classification performance
Identical ACMG class to the peer reference comparator, anchored variously in ClinVar 2-3 star entries, the ClinGen Hearing Loss VCEP v1.0, the KDIGO 2025 ADPKD guideline, or independent multi-criterion convergence.
Methodologically characterized divergences: 2 validated design behavior (computational-only LP guard and AR carrier guard operating per design), 3 feature-gap accumulation at the LP/VUS boundary (PP3 tool divergence, PM1 coverage, PP2 implementation), and 1 manual-criteria-dependent inherent automation limitation.
No opposite-direction classifications observed
Upstream variant calling pipeline failure (verified non-Helena attribution)
Inter-laboratory FULL concordance benchmarks: Amendola 2016 (66%), Harrison 2017 (72-76%), Bergquist 2025 (70-75%).
Layer 3 - Clinical workflow performance
Target P/LP variants placed in Tier 1 of the phenotype-matching output
at least 50% match score
Variant mentioned, interpretation correct, overall quality good
Seven consecutive PASS results at cohort closure (Cases 14-20), with an improvement trajectory documented across the cohort.
Trigger-configuration validation of conservative-guard architecture across diverse real-world clinical scenarios
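The three Layer 3 criteria and the consecutive-PASS count at cohort closure can be sketched as follows. Several details here are assumptions made for illustration: the field names, the 0.0-1.0 scale for the match score, and the rule that a case passes only when all three criteria hold. The study's own scoring may combine the criteria differently.

```python
from dataclasses import dataclass


@dataclass
class WorkflowResult:
    """Hypothetical per-case Layer 3 record (field names are assumptions)."""
    case_id: int
    target_in_tier1: bool      # target P/LP variant placed in Tier 1 of phenotype matching
    match_score: float         # phenotype match score as a fraction, 0.0-1.0
    report_quality_good: bool  # variant mentioned, interpretation correct, quality good


def layer3_pass(r: WorkflowResult, min_score: float = 0.5) -> bool:
    # Assumption: PASS requires all three criteria simultaneously.
    return r.target_in_tier1 and r.match_score >= min_score and r.report_quality_good


def trailing_pass_streak(results: list[WorkflowResult]) -> int:
    """Count consecutive PASS results ending at the highest case_id."""
    streak = 0
    for r in sorted(results, key=lambda r: r.case_id, reverse=True):
        if not layer3_pass(r):
            break
        streak += 1
    return streak
```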
Six-category methodological-disposition taxonomy
Every observed concordance pattern in Cohort 2 is resolved into one of the categories below, giving a comprehensive methodological characterization rather than a binary pass/fail disposition. The taxonomy is the principal scientific contribution of Cohort 2 and is intended to inform future cohort design.
Tier 1 ground-truth concordance
13 components
Helena and the peer comparator both reach the same classification, anchored in VCEP curation, ClinVar 2-star or higher, or established clinical practice guidelines. Multiple ACMG criteria converge on the same class.
Disposition: No action. System operates correctly. Multiple validation milestones documented.
Validated design behavior
2 components
Helena's classification reflects a documented classifier guard operating per ClinGen-aligned conservative design principles. Disagreement with the peer comparator does not reflect a platform error; the guarded result is the explicitly intended output for genotype-context-aware classification.
Disposition: No remediation. Update the Change Request registry status to validated, with the cohort cases as the evidence anchor.
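The two guards named in Layer 2 can be illustrated as post-classification rules. This is a hypothetical sketch, not Helena's classifier, and several choices in it are assumptions. Only PP3 counts as computational evidence, and strength suffixes such as PP3_Strong are stripped first. Inheritance and zygosity use simple string labels. The AR carrier guard is expressed as a reporting context rather than a class change.

```python
def apply_guards(proposed: str, met_criteria: set[str], inheritance: str,
                 zygosity: str,
                 computational_codes: frozenset[str] = frozenset({"PP3"})) -> tuple[str, str]:
    """Hypothetical guards returning (acmg_class, report_context)."""
    # Strip strength modifiers so "PP3_Strong" is treated as "PP3".
    base = {c.split("_")[0] for c in met_criteria}
    pathogenic = {c for c in base if c.startswith("P")}

    acmg = proposed
    # Computational-only LP guard: LP resting solely on computational codes is held at VUS.
    if acmg == "LP" and pathogenic and pathogenic <= computational_codes:
        acmg = "VUS"

    # AR carrier guard: one heterozygous P/LP variant in an AR gene is a carrier finding.
    context = "diagnostic"
    if acmg in {"P", "LP"} and inheritance == "AR" and zygosity == "het":
        context = "carrier"
    return acmg, context
```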
Feature-gap accumulation at LP/VUS boundary
3 components
Multiple documented feature gaps in Helena's automated classification scope collectively prevent reaching the LP combining-rule threshold. Three gaps were identified: PP3 computational predictor tool divergence, PM1 critical-domain coverage scope, and PP2 not yet implemented in the automated classifier. Each gap is independently characterized and roadmap-tracked. This is distinct from a defect: the gaps are documented limitations, not errors.
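To see how several small gaps can jointly hold a variant at VUS, the pathogenic-side combining rules of Richards et al. 2015 (Table 5) can be encoded directly. The sketch counts only pathogenic criteria and ignores benign and conflicting evidence. It uses the original thresholds without ClinGen SVI modifications such as the PM2 downgrade to Supporting. Strength overrides are assumed to be written as suffixes like PP3_Strong, and the example variant is hypothetical.

```python
from collections import Counter

# Default strength of each pathogenic evidence prefix (Richards et al. 2015).
DEFAULT_STRENGTH = {"PVS": "VeryStrong", "PS": "Strong", "PM": "Moderate", "PP": "Supporting"}


def strength(code: str) -> str:
    """Strength of a pathogenic code, honoring an assumed suffix override."""
    if "_" in code:
        return code.split("_", 1)[1]                   # "PP3_Strong" -> "Strong"
    return DEFAULT_STRENGTH[code.rstrip("0123456789")]  # "PM2" -> "PM" -> "Moderate"


def classify_pathogenic_side(codes: set[str]) -> str:
    """Apply the P and LP combining rules; anything short of LP is reported as VUS."""
    n = Counter(strength(c) for c in codes if c.startswith("P"))
    vs, s, m, p = n["VeryStrong"], n["Strong"], n["Moderate"], n["Supporting"]
    if ((vs >= 1 and (s >= 1 or m >= 2 or (m >= 1 and p >= 1) or p >= 2))
            or s >= 2
            or (s >= 1 and (m >= 3 or (m >= 2 and p >= 2) or (m >= 1 and p >= 4)))):
        return "P"
    if ((vs >= 1 and m >= 1)
            or (s >= 1 and m >= 1)
            or (s >= 1 and p >= 2)
            or m >= 3
            or (m >= 2 and p >= 2)
            or (m >= 1 and p >= 4)):
        return "LP"
    return "VUS"
```

With PM2, PM1, PP2 and PP3 all met, the hypothetical variant reaches LP via 2 Moderate + 2 Supporting. Removing any one of PM1, PP2 or PP3 leaves it at VUS, which illustrates how the three gaps interact at the LP/VUS boundary.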
Disposition: Roadmap-tracked. PM1 domain coverage: expansion in flight (HELIX-CR-2026-082, deployed April 2026 with UniProt residue-level evidence integration). PP2: implementation roadmap formalization. PP3 tool-choice divergence: no classifier change recommended (a defensible methodological choice per ClinGen SVI).
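A residue-level PM1 check can be sketched as an interval lookup. Following the ACMG wording for PM1, a missense residue qualifies when it falls inside a feature treated as critical and that feature carries no known benign missense variation. The Feature record shape, the default critical feature labels (modeled on UniProt feature types), and the benign-position input are assumptions. This is not the HELIX-CR-2026-082 implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical residue-level feature record, e.g. parsed from a UniProt entry."""
    kind: str   # e.g. "Domain", "Active site", "Binding site"
    start: int  # 1-based residue position, inclusive
    end: int    # 1-based residue position, inclusive


def pm1_candidate(residue: int, features: list[Feature],
                  benign_positions: frozenset[int] = frozenset(),
                  critical_kinds: frozenset[str] = frozenset({"Domain", "Active site", "Binding site"})) -> bool:
    """True if the residue lies in a critical feature that has no benign missense variation."""
    for f in features:
        if f.kind in critical_kinds and f.start <= residue <= f.end:
            # PM1 requires the region to lack benign variation.
            if not any(f.start <= b <= f.end for b in benign_positions):
                return True
    return False
```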
Manual-criteria-dependent inherent automation limitation
1 component
Classification depends primarily on manual-curation evidence (case-control or case-series literature, family cosegregation, individual ClinVar submitter evaluation) that is not available from VCF input. This is distinct from a feature gap: it is not a classifier roadmap item but an inherent scope limitation of automated VCF-only classification.
Disposition: Long-term automation roadmap items only (literature mining for PS4 case-control automation, external trio data ingestion for PP1 segregation, gene-aware PP2 thresholds for small genes). Not classifier defect remediation.
Input data layer upstream pipeline failure
1 case (NOT EVALUABLE)
Target variants were absent from the input VCF because of an upstream sequencing or variant calling pipeline failure. Helena platform integrity was verified via 21:21 input-to-database record correspondence at the correct genomic target coordinates. The failure is upstream, not in Helena.
Disposition: No Helena remediation required (attribution cleared). Upstream pipeline investigation pending sequencing facility action.
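The category counts listed above can be cross-checked against the cohort size with a small tally. The sketch assumes that each component corresponds to one case and that only the NOT EVALUABLE category is excluded from the evaluable set. Under those assumptions the counts sum to the twenty cases evaluated, and the non-concordant evaluable components match the six divergences characterized in Layer 2.

```python
# Category -> (count, disposition summary), as stated in the Cohort 2 taxonomy.
TAXONOMY = {
    "Tier 1 ground-truth concordance": (13, "No action"),
    "Validated design behavior": (2, "No remediation; update Change Request registry"),
    "Feature-gap accumulation at LP/VUS boundary": (3, "Roadmap-tracked"),
    "Manual-criteria-dependent inherent automation limitation": (1, "Long-term automation roadmap"),
    "Input data layer upstream pipeline failure": (1, "No Helena remediation; upstream investigation"),
}

# Assumption: only this category is excluded from the evaluable set.
NOT_EVALUABLE = {"Input data layer upstream pipeline failure"}


def tally(taxonomy: dict[str, tuple[int, str]]) -> tuple[int, int, int]:
    """Return (total, evaluable, divergent) counts, treating each component as one case."""
    total = sum(n for n, _ in taxonomy.values())
    evaluable = sum(n for cat, (n, _) in taxonomy.items() if cat not in NOT_EVALUABLE)
    divergent = evaluable - taxonomy["Tier 1 ground-truth concordance"][0]
    return total, evaluable, divergent
```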