Family Analysis Methodology
Helena's Family Analysis module derives inheritance-aware evidence from trio (proband + mother + father), duo (proband + one parent), and proband-plus-sibling analyses. The methodology implements ClinGen SVI 2018 specifications for de novo PS2/PM6 application (PMID: 29543229), the Jarvik & Browning 2016 LOD framework for segregation evidence (PMID: 27236918), and ClinGen SVI 2021 PP1 strength bands for cosegregation interpretation. Every threshold and reference data source is documented for audit confidence.
Family Analysis does not re-call variants. Helena receives pre-classified VCF files from upstream laboratories via the Variant Analysis pipeline and operates entirely on the stored classifications. Family context is used solely to derive additional inheritance-based evidence that augments the existing ACMG/AMP 2015 classification.
Pipeline overview
Seven-stage flow from family composition validation to enriched evidence delivery. Total runtime is typically 75-130 seconds for a full trio with joint VCF, or 45-70 seconds when sample QC is not requested.
Receive pre-classified variants from family members
~5s. Per-member classified variant files are validated for schema parity. If column sets diverge between members, family analysis halts and requests re-classification at the same Variant Analysis classifier version. This ensures cross-member ACMG comparisons are coherent.
Unify trio variants via positional join
~15-25s. Variants from each family member are joined on chromosome, position, reference allele, and alternate allele to produce a unified family-level dataset. Per-member genotypes, qualities, and ACMG classifications are preserved alongside positional metadata. Variants present in any family member are retained, supporting parent-only variant inspection for Mendelian error detection.
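The positional join described above can be sketched as an outer join keyed on the four positional fields. This is a minimal illustration, not Helena's implementation: the function name `unify_family_variants`, the role names, and the row layout are assumptions made for the example.

```python
from collections import defaultdict


def unify_family_variants(members):
    """Outer-join per-member variant rows on (chrom, pos, ref, alt).

    `members` maps a role name (hypothetical: "proband", "mother", "father")
    to a list of row dicts. The row layout is an assumption for illustration.
    """
    unified = defaultdict(dict)
    for role, rows in members.items():
        for row in rows:
            key = (row["chrom"], row["pos"], row["ref"], row["alt"])
            # Keep each member's genotype and classification side by side.
            unified[key][role] = {"gt": row["gt"], "acmg": row["acmg"]}
    return dict(unified)


members = {
    "proband": [
        {"chrom": "chr1", "pos": 100, "ref": "A", "alt": "G", "gt": "0/1", "acmg": "LP"},
    ],
    "mother": [
        {"chrom": "chr1", "pos": 100, "ref": "A", "alt": "G", "gt": "0/1", "acmg": "LP"},
        {"chrom": "chr2", "pos": 500, "ref": "C", "alt": "T", "gt": "0/1", "acmg": "VUS"},
    ],
    "father": [],
}
family = unify_family_variants(members)
```

Because the join is an outer join, the chr2 variant carried only by the mother is retained in `family`, which is what makes parent-only inspection possible.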
Sample quality control via PLINK IBD
~30-60s. When a joint multi-sample VCF is provided, PLINK 1.9 computes pairwise Identity-by-Descent proportions across declared family members. Sample swap, consanguinity, and unexpected relatedness are evaluated and surfaced to the clinical reviewer. Skipped when no joint VCF is available.
De novo variant detection with tiered confidence
~14-17s. Each proband variant is classified into a confidence tier reflecting parental genotype quality, sequencing depth, and chromosome-presence inference. The tiered output drives PS2 (high confidence) or supporting evidence (lower tiers) per ClinGen SVI 2018.
Compound heterozygosity detection
~10-15s. Within each gene, candidate variant pairs are evaluated for trans configuration using parental genotypes as a natural phasing source. Per-gene variant counts are capped pragmatically to control combinatorial complexity in large genes such as TTN and MUC16.
Cosegregation scoring (per-variant LOD)
~5-10s. Per-variant LOD scores are computed via the Jarvik & Browning 2016 framework, modulated by phenotype specificity per ClinGen SVI 2021. Each variant is mapped to one of five strength bands ranging from indeterminate through very strong.
Deliver enriched evidence to clinical review
<2s. Aggregated evidence summary, sample QC report, and per-variant inheritance annotations are made available to the clinical reviewer. Algorithm versions are recorded immutably with each analysis run for regulatory traceability.
Family compositions
Family Analysis recognises four composition tiers, three of which can be analysed. The proband must always be marked as affected. The composition tier is validated before analysis begins; insufficient compositions are rejected with a clinical-friendly error message rather than producing partial results.
| Composition | Requirement | Analysis scope |
|---|---|---|
| Full trio | Proband + mother + father | Optimal; all three inheritance phases applicable |
| Duo | Proband + one parent | Reduced confidence; compound heterozygosity applicable, de novo detection not feasible |
| Proband + sibling | Proband + affected or unaffected sibling | Cosegregation evidence via sibling affected status |
| Proband only | Only proband (no relatives) | Insufficient; family analysis is not executed |
The proband is the affected individual being investigated. Multiple probands per family are not supported in the current version.
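The composition check can be illustrated with a short sketch. The function name, role labels, and tier labels below are hypothetical; only the four tiers and the rejection of proband-only input come from the table above.

```python
def classify_composition(roles):
    """Return the composition tier for a set of declared family roles.

    Role and tier labels are illustrative assumptions.
    """
    roles = set(roles)
    if "proband" not in roles:
        raise ValueError("A proband is required.")
    parents = roles & {"mother", "father"}
    if len(parents) == 2:
        return "full_trio"
    if len(parents) == 1:
        return "duo"
    if "sibling" in roles:
        return "proband_sibling"
    # Proband-only input is rejected rather than partially analysed.
    raise ValueError("Proband-only input is insufficient for family analysis.")
```

Raising an error for the proband-only case mirrors the stated behaviour of rejecting insufficient compositions instead of returning partial results.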
Sample quality control (PLINK IBD)
Before inheritance analysis begins, Helena verifies that declared family relationships are consistent with the genomic data. Sample swap detection is a critical clinical safety check: a mislabelled sample produces invalid inheritance evidence regardless of algorithmic correctness downstream.
Method
PLINK 1.9 computes pairwise Identity-by-Descent (IBD) proportions across all declared family members using biallelic single-nucleotide variants with minor allele frequency at or above 0.05 and missingness at or below 5 percent. Expected IBD for parent-child pairs is approximately 0.50; expected IBD for unrelated parents is below 0.10. Deviations beyond defined thresholds raise alerts that are surfaced to the clinical reviewer.
Alert types
Sample swap
Declared parent-child pair with PI_HAT below 0.40. Pipeline fails; clinical reviewer must validate sample labelling before re-running analysis.
Duplicate sample
Declared parent-child pair with PI_HAT above 0.70, indicating possible sample duplication or extreme consanguinity. Pipeline proceeds; warning surfaced.
Consanguinity
Declared unrelated parents with PI_HAT above 0.125 (third-degree relatives or closer). Pipeline proceeds; warning surfaced for clinical interpretation.
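The three alert rules can be expressed as a small classifier over PI_HAT values. This is a sketch under assumptions: the tuple layout, relationship labels, and alert names are illustrative, while the thresholds (0.40, 0.70, 0.125) are the ones listed above.

```python
def ibd_alerts(pairs):
    """Map declared relationships and PI_HAT values to alert names.

    Each item in `pairs` is (sample_a, sample_b, relationship, pi_hat), where
    relationship is "parent_child" or "unrelated_parents" (hypothetical labels).
    """
    alerts = []
    for sample_a, sample_b, relationship, pi_hat in pairs:
        if relationship == "parent_child" and pi_hat < 0.40:
            alerts.append((sample_a, sample_b, "sample_swap"))       # pipeline fails
        elif relationship == "parent_child" and pi_hat > 0.70:
            alerts.append((sample_a, sample_b, "duplicate_sample"))  # warning only
        elif relationship == "unrelated_parents" and pi_hat > 0.125:
            alerts.append((sample_a, sample_b, "consanguinity"))     # warning only
    return alerts


alerts = ibd_alerts([
    ("proband", "mother", "parent_child", 0.51),      # as expected, no alert
    ("proband", "father", "parent_child", 0.12),      # too low for a parent
    ("mother", "father", "unrelated_parents", 0.20),  # related parents
])
```

In this example the mother-proband pair passes, the father-proband pair raises a sample swap alert, and the parental pair raises a consanguinity warning.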
De novo variant detection
De novo variants arise in the germline of a parent or in the post-zygotic period of the proband and are absent from both parental genomes. Their identification relies on confident parental hom_ref calls at the variant position. Helena applies a tiered confidence assessment that surfaces the strength of de novo inference rather than producing a binary call.
ACMG criteria triggered
ACMG/AMP 2015 (Richards et al., PMID: 25741868) defines PS2 (Strong) for confirmed de novo variants with verified maternity and paternity, and PM6 (Moderate) for assumed de novo variants without confirmation. The ClinGen Sequence Variant Interpretation Working Group specification (Biesecker & Harrison 2018, PMID: 29543229) refined these criteria, distinguishing confidence levels and providing application guidance.
Applicability
De novo detection requires a full trio. Duo analyses (proband plus one parent) cannot trigger PS2 or PM6 because de novo origin requires confident hom_ref evidence from both parents. In duo analyses, all proband variants receive a not-applicable de novo classification, and the inheritance evidence focuses on compound heterozygosity and cosegregation instead.
Tiered confidence assessment
Each candidate de novo variant is assigned to a confidence tier (high or low) or to one of two special states (excluded, not-applicable). The high tier supports PS2 application; the low tier is surfaced as supporting evidence for clinical reviewer judgment.
High confidence
Both parents pass strict coverage and genotype quality thresholds at the variant position with confident hom_ref calls. Supports PS2 (Strong) application per ClinGen SVI 2018.
Low confidence
At least one parent has sub-threshold coverage, genotype quality below recommended levels, or chromosome representation that cannot be confirmed. Variant is surfaced for manual reviewer interpretation; PS2 is not auto-applied.
Excluded
Variant present in at least one parent (parent genotype not hom_ref). De novo origin is explicitly disconfirmed.
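A minimal sketch of the tier assignment follows. The depth and genotype-quality thresholds are hypothetical placeholders, since this document does not state the actual values, and the function names and dict layout are assumptions for illustration.

```python
MIN_DEPTH = 20  # hypothetical threshold, not taken from this document
MIN_GQ = 30     # hypothetical threshold, not taken from this document


def parent_is_confident(parent):
    """True when a parental hom_ref call passes all quality checks."""
    return (
        parent["gt"] == "0/0"
        and parent["dp"] >= MIN_DEPTH
        and parent["gq"] >= MIN_GQ
        and parent["chrom_present"]
    )


def de_novo_tier(mother, father):
    """Assign a tier; a parent is None when absent from the analysis (duo)."""
    if mother is None or father is None:
        return "not_applicable"
    if mother["gt"] != "0/0" or father["gt"] != "0/0":
        return "excluded"  # variant observed in a parent
    if parent_is_confident(mother) and parent_is_confident(father):
        return "high"
    return "low"
```

The order of checks matters: a variant seen in either parent is excluded before any quality threshold is considered, so poor coverage can never turn an inherited variant into a low-confidence de novo candidate.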
Chromosome-presence gate
Pre-classified variant files contain only variant-positive positions. Absent rows are conventionally interpreted as hom_ref. To distinguish confident hom_ref inference from suspicious absence, Helena verifies that the parent's data contains at least some variants on the same chromosome as the candidate. If yes, the chromosome was sequenced and absence supports hom_ref inference. If no, coverage is unclear and confidence drops to the lower tier.
chrY for female parents
Female parents biologically lack chrY. Absence of chrY variants in maternal data is expected, not a coverage gap. The chromosome-presence gate accommodates this exception so chrY proband variants are not unnecessarily downgraded.
chrM for paternal contribution
Mitochondrial DNA is maternally inherited; paternal mitochondria are degraded during fertilisation. Absence of chrM variants in paternal data is expected. The gate excludes paternal chrM coverage from confidence calculations to reflect this biology.
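The gate and its two exceptions can be sketched as a single predicate. The function name, role labels, and chromosome naming (`chrY`, `chrM`) are assumptions for illustration.

```python
def chromosome_presence_ok(parent_role, parent_chroms, candidate_chrom):
    """True when a parent's missing row can be read as hom_ref.

    `parent_chroms` is the set of chromosomes on which the parent's file
    contains at least one variant.
    """
    if parent_role == "mother" and candidate_chrom == "chrY":
        return True  # absence of maternal chrY variants is expected
    if parent_role == "father" and candidate_chrom == "chrM":
        return True  # paternal chrM is left out of the confidence calculation
    return candidate_chrom in parent_chroms
```

For example, a mother whose file has variants only on chr1 and chrX passes the gate for a chrY candidate but fails it for a chr7 candidate, which would drop that candidate to the lower tier.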
Exclusions
The following situations are excluded from de novo interpretation or trigger downgrades to lower confidence tiers:
Variants present in at least one parent are explicitly excluded; de novo origin is disconfirmed.
Mosaic de novo variants below detection threshold in parental samples may be incorrectly classified as de novo. Clinical interpretation should consider mosaicism when phenotype suggests it.
Regions with known low-quality parental genotyping (low complexity, segmental duplications) may produce sub-threshold coverage and downgrade confidence.
Variants on parent-only positions (present in a parent but not in the proband) are not de novo candidates by definition and are excluded from de novo classification.
Compound heterozygosity
Compound heterozygosity is a recessive mechanism in which the proband carries two distinct heterozygous pathogenic variants in the same gene, each inherited from a different parent. Trio data provides natural phasing through inheritance, turning a presumed compound heterozygous state into a confirmed one.
Biological rationale
In autosomal recessive disorders, pathogenicity requires biallelic disruption. Two heterozygous variants in trans configuration (one from each parent) result in functional knock-out of the gene, just as a homozygous loss-of-function variant would. In solo proband whole-exome or whole-genome analysis, phase cannot be determined without parental information or long-read sequencing. Trio data resolves this gap.
ACMG criterion triggered
ACMG/AMP 2015 (Richards et al., PMID: 25741868) defines PM3 (Moderate): "For recessive disorders, detected in trans with a pathogenic variant." Compound heterozygous candidates in trans configuration with a confidently classified partner contribute to PM3 application.
Phasing methodology
If the proband is heterozygous for variant A and variant B in the same gene, and variant A is inherited from the mother (mother heterozygous, father reference) while variant B is inherited from the father (mirror configuration), then A and B are in trans configuration. This is a definitive compound heterozygous setup. When parental genotype data is incomplete (sub-threshold coverage, missing chromosome representation), Helena surfaces the candidate at lower confidence with an explicit reason rather than making a binary determination.
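The phasing rule can be sketched as follows. This is an illustration under assumptions: genotypes are unphased strings normalised to `0/1` and `0/0`, and the function and field names are hypothetical.

```python
def inherited_from(mother_gt, father_gt):
    """Return the transmitting parent for a heterozygous proband variant."""
    het, ref = "0/1", "0/0"
    if mother_gt == het and father_gt == ref:
        return "mother"
    if father_gt == het and mother_gt == ref:
        return "father"
    return None  # both or neither parent carry it: origin cannot be assigned


def phase_pair(variant_a, variant_b):
    """Classify a proband variant pair as trans, cis, or undetermined."""
    origin_a = inherited_from(variant_a["mother_gt"], variant_a["father_gt"])
    origin_b = inherited_from(variant_b["mother_gt"], variant_b["father_gt"])
    if origin_a is None or origin_b is None:
        return "undetermined"
    return "trans" if origin_a != origin_b else "cis"


variant_a = {"mother_gt": "0/1", "father_gt": "0/0"}  # maternal origin
variant_b = {"mother_gt": "0/0", "father_gt": "0/1"}  # paternal origin
configuration = phase_pair(variant_a, variant_b)
```

Here `configuration` is `"trans"`, the definitive compound heterozygous setup. Returning `"undetermined"` instead of guessing corresponds to surfacing the candidate at lower confidence.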
Variant consequence filtering
Candidate variants are restricted to clinically relevant consequence types. Intronic variants distant from splice sites, synonymous variants without splice impact, and UTR-adjacent variants are excluded from compound heterozygosity assessment.
Included consequences
missense_variant
stop_gained
stop_lost
frameshift_variant
inframe_insertion
inframe_deletion
splice_donor_variant
splice_acceptor_variant
splice_region_variant
start_lost
initiator_codon_variant
Excluded consequences
intron_variant
downstream_gene_variant
upstream_gene_variant
synonymous_variant
A pragmatic cap of 50 candidate variants per gene protects against combinatorial explosion in large genes such as TTN, MUC16, and SYNE1. Variants are ranked by ACMG severity (P > LP > VUS > LB > B) and confidence score before the cap is applied, so the highest-priority candidates are retained.
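The rank-then-cap step can be sketched in a few lines. The field names (`acmg`, `confidence`) and the function name are assumptions; the severity order and the cap of 50 come from the paragraph above.

```python
ACMG_RANK = {"P": 0, "LP": 1, "VUS": 2, "LB": 3, "B": 4}
MAX_PER_GENE = 50


def cap_candidates(variants, limit=MAX_PER_GENE):
    """Keep the highest-priority candidates for one gene.

    Sorts by ACMG severity first, then by descending confidence score.
    """
    ranked = sorted(
        variants,
        key=lambda v: (ACMG_RANK[v["acmg"]], -v["confidence"]),
    )
    return ranked[:limit]


# 60 VUS candidates plus one pathogenic variant with a low confidence score.
gene_variants = [{"id": i, "acmg": "VUS", "confidence": i / 100} for i in range(60)]
gene_variants.append({"id": "p1", "acmg": "P", "confidence": 0.1})
kept = cap_candidates(gene_variants)
```

The pathogenic variant is kept and ranked first despite its low confidence score, because severity takes precedence over confidence in the sort key. The cap matters because pairwise evaluation grows quadratically: 50 variants yield at most 1,225 candidate pairs.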
Multi-partner candidates
When a variant has multiple potential compound heterozygous partners in the same gene, Helena selects a single primary partner deterministically and flags the multi-partner state for clinical reviewer awareness. The reviewer sees that additional partner variants exist and can examine each pairing individually. This avoids hidden ambiguity and preserves clinical decision authority.
Cosegregation scoring (LOD)
Cosegregation of a candidate variant with phenotype across family members provides quantitative evidence for pathogenicity. Helena implements the classical Jarvik & Browning 2016 LOD framework (PMID: 27236918) with phenotype specificity modulation per ClinGen Sequence Variant Interpretation Working Group 2021 recommendations.
LOD formula
Per-variant LOD scores are computed using the logarithm-of-the-odds framework as described in Jarvik & Browning 2016. Each informative meiotic event consistent with the inheritance hypothesis contributes positively; each violation contributes negatively; uninformative meioses contribute zero. The mathematical formulation is described in detail in the cited reference and is not reproduced here.
Supported inheritance hypotheses
Helena computes LOD scores for the following modes of inheritance. When the inheritance hypothesis is unspecified, segregation scoring is not performed and per-variant strength is reported as not-applicable; this is documented per Option A of the implementation specification.
Five-band evidence strength scale
The continuous LOD score is mapped to one of five evidence strength bands per ClinGen SVI 2021 recommendations. Each band corresponds to an ACMG modifier strength applied to the PP1 (cosegregation) criterion.
| Strength band | LOD range | ACMG points | ACMG strength |
|---|---|---|---|
| indeterminate | [0.0, 0.5) | 0 | -- |
| supporting | [0.5, 2.0) | 1 | PP1_Supporting |
| moderate | [2.0, 3.0) | 2 | PP1_Moderate |
| strong | [3.0, 5.0) | 4 | PP1_Strong |
| very_strong | [5.0, inf) | 8 | PP1_VeryStrong |
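The band lookup in the table can be sketched with a sorted list of lower bounds. The function name is hypothetical, and mapping negative LOD scores to the indeterminate band is an assumption, since the table starts at 0.0.

```python
import bisect

LOWER_BOUNDS = [0.5, 2.0, 3.0, 5.0]
BANDS = ["indeterminate", "supporting", "moderate", "strong", "very_strong"]
POINTS = [0, 1, 2, 4, 8]


def lod_band(lod):
    """Return (band, ACMG points) for a LOD score.

    Intervals are closed on the left, so a LOD of exactly 2.0 is moderate.
    Negative scores fall into the indeterminate band (an assumption).
    """
    index = bisect.bisect_right(LOWER_BOUNDS, lod)
    return BANDS[index], POINTS[index]
```

Using `bisect_right` keeps each interval closed on its lower bound, matching the `[lower, upper)` notation in the table.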
Phenotype specificity multiplier
Per ClinGen SVI 2021, the LOD score is dampened by a multiplier reflecting phenotype specificity. Highly specific phenotypes (syndromic, characteristic of a single gene) receive full weight; broad phenotypes (common, multiple genes possible) receive half weight; unspecified phenotypes receive quarter weight as a conservative default.
| Phenotype specificity | Multiplier |
|---|---|
| specific | x1.0 |
| broad | x0.5 |
| unspecified | x0.25 |
Trio-only mathematical ceiling
Trio analysis provides a maximum of two informative meioses (mother-to-proband and father-to-proband). Per Jarvik & Browning Table 2, this implies a theoretical LOD ceiling of approximately 0.6 before the specificity multiplier is applied. After the multiplier, the ceiling is approximately 0.3 for a broad phenotype (x0.5) and approximately 0.15 for the unspecified default (x0.25). Most real trio cases land in the indeterminate or supporting bands. Reaching strong or very strong bands requires extended pedigree analysis with multiple affected family members across generations. Helena documents this ceiling explicitly in every trio output as scientific honesty rather than marketing aspiration.
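The ceiling arithmetic can be reproduced with a short calculation. It assumes the simplified relation in which each informative meiosis contributes log10(2) to the LOD score, which is consistent with the figure of approximately 0.6 for two meioses; the function name is hypothetical, and the multipliers are the documented specificity weights.

```python
import math

MULTIPLIER = {"specific": 1.0, "broad": 0.5, "unspecified": 0.25}


def adjusted_lod_ceiling(informative_meioses, specificity="unspecified"):
    """Upper bound on the specificity-adjusted LOD.

    Assumes each informative meiosis contributes log10(2) (simplified model).
    """
    raw = informative_meioses * math.log10(2)
    return raw * MULTIPLIER[specificity]


trio_ceilings = {
    label: round(adjusted_lod_ceiling(2, label), 2) for label in MULTIPLIER
}
# trio_ceilings == {"specific": 0.6, "broad": 0.3, "unspecified": 0.15}
```

Under these assumptions only a specific phenotype at the full ceiling clears the 0.5 lower bound of the supporting band; broad and unspecified phenotypes stay in the indeterminate band for a trio.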
Evidence summary for clinical review
After all phases complete, Helena delivers a structured evidence summary alongside the per-variant inheritance annotations. The clinical reviewer receives the following information for each completed analysis:
Number of de novo candidate variants per confidence tier (high, low, excluded counts).
Number of compound heterozygous candidate pairs evaluated, plus per-gene breakdown of variants with assigned partners and multi-partner cases.
LOD distribution across strength bands (indeterminate, supporting, moderate, strong, very strong counts), maximum LOD observed, and mean LOD across scored variants.
PLINK IBD quality control results when joint VCF was provided: per-pair PI_HAT values, raised alerts (sample swap, duplicate sample, consanguinity), and PLINK version for audit.
Algorithm versions used for this specific analysis run, recorded immutably so the clinical reviewer always knows which exact version produced each result.
Methodology architecture
Two architectural principles shape Family Analysis: maximum-fidelity reuse of upstream classifications and write-once algorithm versioning for regulatory traceability.
Maximum fidelity, no re-calling
Helena does not re-call variants and does not perform Variant Effect Predictor annotation locally. Pre-classified files from upstream laboratories are consumed as authoritative input, preserving the full ACMG classification context. This design has three operational benefits: laboratory partners retain complete control over raw sequencing data; the analysis is fast (typically 75-130 seconds for a full trio); and the ACMG context flows through the family module unchanged, with inheritance evidence layered on top rather than recomputed.
Write-once algorithm versioning
Algorithm version labels (joint caller, de novo algorithm, segregation algorithm, sample QC algorithm) are recorded with each analysis at creation time and cannot be modified afterwards. This guarantees scientific reproducibility and regulatory traceability: when a clinical reviewer examines a completed analysis, they always know precisely which algorithm version produced each result, regardless of subsequent platform upgrades.
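One way to picture write-once version labels is an immutable record created alongside the analysis. The sketch below uses a frozen dataclass; the class name, field names, and version strings are hypothetical, and a production system would also need the guarantee enforced at the storage layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlgorithmVersions:
    """Version labels captured once, at analysis creation time."""
    joint_caller: str
    de_novo: str
    segregation: str
    sample_qc: str


# Hypothetical version strings for illustration only.
versions = AlgorithmVersions(
    joint_caller="jc-1.0",
    de_novo="dn-1.0",
    segregation="seg-1.0",
    sample_qc="qc-1.0",
)

try:
    versions.de_novo = "dn-2.0"  # any later modification is rejected
    modified = True
except AttributeError:  # FrozenInstanceError subclasses AttributeError
    modified = False
```

After this runs, `modified` is `False` and `versions.de_novo` still reads `"dn-1.0"`, so a later platform upgrade cannot silently relabel a completed analysis.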
Reference tools and databases
| Resource | Role | Source |
|---|---|---|
| gnomAD v4.1.0 (joint genomes + exomes) | ~759M variants for de novo expectation calibration | Broad Institute |
| DeepTrio | Reference-quality joint trio caller | Google Health |
| GLnexus | Joint genotyping for separate per-sample VCFs | Broad Institute / DNAnexus |
| MANE Select v2.0 | 19,354 transcripts (canonical reference for HGVS) | -- |
Distinctive Helena features
What Helena adds beyond the public standards. Each feature is publicly documented; the value is in disciplined integration, not in algorithmic novelty hidden from clinical scrutiny.
Maximum-fidelity approach without re-calling
Helena consumes pre-classified variant files from upstream laboratories without re-calling or re-annotating. Laboratories retain control over raw data; no BAM or CRAM transfer is required. ACMG classification context is preserved end-to-end through the family module.
Tiered de novo confidence assessment
Rather than producing a binary de novo call, Helena assigns each candidate variant to a confidence tier reflecting parental data quality. The tier surfaces the strength of inference to the clinical reviewer rather than presenting a falsely absolute determination. PS2 is auto-applied only at high confidence; lower tiers are surfaced as supporting evidence for reviewer judgment.
Chromosome-aware safeguards
The chromosome-presence gate distinguishes confident hom_ref inference from suspicious absence by verifying parental data coverage on the candidate chromosome. Sex-aware exceptions (chrY for female parents, chrM for paternal contribution) prevent false confidence downgrades arising from biologically expected absence rather than coverage gaps.
Honest LOD ceiling disclosure
Trio-only analysis has a mathematical LOD ceiling of approximately 0.6 from two informative meioses, which falls to approximately 0.3 once the broad-phenotype multiplier is applied. Helena documents this ceiling explicitly in every trio output. Most trio cases land in the indeterminate or supporting bands; strong and very strong evidence require extended pedigree. This honest scientific posture is preferable to marketing aspirations that overstate capabilities.
Write-once algorithm versioning for regulatory reproducibility
Algorithm version labels are recorded immutably at analysis creation. Subsequent platform upgrades do not retroactively modify completed analyses. A clinical reviewer always knows which exact version produced a given result, supporting both scientific reproducibility and accreditation audit requirements.
Limitations
Honest disclosure of what Family Analysis cannot do. Listing limitations builds clinical trust; concealing them erodes it.
Trio-only analysis has a LOD ceiling of approximately 0.6 from two informative meioses (approximately 0.3 after the broad-phenotype multiplier). Strong and very strong cosegregation evidence require extended pedigree analysis with multiple affected family members across generations.
De novo detection requires a full trio. Duo analyses (proband plus one parent) cannot trigger PS2 or PM6 because confident hom_ref evidence from both parents is required.
Compound heterozygosity in duo analyses has reduced certainty because phasing relies on genotype data from only one parent.
Mosaic de novo variants below detection threshold in parental samples may be incorrectly classified as de novo. Clinical interpretation should consider mosaicism when phenotype suggests it.
PLINK Identity-by-Descent quality control requires a joint multi-sample VCF from the laboratory. When joint calling has not been performed upstream, sample QC cannot be executed and the corresponding alerts are not raised.
Structural variants and copy number variants are not processed by the current version. Family Analysis operates on single-nucleotide variants and small insertions or deletions only.
Multi-proband families (more than one affected individual designated as proband) are not supported in the current version and are deferred to future work.
Results must always be interpreted in the context of the patient's clinical presentation, family history, and other available clinical information. Family Analysis evidence augments, but does not replace, expert clinical judgment.
Version history
Released versions of the Family Analysis module. Each release records its scope and underlying standards.
Initial public release of Family Analysis module.
ClinGen SVI 2018 specification for de novo PS2 and PM6 application (PMID: 29543229).
Tiered de novo confidence assessment (high, low, excluded, not-applicable) with chromosome-presence gate and sex-aware exceptions.
Compound heterozygosity detection with trio-based phasing and pragmatic 50-variant per-gene cap.
Cosegregation LOD scoring per Jarvik & Browning 2016 framework (PMID: 27236918) with phenotype specificity multiplier per ClinGen SVI 2021.
Five-band LOD evidence strength scale (indeterminate, supporting, moderate, strong, very strong) aligned with ClinGen SVI 2021 PP1 modifier tiers.
PLINK 1.9 Identity-by-Descent sample quality control with sample swap, consanguinity, and duplicate sample alerts.
Write-once algorithm versioning for regulatory reproducibility.
References
Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genetics in Medicine. 2015;17(5):405-424. PMID: 25741868
Biesecker LG, Harrison SM; ClinGen Sequence Variant Interpretation Working Group. The ACMG/AMP reputable source criteria for the interpretation of sequence variants. Genetics in Medicine. 2018;20(12):1687-1688. PMID: 29543229
Abou Tayoun AN, Pesaran T, DiStefano MT, Oza A, Rehm HL, Biesecker LG, Harrison SM. Recommendations for interpreting the loss of function PVS1 ACMG/AMP variant criterion. Human Mutation. 2018;39(11):1517-1524. PMID: 30192042
Jarvik GP, Browning BL. Consideration of cosegregation in the pathogenicity classification of genomic variants. American Journal of Human Genetics. 2016;98(6):1077-1081. PMID: 27236918
Walker LC, Hoya M, Wiggins GAR, Lindy A, Vincent LM, Parsons MT, et al. Using the ACMG/AMP framework to capture evidence related to predicted and observed impact on splicing: Recommendations from the ClinGen SVI Splicing Subgroup. American Journal of Human Genetics. 2023;110(7):1046-1067. PMID: 37352859
Veltman JA, Brunner HG. De novo mutations in human genetic disease. Nature Reviews Genetics. 2012;13(8):565-575. PMID: 22781750
Kolesnikov A, Goel S, Nattestad M, Yun T, Baid G, Yang H, et al. DeepTrio: variant calling in families using deep learning. bioRxiv. 2021. DOI: 10.1101/2021.04.05.438434
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. American Journal of Human Genetics. 2007;81(3):559-575. PMID: 17701901
Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen WM. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010;26(22):2867-2873. PMID: 20926424
McCormick EM, Lott MT, Dulik MC, Shen L, Attimonelli M, Vitale O, et al. Specifications of the ACMG/AMP standards and guidelines for mitochondrial DNA variant interpretation. Human Mutation. 2020;41(12):2028-2057. PMID: 33058415
Bring family-aware variant interpretation into your clinical workflow
Helena's Family Analysis module integrates with your existing Variant Analysis pipeline. Schedule a discussion with our scientific team to assess the fit for your laboratory.