Family and Trio Analysis

Engine v1.0 | 3 inheritance algorithms | ClinGen SVI 2021

Clinical genetic diagnosis of rare Mendelian disorders benefits enormously from family context. A variant classified as VUS in solo proband WES may be upgraded to Likely Pathogenic once parental genotypes confirm de novo origin. Compound heterozygous pairs in recessive genes are invisible without phasing. Segregation of a variant with disease status across family members provides additional evidence per ClinGen SVI 2021 guidelines. The Family Analysis Service automates this family-aware analysis on pre-classified WGS or WES data.

No variant re-calling. No VEP re-annotation. The service operates purely on the pre-classified DuckDB files produced upstream by Variant Analysis, preserving ACMG classification context and adding family-aware evidence on top.

Clinical Positioning

Three concrete clinical scenarios illustrate why family analysis changes the diagnostic answer in ways solo proband analysis cannot.

Three Scenarios

  • VUS in proband, de novo confirmed by parents. A missense variant in a constrained gene is initially classified as VUS in the solo proband. Once parental genotypes confirm the variant arose de novo, the strength of evidence increases substantially: de novo origin in a constrained gene is one of the strongest single pieces of clinical evidence available.

  • Two heterozygous variants in a recessive gene, phased in trans. A pair of variants in a recessive disease gene, one inherited from each parent, forms biallelic loss of function. This is the diagnostic answer for many recessive conditions and is invisible without parental phasing.

  • Variant segregates with disease. A variant present in affected family members and absent in unaffected family members provides per-ClinGen-SVI segregation evidence. The strength depends on the number of meioses observed: a trio caps at supporting, and an extended pedigree is required for moderate or stronger.

Each of these is a different inheritance algorithm with different data requirements. The Family Service runs all three on the same trio data and presents the combined evidence to the geneticist alongside the upstream ACMG classification.

Supported Family Compositions

Trio is canonical. Duo and proband-with-sibling are supported with explicit reduced-confidence flagging on the affected algorithms.

Trio

Proband, mother, father. The canonical composition. Enables full de novo detection, compound heterozygous phasing with parental origin, and segregation scoring across two meioses.

Capabilities: All three inheritance algorithms run.

Duo

Proband and one parent. Supports inheritance analysis with reduced confidence on de novo calls (one parent unobserved) and partial phasing.

Capabilities: De novo and segregation run with explicit reduced-confidence flagging. Compound heterozygous phasing limited.

Proband and Sibling

For families where parents are unavailable, a sibling can provide segregation evidence when affected status is known on both members.

Capabilities: Segregation scoring runs. De novo and compound het not feasible without parental data.

Pipeline Architecture

Four stages execute sequentially; a typical WGS trio completes in approximately 30-90 seconds. The pipeline is atomic per phase: if a phase fails, subsequent phases are not executed and the failure point is recorded.

Stage 1

Trio JOIN Construction

The service ATTACHes each member's pre-classified DuckDB read-only and constructs a unified trio table by joining on (chromosome, position, reference allele, alternate allele). Per-member columns are role-suffixed (proband, mother, father). Inheritance annotation columns are pre-allocated and populated by subsequent stages.
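The join described above can be sketched in plain Python. This is an illustration only, not the service's DuckDB implementation: the function name, the full outer join semantics, and the annotation column names (de_novo_tier, comp_het_phase, segregation_band) are assumptions made for the example.

```python
from typing import Dict, Tuple

VariantKey = Tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)
ROLES = ("proband", "mother", "father")


def build_trio_table(members: Dict[str, Dict[VariantKey, dict]]) -> Dict[VariantKey, dict]:
    """Join per-member variant maps on the variant key, suffixing columns by role."""
    # Every column seen in any member becomes <column>_<role> in the trio row.
    columns = sorted({col for rows in members.values() for row in rows.values() for col in row})
    all_keys = set().union(*(members[role].keys() for role in ROLES))
    trio: Dict[VariantKey, dict] = {}
    for key in sorted(all_keys):
        row: dict = {}
        for role in ROLES:
            source = members[role].get(key)
            for col in columns:
                # A member without the variant gets NULL (None) in its columns.
                row[f"{col}_{role}"] = source.get(col) if source else None
        # Pre-allocated annotation columns (hypothetical names), filled by later stages.
        row.update({"de_novo_tier": None, "comp_het_phase": None, "segregation_band": None})
        trio[key] = row
    return trio
```

A variant seen only in the proband keeps a row with None in the mother and father columns, which is the situation the de novo stage later has to interpret.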

Stage 2

Sample Quality Control

When a joint VCF is available, PLINK 1.9 is invoked for identity-by-descent (IBD) analysis. Output is parsed for sample-swap, consanguinity, or duplicate-sample alerts. The resulting trio_qc.json is persisted alongside the trio data and surfaced in the case summary.

Stage 3

Inheritance Analysis Pipeline

Three sequential phases run on the trio table: de novo detection, compound heterozygous phasing, and segregation scoring. Each phase is atomic: if a phase fails, subsequent phases are not executed and failed_at_stage is recorded for operator review.
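The stop-on-failure behaviour can be sketched as a small runner. The run_phases function and the shape of the returned record are hypothetical; only the failed_at_stage field name comes from the description above.

```python
from typing import Callable, Dict, List, Tuple

Phase = Tuple[str, Callable[[dict], None]]


def run_phases(phases: List[Phase], trio: dict) -> Dict[str, object]:
    """Run phases in order; stop at the first failure and record where it happened."""
    completed: List[str] = []
    record: Dict[str, object] = {"completed": completed, "failed_at_stage": None, "error": None}
    for name, phase in phases:
        try:
            phase(trio)
        except Exception as exc:  # any failure halts the remaining phases
            record["failed_at_stage"] = name
            record["error"] = str(exc)
            break
        completed.append(name)
    return record
```

Because the loop breaks rather than continuing, whatever earlier phases wrote to the trio table stays in place for inspection.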

Stage 4

Evidence Aggregation

Per-variant inheritance annotations are summarised into evidence_summary and qc_summary JSON, stored on the trio analysis record for fast retrieval. Per-variant detail remains in the trio DuckDB for on-demand drill-down by the geneticist.
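As a rough illustration of this aggregation step, the sketch below counts per-variant annotations into a JSON summary. The row keys, the summary field names, and the tier label used in the test data are hypothetical; the actual evidence_summary schema is not described here.

```python
import json
from collections import Counter
from typing import Iterable


def summarise_evidence(rows: Iterable[dict]) -> str:
    """Collapse per-variant annotations into a compact JSON summary string."""
    rows = list(rows)
    summary = {
        "variants_total": len(rows),
        # Count only variants that actually received an annotation.
        "de_novo_by_tier": dict(Counter(r["de_novo_tier"] for r in rows if r.get("de_novo_tier"))),
        "comp_het_in_trans": sum(1 for r in rows if r.get("comp_het_phase") == "in_trans"),
        "segregation_by_band": dict(
            Counter(r["segregation_band"] for r in rows if r.get("segregation_band"))
        ),
    }
    return json.dumps(summary, sort_keys=True)
```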

Sample Quality Control

Before any inheritance analysis, the service verifies that the family relationships in the metadata match the genetic relationships in the data. Sample-swap, accidental duplication, and unreported consanguinity all corrupt downstream inheritance calls if undetected.

PLINK Identity-by-Descent Analysis

When a joint VCF is available from the upstream lab, the service runs PLINK 1.9 IBD analysis on biallelic SNPs above a minor allele frequency threshold. The pairwise IBD matrix is parsed for three classes of alert.
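A PLINK 1.9 invocation of this kind might be assembled as follows. The flags are standard PLINK 1.9 options, but the exact command the service runs, the 0.05 minor allele frequency default, and the function name are assumptions for illustration.

```python
from typing import List


def plink_ibd_command(joint_vcf: str, out_prefix: str, maf: float = 0.05) -> List[str]:
    """Build a PLINK 1.9 command that writes pairwise IBD estimates to <out_prefix>.genome."""
    return [
        "plink",
        "--vcf", joint_vcf,
        "--double-id",                 # use the VCF sample ID as both family and individual ID
        "--snps-only",                 # drop indels and other non-SNP variants
        "--biallelic-only", "strict",  # skip sites with more than two alleles
        "--maf", str(maf),             # assumed threshold; the service's value is not stated
        "--genome",                    # pairwise IBD estimation
        "--out", out_prefix,
    ]
```

The list can be passed to subprocess.run(cmd, check=True); the resulting .genome file contains one row per sample pair, including the PI_HAT proportion of IBD sharing.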

Sample Swap

When the IBD between proband and a declared parent does not match the expected parent-offspring relationship.

Consanguinity

When the IBD between the two declared parents is elevated above the expected unrelated-individuals baseline.

Duplicate Sample

When two declared family members are genetically identical, typically an upload error rather than a real biological scenario.

Alerts surface in the case summary alongside the inheritance evidence. The geneticist sees QC results before reading variant calls.
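The three alert classes can be sketched as checks on the pairwise PI_HAT value from the PLINK output. The expected values are textbook (about 0.5 for parent and offspring, about 0 for unrelated individuals, about 1 for duplicates), but every threshold below, and the alert string format, is a hypothetical choice for the example rather than the service's configuration.

```python
from typing import Dict, List, Tuple


def ibd_alerts(
    pi_hat: Dict[Tuple[str, str], float],
    parent_child_range: Tuple[float, float] = (0.4, 0.6),  # assumed tolerance around 0.5
    consanguinity_min: float = 0.05,                        # assumed baseline for unrelated parents
    duplicate_min: float = 0.9,                             # assumed cut-off for identical samples
) -> List[str]:
    """Turn pairwise PI_HAT values, keyed by role pair, into QC alert strings."""
    alerts: List[str] = []
    for (a, b), value in pi_hat.items():
        pair = {a, b}
        if value >= duplicate_min:
            alerts.append(f"duplicate_sample:{a}-{b}")
        elif "proband" in pair and pair & {"mother", "father"}:
            low, high = parent_child_range
            if not low <= value <= high:
                alerts.append(f"sample_swap:{a}-{b}")
        elif pair == {"mother", "father"} and value > consanguinity_min:
            alerts.append(f"consanguinity:{a}-{b}")
    return alerts
```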

De Novo Detection

Identifies variants present in the proband but absent in both parents, with explicit confidence tiers reflecting the strength of supporting evidence.

Five-Tier Confidence Classification

Variants are classified into five confidence tiers reflecting the strength of evidence for de novo origin. Higher tiers require both clean parental genotypes and adequate read support; lower tiers reflect ambiguity in parental coverage or genotype quality.

Chromosome-Presence Gate

When a parent genotype is NULL, the system distinguishes confident inferred homozygous-reference (the chromosome was sequenced and called as reference) from suspicious absence (the chromosome may not have been adequately covered). This is a key clinical safety mechanism preventing false de novo calls from coverage gaps.

Gender-Aware Chromosome Exceptions

Biologically expected NULLs are recognised as such: chromosome Y in a female parent, mitochondrial DNA in a father. These do not trigger confidence downgrades. The sex of each family member is part of the input.
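The chromosome-presence gate and the two exceptions above can be combined into one decision function. This is a sketch under stated assumptions: "the parent has at least one call on this chromosome" is used as a stand-in for adequate coverage, and the function name, chromosome labels, and return labels are hypothetical.

```python
from typing import Set


def interpret_null_parent_genotype(
    chrom: str,
    parent_role: str,             # "mother" or "father"
    parent_sex: str,              # "female" or "male", taken from the per-member metadata
    chromosomes_with_calls: Set[str],
) -> str:
    """Classify a NULL parental genotype at a site where the proband carries a variant."""
    # Biologically expected absences: no confidence downgrade.
    if chrom == "chrY" and parent_sex == "female":
        return "expected_null"
    if chrom == "chrM" and parent_role == "father":
        return "expected_null"
    # Assumed proxy for coverage: the parent has calls elsewhere on this chromosome.
    if chrom in chromosomes_with_calls:
        return "inferred_hom_ref"
    # Nothing was called on this chromosome at all, so absence is not evidence.
    return "suspicious_absence"
```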

Clean ACMG Classification Preserved

The de novo annotation is added on top of the upstream ACMG classification. The original Pathogenic / Likely Pathogenic / VUS / Likely Benign / Benign assignment from variant analysis is preserved verbatim. Geneticists see both: the classification, and the new family-aware evidence.

Compound Heterozygous Phasing

Identifies pairs of heterozygous coding or splicing variants in the same gene that may form biallelic loss of function. Phasing requires parental origin information and distinguishes in-trans (biallelic) from in-cis (single allele) configurations.

Phasing from Parental Origin

When parental genotypes are available, the service determines whether a pair of heterozygous variants in the same gene came from different parents (in trans, biallelic) or from the same parent (in cis, single-allele). Only in-trans pairs constitute genuine biallelic loss-of-function.
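The trans versus cis decision can be sketched from parental carrier status alone. The function names and the "unresolved" label are hypothetical, and the sketch assumes a variant's origin can only be assigned when exactly one parent is known to carry it.

```python
from typing import Optional


def parental_origin(mother_carries: Optional[bool], father_carries: Optional[bool]) -> Optional[str]:
    """Return the parent a heterozygous proband variant came from, or None if ambiguous."""
    if mother_carries is True and father_carries is False:
        return "mother"
    if father_carries is True and mother_carries is False:
        return "father"
    # Both carry, neither carries, or a genotype is missing: origin cannot be assigned.
    return None


def phase_pair(origin_a: Optional[str], origin_b: Optional[str]) -> str:
    """Phase two heterozygous variants in the same gene from their parental origins."""
    if origin_a is None or origin_b is None:
        return "unresolved"
    return "in_trans" if origin_a != origin_b else "in_cis"
```

For example, a variant carried only by the mother paired with one carried only by the father phases as in_trans, while two variants both inherited from the mother phase as in_cis.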

Multi-Partner Detection

A variant may participate in multiple candidate compound heterozygous pairs. The service flags this so that the geneticist sees the full landscape of biallelic candidates in the gene rather than only the first pair found.

Coding and Splicing Restriction

Compound heterozygous candidacy is restricted to coding and splicing consequences. Synonymous, intronic, and regulatory variants are excluded: they do not constitute biallelic loss of function regardless of zygosity.

Symmetric Annotation

When variants A and B form a compound heterozygous pair, both rows in the trio table are annotated. The geneticist can land on either variant and immediately see the partner.
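Candidate pairing with the consequence restriction, symmetric annotation, and multi-partner flagging might look like the sketch below. The row keys (id, gene, zygosity, consequence_class) and the output shape are hypothetical; phasing of each pair is a separate step and is not shown.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List

ELIGIBLE_CLASSES = {"coding", "splicing"}  # assumed labels for the permitted consequences


def annotate_comp_het_candidates(variants: Iterable[dict]) -> Dict[str, dict]:
    """Pair eligible heterozygous variants per gene and annotate both members of each pair."""
    by_gene: Dict[str, List[dict]] = defaultdict(list)
    for variant in variants:
        if variant["zygosity"] == "het" and variant["consequence_class"] in ELIGIBLE_CLASSES:
            by_gene[variant["gene"]].append(variant)

    partners: Dict[str, List[str]] = defaultdict(list)
    for gene_variants in by_gene.values():
        for a, b in combinations(gene_variants, 2):
            # Symmetric annotation: each variant lists the other as its partner.
            partners[a["id"]].append(b["id"])
            partners[b["id"]].append(a["id"])

    return {
        variant_id: {"partners": ids, "multi_partner": len(ids) > 1}
        for variant_id, ids in partners.items()
    }
```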

Segregation Scoring

Per-variant likelihood-ratio LOD scores using the Jarvik framework as specified by ClinGen SVI 2021. Mapped to ClinGen-aligned evidence bands with explicit acknowledgement of trio data limits.

ClinGen SVI 2021 Framework

Per-variant LOD scores are computed using the Jarvik likelihood-ratio framework as specified by the ClinGen Sequence Variant Interpretation Working Group. The methodology is published, peer-reviewed, and explicitly designed for clinical use.

Five-Band Mapping

LOD scores are mapped to five ClinGen-aligned evidence bands: indeterminate, supporting, moderate, strong, and very_strong. A sixth value, not_applicable, marks variants for which no LOD was computed. The clinical conversation works in these bands; raw LOD values remain available for audit.

Honest Trio Ceiling

A trio carries only two meioses of segregation evidence. The mathematical ceiling for trio-only LOD is approximately 0.3, supporting band at best. Most real trios land in supporting or indeterminate. Strong and very_strong bands require extended pedigree data, which is the planned next step.

Hypothesis-Aware

When the inheritance hypothesis is unknown, the segregation phase explicitly writes not_applicable rather than emitting a spurious LOD = 0 result. This prevents misinterpretation of "no evidence calculated" as "evidence against".
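The trio ceiling and the not_applicable behaviour can be illustrated with the simplified, fully penetrant form of the likelihood ratio, in which each informative meiosis contributes log10(2), about 0.301, to the LOD. This is a sketch, not the service's scoring code: the function name and return shape are hypothetical, and the LOD thresholds for each band are not given here, so the band is left unset for scored variants.

```python
import math


def segregation_lod(hypothesis: str, informative_meioses: int) -> dict:
    """Simplified segregation LOD: log10(2) per informative meiosis, full penetrance assumed."""
    if hypothesis == "unknown":
        # No hypothesis means no score at all, which is distinct from LOD = 0.
        return {"band": "not_applicable", "lod": None}
    lod = informative_meioses * math.log10(2)
    # Band thresholds are not specified in this document, so no band is assigned here.
    return {"band": None, "lod": round(lod, 3)}
```

Under this simplified form, the roughly 0.3 ceiling quoted above corresponds to a single informative meiosis; how the service counts informative meioses in a trio is not specified here.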

Feasibility Planning

Before the pipeline runs, the service evaluates which inheritance phases are feasible given the family composition and inheritance hypothesis. Feasibility flags and a free-form rationale are persisted alongside the trio analysis record, providing a clinical audit trail.

Flag | Meaning
de_novo_feasible | True when both parents are sequenced and an inheritance hypothesis consistent with de novo origin (autosomal dominant or sporadic) is plausible. False otherwise, for example in a duo without the affected parent.
compound_het_feasible | True when at least one parent is sequenced and the inheritance hypothesis is autosomal recessive. Phasing requires parental origin information.
segregation_feasible | True when at least two family members with affected status are present. A trio with an affected proband and one affected parent qualifies; a singleton does not.
plan_rationale | Free-form structured explanation of why each phase was planned as feasible or not. Provides a clinical audit trail: the geneticist can verify which inheritance evidence was looked for and which was not, and why.
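The flag definitions above translate fairly directly into a planning function. The sketch below is illustrative: the function signature and the rationale format are hypothetical, and "with affected status" is read as "with a known affected or unaffected status", which is an interpretation rather than something the definitions state.

```python
DE_NOVO_HYPOTHESES = {"autosomal dominant", "sporadic"}


def plan_feasibility(parents_sequenced: int, hypothesis: str, members_with_known_status: int) -> dict:
    """Decide which inheritance phases to run for a family and record why."""
    de_novo = parents_sequenced == 2 and hypothesis in DE_NOVO_HYPOTHESES
    comp_het = parents_sequenced >= 1 and hypothesis == "autosomal recessive"
    segregation = members_with_known_status >= 2
    # Hypothetical rationale format: the inputs that drove each decision.
    rationale = (
        f"parents_sequenced={parents_sequenced}; hypothesis={hypothesis}; "
        f"members_with_known_status={members_with_known_status}"
    )
    return {
        "de_novo_feasible": de_novo,
        "compound_het_feasible": comp_het,
        "segregation_feasible": segregation,
        "plan_rationale": rationale,
    }
```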

Inputs and Outputs

What the service consumes from the upstream pipeline and from the geneticist, and what it produces for review and downstream use.

Inputs from the Pipeline

Pre-classified classified_variants.duckdb files from variant analysis (one per family member)

ACMG/AMP classification, criteria, and supporting evidence already applied per variant

Optional joint-called VCF when available from the upstream lab

No variant re-calling and no VEP re-annotation: the service operates purely on pre-classified data

Inputs from the Geneticist

Family composition: trio (proband + mother + father), duo (proband + one parent), or proband + sibling

Per-member metadata: session ID, sex, affected status (affected / unaffected / unknown)

Inheritance hypothesis when available (autosomal dominant, autosomal recessive, X-linked, mitochondrial, sporadic, unknown)

Optional joint VCF path for sample-swap and consanguinity QC

Outputs for the Geneticist

De novo annotation per variant: confidence tier and supporting evidence

Compound heterozygous pairs: phase determination, partner variants, multi-partner flags

Segregation evidence: per-variant LOD score and ClinGen-aligned band

Sample QC summary: IBD analysis results, alerts for sample-swap or consanguinity

Feasibility and rationale: which phases were planned as feasible and why

Evidence summary JSON denormalised on the trio analysis record for fast retrieval

Outputs for Downstream Services

Persistent trio_variants DuckDB consumed by the AI Service for the family analysis report

Per-case data available to cohort-level analytics for population work

Standards and Boundaries

The service operates against published standards and within explicit clinical boundaries.

ACMG/AMP

Variant classification follows ACMG/AMP 2015 with subsequent ClinGen specifications and is performed upstream by the Variant Analysis Service. The Family Service consumes that classification and adds family-aware evidence on top; it does not reclassify.

Reference: Richards et al., Genetics in Medicine, 2015, PMID: 25741868

ClinGen SVI 2021 Segregation

Segregation LOD scoring follows the ClinGen Sequence Variant Interpretation Working Group framework for clinical use. The Jarvik likelihood-ratio methodology is published and validated.

Reference: ClinGen Sequence Variant Interpretation Working Group, 2021

Jarvik Likelihood Framework

Per-variant segregation LOD computation. Industry-standard methodology for likelihood-based segregation evidence in clinical genetics.

Reference: Jarvik and Browning, AJHG, 2016, PMID: 27374771

PLINK 1.9

Sample identity-by-descent analysis for sample-swap, consanguinity, and duplicate-sample detection. Industry-standard genetic relationship inference tool.

Reference: Chang et al., GigaScience, 2015, PMID: 25722852

Pre-Classified Data Only

The service does not re-call variants. No DeepTrio, GLnexus, or VEP invocation locally. Joint calling, when available, happens at the upstream sequencing facility and is consumed as input. This preserves ACMG classification context and avoids duplicating computation already performed by the variant analysis service.

Reporting Boundary

The service produces inheritance-annotated variant data with explicit confidence tiers, evidence bands, and feasibility rationale. It does not generate clinical interpretations, does not make diagnostic calls, and does not replace clinical review. All output is for review by a qualified clinical geneticist before any clinical action.

Data Residency

The service runs within the Helena platform on EU-based infrastructure compliant with GDPR Article 9 and 1+MG technical requirements. No family genomic data leaves the platform during analysis.

What Sets It Apart

Eight design choices that make Family and Trio Analysis distinct from generic pedigree analysis tools.

Maximum-fidelity inheritance analysis

Operates on pre-classified data from upstream variant analysis. Preserves ACMG classification context. No re-calling, no re-annotation, no information loss.

Three inheritance algorithms in one pipeline

De novo detection, compound heterozygous phasing, and segregation scoring run sequentially on the same trio data. The geneticist sees the full inheritance picture in one place.

Honest about trio limits

A trio caps at approximately 0.3 LOD, supporting band at best. We document this explicitly rather than overstating evidence strength. Strong and very_strong bands require extended pedigree.

Chromosome-presence gate

Distinguishes confident inferred homozygous-reference from suspicious absence when a parent genotype is NULL. A clinical safety mechanism that prevents false de novo calls from coverage gaps.

Gender-aware exceptions

chrY NULL for a female parent and chrM NULL for a father are biologically expected, not coverage gaps. The system recognises this and does not penalise confidence.

Feasibility flags with rationale

Every analysis records which phases were planned as feasible and why. The geneticist can verify what evidence was looked for and what was not, providing a clinical audit trail.

Atomic phase semantics

If a phase fails, subsequent phases are not executed. The exact failure stage is recorded. Partial state is preserved for operator inspection rather than overwritten.

Sample QC built in

PLINK IBD analysis runs as part of the standard pipeline when a joint VCF is available. Sample-swap, consanguinity, and duplicate-sample alerts surface before clinical interpretation.

See Family Analysis in Practice

Request a demo to see Helena run a real trio through the full pipeline, with de novo detection, compound heterozygous phasing, and segregation scoring all surfaced alongside the upstream ACMG classification.
