Working with Polyploids

This tutorial covers GWAS in polyploid species using Binx’s specialized genetic models.

Introduction to Polyploid GWAS

Polyploid species (tetraploids, hexaploids, etc.) have more than two copies of each chromosome, which creates unique challenges and opportunities for GWAS:

  • More allele combinations: A tetraploid has 5 possible genotypes per biallelic locus (0-4 copies of a given allele)
  • Complex inheritance: Dominance relationships are more nuanced
  • Higher genetic diversity: More combinations can influence traits

Binx implements the GWASpoly framework, which models various forms of allele dosage effects.

Genetic Models for Polyploids

Understanding Dosage Effects

In a tetraploid, the five genotypes (AAAA, AAAB, AABB, ABBB, BBBB) can affect traits differently:

| Model | Assumption | Best For |
|---|---|---|
| Additive | Linear dosage effect | Quantitative traits with dosage dependence |
| General | No assumption (4 df) | Unknown inheritance; hypothesis generation |
| Simplex dominant | One B allele is sufficient | Traits with low-dosage dominance |
| Duplex dominant | Two B alleles are sufficient | Intermediate dominance |
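
To make the table concrete, the sketch below recodes a tetraploid dosage (number of B copies) into a single predictor under the additive, simplex dominant, and duplex dominant assumptions. The helper name encode_dosage is hypothetical, and the coding follows the wording of the table rather than Binx's internal implementation, which may differ. The general model is left out because it fits a separate effect for each dosage class instead of one recoded value.

```shell
# Hypothetical helper (not part of Binx): recode one dosage value per input line.
# Output columns: dosage, additive, simplex dominant, duplex dominant.
encode_dosage() {
  awk '{
    d = $1 + 0
    additive = d                 # linear in the number of B copies
    simplex  = (d >= 1) ? 1 : 0  # one B allele is sufficient
    duplex   = (d >= 2) ? 1 : 0  # two B alleles are sufficient
    print d, additive, simplex, duplex
  }'
}

# Tetraploid dosages 0..4 (AAAA, AAAB, AABB, ABBB, BBBB)
printf '%s\n' 0 1 2 3 4 | encode_dosage
```

Under the simplex coding, dosages 1 to 4 all fall into the same group, which is why that model suits presence/absence effects.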

Choosing Models

Start with additive + general:

binx gwas \
  --geno genotypes.tsv \
  --pheno phenotypes.csv \
  --trait yield \
  --ploidy 4 \
  --models additive,general \
  --out results.csv

  • Additive captures dosage-dependent effects
  • General captures any pattern (exploratory)

Then investigate specific dominance patterns:

binx gwas \
  --geno genotypes.tsv \
  --pheno phenotypes.csv \
  --trait disease_resistance \
  --ploidy 4 \
  --models additive,1-dom,2-dom \
  --out results.csv

Example: Tetraploid Potato GWAS

Let’s analyze a tetraploid potato dataset for tuber yield.

Step 1: Verify Ploidy in Data

Check that dosage values are appropriate:

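# Find the max dosage value (sample columns start at column 4; non-numeric entries such as NA are skipped)
awk -F'\t' 'NR>1 {for(i=4;i<=NF;i++) if($i ~ /^[0-9.]+$/ && $i+0>max) max=$i+0} END {print "Max dosage:", max+0}' genotypes.tsv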

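For tetraploid data, the maximum should be 4. A larger value suggests the file was called at a different ploidy or contains encoding errors.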
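
The maximum alone can hide problems such as missing or out-of-range entries, so a fuller check tabulates every dosage value. The sketch below is one way to do that; dosage_table is a hypothetical helper, and it assumes the same layout as the awk command above (tab-separated, one header row, sample columns from column 4 onward) with missing values written as NA or left empty.

```shell
# Hypothetical helper: tabulate dosage values from a genotype file on stdin.
# Usage: dosage_table PLOIDY < genotypes.tsv
dosage_table() {
  awk -F'\t' -v ploidy="$1" '
    NR > 1 {
      for (i = 4; i <= NF; i++) {
        if ($i ~ /^[0-9]+$/ && $i + 0 <= ploidy) count[$i + 0]++
        else if ($i == "" || $i == "NA") missing++
        else other++   # non-integer, negative, or above the ploidy
      }
    }
    END {
      for (d = 0; d <= ploidy; d++) print "dosage " d ": " (count[d] + 0)
      print "missing: " (missing + 0)
      print "other: " (other + 0)
    }'
}

# Tiny made-up input: two markers, three samples
printf 'marker\tchrom\tpos\ts1\ts2\ts3\nm1\t1\t100\t0\t4\t2\nm2\t1\t200\tNA\t5\t1\n' | dosage_table 4
```

A non-zero count on the last line is worth investigating before running the GWAS.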

Step 2: Compute Polyploid Kinship

binx kinship \
  --geno genotypes.tsv \
  --ploidy 4 \
  --output kinship.tsv

Step 3: Run Multi-Model GWAS

binx gwas \
  --geno genotypes.tsv \
  --pheno phenotypes.csv \
  --trait yield \
  --kinship kinship.tsv \
  --ploidy 4 \
  --models additive,general,1-dom,2-dom \
  --loco \
  --out gwas_results.csv

Step 4: Compare Models

Extract results by model:

# Count significant hits per model (-log10(p) > 5; model in column 4, -log10(p) in column 8)
awk -F',' 'NR>1 && $8>5 {count[$4]++} END {for(m in count) print m, count[m]}' gwas_results.csv

Create model-specific Manhattan plots:

for model in additive general 1-dom-ref 1-dom-alt 2-dom-ref 2-dom-alt; do
  binx plot \
    --input gwas_results.csv \
    --plot-type manhattan \
    --model "$model" \
    --threshold 5 \
    --title "Yield GWAS - $model model" \
    --output "manhattan_${model}.svg"
done

Step 5: Interpret Model-Specific Results

If a QTL is significant under:

  • Additive only: Dosage-dependent effect (each additional allele adds to trait)
  • 1-dom only: Presence/absence effect (one copy is enough)
  • General but not additive: Complex dominance pattern
  • Multiple models: Robust association, exact inheritance unclear
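
To apply these rules, it helps to see which models flag each marker. The sketch below groups significant rows by marker; models_per_marker is a hypothetical helper. It takes the model from column 4 and -log10(p) from column 8, as in the Step 4 command, and additionally assumes the marker ID is in column 1, which should be checked against the actual header of gwas_results.csv.

```shell
# Hypothetical helper: list, for each marker, the models that pass the threshold.
# Usage: models_per_marker THRESHOLD < gwas_results.csv
models_per_marker() {
  awk -F',' -v thr="$1" '
    NR > 1 && $8 + 0 > thr {
      if ($1 in hits) hits[$1] = hits[$1] "+" $4
      else hits[$1] = $4
    }
    END { for (m in hits) print m, hits[m] }' | sort
}

# Made-up rows; columns other than 1, 4 and 8 are placeholders
printf '%s\n' \
  'marker,c2,c3,model,c5,c6,c7,score' \
  'm1,.,.,additive,.,.,.,7.0' \
  'm1,.,.,general,.,.,.,6.0' \
  'm2,.,.,1-dom-alt,.,.,.,6.0' \
  'm2,.,.,additive,.,.,.,2.0' \
  'm3,.,.,additive,.,.,.,0.3' | models_per_marker 5
```

Here m1 is flagged by two models and m2 only by a dominance model, while m3 does not appear because no model passes the threshold.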

Hexaploid Analysis

For hexaploid species (ploidy=6), the same workflow applies:

binx gwas \
  --geno genotypes.tsv \
  --pheno phenotypes.csv \
  --trait yield \
  --ploidy 6 \
  --models additive,general \
  --out results.csv

Hexaploids have 7 possible dosage values (0-6) and even more complex dominance patterns.
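
As a quick illustration of how the number of dosage classes grows with ploidy, the sketch below prints each dosage and its genotype string. The helper name dosage_classes is hypothetical, and the A/B notation follows the tetraploid example earlier in this tutorial.

```shell
# Hypothetical helper: print every dosage class and its genotype string.
# Usage: dosage_classes PLOIDY
dosage_classes() {
  awk -v ploidy="$1" 'BEGIN {
    for (d = 0; d <= ploidy; d++) {
      g = ""
      for (k = 0; k < ploidy - d; k++) g = g "A"   # copies of A
      for (k = 0; k < d; k++) g = g "B"            # copies of B
      print d, g
    }
  }'
}

dosage_classes 6   # seven lines, from "0 AAAAAA" to "6 BBBBBB"
```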

Tips for Polyploid GWAS

Sample Size

Polyploids need larger sample sizes than diploids because of:

  • More parameters in genetic models
  • Lower power to detect effects

Recommendation: 200+ samples for tetraploids

MAF Filtering

Be careful with MAF filtering in polyploids:

# More lenient MAF for polyploids
binx gwas \
  --geno genotypes.tsv \
  --pheno phenotypes.csv \
  --trait yield \
  --ploidy 4 \
  --min-maf 0.02 \
  --out results.csv

Low-frequency variants in polyploids can still be informative.
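
To see how many markers a given threshold would remove, MAF can be computed directly from the dosages. The sketch below uses the usual definition, allele frequency = mean dosage / ploidy and MAF = min(frequency, 1 - frequency). The helper name marker_maf is hypothetical, the input layout (tab-separated, header row, marker ID in column 1, samples from column 4) is an assumption, and Binx's own handling of missing values may differ.

```shell
# Hypothetical helper: minor allele frequency per marker from dosage data.
# Usage: marker_maf PLOIDY < genotypes.tsv
marker_maf() {
  awk -F'\t' -v ploidy="$1" '
    NR > 1 {
      sum = 0; n = 0
      for (i = 4; i <= NF; i++)
        if ($i ~ /^[0-9]+(\.[0-9]+)?$/) { sum += $i; n++ }   # skip NA and empty cells
      if (n == 0) next
      freq = sum / (n * ploidy)              # frequency of the counted allele
      maf = (freq < 0.5) ? freq : 1 - freq   # fold to the minor allele
      printf "%s\t%.3f\n", $1, maf
    }'
}

# Made-up input: three markers, five samples
printf 'marker\tchrom\tpos\ts1\ts2\ts3\ts4\ts5\nm1\t1\t100\t0\t0\t1\t0\t0\nm2\t1\t200\t4\t3\t4\t4\t4\nm3\t2\t300\t2\t2\tNA\t2\t2\n' | marker_maf 4
```

In this example m1 and m2 both have a MAF of 0.05: a single simplex carrier among five tetraploids is enough to pass a 0.02 threshold.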

Interpreting Effect Sizes

Effect sizes are reported for single-parameter models:

  • Additive: Effect per dosage unit (in trait units)
  • Dominance models (1-dom, 2-dom, etc.): Effect of the dominant group vs reference

Note: The general model does not report effect sizes because it performs a joint test of multiple parameters. Use it for detecting associations with complex inheritance, then follow up with specific models to estimate effects.

Diploidized Analysis

Sometimes you want to treat polyploid data as diploid-like:

binx gwas \
  --geno genotypes.tsv \
  --pheno phenotypes.csv \
  --trait yield \
  --ploidy 4 \
  --models diplo-additive,diplo-general \
  --out results.csv

This collapses dosage categories:

  • 0 → “AA-like”
  • 1, 2, 3 → “AB-like” (heterozygotes)
  • 4 → “BB-like”

This is useful when you expect diploid-like inheritance in a polyploid.
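
The collapsing rule can be sketched as a small recoding step. The helper name diploidize is hypothetical; it only illustrates the three categories listed above and does not reproduce the numeric coding Binx uses internally for the diplo models.

```shell
# Hypothetical helper: collapse one dosage per input line into a diploid-like class.
# Usage: diploidize PLOIDY
diploidize() {
  awk -v ploidy="$1" '{
    d = $1 + 0
    if (d == 0) label = "AA-like"
    else if (d == ploidy) label = "BB-like"
    else label = "AB-like"     # any mix of A and B copies
    print d, label
  }'
}

printf '%s\n' 0 1 2 3 4 | diploidize 4
```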
