binx gwas

Perform genome-wide association studies (GWAS) using GWASpoly-style methods with support for multiple genetic models.

Synopsis

binx gwas --geno <FILE> --pheno <FILE> --out <FILE> --trait <NAME> --ploidy <INT> [OPTIONS]

Description

The gwas command performs association analysis between genetic markers and phenotypic traits. It implements the statistical methods from GWASpoly (Rosyara et al., 2016) and rrBLUP (Endelman, 2011), supporting both diploid and polyploid species.

Key features:

  • Multiple genetic models for polyploid analysis
  • Mixed model framework (K model, P+K model)
  • Leave-One-Chromosome-Out (LOCO) kinship
  • Support for covariates and multi-environment trials

Required Arguments

| Argument | Description |
| --- | --- |
| --geno <FILE> | Genotype dosage file (TSV: marker, chr, pos, samples…). Missing values (NA) are imputed with the marker mean. |
| --pheno <FILE> | Phenotype file (CSV: sample_id, traits…) |
| --out <FILE> | Output results CSV |
| --trait <NAME> | Trait name to analyze |
| --ploidy <INT> | Ploidy level (e.g., 2, 4, 6) |

Missing Value Handling: Missing genotype values (NA) are automatically imputed with the marker mean. This matches the default behavior of R/GWASpoly.
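
As a minimal sketch of what marker-mean imputation does to one genotype row (the function name is hypothetical and this is not Binx's internal code; None stands in for NA):

```python
def impute_marker_mean(dosages):
    """Replace missing dosages (None) with the mean of the observed values."""
    observed = [d for d in dosages if d is not None]
    if not observed:
        raise ValueError("marker has no observed genotypes")
    mean = sum(observed) / len(observed)
    return [mean if d is None else float(d) for d in dosages]


# Tetraploid dosages for one marker; the third sample is missing.
row = [0, 2, None, 4, 2]
print(impute_marker_mean(row))  # [0.0, 2.0, 2.0, 4.0, 2.0]
```

Note that an imputed value can be fractional, so downstream code should not assume integer dosages.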

Options

Method

| Option | Default | Description |
| --- | --- | --- |
| --method <METHOD> | gwaspoly | GWAS method to use |

Analysis Options

| Option | Default | Description |
| --- | --- | --- |
| --models <LIST> | additive,general | Genetic models to test (comma-separated) |
| --kinship <FILE> | - | Pre-computed kinship matrix TSV (optional; auto-generated if omitted) |
| --loco | false | Use Leave-One-Chromosome-Out kinship |
| --n-pc <INT> | 0 | Number of principal components to include as fixed effects (P+K model) |
| --covariates <LIST> | - | Covariates from phenotype file (comma-separated) |

Note: If --kinship is not provided, Binx automatically computes a kinship matrix using gwaspoly-rs’s set_k() function (equivalent to R/GWASpoly’s set.K()). Pre-computing with binx kinship is recommended when running multiple traits to avoid redundant computation.

QC Filters

| Option | Default | Description |
| --- | --- | --- |
| --min-maf <FLOAT> | 0.0 | Minimum minor allele frequency (0.0-0.5) |
| --max-geno-freq <FLOAT> | 0.0 | Maximum genotype frequency (0.0-1.0, 0=auto) |
| --allow-missing-samples | false | Allow samples in geno but not in pheno |
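
The two frequency filters can be illustrated with a short sketch. It assumes that dosages count copies of the alternate allele and that "genotype frequency" means the share of samples in the most common dosage class. Both are assumptions for illustration, as are the function names. The sketch takes an explicit cutoff and does not model what the auto setting resolves to.

```python
from collections import Counter


def minor_allele_freq(dosages, ploidy):
    """MAF from alt-allele dosages: alt frequency is mean dosage / ploidy."""
    alt_freq = sum(dosages) / (len(dosages) * ploidy)
    return min(alt_freq, 1.0 - alt_freq)


def max_genotype_freq(dosages):
    """Share of samples that fall in the most common dosage class."""
    return Counter(dosages).most_common(1)[0][1] / len(dosages)


def passes_qc(dosages, ploidy, min_maf=0.0, max_geno_freq=1.0):
    """Keep a marker only if it clears both frequency filters."""
    return (minor_allele_freq(dosages, ploidy) >= min_maf
            and max_genotype_freq(dosages) <= max_geno_freq)


# Ten tetraploid samples, nine of them nulliplex: a nearly monomorphic marker.
marker = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(minor_allele_freq(marker, ploidy=4))       # 0.025
print(max_genotype_freq(marker))                 # 0.9
print(passes_qc(marker, ploidy=4, min_maf=0.05)) # False
```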

Threshold Options

| Option | Default | Description |
| --- | --- | --- |
| --threshold <METHOD> | - | Threshold method: m.eff (recommended), bonferroni, or fdr |
| --alpha <FLOAT> | 0.05 | Significance level |
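
To show how --alpha translates into a cutoff on the -log10(p) scale, here is a minimal sketch of the Bonferroni case. It assumes the number of tests equals the number of markers tested, which may not match how Binx counts them, and the function name is hypothetical. An m.eff-style threshold would use an effective number of tests in place of the raw marker count.

```python
import math


def bonferroni_threshold(n_tests, alpha=0.05):
    """Return the Bonferroni cutoff expressed as -log10(p)."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return -math.log10(alpha / n_tests)


# 5,000 markers at alpha = 0.05 give a per-test p-value cutoff of 1e-5.
print(round(bonferroni_threshold(5000), 2))  # 5.0
```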

Output Options

| Option | Default | Description |
| --- | --- | --- |
| --plot <TYPE> | - | Generate plots: manhattan, qq, or both |
| --plot-output <FILE> | - | Custom path for plot files |
| --parallel | false | Use parallel marker testing |

Genetic Models

The --models option accepts the following values. These match R/GWASpoly’s gene action models (Rosyara et al., 2016).

For Diploids (ploidy=2)

| Model | Description | Encoding | df |
| --- | --- | --- | --- |
| additive | Linear dosage effect | 0, 1, 2 | 1 |
| general | Separate effect per dosage | dummy coded | 2 |
| 1-dom | Dominant (tests both ref and alt) | | 1 each |
| 1-dom-ref | Dominant (ref group distinct) | 0, 1, 1 | 1 |
| 1-dom-alt | Dominant (alt group distinct) | 0, 0, 1 | 1 |

For Tetraploids (ploidy=4)

| Model | Description | Encoding | df |
| --- | --- | --- | --- |
| additive | Linear dosage effect | 0, 1, 2, 3, 4 | 1 |
| general | Separate effect per dosage | dummy coded | 4 |
| 1-dom | Simplex dominant (tests both ref and alt) | | 1 each |
| 1-dom-ref | Simplex dominant (ref group distinct) | 0, 1, 1, 1, 1 | 1 |
| 1-dom-alt | Simplex dominant (alt group distinct) | 0, 0, 0, 0, 1 | 1 |
| 2-dom | Duplex dominant (tests both ref and alt) | | 1 each |
| 2-dom-ref | Duplex dominant (ref side distinct) | 0, 0, 1, 1, 1 | 1 |
| 2-dom-alt | Duplex dominant (alt side distinct) | 0, 0, 0, 1, 1 | 1 |
| diplo-general | Diploidized general (hets collapsed) | dummy coded | 2 |
| diplo-additive | Diploidized additive (hets = 0.5) | 0, 0.5, 0.5, 0.5, 1 | 1 |
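
The single-column encodings in the tables above can be reproduced with a small function. This is an illustrative sketch with a hypothetical name, not Binx's internal code. It covers only models with one degree of freedom, because general and diplo-general are dummy coded as several columns.

```python
def encode_dosage(dosage, ploidy, model):
    """Map an alt-allele dosage (0..ploidy) to the covariate value for one model."""
    if not 0 <= dosage <= ploidy:
        raise ValueError("dosage must lie between 0 and ploidy")
    if model == "additive":
        return float(dosage)
    if model == "diplo-additive":
        # Homozygotes sit at 0 and 1; every heterozygote collapses to 0.5.
        if dosage == 0:
            return 0.0
        if dosage == ploidy:
            return 1.0
        return 0.5
    if model.endswith("-dom-ref") or model.endswith("-dom-alt"):
        k = int(model.split("-")[0])  # 1 = simplex, 2 = duplex
        if model.endswith("-ref"):
            return 1.0 if dosage >= k else 0.0
        return 1.0 if dosage > ploidy - k else 0.0
    raise ValueError(f"no single-column encoding for model {model!r}")


# Reproduce the tetraploid 2-dom-alt row: 0, 0, 0, 1, 1
print([encode_dosage(d, 4, "2-dom-alt") for d in range(5)])
# [0.0, 0.0, 0.0, 1.0, 1.0]
```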

Model Expansion

Like R/GWASpoly, specifying 1-dom automatically tests both 1-dom-ref and 1-dom-alt. Similarly, 2-dom expands to both 2-dom-ref and 2-dom-alt. Use the specific -ref or -alt variants if you only want one direction.
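
The expansion rule can be sketched as follows. The function is hypothetical and only mirrors the behavior described above. Dropping duplicates when a shorthand and one of its variants are both listed is an assumption.

```python
def expand_models(models):
    """Expand N-dom shorthands into their -ref and -alt variants, keeping order."""
    expanded = []
    for name in models:
        name = name.strip()
        parts = name.split("-")
        # "1-dom" or "2-dom" becomes two directional models; others pass through.
        if len(parts) == 2 and parts[0].isdigit() and parts[1] == "dom":
            candidates = [f"{name}-ref", f"{name}-alt"]
        else:
            candidates = [name]
        for candidate in candidates:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_models("additive,general,1-dom,2-dom".split(",")))
# ['additive', 'general', '1-dom-ref', '1-dom-alt', '2-dom-ref', '2-dom-alt']
```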

Using Multiple Models

Specify multiple models separated by commas:

binx gwas --models additive,general,1-dom,2-dom ...

See Genetic Models Reference for detailed explanations.

Examples

Basic GWAS

binx gwas \
  --geno genotypes.tsv \
  --pheno phenotypes.csv \
  --trait yield \
  --ploidy 4 \
  --out results.csv

With Pre-computed Kinship

# First compute kinship
binx kinship --geno genotypes.tsv --out kinship.tsv

# Then run GWAS
binx gwas \
  --geno genotypes.tsv \
  --pheno phenotypes.csv \
  --trait yield \
  --kinship kinship.tsv \
  --ploidy 4 \
  --out results.csv

LOCO Analysis

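Leave-One-Chromosome-Out (LOCO) reduces proximal contamination, where the tested marker also contributes to the kinship matrix and so absorbs part of its own signal: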

binx gwas \
  --geno genotypes.tsv \
  --pheno phenotypes.csv \
  --trait yield \
  --ploidy 4 \
  --loco \
  --out results.csv

With Covariates and PCs

binx gwas \
  --geno genotypes.tsv \
  --pheno phenotypes.csv \
  --trait yield \
  --covariates environment,block \
  --n-pc 3 \
  --ploidy 4 \
  --out results.csv

Multiple Genetic Models

binx gwas \
  --geno genotypes.tsv \
  --pheno phenotypes.csv \
  --trait yield \
  --ploidy 4 \
  --models additive,general,1-dom,2-dom \
  --out results.csv

With Threshold Calculation

binx gwas \
  --geno genotypes.tsv \
  --pheno phenotypes.csv \
  --trait yield \
  --ploidy 4 \
  --models additive,general \
  --threshold m.eff \
  --out results.csv

With Plots

binx gwas \
  --geno genotypes.tsv \
  --pheno phenotypes.csv \
  --trait yield \
  --ploidy 4 \
  --kinship kinship.tsv \
  --plot both \
  --out results.csv

Output Format

The output file contains the following columns:

| Column | Description |
| --- | --- |
| marker_id | Marker identifier |
| chrom | Chromosome |
| pos | Base pair position |
| model | Genetic model used |
| score | -log10(p-value) |
| p_value | Association p-value |
| effect | Effect size estimate |
| n_obs | Sample size (non-missing) |
| threshold | Significance threshold used |

Example Output

marker_id,chrom,pos,model,score,p_value,effect,n_obs,threshold
SNP_1_1000,1,1000,additive,4.49,3.21e-05,0.523,198,5.0
SNP_1_2000,1,2000,additive,0.33,0.469,0.081,200,5.0
SNP_1_3500,1,3500,additive,2.84,1.45e-03,-0.312,195,5.0
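
A results file in this layout can be post-processed with the Python standard library. The sketch below embeds the example rows so it runs on its own. It assumes the threshold column is populated and treats a marker as a hit when score is at least threshold. Both are assumptions for illustration.

```python
import csv
import io
import math

EXAMPLE = """\
marker_id,chrom,pos,model,score,p_value,effect,n_obs,threshold
SNP_1_1000,1,1000,additive,4.49,3.21e-05,0.523,198,5.0
SNP_1_2000,1,2000,additive,0.33,0.469,0.081,200,5.0
SNP_1_3500,1,3500,additive,2.84,1.45e-03,-0.312,195,5.0
"""


def load_results(handle):
    """Parse result rows, converting the numeric columns used below."""
    rows = []
    for rec in csv.DictReader(handle):
        rows.append({
            "marker_id": rec["marker_id"],
            "model": rec["model"],
            "score": float(rec["score"]),
            "p_value": float(rec["p_value"]),
            "threshold": float(rec["threshold"]),
        })
    return rows


rows = load_results(io.StringIO(EXAMPLE))

# score should equal -log10(p_value) up to rounding in the file.
consistent = all(abs(r["score"] + math.log10(r["p_value"])) < 0.01 for r in rows)

hits = [r["marker_id"] for r in rows if r["score"] >= r["threshold"]]
top = max(rows, key=lambda r: r["score"])

print(consistent)        # True
print(hits)              # [] (no score reaches 5.0 in this example)
print(top["marker_id"])  # SNP_1_1000
```

For a real run, replace io.StringIO(EXAMPLE) with an open file handle on the path passed to --out.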

Statistical Details

Mixed Model

The GWAS uses a linear mixed model:

y = Xβ + Zu + e

Where:

  • y = phenotype vector
  • X = fixed effects design matrix (intercept, covariates, marker)
  • β = fixed effects
  • Z = random effects design matrix
  • u ~ N(0, Kσ²ᵤ) = random polygenic effects
  • K = kinship matrix
  • e ~ N(0, Iσ²ₑ) = residual errors
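
Under the additional assumption that u and e are independent (not stated above, but standard for this model), the definitions imply the following marginal covariance and distribution for the phenotype vector:

```latex
\operatorname{Var}(y) = \sigma_u^2 \, Z K Z^{\top} + \sigma_e^2 \, I,
\qquad
y \sim N\!\left( X\beta,\; \sigma_u^2 \, Z K Z^{\top} + \sigma_e^2 \, I \right)
```

This shows where the kinship matrix enters: related samples share covariance through the ZKZᵀ term, which is what keeps relatedness from inflating the marker test.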

P+K Model

When --n-pc is set to a value greater than 0, that many principal components are included as fixed effects to account for population structure (P+K model).

LOCO

With --loco, the kinship matrix is recalculated for each chromosome, excluding markers on the chromosome being tested. This prevents the tested marker from influencing its own significance through the kinship matrix.
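
The bookkeeping behind LOCO can be sketched in plain Python. The kinship formula used here (cross-product of marker-centered dosages, scaled to a mean diagonal of 1) is an assumption for illustration and may differ from what set_k() computes. The function names are hypothetical.

```python
def kinship(markers):
    """Toy relationship matrix from per-marker dosage rows (one value per sample).

    Assumption: K = M M^T on marker-centered dosages, scaled to mean diagonal 1.
    """
    n = len(markers[0])
    k = [[0.0] * n for _ in range(n)]
    for row in markers:
        mean = sum(row) / n
        centered = [d - mean for d in row]
        for i in range(n):
            for j in range(n):
                k[i][j] += centered[i] * centered[j]
    scale = sum(k[i][i] for i in range(n)) / n
    return [[value / scale for value in r] for r in k]


def loco_kinships(markers_by_chrom):
    """Return {chrom: K built from every marker that is NOT on chrom}."""
    result = {}
    for chrom in markers_by_chrom:
        others = [row
                  for c, rows in markers_by_chrom.items() if c != chrom
                  for row in rows]
        result[chrom] = kinship(others)
    return result


# Four samples, three chromosomes (tetraploid dosages).
geno = {
    "1": [[0, 1, 2, 4], [4, 3, 1, 0]],
    "2": [[2, 2, 0, 1]],
    "3": [[0, 4, 4, 2]],
}
k_loco = loco_kinships(geno)
# Markers on chromosome 1 are tested against a K built from chromosomes 2 and 3.
print(sorted(k_loco))  # ['1', '2', '3']
```

Each chromosome gets its own matrix, so the mixed model has to be refit per chromosome. That is the extra cost of LOCO relative to a single genome-wide kinship matrix.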

Tips and Best Practices

  1. Choose appropriate models: For autopolyploids, start with additive and general. The general model captures complex dominance patterns but uses more degrees of freedom.

  2. Use LOCO for accurate p-values: LOCO prevents proximal contamination and generally provides better-calibrated p-values.

  3. Pre-compute kinship for efficiency: If running multiple traits, compute the kinship matrix once and reuse it.

  4. Filter markers: Use --min-maf to remove rare variants, for which association tests have little power.

  5. Calculate significance thresholds: Use --threshold m.eff to derive the threshold from the effective number of tests, which accounts for LD between markers.

  6. Generate plots directly: Use --plot both to generate Manhattan and QQ plots automatically after GWAS completes.

  7. Use parallel mode: For large datasets, --parallel can speed up marker testing.

See Also