Quick Start

This guide will walk you through your first Binx analysis in under 5 minutes.

Overview

A typical Binx workflow looks like this:

VCF file → binx convert → Genotype file ─┐
                                         ├─→ binx gwas → Results → binx plot
                     Phenotype file ─────┘

Step 1: Prepare Your Data

Binx requires two main input files:

Genotype File

A tab-separated file with marker information and sample dosages:

marker_id   chrom   pos     Sample1 Sample2 Sample3
SNP001      1       1000    0       2       4
SNP002      1       2000    1       1       3
SNP003      2       1500    2       2       2

First three columns: marker_id, chrom, pos
Remaining columns: one per sample, holding dosage values from 0 to the ploidy level (for example, 0 to 4 for a tetraploid)
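Before running an analysis, you can check that every dosage falls in the expected range. The sketch below is not a Binx command, and `check_dosages` is a hypothetical helper. It assumes integer dosages, no missing values, a header row, and the tab-separated layout shown above. It prints every marker/sample cell outside 0..ploidy and returns non-zero if it finds any.

```bash
#!/usr/bin/env bash
set -eu

# check_dosages FILE PLOIDY  (hypothetical helper, not a Binx command)
# Prints "marker<TAB>sample<TAB>value" for every dosage that is not a whole
# number between 0 and PLOIDY, and returns 1 if any such cell was found.
# Assumes: tab-separated, header row, samples start in column 4, no missing values.
check_dosages() {
  awk -F'\t' -v ploidy="$2" '
    NR == 1 { for (i = 4; i <= NF; i++) sample[i] = $i; next }
    {
      for (i = 4; i <= NF; i++) {
        if ($i !~ /^[0-9]+$/ || $i + 0 > ploidy + 0) {
          printf "%s\t%s\t%s\n", $1, sample[i], $i
          bad++
        }
      }
    }
    END { exit (bad > 0) }
  ' "$1"
}

# The example genotype file from this section, written with real tab characters.
geno=$(mktemp)
trap 'rm -f "$geno"' EXIT
printf 'marker_id\tchrom\tpos\tSample1\tSample2\tSample3\n' > "$geno"
printf 'SNP001\t1\t1000\t0\t2\t4\n' >> "$geno"
printf 'SNP002\t1\t2000\t1\t1\t3\n' >> "$geno"
printf 'SNP003\t2\t1500\t2\t2\t2\n' >> "$geno"

if check_dosages "$geno" 4; then
  echo "All dosages are between 0 and 4"
fi
```

If you run the same file with ploidy 2 (`check_dosages "$geno" 2`), it reports SNP001/Sample3 (4) and SNP002/Sample3 (3), because those values are only valid for a tetraploid. If your file uses NA or fractional dosages, adjust the pattern.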

Phenotype File

A CSV/TSV file with sample IDs and trait values:

sample_id,yield,height,env
Sample1,45.2,120,field_A
Sample2,52.1,115,field_A
Sample3,48.7,125,field_B

First column: sample_id (values must match the sample column headers in the genotype file)
Remaining columns: traits and covariates
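Mismatched sample IDs are easy to miss, so a quick comparison of the two files can save a failed run. The sketch below is not a Binx command, and `report_unmatched` is a hypothetical helper. It assumes the genotype header is tab-separated with samples starting in column 4, and that the phenotype file is comma-separated with sample_id in column 1. For a TSV phenotype file, change the `cut` delimiter.

```bash
#!/usr/bin/env bash
set -eu

# report_unmatched GENO PHENO  (hypothetical helper, not a Binx command)
# Lists sample IDs that appear in only one of the two files.
report_unmatched() {
  g_ids=$(mktemp)
  p_ids=$(mktemp)
  # Genotype samples: header fields from column 4 onward
  head -n 1 "$1" | tr -d '\r' | tr '\t' '\n' | tail -n +4 | LC_ALL=C sort > "$g_ids"
  # Phenotype samples: first column, header skipped; tr -d '\r' drops Windows line endings
  tail -n +2 "$2" | tr -d '\r' | cut -d',' -f1 | LC_ALL=C sort > "$p_ids"
  echo "In phenotype file only:"
  LC_ALL=C comm -13 "$g_ids" "$p_ids"
  echo "In genotype file only:"
  LC_ALL=C comm -23 "$g_ids" "$p_ids"
  rm -f "$g_ids" "$p_ids"
}

# The example files from Step 1
geno=$(mktemp)
pheno=$(mktemp)
trap 'rm -f "$geno" "$pheno"' EXIT
printf 'marker_id\tchrom\tpos\tSample1\tSample2\tSample3\n' > "$geno"
printf 'sample_id,yield,height,env\nSample1,45.2,120,field_A\nSample2,52.1,115,field_A\nSample3,48.7,125,field_B\n' > "$pheno"

report_unmatched "$geno" "$pheno"
```

For the example files, both lists are empty because all three samples match.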

Step 2: Convert VCF (if needed)

If your genotypes are in VCF format, convert them first:

binx convert \
  --vcf your_data.vcf.gz \
  --format gwaspoly \
  --output genotypes.tsv
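Because the phenotype sample_id values must match the genotype column headers, it can help to list the sample IDs in the VCF header before converting. If you have bcftools installed, `bcftools query -l` does this. The sketch below uses only gzip and awk, and `vcf_samples` is a hypothetical helper. It assumes, without confirmation from this guide, that `binx convert` keeps the VCF sample names as the genotype column headers.

```bash
#!/usr/bin/env bash
set -eu

# vcf_samples FILE  (hypothetical helper, not a Binx command)
# Prints the sample IDs from the VCF "#CHROM" header line, one per line.
# VCF columns 1-9 are fixed (CHROM ... FORMAT); samples start at column 10.
# With GNU gzip, -dcf decompresses .vcf.gz (bgzip output included) and
# passes an uncompressed .vcf through unchanged.
vcf_samples() {
  gzip -dcf "$1" | awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }'
}

# A toy VCF header, only to demonstrate the helper
vcf=$(mktemp)
trap 'rm -f "$vcf" "$vcf.gz"' EXIT
printf '##fileformat=VCFv4.2\n' > "$vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSample1\tSample2\tSample3\n' >> "$vcf"
gzip -f "$vcf"

vcf_samples "$vcf.gz"
```

You can then compare this list with the first column of your phenotype file.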

Step 3: Run GWAS

Run a genome-wide association study:

binx gwas \
  --geno genotypes.tsv \
  --pheno phenotypes.csv \
  --trait yield \
  --ploidy 4 \
  --models additive \
  --out gwas_results.csv

Understanding the Parameters

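Parameter	Description
`--geno`	Path to the genotype file (tab-separated; see Step 1)
`--pheno`	Path to the phenotype file (CSV or TSV)
`--trait`	Name of the phenotype column to analyze
`--ploidy`	Ploidy level (2, 4, 6, etc.)
`--models`	Comma-separated list of genetic models to test (e.g. additive,general,1-dom,2-dom)
`--out`	Output file path for the results CSV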
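A misspelled `--trait` value is a common mistake. The pre-flight check below is a sketch that assumes Binx matches the trait name exactly against the phenotype header; this guide does not say whether the match is case-sensitive. `has_column` is a hypothetical helper, not a Binx command.

```bash
#!/usr/bin/env bash
set -eu

# has_column FILE NAME  (hypothetical helper, not a Binx command)
# Succeeds if NAME is an exact, case-sensitive header field of FILE.
# Treats both commas and tabs as separators, since the phenotype file may be CSV or TSV.
has_column() {
  head -n 1 "$1" | tr -d '\r' | tr ',\t' '\n\n' | grep -qx -- "$2"
}

# The example phenotype header from Step 1
pheno=$(mktemp)
trap 'rm -f "$pheno"' EXIT
printf 'sample_id,yield,height,env\nSample1,45.2,120,field_A\n' > "$pheno"

for trait in yield Yield; do
  if has_column "$pheno" "$trait"; then
    printf '%s\n' "--trait $trait: found"
  else
    printf '%s\n' "--trait $trait: not a column in the phenotype file"
  fi
done
```

Here `yield` is found and `Yield` is not, because the comparison is exact.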

Step 4: Examine Results

The output CSV contains one row of association results per marker and model:

marker_id,chrom,pos,model,score,p_value,effect,n_obs,threshold
SNP001,1,1000,additive,4.49,3.2e-05,0.52,198,5.0
SNP002,1,2000,additive,0.33,0.47,0.08,200,5.0
...

Key columns:

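score: -log10(p_value), useful for plotting (e.g. a p-value of 3.2e-05 gives a score of 4.49)
p_value: Association p-value
effect: Estimated effect size of the marker
n_obs: Number of samples with non-missing data used in the test
threshold: Significance threshold, on the same -log10 scale as score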
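Because score is the -log10 transform of p_value, you can check that relationship and the threshold comparison directly. The awk sketch below uses the two example rows. `check_scores` is a hypothetical helper, and it assumes the column order shown above.

```bash
#!/usr/bin/env bash
set -eu

# check_scores FILE  (hypothetical helper, not a Binx command)
# For each result row, prints marker_id, the reported score, -log10(p_value)
# recomputed from the p_value column, and whether score >= threshold.
# Assumes: marker_id,chrom,pos,model,score,p_value,effect,n_obs,threshold
check_scores() {
  awk -F',' 'NR > 1 {
    recomputed = -log($6) / log(10)   # awk log() is the natural log
    status = ($5 + 0 >= $9 + 0) ? "significant" : "not significant"
    printf "%s\t%s\t%.2f\t%s\n", $1, $5, recomputed, status
  }' "$1"
}

results=$(mktemp)
trap 'rm -f "$results"' EXIT
cat > "$results" <<'EOF'
marker_id,chrom,pos,model,score,p_value,effect,n_obs,threshold
SNP001,1,1000,additive,4.49,3.2e-05,0.52,198,5.0
SNP002,1,2000,additive,0.33,0.47,0.08,200,5.0
EOF

check_scores "$results"
```

For the two example rows, the recomputed values match the reported scores (4.49 and 0.33). Neither row reaches the threshold of 5.0, which corresponds to p = 1e-5.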

Step 5: Visualize Results

Create a Manhattan plot (--threshold is on the same -log10(p) scale as the score column, so 5 corresponds to p = 1e-5):

binx plot \
  --input gwas_results.csv \
  --plot-type manhattan \
  --model additive \
  --threshold 5 \
  --output manhattan.svg

Create a QQ plot:

binx plot \
  --input gwas_results.csv \
  --plot-type qq \
  --model additive \
  --output qq.svg
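If you tested several models, you may want one pair of plots per model. The loop below is a hypothetical sketch that uses only the `binx plot` flags shown above. It assumes the results file was produced with `--models additive,general,1-dom,2-dom` and that `binx plot` accepts each of those names for `--model`. By default it only prints the commands, so you can review them first. Run it with `DRY_RUN=0` to execute them.

```bash
#!/usr/bin/env bash
set -eu

RESULTS="gwas_results.csv"
MODELS="additive general 1-dom 2-dom"   # space-separated for the loop
DRY_RUN="${DRY_RUN:-1}"                 # 1 = print commands, 0 = run them

# run CMD...: prints the command in dry-run mode, otherwise executes it
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

for model in $MODELS; do
  run binx plot --input "$RESULTS" --plot-type manhattan \
    --model "$model" --threshold 5 --output "manhattan_${model}.svg"
  run binx plot --input "$RESULTS" --plot-type qq \
    --model "$model" --output "qq_${model}.svg"
done
```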

Step 6: Identify QTLs

Extract significant QTLs, here using a 10,000,000 bp (10 Mb) window:

binx qtl \
  --input gwas_results.csv \
  --bp-window 10000000 \
  --output significant_qtls.csv
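This guide does not describe exactly how `binx qtl` uses `--bp-window`, so the sketch below does not try to reproduce that logic. It only lists the significant markers for one model, sorted by position, together with the distance to the previous significant marker on the same chromosome. That lets you see which markers lie closer together than the 10 Mb window. `sig_gaps` is a hypothetical helper. It assumes the Step 4 column order and treats score >= threshold as significant.

```bash
#!/usr/bin/env bash
set -eu

# sig_gaps FILE MODEL  (hypothetical helper, not a Binx command)
# Lists markers of MODEL whose score is at or above their threshold, sorted by
# chromosome and position, with the bp distance to the previous such marker on
# the same chromosome ("NA" for the first one).
# Assumes: marker_id,chrom,pos,model,score,p_value,effect,n_obs,threshold
sig_gaps() {
  tail -n +2 "$1" \
    | awk -F',' -v m="$2" '$4 == m && $5 + 0 >= $9 + 0' \
    | LC_ALL=C sort -t',' -k2,2 -k3,3n \
    | awk -F',' 'BEGIN { OFS = ","; print "marker_id,chrom,pos,score,gap_bp" }
        {
          gap = (NR > 1 && $2 == prev_chrom) ? $3 - prev_pos : "NA"
          print $1, $2, $3, $5, gap
          prev_chrom = $2
          prev_pos = $3
        }'
}

# Compare the gap_bp values with the 10000000 bp window passed to binx qtl
if [ -f gwas_results.csv ]; then
  sig_gaps gwas_results.csv additive
fi
```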

Complete Example Script

Here’s a complete analysis pipeline you can adapt:

#!/bin/bash
# Exit on the first failed command, unset variable, or failed pipeline stage
set -euo pipefail

# Define input files
VCF="data/samples.vcf.gz"
PHENO="data/phenotypes.csv"
TRAIT="yield"
PLOIDY=4

# Create output directory
mkdir -p results

# Step 1: Convert VCF to Binx format
binx convert \
  --vcf "$VCF" \
  --format gwaspoly \
  --output results/genotypes.tsv

# Step 2: Run GWAS with multiple models
binx gwas \
  --geno results/genotypes.tsv \
  --pheno "$PHENO" \
  --trait "$TRAIT" \
  --ploidy "$PLOIDY" \
  --models additive,general,1-dom,2-dom \
  --out results/gwas_results.csv

# Step 3: Generate plots
binx plot \
  --input results/gwas_results.csv \
  --plot-type manhattan \
  --model additive \
  --threshold 5 \
  --output results/manhattan.svg

binx plot \
  --input results/gwas_results.csv \
  --plot-type qq \
  --model additive \
  --output results/qq.svg

# Step 4: Extract QTLs
binx qtl \
  --input results/gwas_results.csv \
  --bp-window 10000000 \
  --output results/qtls.csv

echo "Analysis complete! Results in results/"
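After the script finishes, counting the markers at or above the threshold for each model gives a quick view of which of the four models produced hits. The sketch below assumes the Step 4 column order and treats score >= threshold as significant. `count_significant` is a hypothetical helper, not a Binx command.

```bash
#!/usr/bin/env bash
set -eu

# count_significant FILE  (hypothetical helper, not a Binx command)
# Prints "model<TAB>significant/tested" for each model in the results file.
# Assumes: marker_id,chrom,pos,model,score,p_value,effect,n_obs,threshold
count_significant() {
  awk -F',' 'NR > 1 {
      total[$4]++
      if ($5 + 0 >= $9 + 0) sig[$4]++
    }
    END { for (m in total) printf "%s\t%d/%d\n", m, sig[m], total[m] }' "$1" \
    | LC_ALL=C sort
}

if [ -f results/gwas_results.csv ]; then
  count_significant results/gwas_results.csv
fi
```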

Next Steps

Learn about Input Formats in detail
Explore Genetic Models available in Binx
Follow the First GWAS Tutorial for a more detailed walkthrough
Check the Command Reference for all options

