Quick Start
This guide will walk you through your first Binx analysis in under 5 minutes.
Overview
A typical Binx workflow looks like this:
VCF file → binx convert → Genotype file ─┐
                                         ├─→ binx gwas → Results → binx plot
Phenotype file ──────────────────────────┘
Step 1: Prepare Your Data
Binx requires two main input files:
Genotype File
A tab-separated file with marker information and sample dosages:
marker_id chrom pos Sample1 Sample2 Sample3
SNP001 1 1000 0 2 4
SNP002 1 2000 1 1 3
SNP003 2 1500 2 2 2
- First three columns: marker_id, chrom, pos
- Remaining columns: sample dosage values (0 to ploidy)
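Before running an analysis, it can help to confirm that a genotype file follows this layout. The sketch below is a hypothetical helper, not part of Binx: `check_genotypes` is an illustrative name, and it assumes every dosage cell is a whole number. If your data contains missing-value codes, the pattern would need to be extended.

```shell
# Hypothetical helper (not a Binx command): sanity-check a genotype TSV.
# Assumes tab-separated columns: marker_id, chrom, pos, then one column per sample.
check_genotypes() {
  # $1 = genotype file, $2 = ploidy
  awk -F'\t' -v ploidy="$2" '
    NR == 1 {
      if ($1 != "marker_id" || $2 != "chrom" || $3 != "pos") {
        print "bad header: expected marker_id, chrom, pos"
        bad = 1
        exit
      }
      next
    }
    {
      for (i = 4; i <= NF; i++) {
        # Accept only whole numbers between 0 and ploidy
        if ($i !~ /^[0-9]+$/ || $i + 0 > ploidy) {
          printf "line %d: invalid dosage \"%s\"\n", NR, $i
          bad = 1
        }
      }
    }
    END { exit bad }
  ' "$1"
}

# Example: write the three markers shown above to a temporary file and check them
geno="$(mktemp)"
printf 'marker_id\tchrom\tpos\tSample1\tSample2\tSample3\n' > "$geno"
printf 'SNP001\t1\t1000\t0\t2\t4\n' >> "$geno"
printf 'SNP002\t1\t2000\t1\t1\t3\n' >> "$geno"
printf 'SNP003\t2\t1500\t2\t2\t2\n' >> "$geno"
check_genotypes "$geno" 4 && echo "genotype file looks OK"
```

The function exits with a non-zero status when it finds a problem, so it can be used as a guard in a larger script.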
Phenotype File
A CSV/TSV file with sample IDs and trait values:
sample_id,yield,height,env
Sample1,45.2,120,field_A
Sample2,52.1,115,field_A
Sample3,48.7,125,field_B
- First column: sample_id (values must match the sample column headers in the genotype file)
- Remaining columns: traits and covariates
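Because the sample IDs have to line up across the two files, a quick comparison can catch mismatches early. The following is a minimal sketch, assuming a tab-separated genotype file and a comma-separated phenotype file; `compare_samples` is a hypothetical helper name, not a Binx command.

```shell
# Hypothetical helper (not a Binx command): report sample IDs found in only one file.
compare_samples() {
  # $1 = genotype TSV, $2 = phenotype CSV
  awk '
    # First file: only the header matters; sample IDs start at column 4
    FNR == NR {
      if (FNR == 1) {
        n = split($0, h, "\t")
        for (i = 4; i <= n; i++) geno[h[i]] = 1
      }
      next
    }
    # Second file: skip the header; the sample ID is the first comma-separated field
    FNR > 1 && $0 != "" {
      split($0, f, ",")
      pheno[f[1]] = 1
      if (!(f[1] in geno)) print "only in phenotype file: " f[1]
    }
    END {
      for (s in geno) if (!(s in pheno)) print "only in genotype file: " s
    }
  ' "$1" "$2"
}

# Example with a deliberate mismatch (Sample3 vs. Sample4)
dir="$(mktemp -d)"
printf 'marker_id\tchrom\tpos\tSample1\tSample2\tSample3\n' > "$dir/genotypes.tsv"
printf 'SNP001\t1\t1000\t0\t2\t4\n' >> "$dir/genotypes.tsv"
printf 'sample_id,yield\nSample1,45.2\nSample2,52.1\nSample4,48.7\n' > "$dir/phenotypes.csv"
compare_samples "$dir/genotypes.tsv" "$dir/phenotypes.csv"
# prints:
#   only in phenotype file: Sample4
#   only in genotype file: Sample3
```

Empty output means every sample ID appears in both files.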
Step 2: Convert VCF (if needed)
If your genotypes are in VCF format, convert them first:
binx convert \
--vcf your_data.vcf.gz \
--format gwaspoly \
--output genotypes.tsv
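If you are unsure whether a file needs converting, one option is to look at its first line: the VCF specification requires the file to begin with a `##fileformat=VCF...` header. The sketch below uses a hypothetical helper, `is_vcf`, and assumes that compressed files carry a `.gz` extension and that `gzip` is installed.

```shell
# Hypothetical helper (not a Binx command): succeed if a file starts with a VCF header.
is_vcf() {
  # $1 = path to a plain or gzip-compressed file
  case "$1" in
    *.gz) first_line="$(gzip -cd "$1" | head -n 1)" ;;
    *)    first_line="$(head -n 1 "$1")" ;;
  esac
  case "$first_line" in
    '##fileformat=VCF'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: a minimal VCF header versus a genotype table header
dir="$(mktemp -d)"
printf '##fileformat=VCFv4.2\n' > "$dir/example.vcf"
printf 'marker_id\tchrom\tpos\tSample1\n' > "$dir/genotypes.tsv"
for f in "$dir/example.vcf" "$dir/genotypes.tsv"; do
  if is_vcf "$f"; then
    echo "$(basename "$f"): VCF, convert first"
  else
    echo "$(basename "$f"): not VCF"
  fi
done
```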
Step 3: Run GWAS
Run a genome-wide association study:
binx gwas \
--geno genotypes.tsv \
--pheno phenotypes.csv \
--trait yield \
--ploidy 4 \
--models additive \
--out gwas_results.csv
Understanding the Parameters
| Parameter | Description |
|---|---|
| --geno | Path to genotype file |
| --pheno | Path to phenotype file |
| --trait | Column name of the trait to analyze |
| --ploidy | Ploidy level (2, 4, 6, etc.) |
| --models | Genetic models to test |
| --out | Output file path |
Step 4: Examine Results
The output CSV contains association results for each marker:
marker_id,chrom,pos,model,score,p_value,effect,n_obs,threshold
SNP001,1,1000,additive,4.49,3.2e-05,0.52,198,5.0
SNP002,1,2000,additive,0.33,0.47,0.08,200,5.0
...
Key columns:
- score: -log10-transformed p-value (useful for plotting)
- p_value: Association p-value
- effect: Effect size estimate
- n_obs: Sample size (non-missing)
- threshold: Significance threshold used
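For a quick look at the strongest associations without plotting, the results file can be filtered on the score column. This is a sketch using standard tools rather than a Binx feature: `top_hits` is a hypothetical helper, and the minimum score is passed by hand instead of being read from the threshold column.

```shell
# Hypothetical helper (not a Binx command): list markers whose score meets a cutoff.
top_hits() {
  # $1 = GWAS results CSV, $2 = minimum score (-log10 p-value) to report
  awk -F',' -v min="$2" '
    # Map column names to positions so the filter does not depend on column order
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    $(col["score"]) + 0 >= min + 0 {
      print $(col["marker_id"]), $(col["model"]), $(col["score"])
    }
  ' "$1"
}

# Example using the two result rows shown above
results="$(mktemp)"
cat > "$results" <<'EOF'
marker_id,chrom,pos,model,score,p_value,effect,n_obs,threshold
SNP001,1,1000,additive,4.49,3.2e-05,0.52,198,5.0
SNP002,1,2000,additive,0.33,0.47,0.08,200,5.0
EOF
top_hits "$results" 4
# prints: SNP001 additive 4.49
```

With a cutoff of 5, matching the threshold column in this example, neither marker would be reported.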
Step 5: Visualize Results
Create a Manhattan plot:
binx plot \
--input gwas_results.csv \
--plot-type manhattan \
--model additive \
--threshold 5 \
--output manhattan.svg
Create a QQ plot:
binx plot \
--input gwas_results.csv \
--plot-type qq \
--model additive \
--output qq.svg
Step 6: Identify QTLs
Extract significant QTLs:
binx qtl \
--input gwas_results.csv \
--bp-window 10000000 \
--output significant_qtls.csv
Complete Example Script
Here’s a complete analysis pipeline you can adapt:
#!/bin/bash
# Define input files
VCF="data/samples.vcf.gz"
PHENO="data/phenotypes.csv"
TRAIT="yield"
PLOIDY=4
# Create output directory
mkdir -p results
# Step 1: Convert VCF to Binx format
binx convert \
--vcf "$VCF" \
--format gwaspoly \
--output results/genotypes.tsv
# Step 2: Run GWAS with multiple models
binx gwas \
--geno results/genotypes.tsv \
--pheno "$PHENO" \
--trait "$TRAIT" \
--ploidy "$PLOIDY" \
--models additive,general,1-dom,2-dom \
--out results/gwas_results.csv
# Step 3: Generate plots
binx plot \
--input results/gwas_results.csv \
--plot-type manhattan \
--model additive \
--threshold 5 \
--output results/manhattan.svg
binx plot \
--input results/gwas_results.csv \
--plot-type qq \
--model additive \
--output results/qq.svg
# Step 4: Extract QTLs
binx qtl \
--input results/gwas_results.csv \
--bp-window 10000000 \
--output results/qtls.csv
echo "Analysis complete! Results in results/"
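When adapting the script, it may be worth failing early if the tool or the input files are missing, rather than partway through the pipeline. The following is one possible sketch: `require_command`, `require_file`, and `preflight` are hypothetical helper names, and `VCF` and `PHENO` are the variables defined at the top of the script.

```shell
# Hypothetical helpers (not Binx commands) for checking prerequisites.
require_command() {
  # Succeeds only if the named command is available on PATH
  command -v "$1" >/dev/null 2>&1 || {
    echo "error: '$1' not found on PATH" >&2
    return 1
  }
}

require_file() {
  # Succeeds only if the file exists and is readable
  [ -r "$1" ] || {
    echo "error: cannot read input file: $1" >&2
    return 1
  }
}

preflight() {
  # Stops at the first failed check and returns its status
  require_command binx &&
  require_file "$VCF" &&
  require_file "$PHENO"
}

# In the pipeline script, call this before the first binx command:
#   preflight || exit 1
```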
Next Steps
- Learn about Input Formats in detail
- Explore Genetic Models available in Binx
- Follow the First GWAS Tutorial for a more detailed walkthrough
- Check the Command Reference for all options