binx dosage

Estimate genotype dosages from sequencing read count data.

Synopsis

binx dosage --ploidy <INT> [INPUT OPTIONS] [OPTIONS]

Description

The dosage command estimates genotype dosages from read count data using algorithms based on the R/Updog package (Gerard et al., 2018). This is useful when working with genotyping-by-sequencing (GBS) or similar data where discrete genotype calls may be uncertain.
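
To make the idea of a dosage concrete, the following is a minimal Python sketch that picks the most likely dosage for a single sample from its read counts using a plain binomial likelihood. It is not the model binx runs: Updog-style models additionally account for allele bias and overdispersion, among other things. The function name naive_dosage and the seq_error parameter are hypothetical and exist only for this illustration.

```python
from math import comb, log

def naive_dosage(count, total, ploidy, seq_error=0.01):
    """Return the dosage (0..ploidy) that maximizes a plain binomial likelihood.

    `count` is the number of reads supporting the allele being counted and
    `total` is the read depth. `seq_error` is an assumed per-read error rate.
    """
    best_k, best_ll = 0, float("-inf")
    for k in range(ploidy + 1):
        # Expected read fraction for dosage k, nudged away from 0 and 1
        # so that a few erroneous reads do not give a zero likelihood.
        p = k / ploidy
        p = p * (1 - seq_error) + (1 - p) * seq_error
        ll = (log(comb(total, count))
              + count * log(p)
              + (total - count) * log(1 - p))
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k

# 8 of 16 reads in a tetraploid: half the reads, so two copies.
print(naive_dosage(8, 16, ploidy=4))  # 2
```

With low read depth several dosages can have similar likelihoods, which is why model-based estimation is preferred over thresholding read ratios.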

Required Arguments

| Argument | Description |
|---|---|
| `--ploidy <INT>` | Ploidy level (e.g., 2, 4, 6) |

Input Options

Choose one of the following input modes:

VCF Mode

binx dosage --vcf <FILE> --ploidy 4 --output dosages.tsv

| Option | Description |
|---|---|
| `--vcf <FILE>` | VCF file (plain or gzipped) with FORMAT/AD allele depths |
| `--chunk-size <INT>` | Chunk size for streaming VCF markers (default: stream one by one) |

Two-Line CSV Mode

binx dosage --csv <FILE> --ploidy 4 --output dosages.tsv

| Option | Description |
|---|---|
| `--csv <FILE>` | CSV file with alternating lines of Ref and Total counts per locus |

Matrix Mode

binx dosage --counts --ref-path ref.tsv --total-path total.tsv --ploidy 4 --output dosages.tsv

| Option | Description |
|---|---|
| `--counts` | Enable matrix mode |
| `--ref-path <FILE>` | Ref count matrix (markers in rows, samples in columns; first column marker ID) |
| `--total-path <FILE>` | Total count matrix (markers in rows, samples in columns; first column marker ID) |

Options

| Option | Default | Description |
|---|---|---|
| `--output <FILE>` | stdout | Output file path |
| `--mode <MODE>` | auto | Optimization mode (see below) |
| `--format <FMT>` | matrix | Output format (see below) |
| `--compress <MODE>` | none | Compression: none or gzip |
| `--threads <INT>` | num_cpus | Number of threads for parallel processing |
| `--verbose` | false | Enable verbose output |

Optimization Modes

| Mode | Description |
|---|---|
| `auto` | Automatically select best mode based on data |
| `updog` | Standard Updog algorithm |
| `updog-fast` | Faster Updog with approximations |
| `updog-exact` | Exact Updog (slower, more accurate) |
| `fast` | Fast estimation |
| `turbo` | Fastest estimation |
| `turboauto` | Turbo with automatic parameter selection |
| `turboauto-safe` | Turboauto with additional safety checks |

Output Formats

| Format | Description |
|---|---|
| `matrix` | Simple dosage matrix (markers x samples) |
| `stats` | Detailed statistics per marker |
| `beagle` | BEAGLE format for imputation |
| `vcf` | VCF format with dosage annotations |
| `plink` | PLINK raw format |
| `gwaspoly` | GWASpoly-compatible format (marker, chrom, pos, samples…) |

Examples

Basic Dosage Estimation from VCF

binx dosage \
  --vcf variants.vcf.gz \
  --ploidy 4 \
  --output dosages.tsv

Output in GWASpoly Format

binx dosage \
  --vcf variants.vcf.gz \
  --ploidy 4 \
  --format gwaspoly \
  --output genotypes.tsv

Parallel Processing with Chunks

binx dosage \
  --vcf variants.vcf.gz \
  --ploidy 4 \
  --chunk-size 1000 \
  --threads 8 \
  --output dosages.tsv
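
This page does not describe how binx handles chunks internally. As a rough illustration of what streaming markers in fixed-size chunks means in general, the Python sketch below groups any iterable of markers into lists of at most chunk_size items without loading the whole input first. The function name chunked is hypothetical and is not part of binx.

```python
from itertools import islice

def chunked(markers, chunk_size):
    """Yield successive lists of at most `chunk_size` items from `markers`."""
    iterator = iter(markers)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return  # input exhausted
        yield chunk

# Five markers in chunks of two: the last chunk is shorter.
for chunk in chunked(["m1", "m2", "m3", "m4", "m5"], 2):
    print(chunk)
```

In a scheme like this, each chunk can be handed to a pool of worker threads while the next one is still being read.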

From Two-Line CSV

binx dosage \
  --csv read_counts.csv \
  --ploidy 4 \
  --mode updog \
  --output dosages.tsv

From Separate Ref/Total Matrices

binx dosage \
  --counts \
  --ref-path ref_counts.tsv \
  --total-path total_counts.tsv \
  --ploidy 4 \
  --output dosages.tsv

Compressed Output

binx dosage \
  --vcf variants.vcf.gz \
  --ploidy 4 \
  --compress gzip \
  --output dosages.tsv.gz
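
A compressed result can be read downstream without unpacking it first. The Python sketch below assumes the file is a tab-separated matrix wrapped in standard gzip, which is an inference from the option names and not something this page states explicitly. The helper name read_dosage_matrix is hypothetical.

```python
import csv
import gzip

def read_dosage_matrix(path):
    """Read a tab-separated matrix, transparently handling a .gz suffix.

    Returns (header, rows), where every cell is still a string.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    return rows[0], rows[1:]

# Usage, assuming the command above has produced dosages.tsv.gz:
# header, rows = read_dosage_matrix("dosages.tsv.gz")
```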

Input Formats

VCF Format

The VCF file should contain the AD (Allelic Depths) field in the FORMAT column:

#CHROM  POS     ID      REF  ALT  QUAL  FILTER  INFO  FORMAT      Sample1   Sample2
chr1    1000    SNP001  A    T    .     .       .     GT:AD:DP    0/1:10,5:15   1/1:2,18:20
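
To show how AD values relate to the reference and total counts used by the other input modes, here is a minimal Python sketch that extracts them from one VCF record. It assumes a biallelic, tab-separated record with no missing AD values, and it takes the total as the sum of the AD entries; whether binx computes totals the same way is not stated here. The function name ad_to_counts is hypothetical.

```python
def ad_to_counts(vcf_line):
    """Return (marker_id, [(ref, total), ...]) for one tab-separated VCF record."""
    fields = vcf_line.rstrip("\n").split("\t")
    marker_id = fields[2]
    format_keys = fields[8].split(":")
    ad_index = format_keys.index("AD")  # raises ValueError if AD is absent
    counts = []
    for sample in fields[9:]:
        depths = [int(d) for d in sample.split(":")[ad_index].split(",")]
        # First AD entry is the reference depth; the sum is the total depth.
        counts.append((depths[0], sum(depths)))
    return marker_id, counts

record = "\t".join(["chr1", "1000", "SNP001", "A", "T", ".", ".", ".",
                    "GT:AD:DP", "0/1:10,5:15", "1/1:2,18:20"])
print(ad_to_counts(record))  # ('SNP001', [(10, 15), (2, 20)])
```

For the record above, the totals derived from AD match the DP values (15 and 20).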

Two-Line CSV Format

After a header row, each locus occupies two consecutive lines with the same ID: the first holds reference counts and the second holds total counts:

locus,Sample1,Sample2,Sample3
SNP001,10,2,15
SNP001,15,20,18
SNP002,8,12,5
SNP002,16,25,10
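
The pairing of lines can be checked before running binx. The Python sketch below splits a two-line CSV into separate reference and total tables and raises an error if a pair of lines does not share the same locus ID. It is an independent illustration, not binx's parser, and the function name split_two_line_csv is hypothetical.

```python
import csv
import io

def split_two_line_csv(text):
    """Split two-line CSV text into (ref_rows, total_rows), header included in both."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    header, body = rows[0], rows[1:]
    if len(body) % 2 != 0:
        raise ValueError("expected an even number of data lines")
    ref_rows, total_rows = [header], [header]
    # Lines alternate: reference counts first, then total counts.
    for ref_row, total_row in zip(body[0::2], body[1::2]):
        if ref_row[0] != total_row[0]:
            raise ValueError(f"locus mismatch: {ref_row[0]} vs {total_row[0]}")
        ref_rows.append(ref_row)
        total_rows.append(total_row)
    return ref_rows, total_rows

example = """locus,Sample1,Sample2,Sample3
SNP001,10,2,15
SNP001,15,20,18
SNP002,8,12,5
SNP002,16,25,10
"""
ref_rows, total_rows = split_two_line_csv(example)
print(ref_rows[1])    # ['SNP001', '10', '2', '15']
print(total_rows[1])  # ['SNP001', '15', '20', '18']
```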

Matrix Format

Two separate files with matching structure:

ref_counts.tsv:

marker_id	Sample1	Sample2	Sample3
SNP001	10	2	15
SNP002	8	12	5

total_counts.tsv:

marker_id	Sample1	Sample2	Sample3
SNP001	15	20	18
SNP002	16	25	10
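
A quick consistency check on the two matrices can catch mismatches early. In the Python sketch below, "matching structure" is read as identical header rows and identical marker IDs in the same order, which is an assumption about what binx expects. The check that no reference count exceeds its total follows from the definition of the counts. The function names read_matrix and check_pair are hypothetical.

```python
import csv
import io

def read_matrix(text):
    """Parse tab-separated matrix text into (header, {marker_id: [int, ...]})."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    data = {row[0]: [int(value) for value in row[1:]] for row in rows[1:]}
    return rows[0], data

def check_pair(ref_text, total_text):
    """Return a list of human-readable problems; an empty list means consistent."""
    ref_header, ref = read_matrix(ref_text)
    total_header, total = read_matrix(total_text)
    problems = []
    if ref_header != total_header:
        problems.append("headers differ")
    if list(ref) != list(total):
        problems.append("marker IDs differ")
    for marker, ref_counts in ref.items():
        for r, t in zip(ref_counts, total.get(marker, [])):
            if r > t:
                problems.append(f"{marker}: ref count {r} exceeds total {t}")
    return problems

ref_text = "marker_id\tSample1\tSample2\tSample3\nSNP001\t10\t2\t15\nSNP002\t8\t12\t5\n"
total_text = "marker_id\tSample1\tSample2\tSample3\nSNP001\t15\t20\t18\nSNP002\t16\t25\t10\n"
print(check_pair(ref_text, total_text))  # []
```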

Output Format

The default matrix format:

marker_id	Sample1	Sample2	Sample3
SNP001	1	4	2
SNP002	2	2	1

The gwaspoly format (suitable for binx gwas):

Marker	Chrom	Position	Sample1	Sample2	Sample3
SNP001	chr1	1000	1	4	2
SNP002	chr1	2000	2	2	1
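
As an example of consuming the default matrix output, the Python sketch below computes, for each marker, the mean dosage as a fraction of ploidy, which is an estimate of the frequency of the counted allele. It assumes integer dosages with no missing values; how binx encodes missing calls is not covered on this page. The function name allele_fraction is hypothetical.

```python
import csv
import io

def allele_fraction(matrix_text, ploidy):
    """Return {marker_id: mean dosage / ploidy} from tab-separated matrix text."""
    rows = [row for row in csv.reader(io.StringIO(matrix_text), delimiter="\t") if row]
    fractions = {}
    for marker, *values in rows[1:]:  # skip the header row
        dosages = [int(value) for value in values]
        fractions[marker] = sum(dosages) / (ploidy * len(dosages))
    return fractions

matrix_text = "marker_id\tSample1\tSample2\tSample3\nSNP001\t1\t4\t2\nSNP002\t2\t2\t1\n"
for marker, fraction in allele_fraction(matrix_text, ploidy=4).items():
    print(marker, round(fraction, 3))
```

For SNP001 the dosages sum to 7 across three tetraploid samples, so the fraction is 7/12.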

See Also