call-somatic-dna-vars

Call somatic DNA variants in a long-read WGS BAM file.

Usage

Template:

exacto call-somatic-dna-vars \
    --bam-file <bam_file> \
    --bam-bai-file <bam_bai_file> \
    --control-bam-files <control_bam_files> \
    --control-bam-bai-files <control_bam_bai_files> \
    --fasta-file <fasta_file> \
    --output-tsv-file <output_tsv_file> \
    [--preset PRESET] \
    [--num-threads NUM_THREADS] \
    [--regions REGIONS [REGIONS ...]] \
    [--min-reads MIN_READS] \
    [--min-mapping-quality MIN_MAPPING_QUALITY] \
    [--min-base-quality MIN_BASE_QUALITY] \
    [--min-total-depth MIN_TOTAL_DEPTH] \
    [--min-alt-allele-fraction MIN_ALT_ALLELE_FRACTION] \
    [--min-size-proportion MIN_SIZE_PROPORTION] \
    [--max-ins-norm-edit-distance MAX_INS_NORM_EDIT_DISTANCE] \
    [--max-intrachromosomal-distance MAX_INTRACHROMOSOMAL_DISTANCE] \
    [--max-intrachromosomal-distance-tau MAX_INTRACHROMOSOMAL_DISTANCE_TAU] \
    [--max-interchromosomal-distance MAX_INTERCHROMOSOMAL_DISTANCE] \
    [--max-slippage-repeat-length MAX_SLIPPAGE_REPEAT_LENGTH] \
    [--apply-infinite-sites-assumption APPLY_INFINITE_SITES_ASSUMPTION] \
    [--chunk-size CHUNK_SIZE] \
    [--max-records MAX_RECORDS] \
    [--expected-variant-allele-fraction EXPECTED_VARIANT_ALLELE_FRACTION] \
    [--expected-mutation-rate EXPECTED_MUTATION_RATE] \
    [--expected-sequencing-error EXPECTED_SEQUENCING_ERROR] \
    [--expected-slippage-probability EXPECTED_SLIPPAGE_PROBABILITY] \
    [--max-f1-fraction MAX_F1_FRACTION] \
    [--max-fpr MAX_FPR] \
    [--temp-dir TEMP_DIR]

Example:

exacto call-somatic-dna-vars \
    --bam-file tumor_dna.sorted.bam \
    --bam-bai-file tumor_dna.sorted.bam.bai \
    --control-bam-files normal_dna.sorted.bam \
    --control-bam-bai-files normal_dna.sorted.bam.bai \
    --fasta-file reference_genome.fasta \
    --preset pb \
    --output-tsv-file tumor_somatic_dna_variants.tsv

Description

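Call somatic DNA variants in a long-read WGS BAM file (for example, a tumor sample) by comparing it against one or more control BAM files (for example, a matched normal sample). The calls are written to a TSV file with one row per somatic variant.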

Note: At a glance

Inputs: *.bam, *.bam.bai, *.fasta, plus one or more control *.bam / *.bam.bai

Outputs: *.tsv (one row per somatic DNA variant call)

Typical next step: annotate-vars

Required arguments

| Flag | Type | Description |
|---|---|---|
| --bam-file | str | Input BAM file. |
| --bam-bai-file | str | Input BAM.BAI file. |
| --control-bam-files | str | Input control BAM file(s) (e.g. --control-bam-files BAM_FILE_1 BAM_FILE_2). |
| --control-bam-bai-files | str | Input control BAM.BAI file(s) (e.g. --control-bam-bai-files BAM_BAI_FILE_1 BAM_BAI_FILE_2). |
| --fasta-file | str | Input reference genome FASTA file. |
| --output-tsv-file | str | Output TSV file. |
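To use more than one control sample, list the files after a single flag, as the --control-bam-files and --control-bam-bai-files rows above show. The Python sketch below assembles such a command as an argument list. It assumes the index files are given in the same order as their BAM files, and the second normal sample name (normal2_dna.sorted.bam) is hypothetical.

```python
import shlex
from typing import List, Sequence


def build_somatic_call_cmd(
    bam: str,
    bai: str,
    control_bams: Sequence[str],
    control_bais: Sequence[str],
    fasta: str,
    out_tsv: str,
) -> List[str]:
    """Assemble argv for `exacto call-somatic-dna-vars` with one or more controls.

    Assumption: control index files are listed in the same order as the
    control BAM files, so the i-th BAI belongs to the i-th BAM.
    """
    if not control_bams:
        raise ValueError("at least one control BAM file is required")
    if len(control_bams) != len(control_bais):
        raise ValueError("provide exactly one BAI file per control BAM file")
    return [
        "exacto", "call-somatic-dna-vars",
        "--bam-file", bam,
        "--bam-bai-file", bai,
        # Multiple controls are passed as space-separated values after one flag.
        "--control-bam-files", *control_bams,
        "--control-bam-bai-files", *control_bais,
        "--fasta-file", fasta,
        "--output-tsv-file", out_tsv,
    ]


if __name__ == "__main__":
    cmd = build_somatic_call_cmd(
        "tumor_dna.sorted.bam", "tumor_dna.sorted.bam.bai",
        ["normal_dna.sorted.bam", "normal2_dna.sorted.bam"],  # second name is hypothetical
        ["normal_dna.sorted.bam.bai", "normal2_dna.sorted.bam.bai"],
        "reference_genome.fasta", "tumor_somatic_dna_variants.tsv",
    )
    print(shlex.join(cmd))  # inspect the command line before running it
```

Passing the list to subprocess.run(cmd, check=True) would run the command. Building an argument list first avoids shell-quoting problems with unusual file names.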

Optional arguments

| Flag | Type | Default | Description |
|---|---|---|---|
| --preset | str | | Sequencing-platform preset that fills in platform-typical defaults for --expected-sequencing-error and --expected-slippage-probability. Choices: ‘pb’ (PacBio HiFi) or ‘ont’ (Oxford Nanopore). Any explicitly set parameter takes precedence over the preset. |
| --num-threads | int | 4 | Number of threads. |
| --regions | str | | Genomic regions in which to identify variants (e.g. --regions chr1 chr2 or --regions chr1:1-1000000 chr2:1-1000000). If unspecified, Exacto identifies variants in all contigs found in the BAM file (--bam-file BAM_FILE). |
| --min-reads | int | 3 | Minimum number of supporting reads. |
| --min-mapping-quality | int | 4 | Minimum mapping quality. |
| --min-base-quality | int | 30 | Minimum base quality. |
| --min-total-depth | int | 3 | Minimum total depth. |
| --min-alt-allele-fraction | float | 0.2 | Minimum alternate allele fraction. |
| --min-size-proportion | float | 0.5 | Minimum size proportion between two variants, where size proportion = smaller variant size / larger variant size. |
| --max-ins-norm-edit-distance | float | 0.5 | Maximum normalized edit (Levenshtein) distance between two insertions, where normalized edit distance = edit distance / length of the longer insertion. |
| --max-intrachromosomal-distance | int | 1000 | Maximum distance for clustering intrachromosomal variants. |
| --max-intrachromosomal-distance-tau | int | 2000 | Maximum distance tau for clustering intrachromosomal variants. |
| --max-interchromosomal-distance | int | 1000 | Maximum distance for clustering interchromosomal variants. |
| --max-slippage-repeat-length | int | 30 | Maximum slippage repeat length. |
| --apply-infinite-sites-assumption | str2bool | yes | If ‘yes’, apply the infinite sites assumption to variant calling: a variant in the BAM file that shares a breakpoint with any variant in any of the control BAM files is filtered out. |
| --chunk-size | int | 100000 | Chunk size for variant calling. |
| --max-records | int | 7 | Maximum number of records per read name. Read names with more records than this value are excluded. |
| --expected-variant-allele-fraction | float | 0.25 | Expected variant allele fraction. |
| --expected-mutation-rate | float | 1e-06 | Expected mutation rate. |
| --expected-sequencing-error | float | 0.01 | Expected sequencing error rate. |
| --expected-slippage-probability | float | 0.02 | Expected slippage probability. Suggested value: 2x the expected sequencing error rate. |
| --max-f1-fraction | float | 0.99 | Maximum F1 value fraction. |
| --max-fpr | float | 1e-06 | Maximum false positive rate. |
| --temp-dir | str | | Temporary directory. |
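The --preset row describes a three-level precedence: built-in defaults, then platform values from the preset, then anything set explicitly on the command line. The sketch below shows only that merge order. The parameter names are taken from the flags, but the function itself is hypothetical, and no real ‘pb’ or ‘ont’ values are included because this page does not list them.

```python
from typing import Dict, Optional

# Parameters the preset is documented to fill in.
PRESET_PARAMS = ("expected_sequencing_error", "expected_slippage_probability")


def resolve_preset(
    builtin_defaults: Dict[str, float],
    preset_values: Dict[str, float],
    explicit: Dict[str, Optional[float]],
) -> Dict[str, float]:
    """Merge parameters with precedence: explicit > preset > built-in default.

    `explicit` maps parameter names to user-supplied values, with None meaning
    "not given on the command line". `preset_values` holds the platform values
    for the chosen preset (supplied by the caller; not Exacto's actual numbers).
    """
    resolved = dict(builtin_defaults)
    # The preset only overrides the parameters it is documented to fill.
    resolved.update({k: v for k, v in preset_values.items() if k in PRESET_PARAMS})
    # Anything the user set explicitly wins over both.
    resolved.update({k: v for k, v in explicit.items() if v is not None})
    return resolved
```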
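The --regions row accepts either a whole contig (chr1) or a contig with a coordinate range (chr1:1-1000000). The following is a minimal parser for those two forms. Whether Exacto treats the coordinates as 1-based or 0-based, and whether it accepts other spellings, is not stated here, so the `Region` type and `parse_region` function are illustrative assumptions.

```python
import re
from typing import NamedTuple, Optional


class Region(NamedTuple):
    contig: str
    start: Optional[int]  # None when the whole contig is requested
    end: Optional[int]


# Either "contig" or "contig:start-end".
_REGION_RE = re.compile(r"(?P<contig>[^:\s]+)(?::(?P<start>\d+)-(?P<end>\d+))?")


def parse_region(text: str) -> Region:
    m = _REGION_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"unrecognized region: {text!r}")
    if m.group("start") is None:
        return Region(m.group("contig"), None, None)
    start, end = int(m.group("start")), int(m.group("end"))
    if start > end:
        raise ValueError(f"region start exceeds end: {text!r}")
    return Region(m.group("contig"), start, end)
```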
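The --min-size-proportion and --max-ins-norm-edit-distance rows define two similarity measures. The sketch below computes both as defined above, using a standard dynamic-programming Levenshtein distance. The `insertions_match` helper, which requires both thresholds to pass, is a hypothetical illustration of how the two might be combined; the table does not say exactly how Exacto uses them together. Insertion sequences are assumed to be non-empty.

```python
def size_proportion(size_a: int, size_b: int) -> float:
    """Smaller variant size divided by larger variant size (assumes sizes > 0)."""
    smaller, larger = sorted((size_a, size_b))
    return smaller / larger


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def normalized_edit_distance(ins_a: str, ins_b: str) -> float:
    """Edit distance divided by the length of the longer insertion."""
    longer = max(len(ins_a), len(ins_b))
    return levenshtein(ins_a, ins_b) / longer if longer else 0.0


def insertions_match(ins_a: str, ins_b: str,
                     min_size_proportion: float = 0.5,
                     max_norm_edit_distance: float = 0.5) -> bool:
    """Hypothetical combination: both thresholds must be satisfied."""
    return (size_proportion(len(ins_a), len(ins_b)) >= min_size_proportion
            and normalized_edit_distance(ins_a, ins_b) <= max_norm_edit_distance)
```

With the defaults, two 8 bp insertions that differ at one base (normalized distance 1/8) match, while a 4 bp and a 12 bp insertion (size proportion 1/3) do not.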
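The --apply-infinite-sites-assumption row filters out any sample variant that shares a breakpoint with a variant seen in any control. Here is a minimal sketch of that filter. It assumes a hypothetical `Variant` type whose breakpoints are (contig, position) pairs and uses exact breakpoint equality; Exacto may match breakpoints with a tolerance instead.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

# Hypothetical representation: a breakpoint is (contig, position).
Breakpoint = Tuple[str, int]


@dataclass(frozen=True)
class Variant:
    variant_id: str
    breakpoints: Tuple[Breakpoint, ...]  # e.g. two for an SV spanning two loci


def apply_infinite_sites(
    sample_variants: Iterable[Variant],
    control_variant_sets: Iterable[Iterable[Variant]],
) -> List[Variant]:
    """Drop sample variants sharing any breakpoint with any control variant.

    Assumes exact breakpoint equality; a real caller may allow a tolerance.
    """
    control_breakpoints: Set[Breakpoint] = set()
    for control_variants in control_variant_sets:  # one iterable per control BAM
        for v in control_variants:
            control_breakpoints.update(v.breakpoints)
    return [v for v in sample_variants
            if control_breakpoints.isdisjoint(v.breakpoints)]
```

Collecting every control breakpoint into one set makes each membership check constant time, so the filter scales with the total number of variants rather than with the product of sample and control counts.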
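The --chunk-size row gives only a number (100000). One plausible reading is that each contig is split into fixed-size windows that are processed independently. The sketch below generates such windows under that assumption, using 0-based, half-open coordinates; it is an illustration, not a description of Exacto's internals.

```python
from typing import Iterator, Tuple


def chunk_intervals(contig_length: int, chunk_size: int = 100_000) -> Iterator[Tuple[int, int]]:
    """Yield 0-based, half-open (start, end) windows covering a contig."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, contig_length, chunk_size):
        # The last window is truncated at the contig end.
        yield start, min(start + chunk_size, contig_length)
```

For a 250,000 bp contig this yields three windows: (0, 100000), (100000, 200000) and (200000, 250000).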
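The --max-records row excludes read names that appear in more than the given number of records. In a BAM file one read can have several records (primary, secondary and supplementary alignments), so a heavily split long read produces many. The sketch below counts records per read name and returns those over the limit. The function name is hypothetical, and how Exacto gathers the names is not stated here.

```python
from collections import Counter
from typing import Iterable, Set


def read_names_to_exclude(read_names: Iterable[str], max_records: int = 7) -> Set[str]:
    """Return read names that occur in more than `max_records` alignment records."""
    counts = Counter(read_names)
    return {name for name, n in counts.items() if n > max_records}
```

With pysam, the input could be the `query_name` of each record returned by `pysam.AlignmentFile(path).fetch(...)`.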