call-somatic-dna-vars
Call somatic DNA variants in a long-read WGS BAM file.
Usage
Template:
exacto call-somatic-dna-vars \
--bam-file <bam_file> \
--bam-bai-file <bam_bai_file> \
--control-bam-files <control_bam_files> \
--control-bam-bai-files <control_bam_bai_files> \
--fasta-file <fasta_file> \
--output-tsv-file <output_tsv_file> \
[--preset PRESET] \
[--num-threads NUM_THREADS] \
[--regions REGIONS [REGIONS ...]] \
[--min-reads MIN_READS] \
[--min-mapping-quality MIN_MAPPING_QUALITY] \
[--min-base-quality MIN_BASE_QUALITY] \
[--min-total-depth MIN_TOTAL_DEPTH] \
[--min-alt-allele-fraction MIN_ALT_ALLELE_FRACTION] \
[--min-size-proportion MIN_SIZE_PROPORTION] \
[--max-ins-norm-edit-distance MAX_INS_NORM_EDIT_DISTANCE] \
[--max-intrachromosomal-distance MAX_INTRACHROMOSOMAL_DISTANCE] \
[--max-intrachromosomal-distance-tau MAX_INTRACHROMOSOMAL_DISTANCE_TAU] \
[--max-interchromosomal-distance MAX_INTERCHROMOSOMAL_DISTANCE] \
[--max-slippage-repeat-length MAX_SLIPPAGE_REPEAT_LENGTH] \
[--apply-infinite-sites-assumption APPLY_INFINITE_SITES_ASSUMPTION] \
[--chunk-size CHUNK_SIZE] \
[--max-records MAX_RECORDS] \
[--expected-variant-allele-fraction EXPECTED_VARIANT_ALLELE_FRACTION] \
[--expected-mutation-rate EXPECTED_MUTATION_RATE] \
[--expected-sequencing-error EXPECTED_SEQUENCING_ERROR] \
[--expected-slippage-probability EXPECTED_SLIPPAGE_PROBABILITY] \
[--max-f1-fraction MAX_F1_FRACTION] \
[--max-fpr MAX_FPR] \
[--temp-dir TEMP_DIR]
Example:
exacto call-somatic-dna-vars \
--bam-file tumor_dna.sorted.bam \
--bam-bai-file tumor_dna.sorted.bam.bai \
--control-bam-files normal_dna.sorted.bam \
--control-bam-bai-files normal_dna.sorted.bam.bai \
--fasta-file reference_genome.fasta \
--preset pb \
--output-tsv-file tumor_somatic_dna_variants.tsv
Description
Call somatic DNA variants in a long-read WGS BAM file.
Note: At a glance
Inputs: *.bam, *.bam.bai, *.fasta, plus one or more control *.bam / *.bam.bai
Outputs: *.tsv (one row per somatic DNA variant call)
Typical next step: annotate-vars
Required arguments
| Flag | Type | Description |
|---|---|---|
| --bam-file | str | Input BAM file. |
| --bam-bai-file | str | Input BAM.BAI file. |
| --control-bam-files | str | Input control BAM file(s) (e.g. --control-bam-files BAM_FILE_1 BAM_FILE_2). |
| --control-bam-bai-files | str | Input control BAM.BAI file(s) (e.g. --control-bam-bai-files BAM_BAI_FILE_1 BAM_BAI_FILE_2). |
| --fasta-file | str | Input reference genome FASTA file. |
| --output-tsv-file | str | Output TSV file. |
Optional arguments
| Flag | Type | Default | Description |
|---|---|---|---|
| --preset | str (pb\|ont) |  | Sequencing-platform preset that fills in platform-typical defaults for --expected-sequencing-error and --expected-slippage-probability. Choices: ‘pb’ (PacBio HiFi) or ‘ont’ (Oxford Nanopore). Any explicitly set parameter overrides the preset. |
| --num-threads | int | 4 | Number of threads. |
| --regions | str |  | Genomic regions in which to identify variants (e.g. --regions chr1 chr2 or --regions chr1:1-1000000 chr2:1-1000000). If unspecified, Exacto identifies variants in all contigs found in the BAM file (--bam-file BAM_FILE). |
| --min-reads | int | 3 | Minimum number of supporting reads. |
| --min-mapping-quality | int | 4 | Minimum mapping quality. |
| --min-base-quality | int | 30 | Minimum base quality. |
| --min-total-depth | int | 3 | Minimum total depth. |
| --min-alt-allele-fraction | float | 0.2 | Minimum alternate allele fraction. |
| --min-size-proportion | float | 0.5 | Minimum size proportion between two variants. Size proportion = smaller variant size / larger variant size. |
| --max-ins-norm-edit-distance | float | 0.5 | Maximum normalized edit (Levenshtein) distance between insertions. Normalized edit distance = edit distance / longer insertion size. |
| --max-intrachromosomal-distance | int | 1000 | Maximum distance for clustering intrachromosomal variants. |
| --max-intrachromosomal-distance-tau | int | 2000 | Maximum distance tau for clustering intrachromosomal variants. |
| --max-interchromosomal-distance | int | 1000 | Maximum distance for clustering interchromosomal variants. |
| --max-slippage-repeat-length | int | 30 | Maximum slippage repeat length. |
| --apply-infinite-sites-assumption | str2bool | yes | If ‘yes’, apply the infinite sites assumption during variant calling: a variant in the BAM file that shares a breakpoint with any variant in any of the control BAM files is filtered out. |
| --chunk-size | int | 100000 | Chunk size for variant calling. |
| --max-records | int | 7 | Maximum number of records per read name. Read names with more records than this value are excluded. |
| --expected-variant-allele-fraction | float | 0.25 | Expected variant allele fraction. |
| --expected-mutation-rate | float | 1e-06 | Expected mutation rate. |
| --expected-sequencing-error | float | 0.01 | Expected sequencing error rate. |
| --expected-slippage-probability | float | 0.02 | Expected slippage probability. Suggested: 2x the expected sequencing error rate. |
| --max-f1-fraction | float | 0.99 | Maximum F1 value fraction. |
| --max-fpr | float | 1e-06 | Maximum false positive rate. |
| --temp-dir | str |  | Temp directory. |
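
The two formulas given for --min-size-proportion and --max-ins-norm-edit-distance can be sketched in Python as follows. This is a minimal illustration of the arithmetic only, not Exacto's implementation: the function names (size_proportion, levenshtein, normalized_edit_distance) are hypothetical, and how Exacto applies the resulting values internally is not described on this page.

```python
def size_proportion(size_a: int, size_b: int) -> float:
    """Smaller variant size divided by larger variant size (range 0 to 1)."""
    if size_a <= 0 or size_b <= 0:
        raise ValueError("variant sizes must be positive")
    return min(size_a, size_b) / max(size_a, size_b)


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance (insertions, deletions, substitutions cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion from a
                    current[j - 1] + 1,      # insertion into a
                    previous[j - 1] + cost,  # match or substitution
                )
            )
        previous = current
    return previous[-1]


def normalized_edit_distance(ins_a: str, ins_b: str) -> float:
    """Edit distance divided by the length of the longer insertion."""
    longer = max(len(ins_a), len(ins_b))
    if longer == 0:
        return 0.0  # assumption: two empty sequences count as identical
    return levenshtein(ins_a, ins_b) / longer


if __name__ == "__main__":
    # An 80 bp variant versus a 100 bp variant.
    print(size_proportion(80, 100))  # 0.8
    # Two 8 bp insertions that differ at one base.
    print(normalized_edit_distance("ACGTACGT", "ACGTTCGT"))  # 0.125
```

With the documented defaults, a size proportion of 0.8 lies above --min-size-proportion (0.5) and a normalized edit distance of 0.125 lies below --max-ins-norm-edit-distance (0.5).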