annotate-vars

Annotate variants.

Usage

Template:

exacto annotate-vars \
    --tsv-file <tsv_file> \
    --reference-gene-annotation-file <reference_gene_annotation_file> \
    --reference-gene-annotation-source <reference_gene_annotation_source> \
    --reference-gene-annotation-assembly <reference_gene_annotation_assembly> \
    --reference-gene-annotation-version <reference_gene_annotation_version> \
    --output-tsv-file <output_tsv_file> \
    [--num-threads NUM_THREADS] \
    [--gene-types GENE_TYPES [GENE_TYPES ...]] \
    [--gene-levels GENE_LEVELS [GENE_LEVELS ...]] \
    [--transcript-types TRANSCRIPT_TYPES [TRANSCRIPT_TYPES ...]] \
    [--transcript-levels TRANSCRIPT_LEVELS [TRANSCRIPT_LEVELS ...]]

Example:

exacto annotate-vars \
    --tsv-file tumor_specific_dna_variants.tsv \
    --reference-gene-annotation-file gencode.gtf.gz \
    --reference-gene-annotation-source gencode \
    --reference-gene-annotation-assembly hg38 \
    --reference-gene-annotation-version v44 \
    --output-tsv-file tumor_specific_dna_variants.annotated.tsv

Description

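Annotate the variants in the input TSV file with gene-level information from a reference gene annotation file (GTF), and write the annotated variants to the output TSV file.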

Note: At a glance

Inputs: *.tsv (DNA or RNA variant calls), *.gtf.gz (gene annotation)

Outputs: *.tsv (variants with gene-level annotation)

Typical next steps: integrate-vars, build-genome-var-graph

Required arguments

| Flag | Type | Description |
| --- | --- | --- |
| --tsv-file | str | Input TSV file. Expected columns: ‘variant_call_id’, ‘chromosome_1’, ‘position_1’, ‘chromosome_2’, ‘position_2’, ‘variant_type’, ‘variant_sequence’. |
| --reference-gene-annotation-file | str | Reference gene annotation file. |
| --reference-gene-annotation-source | str | Reference gene annotation source. |
| --reference-gene-annotation-assembly | str | Reference gene annotation assembly (e.g. ‘hg38’). |
| --reference-gene-annotation-version | str | Reference gene annotation version (e.g. ‘v41’). |
| --output-tsv-file | str | Output TSV file. |
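
Before running the command, it can help to confirm that the input file carries the columns listed for --tsv-file. The following is a minimal Python sketch, not part of exacto: the function name `missing_columns` is hypothetical, and it assumes the TSV is uncompressed and that its first line is a tab-separated header row.

```python
import csv

# Column names copied from the --tsv-file description above.
EXPECTED_COLUMNS = [
    "variant_call_id",
    "chromosome_1",
    "position_1",
    "chromosome_2",
    "position_2",
    "variant_type",
    "variant_sequence",
]


def missing_columns(tsv_path):
    """Return the expected columns that are absent from the TSV header.

    Assumption: the first line of the file is a tab-separated header row.
    An empty file is treated as having no columns at all.
    """
    with open(tsv_path, newline="") as handle:
        header = next(csv.reader(handle, delimiter="\t"), [])
    return [name for name in EXPECTED_COLUMNS if name not in header]
```

An empty list means every expected column is present; any names returned are the ones to add or rename before calling annotate-vars.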

Optional arguments

| Flag | Type | Default | Description |
| --- | --- | --- | --- |
| --num-threads | int | 4 | Number of threads. |
| --gene-types | str | protein_coding | Reference gene types to include in annotation. |
| --gene-levels | int | 1, 2 | Reference gene levels to include in annotation. |
| --transcript-types | str | protein_coding | Reference transcript types to include in annotation. |
| --transcript-levels | int | 1, 2 | Reference transcript levels to include in annotation. |
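
The usage template shows that the type and level filters each accept one or more space-separated values after a single flag. As an illustration of how such a call might be assembled from a script, here is a minimal Python sketch. The helper `build_annotate_vars_command` is hypothetical and not part of exacto; it only builds the argument list from the flags documented above and does not run anything.

```python
import shlex


def build_annotate_vars_command(
    tsv_file,
    annotation_file,
    source,
    assembly,
    version,
    output_tsv_file,
    num_threads=None,
    gene_types=None,
    gene_levels=None,
    transcript_types=None,
    transcript_levels=None,
):
    """Build the argument list for `exacto annotate-vars` (hypothetical helper)."""
    command = [
        "exacto", "annotate-vars",
        "--tsv-file", tsv_file,
        "--reference-gene-annotation-file", annotation_file,
        "--reference-gene-annotation-source", source,
        "--reference-gene-annotation-assembly", assembly,
        "--reference-gene-annotation-version", version,
        "--output-tsv-file", output_tsv_file,
    ]
    if num_threads is not None:
        command += ["--num-threads", str(num_threads)]
    # Multi-value flags: the flag appears once, followed by all of its values.
    multi_value_flags = (
        ("--gene-types", gene_types),
        ("--gene-levels", gene_levels),
        ("--transcript-types", transcript_types),
        ("--transcript-levels", transcript_levels),
    )
    for flag, values in multi_value_flags:
        if values:
            command += [flag, *(str(value) for value in values)]
    return command


# Same inputs as the Example above, with two optional arguments set explicitly.
command = build_annotate_vars_command(
    "tumor_specific_dna_variants.tsv",
    "gencode.gtf.gz",
    "gencode",
    "hg38",
    "v44",
    "tumor_specific_dna_variants.annotated.tsv",
    num_threads=8,
    gene_levels=[1, 2],
)
print(shlex.join(command))
```

Omitted optional arguments are left off the command line entirely, so the defaults in the table above apply. The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where exacto is installed.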