build-genome-var-graph

Build a genome variation graph.

Usage

Template:

exacto build-genome-var-graph \
    --variants-tsv-file <variants_tsv_file> \
    --fasta-file <fasta_file> \
    --output-fasta-file <output_fasta_file> \
    --sequence-prefix <sequence_prefix> \
    [--remove-unknown-bases REMOVE_UNKNOWN_BASES] \
    [--only-variant-sequences ONLY_VARIANT_SEQUENCES] \
    [--graph-type GRAPH_TYPE] \
    [--num-threads NUM_THREADS]

Example:

exacto build-genome-var-graph \
    --variants-tsv-file tumor_dna_rna_variants_integrated.tsv \
    --fasta-file reference_genome.fasta \
    --output-fasta-file genome_var_graph.fasta \
    --sequence-prefix genome_var_graph
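
When the command is driven from a script, it can help to assemble the argument list programmatically. The following is a minimal Python sketch that uses only the flags listed in the template above. The helper name `build_command`, the output file name, and the prefix value are hypothetical and chosen for illustration.

```python
import shlex


def build_command(variants_tsv, fasta, output_fasta, sequence_prefix, **optional):
    """Assemble the argv list for `exacto build-genome-var-graph`."""
    cmd = [
        "exacto", "build-genome-var-graph",
        "--variants-tsv-file", variants_tsv,
        "--fasta-file", fasta,
        "--output-fasta-file", output_fasta,
        "--sequence-prefix", sequence_prefix,
    ]
    # Optional flags: graph_type="population" becomes "--graph-type population".
    for name, value in optional.items():
        cmd += ["--" + name.replace("_", "-"), str(value)]
    return cmd


cmd = build_command(
    "tumor_dna_rna_variants_integrated.tsv",
    "reference_genome.fasta",
    "genome_var_graph.fasta",  # hypothetical output file name
    "sample1",                 # hypothetical sequence prefix
    graph_type="population",
    num_threads=8,
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file paths that contain spaces.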

Description

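Build a genome variation graph from a variants TSV file and a reference genome FASTA file, which serves as the graph backbone. The result is written to the FASTA file given by --output-fasta-file.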

At a glance

Inputs: *.tsv (variants), *.fasta (reference genome)

Outputs: *.fasta (written to --output-fasta-file)

Typical next step: Endpoint. The output feeds downstream graph-aware analyses.

Required arguments

Flag                  Type  Description
--variants-tsv-file   str   Variants TSV file. Expected columns: ‘variant_id’, ‘chromosome_1’, ‘position_1’, ‘operation_1’, ‘strand_1’, ‘chromosome_2’, ‘position_2’, ‘operation_2’, ‘strand_2’, ‘sequence’.
--fasta-file          str   Reference genome FASTA file (variation graph backbone).
--output-fasta-file   str   Output FASTA file.
--sequence-prefix     str   Sequence prefix.
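
Before running the command, it may be worth confirming that the variants TSV file has the columns listed for --variants-tsv-file. The sketch below checks only the header row. It assumes the file is tab-delimited with a header line, and the function name `missing_columns` is hypothetical, not part of exacto.

```python
import csv
import io

# Column names taken from the --variants-tsv-file description.
EXPECTED_COLUMNS = [
    "variant_id", "chromosome_1", "position_1", "operation_1", "strand_1",
    "chromosome_2", "position_2", "operation_2", "strand_2", "sequence",
]


def missing_columns(handle):
    """Return the expected columns that are absent from the TSV header row."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, [])  # empty file -> every column is reported missing
    return [column for column in EXPECTED_COLUMNS if column not in header]


# In practice: with open("variants.tsv", newline="") as fh: missing_columns(fh)
sample = io.StringIO("variant_id\tchromosome_1\tposition_1\tsequence\n")
print(missing_columns(sample))
```

An empty list means every expected column is present. The check says nothing about whether the values in each column are valid.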

Optional arguments

Flag                      Type      Default     Description
--remove-unknown-bases    str2bool  yes         If ‘yes’, then unknown nucleotides (‘N’ or ‘n’) are removed.
--only-variant-sequences  str2bool  no          If ‘yes’, then only variant sequences will be output.
--graph-type              str       individual  Variation graph type. Either ‘individual’ or ‘population’.
--num-threads             int       4           Number of threads.
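
To illustrate the str2bool flags, the sketch below shows one common way a yes/no argument type is written with argparse, together with the base removal described for --remove-unknown-bases. This is an illustration under assumptions, not exacto's implementation. The accepted spellings other than ‘yes’ and ‘no’ and the function name `remove_unknown_bases` are hypothetical.

```python
import argparse


def str2bool(value):
    """Parse a yes/no style string into a bool (accepted spellings are an assumption)."""
    text = str(value).strip().lower()
    if text in {"yes", "y", "true", "t", "1"}:
        return True
    if text in {"no", "n", "false", "f", "0"}:
        return False
    raise argparse.ArgumentTypeError(f"expected yes/no, got {value!r}")


def remove_unknown_bases(sequence):
    """Drop unknown nucleotides ('N' or 'n') from a sequence."""
    return sequence.replace("N", "").replace("n", "")


parser = argparse.ArgumentParser()
parser.add_argument("--remove-unknown-bases", type=str2bool, default=True)
args = parser.parse_args(["--remove-unknown-bases", "no"])

seq = "ACGNNTnA"
# With 'no', the sequence is left untouched; with 'yes', N/n would be dropped.
print(remove_unknown_bases(seq) if args.remove_unknown_bases else seq)
```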