API Reference

The arguments for the CLI and Python API are the same. The CLI is just a wrapper around the Python API.

Command line interface

dreem vector

Hook the command line interface to the `run` function.

dreem vector [OPTIONS]

Options

--fasta <fasta>

FASTA file of all reference sequences in the project

--bamf <bamf>

BAM file

--bamd <bamd>

Directory of BAM files

--library <library>

Library CSV file

--autosect, --no-autosect

Whether to automatically generate coordinates covering the entire reference sequence for every reference that was not explicitly given at least one section (by --coords or --primers)

-c, --coords <coords>

Reference name, 5’ end, and 3’ end of a section; coordinates are 1-indexed and include both ends
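Section coordinates are 1-indexed and inclusive, whereas Python slices are 0-indexed and half-open, so converting between the two is an easy place to introduce off-by-one errors. A minimal sketch of the conversion follows; the helper `section_seq` and the reference sequence are hypothetical illustrations, not part of dreem.

```python
def section_seq(ref_seq: str, end5: int, end3: int) -> str:
    """Return the bases from end5 to end3 (both 1-indexed, inclusive)."""
    if not 1 <= end5 <= end3 <= len(ref_seq):
        raise ValueError(f"Invalid coordinates {end5}-{end3} for a "
                         f"sequence of length {len(ref_seq)}")
    # Only the 5' end is shifted: a half-open slice already stops
    # before index end3, which is the base at 1-indexed position end3.
    return ref_seq[end5 - 1: end3]


ref = "GGAACGTTCC"  # invented 10-nt reference sequence
print(section_seq(ref, 3, 6))  # bases 3 through 6 -> AACG
```

A section given as 5' end 3 and 3' end 6 therefore spans 6 - 3 + 1 = 4 bases.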

-p, --primers <primers>

Reference name, forward primer, and reverse primer of a section; reverse primer must be given 5’ to 3’

--primer-gap <primer_gap>

Number of bases to leave as a gap between the end of a primer and the end of the section
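The relationship between primers, the primer gap, and the resulting section can be sketched as below. This is an illustration under stated assumptions, not dreem's implementation: the helper names are hypothetical, the reference is invented, and it is assumed that the gap is applied on both sides and that the reverse primer (given 5' to 3') binds where its reverse complement appears in the reference.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def primers_to_coords(ref_seq: str, fwd: str, rev: str,
                      primer_gap: int = 2) -> tuple[int, int]:
    """Return 1-indexed, inclusive section ends between two primers."""
    fwd_start = ref_seq.find(fwd)           # 0-indexed start of forward primer
    rev_site = reverse_complement(rev)      # how the reverse primer reads on the reference
    rev_start = ref_seq.rfind(rev_site)     # 0-indexed start of reverse primer site
    if fwd_start < 0 or rev_start < 0:
        raise ValueError("Primer not found in reference")
    fwd_end = fwd_start + len(fwd)          # 1-indexed last base of forward primer
    end5 = fwd_end + primer_gap + 1         # first base after the gap
    end3 = rev_start - primer_gap           # last base before the gap (1-indexed)
    if end5 > end3:
        raise ValueError("Primers and gaps leave no room for a section")
    return end5, end3


# Invented reference: forward primer (1-6), gap (7-8), section (9-16),
# gap (17-18), reverse primer site (19-24).
ref = "ACGTACGGTTTTCCCCAAGTCAGT"
print(primers_to_coords(ref, "ACGTAC", "ACTGAC", primer_gap=2))  # (9, 16)
```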

--out-dir <out_dir>

Where to output all finished files

--temp-dir <temp_dir>

Where to write all temporary files

--phred-enc <phred_enc>

Phred score encoding in FASTQ/SAM/BAM files

--min-phred <min_phred>

Minimum Phred score to use a base call
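How the Phred encoding and the minimum Phred score interact can be sketched as follows. In the common Phred+33 encoding, each quality character's ASCII code minus 33 gives the Phred score. The helper `usable_base_calls` is hypothetical, and treating the threshold as inclusive is an assumption rather than documented behavior.

```python
def usable_base_calls(qual: str, phred_enc: int = 33,
                      min_phred: int = 25) -> list[bool]:
    """Flag each base call whose decoded Phred score meets the minimum."""
    # Decode: score = ASCII code of the quality character - encoding offset.
    # Assumption: a score exactly equal to min_phred counts as usable.
    return [ord(char) - phred_enc >= min_phred for char in qual]


# Quality string with decoded scores 40, 25, 20, and 2 under Phred+33.
print(usable_base_calls("I:5#"))  # [True, True, False, False]
```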

--ambid, --no-ambid

Whether to find and label all ambiguous insertions and deletions (improves accuracy but runs slower)

--strict-pairs, --no-strict-pairs

Whether to require that every paired read that maps to a section also have a mate that maps to the section

-z, --batch-size <batch_size>

Maximum size of each batch of mutation vectors, in millions of base calls
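To get a feel for what a batch size in millions of base calls means in terms of reads, the sketch below assumes that each mutation vector contributes one base call per position in its section. That interpretation, and the helper `vectors_per_batch`, are assumptions made for illustration and may not match how dreem counts base calls internally.

```python
def vectors_per_batch(batch_size: float, section_length: int) -> int:
    """Estimate how many mutation vectors fit in one batch."""
    if section_length < 1:
        raise ValueError("Section length must be at least 1")
    base_calls = int(batch_size * 1_000_000)  # batch size is in millions
    # Assumption: one base call per section position per vector;
    # always allow at least one vector per batch.
    return max(1, base_calls // section_length)


# With the default batch size of 32.0 and a 200-nt section:
print(vectors_per_batch(32.0, 200))  # 160000
```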

--max-procs <max_procs>

Maximum number of simultaneous processes

--parallel, --no-parallel

Whether to run multiple jobs in parallel

--rerun, --no-rerun

Whether to regenerate files that already exist

--save-temp, --erase-temp

Whether to save or erase temporary files after the program exits
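The options above can be combined into a single invocation. The sketch below assembles one as an argument list in Python; the file names are placeholders, the helper `build_vector_command` is hypothetical, and repeating --bamf once per BAM file is an assumption based on the corresponding Python argument being a tuple.

```python
import shlex


def build_vector_command(fasta: str, bam_files: list[str],
                         out_dir: str = "./output", min_phred: int = 25,
                         parallel: bool = True) -> list[str]:
    """Assemble a `dreem vector` command as an argument list."""
    cmd = ["dreem", "vector", "--fasta", fasta]
    for bam in bam_files:
        cmd += ["--bamf", bam]  # assumption: the option may be repeated
    cmd += ["--out-dir", out_dir, "--min-phred", str(min_phred)]
    # Boolean options come as flag pairs such as --parallel/--no-parallel.
    cmd.append("--parallel" if parallel else "--no-parallel")
    return cmd


cmd = build_vector_command("refs.fasta", ["sample1.bam", "sample2.bam"],
                           parallel=False)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a system where dreem is installed.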

Python interface

dreem.vector.run(fasta: str, *, bamf: tuple[str] = (), bamd: tuple[str] = (), library: str = '', autosect: bool = False, coords: tuple[tuple[str, int, int], ...] = (), primers: tuple[tuple[str, str, str], ...] = (), primer_gap: int = 2, out_dir: str = './output', temp_dir: str = './temp', phred_enc: int = 33, min_phred: int = 25, ambid: bool = True, strict_pairs: bool = True, batch_size: float = 32.0, max_procs: int = 2, parallel: bool = True, rerun: bool = False, save_temp: bool = False)
Run the vectoring step. Generate a vector encoding mutations for each read (or read pair, if paired-end).

Parameters
  • fasta (str) – FASTA file of all reference sequences in the project [positional or keyword]

  • bamf (tuple) – BAM file [keyword-only, default: ()]

  • bamd (tuple) – Directory of BAM files [keyword-only, default: ()]

  • library (str) – Library CSV file [keyword-only, default: '']

  • autosect (bool) – Whether to automatically generate coordinates covering the entire reference sequence for every reference that was not explicitly given at least one section (by --coords or --primers) [keyword-only, default: False]

  • coords (tuple) – Reference name, 5’ end, and 3’ end of a section; coordinates are 1-indexed and include both ends [keyword-only, default: ()]

  • primers (tuple) – Reference name, forward primer, and reverse primer of a section; reverse primer must be given 5’ to 3’ [keyword-only, default: ()]

  • primer_gap (int) – Number of bases to leave as a gap between the end of a primer and the end of the section [keyword-only, default: 2]

  • out_dir (str) – Where to output all finished files [keyword-only, default: './output']

  • temp_dir (str) – Where to write all temporary files [keyword-only, default: './temp']

  • phred_enc (int) – Phred score encoding in FASTQ/SAM/BAM files [keyword-only, default: 33]

  • min_phred (int) – Minimum Phred score to use a base call [keyword-only, default: 25]

  • ambid (bool) – Whether to find and label all ambiguous insertions and deletions (improves accuracy but runs slower) [keyword-only, default: True]

  • strict_pairs (bool) – Whether to require that every paired read that maps to a section also have a mate that maps to the section [keyword-only, default: True]

  • batch_size (float) – Maximum size of each batch of mutation vectors, in millions of base calls [keyword-only, default: 32.0]

  • max_procs (int) – Maximum number of simultaneous processes [keyword-only, default: 2]

  • parallel (bool) – Whether to run multiple jobs in parallel [keyword-only, default: True]

  • rerun (bool) – Whether to regenerate files that already exist [keyword-only, default: False]

  • save_temp (bool) – Whether to save or erase temporary files after the program exits [keyword-only, default: False]
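Since every argument except `fasta` is keyword-only, a call to `dreem.vector.run` passes the FASTA file first and everything else by name. A minimal sketch, with placeholder file names and a hypothetical reference name `ref1`; the wrapper functions are illustrative and import dreem lazily so the example loads even where dreem is not installed.

```python
def vector_kwargs(bam_files: list[str],
                  coords: list[tuple[str, int, int]]) -> dict:
    """Collect keyword-only arguments for dreem.vector.run."""
    return dict(
        bamf=tuple(bam_files),  # one entry per BAM file
        coords=tuple(coords),   # (reference name, 5' end, 3' end)
        out_dir="./output",
        min_phred=25,
        parallel=False,
    )


def run_vectoring(fasta: str, **kwargs):
    """Call dreem.vector.run; requires dreem to be installed."""
    from dreem import vector  # lazy import: only needed when actually run
    return vector.run(fasta, **kwargs)


# Placeholder inputs: one BAM file and one section spanning
# positions 9 to 16 (1-indexed, inclusive) of reference "ref1".
kwargs = vector_kwargs(["sample1.bam"], [("ref1", 9, 16)])
print(kwargs)
```

With dreem installed and the files in place, `run_vectoring("refs.fasta", **kwargs)` would start the vectoring step with these settings.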