API Reference
The arguments for the CLI and Python API are the same. The CLI is just a wrapper around the Python API.
Command line interface
dreem vector
Hook the command line interface to the `run` function.
dreem vector [OPTIONS]
Options
- --fasta <fasta>
FASTA file of all reference sequences in the project
- --bamf <bamf>
BAM file
- --bamd <bamd>
Directory of BAM files
- --library <library>
Library CSV file
- --autosect, --no-autosect
Whether to automatically generate coordinates covering the entire reference sequence for every reference that was not explicitly given at least one section (by --coords or --primers)
- -c, --coords <coords>
Reference name, 5’ end, and 3’ end of a section; coordinates are 1-indexed and include both ends
- -p, --primers <primers>
Reference name, forward primer, and reverse primer of a section; reverse primer must be given 5’ to 3’
- --primer-gap <primer_gap>
Number of bases to leave as a gap between the end of a primer and the end of the section
- --out-dir <out_dir>
Where to output all finished files
- --temp-dir <temp_dir>
Where to write all temporary files
- --phred-enc <phred_enc>
Phred score encoding in FASTQ/SAM/BAM files
- --min-phred <min_phred>
Minimum Phred score to use a base call
- --ambid, --no-ambid
Whether to find and label all ambiguous insertions and deletions (improves accuracy but runs slower)
- --strict-pairs, --no-strict-pairs
Whether to require that every paired read that maps to a section also have a mate that maps to the section
- -z, --batch-size <batch_size>
Maximum size of each batch of mutation vectors, in millions of base calls
- --max-procs <max_procs>
Maximum number of simultaneous processes
- --parallel, --no-parallel
Whether to run multiple jobs in parallel
- --rerun, --no-rerun
Whether to regenerate files that already exist
- --save-temp, --erase-temp
Whether to save or erase temporary files after the program exits
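Because the CLI and the Python API share their arguments, a set of Python keyword arguments can be translated into a `dreem vector` command line mechanically. The following is a minimal sketch of that mapping, not part of DREEM itself. The helpers `to_flag` and `build_argv` and the file names are hypothetical. It assumes that a multi-valued option such as `--bamf` or `--coords` is repeated once per item and that the three parts of a section follow `--coords` as separate values; check `dreem vector --help` before relying on either assumption.

```python
import shlex

# Boolean options whose "off" flag is not simply "--no-<name>"
# (taken from the option list above).
SPECIAL_BOOL_FLAGS = {"save_temp": ("--save-temp", "--erase-temp")}


def to_flag(name: str) -> str:
    """Turn a Python keyword such as 'out_dir' into '--out-dir'."""
    return "--" + name.replace("_", "-")


def build_argv(**kwargs) -> list[str]:
    """Build an argument list for `dreem vector` from keyword arguments."""
    argv = ["dreem", "vector"]
    for name, value in kwargs.items():
        if isinstance(value, bool):
            # Boolean options are on/off flag pairs and take no value.
            default_pair = (to_flag(name), "--no-" + name.replace("_", "-"))
            on, off = SPECIAL_BOOL_FLAGS.get(name, default_pair)
            argv.append(on if value else off)
        elif isinstance(value, tuple):
            # Assumption: a multi-valued option is repeated once per item.
            for item in value:
                argv.append(to_flag(name))
                if isinstance(item, tuple):
                    # Assumption: e.g. --coords takes its three parts as separate values.
                    argv.extend(str(part) for part in item)
                else:
                    argv.append(str(item))
        else:
            argv.extend([to_flag(name), str(value)])
    return argv


# Hypothetical file names, used only for illustration.
argv = build_argv(
    fasta="refs.fasta",
    bamf=("sample1.bam",),
    coords=(("ref1", 1, 120),),
    min_phred=25,
    ambid=False,
    save_temp=True,
)
print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv)` if the `dreem` executable is on the `PATH`.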
Python interface
- dreem.vector.run(fasta: str, *, bamf: tuple[str] = (), bamd: tuple[str] = (), library: str = '', autosect: bool = False, coords: tuple[tuple[str, int, int], ...] = (), primers: tuple[tuple[str, str, str], ...] = (), primer_gap: int = 2, out_dir: str = './output', temp_dir: str = './temp', phred_enc: int = 33, min_phred: int = 25, ambid: bool = True, strict_pairs: bool = True, batch_size: float = 32.0, max_procs: int = 2, parallel: bool = True, rerun: bool = False, save_temp: bool = False)
- Run the vectoring step. Generate a vector encoding mutations for
each read (or read pair, if paired-end).
- Parameters
  - fasta (str) – FASTA file of all reference sequences in the project [positional or keyword]
  - bamf (tuple) – BAM file [keyword-only, default: ()]
  - bamd (tuple) – Directory of BAM files [keyword-only, default: ()]
  - library (str) – Library CSV file [keyword-only, default: '']
  - autosect (bool) – Whether to automatically generate coordinates covering the entire reference sequence for every reference that was not explicitly given at least one section (by --coords or --primers) [keyword-only, default: False]
  - coords (tuple) – Reference name, 5’ end, and 3’ end of a section; coordinates are 1-indexed and include both ends [keyword-only, default: ()]
  - primers (tuple) – Reference name, forward primer, and reverse primer of a section; reverse primer must be given 5’ to 3’ [keyword-only, default: ()]
  - primer_gap (int) – Number of bases to leave as a gap between the end of a primer and the end of the section [keyword-only, default: 2]
  - out_dir (str) – Where to output all finished files [keyword-only, default: './output']
  - temp_dir (str) – Where to write all temporary files [keyword-only, default: './temp']
  - phred_enc (int) – Phred score encoding in FASTQ/SAM/BAM files [keyword-only, default: 33]
  - min_phred (int) – Minimum Phred score to use a base call [keyword-only, default: 25]
  - ambid (bool) – Whether to find and label all ambiguous insertions and deletions (improves accuracy but runs slower) [keyword-only, default: True]
  - strict_pairs (bool) – Whether to require that every paired read that maps to a section also have a mate that maps to the section [keyword-only, default: True]
  - batch_size (float) – Maximum size of each batch of mutation vectors, in millions of base calls [keyword-only, default: 32.0]
  - max_procs (int) – Maximum number of simultaneous processes [keyword-only, default: 2]
  - parallel (bool) – Whether to run multiple jobs in parallel [keyword-only, default: True]
  - rerun (bool) – Whether to regenerate files that already exist [keyword-only, default: False]
  - save_temp (bool) – Whether to save or erase temporary files after the program exits [keyword-only, default: False]
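A short sketch may help when preparing arguments for `dreem.vector.run`. It illustrates two conventions from the parameter list: section coordinates are 1-indexed and include both ends, and a base call is used only if its Phred score reaches `min_phred` after subtracting the `phred_enc` offset. The helper names, the file names, and the reference name `ref1` are hypothetical, and the quality check follows the standard Phred encoding rather than DREEM's internal code. The call itself is wrapped in a function so that nothing runs until the input files exist.

```python
def section_length(end5: int, end3: int) -> int:
    """Length of a section whose 1-indexed coordinates include both ends."""
    if end5 < 1 or end3 < end5:
        raise ValueError(f"invalid section coordinates: {end5}-{end3}")
    return end3 - end5 + 1


def section_slice(seq: str, end5: int, end3: int) -> str:
    """Extract a 1-indexed, inclusive section from a 0-indexed Python string."""
    return seq[end5 - 1:end3]


def passes_min_phred(qual_char: str, phred_enc: int = 33, min_phred: int = 25) -> bool:
    """Decode one quality character and compare it with the minimum score.

    Assumption: standard Phred encoding, score = ord(character) - phred_enc.
    """
    return ord(qual_char) - phred_enc >= min_phred


# Keyword-only arguments, matching the documented signature.
# File and reference names are hypothetical.
run_kwargs = dict(
    bamf=("sample1.bam",),
    coords=(("ref1", 1, 120),),
    min_phred=25,
    out_dir="./output",
    temp_dir="./temp",
)


def run_vectoring(fasta: str = "refs.fasta"):
    """Call the documented function; requires DREEM to be installed."""
    from dreem.vector import run
    return run(fasta, **run_kwargs)


# Sanity checks on the section before launching a long job.
for ref, end5, end3 in run_kwargs["coords"]:
    print(ref, "section length:", section_length(end5, end3))
```

Calling `run_vectoring()` then passes `fasta` positionally and everything else by keyword, as the signature requires.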