API Reference

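The CLI and the Python API take the same arguments, because the CLI is a thin wrapper around the Python API. CLI option names are generally the Python keyword names with underscores replaced by hyphens (for example, --out-dir on the CLI and out_dir in Python).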

CLI commands

dreem align

dreem align [OPTIONS]

Options

--fasta <fasta>: FASTA file of all reference sequences in the project

--fastqs <fastqs>: FASTQ files of single-end reads

--fastqi <fastqi>: FASTQ files of interleaved paired-end reads

--fastq1 <fastq1>: FASTQ files of mate 1 paired-end reads

--fastq2 <fastq2>: FASTQ files of mate 2 paired-end reads

--fastqs-dir <fastqs_dir>: Directory containing demultiplexed FASTQ files of single-end reads from one sample

--fastqi-dir <fastqi_dir>: Directory containing demultiplexed FASTQ files of interleaved paired-end reads from one sample

--fastq12-dir <fastq12_dir>: Directory containing demultiplexed pairs of FASTQ files of mate 1 and mate 2 reads from one sample

--phred-enc <phred_enc>: Phred score encoding in FASTQ/SAM/BAM files

--out-dir <out_dir>: Where to output all finished files

--temp-dir <temp_dir>: Where to write all temporary files

--rerun, --no-rerun: Whether to regenerate files that already exist

--save-temp, --erase-temp: Whether to save or erase temporary files after the program exits

--parallel, --no-parallel: Whether to run multiple jobs in parallel

--max-procs <max_procs>: Maximum number of simultaneous processes

--fastqc, --no-fastqc: Whether to check quality of FASTQ files

--qc-extract, --qc-no-extract: Whether to unzip FASTQC reports

--cut, --no-cut: Whether to trim reads with Cutadapt before alignment

--cut-a1 <cut_a1>: 3’ adapter for read 1

--cut-g1 <cut_g1>: 5’ adapter for read 1

--cut-a2 <cut_a2>: 3’ adapter for read 2

--cut-g2 <cut_g2>: 5’ adapter for read 2

--cut-O <cut_o>: Minimum overlap of read and adapter

--cut-e <cut_e>: Error tolerance for adapters

--cut-q1 <cut_q1>: Phred score for read 1 quality trimming

--cut-q2 <cut_q2>: Phred score for read 2 quality trimming

--cut-m <cut_m>: Discard reads shorter than this length after trimming

--cut-indels, --cut-no-indels: Whether to allow indels in adapters

--cut-discard-trimmed, --cut-keep-trimmed: Whether to discard reads in which an adapter was found

--cut-discard-untrimmed, --cut-keep-untrimmed: Whether to discard reads in which no adapter was found

--cut-nextseq, --cut-no-nextseq: Whether to trim high-quality Gs from 3’ end

--bt2-local, --bt2-end-to-end: Whether to perform local or end-to-end alignment

--bt2-discordant, --bt2-no-discordant: Whether to output discordant alignments

--bt2-mixed, --bt2-no-mixed: Whether to align individual mates of unaligned pairs

--bt2-dovetail, --bt2-no-dovetail: Whether to treat dovetailed mate pairs as concordant

--bt2-contain, --bt2-no-contain: Whether to treat nested mate pairs as concordant

--bt2-unal, --bt2-no-unal: Whether to output unaligned reads

--bt2-I <bt2_i>: Minimum fragment length for valid paired-end alignments

--bt2-X <bt2_x>: Maximum fragment length for valid paired-end alignments

--bt2-score-min <bt2_score_min>: Minimum score for a valid alignment

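--bt2-i <bt2_s>: Seed interval (lowercase i; distinct from --bt2-I, the minimum fragment length; the corresponding Python argument is bt2_s)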

--bt2-L <bt2_l>: Seed length

--bt2-gbar <bt2_gbar>: Minimum distance of a gap from end of a read

--bt2-D <bt2_d>: Maximum number of failed seed extensions

--bt2-R <bt2_r>: Maximum number of times to re-seed

--bt2-dpad <bt2_dpad>: Width of padding on alignment matrix, to allow gaps

--bt2-orient <bt2_orient>: Valid orientations of paired-end mates (options: fr | rf | ff)
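Since the CLI wraps the Python API, a set of Python keyword arguments can be translated into a dreem align command line. The following is a minimal sketch of that translation, not part of DREEM itself: the helper names (option_name, build_align_argv) and the file names are hypothetical, the flag spellings are copied from the option list above, and only a few of the boolean flags are included. It also assumes that an option accepting several values is given by repeating the flag once per value, which this reference does not state.

```python
import shlex

# Options whose spelling is not just the keyword with "_" replaced by "-"
# (copied from the option list above).
SPECIAL_NAMES = {
    "cut_o": "--cut-O",
    "bt2_i": "--bt2-I",
    "bt2_x": "--bt2-X",
    "bt2_l": "--bt2-L",
    "bt2_d": "--bt2-D",
    "bt2_r": "--bt2-R",
    "bt2_s": "--bt2-i",
}

# Boolean keywords and their (True, False) flags. Only a few are listed;
# extend this table from the option list as needed.
BOOL_FLAGS = {
    "rerun": ("--rerun", "--no-rerun"),
    "save_temp": ("--save-temp", "--erase-temp"),
    "qc_extract": ("--qc-extract", "--qc-no-extract"),
    "bt2_local": ("--bt2-local", "--bt2-end-to-end"),
}


def option_name(keyword):
    """Return the CLI option for a Python keyword argument."""
    return SPECIAL_NAMES.get(keyword, "--" + keyword.replace("_", "-"))


def build_align_argv(**kwargs):
    """Build an argument list for 'dreem align' from keyword arguments."""
    argv = ["dreem", "align"]
    for keyword, value in kwargs.items():
        if isinstance(value, bool):
            flag_on, flag_off = BOOL_FLAGS[keyword]
            argv.append(flag_on if value else flag_off)
        elif isinstance(value, (tuple, list)):
            # ASSUMPTION: multi-valued options repeat the flag per value.
            for item in value:
                argv.extend([option_name(keyword), str(item)])
        else:
            argv.extend([option_name(keyword), str(value)])
    return argv


# Hypothetical file names, for illustration only.
argv = build_align_argv(
    fasta="refs.fasta",
    fastq1=("sample_R1.fastq",),
    fastq2=("sample_R2.fastq",),
    bt2_local=False,
    bt2_x=300,
)
print(shlex.join(argv))
```

The resulting list can be passed to subprocess.run, or joined as shown to inspect the command before running it.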

Python args

dreem.align.run(*, fasta: str, fastqs: tuple[str] = (), fastqi: tuple[str] = (), fastq1: tuple[str] = (), fastq2: tuple[str] = (), fastqs_dir: tuple[str] = (), fastqi_dir: tuple[str] = (), fastq12_dir: tuple[str] = (), phred_enc: int = 33, out_dir: str = './output', temp_dir: str = './temp', save_temp: bool = False, rerun: bool = False, max_procs: int = 2, parallel: bool = True, fastqc: bool = True, qc_extract: bool = False, cut: bool = True, cut_q1: int = 25, cut_q2: int = 25, cut_g1: tuple[str] = (), cut_a1: tuple[str] = ('AGATCGGAAGAGC',), cut_g2: tuple[str] = (), cut_a2: tuple[str] = ('AGATCGGAAGAGC',), cut_o: int = 6, cut_e: float = 0.1, cut_indels: bool = True, cut_nextseq: bool = False, cut_discard_trimmed: bool = False, cut_discard_untrimmed: bool = False, cut_m: int = 20, bt2_local: bool = True, bt2_discordant: bool = False, bt2_mixed: bool = False, bt2_dovetail: bool = False, bt2_contain: bool = True, bt2_unal: bool = False, bt2_score_min: str = 'L,1,0.5', bt2_i: int = 0, bt2_x: int = 600, bt2_gbar: int = 4, bt2_l: int = 12, bt2_s: str = 'L,1,0.1', bt2_d: int = 4, bt2_r: int = 2, bt2_dpad: int = 2, bt2_orient: str = 'fr') → tuple[str, ...]

Run the alignment module.

Align the reads to the set of reference sequences and write one BAM file for each combination of sample and reference to the output directory (out_dir, default ./output). Temporary intermediate files are written to the temporary directory (temp_dir, default ./temp) and deleted once they are no longer needed, unless save_temp is True.
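As an illustration of the call described above, here is a minimal sketch that aligns one paired-end sample. The helper names (paired_sample_kwargs, align_paired_sample) and the file names in the usage note are hypothetical; the keyword names come from the signature of dreem.align.run. The signature only says that the function returns a tuple of strings, so the sketch passes that value through without interpreting it.

```python
def paired_sample_kwargs(fasta, fastq1, fastq2, out_dir="./output", temp_dir="./temp"):
    """Collect keyword arguments for aligning one paired-end sample."""
    return {
        "fasta": fasta,
        # The FASTQ arguments are tuples, so single files are wrapped.
        "fastq1": (fastq1,),
        "fastq2": (fastq2,),
        "out_dir": out_dir,
        "temp_dir": temp_dir,
        "save_temp": False,  # erase temporary files afterwards
        "rerun": False,      # do not regenerate files that already exist
    }


def align_paired_sample(fasta, fastq1, fastq2, **dirs):
    """Run the alignment module on one paired-end sample."""
    # Imported here so the sketch can be loaded without DREEM installed.
    import dreem.align

    # Every argument of run() is keyword-only, so pass the dict with **.
    return dreem.align.run(**paired_sample_kwargs(fasta, fastq1, fastq2, **dirs))
```

A call such as align_paired_sample("refs.fasta", "sample_R1.fastq", "sample_R2.fastq") would then run the alignment with the remaining parameters left at their defaults.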

Parameters

fasta (str) – FASTA file of all reference sequences in the project [keyword-only]
fastqs (tuple) – FASTQ files of single-end reads [keyword-only, default: ()]
fastqi (tuple) – FASTQ files of interleaved paired-end reads [keyword-only, default: ()]
fastq1 (tuple) – FASTQ files of mate 1 paired-end reads [keyword-only, default: ()]
fastq2 (tuple) – FASTQ files of mate 2 paired-end reads [keyword-only, default: ()]
fastqs_dir (tuple) – Directory containing demultiplexed FASTQ files of single-end reads from one sample [keyword-only, default: ()]
fastqi_dir (tuple) – Directory containing demultiplexed FASTQ files of interleaved paired-end reads from one sample [keyword-only, default: ()]
fastq12_dir (tuple) – Directory containing demultiplexed pairs of FASTQ files of mate 1 and mate 2 reads from one sample [keyword-only, default: ()]
phred_enc (int) – Phred score encoding in FASTQ/SAM/BAM files [keyword-only, default: 33]
out_dir (str) – Where to output all finished files [keyword-only, default: ‘./output’]
temp_dir (str) – Where to write all temporary files [keyword-only, default: ‘./temp’]
save_temp (bool) – Whether to save (True) or erase (False) temporary files after the program exits [keyword-only, default: False]
rerun (bool) – Whether to regenerate files that already exist [keyword-only, default: False]
max_procs (int) – Maximum number of simultaneous processes [keyword-only, default: 2]
parallel (bool) – Whether to run multiple jobs in parallel [keyword-only, default: True]
fastqc (bool) – Whether to check quality of FASTQ files [keyword-only, default: True]
qc_extract (bool) – Whether to unzip FASTQC reports [keyword-only, default: False]
cut (bool) – Whether to trim reads with Cutadapt before alignment [keyword-only, default: True]
cut_q1 (int) – Phred score for read 1 quality trimming [keyword-only, default: 25]
cut_q2 (int) – Phred score for read 2 quality trimming [keyword-only, default: 25]
cut_g1 (tuple) – 5’ adapter for read 1 [keyword-only, default: ()]
cut_a1 (tuple) – 3’ adapter for read 1 [keyword-only, default: (‘AGATCGGAAGAGC’,)]
cut_g2 (tuple) – 5’ adapter for read 2 [keyword-only, default: ()]
cut_a2 (tuple) – 3’ adapter for read 2 [keyword-only, default: (‘AGATCGGAAGAGC’,)]
cut_o (int) – Minimum overlap of read and adapter [keyword-only, default: 6]
cut_e (float) – Error tolerance for adapters [keyword-only, default: 0.1]
cut_indels (bool) – Whether to allow indels in adapters [keyword-only, default: True]
cut_nextseq (bool) – Whether to trim high-quality Gs from 3’ end [keyword-only, default: False]
cut_discard_trimmed (bool) – Whether to discard reads in which an adapter was found [keyword-only, default: False]
cut_discard_untrimmed (bool) – Whether to discard reads in which no adapter was found [keyword-only, default: False]
cut_m (int) – Discard reads shorter than this length after trimming [keyword-only, default: 20]
bt2_local (bool) – Whether to perform local (True) or end-to-end (False) alignment [keyword-only, default: True]
bt2_discordant (bool) – Whether to output discordant alignments [keyword-only, default: False]
bt2_mixed (bool) – Whether to align individual mates of unaligned pairs [keyword-only, default: False]
bt2_dovetail (bool) – Whether to treat dovetailed mate pairs as concordant [keyword-only, default: False]
bt2_contain (bool) – Whether to treat nested mate pairs as concordant [keyword-only, default: True]
bt2_unal (bool) – Whether to output unaligned reads [keyword-only, default: False]
bt2_score_min (str) – Minimum score for a valid alignment [keyword-only, default: ‘L,1,0.5’]
bt2_i (int) – Minimum fragment length for valid paired-end alignments [keyword-only, default: 0]
bt2_x (int) – Maximum fragment length for valid paired-end alignments [keyword-only, default: 600]
bt2_gbar (int) – Minimum distance of a gap from end of a read [keyword-only, default: 4]
bt2_l (int) – Seed length [keyword-only, default: 12]
bt2_s (str) – Seed interval (CLI option --bt2-i) [keyword-only, default: ‘L,1,0.1’]
bt2_d (int) – Maximum number of failed seed extensions [keyword-only, default: 4]
bt2_r (int) – Maximum number of times to re-seed [keyword-only, default: 2]
bt2_dpad (int) – Width of padding on alignment matrix, to allow gaps [keyword-only, default: 2]
bt2_orient (str) – Valid orientations of paired-end mates [keyword-only, default: ‘fr’]
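The string defaults of bt2_score_min and bt2_s ('L,1,0.5' and 'L,1,0.1') follow Bowtie 2's function notation F,B,A, which stands for f(x) = B + A * F(x), where x is the read length and F is constant (C), linear (L), square root (S), or natural logarithm (G). The sketch below evaluates such a string to show what the defaults mean for a given read length. The helper name evaluate_bt2_function is hypothetical and not part of DREEM, and the sketch assumes DREEM passes these strings to Bowtie 2 unchanged.

```python
import math

# Transform applied to the read length for each Bowtie 2 function type.
_TRANSFORMS = {
    "C": lambda x: 0.0,  # constant: only the constant term B is used
    "L": lambda x: x,    # linear
    "S": math.sqrt,      # square root
    "G": math.log,       # natural logarithm
}


def evaluate_bt2_function(spec, read_length):
    """Evaluate a Bowtie 2 function string 'F,B,A' at a read length."""
    kind, constant, coefficient = spec.split(",")
    transform = _TRANSFORMS[kind]
    return float(constant) + float(coefficient) * transform(read_length)


# Default bt2_score_min for a 100 nt read: 1 + 0.5 * 100
print(evaluate_bt2_function("L,1,0.5", 100))
# Default bt2_s (seed interval) for a 100 nt read: 1 + 0.1 * 100
print(evaluate_bt2_function("L,1,0.1", 100))
```

With the defaults, both the minimum alignment score and the seed interval grow linearly with read length, so longer reads must score higher to be accepted and are seeded at wider intervals.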