API Reference

The arguments for the CLI and Python API are the same. The CLI is just a wrapper around the Python API.
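Because the CLI wraps the Python API, the option names on this page translate almost mechanically: drop the leading dashes, replace hyphens with underscores, and lowercase the result. The sketch below illustrates that convention using only names listed in this reference. The helper `flag_to_kwarg` is hypothetical and not part of DREEM. The single override reflects how this page pairs `--bt2-i` with `bt2_s`, since `--bt2-I` already maps to `bt2_i`.

```python
# Hypothetical helper (not part of DREEM): derive the Python keyword name
# that this reference pairs with a CLI option.
OVERRIDES = {
    # Seed interval: lowercase -i would collide with --bt2-I (bt2_i).
    "--bt2-i": "bt2_s",
}


def flag_to_kwarg(flag: str) -> str:
    """Return the keyword-argument name for a long CLI option."""
    if flag in OVERRIDES:
        return OVERRIDES[flag]
    if not flag.startswith("--"):
        raise ValueError(f"expected a long option, got {flag!r}")
    # Strip the dashes, swap hyphens for underscores, and lowercase.
    return flag[2:].replace("-", "_").lower()


for flag in ("--fastqs-dir", "--cut-O", "--bt2-I", "--bt2-i", "--bt2-score-min"):
    print(flag, "->", flag_to_kwarg(flag))
```

Paired boolean flags such as `--save-temp, --erase-temp` are not covered by this helper; each pair corresponds to one boolean keyword (here `save_temp`).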

CLI commands

dreem align

dreem align [OPTIONS]

Options

--fasta <fasta>

FASTA file of all reference sequences in the project

--fastqs <fastqs>

FASTQ files of single-end reads

--fastqi <fastqi>

FASTQ files of interleaved paired reads

--fastq1 <fastq1>

FASTQ files of mate 1 paired-end reads

--fastq2 <fastq2>

FASTQ files of mate 2 paired-end reads

--fastqs-dir <fastqs_dir>

Directory containing demultiplexed FASTQ files of single-end reads from one sample

--fastqi-dir <fastqi_dir>

Directory containing demultiplexed FASTQ files of interleaved paired-end reads from one sample

--fastq12-dir <fastq12_dir>

Directory containing demultiplexed pairs of FASTQ files of mate 1 and mate 2 reads from one sample

--phred-enc <phred_enc>

Phred score encoding in FASTQ/SAM/BAM files

--out-dir <out_dir>

Where to output all finished files

--temp-dir <temp_dir>

Where to write all temporary files

--rerun, --no-rerun

Whether to regenerate files that already exist

--save-temp, --erase-temp

Whether to save or erase temporary files after the program exits

--parallel, --no-parallel

Whether to run multiple jobs in parallel

--max-procs <max_procs>

Maximum number of simultaneous processes

--fastqc, --no-fastqc

Whether to check quality of FASTQ files

--qc-extract, --qc-no-extract

Whether to unzip FASTQC reports

--cut, --no-cut

Whether to trim reads with Cutadapt before alignment

--cut-a1 <cut_a1>

3’ adapter for read 1

--cut-g1 <cut_g1>

5’ adapter for read 1

--cut-a2 <cut_a2>

3’ adapter for read 2

--cut-g2 <cut_g2>

5’ adapter for read 2

--cut-O <cut_o>

Minimum overlap of read and adapter

--cut-e <cut_e>

Error tolerance for adapters

--cut-q1 <cut_q1>

Phred score for read 1 quality trimming

--cut-q2 <cut_q2>

Phred score for read 2 quality trimming

--cut-m <cut_m>

Discard reads shorter than this length after trimming

--cut-indels, --cut-no-indels

Whether to allow indels in adapters

--cut-discard-trimmed, --cut-keep-trimmed

Whether to discard reads in which an adapter was found

--cut-discard-untrimmed, --cut-keep-untrimmed

Whether to discard reads in which no adapter was found

--cut-nextseq, --cut-no-nextseq

Whether to trim high-quality Gs from 3’ end

--bt2-local, --bt2-end-to-end

Whether to perform local or end-to-end alignment

--bt2-discordant, --bt2-no-discordant

Whether to output discordant alignments

--bt2-mixed, --bt2-no-mixed

Whether to align individual mates of unaligned pairs

--bt2-dovetail, --bt2-no-dovetail

Whether to treat dovetailed mate pairs as concordant

--bt2-contain, --bt2-no-contain

Whether to treat nested mate pairs as concordant

--bt2-unal, --bt2-no-unal

Whether to output unaligned reads

--bt2-I <bt2_i>

Minimum fragment length for valid paired-end alignments

--bt2-X <bt2_x>

Maximum fragment length for valid paired-end alignments

--bt2-score-min <bt2_score_min>

Minimum score for a valid alignment

--bt2-i <bt2_s>

Seed interval

--bt2-L <bt2_l>

Seed length

--bt2-gbar <bt2_gbar>

Minimum distance of a gap from end of a read

--bt2-D <bt2_d>

Maximum number of failed seed extensions

--bt2-R <bt2_r>

Maximum number of times to re-seed

--bt2-dpad <bt2_dpad>

Width of padding on alignment matrix, to allow gaps

--bt2-orient <bt2_orient>

Valid orientations of paired-end mates

Allowed values: fr | rf | ff
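To show how the options above combine into one invocation, here is a minimal sketch that assembles a `dreem align` command as an argument list suitable for `subprocess.run`. It uses only flags listed in this section. The function `build_align_argv` and the file names are illustrative, and repeating a flag to pass several files is an assumption about how the CLI accepts multiple values.

```python
import shlex

# On/off flag pairs copied from the option list above.
BOOL_FLAGS = {
    "cut": ("--cut", "--no-cut"),
    "save_temp": ("--save-temp", "--erase-temp"),
    "bt2_local": ("--bt2-local", "--bt2-end-to-end"),
}


def build_align_argv(fasta, fastq1=(), fastq2=(), out_dir=None,
                     max_procs=None, **switches):
    """Assemble an argument list for `dreem align` (illustrative helper)."""
    argv = ["dreem", "align", "--fasta", fasta]
    # Assumption: multi-valued options are given by repeating the flag.
    for path in fastq1:
        argv += ["--fastq1", path]
    for path in fastq2:
        argv += ["--fastq2", path]
    if out_dir is not None:
        argv += ["--out-dir", out_dir]
    if max_procs is not None:
        argv += ["--max-procs", str(max_procs)]
    # Each boolean switch selects one flag of its on/off pair.
    for name, value in switches.items():
        on_flag, off_flag = BOOL_FLAGS[name]
        argv.append(on_flag if value else off_flag)
    return argv


argv = build_align_argv(
    "refs.fasta",
    fastq1=["sample_R1.fastq"],
    fastq2=["sample_R2.fastq"],
    out_dir="./output",
    max_procs=4,
    bt2_local=False,
    save_temp=True,
)
print(shlex.join(argv))  # a copy-pasteable shell command
```

Building a list rather than a single string avoids shell quoting problems when paths contain spaces; `shlex.join` is used only for display.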

Python arguments

dreem.align.run(*, fasta: str, fastqs: tuple[str] = (), fastqi: tuple[str] = (), fastq1: tuple[str] = (), fastq2: tuple[str] = (), fastqs_dir: tuple[str] = (), fastqi_dir: tuple[str] = (), fastq12_dir: tuple[str] = (), phred_enc: int = 33, out_dir: str = './output', temp_dir: str = './temp', save_temp: bool = False, rerun: bool = False, max_procs: int = 2, parallel: bool = True, fastqc: bool = True, qc_extract: bool = False, cut: bool = True, cut_q1: int = 25, cut_q2: int = 25, cut_g1: tuple[str] = (), cut_a1: tuple[str] = ('AGATCGGAAGAGC',), cut_g2: tuple[str] = (), cut_a2: tuple[str] = ('AGATCGGAAGAGC',), cut_o: int = 6, cut_e: float = 0.1, cut_indels: bool = True, cut_nextseq: bool = False, cut_discard_trimmed: bool = False, cut_discard_untrimmed: bool = False, cut_m: int = 20, bt2_local: bool = True, bt2_discordant: bool = False, bt2_mixed: bool = False, bt2_dovetail: bool = False, bt2_contain: bool = True, bt2_unal: bool = False, bt2_score_min: str = 'L,1,0.5', bt2_i: int = 0, bt2_x: int = 600, bt2_gbar: int = 4, bt2_l: int = 12, bt2_s: str = 'L,1,0.1', bt2_d: int = 4, bt2_r: int = 2, bt2_dpad: int = 2, bt2_orient: str = 'fr') -> tuple[str, ...]

Run the alignment module.

Align the reads to the set of reference sequences and output one BAM file for each sample aligned to each reference in the output directory (out_dir, default ‘./output’). Temporary intermediate files are written to the temporary directory (temp_dir, default ‘./temp’) and deleted once they are no longer needed, unless save_temp is True.
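As a minimal sketch of a call, the wrapper below aligns one paired-end sample. All arguments to `dreem.align.run` are keyword-only, so they are collected in a dictionary and unpacked. The wrapper names and file names are illustrative and not part of DREEM. The signature only says that a tuple of strings is returned; treating those strings as output file paths is an assumption.

```python
def paired_sample_kwargs(fasta, mate1, mate2, **overrides):
    """Collect keyword arguments for one paired-end sample (illustrative)."""
    kwargs = {
        "fasta": fasta,
        "fastq1": (mate1,),   # tuple-valued parameters take one or more paths
        "fastq2": (mate2,),
        "out_dir": "./output",
        "temp_dir": "./temp",
    }
    # Any other documented parameter (e.g. max_procs) can be overridden.
    kwargs.update(overrides)
    return kwargs


def align_paired_sample(fasta, mate1, mate2, **overrides):
    """Run the alignment module on one sample; requires DREEM to be installed."""
    # Imported lazily so this sketch can be loaded without DREEM present.
    from dreem.align import run

    return run(**paired_sample_kwargs(fasta, mate1, mate2, **overrides))


# Example call (placeholder file names):
# results = align_paired_sample("refs.fasta", "sample_R1.fastq",
#                               "sample_R2.fastq", max_procs=4)
```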

Parameters
  • fasta (str) – FASTA file of all reference sequences in the project [keyword-only]

  • fastqs (tuple) – FASTQ files of single-end reads [keyword-only, default: ()]

  • fastqi (tuple) – FASTQ files of interleaved paired reads [keyword-only, default: ()]

  • fastq1 (tuple) – FASTQ files of mate 1 paired-end reads [keyword-only, default: ()]

  • fastq2 (tuple) – FASTQ files of mate 2 paired-end reads [keyword-only, default: ()]

  • fastqs_dir (tuple) – Directory containing demultiplexed FASTQ files of single-end reads from one sample [keyword-only, default: ()]

  • fastqi_dir (tuple) – Directory containing demultiplexed FASTQ files of interleaved paired-end reads from one sample [keyword-only, default: ()]

  • fastq12_dir (tuple) – Directory containing demultiplexed pairs of FASTQ files of mate 1 and mate 2 reads from one sample [keyword-only, default: ()]

  • phred_enc (int) – Phred score encoding in FASTQ/SAM/BAM files [keyword-only, default: 33]

  • out_dir (str) – Where to output all finished files [keyword-only, default: ‘./output’]

  • temp_dir (str) – Where to write all temporary files [keyword-only, default: ‘./temp’]

  • save_temp (bool) – Whether to save (True) or erase (False) temporary files after the program exits [keyword-only, default: False]

  • rerun (bool) – Whether to regenerate files that already exist [keyword-only, default: False]

  • max_procs (int) – Maximum number of simultaneous processes [keyword-only, default: 2]

  • parallel (bool) – Whether to run multiple jobs in parallel [keyword-only, default: True]

  • fastqc (bool) – Whether to check quality of FASTQ files [keyword-only, default: True]

  • qc_extract (bool) – Whether to unzip FASTQC reports [keyword-only, default: False]

  • cut (bool) – Whether to trim reads with Cutadapt before alignment [keyword-only, default: True]

  • cut_q1 (int) – Phred score for read 1 quality trimming [keyword-only, default: 25]

  • cut_q2 (int) – Phred score for read 2 quality trimming [keyword-only, default: 25]

  • cut_g1 (tuple) – 5’ adapter for read 1 [keyword-only, default: ()]

  • cut_a1 (tuple) – 3’ adapter for read 1 [keyword-only, default: (‘AGATCGGAAGAGC’,)]

  • cut_g2 (tuple) – 5’ adapter for read 2 [keyword-only, default: ()]

  • cut_a2 (tuple) – 3’ adapter for read 2 [keyword-only, default: (‘AGATCGGAAGAGC’,)]

  • cut_o (int) – Minimum overlap of read and adapter [keyword-only, default: 6]

  • cut_e (float) – Error tolerance for adapters [keyword-only, default: 0.1]

  • cut_indels (bool) – Whether to allow indels in adapters [keyword-only, default: True]

  • cut_nextseq (bool) – Whether to trim high-quality Gs from 3’ end [keyword-only, default: False]

  • cut_discard_trimmed (bool) – Whether to discard reads in which an adapter was found [keyword-only, default: False]

  • cut_discard_untrimmed (bool) – Whether to discard reads in which no adapter was found [keyword-only, default: False]

  • cut_m (int) – Discard reads shorter than this length after trimming [keyword-only, default: 20]

  • bt2_local (bool) – Whether to perform local (True) or end-to-end (False) alignment [keyword-only, default: True]

  • bt2_discordant (bool) – Whether to output discordant alignments [keyword-only, default: False]

  • bt2_mixed (bool) – Whether to align individual mates of unaligned pairs [keyword-only, default: False]

  • bt2_dovetail (bool) – Whether to treat dovetailed mate pairs as concordant [keyword-only, default: False]

  • bt2_contain (bool) – Whether to treat nested mate pairs as concordant [keyword-only, default: True]

  • bt2_unal (bool) – Whether to output unaligned reads [keyword-only, default: False]

  • bt2_score_min (str) – Minimum score for a valid alignment [keyword-only, default: ‘L,1,0.5’]

  • bt2_i (int) – Minimum fragment length for valid paired-end alignments [keyword-only, default: 0]

  • bt2_x (int) – Maximum fragment length for valid paired-end alignments [keyword-only, default: 600]

  • bt2_gbar (int) – Minimum distance of a gap from end of a read [keyword-only, default: 4]

  • bt2_l (int) – Seed length [keyword-only, default: 12]

  • bt2_s (str) – Seed interval [keyword-only, default: ‘L,1,0.1’]

  • bt2_d (int) – Maximum number of failed seed extensions [keyword-only, default: 4]

  • bt2_r (int) – Maximum number of times to re-seed [keyword-only, default: 2]

  • bt2_dpad (int) – Width of padding on alignment matrix, to allow gaps [keyword-only, default: 2]

  • bt2_orient (str) – Valid orientations of paired-end mates [keyword-only, default: ‘fr’]
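The string defaults for bt2_score_min (‘L,1,0.5’) and bt2_s (‘L,1,0.1’) have the shape of Bowtie 2 function specifications: a letter selecting the function type (C constant, L linear, S square root, G natural log), then a constant term, then a coefficient applied to the read length. Assuming DREEM passes these strings to Bowtie 2 unchanged (this page does not say so), the sketch below evaluates such a specification for a given read length. The helper `eval_bt2_function` is hypothetical.

```python
import math

# How each function type transforms the read length x before scaling.
_SHAPES = {
    "C": lambda x: 0.0,   # constant: the coefficient has no effect
    "L": lambda x: x,     # linear
    "S": math.sqrt,       # square root
    "G": math.log,        # natural log
}


def eval_bt2_function(spec: str, read_length: int) -> float:
    """Evaluate 'F,B,A' as B + A * shape_F(read_length) (illustrative)."""
    parts = spec.split(",")
    kind = parts[0].strip().upper()
    const = float(parts[1])
    # Tolerate a missing coefficient, e.g. for constant functions.
    coeff = float(parts[2]) if len(parts) > 2 else 0.0
    return const + coeff * _SHAPES[kind](read_length)


# Values of the two defaults for a 100-nt read.
print(eval_bt2_function("L,1,0.5", 100))
print(eval_bt2_function("L,1,0.1", 100))
```

Under this reading, the default minimum score grows by 0.5 per nucleotide of read length, so longer reads need proportionally higher alignment scores to be reported.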