API Reference

The arguments for the CLI and Python API are the same. The CLI is just a wrapper around the Python API.
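Because the CLI options mirror the Python keyword arguments, one can usually be derived from the other. The sketch below illustrates the naming pattern visible in this reference: underscores become hyphens, and boolean arguments become `--include-x` / `--exclude-x` pairs. The helper `to_cli_args` and the path `sample_1/report` are hypothetical and not part of DREEM. Repeating the flag for each item of a tuple is an assumption about how multi-valued options are passed.

```python
def to_cli_args(**kwargs):
    """Translate Python keyword arguments into `dreem cluster` CLI tokens.

    Hypothetical helper, not part of DREEM: it only applies the naming
    pattern seen in this reference.
    """
    args = ["dreem", "cluster"]
    for name, value in kwargs.items():
        flag = name.replace("_", "-")
        if isinstance(value, bool):
            # include_gu=True -> --include-gu, include_gu=False -> --exclude-gu
            _, _, rest = flag.partition("-")
            args.append(f"--{'include' if value else 'exclude'}-{rest}")
        elif isinstance(value, (tuple, list)):
            # Assumption: a multi-valued option is given by repeating the flag.
            for item in value:
                args += [f"--{flag}", str(item)]
        else:
            args += [f"--{flag}", str(value)]
    return args


# Hypothetical input path, used only for illustration.
print(" ".join(to_cli_args(mp_report=("sample_1/report",),
                           max_clusters=2,
                           include_gu=False)))
# dreem cluster --mp-report sample_1/report --max-clusters 2 --exclude-gu
```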

CLI cmds

dreem cluster

dreem cluster [OPTIONS]

Options

--mp-report <mp_report>

Path to the bit vector folder or list of paths to the bit vector folders.

--out-dir <out_dir>

Where to output all finished files.

--max-procs <max_procs>

Maximum number of simultaneous processes.

--max-clusters <max_clusters>

Maximum number of clusters.

-n, --num-runs <num_runs>

Number of times to run the clustering algorithm.

--signal-thresh <signal_thresh>

Minimum mutation fraction to keep a base.

--include-gu, --exclude-gu

Whether to include G and U bases in reads.

--include-del, --exclude-del

Whether to include deletions in reads.

--polya-max <polya_max>

Maximum length of poly(A) sequences to include.

--min-iter <min_iter>

Minimum number of iterations before checking convergence of EM.

--max-iter <max_iter>

Maximum number of iterations before stopping EM.

--convergence-cutoff <convergence_cutoff>

Stop EM once the difference in log-likelihood between two consecutive iterations falls below this value.

--min-reads <min_reads>

Minimum number of reads to start clustering.
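The three EM options (`--min-iter`, `--max-iter`, `--convergence-cutoff`) together define a stopping rule. The sketch below shows one plausible reading of how they interact. It is not DREEM's actual code: the function `run_em` is hypothetical, the E and M steps are replaced by a precomputed sequence of log-likelihoods, and checking convergence from iteration `min_iter` onward (rather than strictly after it) is an assumption. The default values are the ones given in the Python signature in this reference.

```python
def run_em(log_likelihoods, min_iter=100, max_iter=500, convergence_cutoff=0.5):
    """Return how many iterations run before the stopping rule fires.

    `log_likelihoods` yields the log-likelihood after each EM iteration;
    it stands in for the real expectation and maximization steps.
    """
    previous = None
    iterations = 0
    for iterations, current in enumerate(log_likelihoods, start=1):
        if iterations >= max_iter:
            break  # hard cap: never exceed max_iter
        if (previous is not None
                and iterations >= min_iter
                and abs(current - previous) < convergence_cutoff):
            break  # change in log-likelihood fell below the cutoff
        previous = current
    return iterations


# A toy log-likelihood curve that flattens out: -1000, -500, -333.3, ...
curve = [-1000.0 / i for i in range(1, 1001)]

print(run_em(curve, min_iter=10))  # 46: first step with a change below 0.5
print(run_em(curve))               # 100: converged earlier, but min_iter delays the check
```

Raising `min_iter` guards against stopping on an early plateau, while `max_iter` bounds the run time when the log-likelihood keeps changing by more than the cutoff.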

Python args

dreem.cluster.run(mp_report: tuple[str] = (), *, out_dir: str = './output', max_procs: int = 2, max_clusters: int = 3, num_runs: int = 10, signal_thresh: float = 0.005, include_gu: bool = False, include_del: bool = False, polya_max: int = 4, min_iter: int = 100, max_iter: int = 500, convergence_cutoff: float = 0.5, min_reads: int = 1000)

Run the clustering module.

Parameters
  • mp_report (tuple) – Path to the bit vector folder or list of paths to the bit vector folders. [positional or keyword, default: ()]

  • out_dir (str) – Where to output all finished files. [keyword-only, default: './output']

  • max_procs (int) – Maximum number of simultaneous processes. [keyword-only, default: 2]

  • max_clusters (int) – Maximum number of clusters. [keyword-only, default: 3]

  • num_runs (int) – Number of times to run the clustering algorithm. [keyword-only, default: 10]

  • signal_thresh (float) – Minimum mutation fraction to keep a base. [keyword-only, default: 0.005]

  • include_gu (bool) – Whether to include G and U bases in reads. [keyword-only, default: False]

  • include_del (bool) – Whether to include deletions in reads. [keyword-only, default: False]

  • polya_max (int) – Maximum length of poly(A) sequences to include. [keyword-only, default: 4]

  • min_iter (int) – Minimum number of iterations before checking convergence of EM. [keyword-only, default: 100]

  • max_iter (int) – Maximum number of iterations before stopping EM. [keyword-only, default: 500]

  • convergence_cutoff (float) – Stop EM once the difference in log-likelihood between two consecutive iterations falls below this value. [keyword-only, default: 0.5]

  • min_reads (int) – Minimum number of reads to start clustering. [keyword-only, default: 1000]
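The bare `*` in the signature means that only `mp_report` may be passed positionally; every other parameter must be named. The sketch below demonstrates this calling convention with a stand-in function, so it runs without DREEM installed. `run_standin` is hypothetical: it copies part of the documented signature and simply returns its arguments. The paths are also illustrative.

```python
def run_standin(mp_report: tuple = (), *, out_dir: str = "./output",
                max_clusters: int = 3, min_reads: int = 1000) -> dict:
    """Stand-in that copies part of the documented signature and echoes its arguments."""
    return {"mp_report": mp_report, "out_dir": out_dir,
            "max_clusters": max_clusters, "min_reads": min_reads}


# mp_report may be positional; everything else must be passed by keyword.
settings = run_standin(("sample_1/report",), out_dir="./clusters", max_clusters=2)
print(settings["max_clusters"], settings["min_reads"])  # 2 1000

# Passing out_dir positionally is rejected because of the bare `*`.
try:
    run_standin(("sample_1/report",), "./clusters")
except TypeError as error:
    print("TypeError:", error)
```

With DREEM installed, the same call pattern would apply to `dreem.cluster.run`, and unspecified parameters would fall back to the defaults listed above.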