seismicrna.ensembles package
Submodules
- class seismicrna.ensembles.io.EnsemblesFile
Bases: HasRegFilePath, ABC
- classmethod get_step()
Step of the workflow.
- class seismicrna.ensembles.io.EnsemblesIO
Bases: EnsemblesFile, RegFileIO, ABC
- seismicrna.ensembles.main.run(input_path: Iterable[str | Path] = Sentinel.UNSET, *, branch: str = '', tmp_pfx: str | Path = './tmp', keep_tmp: bool = False, brotli_level: int = 10, force: bool = False, num_cpus: int = 4, tile_length: int = 0, tile_min_overlap: float = 0.5, erase_tiles: bool = True, pair_fdr: float = 0.05, min_pairs: int = 2, threshold_multiplier: float = 1.0, min_cluster_length: int = 20, max_cluster_length: int = 1200, gap_mode: str = 'omit', mask_coords: Iterable[tuple[str, int, int]] = (), mask_primers: Iterable[tuple[str, DNA, DNA]] = (), primer_gap: int = 0, mask_regions_file: str | None = None, count_del: bool = True, count_ins: bool = True, no_mut: Iterable[str] = (), only_mut: Iterable[str] = (), probe: str = 'DMS', mask_a: bool | None = None, mask_c: bool | None = None, mask_g: bool | None = None, mask_u: bool | None = None, mask_polya: int = 5, mask_pos: Iterable[tuple[str, int]] = (), mask_pos_file: Iterable[str | Path] = (), mask_read: Iterable[str] = (), mask_read_file: Iterable[str | Path] = (), mask_discontig: bool = True, min_ncov_read: int = 1, min_fcov_read: float = 0.0, min_finfo_read: float = 0.95, max_fmut_read: float = 1.0, min_mut_gap: int | None = None, mut_collisions: str = 'auto', min_ninfo_pos: int = 1000, max_fmut_pos: float = 1.0, quick_unbias: bool = True, quick_unbias_thresh: float = 0.001, max_mask_iter: int = 0, mask_pos_table: bool = True, mask_read_table: bool = True, min_clusters: int = 1, max_clusters: int = 0, min_em_runs: int = 6, max_em_runs: int = 30, jackpot: bool = True, jackpot_conf_level: float = 0.95, max_jackpot_quotient: float = 1.1, max_jackpot_sims: int = 12, jackpot_max_data: int = 268435456, min_em_iter: int = 10, max_em_iter: int = 500, em_thresh: float = 0.37, min_marcd_run: float = 0.016, max_pearson_run: float = 0.9, max_arcd_vs_ens_avg: float = 0.2, max_gini_run: float = 0.667, max_loglike_vs_best: float = 0.0, min_pearson_vs_best: float = 0.97, max_marcd_vs_best: float = 0.008, try_all_ks: bool = False, write_all_ks: bool = False, cluster_pos_table: bool = True, cluster_abundance_table: bool = True, verify_times: bool = True, seed: int | None = None)
Infer independent structure ensembles along an entire RNA.
- Parameters:
  - branch (str) – Create a new branch of the workflow with this name [keyword-only, default: '']
  - tmp_pfx (str | pathlib.Path) – Write all temporary files to a directory with this prefix [keyword-only, default: './tmp']
  - keep_tmp (bool) – Keep temporary files after finishing [keyword-only, default: False]
  - brotli_level (int) – Compress pickle files with this level of Brotli (0 to 11) [keyword-only, default: 10]
  - force (bool) – Force all tasks to run, overwriting any existing output files [keyword-only, default: False]
  - num_cpus (int) – Use up to this many CPUs simultaneously [keyword-only, default: 4]
  - tile_length (int) – Make each tile this length (if 0, use 2x the median read length) [keyword-only, default: 0]
  - tile_min_overlap (float) – Make adjacent tiles overlap by at least this fraction of length [keyword-only, default: 0.5]
  - erase_tiles (bool) – Erase the mask reports/batches from the tiling step [keyword-only, default: True]
  - pair_fdr (float) – Find correlated pairs at this false discovery rate (FDR) [keyword-only, default: 0.05]
  - min_pairs (int) – Cluster only the regions with at least this many correlated pairs [keyword-only, default: 2]
  - threshold_multiplier (float) – Multiply the threshold for detecting modules by this factor [keyword-only, default: 1.0]
  - min_cluster_length (int) – Cluster only the regions with at least this many positions [keyword-only, default: 20]
  - max_cluster_length (int) – Cluster only the regions with no more than this many positions [keyword-only, default: 1200]
  - gap_mode (str) – If there are gaps between regions to cluster, OMIT (do not cluster) the gaps, INSERT a new region into each gap, or EXPAND the existing regions to fill the gaps [keyword-only, default: 'omit']
  - mask_coords (Iterable) – Select a region of a reference given its 5’ and 3’ end coordinates [keyword-only, default: ()]
  - mask_primers (Iterable) – Select a region of a reference given its forward and reverse primers [keyword-only, default: ()]
  - primer_gap (int) – Leave a gap of this many bases between the primer and the region [keyword-only, default: 0]
  - mask_regions_file (str | None) – Select regions of references from coordinates/primers in a CSV file [keyword-only, default: None]
  - count_del (bool) – Count deletions as mutations [keyword-only, default: True]
  - count_ins (bool) – Count insertions as mutations [keyword-only, default: True]
  - no_mut (Iterable) – Do not count this type of mutation (overrides --count-del/ins) [keyword-only, default: ()]
  - only_mut (Iterable) – Count only this type of mutation (overrides other mutation settings) [keyword-only, default: ()]
  - probe (str) – Use default mask options for this chemical probe [keyword-only, default: 'DMS']
  - mask_a (bool | None) – Mask positions with base A [keyword-only, default: None]
  - mask_c (bool | None) – Mask positions with base C [keyword-only, default: None]
  - mask_g (bool | None) – Mask positions with base G [keyword-only, default: None]
  - mask_u (bool | None) – Mask positions with base U [keyword-only, default: None]
  - mask_polya (int) – Mask stretches of at least this many consecutive A bases (0 disables) [keyword-only, default: 5]
  - mask_pos (Iterable) – Mask this position in this reference [keyword-only, default: ()]
  - mask_pos_file (Iterable) – Mask positions in references from a file [keyword-only, default: ()]
  - mask_read (Iterable) – Mask the read with this name [keyword-only, default: ()]
  - mask_read_file (Iterable) – Mask the reads with names in this file [keyword-only, default: ()]
  - mask_discontig (bool) – Mask paired-end reads with discontiguous mates [keyword-only, default: True]
  - min_ncov_read (int) – Mask reads with fewer than this many bases covering the region [keyword-only, default: 1]
  - min_fcov_read (float) – Mask reads covering less than this fraction of the region [keyword-only, default: 0.0]
  - min_finfo_read (float) – Mask reads with less than this fraction of informative base calls [keyword-only, default: 0.95]
  - max_fmut_read (float) – Mask reads with more than this fraction of mutated base calls [keyword-only, default: 1.0]
  - min_mut_gap (int | None) – Mask reads with two mutations separated by fewer than this many bases [keyword-only, default: None]
  - mut_collisions (str) – If two mutations are closer than --min-mut-gap positions, MERGE the mutations, DROP the read, or AUTO-select based on the probe [keyword-only, default: 'auto']
  - min_ninfo_pos (int) – Mask positions with fewer than this many informative base calls [keyword-only, default: 1000]
  - max_fmut_pos (float) – Mask positions with more than this fraction of mutated base calls [keyword-only, default: 1.0]
  - quick_unbias (bool) – Correct observer bias using a quick (typically linear time) heuristic [keyword-only, default: True]
  - quick_unbias_thresh (float) – Treat mutated fractions under this threshold as 0 with --quick-unbias [keyword-only, default: 0.001]
  - max_mask_iter (int) – Stop masking after this many iterations (0 for no limit) [keyword-only, default: 0]
  - mask_pos_table (bool) – Tabulate relationships per position for mask data [keyword-only, default: True]
  - mask_read_table (bool) – Tabulate relationships per read for mask data [keyword-only, default: True]
  - min_clusters (int) – Start at this many clusters [keyword-only, default: 1]
  - max_clusters (int) – Stop at this many clusters (0 for no limit) [keyword-only, default: 0]
  - min_em_runs (int) – Run EM (successfully) at least this number of times for each K [keyword-only, default: 6]
  - max_em_runs (int) – Run EM (successfully or not) at most this number of times for each K [keyword-only, default: 30]
  - jackpot (bool) – Calculate the jackpotting quotient to find over-represented reads [keyword-only, default: True]
  - jackpot_conf_level (float) – Confidence level for the jackpotting quotient confidence interval [keyword-only, default: 0.95]
  - max_jackpot_quotient (float) – Remove runs whose jackpotting quotient exceeds this limit [keyword-only, default: 1.1]
  - max_jackpot_sims (int) – Maximum number of simulations to compute the jackpotting quotient [keyword-only, default: 12]
  - jackpot_max_data (int) – Skip calculating the jackpotting quotient if reads × positions exceeds this limit [keyword-only, default: 268435456]
  - min_em_iter (int) – Run EM for at least this many iterations [keyword-only, default: 10]
  - max_em_iter (int) – Run EM for at most this many iterations [keyword-only, default: 500]
  - em_thresh (float) – Stop EM when the log likelihood increases by less than this threshold [keyword-only, default: 0.37]
  - min_marcd_run (float) – Remove runs with two clusters that differ by less than this MARCD [keyword-only, default: 0.016]
  - max_pearson_run (float) – Remove runs with two clusters more similar than this correlation [keyword-only, default: 0.9]
  - max_arcd_vs_ens_avg (float) – Remove runs where a cluster differs by more than this ARCD from the ensemble average at any position [keyword-only, default: 0.2]
  - max_gini_run (float) – Remove runs where any cluster’s Gini coefficient exceeds this limit [keyword-only, default: 0.667]
  - max_loglike_vs_best (float) – Remove Ks with a log likelihood gap larger than this (0 for no limit) [keyword-only, default: 0.0]
  - min_pearson_vs_best (float) – Remove Ks where every run has less than this correlation vs. the best [keyword-only, default: 0.97]
  - max_marcd_vs_best (float) – Remove Ks where every run has more than this MARCD vs. the best [keyword-only, default: 0.008]
  - try_all_ks (bool) – Try all numbers of clusters (Ks), even after finding the best number [keyword-only, default: False]
  - write_all_ks (bool) – Write all numbers of clusters (Ks), rather than only the best number [keyword-only, default: False]
  - cluster_pos_table (bool) – Tabulate relationships per position for cluster data [keyword-only, default: True]
  - cluster_abundance_table (bool) – Tabulate number of reads per cluster for cluster data [keyword-only, default: True]
  - verify_times (bool) – Verify that report files from later steps have later timestamps [keyword-only, default: True]
  - seed (int | None) – Seed for the random number generator [keyword-only, default: None]
- class seismicrna.ensembles.report.EnsemblesReport(**kwargs: Any | Callable[[Report], Any])
Bases: RegReport, EnsemblesIO
- classmethod get_file_seg_type()
Type of the last segment in the path.
- classmethod get_param_report_fields()
Parameter fields of the report.
- classmethod get_result_report_fields()
Result fields of the report.
- seismicrna.ensembles.write.ensembles(relate_report_file: Path, *, branch: str, tmp_pfx: str | Path, keep_tmp: bool, brotli_level: int, force: bool, num_cpus: int, tile_length: int, tile_min_overlap: float, erase_tiles: bool, pair_fdr: float, min_pairs: int, threshold_multiplier: float, min_cluster_length: int, max_cluster_length: int, gap_mode: str, mask_coords: Iterable[tuple[str, int, int]], mask_primers: Iterable[tuple[str, DNA, DNA]], primer_gap: int, mask_regions_file: str | None, count_del: bool, count_ins: bool, no_mut: Iterable[str], only_mut: Iterable[str], probe: str, mask_a: bool | None, mask_c: bool | None, mask_g: bool | None, mask_u: bool | None, mask_polya: int, mask_pos: Iterable[tuple[str, int]], mask_pos_file: Iterable[str | Path], mask_read: Iterable[str], mask_read_file: Iterable[str | Path], mask_discontig: bool, min_ncov_read: int, min_fcov_read: float, min_finfo_read: float, max_fmut_read: float, min_mut_gap: int | None, mut_collisions: str, min_ninfo_pos: int, max_fmut_pos: float, quick_unbias: bool, quick_unbias_thresh: float, max_mask_iter: int, mask_pos_table: bool, mask_read_table: bool, min_clusters: int, max_clusters: int, min_em_runs: int, max_em_runs: int, jackpot: bool, jackpot_conf_level: float, max_jackpot_quotient: float, max_jackpot_sims: int, jackpot_max_data: int, min_em_iter: int, max_em_iter: int, em_thresh: float, min_marcd_run: float, max_pearson_run: float, max_arcd_vs_ens_avg: float, max_gini_run: float, max_loglike_vs_best: float, min_pearson_vs_best: float, max_marcd_vs_best: float, try_all_ks: bool, write_all_ks: bool, cluster_pos_table: bool, cluster_abundance_table: bool, verify_times: bool, seed: int | None)
Run one relate report through the full ensembles pipeline.
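Both `run` and `ensembles` accept `tile_length` and `tile_min_overlap`, which together constrain how a reference is split into overlapping tiles. The sketch below shows one way to lay out tiles that satisfy those two constraints. It is an assumption made for illustration and not SEISMIC-RNA's actual tiling code: the function name `tile_bounds` is hypothetical, coordinates are taken to be 1-indexed and inclusive, and the `tile_length = 0` case (2x the median read length) is not handled.

```python
import math


def tile_bounds(end5: int, end3: int, tile_length: int,
                tile_min_overlap: float = 0.5) -> list[tuple[int, int]]:
    """Return 1-indexed, inclusive (5', 3') coordinates of overlapping tiles."""
    if end3 < end5:
        raise ValueError("end3 must be >= end5")
    if tile_length < 1:
        raise ValueError("tile_length must be positive in this sketch")
    if not 0.0 <= tile_min_overlap < 1.0:
        raise ValueError("tile_min_overlap must be in [0, 1)")
    region_length = end3 - end5 + 1
    if region_length <= tile_length:
        # The whole region fits in a single tile.
        return [(end5, end3)]
    # Largest allowed distance between consecutive tile starts; at least 1
    # so the tiles advance. The overlap guarantee holds whenever
    # tile_length * (1 - tile_min_overlap) >= 1.
    max_step = max(1, math.floor(tile_length * (1.0 - tile_min_overlap)))
    # Distance from the first tile's start to the last tile's start.
    span = region_length - tile_length
    num_tiles = math.ceil(span / max_step) + 1
    # Spread the starts evenly with integer arithmetic so that no step
    # exceeds max_step and the last tile ends exactly at end3.
    starts = [end5 + (i * span) // (num_tiles - 1) for i in range(num_tiles)]
    return [(start, start + tile_length - 1) for start in starts]
```

For example, `tile_bounds(1, 100, 40, 0.5)` returns `[(1, 40), (21, 60), (41, 80), (61, 100)]`: each tile is 40 positions long and adjacent tiles share 20 positions, which is the required half of the tile length.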