seismicrna.mask package

Submodules

class seismicrna.mask.batch.MaskMutsBatch(*, read_nums: ndarray, **kwargs)

Bases: MaskReadBatch, SectionMutsBatch, PartialMutsBatch

property read_weights: Weights for each read when computing counts.

class seismicrna.mask.batch.MaskReadBatch(*, read_nums: ndarray, **kwargs)

Bases: PartialReadBatch

property num_reads: Number of reads.

property read_nums: Read numbers.

seismicrna.mask.batch.apply_mask(batch: SectionMutsBatch, read_nums: ndarray | None = None, section: Section | None = None, sanitize: bool = False)

class seismicrna.mask.data.JoinMaskMutsDataset(*args, **kwargs)

Bases: JoinMutsDataset, MergedUnbiasDataset

classmethod get_batch_type(): Type of batch.

classmethod get_dataset_load_func(): Function to load one constituent dataset.

classmethod get_report_type(): Type of report.

classmethod name_batch_attrs(): Name the attributes of each batch.

class seismicrna.mask.data.MaskMutsDataset(data1: MutsDataset, data2: Dataset)

Bases: ArrowDataset, UnbiasDataset

Chain mutation data with masked reads.

MASK_NAME = 'mask'

classmethod get_dataset1_load_func(): Function to load Dataset 1.

classmethod get_dataset2_type(): Type of Dataset 2.

property min_mut_gap: Minimum gap between two mutations.

property pattern: Pattern of mutations to count.

property quick_unbias: Use the quick heuristic for unbiasing.

property quick_unbias_thresh: Consider mutation rates less than or equal to this threshold to be 0 when using the quick heuristic for unbiasing.

property section: Section of the dataset.
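As a rough illustration of what quick_unbias_thresh describes, the sketch below sets mutation rates at or below the threshold to 0. The function name apply_quick_unbias_thresh and the use of a plain list are assumptions made for this example. This is not the library's implementation, only a minimal reading of the property description above.

```python
def apply_quick_unbias_thresh(mut_rates, thresh=0.001):
    """Return a copy of mut_rates with every rate <= thresh set to 0.

    Hypothetical helper: it illustrates the documented threshold rule
    only and is not part of seismicrna.
    """
    return [0.0 if rate <= thresh else rate for rate in mut_rates]


rates = [0.0005, 0.001, 0.02, 0.3]
print(apply_quick_unbias_thresh(rates))  # [0.0, 0.0, 0.02, 0.3]
```

The default of 0.001 mirrors the documented default for quick_unbias_thresh.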

class seismicrna.mask.data.MaskReadDataset(report: BatchedReport, top: Path)

Bases: LoadedDataset, UnbiasDataset

Load batches of masked relation vectors.

classmethod get_batch_type(): Type of batch.

classmethod get_report_type(): Type of report.

property min_mut_gap: Minimum gap between two mutations.

property pattern: Pattern of mutations to count.

property pos_kept: Positions kept after masking.

property quick_unbias: Use the quick heuristic for unbiasing.

property quick_unbias_thresh: Consider mutation rates less than or equal to this threshold to be 0 when using the quick heuristic for unbiasing.
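The min_mut_gap rule can be pictured with a small sketch. It assumes that the gap between two mutations is the number of bases strictly between them, so mutations at positions 10 and 13 have a gap of 2. That interpretation, the function name has_close_mutations, and the representation of a read as a list of mutated positions are all assumptions for illustration, not the library's code.

```python
def has_close_mutations(mut_positions, min_mut_gap=3):
    """Return True if any two mutations are separated by fewer than
    min_mut_gap bases.

    Assumption: the separation is the count of bases strictly between
    the two mutated positions (right - left - 1).
    """
    ordered = sorted(mut_positions)
    # Only adjacent mutations need checking once positions are sorted.
    for left, right in zip(ordered, ordered[1:]):
        if right - left - 1 < min_mut_gap:
            return True
    return False


print(has_close_mutations([10, 13]))  # True: only 2 bases in between
print(has_close_mutations([10, 14]))  # False: 3 bases in between
```

Under this reading, a read for which the function returns True would be masked, and a min_mut_gap of 0 never masks a read.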

class seismicrna.mask.io.MaskBatchIO(*, sect: str, **kwargs)

Bases: ReadBatchIO, MaskIO, MaskReadBatch

classmethod file_seg_type(): Type of the last segment in the path.

class seismicrna.mask.io.MaskIO(*, sect: str, **kwargs)

Bases: SectIO, ABC

classmethod auto_fields(): Names and automatic values of selected fields.

seismicrna.mask.main.load_sections(input_path: Iterable[str | Path], coords: Iterable[tuple[str, int, int]], primers: Iterable[tuple[str, DNA, DNA]], primer_gap: int, sections_file: Path | None = None): Open sections of relate reports.

seismicrna.mask.main.run(input_path: tuple[str, ...], *, mask_coords: tuple[tuple[str, int, int], ...] = (), mask_primers: tuple[tuple[str, DNA, DNA], ...] = (), primer_gap: int = 0, mask_sections_file: str | None = None, mask_del: bool = True, mask_ins: bool = True, mask_mut: tuple[str, ...] = (), mask_polya: int = 5, mask_gu: bool = True, mask_pos: tuple[tuple[str, int], ...] = (), mask_pos_file: str | None = None, mask_discontig: bool = True, min_ninfo_pos: int = 1000, max_fmut_pos: float = 1.0, min_ncov_read: int = 1, min_finfo_read: float = 0.95, max_fmut_read: float = 1.0, min_mut_gap: int = 3, quick_unbias: bool = True, quick_unbias_thresh: float = 0.001, brotli_level: int = 10, max_procs: int = 4, parallel: bool = True, force: bool = False, tmp_pfx='./tmp-', keep_tmp=False) → list[Path]

Define mutations and sections to filter reads and positions.

Parameters:

mask_coords (tuple) – Mask a section of a reference given its 5’ and 3’ end coordinates [keyword-only, default: ()]
mask_primers (tuple) – Mask a section of a reference given its forward and reverse primers [keyword-only, default: ()]
primer_gap (int) – Leave a gap of this many bases between the primer and the section [keyword-only, default: 0]
mask_sections_file (str | None) – Mask sections of references from coordinates/primers in a CSV file [keyword-only, default: None]
mask_del (bool) – Mask deletions [keyword-only, default: True]
mask_ins (bool) – Mask insertions [keyword-only, default: True]
mask_mut (tuple) – Mask this type of mutation [keyword-only, default: ()]
mask_polya (int) – Mask stretches of at least this many consecutive A bases (0 disables) [keyword-only, default: 5]
mask_gu (bool) – Mask G and U bases [keyword-only, default: True]
mask_pos (tuple) – Mask this position in this reference [keyword-only, default: ()]
mask_pos_file (str | None) – Mask positions in references from a file [keyword-only, default: None]
mask_discontig (bool) – Mask paired-end reads with discontiguous mates [keyword-only, default: True]
min_ninfo_pos (int) – Mask positions with fewer than this many unambiguous base calls [keyword-only, default: 1000]
max_fmut_pos (float) – Mask positions with more than this fraction of mutated base calls [keyword-only, default: 1.0]
min_ncov_read (int) – Mask reads with fewer than this many bases covering the section [keyword-only, default: 1]
min_finfo_read (float) – Mask reads with less than this fraction of unambiguous base calls [keyword-only, default: 0.95]
max_fmut_read (float) – Mask reads with more than this fraction of mutated base calls [keyword-only, default: 1.0]
min_mut_gap (int) – Mask reads with two mutations separated by fewer than this many bases [keyword-only, default: 3]
quick_unbias (bool) – Correct observer bias using a quick (typically linear time) heuristic [keyword-only, default: True]
quick_unbias_thresh (float) – Treat mutated fractions under this threshold as 0 with --quick-unbias [keyword-only, default: 0.001]
brotli_level (int) – Compress pickle files with this level of Brotli (0 - 11) [keyword-only, default: 10]
max_procs (int) – Run up to this many processes simultaneously [keyword-only, default: 4]
parallel (bool) – Run tasks in parallel or in series [keyword-only, default: True]
force (bool) – Force all tasks to run, overwriting any existing output files [keyword-only, default: False]
tmp_pfx – Write all temporary files to a directory with this prefix [keyword-only, default: './tmp-']
keep_tmp – Keep temporary files after finishing [keyword-only, default: False]
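The read-level thresholds above (min_ncov_read, min_finfo_read, max_fmut_read) can be sketched as a sequence of checks on per-read counts. In this example the function first_failed_filter, its count arguments, the order of the checks, and the choice of denominators (informative calls over covered bases, mutated calls over informative calls) are assumptions made for illustration. The returned labels reuse the values of the Masker constants MASK_READ_NCOV, MASK_READ_FINFO and MASK_READ_FMUT, but the logic is not taken from the library.

```python
def first_failed_filter(n_cov, n_info, n_mut, *,
                        min_ncov_read=1,
                        min_finfo_read=0.95,
                        max_fmut_read=1.0):
    """Return a label for the first read filter that fails, or None.

    n_cov:  bases of the read covering the section
    n_info: unambiguous (informative) base calls
    n_mut:  mutated base calls
    The denominators and the check order are assumptions of this sketch.
    """
    if n_cov < min_ncov_read:
        return "read-ncov"
    # Guard against division by zero when nothing is covered.
    finfo = n_info / n_cov if n_cov else 0.0
    if finfo < min_finfo_read:
        return "read-finfo"
    fmut = n_mut / n_info if n_info else 0.0
    if fmut > max_fmut_read:
        return "read-fmut"
    return None


print(first_failed_filter(100, 98, 3))   # None: the read passes
print(first_failed_filter(100, 90, 3))   # read-finfo
print(first_failed_filter(100, 100, 20, max_fmut_read=0.1))  # read-fmut
```

With the default max_fmut_read of 1.0, the last check can never fail, since a fraction of mutated calls cannot exceed 1.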

class seismicrna.mask.report.MaskReport(**kwargs: Any | Callable[[Report], Any])

Bases: BatchedReport, MaskIO

classmethod auto_fields(): Names and automatic values of selected fields.

classmethod fields(): All fields of the report.

classmethod file_seg_type(): Type of the last segment in the path.

Mask – Write Module

class seismicrna.mask.write.Masker(dataset: RelateDataset | PoolDataset, section: Section, pattern: RelPattern, *, mask_polya: int = 5, mask_gu: bool = True, mask_pos: list[tuple[str, int]] = (), mask_pos_file: Path | None, mask_discontig: bool = True, min_ncov_read: int = 1, min_finfo_read: float = 0.95, max_fmut_read: float = 1.0, min_mut_gap: int = 3, min_ninfo_pos: int = 1000, max_fmut_pos: float = 1.0, quick_unbias: bool = True, quick_unbias_thresh: float = 0.001, brotli_level: int = 10, top: Path)

Bases: object

Mask batches of relation vectors.

CHECKSUM_KEY = 'mask'

MASK_POS_FMUT = 'pos-fmut'

MASK_POS_NINFO = 'pos-ninfo'

MASK_READ_DISCONTIG = 'read-discontig'

MASK_READ_FINFO = 'read-finfo'

MASK_READ_FMUT = 'read-fmut'

MASK_READ_GAP = 'read-gap'

MASK_READ_INIT = 'read-init'

MASK_READ_KEPT = 'read-kept'

MASK_READ_NCOV = 'read-ncov'

PATTERN_KEY = 'pattern'

create_report(began: datetime, ended: datetime)

mask()

property n_batches: Number of batches of reads.

property n_reads_discontig

property n_reads_kept: Number of reads kept.

property n_reads_max_fmut

property n_reads_min_finfo

property n_reads_min_gap

property n_reads_min_ncov

property n_reads_premask

property pos_gu: Positions masked for having a G or U base.

property pos_kept: Positions kept.

property pos_list: Positions masked arbitrarily from a list.

property pos_max_fmut: Positions masked for having too many mutations.

property pos_min_ninfo: Positions masked for having too few informative reads.

property pos_polya: Positions masked for lying in a poly(A) sequence.
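To make the poly(A) rule concrete, the sketch below finds the positions that lie in runs of at least mask_polya consecutive A bases. The function polya_positions, the 1-based position numbering, and the choice to mask every base of a qualifying run are assumptions for this example. The library's own handling may differ.

```python
import re


def polya_positions(seq, mask_polya=5):
    """Return 1-based positions lying in runs of >= mask_polya A bases.

    Hypothetical helper, not part of seismicrna. A value of 0 (or less)
    disables masking, matching the documented meaning of mask_polya.
    """
    if mask_polya <= 0:
        return []
    positions = []
    # "A{n,}" matches each maximal run of n or more consecutive A bases.
    for match in re.finditer("A{%d,}" % mask_polya, seq):
        positions.extend(range(match.start() + 1, match.end() + 1))
    return positions


print(polya_positions("GCAAAAAUCAAAG"))  # [3, 4, 5, 6, 7]
```

In the example sequence, the run of five A bases is reported, while the later run of three is too short to qualify.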

seismicrna.mask.write.mask_section(dataset: RelateDataset | PoolDataset, section: Section, mask_del: bool, mask_ins: bool, mask_mut: Iterable[str], *, tmp_dir: Path, force: bool, **kwargs): Filter a section of a set of bit vectors.