seismicrna.mask package
Subpackages
- seismicrna.mask.tests package
- Submodules
TestMask
TestMask1Batch
TestMask1Sample
TestMask2Samples
TestMaskBatches
TestMaskPaired
TestMaskSingle
TestMaskSingle1Sample1Batch
TestMaskSingle1Sample1Batch.test_mask_all_muts_min_ncov_read_7()
TestMaskSingle1Sample1Batch.test_mask_discontig()
TestMaskSingle1Sample1Batch.test_mask_gu()
TestMaskSingle1Sample1Batch.test_mask_gu_min_ncov_read_5()
TestMaskSingle1Sample1Batch.test_mask_gu_min_ncov_read_6()
TestMaskSingle1Sample1Batch.test_mask_polya_3()
TestMaskSingle1Sample1Batch.test_mask_polya_4()
TestMaskSingle1Sample1Batch.test_mask_pos()
TestMaskSingle1Sample1Batch.test_mask_pos_all()
TestMaskSingle1Sample1Batch.test_mask_pos_and_mask_pos_file()
TestMaskSingle1Sample1Batch.test_mask_pos_file()
TestMaskSingle1Sample1Batch.test_mask_pos_min_ncov_read_5()
TestMaskSingle1Sample1Batch.test_mask_pos_min_ncov_read_6()
TestMaskSingle1Sample1Batch.test_mask_pos_multiple()
TestMaskSingle1Sample1Batch.test_mask_read()
TestMaskSingle1Sample1Batch.test_mask_read_and_mask_read_file()
TestMaskSingle1Sample1Batch.test_mask_read_file()
TestMaskSingle1Sample1Batch.test_min_finfo_read_1()
TestMaskSingle1Sample1Batch.test_min_ncov_read_6()
TestMaskSingle1Sample1Batch.test_min_ncov_read_7()
TestMaskSingle1Sample1Batch.test_min_ncov_read_8()
TestMaskSingle1Sample1Batch.test_nomask()
extract_positions()
extract_read_nums()
write_datasets()
- Submodules
Submodules
- class seismicrna.mask.batch.MaskMutsBatch(*, read_nums: ndarray, **kwargs)
Bases: MaskReadBatch, PartialRegionMutsBatch
- property read_weights
Weights for each read when computing counts.
- class seismicrna.mask.batch.MaskReadBatch(*, read_nums: ndarray, **kwargs)
Bases: PartialReadBatch
- property num_reads
Number of reads.
- property read_nums
Read numbers.
- class seismicrna.mask.batch.PartialReadBatch(*, batch: int)
- property max_read
Maximum possible value for a read index.
- property read_indexes
Map each read number to its index in self.read_nums.
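The read_indexes property above maps read numbers to their positions in read_nums. A minimal sketch of one way such a lookup could be built with NumPy follows. The function name build_read_indexes and the -1 sentinel for absent read numbers are assumptions made for illustration, not the library's implementation.

```python
import numpy as np


def build_read_indexes(read_nums: np.ndarray, max_read: int) -> np.ndarray:
    """Map each read number to its index in read_nums.

    Read numbers that are absent from read_nums map to -1
    (a sentinel chosen for this sketch only).
    """
    # One slot per possible read number, from 0 to max_read inclusive.
    indexes = np.full(max_read + 1, -1, dtype=int)
    # The read number at index i of read_nums maps back to i.
    indexes[read_nums] = np.arange(read_nums.size)
    return indexes


read_nums = np.array([2, 5, 7])
indexes = build_read_indexes(read_nums, max_read=7)
print(indexes[5])  # index of read number 5 within read_nums
```

A dense lookup array like this trades memory (one slot per possible read number) for constant-time lookups of many read numbers at once.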
- class seismicrna.mask.batch.PartialRegionMutsBatch(*, region: Region, **kwargs)
Bases: PartialReadBatch, RegionMutsBatch, ABC
- seismicrna.mask.batch.apply_mask(batch: RegionMutsBatch, read_nums: ndarray | None = None, region: Region | None = None, sanitize: bool = False)
- class seismicrna.mask.dataset.JoinMaskMutsDataset(*args, **kwargs)
Bases: MaskDataset, JoinMutsDataset, MergedUnbiasDataset
- classmethod get_batch_type()
Type of batch.
- classmethod get_dataset_load_func()
Function to load one constituent dataset.
- classmethod get_report_type()
Type of report.
- classmethod name_batch_attrs()
Name the attributes of each batch.
- class seismicrna.mask.dataset.MaskDataset(report_file: Path, verify_times: bool = True)
Bases: AverageDataset, ABC
Dataset of masked data.
- class seismicrna.mask.dataset.MaskMutsDataset(dataset2_report_file: Path, **kwargs)
Bases: MaskDataset, MultistepDataset, UnbiasDataset
Chain mutation data with masked reads.
- MASK_NAME = 'mask'
- classmethod get_dataset1_load_func()
Function to load Dataset 1.
- classmethod get_dataset2_type()
Type of Dataset 2.
- property min_mut_gap
Minimum gap between two mutations.
- property pattern
Pattern of mutations to count.
- property quick_unbias
Use the quick heuristic for unbiasing.
- property quick_unbias_thresh
Consider mutation rates less than or equal to this threshold to be 0 when using the quick heuristic for unbiasing.
- property region
Region of the dataset.
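The quick_unbias_thresh property describes treating small mutation rates as 0. The following is a minimal sketch of that thresholding step alone, assuming the rates are held in a NumPy array. The helper zero_small_rates is hypothetical, and the sketch does not reproduce the unbiasing heuristic itself.

```python
import numpy as np


def zero_small_rates(mut_rates: np.ndarray, thresh: float) -> np.ndarray:
    """Set mutation rates less than or equal to thresh to 0."""
    return np.where(mut_rates <= thresh, 0.0, mut_rates)


rates = np.array([0.0005, 0.001, 0.02, 0.15])
cleaned = zero_small_rates(rates, thresh=0.001)
print(cleaned)
```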
- class seismicrna.mask.dataset.MaskReadDataset(*args, masked_read_nums: dict[int, list] | None = None, **kwargs)
Bases: MaskDataset, LoadedDataset, UnbiasDataset
Load batches of masked data.
- classmethod get_batch_type()
Type of batch.
- classmethod get_report_type()
Type of report.
- property min_mut_gap
Minimum gap between two mutations.
- property pattern
Pattern of mutations to count.
- property pos_kept
Positions kept after masking.
- property quick_unbias
Use the quick heuristic for unbiasing.
- property quick_unbias_thresh
Consider mutation rates less than or equal to this threshold to be 0 when using the quick heuristic for unbiasing.
- class seismicrna.mask.io.MaskBatchIO(*, reg: str, **kwargs)
Bases: ReadBatchIO, MaskIO, MaskReadBatch
- classmethod file_seg_type()
Type of the last segment in the path.
- class seismicrna.mask.io.MaskIO(*, reg: str, **kwargs)
- classmethod auto_fields()
Names and automatic values of selected fields.
- seismicrna.mask.main.load_regions(input_path: Iterable[str | Path], coords: Iterable[tuple[str, int, int]], primers: Iterable[tuple[str, DNA, DNA]], primer_gap: int, regions_file: Path | None = None)
Load regions of relate reports.
- seismicrna.mask.main.run(input_path: Iterable[str | Path], *, tmp_pfx: str | Path = './tmp', keep_tmp: bool = False, mask_coords: Iterable[tuple[str, int, int]] = (), mask_primers: Iterable[tuple[str, DNA, DNA]] = (), primer_gap: int = 0, mask_regions_file: str | None = None, mask_del: bool = True, mask_ins: bool = True, mask_mut: Iterable[str] = (), mask_polya: int = 5, mask_gu: bool = True, mask_pos: Iterable[tuple[str, int]] = (), mask_pos_file: str | None = None, mask_read: Iterable[str] = (), mask_read_file: str | None = None, mask_discontig: bool = True, min_ninfo_pos: int = 1000, max_fmut_pos: float = 1.0, min_ncov_read: int = 1, min_finfo_read: float = 0.95, max_fmut_read: float = 1.0, min_mut_gap: int = 3, quick_unbias: bool = True, quick_unbias_thresh: float = 0.001, max_mask_iter: int = 0, mask_pos_table: bool = True, mask_read_table: bool = True, brotli_level: int = 10, max_procs: int = 4, force: bool = False) -> list[Path]
Define mutations and regions to filter reads and positions.
- Parameters:
tmp_pfx (str | pathlib.Path) – Write all temporary files to a directory with this prefix [keyword-only, default: './tmp']
keep_tmp (bool) – Keep temporary files after finishing [keyword-only, default: False]
mask_coords (Iterable) – Select a region of a reference given its 5' and 3' end coordinates [keyword-only, default: ()]
mask_primers (Iterable) – Select a region of a reference given its forward and reverse primers [keyword-only, default: ()]
primer_gap (int) – Leave a gap of this many bases between the primer and the region [keyword-only, default: 0]
mask_regions_file (str | None) – Select regions of references from coordinates/primers in a CSV file [keyword-only, default: None]
mask_del (bool) – Mask deletions [keyword-only, default: True]
mask_ins (bool) – Mask insertions [keyword-only, default: True]
mask_mut (Iterable) – Mask this type of mutation [keyword-only, default: ()]
mask_polya (int) – Mask stretches of at least this many consecutive A bases (0 disables) [keyword-only, default: 5]
mask_gu (bool) – Mask G and U bases [keyword-only, default: True]
mask_pos (Iterable) – Mask this position in this reference [keyword-only, default: ()]
mask_pos_file (str | None) – Mask positions in references from a file [keyword-only, default: None]
mask_read (Iterable) – Mask the read with this name [keyword-only, default: ()]
mask_read_file (str | None) – Mask the reads with names in this file [keyword-only, default: None]
mask_discontig (bool) – Mask paired-end reads with discontiguous mates [keyword-only, default: True]
min_ninfo_pos (int) – Mask positions with fewer than this many informative base calls [keyword-only, default: 1000]
max_fmut_pos (float) – Mask positions with more than this fraction of mutated base calls [keyword-only, default: 1.0]
min_ncov_read (int) – Mask reads with fewer than this many bases covering the region [keyword-only, default: 1]
min_finfo_read (float) – Mask reads with less than this fraction of informative base calls [keyword-only, default: 0.95]
max_fmut_read (float) – Mask reads with more than this fraction of mutated base calls [keyword-only, default: 1.0]
min_mut_gap (int) – Mask reads with two mutations separated by fewer than this many bases [keyword-only, default: 3]
quick_unbias (bool) – Correct observer bias using a quick (typically linear time) heuristic [keyword-only, default: True]
quick_unbias_thresh (float) – Treat mutated fractions under this threshold as 0 with --quick-unbias [keyword-only, default: 0.001]
max_mask_iter (int) – Stop masking after this many iterations (0 for no limit) [keyword-only, default: 0]
mask_pos_table (bool) – Tabulate relationships per position for mask data [keyword-only, default: True]
mask_read_table (bool) – Tabulate relationships per read for mask data [keyword-only, default: True]
brotli_level (int) – Compress pickle files with this level of Brotli (0 - 11) [keyword-only, default: 10]
max_procs (int) – Run up to this many processes simultaneously [keyword-only, default: 4]
force (bool) – Force all tasks to run, overwriting any existing output files [keyword-only, default: False]
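Two of the parameters above, mask_polya and mask_gu, select positions from the reference sequence alone. A minimal sketch of how such sequence-based selections might be computed follows. The helpers polya_positions and gu_positions are hypothetical, positions are assumed to be 1-indexed, and T is treated as equivalent to U because the reference is given as DNA. None of this is taken from the library's source.

```python
import re


def polya_positions(seq: str, min_length: int) -> list[int]:
    """1-indexed positions lying in runs of at least min_length A bases."""
    if min_length <= 0:
        # A value of 0 disables poly(A) masking.
        return []
    positions = []
    # The greedy quantifier matches each maximal run of A bases once.
    for match in re.finditer("A{%d,}" % min_length, seq.upper()):
        positions.extend(range(match.start() + 1, match.end() + 1))
    return positions


def gu_positions(seq: str) -> list[int]:
    """1-indexed positions of G and U (or T) bases."""
    return [i for i, base in enumerate(seq.upper(), start=1) if base in "GTU"]


seq = "CAAAAAGTCAAAC"
polya = polya_positions(seq, min_length=5)
gu = gu_positions(seq)
print(polya, gu)
```

With min_length=5, only the run of five A bases is selected; the shorter run of three A bases near the 3' end is left alone.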
- class seismicrna.mask.report.JoinMaskReport(**kwargs: Any | Callable[[Report], Any])
Bases: JoinReport
- classmethod auto_fields()
Names and automatic values of selected fields.
- classmethod fields()
All fields of the report.
- classmethod file_seg_type()
Type of the last segment in the path.
- class seismicrna.mask.report.MaskReport(**kwargs: Any | Callable[[Report], Any])
Bases: BatchedReport, MaskIO
- classmethod auto_fields()
Names and automatic values of selected fields.
- classmethod fields()
All fields of the report.
- classmethod file_seg_type()
Type of the last segment in the path.
- class seismicrna.mask.table.MaskBatchTabulator(*, get_batch_count_all: Callable, num_batches: int, max_procs: int = 1, **kwargs)
Bases: BatchTabulator, MaskTabulator
- class seismicrna.mask.table.MaskCountTabulator(*, batch_counts: Iterable[tuple[Any, Any, Any, Any]], **kwargs)
Bases: CountTabulator, MaskTabulator
- class seismicrna.mask.table.MaskDatasetTabulator(*, dataset: MutsDataset, validate: bool = False, **kwargs)
Bases: PartialDatasetTabulator, MaskTabulator
- classmethod load_function()
LoadFunction for all Dataset types for this Tabulator.
- class seismicrna.mask.table.MaskPositionTable
Bases: MaskTable, PartialPositionTable, ABC
- class seismicrna.mask.table.MaskPositionTableLoader(table_file: Path)
Bases: PositionTableLoader, MaskPositionTable
- class seismicrna.mask.table.MaskPositionTableWriter(tabulator: Tabulator)
Bases: PositionTableWriter, MaskPositionTable
- class seismicrna.mask.table.MaskReadTable
Bases: MaskTable, PartialReadTable, ABC
- class seismicrna.mask.table.MaskReadTableLoader(table_file: Path)
Bases: ReadTableLoader, MaskReadTable
- class seismicrna.mask.table.MaskReadTableWriter(tabulator: Tabulator)
Bases: ReadTableWriter, MaskReadTable
- class seismicrna.mask.table.MaskTable
Bases: AverageTable, ABC
- classmethod kind()
Kind of table.
- class seismicrna.mask.table.MaskTabulator(*, refseq: DNA, region: Region, pattern: RelPattern, min_mut_gap: int, quick_unbias: bool, quick_unbias_thresh: float, count_ends: bool = True, **kwargs)
Bases: PartialTabulator, AverageTabulator, ABC
- classmethod table_types()
Types of tables that this tabulator can write.
- class seismicrna.mask.table.PartialDatasetTabulator(*, dataset: MutsDataset, validate: bool = False, **kwargs)
Bases: DatasetTabulator, PartialTabulator, ABC
- classmethod init_kws()
Attributes of the dataset to use as keyword arguments in super().__init__().
- class seismicrna.mask.table.PartialPositionTable
Bases: PartialTable, PositionTable, ABC
- classmethod path_segs()
Table’s path segments.
- class seismicrna.mask.table.PartialReadTable
Bases: PartialTable, ReadTable, ABC
- classmethod path_segs()
Table’s path segments.
- class seismicrna.mask.table.PartialTabulator(*, refseq: DNA, region: Region, pattern: RelPattern, min_mut_gap: int, quick_unbias: bool, quick_unbias_thresh: float, count_ends: bool = True, **kwargs)
- property data_per_pos
DataFrame of per-position data.
- classmethod get_null_value()
The null value for a count: either 0 or NaN.
- property p_ends_given_clust_noclose
Probability of each end coordinate.
- seismicrna.mask.table.adjust_counts(table_per_pos: DataFrame, p_ends_given_clust_noclose: ndarray, n_reads_clust: Series | int, region: Region, min_mut_gap: int, quick_unbias: bool, quick_unbias_thresh: float)
Adjust the given table of masked/clustered counts per position to correct for observer bias.
- class seismicrna.mask.write.Masker(dataset: RelateMutsDataset | PoolDataset, region: Region, pattern: RelPattern, *, max_mask_iter: int = 0, mask_polya: int = 5, mask_gu: bool = True, mask_pos: list[tuple[str, int]] = (), mask_pos_file: Path | None, mask_read: list[str] = (), mask_read_file: Path | None, mask_discontig: bool = True, min_ncov_read: int = 1, min_finfo_read: float = 0.95, max_fmut_read: float = 1.0, min_mut_gap: int = 3, min_ninfo_pos: int = 1000, max_fmut_pos: float = 1.0, quick_unbias: bool = True, quick_unbias_thresh: float = 0.001, count_read: bool, brotli_level: int = 10, top: Path, max_procs: int = 4)
Bases: object
Mask batches of relation vectors.
- CHECKSUM_KEY = 'mask'
- MASK_POS_FMUT = 'pos-fmut'
- MASK_POS_NINFO = 'pos-ninfo'
- MASK_READ_DISCONTIG = 'read-discontig'
- MASK_READ_FINFO = 'read-finfo'
- MASK_READ_FMUT = 'read-fmut'
- MASK_READ_GAP = 'read-gap'
- MASK_READ_INIT = 'read-init'
- MASK_READ_KEPT = 'read-kept'
- MASK_READ_LIST = 'read-exclude'
- MASK_READ_NCOV = 'read-ncov'
- PATTERN_KEY = 'pattern'
- create_report()
- mask()
- property n_reads_discontig
- property n_reads_init
- property n_reads_kept
Number of reads kept.
- property n_reads_list
- property n_reads_max_fmut
- property n_reads_min_finfo
- property n_reads_min_gap
- property n_reads_min_ncov
- property pos_gu
Positions masked for having a G or U base.
- property pos_kept
Positions kept.
- property pos_list
Positions masked arbitrarily from a list.
- property pos_max_fmut
Positions masked for having too many mutations.
- property pos_min_ninfo
Positions masked for having too few informative reads.
- property pos_polya
Positions masked for lying in a poly(A) sequence.
- property read_names_dataset
Dataset of the read names.
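The pos_min_ninfo and pos_max_fmut properties above list positions masked by count-based thresholds. A minimal sketch of such thresholds on per-position counts follows. The function flag_positions is hypothetical, and computing the mutated fraction relative to informative calls is an assumption of this sketch; the Masker's own computation may differ.

```python
import numpy as np


def flag_positions(n_info: np.ndarray,
                   n_mut: np.ndarray,
                   min_ninfo_pos: int,
                   max_fmut_pos: float) -> tuple[np.ndarray, np.ndarray]:
    """Return two boolean arrays: (too few informative, too many mutated)."""
    too_few_info = n_info < min_ninfo_pos
    # Fraction of mutated calls among informative calls (assumed definition);
    # positions with no informative calls get a fraction of 0.
    f_mut = np.divide(n_mut, n_info,
                      out=np.zeros(n_info.shape, dtype=float),
                      where=n_info > 0)
    too_many_muts = f_mut > max_fmut_pos
    return too_few_info, too_many_muts


n_info = np.array([1500, 800, 2000, 0])
n_mut = np.array([30, 10, 1200, 0])
too_few, too_many = flag_positions(n_info, n_mut,
                                   min_ninfo_pos=1000,
                                   max_fmut_pos=0.5)
print(too_few, too_many)
```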
- seismicrna.mask.write.mask_region(dataset: RelateMutsDataset | PoolDataset, region: Region, *, mask_del: bool, mask_ins: bool, mask_mut: Iterable[str], mask_pos_table: bool, mask_read_table: bool, force: bool, n_procs: int, tmp_pfx, keep_tmp, **kwargs)
Mask out certain reads, positions, and relationships.
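One of the read-level filters is min_mut_gap, which masks reads with two mutations separated by fewer than a given number of bases. A minimal sketch of that check follows. The helper has_close_mutations is hypothetical, and counting the gap as the number of bases strictly between two mutated positions is an assumption of this sketch, not something the documentation states.

```python
def has_close_mutations(mut_positions: list[int], min_mut_gap: int) -> bool:
    """True if any two mutations have fewer than min_mut_gap bases between them."""
    ordered = sorted(mut_positions)
    # Only adjacent mutations need checking once the positions are sorted.
    # Assumed gap definition: bases strictly between positions a and b.
    return any(b - a - 1 < min_mut_gap for a, b in zip(ordered, ordered[1:]))


# Hypothetical reads, each with the positions of its mutations.
reads = {"read1": [10, 14], "read2": [10, 13], "read3": [25]}
kept = [name for name, muts in reads.items()
        if not has_close_mutations(muts, min_mut_gap=3)]
print(kept)
```

Under this definition, read2 is masked because only two bases separate its mutations, while read1 (three bases apart) and read3 (a single mutation) are kept.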