seismicrna.mask package
Subpackages
- seismicrna.mask.tests package
- Submodules
- TestMask
- TestMask1Batch
- TestMask1Sample
- TestMask2Samples
- TestMaskBatches
- TestMaskPaired
- TestMaskSingle
- TestMaskSingle1Sample1Batch
- TestMaskSingle1Sample1Batch.test_mask_all_muts_min_ncov_read_7()
- TestMaskSingle1Sample1Batch.test_mask_discontig()
- TestMaskSingle1Sample1Batch.test_mask_gu()
- TestMaskSingle1Sample1Batch.test_mask_gu_min_ncov_read_5()
- TestMaskSingle1Sample1Batch.test_mask_gu_min_ncov_read_6()
- TestMaskSingle1Sample1Batch.test_mask_polya_3()
- TestMaskSingle1Sample1Batch.test_mask_polya_4()
- TestMaskSingle1Sample1Batch.test_mask_pos()
- TestMaskSingle1Sample1Batch.test_mask_pos_all()
- TestMaskSingle1Sample1Batch.test_mask_pos_and_mask_pos_file()
- TestMaskSingle1Sample1Batch.test_mask_pos_file()
- TestMaskSingle1Sample1Batch.test_mask_pos_files()
- TestMaskSingle1Sample1Batch.test_mask_pos_min_ncov_read_5()
- TestMaskSingle1Sample1Batch.test_mask_pos_min_ncov_read_6()
- TestMaskSingle1Sample1Batch.test_mask_pos_multiple()
- TestMaskSingle1Sample1Batch.test_mask_read()
- TestMaskSingle1Sample1Batch.test_mask_read_and_mask_read_file()
- TestMaskSingle1Sample1Batch.test_mask_read_file()
- TestMaskSingle1Sample1Batch.test_mask_read_files()
- TestMaskSingle1Sample1Batch.test_min_finfo_read_1()
- TestMaskSingle1Sample1Batch.test_min_ncov_read_6()
- TestMaskSingle1Sample1Batch.test_min_ncov_read_7()
- TestMaskSingle1Sample1Batch.test_min_ncov_read_8()
- TestMaskSingle1Sample1Batch.test_nomask()
 
- extract_positions()
- extract_read_nums()
- write_datasets()
 
 
Submodules
- class seismicrna.mask.batch.MaskMutsBatch(*, read_nums: ndarray, **kwargs)
- Bases: MaskReadBatch, PartialRegionMutsBatch
 - property read_weights
- Weights for each read when computing counts. 
 
- class seismicrna.mask.batch.MaskReadBatch(*, read_nums: ndarray, **kwargs)
- Bases: PartialReadBatch
 - property num_reads
- Number of reads. 
 - property read_nums
- Read numbers. 
 
- class seismicrna.mask.batch.PartialReadBatch(*, batch: int, **kwargs)
- property max_read
- Maximum possible value for a read index. 
 - property read_indexes
- Map each read number to its index in self.read_nums. 
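The read_indexes property above can be pictured as a lookup table from read number to position in read_nums. The following is a minimal sketch in plain Python, not the library's code: the helper name build_read_indexes and the use of -1 for read numbers that are absent are assumptions made for illustration.

```python
def build_read_indexes(read_nums, max_read):
    """Map each read number to its index in read_nums.

    Hypothetical helper: read numbers that do not occur in read_nums
    map to -1 (an assumed sentinel, not taken from the library).
    """
    indexes = [-1] * (max_read + 1)
    for index, read_num in enumerate(read_nums):
        indexes[read_num] = index
    return indexes


# Reads 2, 5, and 7 were kept out of read numbers 0 through 8.
lookup = build_read_indexes([2, 5, 7], max_read=8)
print(lookup[5])  # index of read number 5 within read_nums
```

A table of this kind makes lookups constant-time at the cost of one entry per possible read number.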
 
- class seismicrna.mask.batch.PartialRegionMutsBatch(*, region: Region, **kwargs)
- Bases: PartialReadBatch, RegionMutsBatch, ABC
- seismicrna.mask.batch.apply_mask(batch: RegionMutsBatch, read_nums: ndarray | None = None, region: Region | None = None, sanitize: bool = False)
- class seismicrna.mask.dataset.JoinMaskMutsDataset(*args, **kwargs)
- Bases: MaskDataset, JoinMutsDataset, MergedUnbiasDataset
 - classmethod get_batch_type()
- Type of batch. 
 - classmethod get_dataset_load_func()
- Function to load one constituent dataset. 
 - classmethod get_report_type()
- Type of report. 
 - classmethod name_batch_attrs()
- Name the attributes of each batch. 
 
- class seismicrna.mask.dataset.MaskDataset(report_file: str | Path, verify_times: bool = True)
- Bases: AverageDataset, ABC
- Dataset of masked data.
- class seismicrna.mask.dataset.MaskMutsDataset(dataset2_report_file: Path, **kwargs)
- Bases: MaskDataset, MultistepDataset, UnbiasDataset
- Chain mutation data with masked reads.
 - MASK_NAME = 'mask'
 - classmethod get_dataset1_load_func()
- Function to load Dataset 1. 
 - classmethod get_dataset2_type()
- Type of Dataset 2. 
 - property min_mut_gap
- Minimum gap between two mutations. 
 - property pattern
- Pattern of mutations to count. 
 - property quick_unbias
- Use the quick heuristic for unbiasing. 
 - property quick_unbias_thresh
- Consider mutation rates less than or equal to this threshold to be 0 when using the quick heuristic for unbiasing. 
 - property region
- Region of the dataset. 
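The quick_unbias_thresh property above treats sufficiently small mutation rates as exactly 0. A minimal sketch of that thresholding step alone, assuming the "less than or equal to" comparison stated in the property description; the function name zero_small_rates is hypothetical and the rest of the unbiasing heuristic is not shown.

```python
def zero_small_rates(mut_rates, quick_unbias_thresh=0.001):
    """Replace mutation rates at or below the threshold with 0.

    Hypothetical helper illustrating only the thresholding step.
    """
    return [0.0 if rate <= quick_unbias_thresh else rate
            for rate in mut_rates]


rates = zero_small_rates([0.0005, 0.001, 0.02])
print(rates)
```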
 
- class seismicrna.mask.dataset.MaskReadDataset(*args, masked_read_nums: dict[int, list] | None = None, **kwargs)
- Bases: MaskDataset, LoadedDataset, UnbiasDataset
- Load batches of masked data.
 - classmethod get_batch_type()
- Type of batch. 
 - classmethod get_report_type()
- Type of report. 
 - property min_mut_gap
- Minimum gap between two mutations. 
 - property pattern
- Pattern of mutations to count. 
 - property pos_kept
- Positions kept after masking. 
 - property quick_unbias
- Use the quick heuristic for unbiasing. 
 - property quick_unbias_thresh
- Consider mutation rates less than or equal to this threshold to be 0 when using the quick heuristic for unbiasing. 
 
- class seismicrna.mask.io.MaskBatchIO(*, read_nums: ndarray, **kwargs)
- Bases: MaskReadBatch, ReadBatchIO, RegBrickleIO, MaskIO
 - classmethod get_file_seg_type()
- Type of the last segment in the path. 
 
- class seismicrna.mask.io.MaskFile
- Bases: HasRegFilePath, ABC
 - classmethod get_step()
- Step of the workflow. 
 
- class seismicrna.mask.lists.MaskPositionList(*, reg: str, **kwargs)
- Bases: PositionList, MaskList
 - classmethod get_table_type()
- Type of table that this type of list can process. 
 - classmethod list_init_table_attrs()
- List the table attribute names to pass to __init__(). 
 
- seismicrna.mask.main.load_regions(input_path: Iterable[str | Path], coords: Iterable[tuple[str, int, int]], primers: Iterable[tuple[str, DNA, DNA]], primer_gap: int, regions_file: Path | None = None)
- Load regions of relate reports. 
- seismicrna.mask.main.run(input_path: Iterable[str | Path], *, branch: str = '', tmp_pfx: str | Path = './tmp', keep_tmp: bool = False, mask_coords: Iterable[tuple[str, int, int]] = (), mask_primers: Iterable[tuple[str, DNA, DNA]] = (), primer_gap: int = 0, mask_regions_file: str | None = None, mask_del: bool = True, mask_ins: bool = True, mask_mut: Iterable[str] = (), count_mut: Iterable[str] = (), mask_polya: int = 5, mask_gu: bool = True, mask_pos: Iterable[tuple[str, int]] = (), mask_pos_file: Iterable[str | Path] = (), mask_read: Iterable[str] = (), mask_read_file: Iterable[str | Path] = (), mask_discontig: bool = True, min_ninfo_pos: int = 1000, max_fmut_pos: float = 1.0, min_ncov_read: int = 1, min_finfo_read: float = 0.95, max_fmut_read: float = 1.0, min_mut_gap: int = 4, quick_unbias: bool = True, quick_unbias_thresh: float = 0.001, max_mask_iter: int = 0, mask_pos_table: bool = True, mask_read_table: bool = True, brotli_level: int = 10, num_cpus: int = 4, force: bool = False) → list[Path]
- Define mutations and regions to filter reads and positions.
- Parameters:
- branch (str) – Create a new branch of the workflow with this name [keyword-only, default: '']
- tmp_pfx (str | pathlib.Path) – Write all temporary files to a directory with this prefix [keyword-only, default: './tmp']
- keep_tmp (bool) – Keep temporary files after finishing [keyword-only, default: False]
- mask_coords (Iterable) – Select a region of a reference given its 5' and 3' end coordinates [keyword-only, default: ()]
- mask_primers (Iterable) – Select a region of a reference given its forward and reverse primers [keyword-only, default: ()]
- primer_gap (int) – Leave a gap of this many bases between the primer and the region [keyword-only, default: 0]
- mask_regions_file (str | None) – Select regions of references from coordinates/primers in a CSV file [keyword-only, default: None]
- mask_del (bool) – Mask deletions [keyword-only, default: True]
- mask_ins (bool) – Mask insertions [keyword-only, default: True]
- mask_mut (Iterable) – Mask this type of mutation [keyword-only, default: ()]
- count_mut (Iterable) – Count only this type of mutation [keyword-only, default: ()]
- mask_polya (int) – Mask stretches of at least this many consecutive A bases (0 disables) [keyword-only, default: 5]
- mask_gu (bool) – Mask G and U bases [keyword-only, default: True]
- mask_pos (Iterable) – Mask this position in this reference [keyword-only, default: ()]
- mask_pos_file (Iterable) – Mask positions in references from a file [keyword-only, default: ()]
- mask_read (Iterable) – Mask the read with this name [keyword-only, default: ()]
- mask_read_file (Iterable) – Mask the reads with names in this file [keyword-only, default: ()]
- mask_discontig (bool) – Mask paired-end reads with discontiguous mates [keyword-only, default: True]
- min_ninfo_pos (int) – Mask positions with fewer than this many informative base calls [keyword-only, default: 1000]
- max_fmut_pos (float) – Mask positions with more than this fraction of mutated base calls [keyword-only, default: 1.0]
- min_ncov_read (int) – Mask reads with fewer than this many bases covering the region [keyword-only, default: 1]
- min_finfo_read (float) – Mask reads with less than this fraction of informative base calls [keyword-only, default: 0.95]
- max_fmut_read (float) – Mask reads with more than this fraction of mutated base calls [keyword-only, default: 1.0]
- min_mut_gap (int) – Mask reads with two mutations separated by fewer than this many bases [keyword-only, default: 4]
- quick_unbias (bool) – Correct observer bias using a quick (typically linear time) heuristic [keyword-only, default: True]
- quick_unbias_thresh (float) – Treat mutated fractions under this threshold as 0 with --quick-unbias [keyword-only, default: 0.001]
- max_mask_iter (int) – Stop masking after this many iterations (0 for no limit) [keyword-only, default: 0]
- mask_pos_table (bool) – Tabulate relationships per position for mask data [keyword-only, default: True]
- mask_read_table (bool) – Tabulate relationships per read for mask data [keyword-only, default: True]
- brotli_level (int) – Compress pickle files with this level of Brotli (0-11) [keyword-only, default: 10]
- num_cpus (int) – Use up to this many CPUs simultaneously [keyword-only, default: 4]
- force (bool) – Force all tasks to run, overwriting any existing output files [keyword-only, default: False]
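A call to run() might look like the sketch below. The keyword names and values come from the signature and parameter list above; the wrapper name mask_reports, the MASK_OPTIONS dictionary, and the example input path are hypothetical. The import is deferred into the function so that the options can be inspected without SEISMIC-RNA installed.

```python
from pathlib import Path

# Keyword arguments for seismicrna.mask.main.run(), set to the
# defaults documented above except where noted.
MASK_OPTIONS = {
    "mask_polya": 5,         # mask runs of at least 5 consecutive A bases
    "mask_gu": True,         # mask G and U bases
    "min_ninfo_pos": 1000,   # minimum informative base calls per position
    "min_finfo_read": 0.95,  # minimum informative fraction per read
    "min_mut_gap": 4,        # minimum gap between two mutations in a read
    "num_cpus": 2,           # non-default: use at most 2 CPUs
}


def mask_reports(input_paths):
    """Run the mask step on the given inputs (hypothetical wrapper)."""
    # Deferred import: requires SEISMIC-RNA to be installed.
    from seismicrna.mask.main import run
    return run([Path(path) for path in input_paths], **MASK_OPTIONS)


# Example call with a hypothetical path to relate output:
# report_files = mask_reports(["out/sample/relate"])
```

According to the signature, run() returns a list of Path objects.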
 
 
- class seismicrna.mask.report.BaseMaskReport(**kwargs: Any | Callable[[Report], Any])
- classmethod get_file_seg_type()
- Type of the last segment in the path. 
 
- class seismicrna.mask.report.JoinMaskReport(**kwargs: Any | Callable[[Report], Any])
- Bases: JoinReport, BaseMaskReport
 - classmethod get_param_report_fields()
- Parameter fields of the report. 
 
- class seismicrna.mask.report.MaskReport(**kwargs: Any | Callable[[Report], Any])
- Bases: BatchedReport, BaseMaskReport
 - classmethod get_param_report_fields()
- Parameter fields of the report. 
 - classmethod get_result_report_fields()
- Result fields of the report. 
 
- class seismicrna.mask.table.MaskBatchTabulator(*, get_batch_count_all: Callable, num_batches: int, num_cpus: int = 1, **kwargs)
- Bases: BatchTabulator, MaskTabulator
- class seismicrna.mask.table.MaskCountTabulator(*, batch_counts: Iterable[tuple[Any, Any, Any, Any]], **kwargs)
- Bases: CountTabulator, MaskTabulator
- class seismicrna.mask.table.MaskDatasetTabulator(*, dataset: MutsDataset, validate: bool = False, **kwargs)
- Bases: PartialDatasetTabulator, MaskTabulator
- class seismicrna.mask.table.MaskPositionTable
- Bases: MaskTable, PartialPositionTable, ABC
- class seismicrna.mask.table.MaskPositionTableLoader(table_file: str | Path, **kwargs)
- Bases: PositionTableLoader, MaskPositionTable
- class seismicrna.mask.table.MaskPositionTableWriter(tabulator: Tabulator)
- Bases: PositionTableWriter, MaskPositionTable
- class seismicrna.mask.table.MaskReadTable
- Bases: MaskTable, PartialReadTable, ABC
- class seismicrna.mask.table.MaskReadTableLoader(table_file: str | Path, **kwargs)
- Bases: ReadTableLoader, MaskReadTable
- class seismicrna.mask.table.MaskReadTableWriter(tabulator: Tabulator)
- Bases: ReadTableWriter, MaskReadTable
- class seismicrna.mask.table.MaskTable
- Bases: AverageTable, MaskFile, ABC
 - classmethod get_load_function()
- LoadFunction for all Dataset types for this Table. 
 
- class seismicrna.mask.table.MaskTabulator(*, refseq: DNA, region: Region, pattern: RelPattern, min_mut_gap: int, quick_unbias: bool, quick_unbias_thresh: float, **kwargs)
- Bases: PartialTabulator, AverageTabulator, ABC
 - classmethod table_types()
- Types of tables that this tabulator can write. 
 
- class seismicrna.mask.table.PartialDatasetTabulator(*, dataset: MutsDataset, validate: bool = False, **kwargs)
- Bases: DatasetTabulator, PartialTabulator, ABC
 - classmethod init_kws()
- Attributes of the dataset to use as keyword arguments in super().__init__(). 
 
- class seismicrna.mask.table.PartialPositionTable
- Bases: PartialTable, PositionTable, ABC
- class seismicrna.mask.table.PartialReadTable
- Bases: PartialTable, ReadTable, ABC
- class seismicrna.mask.table.PartialTable
- Bases: Table, HasRegFilePath, ABC
- Table of filtered reads over a region of the sequence.
- class seismicrna.mask.table.PartialTabulator(*, refseq: DNA, region: Region, pattern: RelPattern, min_mut_gap: int, quick_unbias: bool, quick_unbias_thresh: float, **kwargs)
- property correct_bias
 - property data_per_pos
- DataFrame of per-position data. 
 - classmethod get_null_value()
- The null value for a count: either 0 or NaN. 
 - property p_ends_given_clust_noclose
- Probability of each end coordinate. 
 
- seismicrna.mask.table.adjust_counts(table_per_pos: DataFrame, p_ends_given_clust_noclose: ndarray, n_reads_clust: Series | int, region: Region, min_mut_gap: int, quick_unbias: bool, quick_unbias_thresh: float)
- Adjust the given table of masked/clustered counts per position to correct for observer bias. 
- class seismicrna.mask.write.Masker(dataset: RelateMutsDataset | PoolDataset, region: Region, pattern: RelPattern, *, max_mask_iter: int = 0, mask_polya: int = 5, mask_gu: bool = True, mask_pos: list[tuple[str, int]] = (), mask_pos_file: list[Path] = (), mask_read: list[str] = (), mask_read_file: list[Path] = (), mask_discontig: bool = True, min_ncov_read: int = 1, min_finfo_read: float = 0.95, max_fmut_read: float = 1.0, min_mut_gap: int = 4, min_ninfo_pos: int = 1000, max_fmut_pos: float = 1.0, quick_unbias: bool = True, quick_unbias_thresh: float = 0.001, count_read: bool, brotli_level: int = 10, top: Path, branch: str = '', num_cpus: int = 4)
- Bases: object
- Mask batches of relationships.
 - CHECKSUM_KEY = 'mask'
 - MASK_POS_FMUT = 'pos-fmut'
 - MASK_POS_NINFO = 'pos-ninfo'
 - MASK_READ_DISCONTIG = 'read-discontig'
 - MASK_READ_FINFO = 'read-finfo'
 - MASK_READ_FMUT = 'read-fmut'
 - MASK_READ_GAP = 'read-gap'
 - MASK_READ_INIT = 'read-init'
 - MASK_READ_KEPT = 'read-kept'
 - MASK_READ_LIST = 'read-exclude'
 - MASK_READ_NCOV = 'read-ncov'
 - PATTERN_KEY = 'pattern'
 - create_report()
 - mask()
 - property n_reads_discontig
 - property n_reads_init
 - property n_reads_kept
- Number of reads kept. 
 - property n_reads_list
 - property n_reads_max_fmut
 - property n_reads_min_finfo
 - property n_reads_min_gap
 - property n_reads_min_ncov
 - property pos_gu
- Positions masked for having a G or U base. 
 - property pos_kept
- Positions kept. 
 - property pos_list
- Positions masked arbitrarily from a list. 
 - property pos_max_fmut
- Positions masked for having too many mutations. 
 - property pos_min_ninfo
- Positions masked for having too few informative reads. 
 - property pos_polya
- Positions masked for lying in a poly(A) sequence. 
 - property read_names_dataset
- Dataset of the read names. 
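The pos_polya, pos_gu, and pos_kept properties above describe sets of positions that can be derived from the reference sequence alone. The sketch below illustrates the idea in plain Python; it is not the Masker implementation. The function names are hypothetical, positions are assumed to be 1-based, and the reference is assumed to be written as DNA, so U appears as T.

```python
def find_polya(refseq: str, min_length: int) -> set[int]:
    """1-based positions lying in runs of at least min_length A bases.

    A min_length of 0 disables the filter, as documented for mask_polya.
    """
    masked: set[int] = set()
    if min_length <= 0:
        return masked
    run_start = None
    # The trailing "$" is a sentinel that closes a run at the 3' end.
    for i, base in enumerate(refseq + "$"):
        if base == "A":
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_length:
                masked.update(range(run_start + 1, i + 1))
            run_start = None
    return masked


def find_gu(refseq: str) -> set[int]:
    """1-based positions of G and U (written T in a DNA reference)."""
    return {i for i, base in enumerate(refseq, start=1) if base in "GT"}


def positions_kept(refseq: str, mask_polya: int = 5, mask_gu: bool = True):
    """Positions that survive the poly(A) and G/U filters."""
    masked = find_polya(refseq, mask_polya)
    if mask_gu:
        masked |= find_gu(refseq)
    return [pos for pos in range(1, len(refseq) + 1) if pos not in masked]


print(positions_kept("GAAAAACAAT"))
```

The other position filters (pos_list, pos_min_ninfo, pos_max_fmut) depend on user input or on the data and are not covered by this sketch.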
 
- seismicrna.mask.write.get_pattern(mask_del: bool, mask_ins: bool, mask_mut: Iterable[str], count_mut: Iterable[str])
- seismicrna.mask.write.mask_region(dataset: RelateMutsDataset | PoolDataset, region: Region, *, branch: str, mask_del: bool, mask_ins: bool, mask_mut: Iterable[str], count_mut: Iterable[str], mask_pos_table: bool, mask_read_table: bool, force: bool, num_cpus: int, tmp_pfx, keep_tmp, **kwargs)
- Mask out certain reads, positions, and relationships.
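The read-level thresholds documented for run() and Masker (min_ncov_read, min_finfo_read, max_fmut_read, min_mut_gap) can be illustrated with a toy read. The sketch below is not the library's algorithm. It assumes a read is a dict mapping each covered position to "=" (match), "x" (mutation), or "?" (uninformative), that the mutated fraction is taken over informative calls, that "separated by" counts the bases strictly between two mutations, and that the filters apply in the order shown. The returned labels reuse the MASK_READ_* strings listed for Masker; the function name mask_reason is hypothetical.

```python
def mask_reason(read, *, min_ncov_read=1, min_finfo_read=0.95,
                max_fmut_read=1.0, min_mut_gap=4):
    """Return the label of the first filter the read fails, else None."""
    n_cov = len(read)
    if n_cov < min_ncov_read:
        return "read-ncov"
    n_info = sum(call in {"=", "x"} for call in read.values())
    f_info = n_info / n_cov if n_cov else 0.0
    if f_info < min_finfo_read:
        return "read-finfo"
    mut_pos = sorted(pos for pos, call in read.items() if call == "x")
    if n_info and len(mut_pos) / n_info > max_fmut_read:
        return "read-fmut"
    # Assumed definition: bases strictly between adjacent mutations.
    for first, second in zip(mut_pos, mut_pos[1:]):
        if second - first - 1 < min_mut_gap:
            return "read-gap"
    return None


# A 20-base read with mutations at positions 5 and 8 (2 bases between).
read = {pos: "=" for pos in range(1, 21)}
read[5] = read[8] = "x"
print(mask_reason(read))
```

Reads for which the function returns None correspond to those counted under 'read-kept' in this simplified picture.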