seismicrna.mask package
Subpackages
- seismicrna.mask.tests package
- Submodules
  - TestMask
  - TestMask1Batch
  - TestMask1Sample
  - TestMask2Samples
  - TestMaskBatches
  - TestMaskPaired
  - TestMaskSingle
  - TestMaskSingle1Sample1Batch
    - TestMaskSingle1Sample1Batch.test_dms_min_ncov_read_5()
    - TestMaskSingle1Sample1Batch.test_dms_min_ncov_read_6()
    - TestMaskSingle1Sample1Batch.test_mask_a()
    - TestMaskSingle1Sample1Batch.test_mask_all_muts_min_ncov_read_7()
    - TestMaskSingle1Sample1Batch.test_mask_c()
    - TestMaskSingle1Sample1Batch.test_mask_discontig()
    - TestMaskSingle1Sample1Batch.test_mask_dms()
    - TestMaskSingle1Sample1Batch.test_mask_dms_mask_a()
    - TestMaskSingle1Sample1Batch.test_mask_dms_mask_c()
    - TestMaskSingle1Sample1Batch.test_mask_dms_nomask_g()
    - TestMaskSingle1Sample1Batch.test_mask_dms_nomask_u()
    - TestMaskSingle1Sample1Batch.test_mask_etc()
    - TestMaskSingle1Sample1Batch.test_mask_etc_mask_g()
    - TestMaskSingle1Sample1Batch.test_mask_etc_mask_u()
    - TestMaskSingle1Sample1Batch.test_mask_etc_nomask_a()
    - TestMaskSingle1Sample1Batch.test_mask_etc_nomask_c()
    - TestMaskSingle1Sample1Batch.test_mask_g()
    - TestMaskSingle1Sample1Batch.test_mask_polya_3()
    - TestMaskSingle1Sample1Batch.test_mask_polya_4()
    - TestMaskSingle1Sample1Batch.test_mask_pos()
    - TestMaskSingle1Sample1Batch.test_mask_pos_all()
    - TestMaskSingle1Sample1Batch.test_mask_pos_and_mask_pos_file()
    - TestMaskSingle1Sample1Batch.test_mask_pos_file()
    - TestMaskSingle1Sample1Batch.test_mask_pos_files()
    - TestMaskSingle1Sample1Batch.test_mask_pos_min_ncov_read_5()
    - TestMaskSingle1Sample1Batch.test_mask_pos_min_ncov_read_6()
    - TestMaskSingle1Sample1Batch.test_mask_pos_multiple()
    - TestMaskSingle1Sample1Batch.test_mask_read()
    - TestMaskSingle1Sample1Batch.test_mask_read_and_mask_read_file()
    - TestMaskSingle1Sample1Batch.test_mask_read_file()
    - TestMaskSingle1Sample1Batch.test_mask_read_files()
    - TestMaskSingle1Sample1Batch.test_mask_shape()
    - TestMaskSingle1Sample1Batch.test_mask_u()
    - TestMaskSingle1Sample1Batch.test_min_fcov_read_amplicons()
    - TestMaskSingle1Sample1Batch.test_min_finfo_read_1()
    - TestMaskSingle1Sample1Batch.test_min_ncov_read_6()
    - TestMaskSingle1Sample1Batch.test_min_ncov_read_7()
    - TestMaskSingle1Sample1Batch.test_min_ncov_read_8()
    - TestMaskSingle1Sample1Batch.test_nomask()
  - extract_positions()
  - extract_read_nums()
  - write_datasets()
- Submodules
Submodules
- class seismicrna.mask.batch.MaskMutsBatch(*, read_nums: ndarray, **kwargs)
Bases: MaskReadBatch, PartialRegionMutsBatch
- property read_weights
Weights for each read when computing counts.
- class seismicrna.mask.batch.MaskReadBatch(*, read_nums: ndarray, **kwargs)
Bases: PartialReadBatch
- property num_reads
Number of reads.
- property read_nums
Read numbers.
- class seismicrna.mask.batch.PartialReadBatch(*, batch: int, **kwargs)
- property max_read
Maximum possible value for a read index.
- property read_indexes
Map each read number to its index in self.read_nums.
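The relationship between read_nums, max_read, and read_indexes can be pictured as a lookup table indexed by read number. The following is a minimal sketch in plain Python, not the library's implementation: the helper name build_read_indexes and the -1 placeholder for reads absent from the batch are assumptions made for illustration.

```python
def build_read_indexes(read_nums: list[int], max_read: int) -> list[int]:
    """Return a table mapping each read number to its index in read_nums.

    Read numbers that are absent from read_nums map to -1, a placeholder
    chosen for this sketch only.
    """
    indexes = [-1] * (max_read + 1)
    for index, read_num in enumerate(read_nums):
        if not 0 <= read_num <= max_read:
            raise ValueError(f"read number {read_num} is outside 0-{max_read}")
        indexes[read_num] = index
    return indexes


# A masked batch that kept reads 2, 5, and 6 from a batch numbered 0-7.
read_nums = [2, 5, 6]
read_indexes = build_read_indexes(read_nums, max_read=7)
```

With such a table, looking up where read number 5 sits in read_nums is a single index operation (read_indexes[5]) rather than a search.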
- class seismicrna.mask.batch.PartialRegionMutsBatch(*, region: Region, **kwargs)
Bases: PartialReadBatch, RegionMutsBatch, ABC
- seismicrna.mask.batch.apply_mask(batch: RegionMutsBatch, read_nums: ndarray | None = None, region: Region | None = None, sanitize: bool = False)
Apply a read/position mask to a batch, returning a MaskMutsBatch.
- Parameters:
  - batch (RegionMutsBatch) – Source batch to mask.
  - read_nums (np.ndarray or None, optional) – Array of read numbers to keep; if None, all reads are kept.
  - region (Region or None, optional) – Region to clip reads to; if None, the batch’s existing region is used.
  - sanitize (bool, optional) – Whether to run extra validation checks on the new batch (default False).
- Returns:
  A new batch containing only the selected reads and positions.
- Return type:
  MaskMutsBatch
- class seismicrna.mask.dataset.JoinMaskMutsDataset(*args, **kwargs)
Bases: MaskDataset, JoinMutsDataset, MergedUnbiasDataset
- classmethod get_batch_type()
Type of batch.
- classmethod get_dataset_load_func()
Function to load one constituent dataset.
- classmethod get_report_type()
Type of report.
- classmethod name_batch_attrs()
Name the attributes of each batch.
- class seismicrna.mask.dataset.MaskDataset(report_file: str | Path, verify_times: bool = True)
Bases: AverageDataset, ABC
Dataset of masked data.
- class seismicrna.mask.dataset.MaskMutsDataset(dataset2_report_file: Path, **kwargs)
Bases: MaskDataset, MultistepDataset, UnbiasDataset
Chain mutation data with masked reads.
- MASK_NAME = 'mask'
- classmethod get_dataset1_load_func()
Function to load Dataset 1.
- classmethod get_dataset2_type()
Type of Dataset 2.
- property min_mut_gap
Minimum gap between two mutations.
- property mut_collisions
Method for handling mutations that are too close.
- property pattern
Pattern of mutations to count.
- property probe
Chemical probe used for the experiment.
- property quick_unbias
Use the quick heuristic for unbiasing.
- property quick_unbias_thresh
Consider mutation rates less than or equal to this threshold to be 0 when using the quick heuristic for unbiasing.
- property region
Region of the dataset.
- class seismicrna.mask.dataset.MaskReadDataset(*args, masked_read_nums: dict[int, list] | None = None, **kwargs)
Bases: MaskDataset, LoadedDataset, UnbiasDataset
Load batches of masked data.
- classmethod get_batch_type()
Type of batch.
- classmethod get_report_type()
Type of report.
- property min_mut_gap
Minimum gap between two mutations.
- property mut_collisions
Method for handling mutations that are too close.
- property pattern
Pattern of mutations to count.
- property pos_kept
Positions kept after masking.
- property probe
Chemical probe used for the experiment.
- property quick_unbias
Use the quick heuristic for unbiasing.
- property quick_unbias_thresh
Consider mutation rates less than or equal to this threshold to be 0 when using the quick heuristic for unbiasing.
- class seismicrna.mask.io.MaskBatchIO(*, read_nums: ndarray, **kwargs)
Bases: MaskReadBatch, ReadBatchIO, RegBrickleIO, MaskIO
- classmethod get_file_seg_type()
Type of the last segment in the path.
- class seismicrna.mask.io.MaskFile
Bases: HasRegFilePath, ABC
- classmethod get_step()
Step of the workflow.
- class seismicrna.mask.lists.MaskPositionList(*, reg: str, **kwargs)
Bases: PositionList, MaskList
- classmethod get_table_type()
Type of table that this type of list can process.
- classmethod list_init_table_attrs()
List the table attribute names to pass to __init__().
- seismicrna.mask.main.load_regions(input_path: Iterable[str | Path], coords: Iterable[tuple[str, int, int]], primers: Iterable[tuple[str, DNA, DNA]], primer_gap: int, regions_file: Path | None = None)
Load regions of relate reports.
- seismicrna.mask.main.run(input_path: Iterable[str | Path] = Sentinel.UNSET, *, branch: str = '', tmp_pfx: str | Path = './tmp', keep_tmp: bool = False, mask_coords: Iterable[tuple[str, int, int]] = (), mask_primers: Iterable[tuple[str, DNA, DNA]] = (), primer_gap: int = 0, mask_regions_file: str | None = None, count_del: bool = True, count_ins: bool = True, no_mut: Iterable[str] = (), only_mut: Iterable[str] = (), probe: str = 'DMS', mask_a: bool | None = None, mask_c: bool | None = None, mask_g: bool | None = None, mask_u: bool | None = None, mask_polya: int = 5, mask_pos: Iterable[tuple[str, int]] = (), mask_pos_file: Iterable[str | Path] = (), mask_read: Iterable[str] = (), mask_read_file: Iterable[str | Path] = (), mask_discontig: bool = True, min_ninfo_pos: int = 1000, max_fmut_pos: float = 1.0, min_ncov_read: int = 1, min_fcov_read: float = 0.0, min_finfo_read: float = 0.95, max_fmut_read: float = 1.0, min_mut_gap: int | None = None, mut_collisions: str = 'auto', quick_unbias: bool = True, quick_unbias_thresh: float = 0.001, max_mask_iter: int = 0, mask_pos_table: bool = True, mask_read_table: bool = True, brotli_level: int = 10, num_cpus: int = 4, force: bool = False) -> list[Path]
Define mutations and regions to filter reads and positions.
- Parameters:
  - branch (str) – Create a new branch of the workflow with this name [keyword-only, default: '']
  - tmp_pfx (str | pathlib.Path) – Write all temporary files to a directory with this prefix [keyword-only, default: './tmp']
  - keep_tmp (bool) – Keep temporary files after finishing [keyword-only, default: False]
  - mask_coords (Iterable) – Select a region of a reference given its 5' and 3' end coordinates [keyword-only, default: ()]
  - mask_primers (Iterable) – Select a region of a reference given its forward and reverse primers [keyword-only, default: ()]
  - primer_gap (int) – Leave a gap of this many bases between the primer and the region [keyword-only, default: 0]
  - mask_regions_file (str | None) – Select regions of references from coordinates/primers in a CSV file [keyword-only, default: None]
  - count_del (bool) – Count deletions as mutations [keyword-only, default: True]
  - count_ins (bool) – Count insertions as mutations [keyword-only, default: True]
  - no_mut (Iterable) – Do not count this type of mutation (overrides --count-del/ins) [keyword-only, default: ()]
  - only_mut (Iterable) – Count only this type of mutation (overrides other mutation settings) [keyword-only, default: ()]
  - probe (str) – Use default mask options for this chemical probe [keyword-only, default: 'DMS']
  - mask_a (bool | None) – Mask positions with base A [keyword-only, default: None]
  - mask_c (bool | None) – Mask positions with base C [keyword-only, default: None]
  - mask_g (bool | None) – Mask positions with base G [keyword-only, default: None]
  - mask_u (bool | None) – Mask positions with base U [keyword-only, default: None]
  - mask_polya (int) – Mask stretches of at least this many consecutive A bases (0 disables) [keyword-only, default: 5]
  - mask_pos (Iterable) – Mask this position in this reference [keyword-only, default: ()]
  - mask_pos_file (Iterable) – Mask positions in references from a file [keyword-only, default: ()]
  - mask_read (Iterable) – Mask the read with this name [keyword-only, default: ()]
  - mask_read_file (Iterable) – Mask the reads with names in this file [keyword-only, default: ()]
  - mask_discontig (bool) – Mask paired-end reads with discontiguous mates [keyword-only, default: True]
  - min_ninfo_pos (int) – Mask positions with fewer than this many informative base calls [keyword-only, default: 1000]
  - max_fmut_pos (float) – Mask positions with more than this fraction of mutated base calls [keyword-only, default: 1.0]
  - min_ncov_read (int) – Mask reads with fewer than this many bases covering the region [keyword-only, default: 1]
  - min_fcov_read (float) – Mask reads covering less than this fraction of the region [keyword-only, default: 0.0]
  - min_finfo_read (float) – Mask reads with less than this fraction of informative base calls [keyword-only, default: 0.95]
  - max_fmut_read (float) – Mask reads with more than this fraction of mutated base calls [keyword-only, default: 1.0]
  - min_mut_gap (int | None) – Mask reads with two mutations separated by fewer than this many bases [keyword-only, default: None]
  - mut_collisions (str) – If two mutations are closer than --min-mut-gap positions, MERGE the mutations, DROP the read, or AUTO-select based on the probe [keyword-only, default: 'auto']
  - quick_unbias (bool) – Correct observer bias using a quick (typically linear time) heuristic [keyword-only, default: True]
  - quick_unbias_thresh (float) – Treat mutated fractions under this threshold as 0 with --quick-unbias [keyword-only, default: 0.001]
  - max_mask_iter (int) – Stop masking after this many iterations (0 for no limit) [keyword-only, default: 0]
  - mask_pos_table (bool) – Tabulate relationships per position for mask data [keyword-only, default: True]
  - mask_read_table (bool) – Tabulate relationships per read for mask data [keyword-only, default: True]
  - brotli_level (int) – Compress pickle files with this level of Brotli (0 - 11) [keyword-only, default: 10]
  - num_cpus (int) – Use up to this many CPUs simultaneously [keyword-only, default: 4]
  - force (bool) – Force all tasks to run, overwriting any existing output files [keyword-only, default: False]
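To make the mask_polya option concrete, here is a minimal sketch of how positions in poly(A) stretches could be located. It is not taken from the library: the function name find_polya_positions, the first_pos argument, and the 1-based position numbering are all assumptions for illustration.

```python
import re


def find_polya_positions(refseq: str, mask_polya: int, first_pos: int = 1) -> list[int]:
    """List positions lying in runs of at least mask_polya consecutive As.

    Positions are numbered from first_pos (1-based by default, which is an
    assumption of this sketch).
    """
    if mask_polya <= 0:
        # A value of 0 disables poly(A) masking.
        return []
    positions = []
    # The quantifier is greedy, so each match spans one maximal run of As.
    for match in re.finditer("A{%d,}" % mask_polya, refseq.upper()):
        positions.extend(range(match.start() + first_pos, match.end() + first_pos))
    return positions


refseq = "GAAAAACUAAAG"
masked_default = find_polya_positions(refseq, mask_polya=5)
masked_short = find_polya_positions(refseq, mask_polya=3)
```

With the default threshold of 5, only the run of five As is selected; lowering the threshold to 3 also selects the run of three As near the 3' end.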
- seismicrna.mask.main.set_mask_acgu(probe: str, mask_a: bool | None = None, mask_c: bool | None = None, mask_g: bool | None = None, mask_u: bool | None = None)
Resolve per-base masking flags based on the probe type.
- Parameters:
  - probe (str) – Probe type (one of the values in PROBES), used to set defaults when a flag is None.
  - mask_a (bool or None, optional) – Whether to mask adenine positions; if None, inferred from probe.
  - mask_c (bool or None, optional) – Whether to mask cytosine positions; if None, inferred from probe.
  - mask_g (bool or None, optional) – Whether to mask guanine positions; if None, inferred from probe.
  - mask_u (bool or None, optional) – Whether to mask uracil/thymine positions; if None, inferred from probe.
- Returns:
  Resolved (mask_a, mask_c, mask_g, mask_u) flags.
- Return type:
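The "explicit flag wins, otherwise fall back to a probe default" pattern described above can be sketched as follows. This is an illustration only: the function resolve_mask_flags and the table DEFAULT_MASKED_BASES are hypothetical, and the defaults listed in it are assumptions rather than values read from the library.

```python
# Hypothetical per-probe defaults, for illustration only; the real defaults
# are defined by the library and may differ.
DEFAULT_MASKED_BASES = {
    "DMS": {"G", "U"},
    "SHAPE": set(),
}


def resolve_mask_flags(probe, mask_a=None, mask_c=None, mask_g=None, mask_u=None):
    """Return (mask_a, mask_c, mask_g, mask_u) with every None replaced."""
    if probe not in DEFAULT_MASKED_BASES:
        raise ValueError(f"unknown probe: {probe!r}")
    defaults = DEFAULT_MASKED_BASES[probe]
    explicit = {"A": mask_a, "C": mask_c, "G": mask_g, "U": mask_u}
    # An explicit True/False always overrides the probe default.
    return tuple(
        (base in defaults) if flag is None else bool(flag)
        for base, flag in explicit.items()
    )


flags_default = resolve_mask_flags("DMS")
flags_override = resolve_mask_flags("DMS", mask_a=True, mask_g=False)
```

Using None as the "not specified" value keeps an explicit False distinguishable from an omitted flag, which is why the parameters are typed bool | None rather than bool.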
- seismicrna.mask.main.set_mut_gap_params(probe: str, min_mut_gap: int | None = None, mut_collisions: str = 'auto')
Resolve mutation-gap and collision parameters based on the probe type.
- Parameters:
  - probe (str) – Probe type (one of the values in PROBES), used to set defaults when a parameter is None/MUT_COLLISIONS_AUTO.
  - min_mut_gap (int or None, optional) – Minimum gap (in nucleotides) between two mutations in the same read; if None, a probe-specific default is used.
  - mut_collisions (str, optional) – How to handle reads with mutations closer than min_mut_gap; if MUT_COLLISIONS_AUTO, a probe-specific default is used.
- Returns:
  Resolved (min_mut_gap, mut_collisions) values.
- Return type:
  tuple[int, str]
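As an illustration of what min_mut_gap controls, the sketch below checks whether a read has two mutations that are too close and then applies a "drop" policy. It is a simplified model, not the library's code: the helper has_close_mutations is hypothetical, and it assumes that the gap means the number of bases strictly between two mutated positions.

```python
def has_close_mutations(mut_positions, min_mut_gap):
    """Return True if two mutations are separated by fewer than min_mut_gap bases.

    Assumption of this sketch: the gap between positions p < q is the number
    of bases strictly between them, q - p - 1.
    """
    ordered = sorted(set(mut_positions))
    for left, right in zip(ordered, ordered[1:]):
        if right - left - 1 < min_mut_gap:
            return True
    return False


# Mutated positions per read (made-up data).
reads = {"read1": [10, 14], "read2": [10, 13], "read3": []}

# A "drop" policy removes every read that has a collision.
kept = [name for name, muts in reads.items() if not has_close_mutations(muts, 3)]
```

Under this interpretation, a gap of 0 never masks a read, because even adjacent mutations have zero bases between them.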
- class seismicrna.mask.report.BaseMaskReport(**kwargs: Any | Callable[[Report], Any])
- classmethod get_file_seg_type()
Type of the last segment in the path.
- class seismicrna.mask.report.JoinMaskReport(**kwargs: Any | Callable[[Report], Any])
Bases: JoinReport, BaseMaskReport
- classmethod get_param_report_fields()
Parameter fields of the report.
- class seismicrna.mask.report.MaskReport(**kwargs: Any | Callable[[Report], Any])
Bases: BatchedReport, BaseMaskReport
- classmethod get_param_report_fields()
Parameter fields of the report.
- classmethod get_result_report_fields()
Result fields of the report.
- class seismicrna.mask.table.MaskBatchTabulator(*, get_batch_count_all: Callable, num_batches: int, num_cpus: int = 1, **kwargs)
Bases: BatchTabulator, MaskTabulator
- class seismicrna.mask.table.MaskCountTabulator(*, batch_counts: Iterable[tuple[Any, Any, Any, Any]], **kwargs)
Bases: CountTabulator, MaskTabulator
- class seismicrna.mask.table.MaskDatasetTabulator(*, dataset: MutsDataset, validate: bool = False, **kwargs)
Bases: PartialDatasetTabulator, MaskTabulator
- class seismicrna.mask.table.MaskPositionTable
Bases: MaskTable, PartialPositionTable, ABC
- class seismicrna.mask.table.MaskPositionTableLoader(table_file: str | Path, **kwargs)
Bases: PositionTableLoader, MaskPositionTable
- class seismicrna.mask.table.MaskPositionTableWriter(tabulator: Tabulator)
Bases: PositionTableWriter, MaskPositionTable
- class seismicrna.mask.table.MaskReadTable
Bases: MaskTable, PartialReadTable, ABC
- class seismicrna.mask.table.MaskReadTableLoader(table_file: str | Path, **kwargs)
Bases: ReadTableLoader, MaskReadTable
- class seismicrna.mask.table.MaskReadTableWriter(tabulator: Tabulator)
Bases: ReadTableWriter, MaskReadTable
- class seismicrna.mask.table.MaskTable
Bases: AverageTable, MaskFile, ABC
- classmethod get_load_function()
LoadFunction for all Dataset types for this Table.
- class seismicrna.mask.table.MaskTabulator(*, refseq: DNA, region: Region, pattern: RelPattern, min_mut_gap: int, mut_collisions: str, quick_unbias: bool, quick_unbias_thresh: float, count_ends: bool = True, **kwargs)
Bases: PartialTabulator, AverageTabulator, ABC
- classmethod table_types()
Types of tables that this tabulator can write.
- class seismicrna.mask.table.PartialDatasetTabulator(*, dataset: MutsDataset, validate: bool = False, **kwargs)
Bases: DatasetTabulator, PartialTabulator, ABC
- classmethod init_kws()
Attributes of the dataset to use as keyword arguments in super().__init__().
- class seismicrna.mask.table.PartialPositionTable
Bases: PartialTable, PositionTable, ABC
- class seismicrna.mask.table.PartialReadTable
Bases: PartialTable, ReadTable, ABC
- class seismicrna.mask.table.PartialTable
Bases: Table, HasRegFilePath, ABC
Table of filtered reads over a region of the sequence.
- class seismicrna.mask.table.PartialTabulator(*, refseq: DNA, region: Region, pattern: RelPattern, min_mut_gap: int, mut_collisions: str, quick_unbias: bool, quick_unbias_thresh: float, count_ends: bool = True, **kwargs)
- property correct_bias
Whether to correct for observer bias.
- property data_per_pos
DataFrame of per-position data.
- classmethod get_null_value()
The null value for a count: either 0 or NaN.
- property p_ends_given_clust_noclose
Probability of each end coordinate.
- seismicrna.mask.table.adjust_counts(table_per_pos: DataFrame, p_ends_given_clust_noclose: ndarray, n_reads_clust: Series | int, region: Region, min_mut_gap: int, mut_collisions: str, quick_unbias: bool, quick_unbias_thresh: float)
Adjust the given table of masked/clustered counts per position to correct for observer bias.
- class seismicrna.mask.write.Masker(dataset: RelateMutsDataset | PoolMutsDataset, region: Region, pattern: RelPattern, *, max_mask_iter: int, mask_polya: int, mask_a: bool, mask_c: bool, mask_g: bool, mask_u: bool, mask_pos: list[tuple[str, int]], mask_pos_file: list[Path], mask_read: list[str], mask_read_file: list[Path], mask_discontig: bool, min_ncov_read: int, min_fcov_read: float, min_finfo_read: float, max_fmut_read: float, min_mut_gap: int, mut_collisions: str, probe: str, min_ninfo_pos: int, max_fmut_pos: float, quick_unbias: bool, quick_unbias_thresh: float, count_read: bool, brotli_level: int, top: Path, branch: str, num_cpus: int = 1)
Bases: object
Mask batches of relationships.
- CHECKSUM_KEY = 'mask'
- MASK_POS_FMUT = 'pos-fmut'
- MASK_POS_NINFO = 'pos-ninfo'
- MASK_READ_DISCONTIG = 'read-discontig'
- MASK_READ_FCOV = 'read-fcov'
- MASK_READ_FINFO = 'read-finfo'
- MASK_READ_FMUT = 'read-fmut'
- MASK_READ_GAP = 'read-gap'
- MASK_READ_INIT = 'read-init'
- MASK_READ_KEPT = 'read-kept'
- MASK_READ_LIST = 'read-exclude'
- MASK_READ_NCOV = 'read-ncov'
- PATTERN_KEY = 'pattern'
- create_report()
- mask()
- property n_reads_discontig
- property n_reads_init
- property n_reads_kept
Number of reads kept.
- property n_reads_list
- property n_reads_max_fmut
- property n_reads_min_fcov
- property n_reads_min_finfo
- property n_reads_min_gap
- property n_reads_min_ncov
- property pos_a
Positions masked for having base A.
- property pos_c
Positions masked for having base C.
- property pos_g
Positions masked for having base G.
- property pos_kept
Positions kept.
- property pos_list
Positions masked arbitrarily from a list.
- property pos_max_fmut
Positions masked for having too many mutations.
- property pos_min_ninfo
Positions masked for having too few informative reads.
- property pos_n
Positions masked for having base N.
- property pos_polya
Positions masked for lying in a poly(A) sequence.
- property pos_u
Positions masked for having base T or U.
- property read_names_dataset
Dataset of the read names.
- seismicrna.mask.write.mask_region(dataset: RelateMutsDataset | PoolMutsDataset, region: Region, *, branch: str, count_del: bool, count_ins: bool, no_mut: Iterable[str], only_mut: Iterable[str], mask_pos_table: bool, mask_read_table: bool, force: bool, num_cpus: int, tmp_pfx, keep_tmp, **kwargs)
Mask out certain reads, positions, and relationships.
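Because masking reads changes the per-position statistics and masking positions changes the per-read statistics, the max_mask_iter option suggests that the two filters are repeated until nothing changes. The sketch below models that idea in plain Python. It is an assumption-laden illustration, not the Masker implementation: the function names, the data layout (a dict of position to mutated/matched call per read), the order of the two filters, and the choice of filters shown are all hypothetical.

```python
def fraction_mutated(calls):
    """Fraction of informative calls that are mutated (0.0 if there are none)."""
    return sum(calls) / len(calls) if calls else 0.0


def mask_iteratively(reads, positions, *, min_ninfo_pos, max_fmut_pos,
                     max_fmut_read, max_mask_iter=0):
    """Alternate read and position filters until the kept sets stop changing.

    reads maps each read name to {position: True if mutated, False if matched};
    positions missing from a read are treated as uninformative.
    """
    kept_reads = set(reads)
    kept_pos = set(positions)
    n_iter = 0
    # max_mask_iter == 0 means no limit on the number of iterations.
    while max_mask_iter == 0 or n_iter < max_mask_iter:
        n_iter += 1
        # Filter reads using only the positions that are still kept.
        new_reads = {
            name for name in kept_reads
            if fraction_mutated(
                [mut for pos, mut in reads[name].items() if pos in kept_pos]
            ) <= max_fmut_read
        }
        # Filter positions using only the reads that are still kept.
        new_pos = set()
        for pos in kept_pos:
            calls = [reads[name][pos] for name in new_reads if pos in reads[name]]
            if len(calls) >= min_ninfo_pos and fraction_mutated(calls) <= max_fmut_pos:
                new_pos.add(pos)
        converged = new_reads == kept_reads and new_pos == kept_pos
        kept_reads, kept_pos = new_reads, new_pos
        if converged:
            break
    return sorted(kept_reads), sorted(kept_pos), n_iter


# Made-up data: four reads over positions 1-3.
reads = {
    "r1": {1: False, 2: False, 3: True},
    "r2": {1: False, 2: True, 3: True},
    "r3": {1: False, 2: False},
    "r4": {1: True, 2: True, 3: True},
}
result = mask_iteratively(reads, [1, 2, 3], min_ninfo_pos=2,
                          max_fmut_pos=1.0, max_fmut_read=0.5)
```

In this example the first pass drops reads r2 and r4 for being too mutated, which leaves position 3 with a single informative call, so it is masked as well; the second pass changes nothing and the loop stops.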