seismicrna.core.batch package
Subpackages
- seismicrna.core.batch.tests package
- Submodules
TestAccumulateBatches
get_batch_count_all_func()
TestCalcCountPerPos
TestCalcCountPerRead
TestCalcCoverage
TestCalcCoverage.test_0_positions()
TestCalcCoverage.test_0_positions_weighted()
TestCalcCoverage.test_1_position()
TestCalcCoverage.test_1_segment()
TestCalcCoverage.test_1_segment_redundant()
TestCalcCoverage.test_1_segment_redundant_weighted()
TestCalcCoverage.test_1_segment_weighted()
TestCalcCoverage.test_2_segments_mask()
TestCalcCoverage.test_2_segments_nomask()
TestCalcRelsPerPos
TestCalcRelsPerRead
TestCalcUniqReadWeights
TestCountEndCoords
TestCheckNoCoverageReads
TestCountReadsSegments
TestFindContiguousReads
TestFindReadEnd3s
TestFindReadEnd5s
TestMaskSegmentEnds
TestMatchReadsSegments
TestMergeReadEnds
TestMergeSegmentEnds
TestSortSegmentEnds
TestCalcMutsMatrix
TestCalcMutsMatrix.test_full_reads_no_muts()
TestCalcMutsMatrix.test_full_reads_no_muts_some_masked()
TestCalcMutsMatrix.test_paired_reads_masked_segments()
TestCalcMutsMatrix.test_paired_reads_no_muts()
TestCalcMutsMatrix.test_partial_reads_muts()
TestCalcMutsMatrix.test_partial_reads_no_muts()
TestCalcMutsMatrix.test_partial_reads_no_muts_some_masked()
Submodules
- seismicrna.core.batch.accum.accumulate_batches(get_batch_count_all: Callable[[int], tuple[Any, Any, Any, Any]], num_batches: int, refseq: DNA, pos_nums: ndarray, patterns: dict[str, RelPattern], ks: Iterable[int] | None = None, *, count_ends: bool = True, count_pos: bool = True, count_read: bool = True, validate: bool = True, max_procs: int = 1)
- seismicrna.core.batch.accum.accumulate_counts(batch_counts: Iterable[tuple[Any, Any, Any, Any]], refseq: DNA, pos_nums: ndarray, patterns: dict[str, RelPattern], ks: Iterable[int] | None = None, *, count_ends: bool = True, count_pos: bool = True, count_read: bool = True, validate: bool = True)
- seismicrna.core.batch.count.calc_count_per_pos(pattern: RelPattern, cover_per_pos: Series | DataFrame, rels_per_pos: dict[int, Series | DataFrame])
Count the reads that fit a pattern at each position.
- seismicrna.core.batch.count.calc_count_per_read(pattern: RelPattern, cover_per_read: DataFrame, rels_per_read: dict[int, DataFrame])
Count the positions that fit a pattern in each read.
- seismicrna.core.batch.count.calc_coverage(pos_index: Index, read_nums: ndarray, seg_end5s: ndarray, seg_end3s: ndarray, read_weights: DataFrame | None = None)
Number of positions covered by each read.
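As an illustration of what counting coverage involves, the following is a minimal NumPy sketch, not the library's implementation. It assumes that segment ends are 1-indexed and inclusive, that overlapping segments of one read cover a position only once, and that a segment whose 5’ end exceeds its 3’ end covers nothing. Read weights are ignored, and all variable names are hypothetical.

```python
import numpy as np

# Positions 1..8 of a hypothetical region.
positions = np.arange(1, 9)

# Two reads with two segments each (rows = reads, columns = segments).
# The second segment of read 1 has its 5' end > 3' end, so it covers nothing.
seg_end5s = np.array([[1, 3],
                      [5, 7]])
seg_end3s = np.array([[4, 6],
                      [8, 6]])

# Boolean matrix: covered[i, j] is True if read i covers position j.
covered = np.zeros((seg_end5s.shape[0], positions.size), dtype=bool)
for seg in range(seg_end5s.shape[1]):
    covered |= ((positions >= seg_end5s[:, seg, None])
                & (positions <= seg_end3s[:, seg, None]))

cover_per_pos = covered.sum(axis=0)   # reads covering each position
cover_per_read = covered.sum(axis=1)  # positions covered by each read
print(cover_per_pos.tolist())
print(cover_per_read.tolist())
```

Using a logical OR across segments means the overlap between two mates is counted once rather than twice.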
- seismicrna.core.batch.count.calc_reads_per_pos(pattern: RelPattern, mutations: dict[int, dict[int, ndarray]], pos_index: Index)
For each position, find all reads matching a pattern.
- seismicrna.core.batch.count.calc_rels_per_pos(mutations: dict[int, dict[int, ndarray]], num_reads: int | Series, cover_per_pos: Series | DataFrame, read_indexes: ndarray | None = None, read_weights: DataFrame | None = None)
For each relationship, the number of reads at each position.
- seismicrna.core.batch.count.calc_rels_per_read(mutations: dict[int, dict[int, ndarray]], pos_index: Index, cover_per_read: DataFrame, read_indexes: ndarray)
For each relationship, the number of positions in each read.
- seismicrna.core.batch.count.count_end_coords(end5s: ndarray, end3s: ndarray, weights: DataFrame | None = None)
Count each pair of 5’ and 3’ end coordinates.
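The idea of counting end-coordinate pairs can be sketched with plain NumPy. This is an assumption-laden illustration rather than the library's code: the helper `count_end_pairs` is hypothetical, it ignores the optional `weights` argument, and it returns a plain dict instead of whatever structure `count_end_coords` actually returns.

```python
import numpy as np


def count_end_pairs(end5s, end3s):
    """Count how many reads share each (5' end, 3' end) pair (hypothetical helper)."""
    end5s = np.asarray(end5s)
    end3s = np.asarray(end3s)
    if end5s.ndim != 1 or end5s.shape != end3s.shape:
        raise ValueError("end5s and end3s must be 1D arrays of equal length")
    # Stack the ends into rows of (5' end, 3' end), then count unique rows.
    pairs = np.stack([end5s, end3s], axis=1)
    uniq_pairs, counts = np.unique(pairs, axis=0, return_counts=True)
    return {(int(e5), int(e3)): int(n)
            for (e5, e3), n in zip(uniq_pairs, counts)}


pair_counts = count_end_pairs([1, 1, 4, 1], [10, 10, 12, 8])
print(pair_counts)
```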
- class seismicrna.core.batch.ends.EndCoords(*, region: Region, seg_end5s: ndarray, seg_end3s: ndarray, sanitize: bool = True, **kwargs)
Bases: object
Collection of 5’ and 3’ segment end coordinates.
- property contiguous
Whether the segments of each read are contiguous.
- property num_contiguous
Number of contiguous reads.
- property num_discontiguous
Number of discontiguous reads.
- property num_reads
Number of reads.
- property num_segments
Number of segments in each read.
- property pos_dtype
Data type for positions.
- property read_end3s
3’ end of each read.
- property read_end5s
5’ end of each read.
- property read_lengths
Length of each read.
- property seg_end3s
3’ end of each segment in each read.
- property seg_end5s
5’ end of each segment in each read.
- seismicrna.core.batch.ends.count_reads_segments(seg_ends: ndarray, what: str = 'seg_ends') → tuple[int, int]
- seismicrna.core.batch.ends.find_contiguous_reads(seg_end5s: ndarray, seg_end3s: ndarray)
Whether the segments of each read are contiguous.
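One way to test contiguity is sketched below. This is not the library's implementation: the helper `contiguous_reads` is hypothetical, it assumes unmasked 2D arrays (reads × segments) with 1-indexed inclusive ends, and it assumes that segments which overlap or directly abut count as contiguous.

```python
import numpy as np


def contiguous_reads(seg_end5s, seg_end3s):
    """Return a boolean array: True where a read's segments leave no gap."""
    seg_end5s = np.asarray(seg_end5s)
    seg_end3s = np.asarray(seg_end3s)
    # Sort the segments of each read by their 5' ends.
    order = np.argsort(seg_end5s, axis=1)
    end5s = np.take_along_axis(seg_end5s, order, axis=1)
    end3s = np.take_along_axis(seg_end3s, order, axis=1)
    # Furthest 3' end reached by all segments up to and including each one.
    reach = np.maximum.accumulate(end3s, axis=1)
    # A gap exists if a segment starts more than 1 position past the reach
    # of the segments before it.
    gaps = end5s[:, 1:] > reach[:, :-1] + 1
    return ~gaps.any(axis=1)


is_contiguous = contiguous_reads([[1, 5], [1, 8], [3, 1]],
                                 [[6, 10], [4, 12], [9, 2]])
print(is_contiguous.tolist())
```

In this example the first read's segments overlap, the second read has a gap between positions 4 and 8, and the third read's segments abut (2 then 3).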
- seismicrna.core.batch.ends.find_read_end3s(seg_end3s: ndarray)
Find the 3’ end of each read.
- seismicrna.core.batch.ends.find_read_end5s(seg_end5s: ndarray)
Find the 5’ end of each read.
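Conceptually, the ends of a read follow from the ends of its segments: the read's 5’ end is the smallest segment 5’ end and its 3’ end is the largest segment 3’ end. A minimal sketch, assuming unmasked 2D arrays with one row per read (the library functions may additionally handle masked arrays):

```python
import numpy as np

# Rows are reads, columns are segments (e.g. two mates of a paired-end read).
seg_end5s = np.array([[1, 5],
                      [8, 3]])
seg_end3s = np.array([[6, 10],
                      [12, 9]])

read_end5s = seg_end5s.min(axis=1)  # most 5' coordinate among the segments
read_end3s = seg_end3s.max(axis=1)  # most 3' coordinate among the segments
print(read_end5s.tolist(), read_end3s.tolist())
```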
- seismicrna.core.batch.ends.mask_segment_ends(seg_end5s: ndarray, seg_end3s: ndarray)
Mask segments with no coverage (5’ end > 3’ end).
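The masking rule stated above (5’ end > 3’ end means no coverage) can be illustrated with NumPy masked arrays. The helper `mask_empty_segments` is hypothetical, and the exact return type of `mask_segment_ends` is an assumption.

```python
import numpy as np


def mask_empty_segments(seg_end5s, seg_end3s):
    """Mask every segment whose 5' end is greater than its 3' end."""
    seg_end5s = np.asarray(seg_end5s)
    seg_end3s = np.asarray(seg_end3s)
    no_coverage = seg_end5s > seg_end3s
    return (np.ma.masked_array(seg_end5s, mask=no_coverage),
            np.ma.masked_array(seg_end3s, mask=no_coverage))


end5s, end3s = mask_empty_segments([[1, 7], [4, 2]],
                                   [[6, 6], [9, 8]])
# Reductions on masked arrays skip the masked segments.
print(end5s.min(axis=1).tolist(), end3s.max(axis=1).tolist())
```

Masking keeps the array shape intact, so reads with different numbers of usable segments can still share one rectangular array.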
- seismicrna.core.batch.ends.match_reads_segments(seg_end5s: ndarray, seg_end3s: ndarray)
Number of segments for the given end coordinates.
- seismicrna.core.batch.ends.merge_read_ends(read_end5s: ndarray, read_end3s: ndarray)
Return the 5’ and 3’ ends as one 2D array.
- seismicrna.core.batch.ends.merge_segment_ends(seg_end5s: ndarray, seg_end3s: ndarray, fill_value: int | None = None)
- seismicrna.core.batch.ends.sanitize_segment_ends(seg_end5s: ndarray, seg_end3s: ndarray, min_pos: int, max_pos: int, check_values: bool = True)
Sanitize end coordinates.
- Parameters:
seg_end5s (np.ndarray) – 5’ end coordinate of each segment in each read.
seg_end3s (np.ndarray) – 3’ end coordinate of each segment in each read.
min_pos (int) – Minimum allowed value of a position.
max_pos (int) – Maximum allowed value of a position.
check_values (bool = True) – Whether to check the bounds of the values, which is the most expensive operation in this function. Can be set to False if the only desired effect is to ensure the output is a positive, even number of arrays in the proper data type.
- Returns:
Sanitized end coordinates: encoded in the most efficient data type, and if check_values is True then all between min_pos and max_pos (inclusive).
- Return type:
tuple[np.ndarray, np.ndarray]
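The documented behavior (bounds check, then a compact data type) can be approximated as follows. This is a hedged sketch, not the library's code: the helper `sanitize_ends` is hypothetical, it applies the same inclusive bounds to both 5’ and 3’ ends, and it picks the data type with `np.min_scalar_type`, which may differ from the library's choice.

```python
import numpy as np


def sanitize_ends(seg_end5s, seg_end3s, min_pos, max_pos, check_values=True):
    """Validate end coordinates and cast them to a compact unsigned type."""
    seg_end5s = np.asarray(seg_end5s)
    seg_end3s = np.asarray(seg_end3s)
    if seg_end5s.shape != seg_end3s.shape:
        raise ValueError("seg_end5s and seg_end3s must have the same shape")
    if check_values:
        # The bounds check scans every value, hence the option to skip it.
        for name, ends in (("seg_end5s", seg_end5s), ("seg_end3s", seg_end3s)):
            if ends.size > 0 and (ends.min() < min_pos or ends.max() > max_pos):
                raise ValueError(
                    f"{name} has values outside [{min_pos}, {max_pos}]"
                )
    # Smallest dtype able to hold max_pos (an assumption of this sketch).
    dtype = np.min_scalar_type(max_pos)
    return seg_end5s.astype(dtype), seg_end3s.astype(dtype)


clean5s, clean3s = sanitize_ends([[1, 50]], [[40, 300]], min_pos=1, max_pos=300)
print(clean5s.dtype, clean3s.dtype)
```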
- seismicrna.core.batch.ends.simulate_segment_ends(uniq_end5s: ndarray, uniq_end3s: ndarray, p_ends: ndarray, num_reads: int, read_length: int = 0, p_rev: float = 0.5)
Simulate segment end coordinates from their probabilities.
- Parameters:
uniq_end5s (np.ndarray) – Unique read 5’ end coordinates.
uniq_end3s (np.ndarray) – Unique read 3’ end coordinates.
p_ends (np.ndarray) – Probability of each set of unique end coordinates.
num_reads (int) – Number of reads to simulate.
read_length (int = 0) – If == 0, then generate single-end reads (1 segment per read); if > 0, then generate paired-end reads (2 segments per read) with at most this number of base calls in each segment.
p_rev (float = 0.5) – For paired-end reads, the probability that mate 1 aligns in the
- Returns:
5’ and 3’ segment end coordinates of the simulated reads.
- Return type:
tuple[np.ndarray, np.ndarray]
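For the single-end case (`read_length == 0`), sampling end coordinates from their probabilities could look like the sketch below. It is an illustration under assumptions: the helper `simulate_single_end` and its `seed` parameter are hypothetical, the probabilities are renormalized to sum to 1 (the library may instead require that), and paired-end simulation with `p_rev` is not covered.

```python
import numpy as np


def simulate_single_end(uniq_end5s, uniq_end3s, p_ends, num_reads, seed=0):
    """Draw (5' end, 3' end) pairs for single-end reads, 1 segment per read."""
    rng = np.random.default_rng(seed)
    uniq_end5s = np.asarray(uniq_end5s)
    uniq_end3s = np.asarray(uniq_end3s)
    p_ends = np.asarray(p_ends, dtype=float)
    # Pick one index into the unique end pairs for each simulated read.
    picks = rng.choice(p_ends.size, size=num_reads, p=p_ends / p_ends.sum())
    # Return 2D arrays of shape (num_reads, 1): one segment per read.
    return uniq_end5s[picks][:, np.newaxis], uniq_end3s[picks][:, np.newaxis]


sim_end5s, sim_end3s = simulate_single_end(
    uniq_end5s=[1, 1, 5], uniq_end3s=[20, 30, 30],
    p_ends=[0.2, 0.5, 0.3], num_reads=100,
)
print(sim_end5s.shape, sim_end3s.shape)
```

Sampling an index first, then using it for both arrays, keeps each 5’ end paired with its matching 3’ end.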
- seismicrna.core.batch.ends.sort_segment_ends(seg_end5s: ndarray, seg_end3s: ndarray, zero_indexed: bool = True, fill_mask: bool = False)
Sort the segment end coordinates and label the 3’ end of each contiguous set of segments.
- Parameters:
seg_end5s (np.ndarray) – 5’ end of each segment in each read; may be masked.
seg_end3s (np.ndarray) – 3’ end of each segment in each read; may be masked.
zero_indexed (bool = True) – In the return array, make the 5’ ends 0-indexed; if False, then they will be 1-indexed (like the input).
fill_mask (bool = False) – If seg_end5s or seg_end3s is a masked array, then return a regular array with all masked coordinates set to 0 (or 1 for 5’ ends if zero_indexed is False) rather than a masked array.
- Returns:
Sorted 5’ and 3’ coordinates of the segments in each read
Labels of whether each coordinate is a 5’ end of a segment
Labels of whether each coordinate is a 3’ end of a contiguous segment
- Return type:
tuple[np.ndarray, np.ndarray, np.ndarray]
- seismicrna.core.batch.index.count_base_types(base_pos_index: Index)
Return the number of each type of base in the index of positions and bases.
- seismicrna.core.batch.index.iter_base_types(base_pos_index: Index)
For each type of base in the index of positions and bases, yield the positions in the index with that type of base.
- class seismicrna.core.batch.muts.MutsBatch(*, region: Region, sanitize: bool = True, muts: dict[int, dict[int, list[int] | ndarray]], masked_read_nums: ndarray | list[int] | None = None, **kwargs)
Bases: EndCoords, ReadBatch, ABC
Batch of mutational data.
- property muts
Reads with each type of mutation at each position.
- property pos_nums
Positions in use.
- property read_end_counts
Counts of read end coordinates.
- class seismicrna.core.batch.muts.RegionMutsBatch(*, region: Region, **kwargs)
Batch of mutational data that knows its region.
- calc_min_mut_dist(pattern: RelPattern)
For each read, calculate the smallest distance (i.e. the gap plus 1) between any two mutations.
- count_all(patterns: dict[str, RelPattern], ks: Iterable[int] | None = None, *, count_ends: bool = True, count_pos: bool = True, count_read: bool = True)
Calculate all counts.
- count_per_pos(pattern: RelPattern)
Count the reads that fit a relationship pattern at each position in a region.
- count_per_read(pattern: RelPattern)
Count the positions in a region that fit a relationship pattern in each read.
- property cover_per_pos
Number of reads covering each position.
- property cover_per_read
Number of positions covered by each read.
- iter_reads(pattern: RelPattern, only_read_ends: bool = False, require_contiguous: bool = False)
End coordinates and mutated positions in each read.
- property matrix
Matrix of relationships at each position in each read.
- property pos_index
Index of unmasked positions and bases.
- reads_noclose_muts(pattern: RelPattern, min_gap: int)
List the reads with no two mutations too close.
- reads_per_pos(pattern: RelPattern)
For each position, find all reads matching a relationship pattern.
- property rels_per_pos
For each relationship, the number of reads at each position with that relationship.
- property rels_per_read
For each relationship, the number of positions in each read with that relationship.
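The relationship between calc_min_mut_dist and reads_noclose_muts can be illustrated without the library. In this sketch, the helpers `min_mut_dist` and `filter_reads_by_gap` are hypothetical, mutations are given as a plain dict mapping read numbers to mutated positions, and a read with fewer than two mutations is assigned a distance of 0 and always kept. These conventions are assumptions, not confirmed behavior.

```python
import numpy as np


def min_mut_dist(mut_positions):
    """Smallest distance (gap + 1) between any two mutations in one read."""
    positions = np.unique(mut_positions)  # sorted and deduplicated
    if positions.size < 2:
        return 0  # assumed convention: no pair of mutations to compare
    return int(np.diff(positions).min())


def filter_reads_by_gap(muts_per_read, min_gap):
    """List reads in which no two mutations are separated by < min_gap."""
    keep = []
    for read_num, positions in muts_per_read.items():
        dist = min_mut_dist(positions)
        # distance = gap + 1, so gap >= min_gap is the same as dist > min_gap.
        if dist == 0 or dist > min_gap:
            keep.append(read_num)
    return keep


muts_per_read = {0: [10, 13], 1: [5], 2: [20, 21, 40], 3: []}
kept_reads = filter_reads_by_gap(muts_per_read, min_gap=2)
print(kept_reads)
```

Read 0 has mutations at positions 10 and 13 (distance 3, gap 2), so it passes a minimum gap of 2, whereas read 2 has adjacent mutations (distance 1, gap 0) and is dropped.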
- seismicrna.core.batch.muts.calc_muts_matrix(region: Region, read_nums: ndarray, seg_end5s: ndarray, seg_end3s: ndarray, muts: dict[int, dict[int, ndarray]])
Matrix of relationships at each position in each read.
- seismicrna.core.batch.muts.sanitize_muts(muts: dict[int, dict[int, list[int] | ndarray]], region: Region, data_type: type, sanitize: bool = True)
- seismicrna.core.batch.muts.simulate_muts(pmut: DataFrame, seg_end5s: ndarray, seg_end3s: ndarray)
Simulate mutation data.
- Parameters:
pmut (pd.DataFrame) – Rate of each type of mutation at each position.
seg_end5s – 5’ end coordinate of each segment.
seg_end3s – 3’ end coordinate of each segment.
- class seismicrna.core.batch.read.ReadBatch(*, batch: int)
Bases: ABC
Batch of reads.
- property batch_read_index
MultiIndex of the batch number and read numbers.
- property masked_reads_bool
- property read_dtype
Data type for read numbers.
- property read_indexes: ndarray
Map each read number to its index in self.read_nums.
- property read_nums: ndarray
Read numbers.