seismicrna.core.batch package

Submodules

seismicrna.core.batch.accum.accumulate_batches(get_batch_count_all: Callable[[int], tuple[Any, Any, Any, Any]], num_batches: int, refseq: DNA, pos_nums: ndarray, patterns: dict[str, RelPattern], ks: Iterable[int] | None = None, *, count_ends: bool = True, count_pos: bool = True, count_read: bool = True, validate: bool = True, max_procs: int = 1)
seismicrna.core.batch.accum.accumulate_counts(batch_counts: Iterable[tuple[Any, Any, Any, Any]], refseq: DNA, pos_nums: ndarray, patterns: dict[str, RelPattern], ks: Iterable[int] | None = None, *, count_ends: bool = True, count_pos: bool = True, count_read: bool = True, validate: bool = True)
seismicrna.core.batch.count.calc_count_per_pos(pattern: RelPattern, cover_per_pos: Series | DataFrame, rels_per_pos: dict[int, Series | DataFrame])

Count the reads that fit a pattern at each position.
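A minimal sketch of per-position counting, for illustration only. It assumes a dense reads-by-positions matrix of relationship codes. It models a pattern as two sets of codes: "yes" codes that fit the pattern, and "no" codes that are informative but do not fit. The codes, the helper count_per_pos, and this stand-in for RelPattern are hypothetical, not seismicrna's API.

```python
import numpy as np

# Hypothetical relationship codes, for illustration only; these are NOT
# guaranteed to match the codes that seismicrna uses internally.
NOCOV, MATCH, SUB, DEL = 0, 1, 2, 4

def count_per_pos(matrix: np.ndarray, yes: set, no: set):
    """At each position (column), count the informative reads (code in
    yes | no) and the reads that fit the pattern (code in yes)."""
    fits = np.isin(matrix, list(yes))
    info = fits | np.isin(matrix, list(no))
    return info.sum(axis=0), fits.sum(axis=0)

# Rows are reads; columns are positions.
matrix = np.array([[MATCH, SUB,   NOCOV],
                   [MATCH, MATCH, DEL],
                   [SUB,   MATCH, MATCH]])
info, fits = count_per_pos(matrix, yes={SUB, DEL}, no={MATCH})
print(info.tolist(), fits.tolist())  # [3, 3, 2] [1, 1, 1]
```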

seismicrna.core.batch.count.calc_count_per_read(pattern: RelPattern, cover_per_read: DataFrame, rels_per_read: dict[int, DataFrame])

Count the positions that fit a pattern in each read.

seismicrna.core.batch.count.calc_coverage(pos_index: Index, read_nums: ndarray, seg_end5s: ndarray, seg_end3s: ndarray, read_weights: DataFrame | None = None)

Number of positions covered by each read.
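The sketch below computes per-read coverage from segment end coordinates. It assumes that coordinates are 1-indexed and inclusive and that overlapping segments count each position only once. The helper coverage_per_read is hypothetical and does not mirror the signature of calc_coverage.

```python
import numpy as np

def coverage_per_read(seg_end5s: np.ndarray, seg_end3s: np.ndarray,
                      min_pos: int, max_pos: int) -> np.ndarray:
    """Count the distinct positions covered by each read.

    seg_end5s and seg_end3s have shape (num_reads, num_segments); a segment
    whose 5' end exceeds its 3' end covers nothing."""
    positions = np.arange(min_pos, max_pos + 1)
    # Shape (reads, segments, positions): does each segment cover each position?
    in_seg = ((positions >= seg_end5s[:, :, np.newaxis])
              & (positions <= seg_end3s[:, :, np.newaxis]))
    # A read covers a position if any of its segments does (union of segments).
    covered = in_seg.any(axis=1)
    return covered.sum(axis=1)

end5s = np.array([[1, 4], [2, 9], [5, 6]])
end3s = np.array([[5, 8], [4, 10], [4, 5]])
print(coverage_per_read(end5s, end3s, 1, 10).tolist())  # [8, 5, 0]
```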

seismicrna.core.batch.count.calc_reads_per_pos(pattern: RelPattern, mutations: dict[int, dict[int, ndarray]], pos_index: Index)

For each position, find all reads matching a pattern.

seismicrna.core.batch.count.calc_rels_per_pos(mutations: dict[int, dict[int, ndarray]], num_reads: int | Series, cover_per_pos: Series | DataFrame, read_indexes: ndarray | None = None, read_weights: DataFrame | None = None)

For each relationship, the number of reads at each position.

seismicrna.core.batch.count.calc_rels_per_read(mutations: dict[int, dict[int, ndarray]], pos_index: Index, cover_per_read: DataFrame, read_indexes: ndarray)

For each relationship, the number of positions in each read.

seismicrna.core.batch.count.count_end_coords(end5s: ndarray, end3s: ndarray, weights: DataFrame | None = None)

Count each pair of 5’ and 3’ end coordinates.
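One way to count end-coordinate pairs, sketched with NumPy. The helper count_end_pairs is hypothetical. It returns a plain dict keyed by (5’, 3’) pairs, whereas the real function's return type is not documented here. Passing weights sums them per pair instead of counting, loosely analogous to the optional weights argument.

```python
import numpy as np

def count_end_pairs(end5s, end3s, weights=None):
    """Count each distinct (5' end, 3' end) pair; with weights, sum them."""
    pairs = np.stack([end5s, end3s], axis=1)
    uniq, inverse = np.unique(pairs, axis=0, return_inverse=True)
    if weights is None:
        weights = np.ones(len(pairs))
    # ravel() ensures bincount receives a 1D array of group labels.
    totals = np.bincount(inverse.ravel(), weights=weights, minlength=len(uniq))
    return {(int(e5), int(e3)): float(t) for (e5, e3), t in zip(uniq, totals)}

end5s = np.array([1, 1, 3, 1])
end3s = np.array([10, 10, 8, 9])
print(count_end_pairs(end5s, end3s))
# {(1, 9): 1.0, (1, 10): 2.0, (3, 8): 1.0}
```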

class seismicrna.core.batch.ends.EndCoords(*, region: Region, seg_end5s: ndarray, seg_end3s: ndarray, sanitize: bool = True, **kwargs)

Bases: object

Collection of 5’ and 3’ segment end coordinates.

property contiguous

Whether the segments of each read are contiguous.

property num_contiguous

Number of contiguous reads.

property num_discontiguous

Number of discontiguous reads.

property num_reads

Number of reads.

property num_segments

Number of segments in each read.

property pos_dtype

Data type for positions.

property read_end3s

3’ end of each read.

property read_end5s

5’ end of each read.

property read_lengths

Length of each read.

property seg_end3s

3’ end of each segment in each read.

property seg_end5s

5’ end of each segment in each read.

seismicrna.core.batch.ends.count_reads_segments(seg_ends: ndarray, what: str = 'seg_ends') → tuple[int, int]

seismicrna.core.batch.ends.find_contiguous_reads(seg_end5s: ndarray, seg_end3s: ndarray)

Whether the segments of each read are contiguous.
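A sketch of a contiguity test. It assumes every segment is nonempty and that two segments are contiguous when they overlap or abut, meaning the next segment starts at most one position past the furthest 3’ end seen so far. Whether seismicrna treats abutting segments as contiguous is an assumption here, and contiguous_reads is a hypothetical name.

```python
import numpy as np

def contiguous_reads(seg_end5s: np.ndarray, seg_end3s: np.ndarray) -> np.ndarray:
    """Return True for each read whose segments leave no gap.

    Assumption: overlapping or abutting segments count as contiguous."""
    result = np.ones(seg_end5s.shape[0], dtype=bool)
    for i, (seg5, seg3) in enumerate(zip(seg_end5s, seg_end3s)):
        # Visit the segments of this read in order of their 5' ends.
        order = np.argsort(seg5, kind="stable")
        reach = seg3[order[0]]
        for j in order[1:]:
            if seg5[j] > reach + 1:
                # A gap separates this segment from everything before it.
                result[i] = False
                break
            reach = max(reach, seg3[j])
    return result

end5s = np.array([[1, 6], [1, 8], [4, 1]])
end3s = np.array([[5, 9], [5, 9], [9, 5]])
print(contiguous_reads(end5s, end3s).tolist())  # [True, False, True]
```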

seismicrna.core.batch.ends.find_read_end3s(seg_end3s: ndarray)

Find the 3’ end of each read.

seismicrna.core.batch.ends.find_read_end5s(seg_end5s: ndarray)

Find the 5’ end of each read.
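Assuming that a read spans from the smallest 5’ end to the largest 3’ end of its segments, both read ends reduce to a row-wise minimum and maximum. This is a sketch, not the library's implementation.

```python
import numpy as np

# Rows are reads; columns are segments (1-indexed, inclusive coordinates).
seg_end5s = np.array([[4, 1], [2, 7]])
seg_end3s = np.array([[9, 5], [6, 10]])
# A read spans from its smallest segment 5' end to its largest segment 3' end.
read_end5s = seg_end5s.min(axis=1)
read_end3s = seg_end3s.max(axis=1)
print(read_end5s.tolist(), read_end3s.tolist())  # [1, 2] [9, 10]
```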

seismicrna.core.batch.ends.mask_segment_ends(seg_end5s: ndarray, seg_end3s: ndarray)

Mask segments with no coverage (5’ end > 3’ end).
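A sketch of masking empty segments with NumPy masked arrays (np.ma). Using np.ma is an assumption about how the masking is represented. Once the empty segments are masked, reductions such as min and max skip them.

```python
import numpy as np

seg_end5s = np.array([[1, 7], [3, 5]])
seg_end3s = np.array([[5, 6], [9, 8]])
# A segment covers nothing when its 5' end lies past its 3' end.
no_coverage = seg_end5s > seg_end3s
masked5 = np.ma.masked_array(seg_end5s, mask=no_coverage)
masked3 = np.ma.masked_array(seg_end3s, mask=no_coverage)
# Masked (empty) segments are ignored by the reductions.
print(masked5.min(axis=1).tolist())  # [1, 3]
print(masked3.max(axis=1).tolist())  # [5, 9]
```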

seismicrna.core.batch.ends.match_reads_segments(seg_end5s: ndarray, seg_end3s: ndarray)

Number of segments for the given end coordinates.

seismicrna.core.batch.ends.merge_read_ends(read_end5s: ndarray, read_end3s: ndarray)

Return the 5’ and 3’ ends as one 2D array.

seismicrna.core.batch.ends.merge_segment_ends(seg_end5s: ndarray, seg_end3s: ndarray, fill_value: int | None = None)
seismicrna.core.batch.ends.sanitize_segment_ends(seg_end5s: ndarray, seg_end3s: ndarray, min_pos: int, max_pos: int, check_values: bool = True)

Sanitize end coordinates.

Parameters:
  • seg_end5s (np.ndarray) – 5’ end coordinate of each segment in each read.

  • seg_end3s (np.ndarray) – 3’ end coordinate of each segment in each read.

  • min_pos (int) – Minimum allowed value of a position.

  • max_pos (int) – Maximum allowed value of a position.

  • check_values (bool = True) – Whether to check the bounds of the values, which is the most expensive operation in this function. Can be set to False if the only desired effect is to ensure the output is a positive, even number of arrays in the proper data type.

Returns:

Sanitized end coordinates, encoded in the most efficient data type; if check_values is True, every coordinate also lies between min_pos and max_pos (inclusive).

Return type:

tuple[np.ndarray, np.ndarray]
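The following hypothetical sanitizer follows the behavior described above for a single pair of arrays. It checks shapes, optionally checks bounds (the costly step), and then casts to the smallest integer type that can hold max_pos, as chosen by np.min_scalar_type. It assumes nonnegative coordinates, and sanitize_ends is not the library's function.

```python
import numpy as np

def sanitize_ends(seg_end5s, seg_end3s, min_pos: int, max_pos: int,
                  check_values: bool = True):
    """Hypothetical sanitizer: validate shapes, optionally check bounds,
    and cast to the smallest integer dtype able to hold max_pos."""
    end5s = np.asarray(seg_end5s)
    end3s = np.asarray(seg_end3s)
    if end5s.shape != end3s.shape:
        raise ValueError("5' and 3' end arrays must have the same shape")
    if check_values:
        # The expensive step: scan every coordinate in both arrays.
        for label, ends in (("5'", end5s), ("3'", end3s)):
            if ends.size and (ends.min() < min_pos or ends.max() > max_pos):
                raise ValueError(f"{label} ends must lie in [{min_pos}, {max_pos}]")
    dtype = np.min_scalar_type(max_pos)
    return end5s.astype(dtype), end3s.astype(dtype)

end5s, end3s = sanitize_ends([[1, 50]], [[40, 300]], min_pos=1, max_pos=300)
print(end5s.dtype, end3s.tolist())  # uint16 [[40, 300]]
```

With check_values=False, the cast is unchecked, so a coordinate larger than the chosen type can hold would silently wrap around.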

seismicrna.core.batch.ends.simulate_segment_ends(uniq_end5s: ndarray, uniq_end3s: ndarray, p_ends: ndarray, num_reads: int, read_length: int = 0, p_rev: float = 0.5)

Simulate segment end coordinates from their probabilities.

Parameters:
  • uniq_end5s (np.ndarray) – Unique read 5’ end coordinates.

  • uniq_end3s (np.ndarray) – Unique read 3’ end coordinates.

  • p_ends (np.ndarray) – Probability of each set of unique end coordinates.

  • num_reads (int) – Number of reads to simulate.

  • read_length (int = 0) – If == 0, then generate single-end reads (1 segment per read); if > 0, then generate paired-end reads (2 segments per read) with at most this number of base calls in each segment.

  • p_rev (float = 0.5) – For paired-end reads, the probability that mate 1 aligns in the reverse orientation.

Returns:

5’ and 3’ segment end coordinates of the simulated reads.

Return type:

tuple[np.ndarray, np.ndarray]
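A sketch of the simulation that makes several assumptions, each labeled in the comments. Each read draws its (5’, 3’) pair from p_ends. Paired-end mates are taken as the first and last read_length positions of each read. p_rev is treated as the probability that mate 1 is the mate at the 3’ end. The seed parameter and the name simulate_ends are hypothetical additions.

```python
import numpy as np

def simulate_ends(uniq_end5s, uniq_end3s, p_ends, num_reads,
                  read_length=0, p_rev=0.5, seed=None):
    rng = np.random.default_rng(seed)
    # Draw one unique (5', 3') pair for each read according to p_ends.
    choice = rng.choice(len(p_ends), size=num_reads, p=p_ends)
    end5s = np.asarray(uniq_end5s, dtype=np.int64)[choice]
    end3s = np.asarray(uniq_end3s, dtype=np.int64)[choice]
    if read_length == 0:
        # Single-end: one segment spanning the whole read.
        return end5s[:, np.newaxis], end3s[:, np.newaxis]
    # Assumption: one mate starts at the 5' end and the other ends at the
    # 3' end, and each covers at most read_length positions.
    fwd5, fwd3 = end5s, np.minimum(end5s + read_length - 1, end3s)
    rev5, rev3 = np.maximum(end3s - read_length + 1, end5s), end3s
    # Assumption: with probability p_rev, mate 1 is the mate at the 3' end.
    rev1 = rng.random(num_reads) < p_rev
    seg_end5s = np.stack([np.where(rev1, rev5, fwd5),
                          np.where(rev1, fwd5, rev5)], axis=1)
    seg_end3s = np.stack([np.where(rev1, rev3, fwd3),
                          np.where(rev1, fwd3, rev3)], axis=1)
    return seg_end5s, seg_end3s

seg_end5s, seg_end3s = simulate_ends([1, 11], [100, 60], [0.7, 0.3],
                                     num_reads=1000, read_length=30, seed=0)
print(seg_end5s.shape)  # (1000, 2)
```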

seismicrna.core.batch.ends.sort_segment_ends(seg_end5s: ndarray, seg_end3s: ndarray, zero_indexed: bool = True, fill_mask: bool = False)

Sort the segment end coordinates and label the 3’ end of each contiguous set of segments.

Parameters:
  • seg_end5s (np.ndarray) – 5’ end of each segment in each read; may be masked.

  • seg_end3s (np.ndarray) – 3’ end of each segment in each read; may be masked.

  • zero_indexed (bool = True) – In the return array, make the 5’ ends 0-indexed; if False, then they will be 1-indexed (like the input).

  • fill_mask (bool = False) – If seg_end5s or seg_end3s is a masked array, then return a regular array with all masked coordinates set to 0 (or 1 for 5’ ends if zero_indexed is False) rather than a masked array.

Returns:

  • Sorted 5’ and 3’ coordinates of the segments in each read

  • Labels of whether each coordinate is a 5’ end of a segment

  • Labels of whether each coordinate is a 3’ end of a contiguous segment

Return type:

tuple[np.ndarray, np.ndarray, np.ndarray]
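The labeling of contiguous blocks can be sketched for a single read as a sweep over sorted coordinates. Each 5’ end (made 0-indexed) adds one to a running depth, and each 3’ end subtracts one. A 3’ end closes a contiguous block whenever the depth returns to zero. Sorting 5’ ends before 3’ ends on ties makes abutting segments count as contiguous, which is an assumption. sort_ends_one_read is hypothetical and handles neither multiple reads nor masked arrays.

```python
import numpy as np

def sort_ends_one_read(end5s, end3s):
    end5s = np.asarray(end5s)
    end3s = np.asarray(end3s)
    # 0-indexed 5' ends (end5 - 1), so abutting segments share a coordinate.
    coords = np.concatenate([end5s - 1, end3s])
    is_end5 = np.concatenate([np.ones(end5s.size, dtype=bool),
                              np.zeros(end3s.size, dtype=bool)])
    # Sort by coordinate; on ties, 5' ends (key 0) precede 3' ends (key 1).
    order = np.lexsort((np.where(is_end5, 0, 1), coords))
    coords, is_end5 = coords[order], is_end5[order]
    # Running depth of coverage: +1 at each 5' end, -1 at each 3' end.
    depth = np.cumsum(np.where(is_end5, 1, -1))
    # A 3' end closes a contiguous block when the depth returns to zero.
    is_block_end3 = ~is_end5 & (depth == 0)
    return coords, is_end5, is_block_end3

coords, is_end5, is_block_end3 = sort_ends_one_read([1, 6, 12], [5, 9, 15])
print(coords.tolist())         # [0, 5, 5, 9, 11, 15]
print(is_block_end3.tolist())  # [False, False, False, True, False, True]
```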

seismicrna.core.batch.index.count_base_types(base_pos_index: Index)

Return the number of each type of base in the index of positions and bases.

seismicrna.core.batch.index.iter_base_types(base_pos_index: Index)

For each type of base in the index of positions and bases, yield the positions in the index with that type of base.
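A plain-Python sketch of both operations. It uses a hypothetical list of (position, base) pairs in place of the pandas index of positions and bases. The helpers count_bases and iter_bases are illustrative only.

```python
from collections import Counter, defaultdict

# Hypothetical (position, base) pairs standing in for the index of positions and bases.
base_pos_index = [(1, "A"), (2, "C"), (3, "A"), (4, "G"), (5, "T"), (6, "A")]

def count_bases(index):
    """Number of positions with each type of base."""
    return Counter(base for _, base in index)

def iter_bases(index):
    """Yield each type of base with the positions that have that base."""
    groups = defaultdict(list)
    for pos, base in index:
        groups[base].append(pos)
    yield from sorted(groups.items())

print(count_bases(base_pos_index))
print(list(iter_bases(base_pos_index)))
# [('A', [1, 3, 6]), ('C', [2]), ('G', [4]), ('T', [5])]
```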

seismicrna.core.batch.index.list_batch_nums(num_batches: int)

List the batch numbers.

class seismicrna.core.batch.muts.MutsBatch(*, region: Region, sanitize: bool = True, muts: dict[int, dict[int, list[int] | ndarray]], masked_read_nums: ndarray | list[int] | None = None, **kwargs)

Bases: EndCoords, ReadBatch, ABC

Batch of mutational data.

property muts

Reads with each type of mutation at each position.

property pos_nums

Positions in use.

property read_end_counts

Counts of read end coordinates.

abstract property read_weights: DataFrame | None

Weights for each read when computing counts.

class seismicrna.core.batch.muts.RegionMutsBatch(*, region: Region, **kwargs)

Bases: MutsBatch, ABC

Batch of mutational data that knows its region.

calc_min_mut_dist(pattern: RelPattern)

For each read, calculate the smallest distance (i.e. the gap plus 1) between any two mutations.
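A sketch of the minimum mutation distance, given the mutated positions of each read. For example, mutations at positions 3 and 7 leave a gap of 3 (positions 4 to 6) and have a distance of 4. Returning 0 for reads with fewer than two mutations is an assumption, and min_mut_dist is a hypothetical helper.

```python
import numpy as np

def min_mut_dist(mut_positions_per_read):
    """For each read, the smallest distance (gap + 1) between two mutated
    positions; 0 if the read has fewer than two mutations (assumption)."""
    dists = []
    for positions in mut_positions_per_read:
        positions = np.sort(np.asarray(positions))
        if positions.size >= 2:
            dists.append(int(np.diff(positions).min()))
        else:
            dists.append(0)
    return dists

print(min_mut_dist([[10, 3, 7], [5], [2, 3], []]))  # [3, 0, 1, 0]
```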

count_all(patterns: dict[str, RelPattern], ks: Iterable[int] | None = None, *, count_ends: bool = True, count_pos: bool = True, count_read: bool = True)

Calculate all counts.

count_per_pos(pattern: RelPattern)

Count the reads that fit a relationship pattern at each position in a region.

count_per_read(pattern: RelPattern)

Count the positions in a region that fit a relationship pattern in each read.

property cover_per_pos

Number of reads covering each position.

property cover_per_read

Number of positions covered by each read.

iter_reads(pattern: RelPattern, only_read_ends: bool = False, require_contiguous: bool = False)

End coordinates and mutated positions in each read.

property matrix

Matrix of relationships at each position in each read.

property pos_index

Index of unmasked positions and bases.

reads_noclose_muts(pattern: RelPattern, min_gap: int)

List the reads with no two mutations too close.

reads_per_pos(pattern: RelPattern)

For each position, find all reads matching a relationship pattern.

property rels_per_pos

For each relationship, the number of reads at each position with that relationship.

property rels_per_read

For each relationship, the number of positions in each read with that relationship.

seismicrna.core.batch.muts.calc_muts_matrix(region: Region, read_nums: ndarray, seg_end5s: ndarray, seg_end3s: ndarray, muts: dict[int, dict[int, ndarray]])

Matrix of relationships at each position in each read.
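A sketch of building a relationship matrix. It assumes that muts maps each position to a dict from relationship code to read numbers. It uses hypothetical codes 0 (not covered) and 1 (covered with no recorded mutation). The mutation codes in the example (16 and 64) are placeholders, not necessarily seismicrna's codes, and muts_matrix is a hypothetical name.

```python
import numpy as np

# Hypothetical codes: 0 = not covered; 1 = covered, no recorded mutation.
NOCOV, MATCH = 0, 1

def muts_matrix(read_nums, seg_end5s, seg_end3s, muts, min_pos, max_pos):
    positions = np.arange(min_pos, max_pos + 1)
    # A read covers a position if any of its segments spans it.
    covered = ((positions >= seg_end5s[:, :, np.newaxis])
               & (positions <= seg_end3s[:, :, np.newaxis])).any(axis=1)
    matrix = np.where(covered, MATCH, NOCOV)
    # Map read numbers to row indexes.
    row_of = {int(r): i for i, r in enumerate(read_nums)}
    # Assumed layout of muts: {position: {relationship code: read numbers}}.
    for pos, rels in muts.items():
        for rel, reads in rels.items():
            rows = [row_of[int(r)] for r in reads]
            matrix[rows, pos - min_pos] = rel
    return matrix

read_nums = np.array([10, 11])
matrix = muts_matrix(read_nums, np.array([[1], [3]]), np.array([[4], [5]]),
                     {2: {16: [10]}, 4: {64: [10, 11]}}, min_pos=1, max_pos=5)
print(matrix.tolist())  # [[1, 16, 1, 64, 0], [0, 0, 1, 64, 1]]
```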

seismicrna.core.batch.muts.sanitize_muts(muts: dict[int, dict[int, list[int] | ndarray]], region: Region, data_type: type, sanitize: bool = True)
seismicrna.core.batch.muts.simulate_muts(pmut: DataFrame, seg_end5s: ndarray, seg_end3s: ndarray)

Simulate mutation data.

Parameters:
  • pmut (pd.DataFrame) – Rate of each type of mutation at each position.

  • seg_end5s (np.ndarray) – 5’ end coordinate of each segment.

  • seg_end3s (np.ndarray) – 3’ end coordinate of each segment.

class seismicrna.core.batch.read.ReadBatch(*, batch: int)

Bases: ABC

Batch of reads.

property batch_read_index

MultiIndex of the batch number and read numbers.
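A sketch of such an index, built with pandas.MultiIndex.from_product. The level names "batch" and "read" are hypothetical.

```python
import numpy as np
import pandas as pd

batch = 3
read_nums = np.array([0, 1, 2, 5])
# Pair the batch number with each read number (level names are hypothetical).
batch_read_index = pd.MultiIndex.from_product([[batch], read_nums],
                                              names=["batch", "read"])
print(len(batch_read_index))  # 4
```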

property masked_reads_bool
property max_read: int

Maximum possible value for a read index.

property num_reads: int | Series

Number of reads.

property read_dtype

Data type for read numbers.

property read_indexes: ndarray

Map each read number to its index in self.read_nums.
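A sketch of mapping read numbers to indexes with a NumPy lookup table. It assumes that read numbers are nonnegative integers, and unused slots hold -1. The actual representation is not documented here.

```python
import numpy as np

read_nums = np.array([2, 5, 9])
# Lookup table indexed by read number; -1 marks numbers absent from the batch.
read_indexes = np.full(read_nums.max() + 1, -1)
read_indexes[read_nums] = np.arange(read_nums.size)
print(read_indexes[[9, 2, 5]].tolist())  # [2, 0, 1]
```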

property read_nums: ndarray

Read numbers.