seismicrna.relate.py package

Submodules

class seismicrna.relate.py.ambindel.Deletion(opposite: int, lateral3: int, pod: IndelPod)

Bases: Indel

lateral3
opposite
pod
try_move(rels: dict[int, int], pods: list[IndelPod], insert3: bool, ref_seq: str, read_seq: str, read_qual: str, min_qual: str, ref_end5: int, ref_end3: int, read_end5: int, read_end3: int, move5to3: bool)

Try to move this deletion one step in the given direction.

Parameters:
  • rels (dict[int, int]) – Mapping of reference position to relationship code; updated in place if the deletion moves.

  • pods (list[IndelPod]) – All indel pods in the read, used for collision detection.

  • insert3 (bool) – Whether insertions are marked on the 3’ (True) or 5’ (False) flanking reference position.

  • ref_seq (str) – Full reference sequence string.

  • read_seq (str) – Full read sequence string.

  • read_qual (str) – Full read quality string.

  • min_qual (str) – Minimum quality character threshold.

  • ref_end5 (int) – 1-indexed 5’ boundary of the reference region covered.

  • ref_end3 (int) – 1-indexed 3’ boundary of the reference region covered.

  • read_end5 (int) – 1-indexed 5’ boundary of the read (after soft-clipping).

  • read_end3 (int) – 1-indexed 3’ boundary of the read (after soft-clipping).

  • move5to3 (bool) – Direction of movement: True to move 3’, False to move 5’.

Returns:

True if the deletion moved, False otherwise.

Return type:

bool

class seismicrna.relate.py.ambindel.DeletionPod

Bases: IndelPod

classmethod indel_type()

All indels in the pod must be of this type.

indels: list[Indel]
class seismicrna.relate.py.ambindel.Indel(opposite: int, lateral3: int, pod: IndelPod)

Bases: ABC

Insertion or Deletion.

get_lateral(insert3: bool)
lateral3
property lateral5
move(opposite: int, lateral3: int)

Move the indel to a new position.

opposite
pod
abstractmethod try_move(rels: dict[int, int], pods: list[IndelPod], insert3: bool, ref_seq: str, read_seq: str, read_qual: str, min_qual: str, ref_end5: int, ref_end3: int, read_end5: int, read_end3: int, move5to3: bool) bool

Try to move the indel a step in one direction.

class seismicrna.relate.py.ambindel.IndelPod

Bases: ABC

add(indel: Indel)

Add an indel to the pod, performing validation.

get_indel_by_opp(opposite: int)

Return the indel that lies opposite the given position (opposite), or None if such an indel does not exist.

abstractmethod classmethod indel_type() type[Indel]

All indels in the pod must be of this type.

indels: list[Indel]
sort()

Sort the indels in the pod by their positions.

class seismicrna.relate.py.ambindel.Insertion(opposite: int, lateral3: int, pod: IndelPod)

Bases: Indel

lateral3
opposite
pod
try_move(rels: dict[int, int], pods: list[IndelPod], insert3: bool, ref_seq: str, read_seq: str, read_qual: str, min_qual: str, ref_end5: int, ref_end3: int, read_end5: int, read_end3: int, move5to3: bool)

Try to move this insertion one step in the given direction.

Parameters:
  • rels (dict[int, int]) – Mapping of reference position to relationship code; updated in place if the insertion moves.

  • pods (list[IndelPod]) – All indel pods in the read, used for collision detection.

  • insert3 (bool) – Whether insertions are marked on the 3’ (True) or 5’ (False) flanking reference position.

  • ref_seq (str) – Full reference sequence string.

  • read_seq (str) – Full read sequence string.

  • read_qual (str) – Full read quality string.

  • min_qual (str) – Minimum quality character threshold.

  • ref_end5 (int) – 1-indexed 5’ boundary of the reference region covered.

  • ref_end3 (int) – 1-indexed 3’ boundary of the reference region covered.

  • read_end5 (int) – 1-indexed 5’ boundary of the read (after soft-clipping).

  • read_end3 (int) – 1-indexed 3’ boundary of the read (after soft-clipping).

  • move5to3 (bool) – Direction of movement: True to move 3’, False to move 5’.

Returns:

True if the insertion moved, False otherwise.

Return type:

bool

class seismicrna.relate.py.ambindel.InsertionPod

Bases: IndelPod

classmethod indel_type()

All indels in the pod must be of this type.

indels: list[Indel]
seismicrna.relate.py.ambindel.calc_lateral5(lateral3: int)
seismicrna.relate.py.ambindel.find_ambindels(rels: dict[int, int], pods: list[IndelPod], insert3: bool, ref_seq: str, read_seq: str, read_qual: str, min_qual: str, ref_end5: int, ref_end3: int, read_end5: int, read_end3: int)

Find and annotate all ambiguous positions of indels.

For each indel, slide it as far 5’ as possible, then traverse 3’ to identify every reference position at which it could equivalently be placed given the read and reference sequences. Each ambiguous position is added to rels so that downstream steps can report it.

Parameters:
  • rels (dict[int, int]) – Mapping of reference position to relationship code; updated in place with ambiguous positions for each indel.

  • pods (list[IndelPod]) – All indel pods in the read, ordered 5’ to 3’.

  • insert3 (bool) – Whether insertions are marked on the 3’ (True) or 5’ (False) flanking reference position.

  • ref_seq (str) – Full reference sequence string (1-indexed via [pos-1]).

  • read_seq (str) – Full read sequence string (1-indexed via [pos-1]).

  • read_qual (str) – Full read quality string (1-indexed via [pos-1]).

  • min_qual (str) – Minimum quality character threshold for base calls.

  • ref_end5 (int) – 1-indexed 5’ boundary of the aligned reference region.

  • ref_end3 (int) – 1-indexed 3’ boundary of the aligned reference region.

  • read_end5 (int) – 1-indexed 5’ boundary of the read (after soft-clipping).

  • read_end3 (int) – 1-indexed 3’ boundary of the read (after soft-clipping).
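
The slide-5’-then-traverse-3’ idea described above can be illustrated with a minimal sketch for a single deletion. This is not the package's implementation: the function name `equivalent_deletion_starts` is hypothetical, and the sketch ignores base quality, read boundaries, insertions, and collisions with other indel pods. It only uses the reference sequence to list every 1-indexed start position at which deleting the same number of bases would leave an identical sequence.

```python
def equivalent_deletion_starts(ref_seq: str, start: int, length: int) -> list[int]:
    """List every 1-indexed start at which deleting `length` reference
    bases yields the same remaining sequence (hypothetical helper)."""
    if length < 1 or start < 1 or start + length - 1 > len(ref_seq):
        raise ValueError("deletion must lie within the reference")
    # Slide as far 5' as possible: a one-step move 5' is neutral when the
    # base that becomes deleted equals the base that becomes retained.
    first = start
    while first > 1 and ref_seq[first - 2] == ref_seq[first + length - 2]:
        first -= 1
    # Traverse 3' from there, applying the mirror-image test at each step.
    last = first
    while (last + length <= len(ref_seq)
           and ref_seq[last - 1] == ref_seq[last + length - 1]):
        last += 1
    return list(range(first, last + 1))


def delete(ref_seq: str, start: int, length: int) -> str:
    """Remove `length` bases beginning at 1-indexed `start`."""
    return ref_seq[:start - 1] + ref_seq[start - 1 + length:]


# A one-base deletion inside the GGG run can sit at position 3, 4, or 5.
starts = equivalent_deletion_starts("ACGGGT", start=4, length=1)
```

Every position in `starts` produces the same sequence when passed to `delete`, which is why such positions are treated as ambiguous.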

seismicrna.relate.py.ambindel.get_ins_rel(insert3: bool)
seismicrna.relate.py.ambindel.get_lateral(lateral3: int, insert3: int)
seismicrna.relate.py.cigar.op_consumes_read(op: str)

Whether the CIGAR operation consumes the read.

seismicrna.relate.py.cigar.op_consumes_ref(op: str)

Whether the CIGAR operation consumes the reference.
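
As an illustration of what "consumes" means, the sketch below follows the table of CIGAR operations in the SAM specification. The helper names (`consumes_read`, `consumes_ref`, `aligned_lengths`) are hypothetical and are not the functions documented here; whether the package treats every operation exactly this way is an assumption.

```python
# Per the SAM specification:
#   M, I, S, =, X advance along the read;
#   M, D, N, =, X advance along the reference.
CONSUMES_READ = frozenset("MIS=X")
CONSUMES_REF = frozenset("MDN=X")


def consumes_read(op: str) -> bool:
    """Whether a CIGAR operation advances along the read (sketch)."""
    return op in CONSUMES_READ


def consumes_ref(op: str) -> bool:
    """Whether a CIGAR operation advances along the reference (sketch)."""
    return op in CONSUMES_REF


def aligned_lengths(ops: list[tuple[str, int]]) -> tuple[int, int]:
    """Total bases consumed from the read and from the reference."""
    read_len = sum(n for op, n in ops if consumes_read(op))
    ref_len = sum(n for op, n in ops if consumes_ref(op))
    return read_len, ref_len


# 2 soft-clipped, 5 matched, 1 deleted, 3 matched, 2 inserted
lengths = aligned_lengths([("S", 2), ("M", 5), ("D", 1), ("M", 3), ("I", 2)])
```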

seismicrna.relate.py.cigar.parse_cigar(cigar_string: str)

Yield the fields of a CIGAR string as pairs of (operation, length), where operation is a single character indicating the CIGAR operation and length is a positive integer giving the number of consecutive positions to which the operation applies. Note that in the CIGAR string itself, each length precedes its corresponding operation.

Parameters:

cigar_string (str) – CIGAR string from a SAM file. For full documentation, refer to https://samtools.github.io/hts-specs/

Yields:
  • str (length = 1) – Current CIGAR operation

  • int (≥ 1) – Length of current CIGAR operation
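
A minimal sketch of this kind of parser, written with a regular expression, is shown below. The name `iter_cigar` and the validation behavior are assumptions made for illustration; the documented function may differ in how it handles malformed input.

```python
import re
from typing import Iterator

# One CIGAR field: a run of digits followed by one operation character.
CIGAR_FIELD = re.compile(r"(\d+)([MIDNSHP=X])")


def iter_cigar(cigar_string: str) -> Iterator[tuple[str, int]]:
    """Yield (operation, length) pairs from a CIGAR string (sketch)."""
    pos = 0
    for match in CIGAR_FIELD.finditer(cigar_string):
        if match.start() != pos:
            # Something between two fields did not match the pattern.
            raise ValueError(f"invalid CIGAR string: {cigar_string!r}")
        length = int(match.group(1))
        if length < 1:
            raise ValueError(f"CIGAR lengths must be >= 1: {cigar_string!r}")
        # The length precedes the operation in the string,
        # but the pair is yielded as (operation, length).
        yield match.group(2), length
        pos = match.end()
    if pos != len(cigar_string):
        # Trailing characters were left unparsed.
        raise ValueError(f"invalid CIGAR string: {cigar_string!r}")


fields = list(iter_cigar("3S10M1D5M"))
```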

seismicrna.relate.py.encode.encode_relate(ref_base: str, read_base: str, read_qual: str, min_qual: str)

Encode the relationship between a base in the read and a base in the reference sequence.

Parameters:
  • ref_base (DNA) – Base in the reference sequence.

  • read_base (DNA) – Base in the read sequence.

  • read_qual (str) – ASCII encoding for the Phred quality score of the read base.

  • min_qual (str) – Minimum value of read_qual for the base call to be trusted; if read_qual is lower, the relationship is called ambiguous.
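
To make the role of each parameter concrete, here is a hedged sketch of one way such an encoding could work. The bit values, the name `encode_relate_sketch`, and the exact treatment of low-quality or non-ACGT bases are all assumptions chosen for illustration; they are not taken from the package.

```python
# Hypothetical relationship codes, one bit per outcome (illustrative only).
MATCH = 1
SUBS = {"A": 16, "C": 32, "G": 64, "T": 128}
ANY_SUB = sum(SUBS.values())


def encode_relate_sketch(ref_base: str, read_base: str,
                         read_qual: str, min_qual: str) -> int:
    """Encode one read base against one reference base (sketch)."""
    if read_base in SUBS and read_qual >= min_qual:
        # High-quality call: either a match or a specific substitution.
        return MATCH if read_base == ref_base else SUBS[read_base]
    # Low-quality or non-ACGT call: the base could be a match or any
    # substitution other than "substitution to the reference base".
    return MATCH | (ANY_SUB & ~SUBS.get(ref_base, 0))


match_code = encode_relate_sketch("A", "A", "I", "5")
sub_code = encode_relate_sketch("A", "G", "I", "5")
ambig_code = encode_relate_sketch("A", "G", "!", "5")
```

Quality characters are compared directly as strings, which works because ASCII-encoded Phred scores sort in the same order as the scores themselves.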

seismicrna.relate.py.encode.is_acgt(base: str)

Check whether a character is a standard DNA base.

exception seismicrna.relate.py.error.RelateError

Bases: RuntimeError

Any error that occurs during the relate algorithm.

class seismicrna.relate.py.relate.SamFlag(flag: int)

Bases: object

Represents the set of 12 boolean flags for a SAM record.

flag
paired
proper
read1
read2
rev
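
The attribute names above correspond to bits defined in the SAM specification. The sketch below decodes them from an integer flag; the class name `FlagBits` is hypothetical, and the range check is an assumption about how invalid flags might be rejected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlagBits:
    """Decode selected bits of a SAM flag (illustrative sketch)."""
    flag: int

    def __post_init__(self):
        # The SAM flag field has 12 defined bits, so it must be < 2**12.
        if not 0 <= self.flag < 4096:
            raise ValueError(f"invalid SAM flag: {self.flag}")

    @property
    def paired(self) -> bool:   # 0x1: template has multiple segments
        return bool(self.flag & 0x1)

    @property
    def proper(self) -> bool:   # 0x2: each segment properly aligned
        return bool(self.flag & 0x2)

    @property
    def rev(self) -> bool:      # 0x10: sequence is reverse complemented
        return bool(self.flag & 0x10)

    @property
    def read1(self) -> bool:    # 0x40: first segment in the template
        return bool(self.flag & 0x40)

    @property
    def read2(self) -> bool:    # 0x80: last segment in the template
        return bool(self.flag & 0x80)


mate1 = FlagBits(99)    # 0x1 + 0x2 + 0x20 + 0x40
mate2 = FlagBits(147)   # 0x1 + 0x2 + 0x10 + 0x80
```
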
class seismicrna.relate.py.relate.SamRead(line: str)

Bases: object

One read in a SAM file.

cigar
flag
mapq
name
pos
quals
ref
seq
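
The attributes above map onto the mandatory tab-separated columns of a SAM alignment line. As a rough sketch of that mapping (the names `SamFields` and `parse_sam_line` are hypothetical, and the real class may validate or convert fields differently):

```python
from typing import NamedTuple


class SamFields(NamedTuple):
    """Subset of the 11 mandatory SAM columns (illustrative sketch)."""
    name: str    # column 1, QNAME
    flag: int    # column 2, FLAG
    ref: str     # column 3, RNAME
    pos: int     # column 4, POS (1-based leftmost mapping position)
    mapq: int    # column 5, MAPQ
    cigar: str   # column 6, CIGAR
    seq: str     # column 10, SEQ
    quals: str   # column 11, QUAL


def parse_sam_line(line: str) -> SamFields:
    """Split one SAM alignment line into typed fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError(f"expected at least 11 fields, got {len(fields)}")
    return SamFields(name=fields[0], flag=int(fields[1]), ref=fields[2],
                     pos=int(fields[3]), mapq=int(fields[4]),
                     cigar=fields[5], seq=fields[9], quals=fields[10])


read = parse_sam_line("read1\t99\tref\t5\t42\t4M\t=\t20\t19\tACGT\tIIII\n")
```
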
seismicrna.relate.py.relate.calc_rels_lines(line1: str, line2: str, ref: str, refseq: str, min_mapq: int, min_qual: int, insert3: bool, ambindel: bool, overhangs: bool, clip_end5: int = 0, clip_end3: int = 0)

Calculate relationships for one SAM record (single or paired-end).

Parameters:
  • line1 (str) – First SAM alignment line (always present).

  • line2 (str) – Second SAM alignment line for paired-end reads; empty string for single-end reads or improperly paired reads.

  • ref (str) – Expected reference name.

  • refseq (str) – Full reference sequence string.

  • min_mapq (int) – Minimum acceptable mapping quality score.

  • min_qual (int) – Minimum Phred quality score (as an integer) to accept a base call; converted to a character internally.

  • insert3 (bool) – Whether to mark insertions on the 3’ (True) or 5’ (False) flanking reference position.

  • ambindel (bool) – Whether to find and label all ambiguous indel positions.

  • overhangs (bool) – Whether to allow paired-end mates to overhang one another.

  • clip_end5 (int) – Number of bases to clip from the 5’ end of each read.

  • clip_end3 (int) – Number of bases to clip from the 3’ end of each read.

Returns:

Segment end coordinates (end5s, end3s) and relationship codes for mutated positions.

Return type:

tuple[tuple[list[int], list[int]], dict[int, int]]
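
The note that min_qual is converted to a character can be illustrated as follows. This sketch assumes the common Phred+33 ASCII encoding; the offset actually used by the package is not stated here, and `phred_to_char` is a hypothetical helper.

```python
def phred_to_char(phred: int, offset: int = 33) -> str:
    """Convert an integer Phred score to its ASCII quality character.

    The default offset of 33 (Phred+33) is an assumption.
    """
    if phred < 0:
        raise ValueError("Phred scores cannot be negative")
    return chr(phred + offset)


min_qual_char = phred_to_char(25)

# Once the threshold is a character, each quality character from the read
# can be compared to it directly, with no per-base integer conversion.
high_quality = "I" >= min_qual_char   # "I" is Phred 40 under Phred+33
low_quality = "#" >= min_qual_char    # "#" is Phred 2 under Phred+33
```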

seismicrna.relate.py.relate.merge_mates(end5sf: list[int], end3sf: list[int], relsf: dict[int, int], end5sr: list[int], end3sr: list[int], relsr: dict[int, int], overhangs: bool)

Merge segment coordinates and relationships from a paired-end read.

Optionally trims overhanging ends so that the forward mate does not extend past the reverse mate and vice versa.

Parameters:
  • end5sf (list[int]) – 5’ segment ends for the forward mate.

  • end3sf (list[int]) – 3’ segment ends for the forward mate.

  • relsf (dict[int, int]) – Relationship codes for the forward mate.

  • end5sr (list[int]) – 5’ segment ends for the reverse mate.

  • end3sr (list[int]) – 3’ segment ends for the reverse mate.

  • relsr (dict[int, int]) – Relationship codes for the reverse mate.

  • overhangs (bool) – Whether to allow one mate to overhang the other. If False, overhanging regions are trimmed before merging.

Returns:

Combined (end5s, end3s) for all segments and the merged relationship codes for mutated positions.

Return type:

tuple[tuple[list[int], list[int]], dict[int, int]]

seismicrna.relate.py.relate.trim_segs_end(seg5s: list[int], seg3s: list[int], max_end: int) tuple[list[int], list[int]]

For each segment (start, end), the segment’s end is replaced by min(end, max_end).

seismicrna.relate.py.relate.trim_segs_start(seg5s: list[int], seg3s: list[int], min_start: int) tuple[list[int], list[int]]

For each segment (start, end), the segment’s start is replaced by max(start, min_start).
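
The two trimming operations are simple clamps: starts are raised to at least a minimum, and ends are lowered to at most a maximum. A minimal sketch follows, with hypothetical names `trim_starts` and `trim_ends`; it does not handle segments that become empty (start greater than end) after trimming, which the real functions may treat differently.

```python
def trim_starts(seg5s: list[int], seg3s: list[int],
                min_start: int) -> tuple[list[int], list[int]]:
    """Raise every segment start to at least min_start (sketch)."""
    return [max(start, min_start) for start in seg5s], list(seg3s)


def trim_ends(seg5s: list[int], seg3s: list[int],
              max_end: int) -> tuple[list[int], list[int]]:
    """Lower every segment end to at most max_end (sketch)."""
    return list(seg5s), [min(end, max_end) for end in seg3s]


# Two segments: (1, 8) and (10, 20)
trimmed_start = trim_starts([1, 10], [8, 20], min_start=5)
trimmed_end = trim_ends([1, 10], [8, 20], max_end=15)
```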