seismicrna.relate.py package
Subpackages
Submodules
- class seismicrna.relate.py.ambindel.Deletion(opposite: int, lateral3: int, pod: IndelPod)
Bases: Indel
- lateral3
- opposite
- pod
- try_move(rels: dict[int, int], pods: list[IndelPod], insert3: bool, ref_seq: str, read_seq: str, read_qual: str, min_qual: str, ref_end5: int, ref_end3: int, read_end5: int, read_end3: int, move5to3: bool)
Try to move this deletion one step in the given direction.
- Parameters:
rels (dict[int, int]) – Mapping of reference position to relationship code; updated in place if the deletion moves.
pods (list[IndelPod]) – All indel pods in the read, used for collision detection.
insert3 (bool) – Whether insertions are marked on the 3’ (True) or 5’ (False) flanking reference position.
ref_seq (str) – Full reference sequence string.
read_seq (str) – Full read sequence string.
read_qual (str) – Full read quality string.
min_qual (str) – Minimum quality character threshold.
ref_end5 (int) – 1-indexed 5’ boundary of the reference region covered.
ref_end3 (int) – 1-indexed 3’ boundary of the reference region covered.
read_end5 (int) – 1-indexed 5’ boundary of the read (after soft-clipping).
read_end3 (int) – 1-indexed 3’ boundary of the read (after soft-clipping).
move5to3 (bool) – Direction of movement: True to move 3’, False to move 5’.
- Returns:
True if the deletion moved, False otherwise.
- Return type:
bool
- class seismicrna.relate.py.ambindel.DeletionPod
Bases: IndelPod
- classmethod indel_type()
All indels in the pod must be of this type.
- class seismicrna.relate.py.ambindel.Indel(opposite: int, lateral3: int, pod: IndelPod)
Bases: ABC
Insertion or Deletion.
- lateral3
- property lateral5
- opposite
- pod
- class seismicrna.relate.py.ambindel.IndelPod
Bases: ABC
- get_indel_by_opp(opposite: int)
Return the indel that lies opposite the given position (opposite), or None if such an indel does not exist.
- sort()
Sort the indels in the pod by their positions.
- class seismicrna.relate.py.ambindel.Insertion(opposite: int, lateral3: int, pod: IndelPod)
Bases: Indel
- lateral3
- opposite
- pod
- try_move(rels: dict[int, int], pods: list[IndelPod], insert3: bool, ref_seq: str, read_seq: str, read_qual: str, min_qual: str, ref_end5: int, ref_end3: int, read_end5: int, read_end3: int, move5to3: bool)
Try to move this insertion one step in the given direction.
- Parameters:
rels (dict[int, int]) – Mapping of reference position to relationship code; updated in place if the insertion moves.
pods (list[IndelPod]) – All indel pods in the read, used for collision detection.
insert3 (bool) – Whether insertions are marked on the 3’ (True) or 5’ (False) flanking reference position.
ref_seq (str) – Full reference sequence string.
read_seq (str) – Full read sequence string.
read_qual (str) – Full read quality string.
min_qual (str) – Minimum quality character threshold.
ref_end5 (int) – 1-indexed 5’ boundary of the reference region covered.
ref_end3 (int) – 1-indexed 3’ boundary of the reference region covered.
read_end5 (int) – 1-indexed 5’ boundary of the read (after soft-clipping).
read_end3 (int) – 1-indexed 3’ boundary of the read (after soft-clipping).
move5to3 (bool) – Direction of movement: True to move 3’, False to move 5’.
- Returns:
True if the insertion moved, False otherwise.
- Return type:
bool
- class seismicrna.relate.py.ambindel.InsertionPod
Bases: IndelPod
- classmethod indel_type()
All indels in the pod must be of this type.
- seismicrna.relate.py.ambindel.find_ambindels(rels: dict[int, int], pods: list[IndelPod], insert3: bool, ref_seq: str, read_seq: str, read_qual: str, min_qual: str, ref_end5: int, ref_end3: int, read_end5: int, read_end3: int)
Find and annotate all ambiguous positions of indels.
For each indel, slide it as far 5’ as possible, then traverse 3’ to identify every reference position at which it could equivalently be placed given the read and reference sequences. Each ambiguous position is added to rels so that downstream steps can report it.
- Parameters:
rels (dict[int, int]) – Mapping of reference position to relationship code; updated in place with ambiguous positions for each indel.
pods (list[IndelPod]) – All indel pods in the read, ordered 5’ to 3’.
insert3 (bool) – Whether insertions are marked on the 3’ (True) or 5’ (False) flanking reference position.
ref_seq (str) – Full reference sequence string (1-indexed via [pos-1]).
read_seq (str) – Full read sequence string (1-indexed via [pos-1]).
read_qual (str) – Full read quality string (1-indexed via [pos-1]).
min_qual (str) – Minimum quality character threshold for base calls.
ref_end5 (int) – 1-indexed 5’ boundary of the aligned reference region.
ref_end3 (int) – 1-indexed 3’ boundary of the aligned reference region.
read_end5 (int) – 1-indexed 5’ boundary of the read (after soft-clipping).
read_end3 (int) – 1-indexed 3’ boundary of the read (after soft-clipping).
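To make the notion of an ambiguous indel concrete, here is a simplified, self-contained sketch. It is not the implementation of find_ambindels: it ignores quality scores, insertions, pods, and collisions, and handles only a single one-base deletion. The function name equivalent_deletion_positions is hypothetical. It follows the same outline as above: slide as far 5’ as possible, then traverse 3’ and collect every equivalent position.

```python
def equivalent_deletion_positions(ref_seq: str, del_pos: int) -> list[int]:
    """Return every 1-indexed reference position at which deleting one
    base yields the same sequence as deleting the base at del_pos.

    Hypothetical helper for illustration only; a one-base deletion can
    be moved only within a run of identical bases.
    """
    if not 1 <= del_pos <= len(ref_seq):
        raise ValueError(f"del_pos must be in [1, {len(ref_seq)}]")
    base = ref_seq[del_pos - 1]
    # Slide 5' while the base on the 5' side is identical.
    start = del_pos
    while start > 1 and ref_seq[start - 2] == base:
        start -= 1
    # Traverse 3' while the base on the 3' side is identical.
    end = start
    while end < len(ref_seq) and ref_seq[end] == base:
        end += 1
    return list(range(start, end + 1))


# Deleting any one G of the run "GGG" gives the same read "ACGGT".
positions = equivalent_deletion_positions("ACGGGT", 4)
```

Here positions is [3, 4, 5], so all three positions would be marked as possible deletion sites.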
- seismicrna.relate.py.cigar.op_consumes_read(op: str)
Whether the CIGAR operation consumes the read.
- seismicrna.relate.py.cigar.op_consumes_ref(op: str)
Whether the CIGAR operation consumes the reference.
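For orientation, the sketch below encodes which operations consume the read and the reference according to the CIGAR table in the SAM specification. The names CONSUMES_READ, CONSUMES_REF, consumes_read, and consumes_ref are hypothetical stand-ins, and this package may accept only a subset of these operations.

```python
# Operation sets from the CIGAR table of the SAM specification.
# H (hard clip) and P (padding) consume neither sequence.
CONSUMES_READ = frozenset("MIS=X")
CONSUMES_REF = frozenset("MDN=X")


def consumes_read(op: str) -> bool:
    """Whether the operation advances the position in the read."""
    return op in CONSUMES_READ


def consumes_ref(op: str) -> bool:
    """Whether the operation advances the position in the reference."""
    return op in CONSUMES_REF
```

An insertion (I) consumes only the read and a deletion (D) consumes only the reference, which is why the two are tracked separately.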
- seismicrna.relate.py.cigar.parse_cigar(cigar_string: str)
Yield the fields of a CIGAR string as pairs of (operation, length), where operation is a single character indicating the CIGAR operation and length is a positive integer giving the number of consecutive positions to which the operation applies. Note that in the CIGAR string itself, each length precedes its corresponding operation.
- Parameters:
cigar_string (str) – CIGAR string from a SAM file. For full documentation, refer to https://samtools.github.io/hts-specs/
- Yields:
str (length = 1) – Current CIGAR operation
int (≥ 1) – Length of current CIGAR operation
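A minimal sketch of this kind of parser, written with a regular expression, is shown below. It is an illustration rather than the package's own code: the name iter_cigar and the pattern CIGAR_FIELD are hypothetical, and the accepted operation letters are those of the SAM specification.

```python
import re
from typing import Iterator

# One CIGAR field: a run of digits followed by one operation letter.
CIGAR_FIELD = re.compile(r"(\d+)([MIDNSHP=X])")


def iter_cigar(cigar_string: str) -> Iterator[tuple[str, int]]:
    """Yield (operation, length) pairs from a CIGAR string."""
    pos = 0
    for match in CIGAR_FIELD.finditer(cigar_string):
        # Fields must be contiguous; a gap means unparseable text.
        if match.start() != pos:
            raise ValueError(f"Invalid CIGAR string: {cigar_string!r}")
        length = int(match.group(1))
        if length < 1:
            raise ValueError(f"Invalid CIGAR length in {cigar_string!r}")
        # The length comes first in the string but second in the pair.
        yield match.group(2), length
        pos = match.end()
    if pos != len(cigar_string):
        raise ValueError(f"Invalid CIGAR string: {cigar_string!r}")


fields = list(iter_cigar("3S10M2D5M"))
```

Here fields is [("S", 3), ("M", 10), ("D", 2), ("M", 5)].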
- seismicrna.relate.py.encode.encode_relate(ref_base: str, read_base: str, read_qual: str, min_qual: str)
Encode the relationship between a base in the read and a base in the reference sequence.
- exception seismicrna.relate.py.error.RelateError
Bases: RuntimeError
Any error that occurs during the relate algorithm.
- class seismicrna.relate.py.relate.SamFlag(flag: int)
Bases: object
Represents the set of 12 boolean flags for a SAM record.
- flag
- paired
- proper
- read1
- read2
- rev
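The sketch below shows how the five attributes listed above could be decoded from the integer flag, using the bit values defined in the SAM specification. The function decode_flag is hypothetical, and the mapping of attribute names to bits is inferred from the names rather than taken from the class itself.

```python
def decode_flag(flag: int) -> dict[str, bool]:
    """Decode selected bits of a SAM flag (bit values per the SAM spec)."""
    if not 0 <= flag < 4096:
        raise ValueError(f"SAM flag must be in [0, 4095], but got {flag}")
    return {
        "paired": bool(flag & 0x1),   # template has multiple segments
        "proper": bool(flag & 0x2),   # each segment properly aligned
        "rev": bool(flag & 0x10),     # read is reverse complemented
        "read1": bool(flag & 0x40),   # first segment in the template
        "read2": bool(flag & 0x80),   # last segment in the template
    }


first_mate = decode_flag(99)    # 99 = 0x1 + 0x2 + 0x20 + 0x40
second_mate = decode_flag(147)  # 147 = 0x1 + 0x2 + 0x10 + 0x80
```

Flags 99 and 147 are a typical pair for properly paired mates: the first is read 1 on the forward strand and the second is read 2 on the reverse strand.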
- class seismicrna.relate.py.relate.SamRead(line: str)
Bases: object
One read in a SAM file.
- cigar
- flag
- mapq
- name
- pos
- quals
- ref
- seq
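As a rough illustration of where these attributes come from, the sketch below splits one SAM alignment line into its mandatory tab-separated fields as ordered in the SAM specification. SamFields and parse_sam_line are hypothetical names, and SamRead may parse and validate lines differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamFields:
    """Subset of the 11 mandatory SAM fields (hypothetical container)."""
    name: str    # QNAME, column 1
    flag: int    # FLAG, column 2
    ref: str     # RNAME, column 3
    pos: int     # POS, column 4 (1-indexed)
    mapq: int    # MAPQ, column 5
    cigar: str   # CIGAR, column 6
    seq: str     # SEQ, column 10
    quals: str   # QUAL, column 11


def parse_sam_line(line: str) -> SamFields:
    """Parse the mandatory fields of one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError(f"Expected at least 11 fields, got {len(fields)}")
    return SamFields(name=fields[0], flag=int(fields[1]), ref=fields[2],
                     pos=int(fields[3]), mapq=int(fields[4]),
                     cigar=fields[5], seq=fields[9], quals=fields[10])


read = parse_sam_line("read1\t99\tref\t7\t42\t10M\t=\t50\t60\t"
                      "ACGTACGTAC\tIIIIIIIIII\n")
```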
- seismicrna.relate.py.relate.calc_rels_lines(line1: str, line2: str, ref: str, refseq: str, min_mapq: int, min_qual: int, insert3: bool, ambindel: bool, overhangs: bool, clip_end5: int = 0, clip_end3: int = 0)
Calculate relationships for one SAM record (single or paired-end).
- Parameters:
line1 (str) – First SAM alignment line (always present).
line2 (str) – Second SAM alignment line for paired-end reads; empty string for single-end reads or improperly paired reads.
ref (str) – Expected reference name.
refseq (str) – Full reference sequence string.
min_mapq (int) – Minimum acceptable mapping quality score.
min_qual (int) – Minimum Phred quality score (as an integer) to accept a base call; converted to a character internally.
insert3 (bool) – Whether to mark insertions on the 3’ (True) or 5’ (False) flanking reference position.
ambindel (bool) – Whether to find and label all ambiguous indel positions.
overhangs (bool) – Whether to allow paired-end mates to overhang one another.
clip_end5 (int) – Number of bases to clip from the 5’ end of each read.
clip_end3 (int) – Number of bases to clip from the 3’ end of each read.
- Returns:
Segment end coordinates (end5s, end3s) and relationship codes for mutated positions.
- Return type:
tuple[tuple[list[int],list[int]],dict[int,int]]
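The conversion of min_qual from an integer to a character can be pictured as follows. This is a sketch under two assumptions that the text above does not state: that qualities use the common Phred+33 ASCII encoding, and that a base call is accepted when its quality character is greater than or equal to the threshold character. Both function names are hypothetical.

```python
def min_qual_char(min_qual: int, phred_offset: int = 33) -> str:
    """Convert an integer Phred score to its ASCII quality character.

    The default offset of 33 (Phred+33) is an assumption.
    """
    return chr(min_qual + phred_offset)


def is_high_quality(qual_char: str, threshold_char: str) -> bool:
    """Whether a base call meets the threshold (assumed to be >=)."""
    return qual_char >= threshold_char


threshold = min_qual_char(25)  # chr(25 + 33) == ":"
```

Comparing single characters this way avoids converting every quality character in the read back to an integer.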
- seismicrna.relate.py.relate.merge_mates(end5sf: list[int], end3sf: list[int], relsf: dict[int, int], end5sr: list[int], end3sr: list[int], relsr: dict[int, int], overhangs: bool)
Merge segment coordinates and relationships from a paired-end read.
Optionally trims overhanging ends so that the forward mate does not extend past the reverse mate and vice versa.
- Parameters:
end5sf (list[int]) – 5’ segment ends for the forward mate.
end3sf (list[int]) – 3’ segment ends for the forward mate.
relsf (dict[int, int]) – Relationship codes for the forward mate.
end5sr (list[int]) – 5’ segment ends for the reverse mate.
end3sr (list[int]) – 3’ segment ends for the reverse mate.
relsr (dict[int, int]) – Relationship codes for the reverse mate.
overhangs (bool) – Whether to allow one mate to overhang the other. If False, overhanging regions are trimmed before merging.
- Returns:
Combined (end5s, end3s) for all segments and the merged relationship codes for mutated positions.
- Return type:
tuple[tuple[list[int],list[int]],dict[int,int]]
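The trimming step described above can be sketched for the simplest case of one contiguous segment per mate. This is an illustration of the stated rule only, not the code of merge_mates: the function trim_overhangs is hypothetical, it takes single integers rather than lists of segment ends, and it does not perform the merge itself.

```python
def trim_overhangs(end5f: int, end3f: int, relsf: dict[int, int],
                   end5r: int, end3r: int, relsr: dict[int, int]):
    """Trim each mate so that it does not extend past the other.

    Hypothetical single-segment helper; coordinates are 1-indexed
    and inclusive.
    """
    # The forward mate may not extend 3' of the reverse mate's 3' end.
    end3f = min(end3f, end3r)
    # The reverse mate may not extend 5' of the forward mate's 5' end.
    end5r = max(end5r, end5f)
    # Drop relationships at positions that were trimmed away.
    relsf = {p: rel for p, rel in relsf.items() if end5f <= p <= end3f}
    relsr = {p: rel for p, rel in relsr.items() if end5r <= p <= end3r}
    return (end5f, end3f, relsf), (end5r, end3r, relsr)


# Forward mate spans 10-60 and reverse mate spans 5-50, so the forward
# mate overhangs by 10 positions at its 3' end and the reverse mate by
# 5 positions at its 5' end. The relationship codes are placeholders.
fwd, rev = trim_overhangs(10, 60, {20: 16, 55: 2}, 5, 50, {7: 16, 30: 2})
```

After trimming, both mates span positions 10 to 50, and the relationships at positions 55 and 7 are discarded because they fall in the overhangs.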