seismicrna.relate.aux package

Submodules

class seismicrna.relate.aux.cigarop.CigarOp(op: str)

Bases: object

Represent one operation in a CIGAR string.

__str__()

Text that goes into the CIGAR string.

lengthen()

Lengthen the operation by 1 base call.

property op

CIGAR operation as a character.
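As an illustration of how an object with this interface could be used to build a CIGAR string, here is a minimal sketch. RunLengthOp and build_cigar are hypothetical stand-ins written for this example, not the seismicrna implementation, and the sketch assumes that a new operation starts with a length of 1.

```python
class RunLengthOp:
    """Hypothetical stand-in for one CIGAR operation with a run length."""

    def __init__(self, op: str):
        if len(op) != 1:
            raise ValueError(f"Expected one character, but got {op!r}")
        self._op = op
        self._len = 1  # assumption: a new operation covers one base call

    @property
    def op(self) -> str:
        """CIGAR operation as a character."""
        return self._op

    def lengthen(self) -> None:
        """Lengthen the operation by 1 base call."""
        self._len += 1

    def __str__(self) -> str:
        """Text that goes into the CIGAR string, e.g. '3='."""
        return f"{self._len}{self._op}"


def build_cigar(ops: str) -> str:
    """Collapse per-base operations (e.g. '===X') into a CIGAR string."""
    runs: list[RunLengthOp] = []
    for op in ops:
        if runs and runs[-1].op == op:
            # Same operation as the previous base: extend the current run.
            runs[-1].lengthen()
        else:
            # Different operation: start a new run of length 1.
            runs.append(RunLengthOp(op))
    return "".join(map(str, runs))
```

For example, build_cigar("===X==II") gives "3=1X2=2I".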

seismicrna.relate.aux.cigarop.count_cigar_muts(cigar_string: str)

Count the mutations in a CIGAR string.

seismicrna.relate.aux.cigarop.count_cigar_read(cigar_string: str)

Count the read positions consumed by a CIGAR string.

seismicrna.relate.aux.cigarop.count_cigar_ref(cigar_string: str)

Count the reference positions consumed by a CIGAR string.
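The two counts above can be sketched from the SAM specification, in which M, I, S, = and X consume read positions while M, D, N, = and X consume reference positions. The functions below (parse_cigar, count_read, count_ref) are hypothetical names for a standalone illustration and may differ from how seismicrna parses or validates CIGAR strings.

```python
import re

# One CIGAR operation: a positive run length followed by an operation code.
CIGAR_PATTERN = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume read or reference positions (SAM specification).
READ_OPS = frozenset("MIS=X")
REF_OPS = frozenset("MDN=X")


def parse_cigar(cigar_string: str):
    """Yield (operation, length) pairs; raise if the string is malformed."""
    end = 0
    for match in CIGAR_PATTERN.finditer(cigar_string):
        if match.start() != end:
            raise ValueError(f"Invalid CIGAR string: {cigar_string!r}")
        end = match.end()
        yield match.group(2), int(match.group(1))
    if end != len(cigar_string):
        raise ValueError(f"Invalid CIGAR string: {cigar_string!r}")


def count_read(cigar_string: str) -> int:
    """Count the read positions consumed by a CIGAR string."""
    return sum(n for op, n in parse_cigar(cigar_string) if op in READ_OPS)


def count_ref(cigar_string: str) -> int:
    """Count the reference positions consumed by a CIGAR string."""
    return sum(n for op, n in parse_cigar(cigar_string) if op in REF_OPS)
```

With the CIGAR string "3=1X2I4=1D5=", this sketch counts 15 read positions (the deletion is skipped) and 14 reference positions (the insertion is skipped).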

seismicrna.relate.aux.cigarop.find_cigar_op_pos_read(cigar_string: str, find_op: str)

Yield the position in the read of every base whose operation in the CIGAR string is find_op.

seismicrna.relate.aux.cigarop.find_cigar_op_pos_ref(cigar_string: str, find_op: str, end5: int)

Yield the position in the reference of every base whose operation in the CIGAR string is find_op.
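A minimal sketch of how such position finders could work is shown below. It is a standalone illustration, not the seismicrna code: the function names are hypothetical, and it assumes that read positions are 1-indexed, that end5 is the reference position of the first aligned base, and that an operation which does not consume the relevant sequence simply yields nothing.

```python
import re
from typing import Iterator

# Operations that consume read or reference positions (SAM specification).
READ_OPS = frozenset("MIS=X")
REF_OPS = frozenset("MDN=X")


def _parse(cigar_string: str) -> Iterator[tuple[str, int]]:
    """Yield (operation, length) pairs from a CIGAR string."""
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar_string):
        yield op, int(length)


def find_op_pos_read(cigar_string: str, find_op: str) -> Iterator[int]:
    """Yield each read position (assumed 1-indexed) with operation find_op."""
    pos = 1
    for op, length in _parse(cigar_string):
        if op in READ_OPS:
            if op == find_op:
                yield from range(pos, pos + length)
            pos += length


def find_op_pos_ref(cigar_string: str,
                    find_op: str,
                    end5: int) -> Iterator[int]:
    """Yield each reference position with operation find_op, counting
    from end5 (assumed to be the position of the first aligned base)."""
    pos = end5
    for op, length in _parse(cigar_string):
        if op in REF_OPS:
            if op == find_op:
                yield from range(pos, pos + length)
            pos += length
```

Under these assumptions, list(find_op_pos_read("3=1X2I4=1D5=", "I")) is [5, 6] and list(find_op_pos_ref("3=1X2I4=1D5=", "D", 10)) is [18].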

seismicrna.relate.aux.cigarop.op_is_mutation(op: str)

Whether the CIGAR operation is a mutation.
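The sketch below shows one way to classify operations and count mutations. Which operations count as mutations is an assumption here (substitutions X, deletions D and insertions I), as is counting one mutation per affected base rather than per operation; is_mutation and count_mutations are hypothetical names, and seismicrna may treat other operations differently, for example by raising an error.

```python
import re

# Assumption: substitutions, deletions and insertions are mutations.
MUTATION_OPS = frozenset("XDI")


def is_mutation(op: str) -> bool:
    """Whether the CIGAR operation is a mutation (under the assumption
    above); every other operation is treated as a non-mutation."""
    return op in MUTATION_OPS


def count_mutations(cigar_string: str) -> int:
    """Count the mutated bases in a CIGAR string: each operation that is
    a mutation contributes its full run length."""
    return sum(int(length)
               for length, op in re.findall(r"(\d+)([MIDNSHP=X])",
                                            cigar_string)
               if is_mutation(op))
```

For "3=1X2I4=1D5=", this sketch counts 4 mutated bases: one substitution, two inserted bases and one deleted base.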

seismicrna.relate.aux.infer.infer_read(refseq: DNA, end5: int, end3: int, muts: dict[int, int], hi_qual: str = 'I', lo_qual: str = '!', ins_len: int | Sequence[int] = 1)

Infer the sequence and quality string of a read from a reference sequence and relationships.

Parameters:
  • refseq (DNA) – Sequence of the reference.

  • end5 (int) – 5’ end of the read with respect to the reference.

  • end3 (int) – 3’ end of the read with respect to the reference.

  • muts (dict[int, int]) – Mutations in the read, keyed by their positions.

  • hi_qual (str = MAX_QUAL) – Character to put in the quality string at every position that is high-quality according to the relation vector.

  • lo_qual (str = MIN_QUAL) – Character to put in the quality string at every position that is low-quality according to the relation vector.

  • ins_len (int | Sequence[int] = 1) – Number of bases to insert into the read and quality strings upon finding an insertion in the relation vector. If an integer, then insert that number of bases for every insertion. If a sequence of integers, then the ith insertion gets a number of bases equal to the ith element of ins_len.

seismicrna.relate.aux.iterread.iter_alignments(*args, **kwargs)

For a given reference sequence, find every read that could come from the reference (with up to 2 bases inserted). For each read, yield the (possibly ambiguous) relation vector and every possible CIGAR string.

seismicrna.relate.aux.iterread.ref_to_alignments(refseq: DNA, *, insert3: bool, max_ins: int = 0, max_ins_len: int = 1, max_ins_bases: int | None = None)

For a given reference sequence, map every possible read to its CIGAR string(s) and (possibly ambiguous) relation vector.

Parameters:
  • refseq (DNA) – Sequence of the reference.

  • insert3 (bool) – Whether to mark the base 5’ or 3’ of an insertion.

  • max_ins (int) – Maximum number of insertions in the read. Must be ≥ 0.

  • max_ins_len (int) – Maximum length of (i.e. number of bases in) one insertion. Must be ≥ 1.

  • max_ins_bases (int | None) – Maximum total number of bases inserted. Must be ≥ max_ins. If None, there is no limit.

seismicrna.relate.aux.iterrel.iter_relvecs_all(refseq: DNA, insert3: bool = True, max_ins: int | None = None)

For a given reference sequence, yield every possible unambiguous relation vector that has at most two insertions.

Parameters:
  • refseq (DNA) – Sequence of the reference.

  • insert3 (bool) – Whether to mark the base 5’ or 3’ of an insertion.

  • max_ins (int | None) – Maximum number of insertions in a read.

seismicrna.relate.aux.iterrel.iter_relvecs_q53(refseq: DNA, low_qual: Sequence[int] = (), end5: int | None = None, end3: int | None = None, insert3: bool = True, max_ins: int | None = None)

For a given reference sequence, yield every possible unambiguous relation vector between positions end5 and end3 that follows the given low-quality positions and has at most two insertions.

Parameters:
  • refseq (DNA) – Sequence of the reference.

  • low_qual (Sequence[int]) – List of positions in the read that are low-quality.

  • end5 (int | None) – 5’ end of the read; 1-indexed with respect to refseq.

  • end3 (int | None) – 3’ end of the read; 1-indexed with respect to refseq.

  • insert3 (bool) – Whether to mark the base 5’ or 3’ of an insertion.

  • max_ins (int | None) – Maximum number of insertions in the read.