seismicrna.core.seq package
Subpackages
- seismicrna.core.seq.tests package
- Submodules
TestFormat
TestParseFasta
TestValidFastaSeqname
TestWriteFasta
TestConstants
TestGetSharedIndex
TestGetWindows
TestGetWindows.compare()
TestGetWindows.test_1_series_size_1_min_0_excl_nan()
TestGetWindows.test_1_series_size_1_min_0_incl_nan()
TestGetWindows.test_1_series_size_1_min_1_excl_nan()
TestGetWindows.test_1_series_size_1_min_1_incl_nan()
TestGetWindows.test_1_series_size_2_min_1_excl_nan()
TestGetWindows.test_1_series_size_2_min_1_incl_nan()
TestGetWindows.test_1_series_size_2_min_2_excl_nan()
TestGetWindows.test_1_series_size_2_min_2_incl_nan()
TestGetWindows.test_2_series_size_1_min_0_excl_nan()
TestGetWindows.test_2_series_size_1_min_0_incl_nan()
TestGetWindows.test_2_series_size_1_min_1_excl_nan()
TestGetWindows.test_2_series_size_1_min_1_incl_nan()
TestGetWindows.test_2_series_size_2_min_0_excl_nan()
TestGetWindows.test_2_series_size_2_min_0_incl_nan()
TestGetWindows.test_2_series_size_2_min_1_excl_nan()
TestGetWindows.test_2_series_size_2_min_1_incl_nan()
TestGetWindows.test_2_series_size_2_min_2_excl_nan()
TestGetWindows.test_2_series_size_2_min_2_incl_nan()
TestGetWindows.test_empty()
TestHyphenateEnds
TestIndexToPos
TestIndexToSeq
TestIntersect
TestIntersect.test_diff_refs()
TestIntersect.test_diff_seqs()
TestIntersect.test_empty_invalid()
TestIntersect.test_one_full()
TestIntersect.test_one_full_named()
TestIntersect.test_one_masked()
TestIntersect.test_one_slice()
TestIntersect.test_three_overlapping()
TestIntersect.test_two_disjoint()
TestIntersect.test_two_full()
TestIntersect.test_two_masked()
TestIntersect.test_two_overlapping()
TestRegionAddMask
TestRegionCopy
TestRegionEqual
TestRegionEqual.test_diff_end3()
TestRegionEqual.test_diff_end5()
TestRegionEqual.test_diff_full()
TestRegionEqual.test_diff_mask_name()
TestRegionEqual.test_diff_mask_positions()
TestRegionEqual.test_diff_name()
TestRegionEqual.test_diff_ref()
TestRegionEqual.test_diff_seq()
TestRegionEqual.test_diff_seq5()
TestRegionEqual.test_equal_full()
TestRegionEqual.test_equal_mask()
TestRegionEqual.test_equal_part()
TestRegionInit
TestRegionInit.test_full()
TestRegionInit.test_full_blank_name()
TestRegionInit.test_full_end3()
TestRegionInit.test_full_end5()
TestRegionInit.test_full_given_name()
TestRegionInit.test_partial_reflen_equal()
TestRegionInit.test_partial_reflen_greater()
TestRegionInit.test_partial_reflen_less()
TestRegionInit.test_partial_seq5_equal()
TestRegionInit.test_partial_seq5_greater()
TestRegionInit.test_partial_seq5_less()
TestRegionInit.test_partial_seq5_reflen()
TestRegionInit.test_partial_slice()
TestRegionInit.test_partial_slice_invalid_end3()
TestRegionInit.test_partial_slice_invalid_end5()
TestRegionInit.test_partial_slice_invalid_reflen()
TestRegionInit.test_partial_slice_invalid_seq5()
TestRegionInit.test_slice_end3_equal()
TestRegionInit.test_slice_end3_greater()
TestRegionInit.test_slice_end3_less()
TestRegionInit.test_slice_end5_end3()
TestRegionInit.test_slice_end5_end3_invalid()
TestRegionInit.test_slice_end5_equal()
TestRegionInit.test_slice_end5_greater()
TestRegionInit.test_slice_end5_less()
TestRegionLength
TestRegionMaskGU
TestRegionMaskList
TestRegionMaskNames
TestRegionMaskPolyA
TestRegionMasked
TestRegionRange
TestRegionUnmasked
TestSeqPosToIndex
TestSeqPosToIndex.test_invalid_dup_1()
TestSeqPosToIndex.test_invalid_empty_seq_1()
TestSeqPosToIndex.test_invalid_empty_seq_2()
TestSeqPosToIndex.test_invalid_empty_seq_3()
TestSeqPosToIndex.test_invalid_full_0()
TestSeqPosToIndex.test_invalid_greater_end_9()
TestSeqPosToIndex.test_invalid_less_start_2()
TestSeqPosToIndex.test_invalid_unsort_1()
TestSeqPosToIndex.test_valid_empty_1()
TestSeqPosToIndex.test_valid_empty_seq()
TestSeqPosToIndex.test_valid_full_1()
TestSeqPosToIndex.test_valid_full_9()
TestSeqPosToIndex.test_valid_noncontig_2()
TestSeqPosToIndex.test_valid_slice_6()
TestSubregion
TestSubregion.test_full_region_full_length_sub()
TestSubregion.test_full_region_full_sub()
TestSubregion.test_full_region_full_sub_end3()
TestSubregion.test_full_region_full_sub_end5()
TestSubregion.test_full_region_full_sub_name()
TestSubregion.test_partial_trunc_region_trunc_sub()
TestSubregion.test_trunc_region_full_sub()
TestSubregion.test_trunc_region_trunc_sub()
TestUnite
TestUnite.test_diff_refs()
TestUnite.test_diff_seqs()
TestUnite.test_empty_invalid()
TestUnite.test_one_full()
TestUnite.test_one_full_named()
TestUnite.test_one_masked()
TestUnite.test_one_slice()
TestUnite.test_two_disjoint()
TestUnite.test_two_disjoint_refseq()
TestUnite.test_two_disjoint_wrong_refseq()
TestUnite.test_two_full()
TestUnite.test_two_masked()
TestUnite.test_two_overlapping()
TestUnite.test_two_overlapping_refseq()
TestWindowToMargins
TestDNA
TestDNA.test_alph()
TestDNA.test_bool()
TestDNA.test_contains()
TestDNA.test_get_alphaset()
TestDNA.test_get_comp()
TestDNA.test_get_comptrans()
TestDNA.test_get_nonalphaset()
TestDNA.test_invalid_bases()
TestDNA.test_kmers()
TestDNA.test_picto()
TestDNA.test_random()
TestDNA.test_reverse_complement()
TestDNA.test_slice()
TestDNA.test_to_array()
TestDNA.test_transcribe()
TestDNA.test_valid()
TestExpandDegenerateSeq
TestRNA
TestRNA.test_alph()
TestRNA.test_bool()
TestRNA.test_get_alphaset()
TestRNA.test_get_comp()
TestRNA.test_get_comptrans()
TestRNA.test_get_nonalphaset()
TestRNA.test_invalid_bases()
TestRNA.test_picto()
TestRNA.test_random()
TestRNA.test_reverse_complement()
TestRNA.test_reverse_transcribe()
TestRNA.test_slice()
TestRNA.test_to_array()
TestRNA.test_valid()
TestXNA
TestXNA.test_abstract_base_class()
TestXNA.test_dict_str_dna_rna()
TestXNA.test_equal_dna_dna()
TestXNA.test_equal_rna_rna()
TestXNA.test_hashable_dna()
TestXNA.test_hashable_rna()
TestXNA.test_not_equal_dna_rna()
TestXNA.test_not_equal_dna_str()
TestXNA.test_not_equal_rna_str()
TestXNA.test_set_str_dna_rna()
Submodules
- exception seismicrna.core.seq.fasta.BadReferenceNameError
Bases: ReferenceNameError
A reference name is not valid.
- exception seismicrna.core.seq.fasta.BadReferenceNameLineError
Bases: ReferenceNameError
A line that should contain a reference name is not valid.
- exception seismicrna.core.seq.fasta.DuplicateReferenceNameError
Bases: ReferenceNameError
A reference name occurs more than once.
- exception seismicrna.core.seq.fasta.MissingReferenceNameError
Bases: ReferenceNameError
A reference name was expected to appear but is absent.
- exception seismicrna.core.seq.fasta.ReferenceNameError
Bases: ValueError
Error in the name of a reference sequence.
- seismicrna.core.seq.fasta.extract_fasta_seqname(line: str)
Extract the name of a sequence from a line in FASTA format.
- seismicrna.core.seq.fasta.format_fasta_seq_lines(seq: XNA, wrap: int = 0)
Format a sequence in a FASTA file so that each line has at most wrap characters, or no limit if wrap is ≤ 0.
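The wrapping rule can be illustrated with a short standalone sketch. The helper name `wrap_seq_lines` is hypothetical and works on a plain `str` rather than an XNA object; it shows the described behavior only and is not the library's implementation.

```python
def wrap_seq_lines(seq: str, wrap: int = 0) -> list[str]:
    """Split a sequence into lines of at most `wrap` characters.

    Hypothetical helper: a non-positive `wrap` means no line-length limit.
    """
    if wrap <= 0:
        # No limit: the whole sequence goes on one line.
        return [seq]
    # Take consecutive chunks of `wrap` characters; the last may be shorter.
    return [seq[i: i + wrap] for i in range(0, len(seq), wrap)]


if __name__ == "__main__":
    for line in wrap_seq_lines("ACGTACGTAC", wrap=4):
        print(line)
```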
- seismicrna.core.seq.fasta.get_fasta_seq(fasta: Path, seq_type: type[XNA], name: str)
Get one sequence of a given name from a FASTA file.
- seismicrna.core.seq.fasta.parse_fasta(fasta: Path, seq_type: type[XNA] | None, only: Iterable[str] | None = None)
- seismicrna.core.seq.fasta.valid_fasta_seqname(line: str) → str
Get a valid sequence name from a line in FASTA format.
- seismicrna.core.seq.fasta.write_fasta(fasta: Path, refs: Iterable[tuple[str, XNA]], wrap: int = 0, force: bool = False)
Write an iterable of reference names and DNA sequences to a FASTA file.
- class seismicrna.core.seq.refs.RefSeqs(seqs: Iterable[tuple[str, XNA]] = ())
Bases: object
Store reference sequences.
- iter()
Yield every sequence and its name.
- class seismicrna.core.seq.region.RefRegions(ref_seqs: Iterable[tuple[str, DNA]], *, regs_file: Path | None = None, coords: Iterable[tuple[str, int, int]] = (), primers: Iterable[tuple[str, DNA, DNA]] = (), primer_gap: int = 0, exclude_primers: bool = False, default_full: bool = True)
Bases: object
A collection of regions, grouped by reference.
- property count
Total number of regions.
- property dict
List the regions for every reference.
- property refs
Reference names.
- property regions
List all regions.
- class seismicrna.core.seq.region.Region(ref: str, seq: DNA, *, seq5: int = 1, reflen: int | None = None, end5: int | None = None, end3: int | None = None, name: str | None = None)
Bases: object
Region of a reference sequence between two coordinates.
- MASK_GU = 'pos-gu'
- MASK_LIST = 'pos-list'
- MASK_POLYA = 'pos-polya'
- add_mask(name: str, positions: Iterable[int], complement: bool = False)
Mask the integer positions in the array positions.
- Parameters:
name (str) – Name of the mask.
positions (Iterable[int]) – Positions to mask (1-indexed).
complement (bool = False) – If True, then leave only positions in positions unmasked.
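The effect of the complement flag can be sketched with plain sets. The function `masked_positions` and its end5/end3 arguments are hypothetical, and rejecting positions outside the region is an assumption rather than documented behavior.

```python
def masked_positions(end5: int, end3: int, positions, complement: bool = False):
    """Return the sorted 1-indexed positions that would end up masked.

    Hypothetical sketch: the region spans end5..end3 inclusive.
    """
    region = set(range(end5, end3 + 1))
    chosen = set(positions)
    # Assumption: positions outside the region are treated as an error.
    if not chosen <= region:
        raise ValueError(f"positions outside {end5}-{end3}: "
                         f"{sorted(chosen - region)}")
    # With complement=True, everything except the chosen positions is masked,
    # so only the chosen positions remain unmasked.
    return sorted(region - chosen if complement else chosen)


if __name__ == "__main__":
    print(masked_positions(1, 6, [2, 4]))
    print(masked_positions(1, 6, [2, 4], complement=True))
```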
- property coord
Tuple of the 5’ and 3’ coordinates.
- property hyphen
- property length
Length of the entire region.
- mask_gu()
Mask positions whose base is neither A nor C.
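As a minimal illustration of this rule, the hypothetical helper below lists the positions whose base is neither A nor C in a plain string. The name `non_ac_positions` and the end5 argument are assumptions made for the example.

```python
def non_ac_positions(seq: str, end5: int = 1) -> list[int]:
    """List the positions (numbered from end5) whose base is not A or C."""
    keep = {"A", "C"}
    return [pos for pos, base in enumerate(seq, start=end5) if base not in keep]


if __name__ == "__main__":
    print(non_ac_positions("ACGTAC"))
```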
- property mask_names
Names of the masks.
- property masked_bool: ndarray
Masked positions as a boolean array.
- property masked_int: ndarray
Masked positions as integers.
- property masked_zero: ndarray
Masked positions as integers (0-indexed with respect to the first position in the region).
- property range
Index of all positions in the region.
- property range_int
All positions in the region as integers.
- property range_one
All 1-indexed positions in the region as integers.
- property ref_reg
- renumber_from(seq5: int, name: str | None = None)
Return a new region renumbered starting from a position.
- property size
Number of relevant positions in the region.
- subregion(end5: int | None = None, end3: int | None = None, name: str | None = None)
Return a new region from part of this region.
- to_dict()
- property unmasked
Index of unmasked positions in the region.
- property unmasked_bool: ndarray
Unmasked positions as a boolean array.
- property unmasked_int: ndarray
Unmasked positions as integers (1-indexed).
- property unmasked_zero: ndarray
Unmasked positions as integers (0-indexed with respect to the first position in the region).
- class seismicrna.core.seq.region.RegionFinder(ref: str, seq: DNA, *, seq5: int = 1, end5: int | None = None, end3: int | None = None, fwd: DNA | None = None, rev: DNA | None = None, primer_gap: int = 0, exclude_primers: bool = False, **kwargs)
Bases: Region
The 5’ and 3’ ends of a region can be given explicitly as integers, but if the sample is of an amplicon (i.e. generated by RT-PCR using site-specific primers), then it is often more convenient to enter the sequences of the PCR primers and have the software determine the coordinates. RegionFinder accepts 5’ and 3’ coordinates given as integers or primers, validates them, and stores the coordinates as integers, as follows:
- end5 = end5 if end5 is given, else the 3’ end of the forward primer + (primer_gap + 1) if fwd is given, else 1
- end3 = end3 if end3 is given, else the 5’ end of the reverse primer - (primer_gap + 1) if rev is given, else the length of refseq
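The two rules above can be sketched as a small standalone function. It assumes the gap offset is added after the forward primer and subtracted before the reverse primer, that the primer coordinates have already been located, and that the reference is numbered from 1. The name `resolve_ends` and its arguments are hypothetical, and the exclude_primers option is not modeled.

```python
def resolve_ends(seq_len, end5=None, end3=None,
                 fwd_pos3=None, rev_pos5=None, primer_gap=0):
    """Pick the 5' and 3' ends from explicit coordinates or primer positions.

    fwd_pos3: 3' end of the forward primer (hypothetical argument).
    rev_pos5: 5' end of the reverse primer (hypothetical argument).
    """
    if end5 is None:
        # Start just past the forward primer and the gap, else at position 1.
        end5 = fwd_pos3 + primer_gap + 1 if fwd_pos3 is not None else 1
    if end3 is None:
        # Stop just before the gap and the reverse primer, else at the end.
        end3 = rev_pos5 - (primer_gap + 1) if rev_pos5 is not None else seq_len
    return end5, end3


if __name__ == "__main__":
    print(resolve_ends(100, fwd_pos3=20, rev_pos5=81, primer_gap=2))
```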
- static locate(seq: DNA, primer: DNA, seq5: int) → RegionTuple
Return the 5’ and 3’ positions (1-indexed) of a primer within a reference sequence. The primer must occur exactly once in the reference, otherwise an error is raised.
- Parameters:
seq (DNA) – The full reference sequence or a part of it.
primer (DNA) – Sequence of the forward PCR primer or the reverse complement of the reverse PCR primer.
seq5 (int = 1) – Positional number to assign the 5’ end of the given part of the reference sequence. Must be ≥ 1.
- Returns:
Named tuple of the first and last positions that the primer occupies in the reference sequence. Positions are 1-indexed and include the first and last coordinates.
- Return type:
RegionTuple
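A primer search with the exactly-once rule might look like the following sketch on plain strings. The names `PrimerSpan` and `locate_primer` are hypothetical stand-ins, and counting overlapping occurrences is an assumption.

```python
from collections import namedtuple

# Hypothetical stand-in for the named tuple of 5' and 3' positions.
PrimerSpan = namedtuple("PrimerSpan", ["pos5", "pos3"])


def locate_primer(seq: str, primer: str, seq5: int = 1) -> PrimerSpan:
    """Return the inclusive, 1-indexed span of a primer that occurs once."""
    if seq5 < 1:
        raise ValueError(f"seq5 must be >= 1, but got {seq5}")
    if not primer:
        raise ValueError("primer is empty")
    # Collect every start offset, including overlapping occurrences.
    starts = [i for i in range(len(seq) - len(primer) + 1)
              if seq.startswith(primer, i)]
    if len(starts) != 1:
        raise ValueError(f"primer occurs {len(starts)} times, expected 1")
    pos5 = starts[0] + seq5          # shift the 0-indexed offset to seq5
    pos3 = pos5 + len(primer) - 1    # inclusive 3' coordinate
    return PrimerSpan(pos5, pos3)


if __name__ == "__main__":
    print(locate_primer("AACCGGTT", "CCG"))
```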
- class seismicrna.core.seq.region.RegionTuple(pos5, pos3)
Bases: tuple
- pos3
Alias for field number 1
- pos5
Alias for field number 0
- seismicrna.core.seq.region.get_reg_coords_primers(regs_file: Path)
Parse a file defining each region by the name of its reference and either its 5’ and 3’ coordinates or its forward and reverse primer sequences. Return one map from each reference and 5’/3’ coordinate pair to the name of the corresponding region, and another from each reference and primer pair to the name of the corresponding region.
- Parameters:
regs_file (Path) – CSV file of a table that defines the regions. The table must have columns labeled “Reference”, “Region”, “5’ End”, “3’ End”, “Forward Primer”, and “Reverse Primer”. Other columns are ignored.
- Returns:
Two mappings, the first from (ref name, 5’ coord, 3’ coord) to each region, the second from (ref name, fwd primer, rev primer) to each region. If the region is named in the “Region” column of the table, then that name will be used as the region name. Otherwise, the region name will be an empty string.
- Return type:
tuple[dict[tuple[str, int, int], str], dict[tuple[str, DNA, DNA], str]]
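The shape of the two returned mappings can be illustrated with the standard csv module. This sketch is not the library's parser: the function name `read_region_table` is hypothetical, the column labels are assumed to use a plain ASCII apostrophe, primers are kept as strings, and preferring coordinates when a row has both is an assumption.

```python
import csv
import io


def read_region_table(handle):
    """Build (ref, end5, end3) -> name and (ref, fwd, rev) -> name maps."""
    coords, primers = {}, {}
    for row in csv.DictReader(handle):
        ref = row["Reference"]
        name = row.get("Region") or ""   # unnamed regions get an empty string
        end5, end3 = row.get("5' End"), row.get("3' End")
        fwd, rev = row.get("Forward Primer"), row.get("Reverse Primer")
        if end5 and end3:
            # Assumption: coordinates take priority over primers.
            coords[ref, int(end5), int(end3)] = name
        elif fwd and rev:
            primers[ref, fwd, rev] = name
    return coords, primers


if __name__ == "__main__":
    table = io.StringIO(
        "Reference,Region,5' End,3' End,Forward Primer,Reverse Primer\n"
        "ref1,stem,10,50,,\n"
        "ref2,,,,ACGT,TTGA\n"
    )
    print(read_region_table(table))
```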
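- seismicrna.core.seq.region.get_shared_index(indexes: Iterable[MultiIndex], empty_ok: bool = False)
Get the shared index among all those given, as follows: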
- If indexes contains no elements and empty_ok is True, then return an empty MultiIndex with levels named ‘Positions’ and ‘Base’.
- If indexes contains one element or multiple identical elements, and each has two levels named ‘Positions’ and ‘Base’, then return the first element.
- Otherwise, raise an error.
- Parameters:
indexes (Iterable[pandas.MultiIndex]) – Indexes to compare.
empty_ok (bool = False) – If given no indexes, then default to an empty index (if True) or raise a ValueError (if False).
- Returns:
The shared index.
- Return type:
pandas.MultiIndex
- seismicrna.core.seq.region.hyphenate_ends(end5: int, end3: int)
Return the 5’ and 3’ ends as a hyphenated string.
- seismicrna.core.seq.region.index_to_pos(index: MultiIndex)
Get the positions from a MultiIndex of (pos, base) pairs.
- seismicrna.core.seq.region.index_to_seq(index: MultiIndex, allow_gaps: bool = False)
Get the DNA sequence from a MultiIndex of (pos, base) pairs.
- seismicrna.core.seq.region.intersect(*regions: Region, name: str | None = None)
Intersect one or more regions.
- seismicrna.core.seq.region.iter_windows(*series: Series, size: int, min_count: int = 1, include_nan: bool = False)
- seismicrna.core.seq.region.seq_pos_to_index(seq: DNA, positions: Sequence[int], start: int)
Convert a sequence and positions to indexes, where each index is a tuple of (position, base).
- Parameters:
seq (DNA) – DNA sequence.
positions (Sequence[int]) – Positions of the sequence from which to build the index. Every position must be an integer ≥ start.
start (int) – Numerical position to assign to the first base in the sequence. Must be a positive integer.
- Returns:
MultiIndex of the same length as positions where each index is a tuple of (position, base).
- Return type:
pd.MultiIndex
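The pairing of positions with bases can be sketched without pandas by returning a list of (position, base) tuples. The name `seq_pos_pairs` is hypothetical. Rejecting unsorted or duplicated positions is an assumption inferred from the test names listed for TestSeqPosToIndex, not from the description above.

```python
def seq_pos_pairs(seq: str, positions, start: int):
    """Pair each requested position with its base, numbering seq from start."""
    if start < 1:
        raise ValueError(f"start must be >= 1, but got {start}")
    end = start + len(seq) - 1
    pairs = []
    previous = None
    for pos in positions:
        if not start <= pos <= end:
            raise ValueError(f"position {pos} is outside {start}-{end}")
        # Assumption: positions must be strictly increasing.
        if previous is not None and pos <= previous:
            raise ValueError("positions must be sorted without duplicates")
        pairs.append((pos, seq[pos - start]))
        previous = pos
    return pairs


if __name__ == "__main__":
    print(seq_pos_pairs("ACGT", [11, 13], start=10))
```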
- seismicrna.core.seq.region.unite(*regions: Region, name: str | None = None, refseq: DNA | None = None)
Unite one or more regions.
- Parameters:
*regions (Region) – Regions to unite.
name (str | None = None) – Name for the region to return.
refseq (DNA | None = None) – Reference sequence (optional) for filling any gaps in the union of the regions. If given, then it must match every region at the corresponding positions. If omitted, then any positions not covered by at least one region will be filled with N.
- Returns:
Union of all given regions.
- Return type:
Region
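The gap-filling behavior described for refseq can be sketched on simplified inputs. Here each region is a hypothetical (end5, sequence) tuple with a non-empty sequence, refseq is assumed to be numbered from position 1, and `unite_spans` is not the library's function.

```python
def unite_spans(spans, refseq=None, fill="N"):
    """Unite (end5, seq) spans into one (end5, seq) span.

    Gaps are filled from refseq if given, otherwise with the fill character.
    """
    if not spans:
        raise ValueError("need at least one span")
    end5 = min(start for start, _ in spans)
    end3 = max(start + len(seq) - 1 for start, seq in spans)
    united = [None] * (end3 - end5 + 1)
    for start, seq in spans:
        for offset, base in enumerate(seq):
            i = start - end5 + offset
            # Overlapping spans must agree on the base at each position.
            if united[i] not in (None, base):
                raise ValueError(f"spans disagree at position {end5 + i}")
            united[i] = base
    for i, base in enumerate(united):
        if refseq is None:
            expected = fill if base is None else base
        else:
            expected = refseq[end5 - 1 + i]
            # The reference must match every covered position.
            if base not in (None, expected):
                raise ValueError(f"refseq differs at position {end5 + i}")
        united[i] = expected
    return end5, "".join(united)


if __name__ == "__main__":
    print(unite_spans([(1, "AC"), (5, "GT")]))
    print(unite_spans([(1, "AC"), (5, "GT")], refseq="ACTTGTAA"))
```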
- seismicrna.core.seq.region.verify_index_names(index: MultiIndex)
Verify that the names of the index are correct.
- seismicrna.core.seq.region.window_to_margins(window: int)
Compute the 5’ and 3’ margins from the size of the window.
Sequence Core Module.
Define alphabets and classes for nucleic acid sequences, and functions for reading them from and writing them to FASTA files.
- class seismicrna.core.seq.xna.CompressedSeq(seq: XNA)
Bases: object
Compress a sequence into two bits per base.
- decompress()
Restore the original sequence.
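Two-bit packing in general can be sketched as below. This is an illustration of the idea only, not the CompressedSeq implementation: the function names are hypothetical, only the four standard DNA bases are handled, and the packed value is stored in a Python int alongside the length.

```python
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_two_bit(seq: str) -> tuple[int, int]:
    """Pack a sequence of A/C/G/T into (length, integer of 2-bit codes)."""
    packed = 0
    # Pack from the 3' end so the first base lands in the lowest two bits.
    for base in reversed(seq):
        packed = (packed << 2) | CODES[base]
    return len(seq), packed


def unpack_two_bit(length: int, packed: int) -> str:
    """Restore the sequence by reading two bits at a time."""
    bases = []
    for _ in range(length):
        bases.append(BASES[packed & 0b11])
        packed >>= 2
    return "".join(bases)


if __name__ == "__main__":
    length, packed = pack_two_bit("GATTACA")
    print(unpack_two_bit(length, packed))
```

The length is stored separately because leading A bases encode as zero bits and would otherwise be lost.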
- property type
- class seismicrna.core.seq.xna.DNA(seq: Any)
Bases: XNA
- classmethod alph()
Sequence alphabet.
- classmethod pict()
Sequence pictograms.
- tr()
Transcribe DNA to RNA.
- class seismicrna.core.seq.xna.RNA(seq: Any)
Bases: XNA
- classmethod alph()
Sequence alphabet.
- classmethod pict()
Sequence pictograms.
- rt()
Reverse transcribe RNA to DNA.
- class seismicrna.core.seq.xna.XNA(seq: Any)
Bases: ABC
- __add__(other)
Allow addition (concatenation) of two sequences only if the sequences have the same class.
- __bool__()
Empty sequences return False; all else, True.
- __contains__(item)
Check if a sequence is contained in this sequence.
- __eq__(other)
Return True if both the type of the sequence and the bases in the sequence match, otherwise False.
- __getitem__(item)
If item is a slice, then return an instance of the class. Otherwise, return an instance of str.
- __hash__()
Define __hash__ so that Seq subclasses can be used as keys for dict-like mappings. Use the hash of the plain string.
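The equality, indexing, and hashing rules described above can be sketched with a minimal stand-in class. The names `Seq`, `Dna`, and `Rna` are hypothetical and omit validation of the bases; this is not the XNA implementation.

```python
class Seq:
    """Minimal stand-in for a typed sequence wrapper."""

    def __init__(self, seq):
        self._seq = str(seq)

    def __str__(self):
        return self._seq

    def __eq__(self, other):
        # Equal only if both the class and the bases match.
        return type(self) is type(other) and self._seq == other._seq

    def __hash__(self):
        # Reuse the hash of the plain string.
        return hash(self._seq)

    def __getitem__(self, item):
        value = self._seq[item]
        # A slice stays in the same class; a single index gives a str.
        return type(self)(value) if isinstance(item, slice) else value


class Dna(Seq):
    pass


class Rna(Seq):
    pass


if __name__ == "__main__":
    print(Dna("ACG") == Dna("ACG"), Dna("ACG") == Rna("ACG"))
    print(len({Dna("ACG"), Rna("ACG"), "ACG"}))
```

Because equality is type-sensitive, a DNA, an RNA, and a str with the same characters can share one hash value yet remain three distinct keys in a dict or set.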
- __mul__(other)
Multiply a sequence by an int like a str times an int.
- __repr__()
Encapsulate the sequence string with the class name.
- property array
NumPy array of Unicode characters for the sequence.
- compress()
Compress the sequence.
- classmethod four()
Get the four standard bases.
- classmethod get_alphaset()
Get the alphabet as a set.
- classmethod get_comp()
Get the complementary alphabet as a tuple.
- classmethod get_comptrans()
Get the translation table for complementary bases.
- classmethod get_nonalphaset()
Get the printable characters not in the alphabet.
- classmethod get_other_iupac()
Get the IUPAC extended characters not in the alphabet.
- classmethod get_pictrans()
Get the translation table for pictogram characters.
- property picto
Pictogram string.
- classmethod random(nt: int, a: float = 0.25, c: float = 0.25, g: float = 0.25, t: float = 0.25)
Return a random sequence of the given length.
- Parameters:
nt (int) – Number of nucleotides to simulate. Must be ≥ 0.
a (float = 0.25) – Expected proportion of A.
c (float = 0.25) – Expected proportion of C.
g (float = 0.25) – Expected proportion of G.
t (float = 0.25) – Expected proportion of T (if DNA) or U (if RNA).
- Returns:
A random sequence.
- Return type:
DNA | RNA
- property rc
Reverse complement.
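A reverse complement built from a translation table, in the spirit of get_comptrans, can be sketched on plain strings as follows. The table contents and the function name are assumptions for the example; the library's alphabets may include other characters.

```python
# Assumed DNA alphabet with N as its own complement.
DNA_COMPTRANS = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement every base with the table, then reverse the result."""
    return seq.translate(DNA_COMPTRANS)[::-1]


if __name__ == "__main__":
    print(reverse_complement("AACG"))
```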
- classmethod t_or_u()
Get the base that is complementary to A.
- seismicrna.core.seq.xna.decompress(seq: CompressedSeq)
Restore the original sequence from a CompressedSeq object.