seismicrna.core.seq package
Subpackages
- seismicrna.core.seq.tests package
- Submodules
- TestFormat
- TestParseFasta
- TestValidFastaSeqname
- TestWriteFasta
- TestConstants
- TestGetSharedIndex
- TestGetWindows
- TestGetWindows.compare()
- TestGetWindows.test_1_series_size_1_min_0_excl_nan()
- TestGetWindows.test_1_series_size_1_min_0_incl_nan()
- TestGetWindows.test_1_series_size_1_min_1_excl_nan()
- TestGetWindows.test_1_series_size_1_min_1_incl_nan()
- TestGetWindows.test_1_series_size_2_min_1_excl_nan()
- TestGetWindows.test_1_series_size_2_min_1_incl_nan()
- TestGetWindows.test_1_series_size_2_min_2_excl_nan()
- TestGetWindows.test_1_series_size_2_min_2_incl_nan()
- TestGetWindows.test_2_series_size_1_min_0_excl_nan()
- TestGetWindows.test_2_series_size_1_min_0_incl_nan()
- TestGetWindows.test_2_series_size_1_min_1_excl_nan()
- TestGetWindows.test_2_series_size_1_min_1_incl_nan()
- TestGetWindows.test_2_series_size_2_min_0_excl_nan()
- TestGetWindows.test_2_series_size_2_min_0_incl_nan()
- TestGetWindows.test_2_series_size_2_min_1_excl_nan()
- TestGetWindows.test_2_series_size_2_min_1_incl_nan()
- TestGetWindows.test_2_series_size_2_min_2_excl_nan()
- TestGetWindows.test_2_series_size_2_min_2_incl_nan()
- TestGetWindows.test_empty()
 
- TestHyphenateEnds
- TestIndexToPos
- TestIndexToSeq
- TestIntersect
- TestIntersect.test_diff_refs()
- TestIntersect.test_diff_seqs()
- TestIntersect.test_empty_invalid()
- TestIntersect.test_one_full()
- TestIntersect.test_one_full_named()
- TestIntersect.test_one_masked()
- TestIntersect.test_one_slice()
- TestIntersect.test_three_overlapping()
- TestIntersect.test_two_disjoint()
- TestIntersect.test_two_full()
- TestIntersect.test_two_masked()
- TestIntersect.test_two_overlapping()
 
- TestRegionAddMask
- TestRegionCopy
- TestRegionEqual
- TestRegionEqual.test_diff_end3()
- TestRegionEqual.test_diff_end5()
- TestRegionEqual.test_diff_full()
- TestRegionEqual.test_diff_mask_name()
- TestRegionEqual.test_diff_mask_positions()
- TestRegionEqual.test_diff_name()
- TestRegionEqual.test_diff_ref()
- TestRegionEqual.test_diff_seq()
- TestRegionEqual.test_diff_seq5()
- TestRegionEqual.test_equal_full()
- TestRegionEqual.test_equal_mask()
- TestRegionEqual.test_equal_part()
 
- TestRegionInit
- TestRegionInit.test_full()
- TestRegionInit.test_full_blank_name()
- TestRegionInit.test_full_end3()
- TestRegionInit.test_full_end5()
- TestRegionInit.test_full_given_name()
- TestRegionInit.test_partial_reflen_equal()
- TestRegionInit.test_partial_reflen_greater()
- TestRegionInit.test_partial_reflen_less()
- TestRegionInit.test_partial_seq5_equal()
- TestRegionInit.test_partial_seq5_greater()
- TestRegionInit.test_partial_seq5_less()
- TestRegionInit.test_partial_seq5_reflen()
- TestRegionInit.test_partial_slice()
- TestRegionInit.test_partial_slice_invalid_end3()
- TestRegionInit.test_partial_slice_invalid_end5()
- TestRegionInit.test_partial_slice_invalid_reflen()
- TestRegionInit.test_partial_slice_invalid_seq5()
- TestRegionInit.test_slice_end3_equal()
- TestRegionInit.test_slice_end3_greater()
- TestRegionInit.test_slice_end3_less()
- TestRegionInit.test_slice_end5_end3()
- TestRegionInit.test_slice_end5_end3_invalid()
- TestRegionInit.test_slice_end5_equal()
- TestRegionInit.test_slice_end5_greater()
- TestRegionInit.test_slice_end5_less()
 
- TestRegionLength
- TestRegionMaskGU
- TestRegionMaskList
- TestRegionMaskNames
- TestRegionMaskPolyA
- TestRegionMasked
- TestRegionRange
- TestRegionUnmasked
- TestSeqPosToIndex
- TestSeqPosToIndex.test_invalid_dup_1()
- TestSeqPosToIndex.test_invalid_empty_seq_1()
- TestSeqPosToIndex.test_invalid_empty_seq_2()
- TestSeqPosToIndex.test_invalid_empty_seq_3()
- TestSeqPosToIndex.test_invalid_full_0()
- TestSeqPosToIndex.test_invalid_greater_end_9()
- TestSeqPosToIndex.test_invalid_less_start_2()
- TestSeqPosToIndex.test_invalid_unsort_1()
- TestSeqPosToIndex.test_valid_empty_1()
- TestSeqPosToIndex.test_valid_empty_seq()
- TestSeqPosToIndex.test_valid_full_1()
- TestSeqPosToIndex.test_valid_full_9()
- TestSeqPosToIndex.test_valid_noncontig_2()
- TestSeqPosToIndex.test_valid_slice_6()
 
- TestSubregion
- TestSubregion.test_full_region_full_length_sub()
- TestSubregion.test_full_region_full_sub()
- TestSubregion.test_full_region_full_sub_end3()
- TestSubregion.test_full_region_full_sub_end5()
- TestSubregion.test_full_region_full_sub_name()
- TestSubregion.test_partial_trunc_region_trunc_sub()
- TestSubregion.test_trunc_region_full_sub()
- TestSubregion.test_trunc_region_trunc_sub()
 
- TestUnite
- TestUnite.test_diff_refs()
- TestUnite.test_diff_seqs()
- TestUnite.test_empty_invalid()
- TestUnite.test_one_full()
- TestUnite.test_one_full_named()
- TestUnite.test_one_masked()
- TestUnite.test_one_slice()
- TestUnite.test_two_disjoint()
- TestUnite.test_two_disjoint_refseq()
- TestUnite.test_two_disjoint_wrong_refseq()
- TestUnite.test_two_full()
- TestUnite.test_two_masked()
- TestUnite.test_two_overlapping()
- TestUnite.test_two_overlapping_refseq()
 
- TestWindowToMargins
- TestDNA
- TestDNA.test_alph()
- TestDNA.test_bool()
- TestDNA.test_contains()
- TestDNA.test_from_any_seq_invalid()
- TestDNA.test_from_any_seq_valid()
- TestDNA.test_get_alphaset()
- TestDNA.test_get_comp()
- TestDNA.test_get_comptrans()
- TestDNA.test_get_nonalphaset()
- TestDNA.test_invalid_bases()
- TestDNA.test_kmers()
- TestDNA.test_picto()
- TestDNA.test_random()
- TestDNA.test_reverse_complement()
- TestDNA.test_slice()
- TestDNA.test_to_array()
- TestDNA.test_transcribe()
- TestDNA.test_valid()
 
- TestExpandDegenerateSeq
- TestRNA
- TestRNA.test_alph()
- TestRNA.test_bool()
- TestRNA.test_from_any_seq_invalid()
- TestRNA.test_from_any_seq_valid()
- TestRNA.test_get_alphaset()
- TestRNA.test_get_comp()
- TestRNA.test_get_comptrans()
- TestRNA.test_get_nonalphaset()
- TestRNA.test_invalid_bases()
- TestRNA.test_picto()
- TestRNA.test_random()
- TestRNA.test_reverse_complement()
- TestRNA.test_reverse_transcribe()
- TestRNA.test_slice()
- TestRNA.test_to_array()
- TestRNA.test_valid()
 
- TestXNA
- TestXNA.test_abstract_base_class()
- TestXNA.test_dict_str_dna_rna()
- TestXNA.test_equal_dna_dna()
- TestXNA.test_equal_rna_rna()
- TestXNA.test_hashable_dna()
- TestXNA.test_hashable_rna()
- TestXNA.test_not_equal_dna_rna()
- TestXNA.test_not_equal_dna_str()
- TestXNA.test_not_equal_rna_str()
- TestXNA.test_set_str_dna_rna()
 
 
 
Submodules
- exception seismicrna.core.seq.fasta.BadReferenceNameError
- Bases: ReferenceNameError
- A reference name is not valid.
- exception seismicrna.core.seq.fasta.BadReferenceNameLineError
- Bases: ReferenceNameError
- A line that should contain a reference name is not valid.
- exception seismicrna.core.seq.fasta.DuplicateReferenceNameError
- Bases: ReferenceNameError
- A reference name occurs more than once.
- exception seismicrna.core.seq.fasta.MissingReferenceNameError
- Bases: ReferenceNameError
- A reference name was expected to appear but is absent.
- exception seismicrna.core.seq.fasta.ReferenceNameError
- Bases: ValueError
- Error in the name of a reference sequence.
- seismicrna.core.seq.fasta.extract_fasta_seqname(line: str)
- Extract the name of a sequence from a line in FASTA format. 
- seismicrna.core.seq.fasta.format_fasta_seq_lines(seq: XNA, wrap: int = 0)
- Format a sequence in a FASTA file so that each line has at most wrap characters, or no limit if wrap is ≤ 0. 
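The wrapping rule described for format_fasta_seq_lines can be illustrated with a minimal standalone sketch. The function name `wrap_seq_lines` is hypothetical and it works on a plain str rather than an XNA object; it is not the library's implementation, only one way the stated behaviour (at most wrap characters per line, no limit if wrap ≤ 0) could look.

```python
def wrap_seq_lines(seq: str, wrap: int = 0) -> list[str]:
    """Split a sequence into lines of at most `wrap` characters.

    Hypothetical helper: a non-positive `wrap` means "no limit",
    so the whole sequence is returned as a single line.
    """
    if wrap <= 0 or not seq:
        return [seq]
    # Step through the sequence in chunks of `wrap` characters.
    return [seq[i:i + wrap] for i in range(0, len(seq), wrap)]


if __name__ == "__main__":
    for line in wrap_seq_lines("ACGTACGTAC", wrap=4):
        print(line)
```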
- seismicrna.core.seq.fasta.get_fasta_seq(fasta: str | Path, seq_type: type[XNA], name: str)
- Get one sequence of a given name from a FASTA file. 
- seismicrna.core.seq.fasta.parse_fasta(fasta: str | Path, seq_type: type[XNA] | None, only: Iterable[str] | None = None)
- seismicrna.core.seq.fasta.valid_fasta_seqname(line: str) -> str
- Get a valid sequence name from a line in FASTA format. 
- seismicrna.core.seq.fasta.write_fasta(fasta: str | Path, refs: Iterable[tuple[str, XNA]], wrap: int = 0, force: bool = False)
- Write an iterable of reference names and DNA sequences to a FASTA file. 
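As a rough illustration of how the FASTA functions and the reference-name exceptions above relate, here is a minimal standalone parser sketch. The function `parse_fasta_text` and the two exception classes defined here are stand-ins written for this example (they mirror the documented names but are not imported from seismicrna), and the exact validation rules of the real functions are not shown on this page, so the checks below are assumptions.

```python
class ReferenceNameError(ValueError):
    """Stand-in for the documented exception of the same name."""


class DuplicateReferenceNameError(ReferenceNameError):
    """Stand-in: a reference name occurs more than once."""


def parse_fasta_text(text: str) -> dict[str, str]:
    """Map each reference name to its sequence (hypothetical helper)."""
    seqs: dict[str, str] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Assumption: the name is everything after ">" on the line.
            name = line[1:].strip()
            if not name:
                raise ReferenceNameError("Empty reference name")
            if name in seqs:
                raise DuplicateReferenceNameError(name)
            seqs[name] = ""
        elif name is None:
            raise ReferenceNameError("Sequence line before any name line")
        else:
            # Sequence lines may be wrapped, so concatenate them.
            seqs[name] += line
    return seqs


if __name__ == "__main__":
    print(parse_fasta_text(">ref1\nACGT\nAC\n>ref2\nGG\n"))
```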
- class seismicrna.core.seq.refs.RefSeqs(seqs: Iterable[tuple[str, XNA]] = ())
- Bases: object
- Store reference sequences.
 - iter()
- Yield every sequence and its name. 
 
- class seismicrna.core.seq.region.RefRegions(ref_seqs: Iterable[tuple[str, DNA]], *, regs_file: Path | None = None, ends: Iterable[tuple[str, int, int]] = (), primers: Iterable[tuple[str, DNA, DNA]] = (), primer_gap: int = 0, exclude_primers: bool = False, default_full: bool = True)
- Bases: object
- A collection of regions, grouped by reference.
 - property count
- Total number of regions. 
 - property dict
- List the regions for every reference. 
 - property refs
- Reference names. 
 - property regions
- List all regions. 
 
- class seismicrna.core.seq.region.Region(ref: str, seq: str | DNA | RNA, *, seq5: int = 1, reflen: int | None = None, end5: int | None = None, end3: int | None = None, name: str | None = None)
- Bases: object
- Region of a sequence between 5’ and 3’ end positions.
 - MASK_GU = 'pos-gu'
 - MASK_LIST = 'pos-list'
 - MASK_POLYA = 'pos-polya'
 - add_mask(name: str, positions: Iterable[int], complement: bool = False)
- Mask the integer positions in the array positions.
- Parameters:
- name (str) – Name of the mask.
- positions (Iterable[int]) – Positions to mask (1-indexed).
- complement (bool = False) – If True, then mask every position in the region except those in positions.
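The effect of the complement flag can be sketched without the library. The helper `positions_to_mask` below is hypothetical: it represents a region only by its 1-indexed 5’ and 3’ ends and silently ignores positions outside the region, which may differ from how Region.add_mask treats them.

```python
def positions_to_mask(end5: int, end3: int, positions, complement: bool = False):
    """Return the sorted positions that would end up masked.

    Hypothetical stand-in for the documented behaviour:
    - complement=False: mask the given positions;
    - complement=True: mask everything in the region EXCEPT them.
    """
    chosen = set(positions)
    region = range(end5, end3 + 1)  # ends are inclusive and 1-indexed
    if complement:
        return [pos for pos in region if pos not in chosen]
    return [pos for pos in region if pos in chosen]


if __name__ == "__main__":
    print(positions_to_mask(5, 10, [6, 8]))
    print(positions_to_mask(5, 10, [6, 8], complement=True))
```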
 
 
 - property ends
- Tuple of the 5’ and 3’ ends. 
 - property hyphen
 - property length
- Length of the entire region. 
 - mask_gu()
- Mask positions whose base is neither A nor C. 
 - property mask_names
- Names of the masks. 
 - property masked_bool: ndarray
- Masked positions as a boolean array. 
 - property masked_int: ndarray
- Masked positions as integers. 
 - property masked_zero: ndarray
- Masked positions as integers (0-indexed with respect to the first position in the region). 
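To make the two numbering schemes concrete, the sketch below masks every base that is neither A nor C (as mask_gu is described) and then converts the masked positions to offsets from the first position of the region. Both function names are hypothetical, the sequence is a plain str, and the real properties return NumPy arrays rather than lists.

```python
def gu_positions(seq: str, end5: int) -> list[int]:
    """Positions (numbered from end5) whose base is neither A nor C."""
    return [end5 + i for i, base in enumerate(seq) if base not in "AC"]


def to_zero_indexed(positions: list[int], end5: int) -> list[int]:
    """Convert region positions to offsets from the region's first position."""
    return [pos - end5 for pos in positions]


if __name__ == "__main__":
    # A region whose first base is numbered 11.
    masked = gu_positions("GATTC", end5=11)
    print(masked)
    print(to_zero_indexed(masked, end5=11))
```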
 - property range
- Index of all positions in the region. 
 - property range_int
- All positions in the region as integers. 
 - property range_one
- All 1-indexed positions in the region as integers. 
 - property ref_reg
 - renumber_from(seq5: int, name: str | None = None)
- Return a new region renumbered starting from a position. 
 - property size
- Number of relevant positions in the region. 
 - subregion(end5: int | None = None, end3: int | None = None, name: str | None = None)
- Return a new region from part of this region. 
 - to_dict()
 - property unmasked
- Index of unmasked positions in the region. 
 - property unmasked_bool: ndarray
- Unmasked positions as a boolean array. 
 - property unmasked_int: ndarray
- Unmasked positions as integers (1-indexed). 
 - property unmasked_zero: ndarray
- Unmasked positions as integers (0-indexed with respect to the first position in the region). 
 
- class seismicrna.core.seq.region.RegionFinder(ref: str, seq: DNA, *, seq5: int = 1, end5: int | None = None, end3: int | None = None, fwd: DNA | None = None, rev: DNA | None = None, primer_gap: int = 0, exclude_primers: bool = False, **kwargs)
- Bases: Region
- The 5’ and 3’ ends of a region can be given explicitly as integers, but if the sample is of an amplicon (i.e. generated by RT-PCR using site-specific primers), then it is often more convenient to enter the sequences of the PCR primers and have the software determine the 5’/3’ ends. RegionFinder accepts 5’ and 3’ ends given as integers or primers, validates them, and stores the ends as integers:
- end5 = end5 if end5 is given, else the 3’ end of the forward primer + (primer_gap + 1) if fwd is given, else 1
- end3 = end3 if end3 is given, else the 5’ end of the reverse primer - (primer_gap + 1) if rev is given, else the length of refseq
 
 - static locate(seq: DNA, primer: DNA, seq5: int) -> RegionTuple
- Return the 5’ and 3’ positions (1-indexed) of a primer within a reference sequence. The primer must occur exactly once in the reference, otherwise an error is raised.
- Parameters:
- seq (DNA) – The full reference sequence or a part of it.
- primer (DNA) – Sequence of the forward PCR primer or the reverse complement of the reverse PCR primer.
- seq5 (int = 1) – Positional number to assign the 5’ end of the given part of the reference sequence. Must be ≥ 1.
 
- Returns:
- Named tuple of the first and last positions that the primer occupies in the reference sequence. Positions are 1-indexed and include the first and last positions.
- Return type:
- RegionTuple
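The "exactly once" rule for locating a primer can be sketched as follows. `locate_primer` and `PrimerSite` are hypothetical names for this example, sequences are plain strings, and counting overlapping occurrences as separate matches is an assumption; the library's own matching may differ.

```python
from collections import namedtuple

# Hypothetical counterpart of the documented RegionTuple(pos5, pos3).
PrimerSite = namedtuple("PrimerSite", ["pos5", "pos3"])


def locate_primer(seq: str, primer: str, seq5: int = 1) -> PrimerSite:
    """Find the single occurrence of `primer` in `seq` (1-indexed, inclusive)."""
    if seq5 < 1:
        raise ValueError(f"seq5 must be >= 1, but got {seq5}")
    if not primer:
        raise ValueError("Primer is empty")
    # Collect every start offset, including overlapping occurrences.
    starts = [i for i in range(len(seq) - len(primer) + 1)
              if seq.startswith(primer, i)]
    if len(starts) != 1:
        raise ValueError(f"Primer occurs {len(starts)} times, expected 1")
    pos5 = starts[0] + seq5           # offset 0 is numbered seq5
    pos3 = pos5 + len(primer) - 1     # last position is included
    return PrimerSite(pos5, pos3)


if __name__ == "__main__":
    print(locate_primer("AACCGGTT", "CCG"))
    print(locate_primer("AACCGGTT", "CCG", seq5=11))
```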
 
 
- class seismicrna.core.seq.region.RegionTuple(pos5, pos3)
- Bases: tuple
 - pos3
- Alias for field number 1 
 - pos5
- Alias for field number 0 
 
- seismicrna.core.seq.region.get_reg_ends_primers(regs_file: Path)
- Parse a file defining each region by the name of its reference and either its 5’ and 3’ end positions or its forward and reverse primer sequences. Return one map from each reference and pair of 5’/3’ ends to the name of the corresponding region, and another from each reference and primer pair to the name of the corresponding region.
- Parameters:
- regs_file (Path) – CSV file of a table that defines the regions. The table must have columns labeled “Reference”, “Region”, “5’ End”, “3’ End”, “Forward Primer”, and “Reverse Primer”. Others are ignored.
- Returns:
- Two mappings, the first from (ref name, 5’ end, 3’ end) to each region, the second from (ref name, fwd primer, rev primer) to each region. If the region is named in the “Region” column of the table, then that name will be used as the region name. Otherwise, the region name will be an empty string.
- Return type:
- tuple[dict[tuple[str, int, int], str], dict[tuple[str, DNA, DNA], str]]
 
- Get the shared index among all those given, as follows:
- If indexes contains no elements and empty_ok is True, then return an empty MultiIndex with levels named ‘Positions’ and ‘Base’.
- If indexes contains one element or multiple identical elements, and each has two levels named ‘Positions’ and ‘Base’, then return the first element.
- Otherwise, raise an error.
 - Parameters:
- indexes (Iterable[pandas.MultiIndex]) – Indexes to compare.
- empty_ok (bool = False) – If given no indexes, then default to an empty index (if True) or raise a ValueError (if False).
 
- Returns:
- The shared index. 
- Return type:
- pandas.MultiIndex
 
- seismicrna.core.seq.region.hyphenate_ends(end5: int, end3: int)
- Return the 5’ and 3’ ends as a hyphenated string. 
- seismicrna.core.seq.region.index_to_pos(index: MultiIndex)
- Get the positions from a MultiIndex of (pos, base) pairs. 
- seismicrna.core.seq.region.index_to_seq(index: MultiIndex, allow_gaps: bool = False)
- Get the DNA sequence from a MultiIndex of (pos, base) pairs. 
- seismicrna.core.seq.region.intersect(*regions: Region, name: str | None = None)
- Intersect one or more regions. 
- seismicrna.core.seq.region.iter_windows(*series: Series, size: int, min_count: int = 1, include_nan: bool = False)
- seismicrna.core.seq.region.seq_pos_to_index(seq: DNA, positions: Sequence[int], start: int)
- Convert a sequence and positions to indexes, where each index is a tuple of (position, base).
- Parameters:
- seq (DNA) – DNA sequence.
- positions (Sequence[int]) – Positions of the sequence from which to build the index. Every position must be an integer ≥ start.
- start (int) – Numerical position to assign to the first base in the sequence. Must be a positive integer.
 
- Returns:
- MultiIndex of the same length as positions where each index is a tuple of (position, base). 
- Return type:
- pd.MultiIndex
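The mapping from positions to (position, base) pairs can be illustrated in plain Python. The function `seq_pos_to_pairs` is a hypothetical stand-in that returns a list of tuples instead of a pandas MultiIndex. The checks for an upper bound, sorted order, and duplicates are assumptions suggested by the test names listed earlier (test_invalid_greater_end_9, test_invalid_unsort_1, test_invalid_dup_1), not rules stated in the description.

```python
def seq_pos_to_pairs(seq: str, positions, start: int) -> list[tuple[int, str]]:
    """Pair each requested position with its base (hypothetical helper)."""
    if start < 1:
        raise ValueError(f"start must be >= 1, but got {start}")
    end = start + len(seq) - 1  # number of the last base in the sequence
    pairs = []
    previous = None
    for pos in positions:
        if pos < start or pos > end:
            raise ValueError(f"Position {pos} is outside {start}-{end}")
        if previous is not None and pos <= previous:
            raise ValueError("Positions must be sorted without duplicates")
        # The base at position `pos` sits at offset pos - start.
        pairs.append((pos, seq[pos - start]))
        previous = pos
    return pairs


if __name__ == "__main__":
    print(seq_pos_to_pairs("ACGT", [2, 4], start=1))
    print(seq_pos_to_pairs("ACGT", [10, 13], start=10))
```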
 
- seismicrna.core.seq.region.unite(*regions: Region, name: str | None = None, refseq: DNA | None = None)
- Unite one or more regions.
- Parameters:
- *regions (Region) – Regions to unite.
- name (str | None = None) – Name for the region to return.
- refseq (DNA | None = None) – Reference sequence (optional) for filling any gaps in the union of the regions. If given, then it must match every region at the corresponding positions. If omitted, then any positions not covered by at least one region will be filled with N.
 
- Returns:
- Union of all given regions. 
- Return type:
 
- seismicrna.core.seq.region.verify_index_names(index: MultiIndex)
- Verify that the names of the index are correct. 
- seismicrna.core.seq.region.window_to_margins(window: int)
- Compute the 5’ and 3’ margins from the size of the window. 
- class seismicrna.core.seq.xna.CompressedSeq(seq: XNA)
- Bases: object
- Compress a sequence into two bits per base.
 - decompress()
- Restore the original sequence. 
 - property type
 
- class seismicrna.core.seq.xna.DNA(seq: Any)
- Bases: XNA
 - classmethod alph()
- Sequence alphabet. 
 - classmethod from_any_seq(seq: str | XNA)
- Create a sequence from a string or other sequence, possibly of a different type. 
 - classmethod pict()
- Sequence pictograms. 
 - tr()
- Transcribe DNA to RNA. 
 
- exception seismicrna.core.seq.xna.InvalidBaseError
- Bases: ValueError
- Invalid base for a sequence.
- class seismicrna.core.seq.xna.RNA(seq: Any)
- Bases: XNA
 - classmethod alph()
- Sequence alphabet. 
 - classmethod from_any_seq(seq: str | XNA)
- Create a sequence from a string or other sequence, possibly of a different type. 
 - classmethod pict()
- Sequence pictograms. 
 - rt()
- Reverse transcribe RNA to DNA. 
 
- class seismicrna.core.seq.xna.XNA(seq: Any)
- Bases: ABC
 - __add__(other)
- Allow addition (concatenation) of two sequences only if the sequences have the same class. 
 - __bool__()
- Empty sequences return False; all else, True. 
 - __contains__(item)
- Check if a sequence is contained in this sequence. 
 - __eq__(other)
- Return True if both the type of the sequence and the bases in the sequence match, otherwise False. 
 - __getitem__(item)
- If item is a slice, then return an instance of the class. Otherwise, return an instance of str. 
 - __hash__()
- Define __hash__ so that Seq subclasses can be used as keys for dict-like mappings. Use the hash of the plain string. 
 - __mul__(other)
- Multiply a sequence by an int like a str times an int. 
 - __repr__()
- Encapsulate the sequence string with the class name. 
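The special methods described above can be illustrated with a toy class hierarchy. The classes `Seq`, `Dna`, and `Rna` below are stand-ins written for this example and omit base validation entirely; the repr format is an assumption, since the description only says that the class name encapsulates the sequence string.

```python
class Seq:
    """Toy sequence type mirroring the documented special methods."""

    def __init__(self, seq):
        self._seq = str(seq)

    def __eq__(self, other):
        # Equal only if both the type and the bases match.
        return type(other) is type(self) and self._seq == other._seq

    def __hash__(self):
        # Use the hash of the plain string so instances can be dict keys.
        return hash(self._seq)

    def __getitem__(self, item):
        value = self._seq[item]
        # A slice yields the same class; a single index yields a str.
        return type(self)(value) if isinstance(item, slice) else value

    def __add__(self, other):
        # Concatenate only sequences of the same class.
        if type(other) is not type(self):
            return NotImplemented
        return type(self)(self._seq + other._seq)

    def __bool__(self):
        return bool(self._seq)

    def __repr__(self):
        return f"{type(self).__name__}({self._seq!r})"


class Dna(Seq):
    pass


class Rna(Seq):
    pass


if __name__ == "__main__":
    print(Dna("ACGT")[1:3], Dna("ACGT")[0])
    print(Dna("ACG") == Rna("ACG"), Dna("ACG") == "ACG")
```

Because equality also compares types, a str key and a sequence key with the same letters remain distinct entries in a dict even though their hashes match.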
 - property array
- NumPy array of Unicode characters for the sequence. 
 - compress()
- Compress the sequence. 
 - classmethod four()
- Get the four standard bases. 
 - abstractmethod classmethod from_any_seq(seq: str | XNA) -> Self
- Create a sequence from a string or other sequence, possibly of a different type. 
 - classmethod get_alphaset()
- Get the alphabet as a set. 
 - classmethod get_comp()
- Get the complementary alphabet as a tuple. 
 - classmethod get_comptrans()
- Get the translation table for complementary bases. 
 - classmethod get_nonalphaset()
- Get the printable characters not in the alphabet. 
 - classmethod get_other_iupac()
- Get the IUPAC extended characters not in the alphabet. 
 - classmethod get_pictrans()
- Get the translation table for pictogram characters. 
 - property picto
- Pictogram string. 
 - classmethod random(nt: int, a: float = 0.25, c: float = 0.25, g: float = 0.25, t: float = 0.25)
- Return a random sequence of the given length.
- Parameters:
- nt (int) – Number of nucleotides to simulate. Must be ≥ 0.
- a (float = 0.25) – Expected proportion of A.
- c (float = 0.25) – Expected proportion of C.
- g (float = 0.25) – Expected proportion of G.
- t (float = 0.25) – Expected proportion of T (if DNA) or U (if RNA).
 
- Returns:
- A random sequence. 
- Return type:
- DNA | RNA
 
 - property rc
- Reverse complement. 
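A translation table, as returned conceptually by get_comptrans, makes the reverse complement a one-liner. The sketch below works on plain strings and covers only A, C, G, T, and N for DNA; the names `DNA_COMP`, `reverse_complement`, and `transcribe` are hypothetical and the real classes may handle a different alphabet.

```python
# Translation table mapping each DNA base to its complement.
DNA_COMP = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the order."""
    return seq.translate(DNA_COMP)[::-1]


def transcribe(seq: str) -> str:
    """Transcribe DNA to RNA by replacing T with U."""
    return seq.replace("T", "U")


if __name__ == "__main__":
    print(reverse_complement("AACG"))
    print(transcribe("ACGT"))
```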
 - classmethod t_or_u()
- Get the base that is complementary to A. 
 
- seismicrna.core.seq.xna.decompress(seq: CompressedSeq)
- Restore the original sequence from a CompressedSeq object.
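Two bits per base means four bases fit in one byte. The following sketch shows one possible packing for a plain DNA string; `compress_seq` and `decompress_seq` are hypothetical names, the bit order is an arbitrary choice, and only the four standard bases are supported here, so how CompressedSeq stores the sequence type or any N bases is not represented.

```python
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def compress_seq(seq: str) -> tuple[int, bytes]:
    """Pack a sequence into two bits per base; return (length, bytes)."""
    value = 0
    for i, base in enumerate(seq):
        # Base i occupies bits 2i and 2i + 1.
        value |= CODES[base] << (2 * i)
    nbytes = (2 * len(seq) + 7) // 8  # round up to whole bytes
    return len(seq), value.to_bytes(nbytes, "little")


def decompress_seq(length: int, data: bytes) -> str:
    """Restore the original sequence from its length and packed bytes."""
    value = int.from_bytes(data, "little")
    return "".join(BASES[(value >> (2 * i)) & 3] for i in range(length))


if __name__ == "__main__":
    length, data = compress_seq("GATTACA")
    print(length, len(data), decompress_seq(length, data))
```

The length has to be stored alongside the bytes because trailing zero bits are indistinguishable from trailing A bases.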