seismicrna.core.rna package
Subpackages
- seismicrna.core.rna.tests package
- Submodules
TestFormatDbString
TestFormatDbString.test_deep_pairs()
TestFormatDbString.test_excess_pseudoknot()
TestFormatDbString.test_invalid_pair()
TestFormatDbString.test_invalid_pos3()
TestFormatDbString.test_invalid_pos5()
TestFormatDbString.test_multi_pseudoknot()
TestFormatDbString.test_no_pairs()
TestFormatDbString.test_one_pair()
TestFormatDbString.test_pseudoknot()
TestFormatDbString.test_repeat_pos3()
TestFormatDbString.test_repeat_pos5()
TestFormatDbString.test_shallow_pairs()
TestPairedMarks
TestParseDbString
TestParseDbString.test_dangling_closer()
TestParseDbString.test_dangling_opener()
TestParseDbString.test_deep_pairs()
TestParseDbString.test_multi_marks()
TestParseDbString.test_multi_pseudoknot()
TestParseDbString.test_no_pairs()
TestParseDbString.test_pseudoknot()
TestParseDbString.test_shallow_pairs()
TestConstants
TestDictToPairs
TestDictToTable
TestFindEnclosingPairs
TestPairsToDict
TestPairsToTable
TestTableToDict
TestTableToPairs
TestComputeAucRoc
TestComputeAucRoc.test_all_false_positives()
TestComputeAucRoc.test_all_positions_unpaired()
TestComputeAucRoc.test_empty_prediction()
TestComputeAucRoc.test_empty_prediction_with_unpaired()
TestComputeAucRoc.test_invalid_dtype()
TestComputeAucRoc.test_invalid_dtype_for_unpaired()
TestComputeAucRoc.test_mixed_prediction()
TestComputeAucRoc.test_neither_paired_nor_unpaired()
TestComputeAucRoc.test_no_positions_unpaired()
TestComputeAucRoc.test_perfect_prediction()
TestComputeFprTpr
TestComputeFprTpr.test_all_false_positives()
TestComputeFprTpr.test_all_positions_unpaired()
TestComputeFprTpr.test_empty_prediction()
TestComputeFprTpr.test_empty_prediction_with_unpaired()
TestComputeFprTpr.test_invalid_dtype()
TestComputeFprTpr.test_invalid_dtype_for_unpaired()
TestComputeFprTpr.test_mixed_paired_unpaired()
TestComputeFprTpr.test_mixed_prediction()
TestComputeFprTpr.test_neither_paired_nor_unpaired()
TestComputeFprTpr.test_no_positions_unpaired()
TestComputeFprTpr.test_perfect_prediction()
TestComputeRocCurve
TestComputeRocCurve.test_all_false_positives()
TestComputeRocCurve.test_all_positions_unpaired()
TestComputeRocCurve.test_empty_prediction()
TestComputeRocCurve.test_empty_prediction_with_unpaired()
TestComputeRocCurve.test_invalid_dtype()
TestComputeRocCurve.test_invalid_dtype_for_unpaired()
TestComputeRocCurve.test_mixed_prediction()
TestComputeRocCurve.test_neither_paired_nor_unpaired()
TestComputeRocCurve.test_no_positions_unpaired()
TestComputeRocCurve.test_perfect_prediction()
TestCalcWfmi
TestCalcWfmi.test_all_unpaired()
TestCalcWfmi.test_all_vs_internal_pairs_1()
TestCalcWfmi.test_all_vs_internal_pairs_2()
TestCalcWfmi.test_completely_different_structures()
TestCalcWfmi.test_different_regions()
TestCalcWfmi.test_empty_region()
TestCalcWfmi.test_identical_structures()
TestCalcWfmi.test_internal_pairs_only()
TestCalcWfmi.test_mixed_paired_unpaired()
TestCalcWfmi.test_mixed_paired_unpaired_subset()
TestCalcWfmi.test_one_empty_one_paired()
TestCalcWfmi.test_pseudoknot_structures()
TestRNAStructure
- Submodules
Submodules
- class seismicrna.core.rna.base.RNARegion(*, region: Region, **kwargs)
Bases: object
Region of an RNA sequence.
- property end3
Position of the 3’ end of the region.
- property end5
Position of the 5’ end of the region.
- property init_args
Arguments needed to initialize a new instance.
- property ref
Name of the reference sequence.
- property reg
Name of the region.
- property seq
Sequence of the region as RNA.
- property seq_record
- seismicrna.core.rna.convert.run_ct_to_db(input_path: Iterable[str | Path], *, force: bool = False, num_cpus: int = 4)
Convert connectivity table (CT) to dot-bracket (DB) files.
- seismicrna.core.rna.convert.run_db_to_ct(input_path: Iterable[str | Path], *, force: bool = False, num_cpus: int = 4)
Convert dot-bracket (DB) to connectivity table (CT) files.
- seismicrna.core.rna.ct.parse_ct_file(ct_path: str | Path)
Yield the title, region, and base pairs for each structure in a connectivity table (CT) file.
- Parameters:
ct_path (str | Path) – Path of the CT file.
- Return type:
Generator[tuple[str, Region, list[tuple[int, int]]], Any, None]
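The CT layout that `parse_ct_file` reads can be illustrated with a minimal plain-Python sketch. It assumes the common CT layout: a header line giving the number of bases and the title, then one row per base holding the index, base, previous index, next index, partner index (0 if unpaired), and a numbering column. Pairs are reported in the index numbering of the first column. The helper `parse_ct_text` is hypothetical and is not part of seismicrna; unlike `parse_ct_file`, it does not build a `Region`.

```python
def parse_ct_text(ct_text: str) -> list[tuple[str, list[tuple[int, int]]]]:
    """Return (title, pairs) for each structure in CT-formatted text.

    Hypothetical helper, not part of seismicrna. Pairs use the index
    (first) column and are reported once each as (5' pos, 3' pos).
    """
    lines = [line for line in ct_text.splitlines() if line.strip()]
    structures = []
    i = 0
    while i < len(lines):
        # Header line: the number of bases, then (optionally) the title.
        header = lines[i].split(maxsplit=1)
        length = int(header[0])
        title = header[1].strip() if len(header) > 1 else ""
        pairs = []
        for row in lines[i + 1:i + 1 + length]:
            fields = row.split()
            pos, partner = int(fields[0]), int(fields[4])
            # Partner 0 means unpaired; keep each pair from its 5' side only.
            if partner > pos:
                pairs.append((pos, partner))
        structures.append((title, pairs))
        # Skip this structure's header and rows to reach the next header.
        i += 1 + length
    return structures


ct_text = """\
6 hairpin
1 G 0 2 6 1
2 C 1 3 5 2
3 A 2 4 0 3
4 A 3 5 0 4
5 G 4 6 2 5
6 C 5 0 1 6
"""
print(parse_ct_text(ct_text))  # [('hairpin', [(1, 6), (2, 5)])]
```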
- seismicrna.core.rna.db.format_db_string(pairs: Iterable[tuple[int, int]], length: int, seq5: int = 1)
Create a dot-bracket string from a list of base pairs.
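As an illustration of what `format_db_string` produces, here is a minimal sketch, not the seismicrna implementation, that places each pair in the first bracket type whose pairs it does not cross. The choices that crossing pairs (pseudoknots) fall through to `[]`, `{}`, then `<>`, and that running out of bracket types raises `ValueError`, are assumptions of this sketch; `format_db` is a hypothetical name.

```python
BRACKETS = ["()", "[]", "{}", "<>"]


def format_db(pairs, length, seq5=1):
    """Hypothetical sketch: build a dot-bracket string from base pairs.

    Positions are numbered from seq5, so the string covers
    seq5 .. seq5 + length - 1.
    """
    seq3 = seq5 + length - 1
    chars = ["."] * length
    levels = [[] for _ in BRACKETS]
    used = set()
    for pos5, pos3 in sorted(pairs):
        if not seq5 <= pos5 < pos3 <= seq3:
            raise ValueError(f"Invalid pair {(pos5, pos3)}")
        if pos5 in used or pos3 in used:
            raise ValueError(f"Pair {(pos5, pos3)} reuses a paired position")
        used.update((pos5, pos3))
        for level, (opener, closer) in zip(levels, BRACKETS):
            # With pairs sorted by 5' end, (a, b) crosses the new pair
            # exactly when a < pos5 < b < pos3.
            if not any(a < pos5 < b < pos3 for a, b in level):
                level.append((pos5, pos3))
                chars[pos5 - seq5] = opener
                chars[pos3 - seq5] = closer
                break
        else:
            raise ValueError("Not enough bracket types for the pseudoknots")
    return "".join(chars)


print(format_db([(1, 5), (3, 8)], 8))  # (.[.)..]
```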
- seismicrna.core.rna.db.parse_db_file_as_pairs(db_path: str | Path, seq5: int = 1)
Yield the title, region, and base pairs for each structure in a dot-bracket (DB) file.
- Parameters:
db_path (str | Path) – Path of the DB file.
seq5 (int = 1) – Number to give the 5’ position of the sequence.
- Return type:
Generator[tuple[str, Region, list[tuple[int, int]]], Any, None]
- seismicrna.core.rna.db.parse_db_file_as_strings(db_path: str | Path)
Return the sequence and dot-bracket strings from a dot-bracket file.
- seismicrna.core.rna.db.parse_db_string(db_string: str, seq5: int = 1)
Parse a dot-bracket string into a list of base pairs.
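Parsing a dot-bracket string, as `parse_db_string` does, can be sketched with one stack per bracket type. This hypothetical `parse_db` assumes `.` is the only unpaired mark and raises `ValueError` on any other character or on an unmatched bracket; seismicrna may accept other marks.

```python
OPENERS = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSERS = {close: open_ for open_, close in OPENERS.items()}


def parse_db(db_string, seq5=1):
    """Hypothetical sketch: return sorted (5' pos, 3' pos) pairs."""
    stacks = {opener: [] for opener in OPENERS}
    pairs = []
    for pos, char in enumerate(db_string, start=seq5):
        if char in OPENERS:
            stacks[char].append(pos)
        elif char in CLOSERS:
            # A closer pairs with the most recent unmatched opener of its type.
            stack = stacks[CLOSERS[char]]
            if not stack:
                raise ValueError(f"Dangling {char!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif char != ".":
            raise ValueError(f"Unexpected mark {char!r} at position {pos}")
    # Any opener left on a stack never found its closer.
    for opener, stack in stacks.items():
        if stack:
            raise ValueError(f"Dangling {opener!r} at position {stack[-1]}")
    return sorted(pairs)


print(parse_db("(.[.)..]"))  # [(1, 5), (3, 8)]
```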
- seismicrna.core.rna.io.ct_to_db(ct_path: Path, db_path: Path | None = None, force: bool = False)
Write a dot-bracket (DB) file of structures in a connectivity table (CT) file.
- seismicrna.core.rna.io.db_to_ct(db_path: Path, ct_path: Path | None = None, force: bool = False)
Write a connectivity table (CT) file of structures in a dot-bracket (DB) file.
- seismicrna.core.rna.io.find_ct_region(ct_path: Path) → Region
Region shared among all structures in a CT file.
- seismicrna.core.rna.io.from_ct(ct_path: str | Path, branch: str = '')
Yield an instance of an RNAStructure for each structure in a connectivity table (CT) file.
- Parameters:
ct_path (Path) – Path of the CT file.
branch (str) – Branch of the workflow for folding (optional).
- Returns:
RNA secondary structures from the CT file.
- Return type:
Generator[RNAStructure, Any, None]
- seismicrna.core.rna.io.from_db(db_path: str | Path, branch: str = '', seq5: int = 1)
Yield an instance of an RNAStructure for each structure in a dot-bracket (DB) file.
- Parameters:
db_path (Path) – Path of the DB file.
branch (str) – Branch of the workflow for folding (optional).
seq5 (int = 1) – Number to give the 5’ position of the sequence.
- Returns:
RNA secondary structures from the DB file.
- Return type:
Generator[RNAStructure, Any, None]
- seismicrna.core.rna.io.renumber_ct(ct_in: Path, ct_out: Path, seq5: int, force: bool = False)
Renumber the last column of a connectivity table (CT) file.
- Parameters:
ct_in (Path) – Path of the input CT file.
ct_out (Path) – Path of the output CT file.
seq5 (int) – Number to give the 5’ position in the renumbered CT file.
force (bool = False) – Overwrite the output CT file if it already exists.
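Renumbering the last column, as `renumber_ct` describes, can be sketched on CT text held in memory. The function `renumber_ct_text` is hypothetical; it assumes whitespace-separated rows, leaves the index and pairing columns untouched, drops blank lines, and rejoins columns with single spaces instead of preserving the original alignment.

```python
def renumber_ct_text(ct_text: str, seq5: int) -> str:
    """Hypothetical sketch: number the first base of each structure seq5."""
    lines = [line for line in ct_text.splitlines() if line.strip()]
    out = []
    i = 0
    while i < len(lines):
        header = lines[i]
        length = int(header.split()[0])
        out.append(header)
        for offset, row in enumerate(lines[i + 1:i + 1 + length]):
            fields = row.split()
            # Only the last (numbering) column changes.
            fields[-1] = str(seq5 + offset)
            out.append(" ".join(fields))
        i += 1 + length
    return "\n".join(out) + "\n"


ct_text = """\
3 demo
1 G 0 2 3 1
2 A 1 3 0 2
3 C 2 0 1 3
"""
print(renumber_ct_text(ct_text, seq5=101), end="")
```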
- seismicrna.core.rna.io.to_ct(structures: Iterable[RNAStructure], ct_path: Path, force: bool = False)
Write a connectivity table (CT) file of RNA structures.
- Parameters:
structures (Iterable[RNAStructure]) – RNA structures to write to the CT file.
ct_path (Path) – Path of the CT file.
force (bool = False) – Overwrite the output CT file if it already exists.
- seismicrna.core.rna.io.to_db(structures: Iterable[RNAStructure], db_path: Path, force: bool = False)
Write a dot-bracket (DB) file of RNA structures.
- Parameters:
structures (Iterable[RNAStructure]) – RNA structures to write to the DB file.
db_path (Path) – Path of the DB file.
force (bool = False) – Overwrite the output DB file if it already exists.
- seismicrna.core.rna.pair.dict_to_pairs(pair_dict: dict[int, int])
Tuples of the 5’ and 3’ position in each pair.
- seismicrna.core.rna.pair.dict_to_table(pair_dict: dict[int, int], region: Region)
Series of every position in the region and the base to which it pairs, or 0 if it does not pair.
- seismicrna.core.rna.pair.find_enclosing_pairs(table: Series)
Find the base pair that encloses each position.
- seismicrna.core.rna.pair.find_root_pairs(pairs: Iterable[tuple[int, int]])
Return all pairs that are not contained by any other pair.
- seismicrna.core.rna.pair.map_nested(pairs: Iterable[tuple[int, int]])
Map each pair to the pair in which it is nested.
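Root and nesting relations can be sketched with a direct containment test: pair (c, d) encloses pair (a, b) when c < a and b < d. The names `find_roots` and `map_to_parent` are hypothetical. Mapping root pairs to `None` and choosing the enclosing pair with the largest 5’ position as the parent are assumptions of this quadratic-time sketch, which is intended for pseudoknot-free structures.

```python
def find_roots(pairs):
    """Hypothetical sketch: pairs not enclosed by any other pair."""
    pairs = list(pairs)
    return [(a, b) for a, b in pairs
            if not any(c < a and b < d for c, d in pairs)]


def map_to_parent(pairs):
    """Hypothetical sketch: map each pair to its innermost enclosing pair."""
    pairs = list(pairs)
    parents = {}
    for a, b in pairs:
        enclosing = [(c, d) for c, d in pairs if c < a and b < d]
        # Among enclosing pairs, the one with the largest 5' end is innermost.
        parents[(a, b)] = max(enclosing) if enclosing else None
    return parents


pairs = [(1, 10), (2, 9), (3, 8), (12, 20), (13, 19)]
print(find_roots(pairs))  # [(1, 10), (12, 20)]
```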
- seismicrna.core.rna.pair.pairs_to_dict(pairs: Iterable[tuple[int, int]])
Return a dictionary that maps each position to the base to which it pairs and contains no key for unpaired positions.
- seismicrna.core.rna.pair.pairs_to_table(pairs: Iterable[tuple[int, int]], region: Region)
Series of every position in the region and the base to which it pairs, or 0 if it does not pair.
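The dictionary and table representations can be sketched with plain dictionaries standing in for the pandas `Series`. `pairs_to_partners` and `pairs_to_partner_table` are hypothetical names; the region is given by assumed `end5`/`end3` bounds instead of a `Region` object, and positions outside it raise `ValueError`.

```python
def pairs_to_partners(pairs):
    """Hypothetical sketch: map each paired position to its partner."""
    partners = {}
    for pos5, pos3 in pairs:
        # Record the pair in both directions; unpaired positions get no key.
        for pos, partner in ((pos5, pos3), (pos3, pos5)):
            if pos in partners:
                raise ValueError(f"Position {pos} is paired more than once")
            partners[pos] = partner
    return partners


def pairs_to_partner_table(pairs, end5, end3):
    """Hypothetical sketch: every position in end5..end3 -> partner or 0."""
    partners = pairs_to_partners(pairs)
    outside = sorted(pos for pos in partners if not end5 <= pos <= end3)
    if outside:
        raise ValueError(f"Positions outside the region: {outside}")
    return {pos: partners.get(pos, 0) for pos in range(end5, end3 + 1)}


print(pairs_to_partner_table([(2, 7), (3, 6)], 1, 8))
```

With pandas installed, `pandas.Series(table)` turns the resulting dictionary into a Series indexed by position.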
- seismicrna.core.rna.pair.renumber_pairs(pairs: Iterable[tuple[int, int]], offset: int)
Renumber pairs by offsetting each number.
- Parameters:
pairs (Iterable[tuple[int, int]]) – Pairs to renumber.
offset (int) – Offset by which to change the numbering.
- Returns:
Renumbered pairs, in the same order as given.
- Return type:
Generator[tuple[int, int], Any, None]
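Because `renumber_pairs` returns a generator, renumbering is lazy. A minimal sketch of that behavior, using the hypothetical name `renumber`:

```python
from collections.abc import Iterable, Iterator


def renumber(pairs: Iterable[tuple[int, int]],
             offset: int) -> Iterator[tuple[int, int]]:
    """Hypothetical sketch: lazily add offset to both positions of each pair."""
    for pos5, pos3 in pairs:
        yield pos5 + offset, pos3 + offset


# Pairs numbered from 1, shifted so that the first base becomes position 101.
print(list(renumber([(1, 6), (2, 5)], offset=100)))  # [(101, 106), (102, 105)]
```

Moving the numbering from `seq5=1` to `seq5=101` corresponds to an offset of 100.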
- seismicrna.core.rna.pair.table_to_dict(table: Series)
Dictionary of the 5’ and 3’ position in each pair.
- seismicrna.core.rna.pair.table_to_pairs(table: Series)
Tuples of the 5’ and 3’ position in each pair.
- class seismicrna.core.rna.profile.RNAProfile(*, sample: str, branches: dict[str, str], mus_reg: str, mus_name: str, mus: Series, **kwargs)
Bases: RNARegion
Mutational profile of an RNA.
- property init_args
Arguments needed to initialize a new instance.
- property profile
Name of the mutational profile.
- seismicrna.core.rna.roc.compute_auc(fpr: ndarray, tpr: ndarray)
Compute the area under the curve (AUC) of the receiver operating characteristic (ROC).
- Parameters:
fpr (numpy.ndarray) – False positive rate (FPR) of the ROC curve.
tpr (numpy.ndarray) – True positive rate (TPR) of the ROC curve.
- Returns:
AUC-ROC
- Return type:
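The AUC of a ROC curve given as FPR and TPR arrays is commonly computed with the trapezoidal rule. The sketch below assumes that rule and that the points are ordered by increasing FPR; `trapezoid_auc` is a hypothetical plain-Python stand-in, not necessarily how `compute_auc` is implemented.

```python
def trapezoid_auc(fpr, tpr):
    """Hypothetical sketch: area under a ROC curve by the trapezoidal rule.

    Assumes fpr and tpr are equal-length sequences ordered by increasing FPR.
    """
    if len(fpr) != len(tpr):
        raise ValueError("fpr and tpr must have the same length")
    area = 0.0
    for i in range(1, len(fpr)):
        # Width along the FPR axis times the mean of the two TPR heights.
        area += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
    return area


print(trapezoid_auc([0.0, 0.0, 1.0], [0.0, 1.0, 1.0]))  # 1.0
```

With NumPy, `numpy.trapezoid(tpr, fpr)` (named `numpy.trapz` before NumPy 2.0) computes the same quantity.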
- seismicrna.core.rna.roc.compute_auc_roc(profile: Series, is_paired: Series, is_unpaired: Series | None = None)
Compute the receiver operating characteristic (ROC) and the area under the curve (AUC) to indicate how well mutation data agree with a structure.
- Parameters:
profile (pandas.Series) – Mutational profile with one index per position, where each value is the mutation rate at the position.
is_paired (pandas.Series) – Boolean series with one index per position, where each value is True if the base at the position is paired, otherwise False.
is_unpaired (pandas.Series | None) – Boolean series with one index per position, where each value is True if the base at the position is unpaired, otherwise False.
- Returns:
AUC-ROC
- Return type:
- seismicrna.core.rna.roc.compute_roc_curve(profile: Series, is_paired: Series, is_unpaired: Series | None = None)
Compute the receiver operating characteristic (ROC) curve to indicate how well mutation data agree with a structure.
- Parameters:
profile (pandas.Series) – Mutational profile with one index per position, where each value is the mutation rate at the position.
is_paired (pandas.Series) – Boolean series with one index per position, where each value is True if the base at the position is paired, otherwise False.
is_unpaired (pandas.Series | None) – Boolean series with one index per position, where each value is True if the base at the position is unpaired, otherwise False.
- Returns:
FPR and TPR axes, respectively, of the ROC curve.
- Return type:
tuple[numpy.ndarray, numpy.ndarray]
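A ROC curve for structure agreement can be sketched by lowering a mutation-rate threshold and counting how many unpaired (positive) and paired (negative) positions lie at or above it. Treating unpaired bases as positives, dropping positions that are not exactly one of paired or unpaired, and raising `ValueError` when either class is empty are assumptions of this hypothetical `roc_curve`, which uses dictionaries keyed by position in place of pandas Series.

```python
def roc_curve(profile, is_paired, is_unpaired=None):
    """Hypothetical sketch: return (fpr, tpr) lists for a ROC curve.

    profile maps position -> mutation rate; is_paired and is_unpaired map
    position -> bool. Unpaired positions are treated as positives.
    """
    if is_unpaired is None:
        is_unpaired = {pos: not paired for pos, paired in is_paired.items()}
    # Keep positions that are exactly one of paired or unpaired.
    scored = [(profile[pos], is_unpaired[pos]) for pos in profile
              if is_paired[pos] != is_unpaired[pos]]
    n_pos = sum(1 for _, positive in scored if positive)
    n_neg = len(scored) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("Need at least one paired and one unpaired position")
    # Visit rates from highest to lowest, emitting one point per distinct rate.
    scored.sort(key=lambda item: item[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(scored):
        rate = scored[i][0]
        while i < len(scored) and scored[i][0] == rate:
            if scored[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


profile = {1: 0.9, 2: 0.1, 3: 0.8, 4: 0.2}
is_paired = {1: False, 2: True, 3: False, 4: True}
print(roc_curve(profile, is_paired))
# ([0.0, 0.0, 0.0, 0.5, 1.0], [0.0, 0.5, 1.0, 1.0, 1.0])
```

In this perfectly separated example the curve reaches a TPR of 1.0 before any false positive, so the area under it is 1.0.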
- seismicrna.core.rna.roc.compute_rolling_auc(profile: Series, *structs: Series, size: int, min_data: int = 2)
Compute the area under the curve (AUC) of the receiver operating characteristic (ROC) at each position using a sliding window.
- Parameters:
profile (pandas.Series) – Mutational profile with one index per position, where each value is the mutation rate at the position.
*structs (pandas.Series) – Boolean series with one index per position, where each value is the pairing status of the base at the position; values that are None are ignored.
size (int) – Size of the window.
min_data (int = 2) – Minimum number of data points a window needs to be used; windows with fewer yield NaN.
- Returns:
AUC-ROC at each position.
- Return type:
pandas.Series
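A sliding-window AUC can be sketched without building full ROC curves by using the Mann-Whitney form of the AUC: the probability that a randomly chosen unpaired position has a higher mutation rate than a randomly chosen paired one, with ties counted as one half. This equals the trapezoidal area under the ROC curve. The hypothetical `rolling_auc` takes a single structure as a list, centers each window on its position, and truncates windows at the ends. Those choices, and returning NaN when a window lacks either class, are assumptions rather than documented behavior of `compute_rolling_auc`.

```python
import math


def window_auc(rates, unpaired):
    """AUC-ROC via the Mann-Whitney statistic (ties count one half)."""
    pos = [r for r, u in zip(rates, unpaired) if u]
    neg = [r for r, u in zip(rates, unpaired) if not u]
    if not pos or not neg:
        return math.nan
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rolling_auc(profile, is_paired, size, min_data=2):
    """Hypothetical sketch: AUC-ROC in a window centered on each position.

    profile: list of mutation rates; is_paired: list of True, False, or
    None (None marks positions to ignore).
    """
    half = size // 2
    result = []
    for center in range(len(profile)):
        start = center - half
        # Window spans start .. start + size - 1, truncated at both ends.
        window = [
            (profile[i], is_paired[i])
            for i in range(max(0, start), min(len(profile), start + size))
            if is_paired[i] is not None and not math.isnan(profile[i])
        ]
        if len(window) < min_data:
            result.append(math.nan)
        else:
            rates = [rate for rate, _ in window]
            unpaired = [not paired for _, paired in window]
            result.append(window_auc(rates, unpaired))
    return result


profile = [0.9, 0.1, 0.05, 0.2, 0.7]
is_paired = [False, True, False, True, False]
print(rolling_auc(profile, is_paired, size=3))  # [1.0, 0.5, 0.0, 0.5, 1.0]
```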
- class seismicrna.core.rna.state.RNAState(*, title: str, pairs: Iterable[tuple[int, int]], branch: str = '', **kwargs)
Bases: RNAStructure, RNAProfile
RNA secondary structure with mutation rates.
- calc_auc(terminal_pairs: bool = True)
Calculate the area under the ROC curve (AUC-ROC).
- Parameters:
terminal_pairs (bool) – Whether to count terminal base pairs as paired (if True) or as neither paired nor unpaired (if False).
- calc_auc_rolling(size: int, min_data: int = 2, terminal_pairs: bool = True)
Calculate the area under the ROC curve (AUC-ROC) at each position using a sliding window.
- calc_roc(terminal_pairs: bool = True)
Calculate the receiver operating characteristic (ROC) curve.
- Parameters:
terminal_pairs (bool) – Whether to count terminal base pairs as paired (if True) or as neither paired nor unpaired (if False).
- classmethod from_struct_profile(struct: RNAStructure, profile: RNAProfile)
Make an RNAState from an RNAStructure and an RNAProfile.
- class seismicrna.core.rna.struct.RNAStructure(*, title: str, pairs: Iterable[tuple[int, int]], branch: str = '', **kwargs)
Bases: RNARegion
Secondary structure of an RNA.
- property ct_data
Convert the connectivity table to a DataFrame.
- property ct_text
Connectivity table as text.
- property ct_title
Header line for the CT file.
- property db_string
Dot-bracket string (structure only).
- property db_title
Header line for the DB file.
- property dict
Map from each paired base to its partner.
- classmethod from_db_string(db_string: str, seq: DNA | RNA | str, *, seq5: int = 1, ref: str, reg: str, **kwargs)
Create an RNAStructure from a dot-bracket string.
- property init_args
Arguments needed to initialize a new instance.
- property is_paired
Whether each base is paired.
- property is_paired_internally
Whether each base is paired and between two other base pairs (no bulges or other unpaired bases next to it).
- property is_paired_terminally
Whether each base is paired and terminates a consecutive stretch of base pairs (i.e. is not internally paired).
- property is_unpaired
Whether each base is unpaired.
- iter_root_modules()
- property pairs
Base pairs in the structure.
- property roots
All pairs that are not contained by any other pair.
- class seismicrna.core.rna.struct.Rna2dPart(*regions: RNARegion, **kwargs)
Bases: object
Part of an RNA secondary structure.
- class seismicrna.core.rna.struct.Rna2dStem(side1: RNARegion, side2: RNARegion, **kwargs)
Bases: Rna2dPart
An RNA stem (contiguous double helix).
- property region3
- property region5
- class seismicrna.core.rna.struct.Rna2dStemLoop(region: RNARegion, **kwargs)
Bases: RnaJunction
An RNA loop at the end of a stem.
- property region
- class seismicrna.core.rna.struct.RnaJunction(*regions: RNARegion, **kwargs)
Bases: Rna2dPart
A junction between stems in an RNA structure.
- seismicrna.core.rna.struct.calc_wfmi(struct1: RNAStructure, struct2: RNAStructure, terminal_pairs: bool = True)
Weighted Fowlkes-Mallows index between two structures.
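For orientation, the plain (unweighted) Fowlkes-Mallows index of two base-pair sets is the number of shared pairs divided by the geometric mean of the two pair counts. The sketch below computes only that unweighted form. The weighting used by `calc_wfmi`, its `terminal_pairs` option, and its handling of structures with no pairs are not reproduced, and `fowlkes_mallows` is a hypothetical name.

```python
import math


def fowlkes_mallows(pairs1, pairs2):
    """Plain (unweighted) Fowlkes-Mallows index of two sets of base pairs.

    Equals sqrt(precision * recall) = shared / sqrt(len(set1) * len(set2)).
    Returns NaN if either structure has no pairs (an assumption of this sketch).
    """
    set1, set2 = set(pairs1), set(pairs2)
    if not set1 or not set2:
        return math.nan
    return len(set1 & set2) / math.sqrt(len(set1) * len(set2))


print(fowlkes_mallows([(1, 10), (2, 9)], [(1, 10), (3, 8)]))  # 0.5
```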