seismicrna.core.rna package

Subpackages

Submodules

class seismicrna.core.rna.base.RNARegion(*, region: Region, **kwargs)

Bases: object

Region of an RNA sequence.

property end3

Position of the 3’ end of the region.

property end5

Position of the 5’ end of the region.

property init_args

Arguments needed to initialize a new instance.

property ref

Name of the reference sequence.

property reg

Name of the region.

renumber_from(seq5: int)

Return a new RNARegion renumbered starting from a position.

Parameters:

seq5 (int) – Position from which to start the new numbering system.

Returns:

RNARegion with renumbered positions.

Return type:

RNARegion

property seq

Sequence of the region as RNA.

property seq_record

subregion(end5: int, end3: int)

seismicrna.core.rna.convert.run_ct_to_db(input_path: Iterable[str | Path], *, force: bool = False, max_procs: int = 4)

Convert connectivity table (CT) to dot-bracket (DB) files.

Parameters:
  • force (bool) – Force all tasks to run, overwriting any existing output files [keyword-only, default: False]

  • max_procs (int) – Run up to this many processes simultaneously [keyword-only, default: 4]

seismicrna.core.rna.convert.run_db_to_ct(input_path: Iterable[str | Path], *, force: bool = False, max_procs: int = 4)

Convert dot-bracket (DB) to connectivity table (CT) files.

Parameters:
  • force (bool) – Force all tasks to run, overwriting any existing output files [keyword-only, default: False]

  • max_procs (int) – Run up to this many processes simultaneously [keyword-only, default: 4]

seismicrna.core.rna.ct.parse_ct(ct_path: Path)

Yield the title, region, and base pairs for each structure in a connectivity table (CT) file.

Parameters:

ct_path (Path) – Path of the CT file.

Return type:

Generator[tuple[str, Region, list[tuple[int, int]]], Any, None]
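
As an illustration of the kind of data such a parser reads, here is a minimal, self-contained sketch that does not use seismicrna. It assumes the conventional CT layout: a header line with the sequence length and a title, then one row per base with the index, the base, the previous index, the next index, the index of the pairing partner (0 if unpaired), and the natural number. The function name sketch_parse_ct_text is hypothetical; it reads only the first structure from a string and returns the sequence as plain text rather than a Region.

```python
def sketch_parse_ct_text(ct_text):
    """Parse the first structure in CT-formatted text (illustrative only)."""
    # Ignore blank lines so trailing newlines do not matter.
    lines = [line for line in ct_text.splitlines() if line.strip()]
    # Header: sequence length, then the title (which may contain spaces).
    header = lines[0].split(maxsplit=1)
    length = int(header[0])
    title = header[1].strip() if len(header) > 1 else ""
    bases = []
    pairs = []
    for line in lines[1: length + 1]:
        fields = line.split()
        index, base, partner = int(fields[0]), fields[1], int(fields[4])
        bases.append(base)
        # Each pair appears twice in a CT file; keep it once, 5' side first.
        if partner > index:
            pairs.append((index, partner))
    return title, "".join(bases), pairs


ct_text = "\n".join([
    "    5  hairpin",
    "    1 G 0 2 5 1",
    "    2 A 1 3 0 2",
    "    3 A 2 4 0 3",
    "    4 A 3 5 0 4",
    "    5 C 4 0 1 5",
])
print(sketch_parse_ct_text(ct_text))  # ('hairpin', 'GAAAC', [(1, 5)])
```

The sketch pairs positions by the index column; a real parser may instead use the natural numbering in the last column.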

seismicrna.core.rna.db.format_db_structure(pairs: Iterable[tuple[int, int]], length: int, seq5: int = 1)

Create a dot-bracket string from a list of base pairs.
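
To make the relationship between base pairs and dot-bracket notation concrete, here is a minimal sketch written independently of seismicrna. The name sketch_format_db is hypothetical, and the sketch assumes that all pairs are nested (no pseudoknots) and that seq5 is the number of the first position.

```python
def sketch_format_db(pairs, length, seq5=1):
    """Build a dot-bracket string from nested base pairs (illustrative only)."""
    # Start with every position unpaired.
    chars = ["."] * length
    for pos5, pos3 in pairs:
        # Both positions must lie in the sequence, with the 5' side first.
        if not seq5 <= pos5 < pos3 < seq5 + length:
            raise ValueError(f"Invalid pair: {(pos5, pos3)}")
        chars[pos5 - seq5] = "("
        chars[pos3 - seq5] = ")"
    return "".join(chars)


print(sketch_format_db([(1, 9), (2, 8), (3, 7)], length=9))  # (((...)))
print(sketch_format_db([(11, 15)], length=6, seq5=10))       # .(...)
```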

seismicrna.core.rna.db.parse_db(db_path: Path, seq5: int = 1)

Yield the title, region, and base pairs for each structure in a dot-bracket (DB) file.

Parameters:
  • db_path (Path) – Path of the DB file.

  • seq5 (int = 1) – Number to give the 5’ position of the sequence.

Return type:

Generator[tuple[str, Region, list[tuple[int, int]]], Any, None]

seismicrna.core.rna.db.parse_db_strings(db_path: Path)

Return the sequence and structures from a dot-bracket file.

seismicrna.core.rna.db.parse_db_structure(struct: str, seq5: int = 1)

Parse a dot-bracket structure into a list of base pairs.
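
The reverse conversion is commonly done with a stack. The following sketch is not the package's implementation; sketch_parse_db is a hypothetical name, and it assumes the structure uses only ".", "(" and ")".

```python
def sketch_parse_db(struct, seq5=1):
    """Parse a simple dot-bracket string into base pairs (illustrative only)."""
    stack = []  # positions of "(" that have not been closed yet
    pairs = []
    for pos, char in enumerate(struct, start=seq5):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError(f"Unmatched ')' at position {pos}")
            # The most recent unclosed "(" pairs with this ")".
            pairs.append((stack.pop(), pos))
        elif char != ".":
            raise ValueError(f"Unexpected character {char!r} at position {pos}")
    if stack:
        raise ValueError(f"Unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


print(sketch_parse_db("((..))"))        # [(1, 6), (2, 5)]
print(sketch_parse_db("(.)", seq5=10))  # [(10, 12)]
```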

seismicrna.core.rna.io.ct_to_db(ct_path: Path, db_path: Path | None = None, force: bool = False)

Write a dot-bracket (DB) file of structures in a connectivity table (CT) file.

seismicrna.core.rna.io.db_to_ct(db_path: Path, ct_path: Path | None = None, force: bool = False)

Write a connectivity table (CT) file of structures in a dot-bracket (DB) file.

seismicrna.core.rna.io.find_ct_region(ct_path: Path) → Region

Region shared among all structures in a CT file.

seismicrna.core.rna.io.from_ct(ct_path: Path)

Yield an instance of an RNAStructure for each structure in a connectivity table (CT) file.

Parameters:

ct_path (Path) – Path of the CT file.

Returns:

RNA secondary structures from the CT file.

Return type:

Generator[RNAStructure, Any, None]

seismicrna.core.rna.io.from_db(db_path: Path, seq5: int = 1)

Yield an instance of an RNAStructure for each structure in a dot-bracket (DB) file.

Parameters:
  • db_path (Path) – Path of the DB file.

  • seq5 (int = 1) – Number to give the 5’ position of the sequence.

Returns:

RNA secondary structures from the DB file.

Return type:

Generator[RNAStructure, Any, None]

seismicrna.core.rna.io.renumber_ct(ct_in: Path, ct_out: Path, seq5: int, force: bool = False)

Renumber the last column of a connectivity table (CT) file.

Parameters:
  • ct_in (Path) – Path of the input CT file.

  • ct_out (Path) – Path of the output CT file.

  • seq5 (int) – Number to give the 5’ position in the renumbered CT file.

  • force (bool = False) – Overwrite the output CT file if it already exists.

seismicrna.core.rna.io.to_ct(structures: Iterable[RNAStructure], ct_path: Path, force: bool = False)

Write a connectivity table (CT) file of RNA structures.

Parameters:
  • structures (Iterable[RNAStructure]) – RNA structures to write to the CT file.

  • ct_path (Path) – Path of the CT file.

  • force (bool = False) – Overwrite the output CT file if it already exists.

seismicrna.core.rna.io.to_db(structures: Iterable[RNAStructure], db_path: Path, force: bool = False)

Write a dot-bracket (DB) file of RNA structures.

Parameters:
  • structures (Iterable[RNAStructure]) – RNA structures to write to the DB file.

  • db_path (Path) – Path of the DB file.

  • force (bool = False) – Overwrite the output DB file if it already exists.

seismicrna.core.rna.pair.dict_to_pairs(pair_dict: dict[int, int])

Tuples of the 5’ and 3’ position in each pair.

seismicrna.core.rna.pair.dict_to_table(pair_dict: dict[int, int], region: Region)

Series of every position in the region and the base to which it pairs, or 0 if it does not pair.

seismicrna.core.rna.pair.find_enclosing_pairs(table: Series)

Find the base pair that encloses each position.

seismicrna.core.rna.pair.find_root_pairs(pairs: Iterable[tuple[int, int]], assume_nested: bool = False)

Return all pairs that are not contained in any other pair.
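
As a sketch of what "root" means here, the snippet below treats a pair (a, b) as contained in another pair (c, d) when c < a and b < d. It is an independent, quadratic-time illustration under that assumption; sketch_find_root_pairs is a hypothetical name and the sketch does not model the assume_nested option.

```python
def sketch_find_root_pairs(pairs):
    """Return the pairs not enclosed by any other pair (illustrative only)."""
    unique = sorted(set(pairs))
    roots = []
    for a, b in unique:
        # (a, b) is a root if no other pair (c, d) strictly encloses it.
        if not any(c < a and b < d for c, d in unique):
            roots.append((a, b))
    return roots


pairs = [(1, 10), (2, 9), (12, 20), (13, 19)]
print(sketch_find_root_pairs(pairs))  # [(1, 10), (12, 20)]
```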

seismicrna.core.rna.pair.map_nested(pairs: Iterable[tuple[int, int]])

Map each pair to the pair in which it is nested.

seismicrna.core.rna.pair.pairs_to_dict(pairs: Iterable[tuple[int, int]])

Return a dictionary that maps each position to the base to which it pairs and contains no key for unpaired positions.
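
A minimal sketch of such a mapping, written without seismicrna, is shown below. The name sketch_pairs_to_dict is hypothetical, and raising an error when a position appears in more than one pair is an assumption of this sketch.

```python
def sketch_pairs_to_dict(pairs):
    """Map each paired position to its partner (illustrative only)."""
    pair_dict = {}
    for pos5, pos3 in pairs:
        # Record the pair in both directions.
        for pos, partner in ((pos5, pos3), (pos3, pos5)):
            if pos in pair_dict:
                raise ValueError(f"Position {pos} is in more than one pair")
            pair_dict[pos] = partner
    return pair_dict


print(sketch_pairs_to_dict([(1, 6), (2, 5)]))  # {1: 6, 6: 1, 2: 5, 5: 2}
```

Unpaired positions (3 and 4 in this example) have no key.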

seismicrna.core.rna.pair.pairs_to_table(pairs: Iterable[tuple[int, int]], region: Region)

Series of every position in the region and the base to which it pairs, or 0 if it does not pair.

seismicrna.core.rna.pair.renumber_pairs(pairs: Iterable[tuple[int, int]], offset: int)

Renumber pairs by offsetting each number.

Parameters:
  • pairs (Iterable[tuple[int, int]]) – Pairs to renumber.

  • offset (int) – Offset by which to change the numbering.

Returns:

Renumbered pairs, in the same order as given.

Return type:

Generator[tuple[int, int], Any, None]
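
The behavior described above can be sketched as a small generator. This is an independent illustration, not the package's code; sketch_renumber_pairs is a hypothetical name and the sketch performs no validation of the resulting positions.

```python
def sketch_renumber_pairs(pairs, offset):
    """Yield each pair with both positions shifted by offset (illustrative only)."""
    for pos5, pos3 in pairs:
        yield pos5 + offset, pos3 + offset


# Shift a structure numbered from 1 so that it starts at position 11.
print(list(sketch_renumber_pairs([(1, 6), (2, 5)], offset=10)))  # [(11, 16), (12, 15)]
```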

seismicrna.core.rna.pair.table_to_dict(table: Series)

Dictionary of the 5’ and 3’ position in each pair.

seismicrna.core.rna.pair.table_to_pairs(table: Series)

Tuples of the 5’ and 3’ position in each pair.

class seismicrna.core.rna.profile.RNAProfile(*, sample: str, data_reg: str, data_name: str, data: Series, **kwargs)

Bases: RNARegion

Mutational profile of an RNA.

get_ct_file(top: Path)

Get the path to the connectivity table (CT) file.

Parameters:

top (pathlib.Path) – Top-level directory.

Returns:

Path of the file.

Return type:

pathlib.Path

get_db_file(top: Path)

Get the path to the dot-bracket (DB) file.

Parameters:

top (pathlib.Path) – Top-level directory.

Returns:

Path of the file.

Return type:

pathlib.Path

get_dms_file(top: Path)

Get the path to the DMS data file.

Parameters:

top (pathlib.Path) – Top-level directory.

Returns:

Path of the DMS data file.

Return type:

pathlib.Path

get_fasta(top: Path)

Get the path to the FASTA file.

Parameters:

top (pathlib.Path) – Top-level directory.

Returns:

Path of the file.

Return type:

pathlib.Path

get_varna_color_file(top: Path)

Get the path to the VARNA color file.

Parameters:

top (pathlib.Path) – Top-level directory.

Returns:

Path of the file.

Return type:

pathlib.Path

property init_args

Arguments needed to initialize a new instance.

property profile

Name of the mutational profile.

to_dms(top: Path)

Write the DMS reactivities to a DMS file.

Parameters:

top (pathlib.Path) – Top-level directory.

Returns:

File into which the DMS reactivities were written.

Return type:

pathlib.Path

to_fasta(top: Path)

Write the RNA sequence to a FASTA file.

Parameters:

top (pathlib.Path) – Top-level directory.

Returns:

File into which the RNA sequence was written.

Return type:

pathlib.Path

to_varna_color_file(top: Path)

Write the VARNA colors to a file.

Parameters:

top (pathlib.Path) – Top-level directory.

Returns:

File into which the VARNA colors were written.

Return type:

pathlib.Path

seismicrna.core.rna.roc.compute_auc(fpr: ndarray, tpr: ndarray)

Compute the area under the curve (AUC) of the receiver operating characteristic (ROC).

Parameters:
  • fpr (numpy.ndarray) – False positive rate (FPR) of the ROC curve.

  • tpr (numpy.ndarray) – True positive rate (TPR) of the ROC curve.

Returns:

AUC-ROC

Return type:

float
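
One common way to compute the area under a curve given as points is the trapezoidal rule. The sketch below uses plain Python sequences instead of NumPy arrays and assumes the points are ordered by increasing FPR; whether the package uses this exact method is not stated here, and sketch_auc is a hypothetical name.

```python
def sketch_auc(fpr, tpr):
    """Area under an ROC curve by the trapezoidal rule (illustrative only)."""
    if len(fpr) != len(tpr):
        raise ValueError("fpr and tpr must have the same length")
    area = 0.0
    for i in range(1, len(fpr)):
        # Width of the step times the mean height of its two ends.
        area += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
    return area


print(sketch_auc([0.0, 0.0, 1.0], [0.0, 1.0, 1.0]))  # 1.0 (perfect separation)
print(sketch_auc([0.0, 1.0], [0.0, 1.0]))            # 0.5 (diagonal)
```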

seismicrna.core.rna.roc.compute_auc_roc(paired: Series, profile: Series)

Compute the receiver operating characteristic (ROC) and the area under the curve (AUC) to indicate how well mutation data agree with a structure.

Parameters:
  • paired (pandas.Series) – Boolean series with one index per position, where each value is True if the base at the position is paired, otherwise False.

  • profile (pandas.Series) – Mutational profile with one index per position, where each value is the mutation rate at the position.

Returns:

AUC-ROC

Return type:

float

seismicrna.core.rna.roc.compute_roc_curve(paired: Series, profile: Series)

Compute the receiver operating characteristic (ROC) curve to indicate how well mutation data agree with a structure.

Parameters:
  • paired (pandas.Series) – Boolean series with one index per position, where each value is True if the base at the position is paired, otherwise False.

  • profile (pandas.Series) – Mutational profile with one index per position, where each value is the mutation rate at the position.

Returns:

FPR and TPR axes, respectively, of the ROC curve.

Return type:

tuple[numpy.ndarray, numpy.ndarray]
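
The idea behind such a curve can be sketched in plain Python. The sketch makes assumptions that are not confirmed by the text above: unpaired bases are treated as the positive class, a higher mutation rate is taken as evidence that a base is unpaired, and positions with NaN rates are skipped. It uses dicts keyed by position instead of pandas Series, and sketch_roc_curve is a hypothetical name.

```python
import math


def sketch_roc_curve(paired, profile):
    """FPR and TPR lists for mutation rates versus pairing (illustrative only)."""
    # Keep positions with both a pairing state and a non-missing rate,
    # ordered from the highest to the lowest mutation rate.
    scored = sorted(
        ((rate, paired[pos]) for pos, rate in profile.items()
         if pos in paired and not math.isnan(rate)),
        key=lambda item: item[0],
        reverse=True,
    )
    n_unpaired = sum(1 for _, is_paired in scored if not is_paired)
    n_paired = len(scored) - n_unpaired
    if n_paired == 0 or n_unpaired == 0:
        raise ValueError("Need at least one paired and one unpaired position")
    fpr, tpr = [0.0], [0.0]
    true_pos = false_pos = 0
    for i, (rate, is_paired) in enumerate(scored):
        if is_paired:
            false_pos += 1  # paired base called unpaired at this threshold
        else:
            true_pos += 1   # unpaired base called unpaired at this threshold
        # Add a point only after the last position that shares this rate.
        if i + 1 == len(scored) or scored[i + 1][0] != rate:
            fpr.append(false_pos / n_paired)
            tpr.append(true_pos / n_unpaired)
    return fpr, tpr


paired = {1: True, 2: False, 3: False, 4: True}
profile = {1: 0.01, 2: 0.20, 3: 0.15, 4: 0.02}
fpr, tpr = sketch_roc_curve(paired, profile)
print(fpr)  # [0.0, 0.0, 0.0, 0.5, 1.0]
print(tpr)  # [0.0, 0.5, 1.0, 1.0, 1.0]
```

In this example both unpaired positions have higher mutation rates than both paired positions, so the curve reaches a TPR of 1.0 before any false positive occurs.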

seismicrna.core.rna.roc.compute_rolling_auc(paired: Series, profile: Series, size: int, min_data: int = 2)

Compute the area under the curve (AUC) of the receiver operating characteristic (ROC) at each position using a sliding window.

Parameters:
  • paired (pandas.Series) – Boolean series with one index per position, where each value is True if the base at the position is paired, otherwise False.

  • profile (pandas.Series) – Mutational profile with one index per position, where each value is the mutation rate at the position.

  • size (int) – Size of the window.

  • min_data (int = 2) – Minimum number of data points a window must contain to be used (otherwise the result is NaN).

Returns:

AUC-ROC at each position.

Return type:

pandas.Series

class seismicrna.core.rna.state.RNAState(*, title: str, pairs: Iterable[tuple[int, int]], **kwargs)

Bases: RNAStructure, RNAProfile

RNA secondary structure with mutation rates.

property auc

classmethod from_struct_profile(struct: RNAStructure, profile: RNAProfile)

Make an RNAState from an RNAStructure and an RNAProfile.

property roc

rolling_auc(size: int, min_data: int = 2)

class seismicrna.core.rna.struct.RNAStructure(*, title: str, pairs: Iterable[tuple[int, int]], **kwargs)

Bases: RNARegion

Secondary structure of an RNA.

property ct_data

Convert the connectivity table to a DataFrame.

property ct_text

Connectivity table as text.

property ct_title

Header line for the CT file.

property db_structure

Dot-bracket string (structure only).

property db_title

Header line for the DB file.

property dict

get_db_text(sequence: bool)

Dot-bracket record.

property init_args

Arguments needed to initialize a new instance.

property is_paired

Series where each index is a position and each value is True if the corresponding base is paired, otherwise False.

iter_root_modules()

property pairs

Base pairs in the structure.

property roots

class seismicrna.core.rna.struct.Rna2dPart(*regions: RNARegion, **kwargs)

Bases: object

Part of an RNA secondary structure.

class seismicrna.core.rna.struct.Rna2dStem(side1: RNARegion, side2: RNARegion, **kwargs)

Bases: Rna2dPart

An RNA stem (contiguous double helix).

property region3

property region5

class seismicrna.core.rna.struct.Rna2dStemLoop(region: RNARegion, **kwargs)

Bases: RnaJunction

An RNA loop at the end of a stem.

property region

class seismicrna.core.rna.struct.RnaJunction(*regions: RNARegion, **kwargs)

Bases: Rna2dPart

A junction between stems in an RNA structure.