seismicrna.fold package

Submodules

seismicrna.fold.datapath.run_datapath()

Guess the DATAPATH for RNAstructure.

seismicrna.fold.load.find_ct_files(files: Iterable[str | Path])

Yield each CT file found among the given files and directories.
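
A search of this kind can be pictured with the sketch below. It is not the package's implementation: the name iter_ct_files, the recursive directory search, and the ".ct" extension check are all assumptions made for illustration.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable, Iterator


def iter_ct_files(paths: Iterable[str | Path]) -> Iterator[Path]:
    """Yield each .ct file given directly or found under a directory.

    Hypothetical helper: the real find_ct_files may apply different rules.
    """
    for path in map(Path, paths):
        if path.is_dir():
            # Search directories recursively, in a stable order.
            yield from sorted(path.rglob("*.ct"))
        elif path.suffix == ".ct":
            # A file is yielded only if it has the CT extension.
            yield path
```

Files without the ".ct" extension are skipped silently here; a stricter version could raise an error instead.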

seismicrna.fold.load.load_ct_structs(files: Iterable[str | Path])

Yield an RNA structure generator for each CT file.

seismicrna.fold.main.fold_profile(table: MaskPositionTableLoader | ClusterPositionTableLoader, regions: list[Region], quantile: float, n_procs: int, **kwargs)

Fold an RNA molecule from one table of reactivities.

seismicrna.fold.main.fold_region(rna: RNAProfile, *, out_dir: Path, quantile: float, fold_temp: float, fold_constraint: Path | None, fold_md: int, fold_mfe: bool, fold_max: int, fold_percent: float, force: bool, n_procs: int, tmp_pfx, keep_tmp, **kwargs)

Fold a region of an RNA from one mutational profile.

seismicrna.fold.main.load_foldable_tables(input_path: Iterable[str | Path])

Load tables that can be folded.

seismicrna.fold.main.run(input_path: Iterable[str | Path], *, fold_coords: Iterable[tuple[str, int, int]] = (), fold_primers: Iterable[tuple[str, DNA, DNA]] = (), fold_regions_file: str | None = None, fold_full: bool = True, quantile: float = 0.0, fold_temp: float = 310.15, fold_constraint: str | None = None, fold_md: int = 0, fold_mfe: bool = False, fold_max: int = 20, fold_percent: float = 20.0, tmp_pfx: str | Path = './tmp', keep_tmp: bool = False, max_procs: int = 4, force: bool = False)

Predict RNA secondary structures using mutation rates.

Parameters:
  • fold_coords (Iterable) – Fold a region of a reference given its 5’ and 3’ end coordinates [keyword-only, default: ()]

  • fold_primers (Iterable) – Fold a region of a reference given its forward and reverse primers [keyword-only, default: ()]

  • fold_regions_file (str | None) – Fold regions of references from coordinates/primers in a CSV file [keyword-only, default: None]

  • fold_full (bool) – If no regions are specified, whether to default to the full region or to the table’s region [keyword-only, default: True]

  • quantile (float) – Normalize and winsorize ratios to this quantile (0.0 disables) [keyword-only, default: 0.0]

  • fold_temp (float) – Predict structures at this temperature (Kelvin) [keyword-only, default: 310.15]

  • fold_constraint (str | None) – Force bases to be paired/unpaired from a file of constraints [keyword-only, default: None]

  • fold_md (int) – Limit base pair distances to this number of bases (0 for no limit) [keyword-only, default: 0]

  • fold_mfe (bool) – Predict only the minimum free energy (MFE) structure [keyword-only, default: False]

  • fold_max (int) – Output at most this many structures (overridden by --fold-mfe) [keyword-only, default: 20]

  • fold_percent (float) – Stop outputting structures when the % difference in energy exceeds this value (overridden by --fold-mfe) [keyword-only, default: 20.0]

  • tmp_pfx (str | Path) – Write all temporary files to a directory with this prefix [keyword-only, default: './tmp']

  • keep_tmp (bool) – Keep temporary files after finishing [keyword-only, default: False]

  • max_procs (int) – Run up to this many processes simultaneously [keyword-only, default: 4]

  • force (bool) – Force all tasks to run, overwriting any existing output files [keyword-only, default: False]
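
The quantile parameter can be read as "divide every ratio by the value at this quantile, then cap the results at 1". The sketch below illustrates that reading only. The function name, the linear interpolation of the quantile, and the handling of NaN values are assumptions, and the normalization in SEISMIC-RNA may differ in its details.

```python
import math


def normalize_winsorize(ratios: list[float], quantile: float) -> list[float]:
    """Scale ratios so the value at `quantile` becomes 1, then cap at 1.

    Hypothetical illustration; a quantile of 0.0 leaves the ratios unchanged.
    """
    if quantile == 0.0:
        return list(ratios)
    finite = sorted(r for r in ratios if not math.isnan(r))
    if not finite:
        return list(ratios)
    # Quantile by linear interpolation between the two nearest ranks.
    pos = quantile * (len(finite) - 1)
    lower, upper = math.floor(pos), math.ceil(pos)
    scale = finite[lower] + (finite[upper] - finite[lower]) * (pos - lower)
    if scale <= 0.0:
        # Nothing sensible to scale by; return the ratios unchanged.
        return list(ratios)
    # Normalize, then winsorize: values above the quantile are set to 1.
    return [r if math.isnan(r) else min(r / scale, 1.0) for r in ratios]
```

With quantile=1.0 the largest ratio maps to 1 and nothing is capped; lower quantiles cap progressively more of the highest ratios.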

class seismicrna.fold.report.FoldReport(**kwargs: Any | Callable[[Report], Any])

Bases: Report

classmethod auto_fields()

Names and automatic values of selected fields.

classmethod dir_seg_types()

Types of the directory segments in the path.

classmethod fields()

All fields of the report.

classmethod file_seg_type()

Type of the last segment in the path.

Wrapper around RNAstructure from the Mathews Lab at the University of Rochester: https://rna.urmc.rochester.edu/RNAstructure.html

exception seismicrna.fold.rnastructure.ConnectivityTableAlreadyRetitledError

Bases: RuntimeError

A CT file was already retitled.

exception seismicrna.fold.rnastructure.RNAStructureConnectivityTableTitleLineFormatError

Bases: ValueError

Error in the format of a CT title line from RNAstructure.

seismicrna.fold.rnastructure.check_data_path(data_path: str | Path | None = None) Path

Confirm the DATAPATH environment variable indicates the correct directory.
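
As a rough illustration of such a check, the sketch below reads DATAPATH from the environment when no path is given and verifies only that it names an existing directory. The function name and the messages are hypothetical, and the sketch does not inspect the directory's contents, which a real check would presumably do.

```python
from __future__ import annotations

import os
from pathlib import Path


def data_path_problem(data_path: str | Path | None = None) -> str:
    """Return "" if the directory looks usable, else describe the problem.

    Hypothetical helper: it checks only that the directory exists.
    """
    if data_path is None:
        # Fall back to the DATAPATH environment variable.
        data_path = os.environ.get("DATAPATH")
        if not data_path:
            return "DATAPATH is not set"
    path = Path(data_path)
    if not path.is_dir():
        return f"{path} is not a directory"
    return ""
```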

seismicrna.fold.rnastructure.fold(rna: RNAProfile, *, fold_temp: float = 310.15, fold_constraint: Path | None = None, fold_md: int = 0, fold_mfe: bool = False, fold_max: int = 20, fold_percent: float = 20.0, out_dir: Path = './out', tmp_dir: Path, keep_tmp: bool = False, n_procs: int = 4)

Run the ‘Fold’ or ‘Fold-smp’ program of RNAstructure.

Parameters:
  • fold_temp (float) – Predict structures at this temperature (Kelvin) [keyword-only, default: 310.15]

  • fold_constraint (Path | None) – Force bases to be paired/unpaired from a file of constraints [keyword-only, default: None]

  • fold_md (int) – Limit base pair distances to this number of bases (0 for no limit) [keyword-only, default: 0]

  • fold_mfe (bool) – Predict only the minimum free energy (MFE) structure [keyword-only, default: False]

  • fold_max (int) – Output at most this many structures (overridden by --fold-mfe) [keyword-only, default: 20]

  • fold_percent (float) – Stop outputting structures when the % difference in energy exceeds this value (overridden by --fold-mfe) [keyword-only, default: 20.0]

  • out_dir (Path) – Write all output files to this directory [keyword-only, default: './out']

  • keep_tmp (bool) – Keep temporary files after finishing [keyword-only, default: False]
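
The interaction of fold_mfe, fold_max, and fold_percent can be sketched as a filter over predicted energies. RNAstructure applies these limits itself, so this is an illustration only: the function name is hypothetical, and measuring the percent difference relative to the magnitude of the minimum free energy is an assumption.

```python
def select_structures(energies: list[float],
                      fold_mfe: bool = False,
                      fold_max: int = 20,
                      fold_percent: float = 20.0) -> list[float]:
    """Return the energies of the structures that would be output.

    Hypothetical illustration of how the three options interact.
    """
    ranked = sorted(energies)
    if not ranked:
        return []
    if fold_mfe:
        # fold_mfe overrides both fold_max and fold_percent.
        return ranked[:1]
    mfe = ranked[0]
    kept = []
    for energy in ranked[:fold_max]:
        # Percent difference from the MFE (assumed relative to |MFE|).
        if mfe != 0.0 and 100.0 * (energy - mfe) / abs(mfe) > fold_percent:
            break
        kept.append(energy)
    return kept
```

For example, with energies of -10.0, -9.5, -8.5, and -7.0 kcal/mol and the default fold_percent of 20.0, the last structure differs from the minimum by 30% and is dropped.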

seismicrna.fold.rnastructure.format_retitled_ct_line(length: int, ref: str, uniqid: int, energy: float)

Format a new CT title line including unique identifiers:

{length} {ref} #{uniqid}: {energy}

where {length} is the number of positions in the structure (required for all CT files), {ref} is the name of the reference, {uniqid} is the unique identifier, and {energy} is the free energy of folding.

Parameters:
  • length (int) – Number of positions in the structure.

  • ref (str) – Name of the reference.

  • uniqid (int) – Unique identifier (non-negative integer).

  • energy (float) – Free energy of folding (kcal/mol).

Returns:

Formatted CT title line.

Return type:

str
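
The title format described above can be reproduced with a single f-string. This is a minimal sketch, not the package's code: the name format_title is hypothetical, and the single-space separators and the absence of a trailing newline are assumptions.

```python
def format_title(length: int, ref: str, uniqid: int, energy: float) -> str:
    """Build a title of the form '{length} {ref} #{uniqid}: {energy}'."""
    return f"{length} {ref} #{uniqid}: {energy}"


# Title for the first structure (identifier 0) of a 120-nt reference.
title = format_title(120, "myref", 0, -35.2)
```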

seismicrna.fold.rnastructure.guess_data_path()

Guess the DATAPATH.

seismicrna.fold.rnastructure.make_fold_cmd(fasta_file: Path, ct_file: Path, *, dms_file: Path | None, fold_constraint: Path | None, fold_temp: float, fold_md: int, fold_mfe: bool, fold_max: int, fold_percent: float, n_procs: int = 1)

seismicrna.fold.rnastructure.parse_energy(line: str)

Parse the predicted free energy of folding from a line in this format:

{length} {ref} #{uniqid}: {energy}

where {length} is the number of positions in the structure (required for all CT files), {ref} is the name of the reference, {uniqid} is the unique identifier, and {energy} is the free energy of folding.

Parameters:

line (str) – Line from which to parse the energy.

Returns:

Free energy of folding.

Return type:

float
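
Because the energy is the last field and follows the final colon, a parser for this format can be very short. The sketch below is an assumption about one workable approach, with a hypothetical function name; it is not the package's implementation.

```python
def parse_energy_sketch(line: str) -> float:
    """Parse the energy from '{length} {ref} #{uniqid}: {energy}'."""
    # Split on the last colon so that colons in the reference name are safe.
    # A line without any colon raises ValueError when unpacking.
    _, energy = line.rsplit(":", 1)
    # float() ignores surrounding whitespace, including a trailing newline.
    return float(energy)
```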

seismicrna.fold.rnastructure.parse_rnastructure_ct_title(line: str)

Parse a title in a CT file from RNAstructure, in this format:

{length} ENERGY = {energy} {ref}

where {length} is the number of positions in the structure, {ref} is the name of the reference, and {energy} is the predicted free energy of folding. Also handle the edge case in which RNAstructure predicts no base pairs (and thus does not write the free energy) by returning 0 as the energy.

Parameters:

line (str) – Line containing the title of the structure.

Returns:

Tuple of number of positions in the structure, predicted free energy of folding, and name of the reference sequence.

Return type:

tuple[int, float, str]
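
Both the usual title and the edge case without an energy can be handled with two regular expressions, as in the sketch below. The patterns and the function name are assumptions for illustration; the exact whitespace that RNAstructure writes is not specified here, so the patterns accept any run of whitespace between fields.

```python
import re

# Usual title: "{length} ENERGY = {energy} {ref}"
WITH_ENERGY = re.compile(r"\s*(\d+)\s+ENERGY = (\S+)\s+(\S.*?)\s*$")
# Edge case with no base pairs: "{length} {ref}"
NO_ENERGY = re.compile(r"\s*(\d+)\s+(\S.*?)\s*$")


def parse_title_sketch(line: str) -> tuple[int, float, str]:
    """Return (length, energy, ref) parsed from a CT title line."""
    if match := WITH_ENERGY.match(line):
        length, energy, ref = match.groups()
        return int(length), float(energy), ref
    if match := NO_ENERGY.match(line):
        # No energy was written, so report an energy of 0.
        length, ref = match.groups()
        return int(length), 0.0, ref
    raise ValueError(f"Invalid CT title line: {line!r}")
```

Trying the pattern with an energy first matters: the second pattern would otherwise match a full title and treat "ENERGY = ..." as part of the reference name.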

seismicrna.fold.rnastructure.require_data_path()

Return an error message if the DATAPATH is not valid.

seismicrna.fold.rnastructure.retitle_ct(ct_input: Path, ct_output: Path, force: bool = False)

Retitle the structures in a CT file produced by RNAstructure.

The default titles follow this format:

ENERGY = {energy} {reference}

where {reference} is the name of the reference sequence and {energy} is the predicted free energy of folding.

The major problem with this format is that structures can have equal predicted free energies, so the titles of the structures can repeat, which would cause some functions (e.g. graphing ROC curves) to fail.

This function assigns a unique integer to each structure (starting with 0 for the minimum free energy and continuing upwards), which ensures that no two structures have identical titles.

Parameters:
  • ct_input (Path) – Path of the CT file to retitle.

  • ct_output (Path) – Path of the CT file to which to write the retitled information.

  • force (bool = False) – Overwrite the output CT file if it already exists.
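
The retitling scheme can be sketched on a list of lines held in memory. This is not the package's implementation: the function name and regular expression are hypothetical, file handling and the force option are omitted, and the sketch assumes that each title is followed by exactly {length} body lines and that structures appear in order of increasing free energy.

```python
import re

# Matches "{length} ENERGY = {energy} {ref}" and "{length} {ref}".
TITLE = re.compile(r"\s*(\d+)\s+(?:ENERGY = (\S+)\s+)?(\S.*?)\s*$")


def retitle_lines(lines: list[str]) -> list[str]:
    """Give every structure in a CT file a title with a unique integer."""
    retitled = []
    uniqid = 0
    index = 0
    while index < len(lines):
        if not lines[index].strip():
            # Skip blank lines between or after structures.
            index += 1
            continue
        match = TITLE.match(lines[index])
        if match is None:
            raise ValueError(f"Invalid CT title line: {lines[index]!r}")
        length = int(match.group(1))
        # A missing energy means no base pairs were predicted.
        energy = float(match.group(2) or 0.0)
        ref = match.group(3)
        retitled.append(f"{length} {ref} #{uniqid}: {energy}")
        # Copy the body unchanged: one line per position.
        retitled.extend(lines[index + 1: index + 1 + length])
        index += 1 + length
        uniqid += 1
    return retitled
```

Two structures with the same energy and reference then receive titles ending in "#0: ..." and "#1: ...", so the titles no longer repeat.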