seismicrna.fold package
Submodules
- seismicrna.fold.datapath.run_datapath()
Guess the DATAPATH for RNAstructure.
- seismicrna.fold.load.find_ct_files(files: Iterable[str | Path])
Yield a file for each given file/directory of a table.
- seismicrna.fold.load.load_ct_structs(files: Iterable[str | Path])
Yield an RNA structure generator for each CT file.
- seismicrna.fold.main.fold_profile(table: MaskPositionTableLoader | ClusterPositionTableLoader, regions: list[Region], quantile: float, n_procs: int, **kwargs)
Fold an RNA molecule from one table of reactivities.
- seismicrna.fold.main.fold_region(rna: RNAProfile, *, out_dir: Path, quantile: float, fold_temp: float, fold_constraint: Path | None, fold_md: int, fold_mfe: bool, fold_max: int, fold_percent: float, force: bool, n_procs: int, tmp_pfx, keep_tmp, **kwargs)
Fold a region of an RNA from one mutational profile.
- seismicrna.fold.main.load_foldable_tables(input_path: Iterable[str | Path])
Load tables that can be folded.
- seismicrna.fold.main.run(input_path: Iterable[str | Path], *, fold_coords: Iterable[tuple[str, int, int]] = (), fold_primers: Iterable[tuple[str, DNA, DNA]] = (), fold_regions_file: str | None = None, fold_full: bool = True, quantile: float = 0.0, fold_temp: float = 310.15, fold_constraint: str | None = None, fold_md: int = 0, fold_mfe: bool = False, fold_max: int = 20, fold_percent: float = 20.0, tmp_pfx: str | Path = './tmp', keep_tmp: bool = False, max_procs: int = 4, force: bool = False)
Predict RNA secondary structures using mutation rates.
- Parameters:
fold_coords (Iterable) – Fold a region of a reference given its 5’ and 3’ end coordinates [keyword-only, default: ()]
fold_primers (Iterable) – Fold a region of a reference given its forward and reverse primers [keyword-only, default: ()]
fold_regions_file (str | None) – Fold regions of references from coordinates/primers in a CSV file [keyword-only, default: None]
fold_full (bool) – If no regions are specified, whether to default to the full region or to the table’s region [keyword-only, default: True]
quantile (float) – Normalize and winsorize ratios to this quantile (0.0 disables) [keyword-only, default: 0.0]
fold_temp (float) – Predict structures at this temperature (Kelvin) [keyword-only, default: 310.15]
fold_constraint (str | None) – Force bases to be paired/unpaired from a file of constraints [keyword-only, default: None]
fold_md (int) – Limit base pair distances to this number of bases (0 for no limit) [keyword-only, default: 0]
fold_mfe (bool) – Predict only the minimum free energy (MFE) structure [keyword-only, default: False]
fold_max (int) – Output at most this many structures (overridden by --fold-mfe) [keyword-only, default: 20]
fold_percent (float) – Stop outputting structures when the % difference in energy exceeds this value (overridden by --fold-mfe) [keyword-only, default: 20.0]
tmp_pfx (str | Path) – Write all temporary files to a directory with this prefix [keyword-only, default: './tmp']
keep_tmp (bool) – Keep temporary files after finishing [keyword-only, default: False]
max_procs (int) – Run up to this many processes simultaneously [keyword-only, default: 4]
force (bool) – Force all tasks to run, overwriting any existing output files [keyword-only, default: False]
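Because fold_temp is given in Kelvin, a temperature known in degrees Celsius has to be converted first. The following is a minimal sketch of assembling keyword arguments for run(); the helper celsius_to_kelvin and the input path in the commented-out call are hypothetical, and only the keyword names and defaults come from the signature above.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert degrees Celsius to Kelvin (hypothetical helper)."""
    return celsius + 273.15


# Keyword arguments named after the documented parameters of run().
fold_kwargs = {
    "fold_temp": celsius_to_kelvin(37.0),  # about 310.15 K, the default
    "fold_md": 0,          # 0 means no limit on base pair distance
    "fold_mfe": False,     # keep suboptimal structures as well as the MFE
    "fold_max": 20,        # at most this many structures
    "fold_percent": 20.0,  # energy window, in percent
    "quantile": 0.0,       # 0.0 disables normalization/winsorization
}

# Assumed usage; requires SEISMIC-RNA and RNAstructure to be installed,
# and the input path below is a placeholder, not a real file.
# from seismicrna.fold.main import run
# run(["path/to/table"], **fold_kwargs)
```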
- class seismicrna.fold.report.FoldReport(**kwargs: Any | Callable[[Report], Any])
Bases: Report
- classmethod auto_fields()
Names and automatic values of selected fields.
- classmethod dir_seg_types()
Types of the directory segments in the path.
- classmethod fields()
All fields of the report.
- classmethod file_seg_type()
Type of the last segment in the path.
Wrapper around RNAstructure from the Mathews Lab at the University of Rochester: https://rna.urmc.rochester.edu/RNAstructure.html
- exception seismicrna.fold.rnastructure.ConnectivityTableAlreadyRetitledError
Bases: RuntimeError
A CT file was already retitled.
- exception seismicrna.fold.rnastructure.RNAStructureConnectivityTableTitleLineFormatError
Bases: ValueError
Error in the format of a CT title line from RNAstructure.
- seismicrna.fold.rnastructure.check_data_path(data_path: str | Path | None = None) Path
Confirm the DATAPATH environment variable indicates the correct directory.
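As a rough illustration of this kind of check, the sketch below reads DATAPATH from the environment when no path is given and verifies only that it names an existing directory. The function name check_data_path_sketch is hypothetical, and the real check_data_path() may apply stricter criteria (for example, looking for specific RNAstructure data files).

```python
import os
from pathlib import Path


def check_data_path_sketch(data_path=None) -> Path:
    """Return the DATAPATH as a Path, or raise OSError if it is unusable.

    Hypothetical stand-in; not SEISMIC-RNA's implementation.
    """
    if data_path is None:
        # Fall back to the environment variable that RNAstructure reads.
        data_path = os.environ.get("DATAPATH")
    if not data_path:
        raise OSError("DATAPATH is not set")
    path = Path(data_path)
    if not path.is_dir():
        raise OSError(f"DATAPATH is not a directory: {path}")
    return path
```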
- seismicrna.fold.rnastructure.fold(rna: RNAProfile, *, fold_temp: float = 310.15, fold_constraint: Path | None = None, fold_md: int = 0, fold_mfe: bool = False, fold_max: int = 20, fold_percent: float = 20.0, out_dir: Path = './out', tmp_dir: Path, keep_tmp: bool = False, n_procs: int = 4)
Run the ‘Fold’ or ‘Fold-smp’ program of RNAstructure.
- Parameters:
fold_temp (float) – Predict structures at this temperature (Kelvin) [keyword-only, default: 310.15]
fold_constraint (Path | None) – Force bases to be paired/unpaired from a file of constraints [keyword-only, default: None]
fold_md (int) – Limit base pair distances to this number of bases (0 for no limit) [keyword-only, default: 0]
fold_mfe (bool) – Predict only the minimum free energy (MFE) structure [keyword-only, default: False]
fold_max (int) – Output at most this many structures (overridden by --fold-mfe) [keyword-only, default: 20]
fold_percent (float) – Stop outputting structures when the % difference in energy exceeds this value (overridden by --fold-mfe) [keyword-only, default: 20.0]
out_dir (Path) – Write all output files to this directory [keyword-only, default: './out']
keep_tmp (bool) – Keep temporary files after finishing [keyword-only, default: False]
- seismicrna.fold.rnastructure.format_retitled_ct_line(length: int, ref: str, uniqid: int, energy: float)
Format a new CT title line including unique identifiers:
{length} {ref} #{uniqid}: {energy}
where {length} is the number of positions in the structure (required for all CT files), {ref} is the name of the reference, {uniqid} is the unique identifier, and {energy} is the free energy of folding.
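The title format can be illustrated with a one-line f-string. This is a sketch only: the name format_title_sketch is hypothetical, and single spaces between fields and default float formatting for the energy are assumptions.

```python
def format_title_sketch(length: int, ref: str, uniqid: int, energy: float) -> str:
    """Build a title of the form '{length} {ref} #{uniqid}: {energy}'."""
    return f"{length} {ref} #{uniqid}: {energy}"


# Two structures with the same energy still get different titles,
# because the unique identifier differs.
first = format_title_sketch(120, "myref", 0, -35.2)
second = format_title_sketch(120, "myref", 1, -35.2)
```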
- seismicrna.fold.rnastructure.guess_data_path()
Guess the DATAPATH.
- seismicrna.fold.rnastructure.make_fold_cmd(fasta_file: Path, ct_file: Path, *, dms_file: Path | None, fold_constraint: Path | None, fold_temp: float, fold_md: int, fold_mfe: bool, fold_max: int, fold_percent: float, n_procs: int = 1)
- seismicrna.fold.rnastructure.parse_energy(line: str)
Parse the predicted free energy of folding from a line in this format:
{length} {ref} #{uniqid}: {energy}
where {length} is the number of positions in the structure (required for all CT files), {ref} is the name of the reference, {uniqid} is the unique identifier, and {energy} is the free energy of folding.
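One way to recover the energy from such a line is to take the text after the last ": " separator. The sketch below assumes that separator and uses the hypothetical name parse_energy_sketch; the library's own parsing may differ.

```python
def parse_energy_sketch(line: str) -> float:
    """Return the energy from a line like '{length} {ref} #{uniqid}: {energy}'."""
    # Split on the last ": " so that colons inside the reference name
    # do not interfere.
    _, sep, energy = line.rstrip().rpartition(": ")
    if not sep:
        raise ValueError(f"No energy found in title line: {line!r}")
    return float(energy)
```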
- seismicrna.fold.rnastructure.parse_rnastructure_ct_title(line: str)
Parse a title in a CT file from RNAstructure, in this format:
{length} ENERGY = {energy} {ref}
where {length} is the number of positions in the structure, {ref} is the name of the reference, and {energy} is the predicted free energy of folding. Also handle the edge case when RNAstructure predicts no base pairs (and thus does not write the free energy) by returning 0.
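The two cases described above (a title with an energy, and a title without one when no base pairs are predicted) can be sketched with two regular expressions. The name parse_ct_title_sketch, the return order (length, ref, energy), and the exact whitespace handling are assumptions made for illustration.

```python
import re

# Title with an energy: '{length} ENERGY = {energy} {ref}'
TITLE_WITH_ENERGY = re.compile(
    r"^\s*(\d+)\s+ENERGY\s*=\s*(-?\d+(?:\.\d+)?)\s+(\S.*?)\s*$"
)
# Title without an energy (assumed form when no base pairs are predicted):
# '{length} {ref}'
TITLE_NO_ENERGY = re.compile(r"^\s*(\d+)\s+(\S.*?)\s*$")


def parse_ct_title_sketch(line: str):
    """Return (length, ref, energy); energy is 0.0 if the title has none."""
    match = TITLE_WITH_ENERGY.match(line)
    if match:
        length, energy, ref = match.groups()
        return int(length), ref, float(energy)
    match = TITLE_NO_ENERGY.match(line)
    if match:
        length, ref = match.groups()
        return int(length), ref, 0.0
    raise ValueError(f"Cannot parse CT title line: {line!r}")
```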
- seismicrna.fold.rnastructure.require_data_path()
Return an error message if the DATAPATH is not valid.
- seismicrna.fold.rnastructure.retitle_ct(ct_input: Path, ct_output: Path, force: bool = False)
Retitle the structures in a CT file produced by RNAstructure.
The default titles follow this format:
ENERGY = {energy} {reference}
where {reference} is the name of the reference sequence and {energy} is the predicted free energy of folding.
The major problem with this format is that structures can have equal predicted free energies, so the titles of the structures can repeat, which would cause some functions (e.g. graphing ROC curves) to fail.
This function assigns a unique integer to each structure (starting with 0 for the minimum free energy and continuing upwards), which ensures that no two structures have identical titles.
- Parameters:
ct_input (Path) – Path of the CT file to retitle.
ct_output (Path) – Path of the CT file to which to write the retitled information.
force (bool = False) – Overwrite the output CT file if it already exists.
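The retitling idea can be sketched on CT text held in memory rather than on files. The function below is a hypothetical illustration, not retitle_ct() itself: it assumes each title line begins with the structure length, that exactly that many base lines follow it, and that structures appear in order of increasing free energy, so numbering them in file order gives 0 to the minimum free energy structure.

```python
def retitle_ct_text_sketch(ct_text: str) -> str:
    """Rewrite every title in CT text as '{length} {ref} #{uniqid}: {energy}'."""
    lines = ct_text.splitlines()
    out = []
    i = 0
    uniqid = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # skip blank lines between structures
            continue
        fields = lines[i].split()
        length = int(fields[0])
        if fields[1:3] == ["ENERGY", "="]:
            energy = float(fields[3])
            ref = " ".join(fields[4:])
        else:
            # No energy in the title: treat it as 0.0 (assumption).
            energy = 0.0
            ref = " ".join(fields[1:])
        out.append(f"{length} {ref} #{uniqid}: {energy}")
        # Copy the base lines of this structure unchanged.
        out.extend(lines[i + 1:i + 1 + length])
        i += 1 + length
        uniqid += 1
    return "\n".join(out) + "\n"


# Two structures with equal energies end up with distinct titles.
example_ct = (
    "2 ENERGY = -1.0 ref\n"
    "1 G 0 2 2 1\n"
    "2 C 1 0 1 2\n"
    "2 ENERGY = -1.0 ref\n"
    "1 G 0 2 0 1\n"
    "2 C 1 0 0 2\n"
)
retitled = retitle_ct_text_sketch(example_ct)
```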