seismicrna.fold package
Submodules
- seismicrna.fold.datapath.run_datapath()
Guess the DATAPATH for RNAstructure.
- seismicrna.fold.main.fold_region(rna: RNAFoldProfile, *, out_dir: Path, branch: str, fold_dry_run: bool, fold_backend: str, fold_constraint: Path | None, fold_commands: Path | None, eddy_prior_paired_file: Path | None, eddy_prior_unpaired_file: Path | None, fold_md: int, fold_mfe: bool, fold_max: int, fold_percent: float, fold_isolated: bool, force: bool, keep_tmp: bool, num_cpus: int, tmp_pfx)
Fold a region of an RNA from one mutational profile.
- seismicrna.fold.main.fold_table(table: MaskPositionTableLoader | ClusterPositionTableLoader, regions: list[Region], fold_temp: float, fold_energy_method: str, fold_quantile: float, deigan_slope: float, deigan_intercept: float, eddy_prior_paired_file: Path | None, eddy_prior_unpaired_file: Path | None, num_cpus: int, keep_tmp: bool, fold_dry_run: bool, fold_backend: str, **kwargs)
Fold an RNA molecule from one table of reactivities.
- seismicrna.fold.main.load_foldable_tables(input_path: Iterable[str | Path], **kwargs)
Load tables that can be folded.
- seismicrna.fold.main.run(input_path: Iterable[str | Path] = Sentinel.UNSET, *, branch: str = '', fold_coords: Iterable[tuple[str, int, int]] = (), fold_primers: Iterable[tuple[str, DNA, DNA]] = (), fold_regions_file: str | None = None, fold_full: bool = True, fold_dry_run: bool = False, fold_backend: str = 'auto', fold_energy_method: str = 'auto', deigan_slope: float = 1.8, deigan_intercept: float = -0.6, fold_temp: float = 37.0, fold_quantile: float = 0.95, fold_constraint: str | None = None, fold_commands: str | None = None, eddy_prior_paired_file: str | None = None, eddy_prior_unpaired_file: str | None = None, fold_md: int = 0, fold_mfe: bool = False, fold_max: int = 20, fold_percent: float = 20.0, fold_isolated: bool = False, tmp_pfx: str | Path = './tmp', keep_tmp: bool = False, verify_times: bool = True, num_cpus: int = 4, force: bool = False)
Predict RNA secondary structures using mutation rates.
- Parameters:
branch (str) – Create a new branch of the workflow with this name [keyword-only, default: ‘’]
fold_coords (Iterable) – Fold a region of a reference given its 5’ and 3’ end coordinates [keyword-only, default: ()]
fold_primers (Iterable) – Fold a region of a reference given its forward and reverse primers [keyword-only, default: ()]
fold_regions_file (str | None) – Fold regions of references from coordinates/primers in a CSV file [keyword-only, default: None]
fold_full (bool) – If no regions are specified, whether to default to the full region or to the table’s region [keyword-only, default: True]
fold_dry_run (bool) – Only generate the fold command and input files; do not run folding [keyword-only, default: False]
fold_backend (str) – Model RNA structures using Fold (RNAstructure), ShapeKnots (RNAstructure), or RNAfold (ViennaRNA); auto selects Fold for DMS and RNAFold for other probes [keyword-only, default: ‘auto’]
fold_energy_method (str) – Use this method to incorporate reactivities into folding energies. auto selects Cordero for DMS and Eddy for other probes; Eddy requires --fold-backend=RNAFold; Cordero requires --fold-backend=Fold or ShapeKnots [keyword-only, default: ‘auto’]
deigan_slope (float) – Slope (kcal/mol) for SHAPE reactivities using Deigan method; used only with --fold-energy-method=Deigan [keyword-only, default: 1.8]
deigan_intercept (float) – Intercept (kcal/mol) for SHAPE reactivities using Deigan method; used only with --fold-energy-method=Deigan [keyword-only, default: -0.6]
fold_temp (float) – Predict structures at this temperature (Celsius) [keyword-only, default: 37.0]
fold_quantile (float) – Normalize and winsorize reactivities to this quantile for folding [keyword-only, default: 0.95]
fold_constraint (str | None) – Force bases to be paired/unpaired from a file of constraints [keyword-only, default: None]
fold_commands (str | None) – Command file for RNAFold [keyword-only, default: None]
eddy_prior_paired_file (str | None) – File of per-position prior probabilities of being paired for the Eddy method (passed as --sp-data with --sp-strategy Pp); only used with --fold-energy-method=Eddy and --fold-backend=RNAFold [keyword-only, default: None]
eddy_prior_unpaired_file (str | None) – File of per-position prior probabilities of being unpaired for the Eddy method (passed as --sp-data with --sp-strategy Pu); only used with --fold-energy-method=Eddy and --fold-backend=RNAFold [keyword-only, default: None]
fold_md (int) – Limit base pair distances to this number of bases (0 for no limit) [keyword-only, default: 0]
fold_mfe (bool) – Predict only the minimum free energy (MFE) structure [keyword-only, default: False]
fold_max (int) – Output at most this many structures (overridden by --fold-mfe) [keyword-only, default: 20]
fold_percent (float) – Stop outputting structures when the % difference in energy exceeds this value (overridden by --fold-mfe) [keyword-only, default: 20.0]
fold_isolated (bool) – Allow isolated (non-stacked) base pairs when folding [keyword-only, default: False]
tmp_pfx (str | Path) – Write all temporary files to a directory with this prefix [keyword-only, default: ‘./tmp’]
keep_tmp (bool) – Keep temporary files after finishing [keyword-only, default: False]
verify_times (bool) – Verify that report files from later steps have later timestamps [keyword-only, default: True]
num_cpus (int) – Use up to this many CPUs simultaneously [keyword-only, default: 4]
force (bool) – Force all tasks to run, overwriting any existing output files [keyword-only, default: False]
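The deigan_slope and deigan_intercept parameters correspond to the slope m and intercept b of the Deigan pseudo-energy term, which is commonly written as m * ln(reactivity + 1) + b. The following is a minimal sketch of that formula only; the function name deigan_pseudoenergy is hypothetical and not part of seismicrna, and the folding backend applies the term internally.

```python
import math


def deigan_pseudoenergy(reactivity: float,
                        slope: float = 1.8,
                        intercept: float = -0.6) -> float:
    """Pseudo-free-energy (kcal/mol) for one position: m * ln(r + 1) + b."""
    # The logarithm is undefined when r + 1 <= 0.
    if reactivity <= -1.0:
        raise ValueError("reactivity must be greater than -1")
    return slope * math.log(reactivity + 1.0) + intercept


if __name__ == "__main__":
    # Print the pseudo-energy for a few example reactivities.
    for r in (0.0, 0.5, 1.0):
        print(r, round(deigan_pseudoenergy(r), 3))
```

With the default slope and intercept, a reactivity of 0 yields the intercept itself (a favorable bonus for pairing), and the term becomes a penalty as reactivity grows.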
- class seismicrna.fold.profile.RNAFoldProfile(*, fold_temp: float | int, fold_energy_method: str, fold_quantile: float | int, deigan_slope: float | int, deigan_intercept: float | int, **kwargs)
Bases: RNAProfile
- property fold_temp_k
Folding temperature (Kelvin).
- classmethod from_profile(profile: RNAProfile, **kwargs)
Make an RNAFoldProfile from an RNAProfile.
- get_rnastructure_shape_args(top: Path, branch: str)
Get the SHAPE/DMS arguments for Fold/ShapeKnots.
- property mus_normalized
Mutation rates after normalizing and winsorizing.
- property rnafold_sp_strategy
--sp-strategy string for RNAFold.
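As an illustration of what "normalizing and winsorizing" to a quantile (see mus_normalized and fold_quantile) can mean, here is a minimal pure-Python sketch. It assumes the mutation rates are divided by their q-th quantile and then capped at 1; the helper names are hypothetical, missing values are not handled, and the actual implementation in seismicrna may differ.

```python
def quantile(values: list[float], q: float) -> float:
    """Quantile with linear interpolation between order statistics."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def normalize_and_winsorize(mus: list[float], q: float = 0.95) -> list[float]:
    """Divide by the q-th quantile, then cap every value at 1."""
    scale = quantile(mus, q)
    if scale <= 0.0:
        # Nothing to scale by; return zeros rather than divide by zero.
        return [0.0 for _ in mus]
    return [min(mu / scale, 1.0) for mu in mus]
```

Capping at 1 keeps a few very reactive positions from dominating the scale of the whole profile.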
- seismicrna.fold.profile.celsius_to_kelvin(temp_c: float | int)
Convert a temperature from Celsius to Kelvin.
- seismicrna.fold.profile.guess_temperature_to_celsius(temp: float | int)
Guess whether a temperature is in Celsius or Kelvin and return as Celsius.
- seismicrna.fold.profile.kelvin_to_celsius(temp_k: float | int)
Convert a temperature from Kelvin to Celsius.
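The three temperature helpers above can be sketched as follows. This is an independent re-implementation for illustration, not the package's code; in particular, the threshold of 150 used to distinguish Celsius from Kelvin and the name guess_to_celsius are assumptions.

```python
ZERO_CELSIUS_IN_KELVIN = 273.15


def celsius_to_kelvin(temp_c: float) -> float:
    """Convert a temperature from Celsius to Kelvin."""
    return temp_c + ZERO_CELSIUS_IN_KELVIN


def kelvin_to_celsius(temp_k: float) -> float:
    """Convert a temperature from Kelvin to Celsius."""
    return temp_k - ZERO_CELSIUS_IN_KELVIN


def guess_to_celsius(temp: float, threshold: float = 150.0) -> float:
    """Return the temperature in Celsius, guessing its original unit."""
    # Assumption: values above the threshold are treated as Kelvin,
    # since folding temperatures above 150 degrees Celsius are implausible.
    return kelvin_to_celsius(temp) if temp > threshold else temp
```

Under these assumptions, both 37.0 and 310.15 are interpreted as 37 degrees Celsius.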
- class seismicrna.fold.report.FoldReport(**kwargs: Any | Callable[[Report], Any])
- classmethod get_checksum_report_fields()
Checksum fields of the report.
- classmethod get_file_seg_type()
Type of the last segment in the path.
- classmethod get_param_report_fields()
Parameter fields of the report.
Wrapper around RNAstructure from the Mathews Lab at the University of Rochester: https://rna.urmc.rochester.edu/RNAstructure.html
- exception seismicrna.fold.rnastructure.ConnectivityTableAlreadyRetitledError
Bases: RuntimeError
A CT file was already retitled.
- exception seismicrna.fold.rnastructure.RNAStructureConnectivityTableTitleLineFormatError
Bases: ValueError
Error in the format of a CT title line from RNAstructure.
- seismicrna.fold.rnastructure.check_data_path(data_path: str | Path | None = None) → Path
Confirm the DATAPATH environment variable indicates the correct directory.
- seismicrna.fold.rnastructure.format_retitled_ct_line(length: int, ref: str, uniqid: int, energy: float)
Format a new CT title line including unique identifiers:
{length} {ref} #{uniqid}: {energy}
where {length} is the number of positions in the structure (required for all CT files), {ref} is the name of the reference, {uniqid} is the unique identifier, and {energy} is the free energy of folding.
- seismicrna.fold.rnastructure.guess_data_path()
Guess the DATAPATH.
- seismicrna.fold.rnastructure.make_rnastructure_cmd(fasta_file: Path, ct_file: Path, *, fold_backend: str, fold_constraint: Path | None, dms_file: Path | None, shape_file: Path | None, deigan_intercept: float | None, deigan_slope: float | None, fold_temp_k: float | None, fold_isolated: bool, fold_md: int, fold_mfe: bool, fold_max: int, fold_percent: float, num_cpus: int = 1)
Make a command for ‘Fold’, ‘Fold-smp’, or ‘ShapeKnots’.
- seismicrna.fold.rnastructure.parse_energy(line: str)
Parse the predicted free energy of folding from a line in format
{length} {ref} #{uniqid}: {energy}
where {length} is the number of positions in the structure (required for all CT files), {ref} is the name of the reference, {uniqid} is the unique identifier, and {energy} is the free energy of folding.
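The retitled format {length} {ref} #{uniqid}: {energy} can be written and read back with a few lines of Python. This is a sketch under the assumption that the fields are separated by single spaces; the function names format_title and parse_title_energy are hypothetical stand-ins for format_retitled_ct_line and parse_energy.

```python
def format_title(length: int, ref: str, uniqid: int, energy: float) -> str:
    """Format a title line as '{length} {ref} #{uniqid}: {energy}'."""
    return f"{length} {ref} #{uniqid}: {energy}"


def parse_title_energy(line: str) -> float:
    """Read the energy, which follows the last ': ' in the title line."""
    _, separator, energy = line.rstrip().rpartition(": ")
    if not separator:
        raise ValueError(f"No energy found in title line: {line!r}")
    return float(energy)
```

Splitting on the last ": " means that a reference name containing a colon does not confuse the parser.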
- seismicrna.fold.rnastructure.parse_rnastructure_ct_title(line: str)
Parse a title in a CT file from RNAstructure, in this format:
{length} ENERGY = {energy} {ref}
where {length} is the number of positions in the structure, {ref} is the name of the reference, and {energy} is the predicted free energy of folding. Also handle the edge case when RNAstructure predicts no base pairs (and thus does not write the free energy) by returning 0.
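A regular expression is one way to handle both the usual title and the edge case without an energy. The sketch below is illustrative only: the function name parse_ct_title and the order of the returned tuple are assumptions, and the real parser may accept other variants of the title line.

```python
import re

# Usual title: '{length} ENERGY = {energy} {ref}'.
TITLE_WITH_ENERGY = re.compile(
    r"^\s*(\d+)\s+ENERGY\s*=\s*(-?\d+(?:\.\d+)?)\s+(\S.*?)\s*$"
)
# Edge case when no base pairs are predicted: '{length} {ref}'.
TITLE_WITHOUT_ENERGY = re.compile(r"^\s*(\d+)\s+(\S.*?)\s*$")


def parse_ct_title(line: str) -> tuple[int, str, float]:
    """Return (length, ref, energy); energy is 0.0 if it is absent."""
    match = TITLE_WITH_ENERGY.match(line)
    if match:
        length, energy, ref = match.groups()
        return int(length), ref, float(energy)
    match = TITLE_WITHOUT_ENERGY.match(line)
    if match:
        length, ref = match.groups()
        return int(length), ref, 0.0
    raise ValueError(f"Cannot parse CT title line: {line!r}")
```

Trying the stricter pattern first ensures that the word ENERGY is never mistaken for part of the reference name.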
- seismicrna.fold.rnastructure.require_data_path()
Return an error message if the DATAPATH is not valid.
- seismicrna.fold.rnastructure.retitle_ct(ct_input: Path, ct_output: Path, force: bool = False)
Retitle the structures in a CT file produced by RNAstructure.
The default titles follow this format:
ENERGY = {energy} {reference}
where {reference} is the name of the reference sequence and {energy} is the predicted free energy of folding.
The major problem with this format is that structures can have equal predicted free energies, so the titles of the structures can repeat, which would cause some functions (e.g. graphing ROC curves) to fail.
This function assigns a unique integer to each structure (starting with 0 for the minimum free energy and continuing upwards), which ensures that no two structures have identical titles.
- Parameters:
ct_input (Path) – Path of the CT file to retitle.
ct_output (Path) – Path of the CT file to which to write the retitled information.
force (bool = False) – Overwrite the output CT file if it already exists.
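The retitling step can be pictured as a single pass over the lines of a CT file: read a title, write a numbered replacement, and copy the {length} body lines that follow. The sketch below works on lines in memory rather than on files; the name retitle_ct_lines and the single-space output format are assumptions, and overwrite handling (force) is omitted.

```python
import re

# Title with an optional 'ENERGY = {energy}' part before the reference.
TITLE = re.compile(r"^\s*(\d+)\s+(?:ENERGY\s*=\s*(\S+)\s+)?(\S.*?)\s*$")


def retitle_ct_lines(lines: list[str]) -> list[str]:
    """Give each structure in a CT file a unique, numbered title."""
    output = []
    index = 0
    uniqid = 0
    while index < len(lines):
        match = TITLE.match(lines[index])
        if match is None:
            raise ValueError(f"Not a CT title line: {lines[index]!r}")
        length = int(match.group(1))
        # A title without an energy means no base pairs were predicted.
        energy = float(match.group(2)) if match.group(2) else 0.0
        ref = match.group(3)
        output.append(f"{length} {ref} #{uniqid}: {energy}")
        # Copy the body of the structure (one line per position) unchanged.
        body = lines[index + 1: index + 1 + length]
        if len(body) != length:
            raise ValueError("CT file ended in the middle of a structure")
        output.extend(body)
        index += 1 + length
        uniqid += 1
    return output
```

Because the identifier increases with each structure, two structures with equal energies still receive distinct titles.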
- seismicrna.fold.rnastructure.run_rnastructure(fasta_tmp: Path, ct_tmp: Path, ct_out: Path, *, fold_backend: str, fold_temp_k: float | None, dms_file: Path | None, shape_file: Path | None, deigan_slope: float | None, deigan_intercept: float | None, fold_constraint: Path | None, fold_isolated: bool, fold_md: int, fold_mfe: bool, fold_max: int, fold_percent: float, end5: int, num_cpus: int, fold_dry_run: bool = False)
Run Fold/ShapeKnots on pre-built paths, retitle, and renumber.
Wrapper around ViennaRNA from Lorenz and Hofacker at the University of Vienna: https://www.tbi.univie.ac.at/RNA/
- seismicrna.fold.viennarna.extract_energies(vienna_input: Path, db_output: Path, force: bool = False)
Extract the free energies from a vienna file and prepend them to the reference name.
The title will follow this format:
ENERGY = {energy} {reference}
where {reference} is the name of the reference sequence and {energy} is the predicted free energy of folding.
- Parameters:
vienna_input (Path) – Path of the vienna file to extract energies from.
db_output (Path) – Path of the DB file to which to write the extracted information.
force (bool = False) – Overwrite the output DB file if it already exists.
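To make the title format concrete, the sketch below builds ENERGY = {energy} {reference} titles from the lines of a vienna file. It assumes the layout that RNAfold typically prints (a header line starting with ">", the sequence, then structure lines ending with the energy in parentheses); the function name extract_energy_titles is hypothetical, and writing the DB file itself is left out.

```python
import re

# Structure line: a dot-bracket string, then the energy in parentheses.
STRUCT_LINE = re.compile(r"^([().]+)\s+\(\s*(-?\d+(?:\.\d+)?)\s*\)\s*$")


def extract_energy_titles(vienna_lines: list[str]) -> list[tuple[str, str]]:
    """Return (title, structure) pairs, one per structure line."""
    if len(vienna_lines) < 3 or not vienna_lines[0].startswith(">"):
        raise ValueError("Expected a header, a sequence, and a structure")
    ref = vienna_lines[0][1:].strip()
    pairs = []
    # Skip the header and the sequence; the rest are structure lines.
    for line in vienna_lines[2:]:
        match = STRUCT_LINE.match(line)
        if match is None:
            raise ValueError(f"Cannot parse structure line: {line!r}")
        structure, energy = match.groups()
        pairs.append((f"ENERGY = {float(energy)} {ref}", structure))
    return pairs
```

For example, the structure line "(((...))) ( -1.20)" under the header ">ref" would receive the title "ENERGY = -1.2 ref".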
- seismicrna.fold.viennarna.get_subopt(subopt_out: Path, db_target: Path)
Extract suboptimal structures from the output of RNAsubopt and add them to a vienna file.
- seismicrna.fold.viennarna.make_rnafold_cmd(fasta_file: Path, vienna_file: Path, *, sp_data: Path | None, sp_strategy: str | None, eddy_prior_paired_file: Path | None, eddy_prior_unpaired_file: Path | None, fold_constraint: Path | None, fold_commands: Path | None, fold_temp_c: float, fold_isolated: bool, fold_md: int, fold_max: int, fold_mfe: bool, num_cpus: int = 1)
Build the shell command to run RNAfold (and optionally RNAsubopt).
- Parameters:
fasta_file (Path) – Input FASTA file containing the RNA sequence.
vienna_file (Path) – Path prefix for the output vienna file (suffix is determined automatically based on fold_mfe).
sp_data (Path or None) – File of per-position reactivity data for soft constraints; None disables soft constraints.
sp_strategy (str or None) – Soft-constraint strategy passed to --sp-strategy; None omits the flag.
fold_constraint (Path or None) – Hard-constraint file passed to --constraint; None omits the flag.
fold_commands (Path or None) – Commands file passed to --commands; None omits the flag.
fold_temp_c (float) – Folding temperature in degrees Celsius.
fold_isolated (bool) – If True, allow isolated base pairs; if False, pass --noLP.
fold_md (int) – Maximum base-pair span in nucleotides; 0 disables the limit.
fold_max (int) – Maximum number of structures to keep (passed to head).
fold_mfe (bool) – If True, run only RNAfold (MFE structure); if False, also run RNAsubopt for suboptimal structures.
num_cpus (int, optional) – Number of threads for RNAfold (default 1).
- Returns:
A shell command string ready to be executed.
- Return type:
str
- seismicrna.fold.viennarna.run_rnafold(fasta_tmp: Path, ct_tmp: Path, ct_out: Path, vienna_tmp: Path, db_tmp: Path, *, sp_data: Path | None, sp_strategy: str | None, eddy_prior_paired_file: Path | None, eddy_prior_unpaired_file: Path | None, fold_constraint: Path | None, fold_commands: Path | None, fold_temp_c: float, fold_isolated: bool, fold_md: int, fold_max: int, fold_mfe: bool, end5: int, num_cpus: int, fold_dry_run: bool = False)
Run RNAfold/RNAsubopt on pre-built paths, convert to CT, retitle, and renumber.