seismicrna.fold package
Submodules
- seismicrna.fold.datapath.run_datapath()
Guess the DATAPATH for RNAstructure.
- seismicrna.fold.main.fold_region(rna: RNAFoldProfile, *, out_dir: Path, branch: str, fold_vienna: bool, fold_constraint: Path | None, fold_commands: Path | None, fold_md: int, fold_mfe: bool, fold_max: int, fold_percent: float, pseudoenergy_all: bool, force: bool, keep_tmp: bool, num_cpus: int, tmp_pfx)
Fold a region of an RNA from one mutational profile.
- seismicrna.fold.main.fold_table(table: MaskPositionTableLoader | ClusterPositionTableLoader, regions: list[Region], fold_temp: float, fold_fpaired: float, fold_mu_eps: float, num_cpus: int, **kwargs)
Fold an RNA molecule from one table of reactivities.
- seismicrna.fold.main.load_foldable_tables(input_path: Iterable[str | Path], **kwargs)
Load tables that can be folded.
- seismicrna.fold.main.run(input_path: Iterable[str | Path], *, branch: str = '', fold_coords: Iterable[tuple[str, int, int]] = (), fold_primers: Iterable[tuple[str, DNA, DNA]] = (), fold_regions_file: str | None = None, fold_full: bool = True, fold_vienna: bool = False, fold_temp: float = 310.15, fold_fpaired: float = 0.5, fold_mu_eps: float = 0.005, fold_constraint: str | None = None, fold_commands: str | None = None, fold_md: int = 0, fold_mfe: bool = False, fold_max: int = 20, fold_percent: float = 20.0, pseudoenergy_all: bool = True, tmp_pfx: str | Path = './tmp', keep_tmp: bool = False, verify_times: bool = True, num_cpus: int = 4, force: bool = False)
Predict RNA secondary structures using mutation rates.
- Parameters:
branch (str) – Create a new branch of the workflow with this name [keyword-only, default: ‘’]
fold_coords (Iterable) – Fold a region of a reference given its 5’ and 3’ end coordinates [keyword-only, default: ()]
fold_primers (Iterable) – Fold a region of a reference given its forward and reverse primers [keyword-only, default: ()]
fold_regions_file (str | None) – Fold regions of references from coordinates/primers in a CSV file [keyword-only, default: None]
fold_full (bool) – If no regions are specified, whether to default to the full region or to the table’s region [keyword-only, default: True]
fold_vienna (bool) – Use RNAfold from ViennaRNA as the folding engine [keyword-only, default: False]
fold_temp (float) – Predict structures at this temperature (Kelvin) [keyword-only, default: 310.15]
fold_fpaired (float) – Scale mutation rates assuming this is the fraction of paired bases [keyword-only, default: 0.5]
fold_mu_eps (float) – Clip folding mutation rates to [eps, 1 - eps] to avoid division by 0 [keyword-only, default: 0.005]
fold_constraint (str | None) – Force bases to be paired/unpaired from a file of constraints [keyword-only, default: None]
fold_commands (str | None) – Command file for ViennaRNA [keyword-only, default: None]
fold_md (int) – Limit base pair distances to this number of bases (0 for no limit) [keyword-only, default: 0]
fold_mfe (bool) – Predict only the minimum free energy (MFE) structure [keyword-only, default: False]
fold_max (int) – Output at most this many structures (overridden by --fold-mfe) [keyword-only, default: 20]
fold_percent (float) – Stop outputting structures when the % difference in energy exceeds this value (overridden by --fold-mfe) [keyword-only, default: 20.0]
pseudoenergy_all (bool) – Apply pseudoenergy constraints from chemical probing data to all base pairs or only stacked base pairs. --pseudoenergy-stacked requires --fold-vienna [keyword-only, default: True]
tmp_pfx (str | Path) – Write all temporary files to a directory with this prefix [keyword-only, default: ‘./tmp’]
keep_tmp (bool) – Keep temporary files after finishing [keyword-only, default: False]
verify_times (bool) – Verify that report files from later steps have later timestamps [keyword-only, default: True]
num_cpus (int) – Use up to this many CPUs simultaneously [keyword-only, default: 4]
force (bool) – Force all tasks to run, overwriting any existing output files [keyword-only, default: False]
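The interaction of fold_mfe, fold_max and fold_percent can be sketched as follows. This is only an illustration under assumptions: the function name select_structures is hypothetical, and the exact cutoff rule is applied by the folding engine itself, so the percent window used here (relative to the magnitude of the MFE) is an assumed reading of the parameter descriptions above, not the package's code.

```python
def select_structures(energies, fold_max=20, fold_percent=20.0, fold_mfe=False):
    """Choose which predicted structures to keep (illustrative sketch).

    energies: free energies of folding in kcal/mol, one per structure.
    Returns the retained energies, lowest (most stable) first.
    """
    if not energies:
        return []
    ranked = sorted(energies)
    mfe = ranked[0]
    if fold_mfe:
        # fold_mfe overrides both fold_max and fold_percent.
        return [mfe]
    # Assumed rule: keep structures whose energy lies within
    # fold_percent % of the MFE (measured against its magnitude).
    window = abs(mfe) * fold_percent / 100.0
    kept = [energy for energy in ranked if energy - mfe <= window]
    # Never output more than fold_max structures.
    return kept[:fold_max]
```

With energies of -10.0, -9.0, -8.5 and -7.0 kcal/mol and the default fold_percent of 20.0, the window is 2.0 kcal/mol, so the structure at -7.0 is dropped.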
- class seismicrna.fold.profile.RNAFoldProfile(*, fold_temp: float | int, fold_fpaired: float | int, mu_eps: float | int, **kwargs)
Bases:
RNAProfile
- classmethod from_profile(profile: RNAProfile, **kwargs)
Make an RNAFoldProfile from an RNAProfile.
- property intercept
Intercept parameter (kcal/mol) for structure prediction.
- property mus_clipped
Mutation rates after clipping to [mu_eps, 1 - mu_eps].
- property pseudoenergies
Pseudoenergies (kcal/mol) for structure prediction.
- property pseudomus
Pseudo-mutation rates for structure prediction.
- property shape_method
shapeMethod string for ViennaRNA. Slope and intercept are halved to avoid double counting.
- property slope
Slope parameter (kcal/mol) for structure prediction.
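The clipping described for mus_clipped can be sketched in a few lines. This is a minimal illustration, assuming mutation rates are given as a plain list of floats; the function name clip_mus is hypothetical and the package itself may operate on a different data structure.

```python
def clip_mus(mus, mu_eps=0.005):
    """Clip each mutation rate to the interval [mu_eps, 1 - mu_eps].

    Keeping rates strictly between 0 and 1 avoids division by zero
    in later calculations that use the rates.
    """
    if not 0.0 <= mu_eps <= 0.5:
        raise ValueError(f"mu_eps must be in [0, 0.5], but got {mu_eps}")
    lower, upper = mu_eps, 1.0 - mu_eps
    return [min(max(mu, lower), upper) for mu in mus]
```

Rates already inside the interval are returned unchanged; only values below mu_eps or above 1 - mu_eps are moved to the nearest bound.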
- class seismicrna.fold.report.FoldReport(**kwargs: Any | Callable[[Report], Any])
- classmethod get_checksum_report_fields()
Checksum fields of the report.
- classmethod get_file_seg_type()
Type of the last segment in the path.
- classmethod get_param_report_fields()
Parameter fields of the report.
Wrapper around RNAstructure from the Mathews Lab at the University of Rochester: https://rna.urmc.rochester.edu/RNAstructure.html
- exception seismicrna.fold.rnastructure.ConnectivityTableAlreadyRetitledError
Bases:
RuntimeError
A CT file was already retitled.
- exception seismicrna.fold.rnastructure.RNAStructureConnectivityTableTitleLineFormatError
Bases:
ValueError
Error in the format of a CT title line from RNAstructure.
- seismicrna.fold.rnastructure.check_data_path(data_path: str | Path | None = None) Path
Confirm the DATAPATH environment variable indicates the correct directory.
- seismicrna.fold.rnastructure.fold(rna: RNAFoldProfile, *, branch: str = '', fold_constraint: Path | None = None, fold_md: int = 0, fold_mfe: bool = False, fold_max: int = 20, fold_percent: float = 20.0, out_dir: Path = './out', tmp_dir: Path, keep_tmp: bool = False, num_cpus: int = 4)
Run the ‘Fold’ or ‘Fold-smp’ program of RNAstructure.
- Parameters:
branch (str) – Create a new branch of the workflow with this name [keyword-only, default: ‘’]
fold_constraint (Path | None) – Force bases to be paired/unpaired from a file of constraints [keyword-only, default: None]
fold_md (int) – Limit base pair distances to this number of bases (0 for no limit) [keyword-only, default: 0]
fold_mfe (bool) – Predict only the minimum free energy (MFE) structure [keyword-only, default: False]
fold_max (int) – Output at most this many structures (overridden by --fold-mfe) [keyword-only, default: 20]
fold_percent (float) – Stop outputting structures when the % difference in energy exceeds this value (overridden by --fold-mfe) [keyword-only, default: 20.0]
out_dir (Path) – Write all output files to this directory [keyword-only, default: ‘./out’]
keep_tmp (bool) – Keep temporary files after finishing [keyword-only, default: False]
num_cpus (int) – Use up to this many CPUs simultaneously [keyword-only, default: 4]
- seismicrna.fold.rnastructure.format_retitled_ct_line(length: int, ref: str, uniqid: int, energy: float)
Format a new CT title line including unique identifiers:
{length} {ref} #{uniqid}: {energy}
where {length} is the number of positions in the structure (required for all CT files), {ref} is the name of the reference, {uniqid} is the unique identifier, and {energy} is the free energy of folding.
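The title format above can be illustrated with a short sketch. The function name format_title is hypothetical, and details such as number formatting or a trailing newline in the real function are not documented here, so this shows only the layout of the fields.

```python
def format_title(length: int, ref: str, uniqid: int, energy: float) -> str:
    """Build a CT title line in the form '{length} {ref} #{uniqid}: {energy}'."""
    return f"{length} {ref} #{uniqid}: {energy}"


# Two structures with equal energies still receive distinct titles
# because each one carries its own unique identifier.
titles = [format_title(120, "myref", uniqid, -35.2) for uniqid in range(2)]
```

Here titles holds "120 myref #0: -35.2" and "120 myref #1: -35.2", which differ only in the identifier.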
- seismicrna.fold.rnastructure.guess_data_path()
Guess the DATAPATH.
- seismicrna.fold.rnastructure.make_fold_cmd(fasta_file: Path, ct_file: Path, *, fold_constraint: Path | None = None, shape_file: Path | None = None, shape_intercept: float | None = None, shape_slope: float | None = None, fold_temp: float | None = None, fold_md: int = 0, fold_mfe: bool = False, fold_max: int = 0, fold_percent: float = 0.0, num_cpus: int = 1)
- seismicrna.fold.rnastructure.parse_energy(line: str)
Parse the predicted free energy of folding from a line in format
{length} {ref} #{uniqid}: {energy}
where {length} is the number of positions in the structure (required for all CT files), {ref} is the name of the reference, {uniqid} is the unique identifier, and {energy} is the free energy of folding.
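Reading the energy back from a retitled line could look like the sketch below. It is an assumption-laden illustration: the name parse_energy_from_title and the regular expression are hypothetical, and the pattern assumes that the reference name contains no whitespace.

```python
import re

# Matches '{length} {ref} #{uniqid}: {energy}' with single-token fields.
TITLE_PATTERN = re.compile(r"^\s*(\d+)\s+(\S+)\s+#(\d+):\s+(\S+)\s*$")


def parse_energy_from_title(line: str) -> float:
    """Return the free energy of folding from a retitled CT title line."""
    match = TITLE_PATTERN.match(line)
    if match is None:
        raise ValueError(f"Not a retitled CT title line: {line!r}")
    # float() raises ValueError itself if the energy field is not numeric.
    return float(match.group(4))
```

For example, parse_energy_from_title("120 myref #3: -35.2") gives -35.2, while a line without the "#{uniqid}:" field raises ValueError.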
- seismicrna.fold.rnastructure.parse_rnastructure_ct_title(line: str)
Parse a title in a CT file from RNAstructure, in this format:
{length} ENERGY = {energy} {ref}
where {length} is the number of positions in the structure, {ref} is the name of the reference, and {energy} is the predicted free energy of folding. Also handle the edge case when RNAstructure predicts no base pairs (and thus does not write the free energy) by returning 0.
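A parser for this title format, including the edge case without an energy, might be sketched as follows. The name parse_ct_title, the order of the returned values, and the assumption that an energy-free title has the form "{length} {ref}" are all hypothetical; only the "{length} ENERGY = {energy} {ref}" layout comes from the description above.

```python
import re

# Title with an energy: '{length} ENERGY = {energy} {ref}'.
WITH_ENERGY = re.compile(r"^\s*(\d+)\s+ENERGY\s*=\s*(\S+)\s+(\S.*?)\s*$")
# Assumed title when no base pairs are predicted: '{length} {ref}'.
WITHOUT_ENERGY = re.compile(r"^\s*(\d+)\s+(\S.*?)\s*$")


def parse_ct_title(line: str):
    """Return (length, ref, energy) parsed from an RNAstructure CT title."""
    match = WITH_ENERGY.match(line)
    if match:
        return int(match.group(1)), match.group(3), float(match.group(2))
    match = WITHOUT_ENERGY.match(line)
    if match:
        # No energy was written, so report an energy of 0.
        return int(match.group(1)), match.group(2), 0.0
    raise ValueError(f"Cannot parse CT title line: {line!r}")
```

The energy-bearing pattern is tried first so that the word ENERGY is never mistaken for part of the reference name.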
- seismicrna.fold.rnastructure.require_data_path()
Return an error message if the DATAPATH is not valid.
- seismicrna.fold.rnastructure.retitle_ct(ct_input: Path, ct_output: Path, force: bool = False)
Retitle the structures in a CT file produced by RNAstructure.
The default titles follow this format:
ENERGY = {energy} {reference}
where {reference} is the name of the reference sequence and {energy} is the predicted free energy of folding.
The major problem with this format is that structures can have equal predicted free energies, so the titles of the structures can repeat, which would cause some functions (e.g. graphing ROC curves) to fail.
This function assigns a unique integer to each structure (starting with 0 for the minimum free energy and continuing upwards), which ensures that no two structures have identical titles.
- Parameters:
ct_input (Path) – Path of the CT file to retitle.
ct_output (Path) – Path of the CT file to which to write the retitled information.
force (bool = False) – Overwrite the output CT file if it already exists.
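The retitling idea can be sketched on CT text held in memory. This is not the package's implementation: the function name retitle_lines is hypothetical, it works on a list of lines rather than on files, and it relies on two assumptions, namely that each structure is a title line followed by exactly {length} record lines and that the structures are already ordered from lowest to highest energy.

```python
def retitle_lines(lines):
    """Give every structure in CT text a unique title (illustrative sketch)."""
    retitled = []
    uniqid = 0
    index = 0
    while index < len(lines):
        fields = lines[index].split()
        length = int(fields[0])
        if len(fields) >= 5 and fields[1] == "ENERGY" and fields[2] == "=":
            energy = float(fields[3])
            ref = " ".join(fields[4:])
        else:
            # No energy in the title (no base pairs predicted): use 0.
            energy = 0.0
            ref = " ".join(fields[1:])
        retitled.append(f"{length} {ref} #{uniqid}: {energy}")
        # Copy the record lines of this structure unchanged.
        retitled.extend(lines[index + 1:index + 1 + length])
        index += 1 + length
        uniqid += 1
    return retitled
```

Two structures with the same energy and reference then end up with titles that differ in their "#{uniqid}" field.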
Wrapper around ViennaRNA from Lorenz and Hofacker at the University of Vienna: https://www.tbi.univie.ac.at/RNA/
- seismicrna.fold.viennarna.calc_bp_pseudoenergy(seq_len: int, pseudoenergies: Series, out_file: Path)
Identify (i, j) pairs, with j >= i + 3, where the sum of pseudoenergies is non-zero, and write them in ViennaRNA command format:
“E i j 1 <bp_pseudoenergy>”
- Parameters:
seq_len – The total length of the sequence. Positions are assumed to be numbered 1..seq_len.
pseudoenergies – A pandas Series of pseudoenergies, indexed with multiindex (Position, Base).
out_file – Path to the file where the results will be written.
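The pair enumeration described above can be sketched without pandas. In this illustration the pseudoenergies are a plain dict from 1-based position to value instead of a Series with a (Position, Base) multiindex, the commands are returned as strings instead of being written to out_file, and the name bp_pseudoenergy_commands and the number formatting are hypothetical.

```python
def bp_pseudoenergy_commands(seq_len, pseudoenergies, min_sep=3):
    """List ViennaRNA 'E i j 1 <energy>' commands for candidate base pairs.

    pseudoenergies: dict mapping 1-based position -> pseudoenergy (kcal/mol);
    positions missing from the dict are treated as having 0.
    """
    commands = []
    for i in range(1, seq_len + 1):
        # Only consider partners j with j >= i + min_sep.
        for j in range(i + min_sep, seq_len + 1):
            total = pseudoenergies.get(i, 0.0) + pseudoenergies.get(j, 0.0)
            if total != 0.0:
                commands.append(f"E {i} {j} 1 {total}")
    return commands
```

For a sequence of length 5 with pseudoenergies of 0.5 at position 1 and -0.25 at position 4, only the pairs (1, 4) and (1, 5) have a non-zero sum, so two commands are produced.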
- seismicrna.fold.viennarna.extract_energies(vienna_input: Path, db_output: Path, force: bool = False)
Extract the free energies from a vienna file and prepend them to the reference name.
The title will follow this format:
ENERGY = {energy} {reference}
where {reference} is the name of the reference sequence and {energy} is the predicted free energy of folding.
- Parameters:
vienna_input (Path) – Path of the vienna file to extract energies from.
db_output (Path) – Path of the DB file to which to write the extracted information.
force (bool = False) – Overwrite the output DB file if it already exists.
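Moving the energy into the title could be sketched as below. This is an illustration under assumptions: the name vienna_to_db is hypothetical, the input is assumed to consist of a ">{reference}" header, a sequence line, and structure lines of the form "{structure} ({energy})", and the exact layout of the DB file written by the package is not confirmed here.

```python
import re

# A dot-bracket structure followed by its energy in parentheses.
STRUCT_LINE = re.compile(r"^([().]+)\s+\(\s*(-?\d+(?:\.\d+)?)\s*\)\s*$")


def vienna_to_db(lines):
    """Prepend each structure's energy to the reference name in its title."""
    out = []
    ref = seq = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # Header line: remember the reference name.
            ref, seq = line[1:].strip(), None
        elif (match := STRUCT_LINE.match(line)):
            structure, energy = match.group(1), float(match.group(2))
            out += [f">ENERGY = {energy} {ref}", seq, structure]
        elif line:
            # Any other non-empty line is taken to be the sequence.
            seq = line
    return out
```

Each structure line yields one title, one sequence line, and one structure line in the output, with the title in the "ENERGY = {energy} {reference}" form.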
- seismicrna.fold.viennarna.get_subopt(subopt_out: Path, db_target: Path)
Extract suboptimal structures from the output of RNAsubopt and add them to a vienna file.
- seismicrna.fold.viennarna.make_fold_cmd(fasta_file: Path, vienna_file: Path, *, fold_delta_e: float, dms_file: Path | None, shape_method: str | None, fold_constraint: Path | None, fold_commands: Path | None, fold_temp: float, fold_md: int, fold_mfe: bool, num_cpus: int = 1, **kwargs)
- seismicrna.fold.viennarna.rnafold(rna: RNAFoldProfile, *, branch: str = '', fold_constraint: Path | None = None, fold_commands: Path | None = None, fold_md: int = 0, fold_mfe: bool = False, fold_max: int = 20, fold_percent: float = 20.0, pseudoenergy_all: bool = True, out_dir: Path = './out', tmp_dir: Path, keep_tmp: bool = False, num_cpus: int = 4)
Run the ‘RNAfold’ or ‘RNAsubopt’ program of ViennaRNA.
- Parameters:
branch (str) – Create a new branch of the workflow with this name [keyword-only, default: ‘’]
fold_constraint (Path | None) – Force bases to be paired/unpaired from a file of constraints [keyword-only, default: None]
fold_commands (Path | None) – Command file for ViennaRNA [keyword-only, default: None]
fold_md (int) – Limit base pair distances to this number of bases (0 for no limit) [keyword-only, default: 0]
fold_mfe (bool) – Predict only the minimum free energy (MFE) structure [keyword-only, default: False]
fold_max (int) – Output at most this many structures (overridden by --fold-mfe) [keyword-only, default: 20]
fold_percent (float) – Stop outputting structures when the % difference in energy exceeds this value (overridden by --fold-mfe) [keyword-only, default: 20.0]
pseudoenergy_all (bool) – Apply pseudoenergy constraints from chemical probing data to all base pairs or only stacked base pairs. --pseudoenergy-stacked requires --fold-vienna [keyword-only, default: True]
out_dir (Path) – Write all output files to this directory [keyword-only, default: ‘./out’]
keep_tmp (bool) – Keep temporary files after finishing [keyword-only, default: False]
num_cpus (int) – Use up to this many CPUs simultaneously [keyword-only, default: 4]