Fold: Predict RNA secondary structures using mutation rates
Fold: Input files
Fold input file: Mask or Cluster positional table
You can give any number of positional table files of masked or clustered reads
(mask-per-pos.csv or clust-per-pos.csv, respectively) as inputs.
See List Input Files for ways to list multiple files.
(SEISMIC-RNA will not crash if you give other types of table files, such as a
relate-per-pos.csv or mask-per-read.csv.gz file; it will simply ignore them.)
To predict structures using the mutational profiles in all valid tables in the
directory {out}, you could use the command
seismic fold {out}
Fold: Settings
Fold setting: Choose a folding backend
seismic fold supports three folding backends, selected with
--fold-backend:
| --fold-backend | Program | Package |
|---|---|---|
| Fold | RNAstructure Fold | RNAstructure (≥ 6.6) |
| ShapeKnots | RNAstructure ShapeKnots | RNAstructure (≥ 6.6); predicts pseudoknots |
| RNAFold | ViennaRNA RNAfold | ViennaRNA (≥ 2.7.2); see Appendix 1: Install the dependencies of SEISMIC-RNA with or without Conda |
All three backends accept normalized mutation rates as soft constraints to guide structure prediction (see Fold setting: Energy method). ShapeKnots is the only backend that can predict pseudoknots. RNAFold is the only backend that supports the Eddy energy method.
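The compatibility rules in the paragraph above, together with those listed under Fold setting: Energy method, fit in a small lookup table. The following is a hypothetical Python sketch for checking a combination of settings before launching a run. `check_settings`, `COMPATIBLE_BACKENDS`, and `PREDICTS_PSEUDOKNOTS` are illustrative names and are not part of SEISMIC-RNA.

```python
# Hypothetical sketch: these names are illustrative, not SEISMIC-RNA's API.
# The tables encode what this documentation states about --fold-backend
# and --fold-energy-method.

BACKENDS = ("Fold", "ShapeKnots", "RNAFold")

# Backends that accept each energy method.
COMPATIBLE_BACKENDS = {
    "Deigan": {"Fold", "ShapeKnots", "RNAFold"},  # default method
    "Cordero": {"Fold", "ShapeKnots"},
    "Eddy": {"RNAFold"},
}

# ShapeKnots is the only backend that predicts pseudoknots.
PREDICTS_PSEUDOKNOTS = {"Fold": False, "ShapeKnots": True, "RNAFold": False}


def check_settings(backend: str, energy_method: str = "Deigan") -> None:
    """Raise ValueError if the backend cannot use the energy method."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    if energy_method not in COMPATIBLE_BACKENDS:
        raise ValueError(f"unknown energy method: {energy_method!r}")
    allowed = COMPATIBLE_BACKENDS[energy_method]
    if backend not in allowed:
        raise ValueError(
            f"{energy_method} is not supported by {backend}; "
            f"use one of: {', '.join(sorted(allowed))}"
        )


check_settings("ShapeKnots", "Cordero")  # compatible: returns None
try:
    check_settings("RNAFold", "Cordero")
except ValueError as error:
    print(error)  # Cordero is not supported by RNAFold; use one of: Fold, ShapeKnots
```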
Fold setting: Energy method
Mutation rates are incorporated into the folding energy function as soft
constraints.
Use --fold-energy-method to choose the method:
Deigan (default): SHAPE pseudo-energy term m * log(reactivity + 1) + b, with slope m set by --deigan-slope (default 1.8 kcal/mol) and intercept b set by --deigan-intercept (default −0.6 kcal/mol). Works with all three backends: RNAstructure uses SHAPE-directed folding; ViennaRNA uses a soft-constraint file.
Cordero: Hard partition of positions into paired/unpaired constraints based on a reactivity threshold. Requires --fold-backend Fold or --fold-backend ShapeKnots.
Eddy: Uses ViennaRNA’s built-in soft constraint facility. Requires --fold-backend RNAFold.
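To show how the Deigan term scales with reactivity, here is a minimal Python sketch of the formula m * log(reactivity + 1) + b using the documented defaults. It assumes "log" is the natural logarithm, which is the usual convention for this term. The function name `deigan_pseudo_energy` is hypothetical and does not reproduce how either backend applies the term internally.

```python
import math


def deigan_pseudo_energy(
    reactivity: float, slope: float = 1.8, intercept: float = -0.6
) -> float:
    """Return slope * ln(reactivity + 1) + intercept, in kcal/mol.

    Defaults mirror --deigan-slope and --deigan-intercept; using the
    natural logarithm is an assumption about which "log" is meant.
    """
    if reactivity < 0:
        raise ValueError("reactivity must be non-negative")
    return slope * math.log(reactivity + 1.0) + intercept


# Reactivity at which the term crosses zero with the default parameters:
# ln(r + 1) = 0.6 / 1.8 = 1/3, so r = e**(1/3) - 1 (about 0.40).
break_even = math.exp(0.6 / 1.8) - 1.0

for r in (0.0, 0.25, 0.5, 1.0):
    print(f"reactivity {r:.2f}: {deigan_pseudo_energy(r):+.3f} kcal/mol")
```

With the defaults, the term is negative (favoring pairing) below roughly 0.40 and positive (penalizing pairing) above it, so normalized reactivities near 1 discourage a base from pairing.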
Fold setting: Define regions
You can predict structures of the full reference sequences or specific regions. See Define Regions for ways to define regions.
Defining regions in seismic fold works identically to seismic mask but
serves a very different purpose.
Regions in seismic fold determine for which parts of the reference sequence
to predict structures.
Regions in seismic mask determine for which parts of the reference sequence
to use mutational data.
SEISMIC-RNA allows these regions to be different.
There are several common scenarios:
The region you are folding matches the region for which you have data. For example, you could have mutationally profiled a full transcript and now want to predict the structure of the full transcript using the data from the full mutational profile.
You are folding a region that contains and is longer than the region for which you have data. For example, you could have mutationally profiled a short amplicon from a much longer transcript; and after clustering that amplicon, you want to model each alternative structure of the long transcript while using the short mutational profile of each cluster to guide the structure predictions.
You are folding a short region that is contained by a longer region for which you have mutational profiling data. For example, you could have mutationally profiled a full transcript and now want to predict the structure of a small part of the transcript that you are reasonably sure does not interact with any other part of the transcript.
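The three scenarios above differ only in how the folded region and the data region overlap. This hypothetical Python sketch classifies a pair of regions and counts how many folded positions have data. It assumes 1-based, inclusive coordinates; `Region`, `classify`, and `positions_with_data` are illustrative names, not SEISMIC-RNA's API. The fallback "partial or no overlap" case is not one of the scenarios described above.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """Hypothetical region type: 1-based, inclusive coordinates (assumed)."""
    end5: int
    end3: int


def classify(fold: Region, data: Region) -> str:
    """Name the scenario relating the folded region to the data region."""
    if fold == data:
        return "fold region matches data region"
    if fold.end5 <= data.end5 and data.end3 <= fold.end3:
        return "fold region contains data region"
    if data.end5 <= fold.end5 and fold.end3 <= data.end3:
        return "data region contains fold region"
    return "partial or no overlap"


def positions_with_data(fold: Region, data: Region) -> int:
    """Count positions of the folded region that lie inside the data region."""
    return max(0, min(fold.end3, data.end3) - max(fold.end5, data.end5) + 1)


transcript = Region(1, 1500)  # hypothetical full transcript
amplicon = Region(201, 350)   # hypothetical short amplicon
print(classify(transcript, amplicon))             # fold region contains data region
print(positions_with_data(transcript, amplicon))  # 150
```

In the second scenario, only the positions counted by `positions_with_data` have mutation rates to guide folding; the rest of the folded region is predicted without data.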
Fold setting: Quantile for normalization
Folding requires that the mutation rates be normalized to the interval [0, 1].
Use --fold-quantile (default 0.95) to set the quantile to which reactivities
are normalized and winsorized before folding.
See Normalize Mutation Rates for more information on normalization.
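As a rough illustration of normalizing and winsorizing to a quantile, the following Python sketch scales the rates so that the chosen quantile maps to 1, then clips every value to [0, 1]. It assumes linear interpolation between order statistics (NumPy's default quantile method) and finite, non-negative rates. Whether SEISMIC-RNA uses exactly this convention or handles missing values differently is an assumption here; see Normalize Mutation Rates for the authoritative description. The function names are hypothetical.

```python
import math


def quantile(values: list[float], q: float) -> float:
    """Quantile by linear interpolation between order statistics
    (the same convention as NumPy's default method)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = math.floor(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def normalize_and_winsorize(rates: list[float], q: float = 0.95) -> list[float]:
    """Scale rates so the q-quantile maps to 1, then clip to [0, 1]."""
    scale = quantile(rates, q)
    if scale <= 0.0:
        raise ValueError("the quantile must be positive to normalize")
    return [min(max(rate / scale, 0.0), 1.0) for rate in rates]


rates = [0.00, 0.01, 0.02, 0.04, 0.10]  # hypothetical mutation rates
print(normalize_and_winsorize(rates, q=0.5))  # [0.0, 0.5, 1.0, 1.0, 1.0]
```

A lower quantile saturates more positions at 1; the default of 0.95 caps only the top 5% or so of rates, which limits the influence of a few outliers.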
Fold setting: RNAstructure parameters
seismic fold exposes several options for the RNAstructure Fold and
ShapeKnots programs (see the documentation for Fold for details on each
option).
Options marked with (†) are also honored by the ViennaRNA RNAfold backend.
| Option in seismic fold | Option in RNAstructure | Brief explanation |
|---|---|---|
| | | temperature (°C) of folding (default 37) |
| | | optional folding constraints file with forced pairs/unpaired bases (†) |
| | | maximum distance between paired bases (0 = no limit) |
| | | predict only the optimal structure (same result as |
| | | maximum number of structures to predict (ignored if using |
| | | maximum % difference in free energy of predicted structures (ignored if using |
| | | allow isolated (non-stacked) base pairs (default: disallowed) |
Fold setting: ViennaRNA-specific parameters
When --fold-backend RNAFold is selected, you may additionally pass a
commands file to RNAfold using --fold-commands.
This file is forwarded verbatim to RNAfold’s --commands option.
Fold setting: Dry run
Use --fold-dry-run to generate the input files and command for the folding
backend without actually running it.
This is useful for inspecting the soft-constraint data files or debugging the
command line before a long production run.
Use --fold-real-run (the default) to run folding normally.
Fold: Output files
All output files go into the directory {out}/{sample}/fold/{ref}/{reg},
where {out} is the output directory, {sample} is the sample, {ref}
is the reference, and {reg} is the region you folded (not that from
which the data came).
The files for each predicted structure are named {reg}__{profile}, where
{reg} is the region from which the data came (not that which you
folded) and {profile} is the mutational profile of those data, which can be
average (ensemble average) or cluster-{n}-{i} (where {n} is the
number of clusters and {i} is the cluster number).
Fold output file: Fold report
SEISMIC-RNA writes a report file, fold-report.json, to record the settings
you used for running the Fold step.
See Fold Report for more information.
Fold output file: Connectivity table
The primary output is a connectivity table file. For details on this format, see Connectivity Table (CT): RNA secondary structures.
Fold output file: Dot-bracket structure
The Fold step also outputs the structures in dot-bracket format, which you can copy-paste into RNA drawing software such as VARNA. For details on this format, see Dot-bracket (DB): RNA secondary structures.
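If you want to work with a structure programmatically rather than draw it, a dot-bracket string converts to a list of base pairs with one stack per bracket type. The Python sketch below parses only a single structure string; the layout of the whole file is described in Dot-bracket (DB): RNA secondary structures. It assumes "." marks unpaired bases and that each bracket type pairs only with its own kind. Whether SEISMIC-RNA writes ShapeKnots pseudoknots with a second bracket type such as [] is an assumption here. The function name is hypothetical.

```python
def pairs_from_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return base pairs (i, j), 1-based with i < j, from a dot-bracket string.

    Assumes "." marks unpaired bases and each bracket type pairs only with
    its own kind, so pseudoknots may use a second type such as [].
    """
    closing_to_opening = {")": "(", "]": "[", "}": "{", ">": "<"}
    stacks: dict[str, list[int]] = {o: [] for o in closing_to_opening.values()}
    pairs = []
    for position, char in enumerate(structure, start=1):
        if char in stacks:
            stacks[char].append(position)  # remember the opening position
        elif char in closing_to_opening:
            stack = stacks[closing_to_opening[char]]
            if not stack:
                raise ValueError(f"unmatched {char!r} at position {position}")
            pairs.append((stack.pop(), position))  # pair with latest opener
        elif char != ".":
            raise ValueError(f"unexpected character {char!r} at position {position}")
    for opening, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {opening!r} at position {stack[-1]}")
    return sorted(pairs)


print(pairs_from_dot_bracket("((..))"))          # [(1, 6), (2, 5)]
print(pairs_from_dot_bracket("((..[[..))..]]"))  # [(1, 10), (2, 9), (5, 14), (6, 13)]
```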
Fold output file: VARNA color file
The Fold step also outputs the normalized mutation rates in VARNA color format, which you can import into the RNA drawing software VARNA. For details on this format, see VARNA Color: Color codes for VARNA.
Fold: Visualize structures in VARNA
VARNA is a third-party application for drawing RNA structures. To draw a structure from SEISMIC-RNA in VARNA:
Install (if needed) and launch VARNA.
Open your dot-bracket file (see Fold output file: Dot-bracket structure) in a text editor.
Right-click the drawing canvas, select “File” > “New…”, and copy-paste the sequence and dot-bracket structure.
Adjust the layout of the structure by clicking and dragging.
To color the bases by their mutation rates, right-click the drawing canvas, select “Display” > “Color map” > “Load values…”, copy-paste the path to your VARNA color file into the box or click “Choose file” and navigate to your VARNA color file, and click “OK” to load the file.
To customize the colors, select “Display” > “Color map” > “Style…”:
Drag a color bar to adjust its location.
Click the square below a color bar to change its color.
Click the X below the square to delete the color.
Click anywhere on the color spectrum to create a new color bar.
We recommend setting the color for missing data (-1) to white or light gray and using a continuous (not discrete) color scale for the mutation data.