seismicrna.export package

Submodules

seismicrna.export.main.run(input_path: Iterable[str | Path] = Sentinel.UNSET, *, samples_meta: str = None, refs_meta: str = None, all_pos: bool = True, force: bool = False, num_cpus: int = 4) → list[Path]

Export one file per sample for the seismic-graph web app.

Parameters:
  • samples_meta (str) – Add sample metadata from this CSV file to exported results [keyword-only, default: None]

  • refs_meta (str) – Add reference metadata from this CSV file to exported results [keyword-only, default: None]

  • all_pos (bool) – Export all positions (not just unmasked positions) [keyword-only, default: True]

  • force (bool) – Force all tasks to run, overwriting any existing output files [keyword-only, default: False]

  • num_cpus (int) – Use up to this many CPUs simultaneously [keyword-only, default: 4]

seismicrna.export.meta.combine_metadata(special_metadata: dict[str, Any], parsed_metadata: dict[Any, dict], item: Any, what: str = 'item')

Merge computed metadata with metadata parsed from a CSV file.

Parameters:
  • special_metadata (dict[str, Any]) – Metadata computed internally (e.g. sequence, read counts). These values take precedence over parsed_metadata values, and any key collision with a differing value raises an error.

  • parsed_metadata (dict[Any, dict]) – Mapping from item identifier to its parsed metadata dict, as returned by _parse_metadata.

  • item (Any) – The identifier used to look up the item in parsed_metadata (e.g. a sample name or reference name).

  • what (str, optional) – Human-readable label for the item type, used in log/error messages (default "item").

Returns:

Union of special_metadata and the parsed metadata for item. If item is not found in parsed_metadata, special_metadata is returned unchanged.

Return type:

dict[str, Any]
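
The merge rule described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the name combine_metadata_sketch is hypothetical, and raising ValueError on a conflicting key is an assumption (the description says only that an error is raised).

```python
from typing import Any


def combine_metadata_sketch(special_metadata: dict[str, Any],
                            parsed_metadata: dict[Any, dict],
                            item: Any,
                            what: str = "item") -> dict[str, Any]:
    """Hypothetical stand-in for combine_metadata (not the real code)."""
    try:
        item_metadata = parsed_metadata[item]
    except KeyError:
        # No parsed metadata for this item: return the computed values.
        return special_metadata
    combined = dict(special_metadata)
    for key, value in item_metadata.items():
        if key in combined and combined[key] != value:
            # Assumption: a conflicting value is reported as ValueError.
            raise ValueError(f"Metadata field {key!r} of {what} {item!r} "
                             f"conflicts: {combined[key]!r} vs {value!r}")
        combined[key] = value
    return combined


# Hypothetical inputs for illustration only.
special = {"sequence": "GGAUCC", "num_aligned": 1200}
parsed = {"ref1": {"organism": "E. coli"}}
merged = combine_metadata_sketch(special, parsed, "ref1", what="reference")
unchanged = combine_metadata_sketch(special, parsed, "ref2", what="reference")
```

Here merged holds the keys of both dicts, while unchanged is just special, because "ref2" has no entry in the parsed metadata.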

seismicrna.export.meta.parse_refs_metadata(file: Path)

Parse a CSV file of metadata for each reference.

Parameters:

file (Path) – CSV file of metadata for each reference

Returns:

Parsed metadata for each reference

Return type:

dict

seismicrna.export.meta.parse_samples_metadata(file: Path)

Parse a CSV file of metadata for each sample.

Parameters:

file (Path) – CSV file of metadata for each sample

Returns:

Parsed metadata for each sample

Return type:

dict
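
As a rough picture of the kind of mapping these parsers return, the sketch below reads a CSV into a dict keyed by one identifier column, using only the standard library. The function name parse_metadata_sketch, the column name "Sample", and the duplicate check are all assumptions for illustration. The real functions take a Path and may name columns, convert types, and validate input differently.

```python
import csv
import io


def parse_metadata_sketch(handle, key_column: str) -> dict[str, dict]:
    """Hypothetical parser: map each row's key_column to its other fields."""
    parsed = {}
    for row in csv.DictReader(handle):
        name = row.pop(key_column)
        if name in parsed:
            # Assumption: duplicate identifiers are treated as an error.
            raise ValueError(f"Duplicate {key_column}: {name!r}")
        parsed[name] = row
    return parsed


# Hypothetical CSV content; csv.DictReader leaves every value as a string.
csv_text = "Sample,temperature,replicate\nsample1,37,A\nsample2,25,B\n"
samples_metadata = parse_metadata_sketch(io.StringIO(csv_text), "Sample")
```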

seismicrna.export.web.conform_series(series: Series | DataFrame)

seismicrna.export.web.export_sample(top_sample: tuple[Path, str], *args, force: bool, **kwargs)

Export data for one sample to a JSON file.

Parameters:
  • top_sample (tuple[Path, str]) – Pair of (top, sample) identifying the sample.

  • *args – Positional arguments forwarded to get_sample_data.

  • force (bool) – Whether to overwrite an existing JSON file.

  • **kwargs – Keyword arguments forwarded to get_sample_data.

Returns:

Path of the written JSON file.

Return type:

Path

seismicrna.export.web.format_metadata(metadata: dict[str, Any])

Prefix each key with the metadata symbol.
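
The key-prefixing step can be illustrated as follows. The symbol "#" and the name format_metadata_sketch are placeholders chosen for this sketch; the text above does not say which metadata symbol the package uses.

```python
from typing import Any

# Assumption: placeholder symbol, not necessarily the one used by seismicrna.
META_SYMBOL = "#"


def format_metadata_sketch(metadata: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of metadata with META_SYMBOL prepended to every key."""
    return {f"{META_SYMBOL}{key}": value for key, value in metadata.items()}


formatted = format_metadata_sketch({"sample": "sample1", "temperature": 37})
```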

seismicrna.export.web.get_db_structs(table: PositionTable, k: int | None = None, clust: int | None = None)

Parse dot-bracket structures and free energies for a table.

Parameters:
  • table (PositionTable) – Position table whose profiles are used to locate dot-bracket files.

  • k (int or None, optional) – Number of clusters to select; None selects all.

  • clust (int or None, optional) – Cluster index to select; None selects all.

Returns:

Mapping of profile name to dot-bracket structure string, and mapping of profile name to minimum free energy value.

Return type:

tuple[dict, dict]

seismicrna.export.web.get_ref_metadata(top: Path, sample: str, ref: str, refs_metadata: dict[str, dict])

Build metadata dict for a reference sequence.

Parameters:
  • top (Path) – Top-level output directory.

  • sample (str) – Sample name.

  • ref (str) – Reference sequence name.

  • refs_metadata (dict[str, dict]) – Parsed per-reference metadata from a CSV file.

Returns:

Metadata including the reference sequence and number of aligned reads, merged with any additional parsed metadata.

Return type:

dict[str, Any]

seismicrna.export.web.get_reg_metadata(top: Path, sample: str, ref: str, reg: str, all_pos: bool)

Build metadata dict for a masked region.

Parameters:
  • top (Path) – Top-level output directory.

  • sample (str) – Sample name.

  • ref (str) – Reference sequence name.

  • reg (str) – Region name.

  • all_pos (bool) – If True, include all positions in the region; if False, include only unmasked positions.

Returns:

Metadata including 5’/3’ end coordinates and included positions.

Return type:

dict[str, Any]

seismicrna.export.web.get_sample_data(top: Path, sample: str, tables: list[Table], *, samples_metadata: dict[str, dict], refs_metadata: dict[str, dict], all_pos: bool)

Assemble the full nested data dict for one sample.

Parameters:
  • top (Path) – Top-level output directory.

  • sample (str) – Sample name.

  • tables (list[Table]) – Tables for the sample (consumed by this function via pop()).

  • samples_metadata (dict[str, dict]) – Parsed per-sample metadata from a CSV file.

  • refs_metadata (dict[str, dict]) – Parsed per-reference metadata from a CSV file.

  • all_pos (bool) – If True, include all positions; if False, include only unmasked positions.

Returns:

Nested dict suitable for JSON export, structured as {sample_meta..., ref: {ref_meta..., reg: {reg_meta..., clust: {counts, rates, ...}}}}.

Return type:

dict
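
To make the nesting concrete, the sketch below builds a small dict with the same sample, reference, region, and cluster levels, then round-trips it through JSON. Every key and value is a hypothetical placeholder; the real field names are determined by the package.

```python
import json

# All keys and values below are hypothetical placeholders.
sample_data = {
    "sample": "sample1",                      # sample-level metadata
    "ref1": {
        "sequence": "GGAUCC",                 # reference-level metadata
        "full": {
            "positions": [1, 2, 3, 4, 5, 6],  # region-level metadata
            "average": {                      # data for one cluster
                "sub_rate": [0.01, 0.02, 0.30, 0.25, 0.02, 0.01],
            },
        },
    },
}

# Only JSON-compatible types are used, so the dict survives a round trip.
text = json.dumps(sample_data, indent=2)
restored = json.loads(text)
```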

seismicrna.export.web.get_sample_metadata(sample: str, samples_metadata: dict[str, dict])

seismicrna.export.web.get_table_data(table: Table, all_pos: bool)

seismicrna.export.web.iter_clust_table_data(table: AbundanceTable, k: int, clust: int)

Yield cluster abundance data (proportion) for one cluster.

Parameters:
  • table (AbundanceTable) – Abundance table from which to fetch data.

  • k (int) – Number of clusters.

  • clust (int) – Cluster index.

Yields:

tuple[str, float] – (CLUST_PROP, proportion), where proportion is the fraction of reads assigned to this cluster.

seismicrna.export.web.iter_pos_table_data(table: PositionTable, k: int, clust: int, all_pos: bool)

Yield all position-table data (series + structure) for one cluster.

Parameters:
  • table (PositionTable) – Position table from which to fetch data.

  • k (int) – Number of clusters.

  • clust (int) – Cluster index.

  • all_pos (bool) – If True, include masked positions; if False, exclude them.

Yields:

tuple[str, Any] – Key-value pairs for per-position counts, rates, structure, and free energy.

seismicrna.export.web.iter_pos_table_series(table: PositionTable, k: int, clust: int, all_pos: bool)

Yield per-position count and rate key-value pairs for one cluster.

Parameters:
  • table (PositionTable) – Position table from which to fetch data.

  • k (int) – Number of clusters.

  • clust (int) – Cluster index.

  • all_pos (bool) – If True, include masked positions; if False, exclude them.

Yields:

tuple[str, list] – Key-value pairs for each relationship count and the substitution rate as lists.

seismicrna.export.web.iter_pos_table_struct(table: PositionTable, k: int, clust: int)

Yield structure and free-energy key-value pairs for one cluster.

Parameters:
  • table (PositionTable) – Position table to look up structures for.

  • k (int) – Number of clusters.

  • clust (int) – Cluster index.

Yields:

tuple[str, Any] – (STRUCTURE, dot-bracket string), then (FREE_ENERGY, energy value), if a structure is available.

seismicrna.export.web.iter_read_table_data(table: ReadTable, k: int, clust: int)

Yield read-table data (substitution histogram) for one cluster.

Parameters:
  • table (ReadTable) – Read table from which to fetch data.

  • k (int) – Number of clusters.

  • clust (int) – Cluster index.

Yields:

tuple[str, list] – (SUBST_HIST, histogram), where the histogram is a list of read counts indexed by number of substitutions.
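
The shape of such a histogram can be shown with a small standalone sketch. The function subst_histogram and its input (one substitution count per read) are hypothetical; the real function reads its values from a ReadTable.

```python
from collections import Counter


def subst_histogram(subs_per_read: list[int]) -> list[int]:
    """Count reads by number of substitutions; index n holds reads with n."""
    if not subs_per_read:
        return []
    counts = Counter(subs_per_read)
    # Fill gaps with 0 so that the list index equals the substitution count.
    return [counts.get(n, 0) for n in range(max(subs_per_read) + 1)]


# Six hypothetical reads with 0, 1, 1, 3, 0, and 1 substitutions.
histogram = subst_histogram([0, 1, 1, 3, 0, 1])
```

In this example two reads have no substitutions, three have one, none have two, and one has three, so histogram is [2, 3, 0, 1].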

seismicrna.export.web.iter_table_data(table: Table, k: int, clust: int, all_pos: bool)

Yield all export data for a table of any type and one cluster.

Parameters:
  • table (Table) – Table from which to fetch data (PositionTable, ReadTable, or AbundanceTable).

  • k (int) – Number of clusters.

  • clust (int) – Cluster index.

  • all_pos (bool) – For PositionTable only: if True, include masked positions.

Yields:

tuple[str, Any] – Key-value pairs appropriate for the table type.