seismicrna.core.table package

Submodules

class seismicrna.core.table.base.AbundanceTable

Bases: Table, ABC

Table of abundances.

property data: Series: Table’s data.

classmethod get_by_read(): Whether the table contains data for each read.

classmethod get_file_seg_type(): Type of the last segment in the path.

classmethod get_index_depth(): Number of columns in the index.

property proportions: Series: Proportion of each item.

class seismicrna.core.table.base.PositionTable

Bases: RelTypeTable, ABC

Table indexed by position.

MASK = 'pos-mask'

ci_count(confidence: float, **kwargs)

Confidence intervals of counts, under these simplifications:

Counts are independent of each other.
Counts follow binomial distributions.
Coverage counts are constant.

Parameters:

confidence (float) – Confidence level; must be in [0, 1).
**kwargs – Keyword arguments for fetch methods.

Returns:

Lower and upper bounds of the confidence interval.

Return type:

tuple[pandas.DataFrame, pandas.DataFrame]
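
Under the simplifications listed above, each count can be treated as a binomial draw whose number of trials is the (constant) coverage. The following is a minimal sketch of such an interval for a single position, using only the standard library. It uses the Wilson score approximation, which is an assumption made for illustration and not necessarily the method that ci_count uses. The helper name binomial_count_interval is hypothetical.

```python
from statistics import NormalDist


def binomial_count_interval(count: int, coverage: int, confidence: float):
    """Approximate (lower, upper) bounds on a binomial count.

    Hypothetical helper: uses the Wilson score interval, which may
    differ from the method used by PositionTable.ci_count.
    """
    if not 0.0 <= confidence < 1.0:
        raise ValueError(f"confidence must be in [0, 1), but got {confidence}")
    if coverage <= 0:
        raise ValueError(f"coverage must be positive, but got {coverage}")
    if not 0 <= count <= coverage:
        raise ValueError(f"count must be in [0, {coverage}], but got {count}")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    p = count / coverage
    denom = 1.0 + z * z / coverage
    center = (p + z * z / (2.0 * coverage)) / denom
    spread = (p * (1.0 - p) / coverage + z * z / (4.0 * coverage ** 2)) ** 0.5
    half_width = z * spread / denom
    # Scale the bounds on the proportion back to counts.
    lower = max(0.0, center - half_width) * coverage
    upper = min(1.0, center + half_width) * coverage
    return lower, upper


if __name__ == "__main__":
    print(binomial_count_interval(20, 1000, 0.95))
```

A confidence of 0 collapses the interval onto the observed count, which is why the lower end of the allowed range [0, 1) is closed while the upper end is open.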

ci_ratio(confidence: float, **kwargs)

Confidence intervals of ratios, under these simplifications:

Ratios are independent of each other.
Ratios follow beta distributions.
Coverage counts are constant.

Parameters:

confidence (float) – Confidence level; must be in [0, 1).
**kwargs – Keyword arguments for fetch methods.

Returns:

Lower and upper bounds of the confidence interval.

Return type:

tuple[pandas.DataFrame, pandas.DataFrame]
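
To illustrate the beta-distribution simplification, here is a minimal sketch that estimates the interval for one ratio by sampling. It assumes the ratio follows Beta(count + 1, coverage - count + 1), which is a uniform-prior choice made only for this example. The reference above does not state which beta parameters ci_ratio uses, and the helper name beta_ratio_interval is hypothetical.

```python
import random


def beta_ratio_interval(count: int,
                        coverage: int,
                        confidence: float,
                        *,
                        num_draws: int = 20_000,
                        seed: int = 0):
    """Monte Carlo estimate of (lower, upper) bounds on a ratio.

    Hypothetical helper: the beta parameters are an assumption and may
    differ from those used by PositionTable.ci_ratio.
    """
    if not 0.0 <= confidence < 1.0:
        raise ValueError(f"confidence must be in [0, 1), but got {confidence}")
    if not 0 <= count <= coverage:
        raise ValueError(f"count must be in [0, {coverage}], but got {count}")
    rng = random.Random(seed)
    # Assumed parameters: a uniform prior updated with the observed counts.
    alpha = count + 1
    beta = coverage - count + 1
    draws = sorted(rng.betavariate(alpha, beta) for _ in range(num_draws))
    # Leave an equal share of the excluded probability in each tail.
    tail = (1.0 - confidence) / 2.0
    lower = draws[int(tail * (num_draws - 1))]
    upper = draws[int((1.0 - tail) * (num_draws - 1))]
    return lower, upper


if __name__ == "__main__":
    print(beta_ratio_interval(20, 1000, 0.95))
```

Sampling keeps the sketch free of third-party dependencies. If SciPy is available, scipy.stats.beta.ppf returns the same quantiles exactly.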

property end3

property end5

classmethod get_by_read(): Whether the table contains data for each read.

classmethod get_file_seg_type(): Type of the last segment in the path.

classmethod get_index_depth(): Number of columns in the index.

iter_profiles(*, regions: Iterable[Region] | None = None, rel: str = 'Mutated', k: int | None = None, clust: int | None = None): Yield RNA mutational profiles from the table.

property range

property range_int

property region: Region covered by the table.

resample(fraction: float = 1.0, *, exclude_masked: bool = False, seed: int | None = None, max_seed: int = 4294967296)

Resample the reads and return a new DataFrame.

Parameters:

fraction (float = 1.) – Number of reads to resample, expressed as a fraction of the original number of reads. Must be ≥ 0; may be > 1.
exclude_masked (bool = False) – Exclude positions that have been masked.
seed (int | None = None) – Seed for the random number generator.
max_seed (int = 2 ** 32) – Maximum seed to pass to the next random number generator.

property seq

class seismicrna.core.table.base.ReadTable

Bases: RelTypeTable, ABC

Table indexed by read.

classmethod get_by_read(): Whether the table contains data for each read.

classmethod get_file_seg_type(): Type of the last segment in the path.

classmethod get_index_depth(): Number of columns in the index.

property reads

class seismicrna.core.table.base.RelTypeTable

Bases: Table, ABC

Table with multiple types of relationships.

property data: DataFrame: Table’s data.

fetch_count(*, exclude_masked: bool = False, squeeze: bool = False, **kwargs) → Series | DataFrame: Fetch counts of one or more columns.

fetch_ratio(*, exclude_masked: bool = False, squeeze: bool = False, precision: int | None = None, **kwargs) → Series | DataFrame: Fetch ratios of one or more columns.

classmethod get_header_rows() → list[int]: Row(s) of the file to use as the columns.

class seismicrna.core.table.base.Table

Bases: HasRefFilePath, ABC

Table base class.

property branches: dict[str, str]: Branches of the workflow.

property data: DataFrame | Series: Table’s data.

classmethod get_auto_path_fields(): Names of path fields that have automatic values.

abstractmethod classmethod get_by_read() → bool: Whether the table contains data for each read.

classmethod get_ext(): File extension.

classmethod get_header_depth()

abstractmethod classmethod get_header_type() → type[Header]: Type of the header for the table.

classmethod get_index_cols() → list[int]: Column(s) of the file to use as the index.

abstractmethod classmethod get_index_depth() → int: Number of columns in the index.

abstractmethod classmethod get_load_function() → LoadFunction: LoadFunction for all Dataset types for this Table.

property header: Header for the table’s data.

property path: Path of the table’s file.

property ref: str: Name of the table’s reference.

property refseq: DNA: Reference sequence.

property reg: str: Name of the table’s region.

property sample: str: Name of the table’s sample.

property top: Path: Path of the table’s output directory.
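
The abstract class methods listed above follow the standard abc pattern: the base class declares what every table type must report, and each concrete subclass supplies fixed answers. The sketch below shows that pattern with toy classes. ToyTable and ToyReadTable are hypothetical stand-ins, the index depth of 1 is an assumed value, and deriving get_index_cols from get_index_depth is an assumption about how the two relate.

```python
from abc import ABC, abstractmethod


class ToyTable(ABC):
    """Hypothetical stand-in for Table."""

    @classmethod
    @abstractmethod
    def get_by_read(cls) -> bool:
        """Whether the table contains data for each read."""

    @classmethod
    @abstractmethod
    def get_index_depth(cls) -> int:
        """Number of columns in the index."""

    @classmethod
    def get_index_cols(cls) -> list[int]:
        """Column(s) of the file to use as the index (assumed relation)."""
        return list(range(cls.get_index_depth()))


class ToyReadTable(ToyTable):
    """Hypothetical stand-in for a table indexed by read."""

    @classmethod
    def get_by_read(cls) -> bool:
        return True

    @classmethod
    def get_index_depth(cls) -> int:
        # Assumed value, for illustration only.
        return 1


if __name__ == "__main__":
    print(ToyReadTable.get_by_read(), ToyReadTable.get_index_cols())
```

Because the methods are class methods, callers can query a table type without loading any data, while instantiating the abstract base directly raises TypeError.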

seismicrna.core.table.base.all_patterns(subpattern: RelPattern | None = None): Every RelPattern, keyed by its name.

seismicrna.core.table.base.get_pattern(rel: str): Get a RelPattern from the name of its relationship.

seismicrna.core.table.base.get_rel_name(rel_code: str): Get the name of a relationship from its code.

seismicrna.core.table.base.get_subpattern(rel: str, subpattern: RelPattern | None = None): Get a RelPattern, optionally with masking.

class seismicrna.core.table.load.PositionTableLoader(table_file: str | Path, **kwargs)

Bases: RelTypeTableLoader, PositionTable, ABC

Load data indexed by position.

class seismicrna.core.table.load.ReadTableLoader(table_file: str | Path, **kwargs)

Bases: RelTypeTableLoader, ReadTable, ABC

Load data indexed by read.

class seismicrna.core.table.load.RelTypeTableLoader(table_file: str | Path, **kwargs)

Bases: TableLoader, RelTypeTable, ABC

Load a table of relationship types.

property data: DataFrame: Table’s data.

class seismicrna.core.table.load.TableLoader(table_file: str | Path, **kwargs)

Bases: Table, ABC

Load a table from a file.

classmethod find_tables(paths: Iterable[str | Path]): Yield files of the tables within the given paths.

classmethod load_tables(paths: Iterable[str | Path], **kwargs): Yield tables within the given paths.
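
As a rough illustration of what yielding table files within the given paths can look like, the sketch below walks files and directories with pathlib. It is not the library's implementation: find_table_files is a hypothetical helper, matching on a ".csv" suffix is an assumption, and the real find_tables presumably also checks that each path matches the table type.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable, Iterator


def find_table_files(paths: Iterable[str | Path],
                     ext: str = ".csv") -> Iterator[Path]:
    """Yield each file ending in ext within the given paths, once.

    Hypothetical helper; the extension is an assumed value.
    """
    seen = set()
    for path in map(Path, paths):
        if path.is_dir():
            # Search directories recursively, in a stable order.
            candidates = sorted(path.rglob(f"*{ext}"))
        elif path.is_file() and path.name.endswith(ext):
            candidates = [path]
        else:
            candidates = []
        for file in candidates:
            # Skip files already yielded via another given path.
            resolved = file.resolve()
            if resolved not in seen:
                seen.add(resolved)
                yield file


if __name__ == "__main__":
    for table_file in find_table_files(["."]):
        print(table_file)
```

Writing the search as a generator lets a caller start loading the first table before the remaining directories have been scanned.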

class seismicrna.core.table.write.AbundanceTableWriter(tabulator: Tabulator)

Bases: TableWriter, AbundanceTable, ABC

property data: Table’s data.

class seismicrna.core.table.write.BatchTabulator(*, get_batch_count_all: Callable, num_batches: int, num_cpus: int = 1, **kwargs)

Bases: Tabulator, ABC

Tabulator that accepts batches as the input data.

class seismicrna.core.table.write.CountTabulator(*, batch_counts: Iterable[tuple[Any, Any, Any, Any]], **kwargs)

Bases: Tabulator, ABC

Tabulator that accepts pre-counted data from batches.
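
The idea of tabulating from pre-counted batches can be sketched as summing each batch's counts into running totals. The example below is deliberately simplified: it assumes each batch is a pair (number of reads, per-position counts), whereas the real batch_counts tuples have four elements whose meanings this reference does not state. combine_batch_counts is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, Mapping


def combine_batch_counts(batch_counts: Iterable[tuple[int, Mapping[int, int]]]):
    """Sum pre-counted batches into totals (hypothetical helper).

    Each batch is assumed to be (num_reads, {position: count}).
    """
    total_reads = 0
    counts_per_pos = Counter()
    for num_reads, batch_pos_counts in batch_counts:
        total_reads += num_reads
        # Add this batch's counts to the running totals per position.
        counts_per_pos.update(batch_pos_counts)
    return total_reads, dict(counts_per_pos)


if __name__ == "__main__":
    batches = [(3, {1: 2, 2: 0}), (2, {1: 1, 3: 1})]
    print(combine_batch_counts(batches))
```

Since only the totals are kept, the batches can be consumed one at a time from any iterable without holding them all in memory.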

class seismicrna.core.table.write.DatasetTabulator(*, dataset: MutsDataset, validate: bool = False, **kwargs)

Bases: BatchTabulator, ABC

Tabulator made from one dataset.

classmethod get_dataset_types(): Types of Dataset this Tabulator can process.

classmethod init_kws(): Attributes of the dataset to use as keyword arguments in super().__init__().

class seismicrna.core.table.write.PositionTableWriter(tabulator: Tabulator)

Bases: TableWriter, PositionTable, ABC

property data: Table’s data.

class seismicrna.core.table.write.ReadTableWriter(tabulator: Tabulator)

Bases: TableWriter, ReadTable, ABC

property data: Table’s data.

class seismicrna.core.table.write.TableWriter(tabulator: Tabulator)

Bases: Table, ABC

Write a table to a file.

write(force: bool): Write the table’s rounded data to the table’s CSV file.
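
A minimal sketch of writing rounded data to a CSV file with a force flag follows. It assumes that force=False means an existing file is left untouched, and it uses an arbitrary precision. Both are assumptions, and write_rounded_csv is a hypothetical helper rather than the library's write method.

```python
import csv
from pathlib import Path


def write_rounded_csv(path, header, rows, *, force=False, precision=6):
    """Write rows to a CSV file, rounding floats (hypothetical helper).

    Returns True if the file was written, or False if it already
    existed and force was False (assumed meaning of force).
    """
    path = Path(path)
    if path.exists() and not force:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    # newline="" lets the csv module control the line endings.
    with path.open("w", newline="") as file:
        writer = csv.writer(file)
        writer.writerow(header)
        for row in rows:
            # Round only floats so that integer counts stay exact.
            writer.writerow([round(value, precision)
                             if isinstance(value, float) else value
                             for value in row])
    return True


if __name__ == "__main__":
    print(write_rounded_csv("example-table.csv",
                            ["Position", "Ratio"],
                            [(1, 0.123456789)],
                            precision=3))
```

Returning a flag rather than raising on an existing file lets a pipeline skip finished outputs quietly and redo them only when asked.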

class seismicrna.core.table.write.Tabulator(*, top: Path, branches: dict[str, str], sample: str, region: Region, count_ends: bool, count_pos: bool, count_read: bool, validate: bool = True)

Bases: ABC

Base class for tabulating data for one or more tables.

property counts_per_pos: Raw counts per position.

property counts_per_read: Raw counts per read.

property data_per_clust: Series | None: Series of per-cluster data (or None if no clusters).

property data_per_pos: DataFrame of per-position data.

property data_per_read: DataFrame of per-read data.

property end_counts: Raw counts for each pair of end coordinates.

generate_tables(*, pos: bool = True, read: bool = True, clust: bool = True): Generate the tables from this data.

classmethod get_load_function(): LoadFunction for all Dataset types for this Tabulator.

abstractmethod classmethod get_null_value() → int | float: The null value for a count: either 0 or NaN.

property num_reads: Raw number of reads.

property pos_header: Header of the per-position data.

property read_header: Header of the per-read data.

property ref: Name of the reference.

abstractmethod classmethod table_types() → list[type[TableWriter]]: Types of tables that this tabulator can write.

write_tables(*, force: bool = False, **kwargs)