seismicrna.core package
Subpackages
- seismicrna.core.arg package
- seismicrna.core.batch package
- Subpackages
- Submodules
accumulate_batches()
accumulate_counts()
calc_count_per_pos()
calc_count_per_read()
calc_coverage()
calc_reads_per_pos()
calc_rels_per_pos()
calc_rels_per_read()
count_end_coords()
EndCoords
count_reads_segments()
find_contiguous_reads()
find_read_end3s()
find_read_end5s()
mask_segment_ends()
match_reads_segments()
merge_read_ends()
merge_segment_ends()
sanitize_segment_ends()
simulate_segment_ends()
sort_segment_ends()
count_base_types()
iter_base_types()
list_batch_nums()
MutsBatch
RegionMutsBatch
RegionMutsBatch.calc_min_mut_dist()
RegionMutsBatch.count_all()
RegionMutsBatch.count_per_pos()
RegionMutsBatch.count_per_read()
RegionMutsBatch.cover_per_pos
RegionMutsBatch.cover_per_read
RegionMutsBatch.iter_reads()
RegionMutsBatch.matrix
RegionMutsBatch.pos_index
RegionMutsBatch.reads_noclose_muts()
RegionMutsBatch.reads_per_pos()
RegionMutsBatch.rels_per_pos
RegionMutsBatch.rels_per_read
calc_muts_matrix()
sanitize_muts()
simulate_muts()
ReadBatch
- seismicrna.core.extern package
- seismicrna.core.io package
- seismicrna.core.mu package
- Subpackages
- Submodules
calc_arcsine_distance()
calc_coeff_determ()
calc_mean_arcsine_distance()
calc_pearson()
calc_spearman()
calc_sum_arcsine_distance()
compare_windows()
get_comp_func()
get_comp_method()
get_comp_name()
count_pos()
counts_pos()
counts_pos_consensus()
auto_reframe()
reframe()
reframe_like()
calc_gini()
calc_signal_noise()
any_nan()
auto_remove_nan()
auto_removes_nan()
no_nan()
remove_nan()
removes_nan()
calc_quantile()
calc_ranks()
normalize()
winsorize()
- seismicrna.core.ngs package
- Subpackages
- Submodules
decode_phred()
encode_phred()
DuplicateSampleReferenceError
calc_extra_threads()
collate_xam_cmd()
count_single_paired()
count_total_reads()
flagstat_cmd()
idxstats_cmd()
index_xam_cmd()
parse_flagstat()
parse_idxstats()
parse_ref_header()
ref_header_cmd()
sort_xam_cmd()
view_xam_cmd()
xam_paired()
xam_to_fastq_cmd()
- seismicrna.core.rel package
- Subpackages
- Submodules
HalfRelPattern
HalfRelPattern.a
HalfRelPattern.allc()
HalfRelPattern.as_fancy()
HalfRelPattern.as_match()
HalfRelPattern.as_plain()
HalfRelPattern.c
HalfRelPattern.codes
HalfRelPattern.compile()
HalfRelPattern.decompile()
HalfRelPattern.fits()
HalfRelPattern.fmt_fancy
HalfRelPattern.fmt_plain
HalfRelPattern.from_counts()
HalfRelPattern.from_report_format()
HalfRelPattern.g
HalfRelPattern.intersect()
HalfRelPattern.mut_bits
HalfRelPattern.muts()
HalfRelPattern.none()
HalfRelPattern.patterns
HalfRelPattern.ptrn_fancy
HalfRelPattern.ptrn_plain
HalfRelPattern.read_bases
HalfRelPattern.ref_bases
HalfRelPattern.refs()
HalfRelPattern.t
HalfRelPattern.to_report_format()
RelPattern
- seismicrna.core.rna package
- Subpackages
- Submodules
RNARegion
run_ct_to_db()
run_db_to_ct()
parse_ct()
format_db_structure()
parse_db()
parse_db_strings()
parse_db_structure()
ct_to_db()
db_to_ct()
find_ct_region()
from_ct()
from_db()
renumber_ct()
to_ct()
to_db()
dict_to_pairs()
dict_to_table()
find_enclosing_pairs()
find_root_pairs()
map_nested()
pairs_to_dict()
pairs_to_table()
renumber_pairs()
table_to_dict()
table_to_pairs()
RNAProfile
compute_auc()
compute_auc_roc()
compute_roc_curve()
compute_rolling_auc()
RNAState
RNAStructure
Rna2dPart
Rna2dStem
Rna2dStemLoop
RnaJunction
- seismicrna.core.seq package
- Subpackages
- Submodules
BadReferenceNameError
BadReferenceNameLineError
DuplicateReferenceNameError
MissingReferenceNameError
ReferenceNameError
extract_fasta_seqname()
format_fasta_name_line()
format_fasta_record()
format_fasta_seq_lines()
get_fasta_seq()
parse_fasta()
valid_fasta_seqname()
write_fasta()
RefSeqs
RefRegions
Region
Region.MASK_GU
Region.MASK_LIST
Region.MASK_POLYA
Region.add_mask()
Region.coord
Region.copy()
Region.get_mask()
Region.hyphen
Region.length
Region.mask_gu()
Region.mask_list()
Region.mask_names
Region.mask_polya()
Region.masked_bool
Region.masked_int
Region.masked_zero
Region.range
Region.range_int
Region.range_one
Region.ref_reg
Region.remove_mask()
Region.renumber_from()
Region.size
Region.subregion()
Region.to_dict()
Region.unmasked
Region.unmasked_bool
Region.unmasked_int
Region.unmasked_zero
RegionFinder
RegionTuple
get_coords_by_ref()
get_reg_coords_primers()
get_shared_index()
hyphenate_ends()
index_to_pos()
index_to_seq()
intersect()
iter_windows()
seq_pos_to_index()
unite()
verify_index_names()
window_to_margins()
CompressedSeq
DNA
RNA
XNA
XNA.__add__()
XNA.__bool__()
XNA.__contains__()
XNA.__eq__()
XNA.__getitem__()
XNA.__hash__()
XNA.__mul__()
XNA.__repr__()
XNA.alph()
XNA.array
XNA.compress()
XNA.four()
XNA.get_alphaset()
XNA.get_comp()
XNA.get_comptrans()
XNA.get_nonalphaset()
XNA.get_other_iupac()
XNA.get_pictrans()
XNA.kmers()
XNA.pict()
XNA.picto
XNA.random()
XNA.rc
XNA.t_or_u()
decompress()
expand_degenerate_seq()
- seismicrna.core.table package
- Submodules
AbundanceTable
PositionTable
ReadTable
RelTypeTable
Table
Table.build_path()
Table.by_read()
Table.data
Table.default_path_fields()
Table.ext()
Table.gzipped()
Table.header
Table.header_depth()
Table.header_type()
Table.index_cols()
Table.index_depth()
Table.kind()
Table.path
Table.path_fields
Table.path_segs()
Table.ref
Table.refseq
Table.reg
Table.sample
Table.top
all_patterns()
get_pattern()
get_rel_name()
get_subpattern()
AbundanceTableWriter
BatchTabulator
CountTabulator
DatasetTabulator
PositionTableWriter
ReadTableWriter
TableWriter
Tabulator
Tabulator.counts_per_pos
Tabulator.counts_per_read
Tabulator.data_per_clust
Tabulator.data_per_pos
Tabulator.data_per_read
Tabulator.end_counts
Tabulator.generate_tables()
Tabulator.get_null_value()
Tabulator.num_reads
Tabulator.pos_header
Tabulator.read_header
Tabulator.ref
Tabulator.table_types()
Tabulator.write_tables()
- Submodules
- seismicrna.core.tests package
- Submodules
TestCalcInverse
TestCalcInverse.test_calc_inverse()
TestCalcInverse.test_calc_inverse_fill_fwd()
TestCalcInverse.test_calc_inverse_fill_fwd_max()
TestCalcInverse.test_calc_inverse_fill_fwd_max_default()
TestCalcInverse.test_calc_inverse_fill_rev()
TestCalcInverse.test_calc_inverse_fill_rev_max()
TestCalcInverse.test_calc_inverse_fill_rev_max_default()
TestCalcInverse.test_calc_inverse_max()
TestCalcInverse.test_empty()
TestCalcInverse.test_empty_max()
TestCalcInverse.test_is_inverse()
TestCalcInverse.test_negative()
TestCalcInverse.test_repeated()
TestEnsureSameLength
TestFindDims
TestFindDims.test_0d()
TestFindDims.test_0d_1dim()
TestFindDims.test_0d_none()
TestFindDims.test_0d_nonzero()
TestFindDims.test_0d_nonzero_extra()
TestFindDims.test_1d()
TestFindDims.test_1d_0dim_none()
TestFindDims.test_1d_1d_crossed()
TestFindDims.test_1d_1d_separate()
TestFindDims.test_1d_1dim_none()
TestFindDims.test_1d_2d_congruent()
TestFindDims.test_1d_2d_crossed()
TestFindDims.test_1d_2dim()
TestFindDims.test_1d_2dim_none()
TestFindDims.test_1d_nonzero()
TestFindDims.test_2d()
TestFindDims.test_2d_1dim_none()
TestFindDims.test_2d_2d_congruent()
TestFindDims.test_2d_2d_crossed()
TestFindDims.test_2d_nonzero()
TestFindDims.test_empty()
TestFindDims.test_none_2d()
TestGetLength
TestLocateElements
TestTriangular
TestClustHeader
TestClustHeader.test_clustered()
TestClustHeader.test_clusts()
TestClustHeader.test_index()
TestClustHeader.test_iter_clust_indexes()
TestClustHeader.test_ks_invalid()
TestClustHeader.test_ks_valid()
TestClustHeader.test_level_keys()
TestClustHeader.test_level_names()
TestClustHeader.test_levels()
TestClustHeader.test_modified_ks()
TestClustHeader.test_modified_none()
TestClustHeader.test_modified_rels()
TestClustHeader.test_names()
TestClustHeader.test_num_levels()
TestClustHeader.test_select_clust()
TestClustHeader.test_select_clusts()
TestClustHeader.test_select_extra()
TestClustHeader.test_select_invalid_clust()
TestClustHeader.test_select_invalid_k()
TestClustHeader.test_select_k()
TestClustHeader.test_select_k_clust_empty()
TestClustHeader.test_select_k_clust_exist()
TestClustHeader.test_select_ks()
TestClustHeader.test_select_ks_clusts_exist()
TestClustHeader.test_select_none()
TestClustHeader.test_signature()
TestConstants
TestDeduplicateRels
TestEqualHeaders
TestFormatClustName
TestFormatClustNames
TestHeader
TestListClusts
TestListKClusts
TestListKsClusts
TestMakeHeader
TestParseHeader
TestParseHeader.test_clust()
TestParseHeader.test_empty()
TestParseHeader.test_extra_index_names()
TestParseHeader.test_extra_values()
TestParseHeader.test_invalid_numeric()
TestParseHeader.test_missing_index_names()
TestParseHeader.test_missing_values()
TestParseHeader.test_nonnumeric()
TestParseHeader.test_rel_index()
TestParseHeader.test_rel_index_invalid_name()
TestParseHeader.test_rel_index_repeated()
TestParseHeader.test_rel_index_valid_name()
TestParseHeader.test_rel_multiindex()
TestParseHeader.test_relclust()
TestRelClustHeader
TestRelClustHeader.test_clustered()
TestRelClustHeader.test_clusts()
TestRelClustHeader.test_index()
TestRelClustHeader.test_iter_clust_indexes()
TestRelClustHeader.test_ks()
TestRelClustHeader.test_level_keys()
TestRelClustHeader.test_level_names()
TestRelClustHeader.test_levels()
TestRelClustHeader.test_modified_all()
TestRelClustHeader.test_modified_ks()
TestRelClustHeader.test_modified_ks_empty()
TestRelClustHeader.test_modified_ks_none()
TestRelClustHeader.test_modified_none()
TestRelClustHeader.test_modified_nullified()
TestRelClustHeader.test_modified_rels()
TestRelClustHeader.test_modified_rels_empty()
TestRelClustHeader.test_modified_rels_none()
TestRelClustHeader.test_num_levels()
TestRelClustHeader.test_select_clust()
TestRelClustHeader.test_select_extra()
TestRelClustHeader.test_select_extra_emptystr()
TestRelClustHeader.test_select_extra_none()
TestRelClustHeader.test_select_extra_zero()
TestRelClustHeader.test_select_invalid_clust()
TestRelClustHeader.test_select_invalid_k()
TestRelClustHeader.test_select_invalid_rel()
TestRelClustHeader.test_select_k_clust_empty()
TestRelClustHeader.test_select_k_clust_exist()
TestRelClustHeader.test_select_ks()
TestRelClustHeader.test_select_none()
TestRelClustHeader.test_select_rel()
TestRelClustHeader.test_signature()
TestRelHeader
TestRelHeader.test_clustered()
TestRelHeader.test_clusts()
TestRelHeader.test_index()
TestRelHeader.test_iter_clust_indexes()
TestRelHeader.test_ks()
TestRelHeader.test_level_keys()
TestRelHeader.test_level_names()
TestRelHeader.test_levels()
TestRelHeader.test_modified_ks()
TestRelHeader.test_modified_none()
TestRelHeader.test_modified_rels()
TestRelHeader.test_modified_rels_empty()
TestRelHeader.test_modified_rels_none()
TestRelHeader.test_names()
TestRelHeader.test_num_levels()
TestRelHeader.test_rels_duplicated()
TestRelHeader.test_rels_empty()
TestRelHeader.test_rels_normal()
TestRelHeader.test_select_extra()
TestRelHeader.test_select_extra_zero()
TestRelHeader.test_select_invalid()
TestRelHeader.test_select_none()
TestRelHeader.test_select_one_rels()
TestRelHeader.test_select_rel()
TestRelHeader.test_select_two_rels()
TestRelHeader.test_signature()
TestRelHeader.test_size()
TestValidateKClust
TestValidateKClust.test_float_clust()
TestValidateKClust.test_float_k()
TestValidateKClust.test_negative_zero()
TestValidateKClust.test_one_zero_allowed()
TestValidateKClust.test_positive_positive()
TestValidateKClust.test_zero()
TestValidateKClust.test_zero_negative()
TestValidateKClust.test_zero_one_allowed()
TestValidateKs
TestEraseConfig
TestExcInfo
TestGetConfig
TestLevels
TestLoggerClass
TestLoggingRaiseOnError
TestRestoreConfig
TestSetConfig
TestGetSeismicRNASourceDir
TestSymlinkIfNeeded
TestStochasticRound
TestCalcBetaMV
TestCalcBetaParams
TestCalcDirichletMV
TestCalcDirichletParams
rand_dirichlet_alpha()
TestCalcPoolSize
TestAdjustMinGap
TestCalcPClust
TestCalcPClustGivenEndsNoClose
TestCalcPClustGivenNoClose
TestCalcPEnds
TestCalcPEndsGivenClustNoClose
TestCalcPEndsGivenNoClose
TestCalcPEndsObserved
TestCalcPMutGivenSpan
TestCalcPMutGivenSpanNoClose
TestCalcPNoClose
TestCalcPNoCloseGivenClust
TestCalcPNoCloseGivenEnds
TestCalcPNoCloseGivenEnds.test_clusters()
TestCalcPNoCloseGivenEnds.test_length_0()
TestCalcPNoCloseGivenEnds.test_length_1()
TestCalcPNoCloseGivenEnds.test_length_2_min_gap_1()
TestCalcPNoCloseGivenEnds.test_length_3_min_gap_1()
TestCalcPNoCloseGivenEnds.test_length_3_min_gap_2()
TestCalcPNoCloseGivenEnds.test_length_4_min_gap_1()
TestCalcPNoCloseGivenEnds.test_length_4_min_gap_2()
TestCalcPNoCloseGivenEnds.test_length_4_min_gap_3()
TestCalcPNoCloseGivenEnds.test_min_gap_0()
TestCalcPNoCloseGivenEndsAuto
TestCalcParams
TestCalcRectangularSum
TestClip
TestFindSplitPositions
TestFindSplitPositions.test_0()
TestFindSplitPositions.test_gap0()
TestFindSplitPositions.test_gap1_single_end3()
TestFindSplitPositions.test_gap1_single_end5()
TestFindSplitPositions.test_gap1_split0_quadruple()
TestFindSplitPositions.test_gap1_split1_double()
TestFindSplitPositions.test_gap1_split1_single_mid()
TestFindSplitPositions.test_gap1_split1_triple()
TestFindSplitPositions.test_gap1_split2()
TestFindSplitPositions.test_gap2_single_end3()
TestFindSplitPositions.test_gap2_single_end5()
TestFindSplitPositions.test_gap2_split0()
TestFindSplitPositions.test_gap2_split1()
TestFindSplitPositions.test_gap3_single_end3()
TestFindSplitPositions.test_gap3_single_end5()
TestFindSplitPositions.test_gap4_split0()
TestFindSplitPositions.test_gap4_split1()
TestFindSplitPositions.test_generic_split()
TestFindSplitPositions.test_thresh0()
TestFindSplitPositions.test_thresh1()
TestNoCloseMuts
TestNormalize
TestQuickUnbias
TestSlicePEnds
TestTriuAllClose
TestTriuCumSum
TestTriuDiv
TestTriuDot
TestTriuLog
TestTriuMul
TestTriuNorm
TestTriuNorm.compare()
TestTriuNorm.test_0x0()
TestTriuNorm.test_0x0x1()
TestTriuNorm.test_1x1()
TestTriuNorm.test_1x1x1()
TestTriuNorm.test_1x1x2()
TestTriuNorm.test_2x2()
TestTriuNorm.test_2x2_zero()
TestTriuNorm.test_2x2x1()
TestTriuNorm.test_2x2x2()
TestTriuNorm.test_2x2x2_zero()
TestTriuNorm.test_2x2x2x2()
TestTriuNorm.test_2x2x2x2_zero()
TestTriuSum
label_no_close_muts()
no_close_muts()
simulate_params()
simulate_reads()
TestConsistentVersion
TestFormatVersion
TestParseVersion
TestParseVersion.test_invalid_1()
TestParseVersion.test_invalid_2()
TestParseVersion.test_invalid_3()
TestParseVersion.test_invalid_4()
TestParseVersion.test_invalid_5()
TestParseVersion.test_parse_default()
TestParseVersion.test_parse_notag()
TestParseVersion.test_parse_prtag_letter()
TestParseVersion.test_parse_prtag_letters()
TestParseVersion.test_parse_prtag_letters_numbers()
- Submodules
Submodules
- seismicrna.core.array.calc_inverse(target: ndarray, require: int = -1, fill: bool = False, fill_rev: bool = False, fill_default: int | None = None, verify: bool = True, what: str = 'array')
Calculate the inverse of target, such that if element i of target has value x, then element x of the inverse has value i.
>>> list(calc_inverse(np.array([3, 2, 7, 5, 1])))
[-1, 4, 1, 0, -1, 3, -1, 2]
>>> list(calc_inverse(np.arange(5)))
[0, 1, 2, 3, 4]
- Parameters:
target (np.ndarray) – Target values; must be a 1-dimensional array of non-negative integers with no duplicate values.
require (int = -1) – Require the inverse to contain all indexes up to and including require (i.e. that its length is at least require + 1); ignored if require is -1; must be ≥ -1.
fill (bool = False) – Fill missing indexes (that do not appear in target).
fill_rev (bool = False) – Fill missing indexes in reverse order instead of forward order; only used if fill is True.
fill_default (int | None = None) – Value with which to fill before the first non-missing value has been encountered; if fill_rev is True, defaults to the length of target, otherwise to -1.
verify (bool = True) – Verify that all target values are unique, non-negative integers. If they are not, then ValueError is raised when verify is True, and the results of this function are silently incorrect when verify is False. Leave as True unless you have already verified that target contains only unique, non-negative integers.
what (str = "array") – What to name the array (only used for error messages).
- Returns:
Inverse of target.
- Return type:
np.ndarray
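The basic inversion (with fill left as False) can be sketched in plain NumPy. This is only an illustration of the behavior described above, not the library's implementation; the name inverse_sketch and its missing argument are hypothetical, and the fill options are deliberately left out.

```python
import numpy as np


def inverse_sketch(target: np.ndarray,
                   require: int = -1,
                   missing: int = -1) -> np.ndarray:
    """Return an array `inverse` such that inverse[target[i]] == i.

    Hypothetical stand-in for the documented behavior with fill=False:
    indexes that do not appear in `target` are set to `missing`.
    """
    target = np.asarray(target)
    if target.ndim != 1 or not np.issubdtype(target.dtype, np.integer):
        raise ValueError("target must be a 1-dimensional integer array")
    if target.size > 0 and target.min() < 0:
        raise ValueError("target must be non-negative")
    if np.unique(target).size != target.size:
        raise ValueError("target must not contain duplicate values")
    # The inverse must cover every value in target and every index
    # up to and including `require`.
    max_value = int(target.max()) if target.size > 0 else -1
    inverse = np.full(max(max_value, require) + 1, missing, dtype=int)
    # Element x of the inverse gets the position i at which x occurs.
    inverse[target] = np.arange(target.size)
    return inverse


print(inverse_sketch(np.array([3, 2, 7, 5, 1])).tolist())
# [-1, 4, 1, 0, -1, 3, -1, 2]
```

The printed list matches the first doctest above: values 0, 4 and 6 never occur in target, so their slots keep the placeholder -1.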
- seismicrna.core.array.check_naturals(values: ndarray, what: str = 'values')
Raise ValueError if the values are not monotonically increasing natural numbers.
- seismicrna.core.array.ensure_order(array1: ndarray, array2: ndarray, what1: str = 'array1', what2: str = 'array2', gt_eq: bool = False)
Ensure that array1 is ≤ or ≥ array2, element-wise.
- Parameters:
array1 (np.ndarray) – Array 1 (same length as array2).
array2 (np.ndarray) – Array 2 (same length as array1).
what1 (str = "array1") – What array1 contains (only used for error messages).
what2 (str = "array2") – What array2 contains (only used for error messages).
gt_eq (bool = False) – Ensure array1 ≥ array2 if True, otherwise array1 ≤ array2.
- Returns:
Shared length of array1 and array2.
- Return type:
int
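The element-wise check described above might look like the following sketch. The name ensure_order_sketch is hypothetical, and raising ValueError on a violation is an assumption, since this page does not state which exception the library raises.

```python
import numpy as np


def ensure_order_sketch(array1, array2,
                        what1: str = "array1",
                        what2: str = "array2",
                        gt_eq: bool = False) -> int:
    """Check array1 <= array2 (or >= if gt_eq); return the shared length."""
    array1 = np.asarray(array1)
    array2 = np.asarray(array2)
    if array1.shape != array2.shape:
        raise ValueError(f"{what1} and {what2} differ in shape: "
                         f"{array1.shape} vs. {array2.shape}")
    # Boolean array marking the positions that satisfy the required order.
    ok = array1 >= array2 if gt_eq else array1 <= array2
    if not np.all(ok):
        sign = "<" if gt_eq else ">"
        bad = np.flatnonzero(~ok).tolist()
        # Assumption: the exception type is not given on this page.
        raise ValueError(f"{what1} {sign} {what2} at indexes {bad}")
    return int(array1.size)


print(ensure_order_sketch([1, 2, 3], [1, 3, 5]))  # 3
```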
- seismicrna.core.array.ensure_same_length(arr1: ndarray, arr2: ndarray, what1: str = 'array1', what2: str = 'array2')
- seismicrna.core.array.find_dims(dims: Sequence[Sequence[str | None]], arrays: Sequence[ndarray], names: Sequence[str] | None = None, nonzero: Iterable[str] | bool = False)
Check the dimensions of the arrays.
- seismicrna.core.array.locate_elements(collection: ndarray, *elements: ndarray, what: str = 'collection', verify: bool = True)
Find the index at which each element of elements occurs in collection.
>>> list(locate_elements(np.array([4, 1, 2, 7, 5, 3]), np.array([5, 2, 5])))
[4, 2, 4]
- Parameters:
collection (np.ndarray) – Collection in which to find each element in elements; must be a 1-dimensional array of non-negative integers with no duplicate values.
*elements (np.ndarray) – Elements to find; must be a 1-dimensional array that is a subset of collection, although duplicate values are permitted.
what (str = "collection") – What to name the collection (only used for error messages).
verify (bool = True) – Verify that all values in collection are unique, non-negative integers and that all items in elements are in collection.
- Returns:
Index of each element of elements in collection.
- Return type:
np.ndarray
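One way to perform this lookup is to build an inverse table that maps each value to its position and then index that table with the elements. The sketch below takes that approach for a single elements array; locate_sketch is a hypothetical name, and whether the library works this way internally is an assumption.

```python
import numpy as np


def locate_sketch(collection, elements) -> np.ndarray:
    """Return the index in `collection` of each item of `elements`."""
    collection = np.asarray(collection)
    elements = np.asarray(elements)
    if collection.size > 0 and collection.min() < 0:
        raise ValueError("collection must be non-negative")
    if np.unique(collection).size != collection.size:
        raise ValueError("collection must not contain duplicate values")
    # Lookup table: value -> position in collection (-1 if absent).
    size = int(collection.max()) + 1 if collection.size > 0 else 0
    lookup = np.full(size, -1, dtype=int)
    lookup[collection] = np.arange(collection.size)
    # Every element must be a value that occurs in collection.
    if elements.size > 0 and (elements.min() < 0
                              or elements.max() >= size
                              or np.any(lookup[elements] < 0)):
        raise ValueError("elements must be a subset of collection")
    return lookup[elements]


print(locate_sketch([4, 1, 2, 7, 5, 3], [5, 2, 5]).tolist())  # [4, 2, 4]
```

Duplicates in elements are fine because the table is only read, never written, when the elements are looked up.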
- seismicrna.core.array.sanitize_values(values: Iterable[int], lower_limit: int, upper_limit: int, whats: str = 'values')
Validate and sort values, and return them as an array.
- seismicrna.core.array.triangular(n: int)
The nth triangular number (n ≥ 0): the number of items in an equilateral triangle with n items on each side.
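The nth triangular number has the closed form n(n + 1)/2. A minimal sketch follows; triangular_sketch is a hypothetical name, and rejecting negative n with ValueError is an assumption.

```python
def triangular_sketch(n: int) -> int:
    """Return the nth triangular number, 0 + 1 + ... + n."""
    if n < 0:
        # Assumption: the page only states that n must be >= 0.
        raise ValueError(f"n must be >= 0, but got {n}")
    # One of n and n + 1 is even, so integer division is exact.
    return n * (n + 1) // 2


print([triangular_sketch(n) for n in range(6)])  # [0, 1, 3, 6, 10, 15]
```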
- class seismicrna.core.dataset.Dataset(report_file: Path, verify_times: bool = True)
Bases:
ABC
Dataset comprising batches of data.
- property batch_nums
Numbers of the batches.
- property dir: Path
Directory containing the dataset.
- iter_batches()
Yield each batch.
- link_data_dirs_to_tmp(tmp_dir: Path)
Make links to a dataset in a temporary directory.
- property num_reads
Number of reads in the dataset.
- abstract property pattern: RelPattern | None
Pattern of mutations to count.
- property top: Path
Top-level directory of the dataset.
- exception seismicrna.core.dataset.FailedToLoadDatasetError
Bases:
RuntimeError
A batch failed to load.
- class seismicrna.core.dataset.LoadFunction(data_type: type[Dataset], /, *more_types: type[Dataset])
Bases:
object
Function to load a dataset.
- __call__(report_file: Path, **kwargs)
Load a dataset from the report file.
- property report_path_auto_fields
Automatic field values of the report file path.
- property report_path_seg_types
Segment types of the report file path.
- class seismicrna.core.dataset.LoadedDataset(report_file: Path, verify_times: bool = True)
Dataset created by loading directly from a Report.
- property data_dirs
All directories containing data for the dataset.
- get_batch(batch_num: int) → ReadBatchIO | MutsBatchIO
Get a specific batch of data.
- abstract classmethod get_batch_type() → type[ReadBatchIO | MutsBatchIO]
Type of batch.
- classmethod get_btype_name()
Name of the type of batch.
- abstract classmethod get_report_type() → type[BatchedReport]
Type of report.
- property num_batches
Number of batches.
- property timestamp
Time at which the data were written.
- class seismicrna.core.dataset.MergedDataset(report_file: Path, verify_times: bool = True)
Dataset made by merging one or more constituent datasets.
- property data_dirs
All directories containing data for the dataset.
- abstract classmethod get_dataset_load_func() → LoadFunction
Function to load one constituent dataset.
- property pattern
Pattern of mutations to count.
- property timestamp
Time at which the data were written.
- class seismicrna.core.dataset.MergedRegionDataset(report_file: Path, verify_times: bool = True)
Bases:
MergedDataset, RegionDataset, ABC
- property refseq
Sequence of the reference.
- class seismicrna.core.dataset.MergedUnbiasDataset(*args, masked_read_nums: dict[int, list] | None = None, **kwargs)
Bases:
MergedDataset, UnbiasDataset, ABC
MergedDataset with attributes for correcting observer bias.
- property min_mut_gap
Minimum gap between two mutations.
- property quick_unbias
Use the quick heuristic for unbiasing.
- property quick_unbias_thresh
Consider mutation rates less than or equal to this threshold to be 0 when using the quick heuristic for unbiasing.
- exception seismicrna.core.dataset.MissingBatchError
Bases:
RuntimeError
A dataset does not have a batch of a given type and number.
- exception seismicrna.core.dataset.MissingBatchTypeError
Bases:
MissingBatchError
A dataset does not have a batch of a given type.
- class seismicrna.core.dataset.MultistepDataset(dataset2_report_file: Path, **kwargs)
Bases:
MutsDataset, ABC
Dataset made by integrating two datasets from different steps of the workflow.
- property data_dirs
All directories containing data for the dataset.
- abstract classmethod get_dataset1_load_func() → LoadFunction
Function to load Dataset 1.
- classmethod get_dataset1_report_file(dataset2_report_file: Path)
Given the report file for Dataset 2, determine the report file for Dataset 1.
- classmethod get_dataset2_load_func()
Function to load Dataset 2.
- abstract classmethod get_dataset2_type() → type[RegionDataset]
Type of Dataset 2.
- classmethod get_report_type()
Type of report.
- property num_batches
Number of batches.
- property refseq
Sequence of the reference.
- property timestamp
Time at which the data were written.
- class seismicrna.core.dataset.MutsDataset(report_file: Path, verify_times: bool = True)
Bases:
RegionDataset, ABC
Dataset with a known region and explicit mutational data.
- abstract get_batch(batch_num: int) → RegionMutsBatch
Get a specific batch of data.
- iter_batches()
Yield each batch.
- class seismicrna.core.dataset.RegionDataset(report_file: Path, verify_times: bool = True)
Dataset with a known reference sequence and region.
- property reflen
Length of the reference sequence.
- exception seismicrna.core.dataset.ReversedTimeStampError
Bases:
RuntimeError
A dataset has a timestamp that is earlier than a dataset that should have been written before it.
- class seismicrna.core.dataset.TallDataset(report_file: Path, verify_times: bool = True)
Bases:
MergedDataset, ABC
Dataset made by vertically pooling other datasets from one or more samples aligned to the same reference sequence.
- property datasets
Constituent datasets that were merged.
- property num_batches
Number of batches.
- class seismicrna.core.dataset.UnbiasDataset(*args, masked_read_nums: dict[int, list] | None = None, **kwargs)
Dataset with attributes for correcting observer bias.
- class seismicrna.core.dataset.WideDataset(report_file: Path, verify_times: bool = True)
Bases:
MergedRegionDataset, ABC
Dataset made by horizontally joining other datasets from one or more regions of the same reference sequence.
- property datasets
Constituent datasets that were merged.
- property num_batches
Number of batches.
- property region
Region of the dataset.
- property region_names
Names of all joined regions.
- class seismicrna.core.dataset.WideMutsDataset(report_file: Path, verify_times: bool = True)
Bases:
WideDataset, MutsDataset, ABC
WideDataset with mutation data.
- seismicrna.core.dataset.load_datasets(input_path: Iterable[str | Path], load_func: LoadFunction, **kwargs)
Yield a Dataset from each report file in input_path.
- Parameters:
input_path (Iterable[str | Path]) – Input paths to be searched recursively for report files.
load_func (LoadFunction) – Function to load the dataset from each report file.
Generic Exceptions
- exception seismicrna.core.error.IncompatibleValuesError
Bases:
ValueError
Two or more values are individually valid, but their combination is not.
- exception seismicrna.core.error.InconsistentValueError
Bases:
ValueError
Two or more values differ when they should be equal.
- exception seismicrna.core.error.OutOfBoundsError
Bases:
ValueError
A numeric value is outside its proper bounds.
- class seismicrna.core.header.ClustHeader(*, ks: Iterable[int], **kwargs)
Bases:
Header
Header of clusters.
- classmethod clustered()
Whether the header has clusters.
- property clusts
Tracks of data: clusters for clustered data, otherwise one track of the average.
- property index
Index of the header.
- iter_clust_indexes()
For each cluster, yield an Index/MultiIndex of every column that is part of the cluster.
- property ks
Numbers of clusters.
- classmethod levels()
Levels of the index.
- property signature
Signature of the header, which will generate an identical header if passed as keyword arguments to make_header.
- class seismicrna.core.header.Header
Bases:
ABC
Header for a table.
- property clusts: list[tuple[int, int]]
Tracks of data: clusters for clustered data, otherwise one track of the average.
- get_clust_header()
Corresponding ClustHeader.
- get_rel_header()
Corresponding RelHeader.
- property index: Index
Index of the header.
- abstract iter_clust_indexes()
For each cluster, yield an Index/MultiIndex of every column that is part of the cluster.
- classmethod level_keys()
Level keys of the index.
- classmethod level_names()
Level names of the index.
- abstract classmethod levels()
Levels of the index.
- modified(**kwargs)
Return a new header with a possibly modified signature.
- Parameters:
**kwargs – Keyword arguments for modifying the signature of the header. Each argument given here will be passed to make_header and override the attribute (if any) with the same name in this header’s signature. Attributes of this header’s signature that are not overridden will also be passed to make_header.
- Returns:
New header with a possibly modified signature.
- Return type:
Header
- property names
Formatted name of each track.
- classmethod num_levels()
Number of levels.
- select(**kwargs) → Index
Select and return items from the header as an Index.
- property signature
Signature of the header, which will generate an identical header if passed as keyword arguments to make_header.
- property size
Number of items in the Header.
- class seismicrna.core.header.RelClustHeader(*, ks: Iterable[int], **kwargs)
Bases:
ClustHeader, RelHeader
Header of relationships and clusters.
- property index
Index of the header.
- class seismicrna.core.header.RelHeader(*, rels: Iterable[str], **kwargs)
Bases:
Header
Header of relationships.
- classmethod clustered()
Whether the header has clusters.
- property clusts
Tracks of data: clusters for clustered data, otherwise one track of the average.
- property index
Index of the header.
- iter_clust_indexes()
For each cluster, yield an Index/MultiIndex of every column that is part of the cluster.
- property ks
Numbers of clusters.
- classmethod levels()
Levels of the index.
- property rels
Relationships.
- property signature
Signature of the header, which will generate an identical header if passed as keyword arguments to make_header.
- seismicrna.core.header.deduplicate_rels(rels: Iterable)
Remove duplicate relationships while preserving their order.
- Parameters:
rels (Iterable) – Relationships.
- Returns:
Relationships with duplicates removed, in the original order.
- Return type:
list[str]
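Order-preserving deduplication is commonly written with dict.fromkeys, which keeps the first occurrence of each key. The sketch below shows that idiom; deduplicate_sketch is a hypothetical name, the strings are placeholders rather than real relationship names, and the library may implement this differently.

```python
from typing import Iterable


def deduplicate_sketch(rels: Iterable[str]) -> list[str]:
    """Drop repeated items, keeping the first occurrence of each."""
    # Dictionaries preserve insertion order, so the keys come out in
    # the order in which each item was first seen.
    return list(dict.fromkeys(rels))


# Placeholder names, not actual relationship labels.
print(deduplicate_sketch(["x", "y", "x", "z", "y"]))  # ['x', 'y', 'z']
```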
- seismicrna.core.header.format_clust_name(k: int, clust: int)
Format a pair of k and cluster numbers into a name.
- seismicrna.core.header.format_clust_names(clusts: Iterable[tuple[int, int]], allow_duplicates: bool = False)
Format pairs of k and clust into a list of names.
- Parameters:
clusts (Iterable[tuple[int, int]]) – Zero or more pairs of k and cluster numbers.
allow_duplicates (bool = False) – Allow k and clust pairs to be duplicated.
- Returns:
List of names of the pairs of k and clust.
- Return type:
list[str]
- Raises:
ValueError – If allow_duplicates is False and clusts has duplicates.
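The duplicate check controlled by allow_duplicates can be sketched as follows. Both function names are hypothetical, and the naming scheme in format_name_sketch is invented purely for illustration, because this page does not show the format the library actually produces.

```python
from collections import Counter
from typing import Iterable


def format_name_sketch(k: int, clust: int) -> str:
    """Hypothetical naming scheme; the real format is not shown here."""
    return f"k{k}-c{clust}"


def format_names_sketch(clusts: Iterable[tuple[int, int]],
                        allow_duplicates: bool = False) -> list[str]:
    """Format each (k, clust) pair, optionally rejecting duplicates."""
    clusts = list(clusts)
    if not allow_duplicates:
        # Collect every pair that occurs more than once.
        dups = [pair for pair, n in Counter(clusts).items() if n > 1]
        if dups:
            raise ValueError(f"Duplicate clusters: {dups}")
    return [format_name_sketch(k, clust) for k, clust in clusts]


print(format_names_sketch([(1, 1), (2, 1), (2, 2)]))
# ['k1-c1', 'k2-c1', 'k2-c2']
```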
- seismicrna.core.header.list_clusts(k: int)
List all cluster numbers for one k.
- Parameters:
k (int) – Number of clusters (≥ 0).
- Returns:
List of cluster numbers.
- Return type:
list[int]
- seismicrna.core.header.list_k_clusts(k: int)
List k and cluster numbers as 2-tuples for one k.
- Parameters:
k (int) – Number of clusters (≥ 0).
- Returns:
List wherein each item is a tuple of the number of clusters and the cluster number.
- Return type:
list[tuple[int, int]]
- seismicrna.core.header.list_ks_clusts(ks: Iterable[int])
List k and cluster numbers as 2-tuples.
- Parameters:
ks (Iterable[int]) – Numbers of clusters.
- Returns:
List wherein each item is a tuple of the number of clusters and the cluster number.
- Return type:
list[tuple[int, int]]
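The three listing functions build on one another, which the sketch below tries to show. All names are hypothetical. It assumes that clusters for a given k are numbered 1 through k, which this page does not state, and it rejects k = 0 because the page does not say what that case returns.

```python
from typing import Iterable


def list_clusts_sketch(k: int) -> list[int]:
    """Cluster numbers for one k, assuming 1-based numbering."""
    if k < 1:
        # Assumption: k = 0 is documented as allowed, but its result is
        # not described, so this sketch leaves it out.
        raise ValueError(f"This sketch only handles k >= 1, but got {k}")
    return list(range(1, k + 1))


def list_k_clusts_sketch(k: int) -> list[tuple[int, int]]:
    """Pairs of (k, cluster number) for one k."""
    return [(k, clust) for clust in list_clusts_sketch(k)]


def list_ks_clusts_sketch(ks: Iterable[int]) -> list[tuple[int, int]]:
    """Pairs of (k, cluster number) for every k, in the order given."""
    return [pair for k in ks for pair in list_k_clusts_sketch(k)]


print(list_ks_clusts_sketch([1, 2]))  # [(1, 1), (2, 1), (2, 2)]
```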
- seismicrna.core.header.make_header(*, rels: Iterable[str] | None = None, ks: Iterable[int] | None = None)
Make a new Header of an appropriate type.
- Parameters:
rels (Iterable[str] | None = None) – Relationships in the header.
ks (Iterable[int] | None = None) – Numbers of clusters.
- Returns:
Header of the appropriate type.
- Return type:
Header
- seismicrna.core.header.parse_header(index: Index | MultiIndex)
Parse an Index into a Header of an appropriate type.
- Parameters:
index (pd.Index | pd.MultiIndex) – Index to parse.
- Returns:
New Header whose index is index.
- Return type:
Header
- seismicrna.core.header.validate_k_clust(k: int, clust: int)
Validate a pair of k and cluster numbers.
- Parameters:
- Returns:
If the k and cluster numbers form a valid pair.
- Return type:
- Raises:
TypeError – If k or clust is not an integer.
ValueError – If k and clust do not form a valid pair.
- seismicrna.core.header.validate_ks(ks: Iterable)
Validate and sort numbers of clusters.
- Parameters:
ks (Iterable) – Numbers of clusters.
- Returns:
Sorted numbers of clusters
- Return type:
list[int]
- Raises:
ValueError – If any k is not positive or is repeated.
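The two documented failure conditions (a non-positive k and a repeated k) and the sorted result can be sketched like this. The name validate_ks_sketch is hypothetical and the error messages are invented for illustration.

```python
from typing import Iterable


def validate_ks_sketch(ks: Iterable[int]) -> list[int]:
    """Return the numbers of clusters sorted, or raise ValueError."""
    ks = list(ks)
    seen = set()
    for k in ks:
        if k < 1:
            raise ValueError(f"k must be positive, but got {k}")
        if k in seen:
            raise ValueError(f"k = {k} is repeated")
        seen.add(k)
    return sorted(ks)


print(validate_ks_sketch([3, 1, 2]))  # [1, 2, 3]
```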
- class seismicrna.core.join.JoinMutsDataset(report_file: Path, verify_times: bool = True)
Bases:
WideMutsDataset, ABC
- property min_mut_gap
- class seismicrna.core.join.JoinReport(**kwargs: Any | Callable[[Report], Any])
Report for a joined dataset.
- class seismicrna.core.logs.AnsiCode
Bases:
object
Format text with ANSI codes.
- BOLD = 1
- END = 'm'
- RESET = 0
- START = '\x1b['
- classmethod reset()
Convenience function to end formatting.
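The four constants above are the pieces of an ANSI SGR escape sequence: the START prefix, one or more numeric codes, and the END terminator. The sketch below assembles them; the helper names ansi and bold are hypothetical and are not part of AnsiCode as listed here.

```python
# Constant values copied from the attributes listed above.
START = "\x1b["
END = "m"
BOLD = 1
RESET = 0


def ansi(*codes: int) -> str:
    """Join numeric codes into a single escape sequence."""
    return f"{START}{';'.join(map(str, codes))}{END}"


def bold(text: str) -> str:
    """Wrap text so that a terminal renders it in bold, then resets."""
    return f"{ansi(BOLD)}{text}{ansi(RESET)}"


print(repr(bold("done")))  # '\x1b[1mdone\x1b[0m'
```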
- class seismicrna.core.logs.ConsoleStream(filterer: Filterer, formatter: Formatter)
Bases:
Stream
Log to the console’s stderr stream.
- filterer
- formatter
- property stream
Text stream to which messages will be logged after filtering and formatting.
- class seismicrna.core.logs.FileStream(file_path: str | Path, *args, **kwargs)
Bases:
Stream
Log to a file.
- close()
Close the file stream.
- file_path
- property stream
Text stream to which messages will be logged after filtering and formatting.
- class seismicrna.core.logs.Filterer(verbosity: int)
Bases:
object
Filter messages before logging.
- verbosity
- class seismicrna.core.logs.Formatter(formatter: Callable[[Message], str])
Bases:
object
Format messages before logging.
- formatter
- class seismicrna.core.logs.Level(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases:
IntEnum
Level of a logging message.
- ACTION = 2
- DETAIL = 4
- ERROR = -2
- FATAL = -3
- ROUTINE = 3
- STATUS = 0
- TASK = 1
- WARNING = -1
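Because Level is an IntEnum, its members sort from most severe (FATAL) to most detailed (DETAIL), and a Filterer can compare them with an integer verbosity. The sketch below copies the member values from above. The rule in passes_filter (log a message when its level is at most the verbosity) is an assumption and is not confirmed by this reference.

```python
from enum import IntEnum


class Level(IntEnum):
    """Same names and values as listed above."""
    FATAL = -3
    ERROR = -2
    WARNING = -1
    STATUS = 0
    TASK = 1
    ACTION = 2
    ROUTINE = 3
    DETAIL = 4


def passes_filter(level: Level, verbosity: int) -> bool:
    # Assumed rule: higher verbosity lets more detailed levels through.
    return level <= verbosity


# With verbosity 0, errors and status messages pass but details do not.
print([level.name for level in Level if passes_filter(level, 0)])
```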
- class seismicrna.core.logs.Logger(console_stream: ConsoleStream | None = None, file_stream: FileStream | None = None, raise_on_error: bool = False)
Bases:
object
Log messages to the console and to files.
- console_stream
- file_stream
- raise_on_error
- class seismicrna.core.logs.LoggerConfig(verbosity, log_file_path, log_color, raise_on_error)
Bases:
tuple
- log_color
Alias for field number 2
- log_file_path
Alias for field number 1
- raise_on_error
Alias for field number 3
- verbosity
Alias for field number 0
- class seismicrna.core.logs.Message(level: Level, content: object)
Bases:
object
Message with a logging level.
- content
- level
- class seismicrna.core.logs.Stream(filterer: Filterer, formatter: Formatter)
Bases:
ABC
Log to a stream, such as to the console or to a file.
- filterer
- formatter
- seismicrna.core.logs.erase_config()
Erase the existing logger configuration.
- seismicrna.core.logs.exc_info()
Whether to log exception information.
- seismicrna.core.logs.format_console_color(message: Message)
Format a message to log on the console with color.
- seismicrna.core.logs.format_console_plain(message: Message)
Format a message to log on the console without color.
- seismicrna.core.logs.get_config()
Get the configuration parameters of a logger.
- seismicrna.core.logs.log_exceptions(default: Callable | None)
If any exception occurs, catch it and return the default.
- seismicrna.core.logs.restore_config(func: Callable)
After the function exits, restore the logging configuration that was in place before the function ran.
- seismicrna.core.logs.set_config(verbosity: int = 0, log_file_path: str | Path | None = None, log_color: bool = True, raise_on_error: bool = False)
Configure the main logger with handlers and verbosity.
- class seismicrna.core.path.Field(dtype: type[str | int | Path], options: Iterable = (), is_ext: bool = False)
Bases:
object
- property as_str
- class seismicrna.core.path.Path(*seg_types: Segment)
Bases:
object
- property as_str
- exception seismicrna.core.path.PathTypeError
-
Use of the wrong type of path or segment.
- exception seismicrna.core.path.PathValueError
Bases:
PathError, ValueError
Invalid value of a path segment field.
- class seismicrna.core.path.Segment(segment_name: str, field_types: dict[str, Field], *, order: int = 0, frmt: str | None = None)
Bases:
object
- property as_str
- property ext_type
Type of the segment’s file extension, or None if it has no file extension.
- exception seismicrna.core.path.WrongFileExtensionError
Bases:
PathValueError
A file has the wrong extension.
- seismicrna.core.path.build(*segment_types: Segment, **field_values: Any)
Return a pathlib.Path from the given segment types and field values.
- seismicrna.core.path.builddir(*segment_types: Segment, **field_values: Any)
Build the path and create it on the file system as a directory if it does not already exist.
- seismicrna.core.path.buildpar(*segment_types: Segment, **field_values: Any)
Build a path and create its parent directory if it does not already exist.
- seismicrna.core.path.cast_path(input_path: Path, input_segments: Sequence[Segment], output_segments: Sequence[Segment], **override: Any)
Cast input_path made of input_segments to a new path made of output_segments.
- Parameters:
input_path (pathlib.Path) – Input path from which to take the path fields.
input_segments (Sequence[Segment]) – Path segments to use to determine the fields in input_path.
output_segments (Sequence[Segment]) – Path segments to use to determine the fields in output_path.
**override (Any) – Override and supplement the fields in input_path.
- Returns:
Path comprising output_segments made of fields in input_path (as determined by input_segments).
- Return type:
- seismicrna.core.path.create_path_type(*segment_types: Segment)
Create and cache a Path instance from the segment types.
- seismicrna.core.path.deduplicate(paths: Iterable[str | Path], warn: bool = True)
Yield the non-redundant paths.
- seismicrna.core.path.deduplicated(func: Callable)
Decorate a Path generator to yield non-redundant paths.
- seismicrna.core.path.fill_whitespace(path: str | Path, fill: str = '_')
Replace all whitespace in path with fill.
- seismicrna.core.path.find_files(path: str | Path, segments: Sequence[Segment], pre_sanitize: bool = True)
Yield all files that match a sequence of path segments. The behavior depends on what path is:
If it is a file, then yield path if it matches the segments; otherwise, yield nothing.
If it is a directory, then search it recursively and yield every matching file in the directory and its subdirectories.
- Parameters:
path (str | pathlib.Path) – Path of a file to check or a directory to search recursively.
segments (Sequence[Segment]) – Sequence(s) of Path segments to check if each file matches.
pre_sanitize (bool) – Whether to sanitize the path before searching it.
- Returns:
Paths of files matching the segments.
- Return type:
Generator[Path, Any, None]
- seismicrna.core.path.find_files_chain(paths: Iterable[str | Path], segments: Sequence[Segment])
Yield from find_files called on every path in paths.
- seismicrna.core.path.get_fields_in_seg_types(*segment_types: Segment) dict[str, Field]
Get all fields among the given segment types.
- seismicrna.core.path.get_seismicrna_project_dir()
SEISMIC-RNA project directory, named seismic-rna, containing src, pyproject.toml, and all other project files. Will exist if the entire SEISMIC-RNA project has been downloaded, e.g. from GitHub, but not if SEISMIC-RNA was only installed using pip or conda.
- seismicrna.core.path.get_seismicrna_source_dir()
SEISMIC-RNA source directory, named seismicrna, containing __init__.py and the top-level modules and subpackages.
- seismicrna.core.path.mkdir_if_needed(path: Path | str)
Create a directory and log that event if it does not exist.
- seismicrna.core.path.parse(path: str | Path, /, *segment_types: Segment)
Return the fields of a path based on the segment types.
- seismicrna.core.path.parse_top_separate(path: str | Path, /, *segment_types: Segment)
Return the fields of a path, and the top field separately.
- seismicrna.core.path.path_matches(path: str | Path, segments: Sequence[Segment])
Check if a path matches a sequence of path segments.
- Parameters:
path (str | pathlib.Path) – Path of the file/directory.
segments (Sequence[Segment]) – Sequence of path segments to check if the file matches.
- Returns:
Whether the path matches any given sequence of path segments.
- Return type:
- seismicrna.core.path.randdir(parent: str | Path | None = None, prefix: str = '', suffix: str = '')
Build a path of a new directory that does not exist and create it on the file system.
- seismicrna.core.path.rmdir_if_needed(path: Path | str, rmtree: bool = False, rmtree_ignore_errors: bool = False, raise_on_rmtree_error: bool = True)
Remove a directory and log that event if it exists.
- seismicrna.core.path.sanitize(path: str | Path, strict: bool = False)
Sanitize a path-like object by ensuring it is an absolute path, eliminating symbolic links and redundant path separators/references, and returning a Path object.
- Parameters:
path (str | pathlib.Path) – Path to sanitize.
strict (bool = False) – Require the path to exist and contain no symbolic link loops.
- Returns:
Absolute, normalized, symlink-free path.
- Return type:
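The properties listed for sanitize (absolute, normalized, symlink-free, optionally strict) match what pathlib.Path.resolve provides in the standard library. The sketch below assumes that equivalence for illustration. The name sanitize_sketch is hypothetical, and the library's own implementation may differ.

```python
from pathlib import Path


def sanitize_sketch(path, strict: bool = False) -> Path:
    """Return an absolute path with symlinks and '..' references resolved.

    With strict=True, Path.resolve raises FileNotFoundError if the path
    does not exist.
    """
    return Path(path).resolve(strict=strict)


# A relative path with a redundant reference becomes absolute and normalized.
print(sanitize_sketch("some_dir/../other_dir"))
```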
- seismicrna.core.path.symlink_if_needed(link_path: Path | str, target_path: Path | str)
Make link_path a link pointing to target_path and log that event if it does not exist.
- seismicrna.core.path.transpath(to_dir: str | Path, from_dir: str | Path, path: str | Path, strict: bool = False)
Return the path that would be produced by moving path from from_dir to to_dir (but do not actually move the path on the file system). This function does not require that any of the given paths exist, unless strict is True.
- Parameters:
to_dir (str | pathlib.Path) – Directory to which to move path.
from_dir (str | pathlib.Path) – Directory from which to move path; must contain path but not necessarily be the direct parent directory of path.
path (str | pathlib.Path) – Path to move; can be a file or directory.
strict (bool = False) – Require that all paths exist and contain no symbolic link loops.
- Returns:
Hypothetical path after moving path from from_dir to to_dir.
- Return type:
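The path arithmetic described for transpath can be sketched with pathlib alone. This is an illustrative re-implementation, not the library's code: transpath_sketch is a hypothetical name, and unlike the documented function it does not sanitize its inputs or support strict.

```python
from pathlib import Path


def transpath_sketch(to_dir, from_dir, path) -> Path:
    """Compute where path would land if moved from from_dir to to_dir.

    Nothing is moved on the file system.
    """
    # relative_to raises ValueError if from_dir does not contain path.
    relative = Path(path).relative_to(Path(from_dir))
    return Path(to_dir) / relative


print(transpath_sketch("/out", "/tmp/work", "/tmp/work/sample/ref/file.txt"))
```

Here from_dir is two levels above the file, so the intermediate directories sample/ref are preserved under to_dir.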
- seismicrna.core.path.transpaths(to_dir: str | Path, *paths: str | Path, strict: bool = False)
Return all paths that would be produced by moving all paths in paths from their longest common sub-path to to_dir (but do not actually move the paths on the file system). This function does not require that any of the given paths exist, unless strict is True.
- Parameters:
to_dir (str | pathlib.Path) – Directory to which to move every path in paths.
*paths (str | pathlib.Path) – Paths to move; can be files or directories. A common sub-path must exist among all of these paths.
strict (bool = False) – Require that all paths exist and contain no symbolic link loops.
- Returns:
Hypothetical paths after moving all paths in paths to to_dir.
- Return type:
tuple[pathlib.Path, ...]
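The "longest common sub-path" used by transpaths corresponds to os.path.commonpath in the standard library. The sketch below assumes that correspondence. The name transpaths_sketch is hypothetical, and the sketch skips sanitizing and the strict option.

```python
import os
from pathlib import Path


def transpaths_sketch(to_dir, *paths) -> tuple:
    """Re-root every path from the longest common sub-path onto to_dir."""
    if not paths:
        return ()
    # commonpath raises ValueError if the paths share no common sub-path,
    # e.g. when absolute and relative paths are mixed.
    common = os.path.commonpath([str(path) for path in paths])
    return tuple(Path(to_dir) / Path(path).relative_to(common)
                 for path in paths)


moved = transpaths_sketch("/out", "/a/b/c.txt", "/a/d/e.txt")
print(moved)
```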
- seismicrna.core.path.validate_top(top: Path)
- seismicrna.core.random.stochastic_round(values: ndarray | list | float | int, preserve_sum: bool = False)
Round values to integers stochastically, so that the probability of rounding up equals the fractional part of the original value.
- Parameters:
values (np.ndarray | list | float | int) – Values to round; if scalar, a 0D integer array will be returned.
preserve_sum (bool) – Whether to ensure that the sum of the rounded values equals the sum of the original values.
- Returns:
Values rounded to integers; if preserve_sum is True, the original sum is preserved.
- Return type:
np.ndarray
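The rounding rule stated above (round up with probability equal to the fractional part) can be sketched with the standard library. This version is a simplified illustration: it works on a list rather than an np.ndarray, does not implement preserve_sum, and uses the hypothetical name stochastic_round_sketch.

```python
import math
import random


def stochastic_round_sketch(values, rng=None):
    """Round each value down, then add 1 with probability equal to the
    fractional part, so the expected result equals the original value."""
    if rng is None:
        rng = random.Random()
    rounded = []
    for value in values:
        lower = math.floor(value)
        fraction = value - lower
        # rng.random() is in [0, 1), so whole numbers never round up.
        rounded.append(lower + (1 if rng.random() < fraction else 0))
    return rounded


print(stochastic_round_sketch([0.0, 2.0, 1.5, -0.25], random.Random(0)))
```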
- class seismicrna.core.report.BatchedRefseqReport(**kwargs: Any | Callable[[Report], Any])
Bases:
BatchedReport, RefseqReport, ABC
Convenience class used as a base for several Report classes.
- class seismicrna.core.report.BatchedReport(**kwargs: Any | Callable[[Report], Any])
-
Report with a number of data batches (one file per batch).
- classmethod batch_types() dict[str, type[ReadBatchIO]]
Type(s) of batch(es) for the report, keyed by name.
- abstract classmethod fields()
All fields of the report.
- classmethod get_batch_type(btype: str | None = None) type[ReadBatchIO]
Return a valid type of batch based on its name.
- class seismicrna.core.report.Field(key: str, title: str, dtype: type, default: Any | None = None, *, iconv: Callable[[Any], Any] | None = None, oconv: Callable[[Any], Any] | None = None)
Bases:
object
Field of a report.
- default
- dtype
- iconv
- key
- oconv
- title
- exception seismicrna.core.report.InvalidReportFieldKeyError
Bases:
ReportFieldKeyError
The key does not belong to an actual report field.
- exception seismicrna.core.report.InvalidReportFieldTitleError
Bases:
ReportFieldKeyError
The title does not belong to an actual report field.
- exception seismicrna.core.report.MissingFieldWithNoDefaultError
Bases:
ReportFieldValueError
The default value is requested of a field with no default.
- class seismicrna.core.report.OptionField(option: Option, **kwargs)
Bases:
Field
Field based on a command line option.
- default
- dtype
- iconv
- key
- oconv
- title
- class seismicrna.core.report.RefseqReport(**kwargs: Any | Callable[[Report], Any])
-
Report associated with a reference sequence file.
- abstract classmethod fields()
All fields of the report.
- class seismicrna.core.report.Report(**kwargs: Any | Callable[[Report], Any])
-
Abstract base class for a report from a step.
- classmethod field_keys()
Keys of all fields of the report.
- abstract classmethod fields()
All fields of the report.
- classmethod from_dict(odata: dict[str, Any])
Convert a dict of raw values (keyed by the titles of their fields) into a dict of encoded values (keyed by the keys of their fields), from which a new Report is instantiated.
- get_field(field: Field, missing_ok: bool = False)
Return the value of a field of the report using the field instance directly, not its key.
- to_dict()
Return a dict of raw values of the fields, keyed by the titles of their fields.
- exception seismicrna.core.report.ReportDoesNotHaveFieldError
Bases:
ReportFieldAttributeError
A report does not contain this type of field.
- exception seismicrna.core.report.ReportFieldAttributeError
Bases:
ReportFieldError, AttributeError
- exception seismicrna.core.report.ReportFieldError
Bases:
RuntimeError
Any error involving a field of a report.
- exception seismicrna.core.report.ReportFieldKeyError
Bases:
ReportFieldError, KeyError
- exception seismicrna.core.report.ReportFieldTypeError
Bases:
ReportFieldError, TypeError
- exception seismicrna.core.report.ReportFieldValueError
Bases:
ReportFieldError, ValueError
- seismicrna.core.report.calc_dt_minutes(began: datetime, ended: datetime)
Calculate the time taken in minutes.
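A duration in minutes follows directly from subtracting two datetime objects. The sketch below shows that arithmetic. The name dt_minutes_sketch is hypothetical, and any rounding that the library's function may apply is not reproduced here.

```python
from datetime import datetime


def dt_minutes_sketch(began: datetime, ended: datetime) -> float:
    """Elapsed time between two datetimes, in minutes (no rounding)."""
    return (ended - began).total_seconds() / 60.0


began = datetime(2024, 1, 1, 12, 0, 0)
ended = datetime(2024, 1, 1, 12, 1, 30)
print(dt_minutes_sketch(began, ended))  # 1.5
```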
- seismicrna.core.report.fields()
- seismicrna.core.report.iconv_dict_str_dict_int_dict_int_int(mapping: dict[Any, dict[Any, dict[Any, Any]]]) dict[str, dict[int, dict[int, int]]]
- seismicrna.core.run.run_func(command: str, default: ~typing.Callable | None = <class 'list'>, with_tmp: bool = False, pass_keep_tmp: bool = False, *args, **kwargs)
Decorator for a run function.
- seismicrna.core.stats.calc_beta_mv(alpha: float, beta: float)
Find the mean and variance of a beta distribution from its alpha and beta parameters.
- seismicrna.core.stats.calc_beta_params(mean: float, variance: float)
Find the alpha and beta parameters of a beta distribution from its mean and variance.
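The two beta-distribution conversions are inverses of each other. The sketch below uses the standard formulas: mean = α / (α + β) and variance = αβ / ((α + β)² (α + β + 1)). The function names are hypothetical and the input checks are assumptions, since the library's validation is not documented here.

```python
def beta_mean_var(alpha: float, beta: float) -> tuple[float, float]:
    """Mean and variance of a beta distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = alpha * beta / (total ** 2 * (total + 1.0))
    return mean, variance


def beta_params(mean: float, variance: float) -> tuple[float, float]:
    """Alpha and beta of the beta distribution with this mean and variance."""
    if not 0.0 < mean < 1.0:
        raise ValueError(f"mean must be in (0, 1), but got {mean}")
    if not 0.0 < variance < mean * (1.0 - mean):
        raise ValueError(f"variance must be in (0, mean * (1 - mean)), "
                         f"but got {variance}")
    # total = alpha + beta, from solving the variance formula.
    total = mean * (1.0 - mean) / variance - 1.0
    return mean * total, (1.0 - mean) * total


mean, variance = beta_mean_var(2.0, 6.0)
print(mean, variance)
print(beta_params(mean, variance))  # approximately (2.0, 6.0)
```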
- seismicrna.core.stats.calc_dirichlet_mv(alpha: ndarray)
Find the means and variances of a Dirichlet distribution from its concentration parameters.
- Parameters:
alpha (np.ndarray) – Concentration parameters of the Dirichlet distribution.
- Returns:
Means and variances of the Dirichlet distribution.
- Return type:
tuple[np.ndarray, np.ndarray]
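For concentration parameters α with sum α₀, the Dirichlet means are αᵢ / α₀ and the variances are αᵢ (α₀ − αᵢ) / (α₀² (α₀ + 1)). The sketch below applies these standard formulas to a plain list instead of an np.ndarray. The name dirichlet_mean_var is hypothetical.

```python
def dirichlet_mean_var(alpha: list[float]) -> tuple[list[float], list[float]]:
    """Means and variances of a Dirichlet distribution."""
    if not alpha or any(a <= 0.0 for a in alpha):
        raise ValueError("Every concentration parameter must be positive")
    total = sum(alpha)
    means = [a / total for a in alpha]
    variances = [a * (total - a) / (total ** 2 * (total + 1.0))
                 for a in alpha]
    return means, variances


means, variances = dirichlet_mean_var([1.0, 1.0, 2.0])
print(means)      # [0.25, 0.25, 0.5]
print(variances)
```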
- seismicrna.core.stats.calc_dirichlet_params(mean: ndarray, variance: ndarray)
Find the concentration parameters of a Dirichlet distribution from its mean and variance.
- Parameters:
mean (np.ndarray) – Means.
variance (np.ndarray) – Variances.
- Returns:
Concentration parameters.
- Return type:
np.ndarray
- class seismicrna.core.task.Task(func: Callable)
Bases:
object
Wrap a parallelizable task in a try-except block so that if it fails, it just returns None rather than crashing the other tasks being run in parallel.
- __call__(*args, **kwargs)
Call the task’s function in a try-except block, return the result if it succeeds, and return None otherwise.
- property name
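The wrapping pattern that Task describes can be sketched as a small callable class. This is an illustration only: TaskSketch is a hypothetical name, and it prints the failure where the library presumably logs it.

```python
from typing import Any, Callable


class TaskSketch:
    """Call a function so that a failure returns None instead of raising."""

    def __init__(self, func: Callable):
        self.func = func

    def __call__(self, *args, **kwargs) -> Any:
        try:
            return self.func(*args, **kwargs)
        except Exception as error:
            # Report the failure and keep other tasks running.
            print(f"Task {self.func.__name__} failed: {error}")
            return None


parse_int = TaskSketch(int)
print(parse_int("7"))     # 7
print(parse_int("seven")) # None, after printing the failure
```

One consequence of this design is that a caller cannot distinguish a failed task from one that legitimately returned None without checking the log.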
- seismicrna.core.task.as_list_of_tuples(args: Iterable[Any])
Given an iterable of arguments, return a list of 1-item tuples, each containing one of the given arguments. This function is useful for creating a list of tuples to pass to the args parameter of dispatch.
- seismicrna.core.task.calc_pool_size(num_tasks: int, max_procs: int)
Calculate the size of a process pool.
- seismicrna.core.task.dispatch(funcs: list[Callable] | Callable, max_procs: int, pass_n_procs: bool = True, raise_on_error: bool = False, args: list[tuple] | tuple = (), kwargs: dict[str, Any] | None = None)
Run one or more tasks in series or in parallel, depending on the number of tasks, the maximum number of processes, and whether tasks are allowed to be run in parallel.
- Parameters:
funcs (list[Callable] | Callable) – The function(s) to run. Can be a list of functions or a single function that is not in a list. If a single function, then if args is a tuple, it is called once with that tuple as its positional arguments; and if args is a list of tuples, it is called for each tuple of positional arguments in args.
max_procs (int) – Maximum number of processes to run at one time. Must be ≥ 1.
pass_n_procs (bool) – Whether to pass the number of processes to the function as the keyword argument n_procs.
raise_on_error (bool) – Whether to raise an error if any tasks fail (if False, only log a warning message).
args (list[tuple] | tuple) – Positional arguments to pass to each function in funcs. Can be a list of tuples of positional arguments or a single tuple that is not in a list. If a single tuple, then each function receives args as positional arguments. If a list, then args must be the same length as funcs; each function funcs[i] receives args[i] as positional arguments.
kwargs (dict[str, Any] | None) – Keyword arguments to pass to every function call.
- Returns:
List of the return value of each run.
- Return type:
- seismicrna.core.tmp.get_release_working_dirs(tmp_dir: Path)
- seismicrna.core.tmp.release_to_out(out_dir: Path, release_dir: Path, initial_path: Path)
Move temporary path(s) to the output directory.
- seismicrna.core.tmp.with_tmp_dir(pass_keep_tmp: bool)
Make a temporary directory, and delete it after returning.
- seismicrna.core.types.fit_uint_type(value: int)
Smallest unsigned int type that will fit the value.
- seismicrna.core.types.get_byte_dtype(nchars: int)
NumPy byte type with the given number of characters.
- seismicrna.core.types.get_max_uint(uint_type: type)
Maximum value of a NumPy unsigned integer type.
- seismicrna.core.types.get_max_value(nbytes: int)
Get the maximum value of an unsigned integer of N bytes.
- seismicrna.core.types.get_uint_dtype(nbytes: int)
NumPy uint data type with the given number of bytes.
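The arithmetic behind these helpers is that an unsigned integer of N bytes can hold values up to 2^(8N) − 1. The sketch below shows that calculation without NumPy. The function names are hypothetical, and restricting sizes to 1, 2, 4, and 8 bytes is an assumption based on NumPy's standard unsigned types.

```python
def max_uint_value(nbytes: int) -> int:
    """Maximum value of an unsigned integer with the given number of bytes."""
    if nbytes < 1:
        raise ValueError(f"nbytes must be >= 1, but got {nbytes}")
    return (1 << (8 * nbytes)) - 1


def fit_uint_nbytes(value: int) -> int:
    """Smallest size in bytes (1, 2, 4, or 8) that can hold the value."""
    if value < 0:
        raise ValueError(f"value must be >= 0, but got {value}")
    for nbytes in (1, 2, 4, 8):
        if value <= max_uint_value(nbytes):
            return nbytes
    raise ValueError(f"value does not fit in 8 bytes: {value}")


print(max_uint_value(2))    # 65535
print(fit_uint_nbytes(256)) # 2
```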
- seismicrna.core.unbias.calc_n_reads_per_pos(p_ends_observed: ndarray, n_reads_per_clust: ndarray)
- seismicrna.core.unbias.calc_p_clust(p_clust_observed: ndarray, p_noclose_given_clust: ndarray)
Cluster proportion among all reads.
- Parameters:
p_clust_observed (np.ndarray) – Proportion of each cluster among reads with no two mutations too close. 1D (clusters)
p_noclose_given_clust (np.ndarray) – Probability that a read from each cluster would have no two mutations too close. 1D (clusters)
- Returns:
Proportion of each cluster among all reads. 1D (clusters)
- Return type:
np.ndarray
- seismicrna.core.unbias.calc_p_clust_given_ends_noclose(p_ends_given_clust_noclose: ndarray, p_clust_given_noclose: ndarray)
Calculate the probability that a read with each pair of 5’/3’ ends and no two mutations too close came from each cluster.
- Parameters:
p_ends_given_clust_noclose (np.ndarray) – 3D (positions x positions x clusters) array of the probability that a read from each cluster has each pair of 5’/3’ ends given that it has no two mutations too close.
p_clust_given_noclose (np.ndarray) – 1D (clusters) array of the probability that a read comes from each cluster given that it has no two mutations too close.
- Returns:
3D (positions x positions x clusters) array of the probability that a read with each pair of 5’/3’ ends and no two mutations too close comes from each cluster.
- Return type:
np.ndarray
- seismicrna.core.unbias.calc_p_clust_given_noclose(p_clust: ndarray, p_noclose_given_clust: ndarray)
Cluster proportions among reads with no two mutations too close.
- Parameters:
p_clust (np.ndarray) – Proportion of each cluster among all reads. 1D (clusters)
p_noclose_given_clust (np.ndarray) – Probability that a read from each cluster would have no two mutations too close. 1D (clusters)
- Returns:
Proportion of each cluster among reads with no two mutations too close. 1D (clusters)
- Return type:
np.ndarray
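The quantities in calc_p_clust and calc_p_clust_given_noclose are related by Bayes' rule: P(clust | noclose) = P(clust) P(noclose | clust) / P(noclose). The sketch below applies that rule in both directions with plain lists. It is a derivation from the parameter descriptions, not the library's code, and the function names are hypothetical.

```python
def clust_given_noclose(p_clust, p_noclose_given_clust):
    """Cluster proportions among reads with no two mutations too close."""
    joint = [pc * pn for pc, pn in zip(p_clust, p_noclose_given_clust)]
    p_noclose = sum(joint)  # P(noclose), summed over clusters
    return [j / p_noclose for j in joint]


def clust_from_observed(p_clust_observed, p_noclose_given_clust):
    """Invert Bayes' rule to recover proportions among all reads."""
    ratios = [po / pn for po, pn in zip(p_clust_observed,
                                        p_noclose_given_clust)]
    total = sum(ratios)  # normalize so the proportions sum to 1
    return [ratio / total for ratio in ratios]


p_clust = [0.6, 0.4]
p_noclose_given_clust = [0.9, 0.5]
observed = clust_given_noclose(p_clust, p_noclose_given_clust)
recovered = clust_from_observed(observed, p_noclose_given_clust)
print(observed)   # the first cluster is over-represented
print(recovered)  # approximately [0.6, 0.4]
```

The cluster whose reads are less likely to be dropped (here the first) looks larger among the observed reads than it is among all reads, which is the bias that the inversion removes.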
- seismicrna.core.unbias.calc_p_ends(p_ends_observed: ndarray, p_noclose_given_ends: ndarray, p_mut_given_span: ndarray, p_clust: ndarray)
Calculate the proportion of total reads with each pair of 5’ and 3’ coordinates.
This function is meant to be called by another function that has validated the arguments; hence, this function makes assumptions:
Every value in the upper triangle of p_ends_observed is ≥ 0 and ≤ 1; no values below the main diagonal are used.
The upper triangle of p_ends_observed sums to 1.
Every value in p_mut_given_span is ≥ 0 and ≤ 1.
- Parameters:
p_ends_observed (np.ndarray) – 3D (positions x positions x clusters) array of the proportion of observed reads in each cluster beginning at the row position and ending at the column position.
p_noclose_given_ends (np.ndarray) – 3D (positions x positions x clusters) array of the probabilities that a read with 5’ and 3’ coordinates corresponding to the row and column would have no two mutations too close.
p_mut_given_span (np.ndarray) – 2D (positions x clusters) array of the total mutation rate at each position in each cluster.
p_clust (np.ndarray) – 1D (clusters) array of the proportion of each cluster.
- Returns:
2D (positions x positions) array of the proportion of reads beginning at the row position and ending at the column position. This array is assumed to be identical for all clusters.
- Return type:
np.ndarray
- seismicrna.core.unbias.calc_p_ends_given_clust_noclose(p_ends: ndarray, p_noclose_given_ends: ndarray)
Calculate the proportion of reads with no two mutations too close with each pair of 5’ and 3’ coordinates.
Assumptions
p_ends has 2 dimensions: (positions x positions)
Every value in the upper triangle of p_ends is ≥ 0 and ≤ 1; no values below the main diagonal are used.
The upper triangle of p_ends sums to 1.
min_gap is a non-negative integer.
p_mut_given_span has 2 dimensions: (positions x clusters)
Every value in p_mut_given_span is ≥ 0 and ≤ 1.
There is at least 1 cluster.
- Parameters:
p_ends (np.ndarray) – 2D (positions x positions) array of the proportion of reads in each cluster beginning at the row position and ending at the column position.
p_noclose_given_ends (np.ndarray) – 3D (positions x positions x clusters) array of the probabilities that a read with 5’ and 3’ coordinates corresponding to the row and column would have no two mutations too close.
- Returns:
3D (positions x positions x clusters) array of the proportion of reads without mutations too close, beginning at the row position and ending at the column position, in each cluster.
- Return type:
np.ndarray
- seismicrna.core.unbias.calc_p_ends_given_noclose(p_ends_given_clust_noclose: ndarray, p_clust_given_noclose: ndarray)
Calculate the probability that a read would have each pair of 5’/3’ ends and no two mutations too close.
- Parameters:
p_ends_given_clust_noclose (np.ndarray) – 3D (positions x positions x clusters) array of the probability that a read from each cluster has each pair of 5’/3’ ends given that it has no two mutations too close.
p_clust_given_noclose (np.ndarray) – 1D (clusters) array of the probability that a read comes from each cluster given that it has no two mutations too close.
- Returns:
2D (positions x positions) array of the probability that a read with no two mutations too close has each pair of 5’/3’ ends, regardless of the cluster to which it belongs.
- Return type:
np.ndarray
- seismicrna.core.unbias.calc_p_ends_observed(npos: int, end5s: ndarray, end3s: ndarray, weights: ndarray | None = None, check_values: bool = True)
Calculate the proportion of each pair of 5’/3’ end coordinates observed in end5s and end3s, optionally weighted by weights.
- Parameters:
npos (int) – Number of positions.
end5s (np.ndarray) – 5’ ends (0-indexed) of the reads: 1D array (reads)
end3s (np.ndarray) – 3’ ends (0-indexed) of the reads: 1D array (reads)
weights (np.ndarray | None) – Number of times each read occurs in each cluster: 2D array (reads x clusters)
check_values (bool) – Check that end5s, end3s, and weights are all valid.
- Returns:
Fraction of reads with each 5’ (row) and 3’ (column) coordinate: 3D array (positions x positions x clusters)
- Return type:
np.ndarray
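The counting step behind calc_p_ends_observed can be sketched for the simplest case: unweighted reads and a single cluster. The sketch therefore returns a 2D nested list rather than the documented 3D array. The name p_ends_observed_sketch is hypothetical.

```python
def p_ends_observed_sketch(npos: int, end5s: list[int], end3s: list[int]):
    """Fraction of reads with each 5' (row) and 3' (column) end coordinate.

    Coordinates are 0-indexed; only the upper triangle can be non-zero
    because every read must satisfy end5 <= end3.
    """
    if len(end5s) != len(end3s):
        raise ValueError("end5s and end3s must have the same length")
    if not end5s:
        raise ValueError("At least one read is required")
    counts = [[0] * npos for _ in range(npos)]
    for end5, end3 in zip(end5s, end3s):
        if not 0 <= end5 <= end3 < npos:
            raise ValueError(f"Invalid end coordinates: ({end5}, {end3})")
        counts[end5][end3] += 1
    n_reads = len(end5s)
    return [[count / n_reads for count in row] for row in counts]


p_ends = p_ends_observed_sketch(3, [0, 0, 1], [2, 2, 1])
for row in p_ends:
    print(row)
```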
- seismicrna.core.unbias.calc_p_mut_given_span(p_mut_given_span_observed: ndarray, min_gap: int, p_ends: ndarray, init_p_mut_given_span: ndarray, *, quick_unbias: bool = True, quick_unbias_thresh: float = 0.0, f_tol: float = 0.0001, x_rtol: float = 0.001)
Calculate the underlying mutation rates, including for reads with two mutations too close, based on the observed mutation rates.
- seismicrna.core.unbias.calc_p_mut_given_span_noclose(p_mut_given_span: ndarray, p_ends: ndarray, p_noclose_given_ends: ndarray, p_nomut_window: ndarray)
Calculate the mutation rates of only reads with no two mutations too close that span each position.
- Parameters:
p_mut_given_span (np.ndarray) – 2D (positions x clusters) array of the underlying mutation rates (i.e. the probability that a read has a mutation at position (j) given that it contains that position).
p_ends (np.ndarray) – 2D (positions x positions) array of the proportion of reads in each cluster beginning at the row position and ending at the column position.
p_noclose_given_ends (np.ndarray) – 3D (positions x positions x clusters) array of the probabilities that a read with 5’ and 3’ coordinates corresponding to the row and column would have no two mutations too close.
p_nomut_window (np.ndarray) – 3D (window x positions x clusters) array of the probability that (window) consecutive bases, ending at position (position), would have no mutations.
- Returns:
2D (positions x clusters) array of the mutation rate among reads with no two mutations too close per position per cluster.
- Return type:
np.ndarray
- seismicrna.core.unbias.calc_p_noclose(p_clust: ndarray, p_noclose_given_clust: ndarray)
Probability that any read would have no two mutations too close.
- Parameters:
p_clust (np.ndarray) – Proportion of each cluster among all reads. 1D (clusters)
p_noclose_given_clust (np.ndarray) – Probability that a read from each cluster would have no two mutations too close. 1D (clusters)
- Returns:
Probability that any read would have no two mutations too close.
- Return type:
- seismicrna.core.unbias.calc_p_noclose_given_clust(p_ends: ndarray, p_noclose_given_ends: ndarray)
Calculate the probability that a read from each cluster would have no two mutations too close.
- seismicrna.core.unbias.calc_p_noclose_given_ends(p_mut_given_span: ndarray, p_nomut_window: ndarray)
Given underlying mutation rates (p_mut_given_span), calculate the probability that a read starting at position (a) and ending at position (b) would have no two mutations too close, for each (a) and (b) where 1 ≤ a ≤ b ≤ L (biological coordinates) or 0 ≤ a ≤ b < L (Python coordinates).
- Parameters:
p_mut_given_span (np.ndarray) – 2D (positions x clusters) array of the underlying mutation rates (i.e. the probability that a read has a mutation at position (j) given that it contains that position).
p_nomut_window (np.ndarray) – 3D (window x positions x clusters) array of the probability that (window) consecutive bases, ending at position (position), would have no mutations.
- Returns:
3D (positions x positions x clusters) array of the probability that a random read starting at position (a) (row) and ending at position (b) (column) would have no two mutations too close.
- Return type:
np.ndarray
- seismicrna.core.unbias.calc_p_noclose_given_ends_auto(p_mut_given_span: ndarray, min_gap: int)
Given underlying mutation rates (p_mut_given_span), calculate the probability that a read starting at position (a) and ending at position (b) would have no two mutations too close (i.e. separated by fewer than min_gap non-mutated positions), for each combination of (a) and (b) such that 1 ≤ a ≤ b ≤ L (in biological coordinates) or 0 ≤ a ≤ b < L (in Python coordinates).
- Parameters:
p_mut_given_span (ndarray) – A 2D (positions x clusters) array of the underlying mutation rates, i.e. the probability that a read has a mutation at position (j) given that it contains position (j).
min_gap (int) – Minimum number of non-mutated bases between two mutations; must be ≥ 0.
- Returns:
3D (positions x positions x clusters) array of the probability that a random read starting at position (a) (row) and ending at position (b) (column) would have no two mutations too close.
- Return type:
np.ndarray
- seismicrna.core.unbias.calc_p_nomut_window(p_mut_given_span: ndarray, min_gap: int)
Given underlying mutation rates (p_mut_given_span), find the probability of no mutations in each window of size 0 to min_gap.
- Parameters:
p_mut_given_span (ndarray) – 2D (positions x clusters) array of the underlying mutation rates (i.e. the probability that a read has a mutation at position (j) given that it contains that position).
min_gap (int) – Minimum number of non-mutated bases between two mutations.
- Returns:
3D (window x positions + 1 x clusters) array of the probability that (window) consecutive bases, ending at position (position), would have no mutations.
- Return type:
np.ndarray
- seismicrna.core.unbias.calc_params(p_mut_given_span_observed: ndarray, p_ends_observed: ndarray, p_clust_observed: ndarray, min_gap: int, guess_p_mut_given_span: ndarray | None = None, guess_p_ends: ndarray | None = None, guess_p_clust: ndarray | None = None, *, prenormalize: bool = True, max_iter: int = 128, convergence_thresh: float = 0.0001, **kwargs)
Calculate the three sets of parameters based on observed data.
- Parameters:
p_mut_given_span_observed (np.ndarray) – Observed probability that each position is mutated given that no two mutations are too close: 2D array (positions x clusters)
p_ends_observed (np.ndarray) – Observed proportion of reads aligned with each pair of 5’ and 3’ end coordinates given that no two mutations are too close: 3D array (positions x positions x clusters)
p_clust_observed (np.ndarray) – Observed proportion of reads in each cluster given that no two mutations are too close: 1D array (clusters)
min_gap (int) – Minimum number of non-mutated bases between two mutations. Must be a non-negative integer.
guess_p_mut_given_span (np.ndarray | None = None) – Initial guess for the probability that each position is mutated. If given, must be a 2D array (positions x clusters); defaults to p_mut_given_span_observed.
guess_p_ends (np.ndarray | None = None) – Initial guess for the proportion of total reads aligned to each pair of 5’ and 3’ end coordinates. If given, must be a 2D array (positions x positions); defaults to p_ends_observed.
guess_p_clust (np.ndarray | None = None) – Initial guess for the proportion of total reads in each cluster. If given, must be a 1D array (clusters); defaults to p_clust_observed.
prenormalize (bool = True) – Fill missing values in guess_p_mut_given_span, guess_p_ends, and guess_p_clust, and clip every value to be ≥ 0 and ≤ 1. Ensure the proportions in guess_p_clust and the upper triangle of guess_p_ends sum to 1.
max_iter (int = 128) – Maximum number of iterations in which to refine the parameters.
convergence_thresh (float = 1.e-4) – Convergence threshold based on the root-mean-square difference in mutation rates between consecutive iterations.
**kwargs – Additional keyword arguments for _calc_p_mut_given_span.
- seismicrna.core.unbias.calc_params_observed(n_pos_total: int, unmasked_pos: Iterable[int], muts_per_pos: Iterable[ndarray], end5s: ndarray, end3s: ndarray, counts_per_uniq: ndarray, resps: ndarray)
Calculate the observed estimates of the parameters.
- Parameters:
n_pos_total (int) – Total number of positions in the region.
unmasked_pos (Iterable[int]) – Unmasked positions; must be zero-indexed with respect to the 5’ end of the region.
muts_per_pos (Iterable[np.ndarray]) – For each unmasked position, numbers of all reads with a mutation at that position.
end5s (np.ndarray) – 5’ end of every unique read; must be 0-indexed with respect to the 5’ end of the region.
end3s (np.ndarray) – 3’ end of every unique read; must be 0-indexed with respect to the 5’ end of the region.
counts_per_uniq (np.ndarray) – Number of times each unique read occurs.
resps (np.ndarray) – Cluster memberships of each read: 2D array (reads x clusters)
- Return type:
tuple[np.ndarray, np.ndarray, np.ndarray]
- seismicrna.core.unbias.calc_rectangular_sum(array: ndarray)
For each element of the main diagonal, calculate the sum over the rectangular array from that element to the upper right corner. This function is meant to be called by another function that has validated the arguments; hence, this function makes assumptions:
array has at least 2 dimensions.
The first and second dimensions of array have equal lengths.
- Parameters:
array (np.ndarray) – Array of at least two dimensions for which to calculate the sum of each rectangular array from each element on the main diagonal to the upper right corner.
- Returns:
Array with the shape of all but the first dimension of array, giving the sum of the rectangular array from each element on the main diagonal to the upper right corner of array.
- Return type:
np.ndarray
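The description above can be read as follows: for diagonal element (i, i), sum rows 0 through i and columns i through the last column. The sketch below implements that reading with a plain loop; rectangular_sum_sketch is a hypothetical name and this is not the library's implementation, which may be vectorized or compiled.

```python
import numpy as np


def rectangular_sum_sketch(array):
    """For each diagonal element (i, i), sum array[0..i, i..n-1] over the
    first two dimensions, keeping any trailing dimensions."""
    n = array.shape[0]
    sums = [array[:i + 1, i:].sum(axis=(0, 1)) for i in range(n)]
    # Result has shape array.shape[1:], as the documentation describes.
    return np.array(sums)


square = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
rect_sums = rectangular_sum_sketch(square)
```

For the 3 x 3 example, the three rectangles are row 0 (all columns), rows 0 to 1 with columns 1 to 2, and rows 0 to 2 with column 2.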
- seismicrna.core.unbias.triu_allclose(a: ndarray | float, b: ndarray | float, rtol: float = 0.001, atol: float = 1e-06)
Whether the upper triangles of a and b are all close.
- Parameters:
a (np.ndarray | float) – Array 1.
b (np.ndarray | float) – Array 2.
rtol (float = 1.0e-3) – Relative tolerance.
atol (float = 1.0e-6) – Absolute tolerance.
- Returns:
Whether all elements of the upper triangles of a and b are close using the function np.allclose.
- Return type:
bool
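A minimal sketch of the comparison, assuming at least one argument is an array whose first two dimensions have equal lengths. The name triu_allclose_sketch is hypothetical; only np.allclose, np.triu_indices, and np.broadcast_arrays are real NumPy calls.

```python
import numpy as np


def triu_allclose_sketch(a, b, rtol=1.0e-3, atol=1.0e-6):
    """Compare only the elements on or above the main diagonal."""
    a, b = np.broadcast_arrays(np.asarray(a, dtype=float),
                               np.asarray(b, dtype=float))
    # Row and column indices of the upper triangle, including the diagonal.
    rows, cols = np.triu_indices(a.shape[0])
    return bool(np.allclose(a[rows, cols], b[rows, cols],
                            rtol=rtol, atol=atol))


x = np.array([[1.0, 2.0], [99.0, 3.0]])
y = np.array([[1.0, 2.0], [-5.0, 3.0]])  # differs only below the diagonal
z = np.array([[1.0, 2.5], [99.0, 3.0]])  # differs above the diagonal
```

Here x and y count as close because their only difference lies below the diagonal, while x and z do not.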
- seismicrna.core.unbias.triu_dot(a: ndarray, b: ndarray)
Dot product of a and b over their first 2 dimensions.
- Parameters:
a (np.ndarray) – Array 1.
b (np.ndarray) – Array 2.
- Returns:
Dot product of a and b over their first 2 dimensions.
- Return type:
np.ndarray
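One plausible reading, assumed here from the triu_ prefix and not confirmed by the text above, is that only elements on or above the main diagonal contribute to the dot product. The sketch below follows that assumption; triu_dot_sketch is a hypothetical name.

```python
import numpy as np


def triu_dot_sketch(a, b):
    """Sum of elementwise products over the upper triangle of the first
    two dimensions; trailing dimensions are kept (assumed behavior)."""
    rows, cols = np.triu_indices(a.shape[0])
    return (a[rows, cols] * b[rows, cols]).sum(axis=0)


p = np.array([[1, 2], [3, 4]])
q = np.array([[5, 6], [7, 8]])
dot_pq = triu_dot_sketch(p, q)  # uses 1*5, 2*6 and 4*8; skips 3*7
```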
- seismicrna.core.unbias.triu_log(a: ndarray)
Calculate the logarithm of the upper triangle(s) of array a. In the result, elements below the main diagonal are undefined.
- Parameters:
a (np.ndarray) – Array (≥ 2 dimensions) whose upper triangle(s) to take the logarithm of; the first 2 dimensions must have equal lengths.
- Returns:
Logarithm of the upper triangle(s) of a.
- Return type:
np.ndarray
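A minimal sketch of taking the logarithm of the upper triangle only. The documentation leaves elements below the diagonal undefined; this sketch fills them with NaN, which is a choice made here for illustration. The name triu_log_sketch is hypothetical.

```python
import numpy as np


def triu_log_sketch(a):
    """Natural log of elements on or above the main diagonal; elements
    below it are set to NaN in this sketch."""
    a = np.asarray(a, dtype=float)
    out = np.full(a.shape, np.nan)
    rows, cols = np.triu_indices(a.shape[0])
    out[rows, cols] = np.log(a[rows, cols])
    return out


m = np.array([[1.0, np.e],
              [0.0, 1.0]])
log_m = triu_log_sketch(m)
```

Skipping the lower triangle avoids evaluating log(0) for entries such as m[1, 0] that are never used.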
- seismicrna.core.unbias.triu_sum(a: ndarray)
Calculate the sum over the upper triangle(s) of array a.
- Parameters:
a (np.ndarray) – Array whose upper triangle(s) to sum.
- Returns:
Sum of the upper triangle(s), with the same shape as the third and subsequent dimensions of a.
- Return type:
np.ndarray
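A minimal sketch of summing the upper triangle over the first two dimensions while keeping any trailing dimensions, as the Returns entry describes. The name triu_sum_sketch is hypothetical and this is not the library's code.

```python
import numpy as np


def triu_sum_sketch(a):
    """Sum elements on or above the main diagonal of the first two
    dimensions; the result has shape a.shape[2:]."""
    rows, cols = np.triu_indices(a.shape[0])
    return a[rows, cols].sum(axis=0)


grid = np.arange(1, 10).reshape(3, 3)      # rows: 1 2 3 / 4 5 6 / 7 8 9
total = triu_sum_sketch(grid)              # 1 + 2 + 3 + 5 + 6 + 9
layers = np.stack([grid, 2 * grid], axis=-1)
per_layer = triu_sum_sketch(layers)        # one sum per trailing index
```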
- seismicrna.core.version.format_version(major: int = 0, minor: int = 23, patch: int = 0, prtag: str = '')
- seismicrna.core.version.parse_version(version: str = '0.23.0')
Get the major and minor version numbers, the patch number, and the pre-release tag from a version string.
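The relationship between the two functions can be sketched as a round trip between a version string and its four components. The accepted tag syntax is not stated above, so the pattern below (lowercase letters followed by optional digits, such as "rc1") is an assumption, and parse_version_sketch and format_version_sketch are hypothetical names.

```python
import re

# Assumed layout: MAJOR.MINOR.PATCH followed by an optional tag like "rc1".
_VERSION_PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+)([a-z]*\d*)$")


def parse_version_sketch(version="0.23.0"):
    """Split a version string into (major, minor, patch, prtag)."""
    match = _VERSION_PATTERN.match(version)
    if match is None:
        raise ValueError(f"Malformed version: {version!r}")
    major, minor, patch, prtag = match.groups()
    return int(major), int(minor), int(patch), prtag


def format_version_sketch(major=0, minor=23, patch=0, prtag=""):
    """Join the components back into a version string."""
    return f"{major}.{minor}.{patch}{prtag}"
```

Formatting the output of the parser reproduces the original string, which is a convenient consistency check for both helpers.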
- seismicrna.core.write.need_write(query: Path, force: bool = False, warn: bool = True)
Determine whether a file/directory must be written.
- Parameters:
query (Path) – File or directory for which to check the need for writing.
force (bool = False) – Force the query to be written, even if it already exists.
warn (bool = True) – If the query does not need to be written, then log a warning.
- Returns:
Whether the file or directory must be written.
- Return type:
bool
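The decision described by these parameters can be sketched as follows. This is an illustration of the documented behavior under the assumption that "need to be written" means "forced or not yet existing"; need_write_sketch is a hypothetical name and the logger setup is not taken from seismicrna.

```python
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)


def need_write_sketch(query: Path, force: bool = False,
                      warn: bool = True) -> bool:
    """Return True if `query` should be written: either writing is forced
    or the path does not exist yet."""
    if force or not query.exists():
        return True
    if warn:
        logger.warning("%s exists: use force to overwrite it", query)
    return False


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "report.txt"
    when_absent = need_write_sketch(path)              # no file yet
    path.write_text("done")
    when_present = need_write_sketch(path, warn=False)  # file exists
    when_forced = need_write_sketch(path, force=True)   # exists, forced
```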
- seismicrna.core.write.write_mode(force: bool = False, binary: bool = False)
Get the mode in which to open a file for writing.
- Parameters:
force (bool = False) – Force the file to be written, truncating the file if it exists. If False and the file exists, a FileExistsError will be raised.
binary (bool = False) – Write the file in binary mode instead of text mode.
- Returns:
The mode argument for the builtin function open().
- Return type:
str
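The documented behavior matches the standard open() modes "w" (truncate) and "x" (exclusive creation, which raises FileExistsError if the file exists). The sketch below builds a mode string on that assumption; write_mode_sketch is a hypothetical name, and the exact string the library returns may differ (for example, it might include an explicit "t").

```python
import tempfile
from pathlib import Path


def write_mode_sketch(force: bool = False, binary: bool = False) -> str:
    """Build a mode string for open(): "w" truncates, "x" refuses to
    overwrite; "b" selects binary mode."""
    return ("w" if force else "x") + ("b" if binary else "")


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "out.txt"
    with open(path, write_mode_sketch()) as handle:
        handle.write("first")
    try:
        # The file now exists, so exclusive creation should be refused.
        with open(path, write_mode_sketch()) as handle:
            handle.write("unexpected")
        refused = False
    except FileExistsError:
        refused = True
    with open(path, write_mode_sketch(force=True)) as handle:
        handle.write("second")
    final_text = path.read_text()
```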