seismicrna.core.mu package

Subpackages

seismicrna.core.mu.tests package
- Submodules

Submodules

Calculate the arcsine distance between mus1 and mus2. Assume that mus1 and mus2 are on the same scale (e.g. two clusters from the same sample), so perform no scaling or normalization.

Parameters:

mus1 (float | np.ndarray | pd.Series | pd.DataFrame) – First group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.
mus2 (float | np.ndarray | pd.Series | pd.DataFrame) – Second group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.

Returns:

Arcsine distance: 2/π * |arcsin(√mus1) - arcsin(√mus2)|

Return type:

float | np.ndarray | pd.Series | pd.DataFrame
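
The formula above can be sketched in a few lines of NumPy. This is only an illustration under the assumption that both inputs are already on the same scale and lie in [0, 1]; the name `arcsine_distance_sketch` is hypothetical and is not part of the package.

```python
import numpy as np


def arcsine_distance_sketch(mus1, mus2):
    """Illustrative arcsine distance (hypothetical helper, not the library code)."""
    a = np.asarray(mus1, dtype=float)
    b = np.asarray(mus2, dtype=float)
    # 2/pi * |arcsin(sqrt(mus1)) - arcsin(sqrt(mus2))|, computed element-wise.
    return 2.0 / np.pi * np.abs(np.arcsin(np.sqrt(a)) - np.arcsin(np.sqrt(b)))


dist = arcsine_distance_sketch([0.0, 0.25, 0.5], [1.0, 0.25, 0.0])
```

The factor 2/π normalizes the result so that the distance between mutation rates of 0 and 1 is exactly 1.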

seismicrna.core.mu.compare.calc_coeff_determ(mus1: ndarray | Series | DataFrame, mus2: ndarray | Series | DataFrame)

Calculate the coefficient of determination (a.k.a. R-squared) between two groups of mutation rates, ignoring NaNs.

Parameters:

mus1 (np.ndarray | pd.Series | pd.DataFrame) – First group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.
mus2 (np.ndarray | pd.Series | pd.DataFrame) – Second group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.

Returns:

Coefficient of determination.

Return type:

float | np.ndarray | pd.Series

seismicrna.core.mu.compare.calc_mean_arcsine_distance(mus1: ndarray | Series | DataFrame, mus2: ndarray | Series | DataFrame)

Calculate the mean arcsine distance between mus1 and mus2.

Parameters:

mus1 (np.ndarray | pd.Series | pd.DataFrame) – First group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.
mus2 (np.ndarray | pd.Series | pd.DataFrame) – Second group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.

Returns:

Mean arcsine distance.

Return type:

float | np.ndarray | pd.Series

seismicrna.core.mu.compare.calc_pearson(mus1: ndarray | Series | DataFrame, mus2: ndarray | Series | DataFrame)

Calculate the Pearson correlation coefficient between two groups of mutation rates, ignoring NaNs.

Parameters:

mus1 (np.ndarray | pd.Series | pd.DataFrame) – First group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.
mus2 (np.ndarray | pd.Series | pd.DataFrame) – Second group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.

Returns:

Pearson correlation coefficient.

Return type:

float | np.ndarray | pd.Series
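
As a rough illustration of what "ignoring NaNs" can mean for a correlation, the sketch below drops every position where either input is NaN before computing the coefficient. It handles only one-dimensional inputs, and the helper name `pearson_ignoring_nan` is an assumption made for this example, not the package's implementation.

```python
import numpy as np


def pearson_ignoring_nan(x, y):
    """Pearson correlation over positions where both inputs are not NaN."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Keep only positions that are valid in both groups.
    keep = ~(np.isnan(x) | np.isnan(y))
    x, y = x[keep], y[keep]
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


r = pearson_ignoring_nan([0.1, 0.2, np.nan, 0.4], [0.2, 0.4, 0.9, 0.8])
```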

seismicrna.core.mu.compare.calc_spearman(mus1: ndarray | Series | DataFrame, mus2: ndarray | Series | DataFrame)

Calculate the Spearman rank correlation coefficient between two groups of mutation rates, ignoring NaNs.

Parameters:

mus1 (np.ndarray | pd.Series | pd.DataFrame) – First group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.
mus2 (np.ndarray | pd.Series | pd.DataFrame) – Second group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.

Returns:

Spearman rank correlation coefficient.

Return type:

float | np.ndarray | pd.Series
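
A Spearman coefficient is the Pearson coefficient of the ranks. The sketch below shows that relationship for one-dimensional inputs, assuming no tied values (ties would need average ranks, which this example omits). Both helper names are hypothetical.

```python
import numpy as np


def ranks_no_ties(x):
    """Ranks 0..n-1 of a 1-D array, assuming all values are distinct."""
    order = np.argsort(x)
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(len(x))
    return ranks


def spearman_ignoring_nan(x, y):
    """Spearman correlation over positions where both inputs are not NaN."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = ~(np.isnan(x) | np.isnan(y))
    # Correlate the ranks rather than the raw mutation rates.
    return float(np.corrcoef(ranks_no_ties(x[keep]), ranks_no_ties(y[keep]))[0, 1])


rho = spearman_ignoring_nan([0.1, 0.3, np.nan, 0.2], [0.01, 0.9, 0.5, 0.02])
```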

seismicrna.core.mu.compare.calc_sum_arcsine_distance(mus1: ndarray | Series | DataFrame, mus2: ndarray | Series | DataFrame)

Calculate the sum of arcsine distances between mus1 and mus2.

Parameters:

mus1 (np.ndarray | pd.Series | pd.DataFrame) – First group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.
mus2 (np.ndarray | pd.Series | pd.DataFrame) – Second group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.

Returns:

Sum of arcsine distances.

Return type:

float | np.ndarray | pd.Series
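
The sum and mean variants can be viewed as reductions of the element-wise arcsine distance. The sketch below assumes that positions run along axis 0 and that each column is a separate set of mutation rates, so the reduction yields one value per column; that axis convention and the helper name are assumptions for illustration.

```python
import numpy as np


def elementwise_arcsine_distance(mus1, mus2):
    """Element-wise arcsine distance (hypothetical helper)."""
    a = np.asarray(mus1, dtype=float)
    b = np.asarray(mus2, dtype=float)
    return 2.0 / np.pi * np.abs(np.arcsin(np.sqrt(a)) - np.arcsin(np.sqrt(b)))


# Three positions (rows) and two sets of mutation rates (columns).
mus1 = np.array([[0.0, 0.5], [0.5, 0.5], [1.0, 0.5]])
mus2 = np.array([[0.0, 0.5], [0.0, 0.5], [0.0, 0.5]])

dists = elementwise_arcsine_distance(mus1, mus2)
sum_dist = dists.sum(axis=0)    # one sum per column
mean_dist = dists.mean(axis=0)  # one mean per column
```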

seismicrna.core.mu.compare.compare_windows(mus1: Series, mus2: Series, method: str | Callable, size: int, min_count: int = 2): Compare two Series via sliding windows.

seismicrna.core.mu.compare.get_comp_func(key: str) → Callable

Get the function of a comparison method based on its key.

Parameters: key (str) – Key with which to retrieve the comparison function.
Returns: Function to compare mutation rates.
Return type: Callable

seismicrna.core.mu.compare.get_comp_method(key: str): Get a comparison method based on its key.

seismicrna.core.mu.compare.get_comp_name(key: str) → str

Get the name of a comparison method based on its key.

Parameters: key (str) – Key with which to retrieve the comparison method name.
Returns: Name of the comparison method.
Return type: str

seismicrna.core.mu.dim.count_pos(mus: ndarray | Series | DataFrame)

Count the positions in an array of mutation rates.

Parameters: mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Mutation rates. Multiple sets of mutation rates can be given as columns of a multidimensional array or DataFrame.
Returns: Number of positions in the array of mutation rates.
Return type: int

seismicrna.core.mu.dim.counts_pos(*mus: ndarray | Series | DataFrame)

Count the positions in each array of mutation rates.

Parameters: *mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Groups of mutation rates; each can contain multiple sets as the columns of a multidimensional array or DataFrame.
Returns: Number of positions in each array of mutation rates.
Return type: tuple[int, ]

seismicrna.core.mu.dim.counts_pos_consensus(*mus: ndarray | Series | DataFrame)

Find the number of positions in every array of mutation rates; every array must have the same number of positions.

Parameters: *mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Groups of mutation rates; each can contain multiple sets as the columns of a multidimensional array.
Returns: Number of positions in every array of mutation rates.
Return type: int
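
The consensus check described above can be sketched as follows, assuming that positions lie along axis 0 of every array. The helper names and the choice of `ValueError` for a mismatch are assumptions made for this example; the package may signal the error differently.

```python
import numpy as np


def count_positions(mus):
    """Number of positions, assuming positions lie along axis 0."""
    return np.asarray(mus).shape[0]


def positions_consensus(*mus):
    """Return the shared number of positions, or raise if the arrays disagree."""
    counts = {count_positions(m) for m in mus}
    if len(counts) != 1:
        raise ValueError(f"Arrays disagree on the number of positions: {sorted(counts)}")
    return counts.pop()


n_pos = positions_consensus(np.zeros(4), np.zeros((4, 2)))
```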

seismicrna.core.mu.dist.calc_pdfs(mus_bases: dict[str, Series | DataFrame], scale_factor: float): Calculate the PDFs for each base.

seismicrna.core.mu.dist.calc_pseudoenergies(mus: Series | DataFrame, temperature: float | int, f_paired: float | int, **kwargs): Calculate the pseudoenergy of each base to predict structures, in kcal/mol.

seismicrna.core.mu.dist.calc_scale_factor(mus: Series | DataFrame, f_paired: float | int, eps: float = 1e-06, **kwargs): Calculate the scale factor for the parameters to fit the given mutation rates.

seismicrna.core.mu.dist.get_mus_bases(mus: Series | DataFrame): Get the mutation rates for each base.

seismicrna.core.mu.dist.get_scaled_pdf_func(pdf_func: Callable): Return a function that gives the PDF of a random variable Y = cX, where the PDF of X is given by pdf_func.
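
For a positive scale factor c, the change of variables Y = cX gives pdf_Y(y) = pdf_X(y / c) / c. The wrapper below is a minimal sketch of that identity; the names `scaled_pdf_sketch` and `uniform_pdf` are hypothetical, and the `(y, c, *params)` argument order simply mirrors the scaled PDF signatures listed in this module.

```python
import numpy as np


def scaled_pdf_sketch(pdf_func):
    """Wrap pdf_func (the PDF of X) to give the PDF of Y = c * X, assuming c > 0."""
    def scaled(y, c, *params):
        y = np.asarray(y, dtype=float)
        # Change of variables: pdf_Y(y) = pdf_X(y / c) / c.
        return pdf_func(y / c, *params) / c
    return scaled


def uniform_pdf(x):
    """PDF of a uniform random variable on [0, 1]."""
    return np.where((x >= 0) & (x <= 1), 1.0, 0.0)


scaled_uniform = scaled_pdf_sketch(uniform_pdf)
# Y = 0.5 * X is uniform on [0, 0.5], so its density there is 2.
density = scaled_uniform(np.array([0.1, 0.4, 0.6]), 0.5)
```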

seismicrna.core.mu.dist.scaled_double_kumaraswamy_pdf(y: ndarray, c: float, *params: float)

seismicrna.core.mu.dist.scaled_kumaraswamy_pdf(y: ndarray, c: float, *params: float)
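
For orientation, the standard Kumaraswamy density on (0, 1) is f(x; a, b) = a·b·x^(a−1)·(1 − x^a)^(b−1). The sketch below combines it with the scaling Y = cX. How the package lays out `*params` is not stated here, so treating them as a single `(a, b)` pair is an assumption, and both function names are hypothetical.

```python
import numpy as np


def kumaraswamy_pdf(x, a, b):
    """Standard Kumaraswamy PDF on (0, 1); zero outside the support."""
    x = np.asarray(x, dtype=float)
    inside = (x > 0) & (x < 1)
    # Substitute a harmless value outside the support to avoid invalid powers.
    xs = np.where(inside, x, 0.5)
    pdf = a * b * xs ** (a - 1) * (1 - xs ** a) ** (b - 1)
    return np.where(inside, pdf, 0.0)


def scaled_kumaraswamy_sketch(y, c, a, b):
    """PDF of Y = c * X where X is Kumaraswamy(a, b), assuming c > 0."""
    return kumaraswamy_pdf(np.asarray(y, dtype=float) / c, a, b) / c


# With a = 2 and b = 1 the unscaled density is 2x, so at y = 0.25 and c = 0.5
# the scaled density is (2 * 0.5) / 0.5 = 2.
scaled_density = scaled_kumaraswamy_sketch(np.array([0.25, 0.75]), 0.5, 2.0, 1.0)
```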

seismicrna.core.mu.dist.simulate_and_plot_distributions(n_samples: int = 1000, f_paired: float = 0.5, base: str = 'A', seed: int = 42, figsize: tuple = (10, 6), **kwargs)

Simulate mutation rates and plot the original and scaled distributions.

Parameters:

n_samples (int) – Number of mutation rates to simulate.
f_paired (float) – Fraction of paired bases.
base (str) – Base type to simulate (A, C, G, or U).
seed (int) – Random seed for reproducibility.
figsize (tuple) – Figure size (width, height) in inches.
**kwargs – Additional arguments to pass to calc_scale_factor.

Returns:

(scale_factor, figure) - The optimized scale factor and the matplotlib figure.

Return type:

tuple

seismicrna.core.mu.fit.fit_beta_mixture_model(mus: ndarray | Series, ab_params_mean: ndarray, ab_params_cov: ndarray, fpaired_alpha: float = 1.0, fpaired_beta: float = 1.0, eps: float = 1e-06, n_trials: int = 20, maxiter: int = 10000, ftol: float = 1e-08, gtol: float = 1e-08)

Fit a two-component beta mixture model to mutation rates.

Parameters:

mus (numpy.ndarray | pd.Series) – Array of mutation rates (values between 0 and 1)
ab_params_mean (numpy.ndarray) – Mean of the log-normal distribution of the alpha and beta parameters
ab_params_cov (numpy.ndarray) – Covariance matrix of the log-normal distribution of the alpha and beta parameters
fpaired_alpha (float) – Alpha parameter for the beta distribution of the fraction of paired bases
fpaired_beta (float) – Beta parameter for the beta distribution of the fraction of paired bases
eps (float) – Clip all mus < eps to eps, and all mus > (1 - eps) to (1 - eps).
n_trials (int) – Number of optimization attempts with different random initial parameters
maxiter (int) – Maximum number of iterations
ftol (float) – Tolerance for the function value
gtol (float) – Tolerance for the gradient

Returns:

Best fitting parameters found across all trials

Return type:

dict[str, float]
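
To make the model concrete, the sketch below evaluates the log-likelihood of a two-component beta mixture after clipping the mutation rates with `eps`, as the parameter description states. It does not reproduce the priors or the multi-trial optimization; the function names and the `(alpha, beta)` tuples for the paired and unpaired components are assumptions for illustration only.

```python
import math

import numpy as np


def beta_pdf(x, alpha, beta):
    """Beta PDF evaluated on an array x strictly inside (0, 1)."""
    log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return np.exp((alpha - 1.0) * np.log(x) + (beta - 1.0) * np.log1p(-x) - log_norm)


def mixture_log_likelihood(mus, f_paired, paired_ab, unpaired_ab, eps=1e-6):
    """Log-likelihood of mus under a paired/unpaired beta mixture (sketch)."""
    # Clip so that rates of exactly 0 or 1 do not give infinite log densities.
    mus = np.clip(np.asarray(mus, dtype=float), eps, 1.0 - eps)
    density = (f_paired * beta_pdf(mus, *paired_ab)
               + (1.0 - f_paired) * beta_pdf(mus, *unpaired_ab))
    return float(np.log(density).sum())


# An equal mix of Beta(1, 2) and Beta(2, 1) has density 1 everywhere on (0, 1),
# so the log-likelihood is approximately 0.
loglike = mixture_log_likelihood([0.0, 0.2, 0.9, 1.0], 0.5, (1.0, 2.0), (2.0, 1.0))
```

A fitting routine would maximize a quantity like this (plus any prior terms) from several random starting points and keep the best result.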

seismicrna.core.mu.fit.plot_beta_mixture(mus, params)

Plot the fitted beta mixture model against the data histogram.

Parameters:

mus (numpy.ndarray) – The data (mutation rates)
weights (list | numpy.ndarray) – The weights of the mixture components [weight1, weight2]
params (list | numpy.ndarray) – The parameters of the beta distributions [alpha1, beta1, alpha2, beta2]

seismicrna.core.mu.frame.auto_reframe(func: Callable)

Decorate a function with one positional argument of data so that it converts the input data to a NumPy array, runs, and then reframes the return value using the original argument as the target.

Note that if @auto_reframe and @auto_remove_nan are used to decorate the same function, then auto_reframe should be the inner decorator. If auto_remove_nan is the inner decorator and removes any NaNs, then auto_reframe will attempt to broadcast the NaN-less axis 0 over the original (longer) axis 0. This operation would raise a ValueError or, worse, if the NaN-less axis 0 happened to have length 1, would still broadcast to the original axis, causing a silent bug.
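
The two failure modes in the note above follow from NumPy's broadcasting rules and can be reproduced directly. The sketch uses `np.broadcast_to` as a stand-in for the reframing step, which is an assumption about how the broadcast happens internally.

```python
import numpy as np

original_len = 5  # length of axis 0 before NaNs were removed

# Case 1: three positions survive NaN removal. Broadcasting length 3 over
# length 5 is impossible, so the mistake is reported loudly.
try:
    np.broadcast_to(np.zeros(3), (original_len,))
    raised = False
except ValueError:
    raised = True

# Case 2: exactly one position survives. Length 1 broadcasts to any length,
# so the single value is silently repeated at every original position.
silent = np.broadcast_to(np.array([0.7]), (original_len,))
```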

Place the values in an array object with the given axes.

Parameters:

values (Number | numpy.ndarray | pandas.Series | pandas.DataFrame) – Value(s) to place in a new array-like object.
axes (tuple[int | numpy.ndarray | pandas.Index, ] | None) –
Axes of the new array-like object, specified as follows:
- If None, then return just the values as a NumPy array.
- If a tuple, then each element creates an axis as follows:
  - If an integer, then force the corresponding axis to be of that length.
  - If an array-like, then assign the axis a Pandas Index from the values in the element.
  Then, the array and index types are determined as follows:
  - If all elements are integers, then return a NumPy array in which the values are broadcast to the shape given by axes.
  - If at least one element is array-like, then return a Pandas object (a Series if axes has one item, a DataFrame if two).
  - If integers and array-like items are mixed, then replace each integer with a Pandas RangeIndex.

Returns:

Value(s) in their new array-like object.

Return type:

numpy.ndarray | pandas.Series | pandas.DataFrame
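
The rules for `axes` listed above can be sketched as a small function. This is a simplified reading of those rules, not the package's code: `reframe_sketch` is a hypothetical name, only plain Python integers count as "integer" elements, and more than two labeled axes are rejected.

```python
import numpy as np
import pandas as pd


def reframe_sketch(values, axes=None):
    """Place values in an array-like object with the given axes (sketch)."""
    values = np.asarray(values)
    if axes is None:
        # No axes: return just the values as a NumPy array.
        return values
    if all(isinstance(ax, int) for ax in axes):
        # All integers: broadcast to that shape and stay in NumPy.
        return np.broadcast_to(values, axes)
    # At least one array-like: integers become RangeIndex, others become Index.
    index = [pd.RangeIndex(ax) if isinstance(ax, int) else pd.Index(ax)
             for ax in axes]
    data = np.broadcast_to(values, tuple(len(ix) for ix in index))
    if len(index) == 1:
        return pd.Series(data, index=index[0])
    if len(index) == 2:
        return pd.DataFrame(data, index=index[0], columns=index[1])
    raise ValueError("This sketch supports at most two labeled axes")


arr = reframe_sketch(1.0, (2, 3))
ser = reframe_sketch([0.1, 0.2], (["A", "C"],))
frame = reframe_sketch(0.5, (2, ["x", "y"]))
```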

Place the values in an array object with the same type and axes as target.

Parameters:

values (Number | numpy.ndarray | pandas.Series | pandas.DataFrame) – Value(s) to place in a new array-like object.
target (numpy.ndarray | pandas.Series | pandas.DataFrame) – Array object whose type and axes are to be used for constructing the returned array.
drop (int = 0) – Reduce the dimensionality of the target by dropping this number of axes, starting from axis 0 and continuing upwards.

Returns:

Value(s) in their new array-like object.

Return type:

numpy.ndarray | pandas.Series | pandas.DataFrame

seismicrna.core.mu.measure.calc_gini(mus: ndarray | Series | DataFrame)

Calculate the Gini coefficient of mutation rates, ignoring NaNs.

Parameters: mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Mutation rates. Multiple sets of mutation rates can be given as columns of a multidimensional array or DataFrame.
Returns: Value of the Gini coefficient.
Return type: float | numpy.ndarray | pandas.Series
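
One common definition of the Gini coefficient for sorted values x_1 ≤ … ≤ x_n is G = Σ_i (2i − n − 1)·x_i / (n·Σ_i x_i). The sketch below applies it to a one-dimensional input after dropping NaNs. Whether the package uses exactly this normalization is an assumption, and `gini_sketch` is a hypothetical name.

```python
import numpy as np


def gini_sketch(mus):
    """Gini coefficient of a 1-D array, ignoring NaNs (illustrative only)."""
    x = np.asarray(mus, dtype=float)
    x = np.sort(x[~np.isnan(x)])
    n = x.size
    i = np.arange(1, n + 1)
    # Weighted sum over the sorted values, normalized by n times the total.
    return float(((2 * i - n - 1) * x).sum() / (n * x.sum()))


gini_equal = gini_sketch([0.2, np.nan, 0.2, 0.2])
gini_skewed = gini_sketch([0.0, 0.0, np.nan, 0.0, 1.0])
```

Equal mutation rates give a coefficient of 0, while concentrating all of the signal in one of n positions gives (n − 1) / n.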

seismicrna.core.mu.measure.calc_signal_noise(mus: ndarray | Series | DataFrame, is_signal: ndarray | Series)

Calculate the signal-to-noise ratio of mutation rates.

Parameters:

mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Mutation rates. Multiple sets of mutation rates can be given as columns of a DataFrame.
is_signal (np.ndarray | pd.Series) – Whether to count each position as signal.

Returns:

Signal-to-noise ratio.

Return type:

float | numpy.ndarray | pandas.Series

seismicrna.core.mu.nan.any_nan(mus: ndarray | Series | DataFrame)

Boolean array of positions where any mutation rate is NaN.

Parameters: mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Mutation rates. Multiple sets of mutation rates can be given as columns of a multidimensional array or DataFrame.
Returns: Boolean array of positions where any mutation rate is NaN.
Return type: numpy.ndarray | pandas.Series

seismicrna.core.mu.nan.auto_remove_nan(func: Callable)

Decorate a function with one positional argument of mutation rates so that it automatically removes positions with NaNs from the input argument. If the function produces any new NaNs from the NaN-free input, those NaNs are still returned.

Note that if @auto_reframe and @auto_remove_nan are used to decorate the same function, then auto_reframe should be the inner decorator. If auto_remove_nan is the inner decorator and removes any NaNs, then auto_reframe will attempt to broadcast the NaN-less axis 0 over the original (longer) axis 0. This operation would raise a ValueError or, worse, if the NaN-less axis 0 happened to have length 1, would still broadcast to the original axis, causing a silent bug.

seismicrna.core.mu.nan.auto_removes_nan(func: Callable): Decorate a function with positional argument(s) of mutation rates so that it automatically removes positions with NaNs from the input arguments. If the function produces any new NaNs from the NaN-free inputs, those NaNs are still returned.

seismicrna.core.mu.nan.no_nan(mus: ndarray | Series | DataFrame)

Boolean array of positions where no mutation rate is NaN.

Parameters: mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Mutation rates. Multiple sets of mutation rates can be given as columns of a multidimensional array or DataFrame.
Returns: Boolean array of positions where no mutation rate is NaN.
Return type: numpy.ndarray | pandas.Series

seismicrna.core.mu.nan.remove_nan(mus: ndarray | Series | DataFrame)

Remove positions at which any mutation rate is NaN.

Parameters: mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Mutation rates. Multiple sets of mutation rates can be given as columns of a multidimensional array or DataFrame.
Returns: Mutation rates without NaN values.
Return type: tuple[numpy.ndarray | pandas.Series | pandas.DataFrame, ]

seismicrna.core.mu.nan.removes_nan(*mus: ndarray | Series | DataFrame)

Remove positions at which any mutation rate in any group is NaN.

Parameters: *mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Groups of mutation rates; each can contain multiple sets as the columns of a multidimensional array.
Returns: Mutation rates without NaN values.
Return type: tuple[numpy.ndarray | pandas.Series | pandas.DataFrame, ]
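
Removing NaNs jointly means a position is kept only if no group has a NaN there, so every returned array keeps the same positions. The sketch below shows this for NumPy inputs, assuming positions lie along axis 0 and all groups have the same number of positions; `remove_nan_jointly` is a hypothetical name.

```python
import numpy as np


def remove_nan_jointly(*mus):
    """Drop every position (row) at which any group contains a NaN."""
    arrays = [np.asarray(m, dtype=float) for m in mus]
    keep = np.ones(arrays[0].shape[0], dtype=bool)
    for arr in arrays:
        # Flatten the non-position axes so a NaN in any column marks the row.
        keep &= ~np.isnan(arr.reshape(arr.shape[0], -1)).any(axis=1)
    return tuple(arr[keep] for arr in arrays)


a = np.array([0.1, np.nan, 0.3, 0.4])
b = np.array([[0.1, 0.2], [0.3, 0.4], [np.nan, 0.6], [0.7, 0.8]])
a_clean, b_clean = remove_nan_jointly(a, b)
```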

seismicrna.core.mu.scale.calc_ranks(mus: ndarray | Series | DataFrame)

Rank the mutation rates.

Parameters: mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Mutation rates. Multiple sets of mutation rates can be given as columns of a multidimensional array or DataFrame.
Returns: Ranks of the mutation rates.
Return type: numpy.ndarray | pandas.Series | pandas.DataFrame