seismicrna.core.mu package

Submodules

seismicrna.core.mu.compare.calc_arcsine_distance(mus1: float | ndarray | Series | DataFrame, mus2: float | ndarray | Series | DataFrame)

Calculate the arcsine distance between mus1 and mus2. Assume that mus1 and mus2 are on the same scale (e.g. two clusters from the same sample), so perform no scaling or normalization.

Parameters:
  • mus1 (float | np.ndarray | pd.Series | pd.DataFrame) – First group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.

  • mus2 (float | np.ndarray | pd.Series | pd.DataFrame) – Second group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.

Returns:

Arcsine distance: 2/π * |arcsin(√mus1) - arcsin(√mus2)|

Return type:

float | np.ndarray | pd.Series | pd.DataFrame
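The formula above can be sketched directly in NumPy. This is a minimal illustration of the stated expression, not the package's own code; the name arcsine_distance_sketch is hypothetical.

```python
import numpy as np

def arcsine_distance_sketch(mus1, mus2):
    """Elementwise 2/pi * |arcsin(sqrt(mus1)) - arcsin(sqrt(mus2))|."""
    a1 = np.arcsin(np.sqrt(np.asarray(mus1, dtype=float)))
    a2 = np.arcsin(np.sqrt(np.asarray(mus2, dtype=float)))
    return (2.0 / np.pi) * np.abs(a1 - a2)

# Rates of 0 and 1 are maximally far apart (distance 1).
dist_extreme = arcsine_distance_sketch(0.0, 1.0)

# Arrays are compared position by position.
dist_array = arcsine_distance_sketch(np.array([0.0, 0.25, 1.0]),
                                     np.array([0.0, 1.00, 1.0]))
```

Because arcsin(√x) maps [0, 1] onto [0, π/2], the factor 2/π keeps the distance within [0, 1].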

seismicrna.core.mu.compare.calc_coeff_determ(mus1: ndarray | Series | DataFrame, mus2: ndarray | Series | DataFrame)

Calculate the coefficient of determination (a.k.a. R-squared) between two groups of mutation rates, ignoring NaNs.

Parameters:
  • mus1 (np.ndarray | pd.Series | pd.DataFrame) – First group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.

  • mus2 (np.ndarray | pd.Series | pd.DataFrame) – Second group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.

Returns:

Coefficient of determination.

Return type:

float | np.ndarray | pd.Series

seismicrna.core.mu.compare.calc_mean_arcsine_distance(mus1: ndarray | Series | DataFrame, mus2: ndarray | Series | DataFrame)

Calculate the mean arcsine distance between mus1 and mus2.

Parameters:
  • mus1 (np.ndarray | pd.Series | pd.DataFrame) – First group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.

  • mus2 (np.ndarray | pd.Series | pd.DataFrame) – Second group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.

Returns:

Mean arcsine distance.

Return type:

float | np.ndarray | pd.Series

seismicrna.core.mu.compare.calc_pearson(mus1: ndarray | Series | DataFrame, mus2: ndarray | Series | DataFrame)

Calculate the Pearson correlation coefficient between two groups of mutation rates, ignoring NaNs.

Parameters:
  • mus1 (np.ndarray | pd.Series | pd.DataFrame) – First group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.

  • mus2 (np.ndarray | pd.Series | pd.DataFrame) – Second group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.

Returns:

Pearson correlation coefficient.

Return type:

float | np.ndarray | pd.Series
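As a rough illustration of "ignoring NaNs", the sketch below drops every position where either input is NaN before computing the correlation. It handles one-dimensional inputs only, and the function name is hypothetical; the package's implementation may differ.

```python
import numpy as np

def pearson_ignoring_nan(mus1, mus2):
    """Pearson correlation of two 1-D arrays, skipping NaN positions."""
    x = np.asarray(mus1, dtype=float)
    y = np.asarray(mus2, dtype=float)
    # Keep only the positions where both values are present.
    keep = ~(np.isnan(x) | np.isnan(y))
    x, y = x[keep], y[keep]
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

# The NaN at index 2 removes that position from both inputs.
r = pearson_ignoring_nan([0.1, 0.2, np.nan, 0.4], [0.2, 0.4, 0.9, 0.8])
```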

seismicrna.core.mu.compare.calc_spearman(mus1: ndarray | Series | DataFrame, mus2: ndarray | Series | DataFrame)

Calculate the Spearman rank correlation coefficient between two groups of mutation rates, ignoring NaNs.

Parameters:
  • mus1 (np.ndarray | pd.Series | pd.DataFrame) – First group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.

  • mus2 (np.ndarray | pd.Series | pd.DataFrame) – Second group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.

Returns:

Spearman rank correlation coefficient.

Return type:

float | np.ndarray | pd.Series
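A Spearman coefficient is the Pearson coefficient of the ranks. The sketch below assumes one-dimensional inputs with no tied values (ties would need average ranks, which are omitted here); both function names are hypothetical.

```python
import numpy as np

def ranks_no_ties(values):
    """Zero-based ranks of a 1-D array, assuming all values are distinct."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks

def spearman_sketch(mus1, mus2):
    """Spearman correlation of two 1-D arrays, skipping NaN positions."""
    x = np.asarray(mus1, dtype=float)
    y = np.asarray(mus2, dtype=float)
    keep = ~(np.isnan(x) | np.isnan(y))
    # Correlate the ranks rather than the raw mutation rates.
    return float(np.corrcoef(ranks_no_ties(x[keep]),
                             ranks_no_ties(y[keep]))[0, 1])

# Monotonic but non-linear relationship: the ranks agree exactly.
rho = spearman_sketch([0.01, 0.10, np.nan, 0.50, 0.02],
                      [0.10, 0.30, 0.20, 0.90, 0.20])
```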

seismicrna.core.mu.compare.calc_sum_arcsine_distance(mus1: ndarray | Series | DataFrame, mus2: ndarray | Series | DataFrame)

Calculate the sum of arcsine distances between mus1 and mus2.

Parameters:
  • mus1 (np.ndarray | pd.Series | pd.DataFrame) – First group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.

  • mus2 (np.ndarray | pd.Series | pd.DataFrame) – Second group of mutation rates; can contain multiple sets as the columns of a multidimensional array or DataFrame.

Returns:

Sum of arcsine distances.

Return type:

float | np.ndarray | pd.Series
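The sum and mean variants can be pictured as reductions of the elementwise arcsine distance. The sketch below assumes that positions lie along axis 0 and that the reduction yields one value per column; this is inferred from the return types, and the source does not state how NaNs are treated, so the example contains none.

```python
import numpy as np

def arcsine_distances(mus1, mus2):
    """Elementwise arcsine distance (hypothetical helper)."""
    a1 = np.arcsin(np.sqrt(np.asarray(mus1, dtype=float)))
    a2 = np.arcsin(np.sqrt(np.asarray(mus2, dtype=float)))
    return (2.0 / np.pi) * np.abs(a1 - a2)

# Rows are positions; columns are separate sets of mutation rates.
mus1 = np.array([[0.0, 0.25],
                 [1.0, 0.25]])
mus2 = np.array([[1.0, 0.25],
                 [1.0, 1.00]])

dist = arcsine_distances(mus1, mus2)
sum_dist = dist.sum(axis=0)    # one sum per column
mean_dist = dist.mean(axis=0)  # one mean per column
```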

seismicrna.core.mu.compare.compare_windows(mus1: Series, mus2: Series, method: str | Callable, size: int, min_count: int = 2)

Compare two Series via sliding windows.

seismicrna.core.mu.compare.get_comp_func(key: str) Callable

Get the function of a comparison method based on its key.

Parameters:

key (str) – Key with which to retrieve the comparison function.

Returns:

Function to compare mutation rates.

Return type:

Callable

seismicrna.core.mu.compare.get_comp_method(key: str)

Get a comparison method based on its key.

seismicrna.core.mu.compare.get_comp_name(key: str) str

Get the name of a comparison method based on its key.

Parameters:

key (str) – Key with which to retrieve the comparison method name.

Returns:

Name of the comparison method.

Return type:

str
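A key-based lookup like the three functions above is commonly backed by a small registry. The sketch below shows that pattern only; the key "absdiff", the CompMethod class, the error type, and every function name are hypothetical and are not taken from the package.

```python
from typing import Callable, NamedTuple

class CompMethod(NamedTuple):
    """Hypothetical record tying a key to a display name and a function."""
    key: str
    name: str
    func: Callable

def _abs_diff(mus1, mus2):
    return abs(mus1 - mus2)

# Registry of comparison methods, indexed by key.
_METHODS = {m.key: m for m in [
    CompMethod("absdiff", "Absolute Difference", _abs_diff),
]}

def get_method_sketch(key: str) -> CompMethod:
    try:
        return _METHODS[key.lower()]
    except KeyError:
        raise ValueError(f"Unknown comparison key: {key!r}") from None

def get_func_sketch(key: str) -> Callable:
    return get_method_sketch(key).func

def get_name_sketch(key: str) -> str:
    return get_method_sketch(key).name

diff = get_func_sketch("absdiff")(0.5, 0.25)
```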

seismicrna.core.mu.dim.count_pos(mus: ndarray | Series | DataFrame)

Count the positions in an array of mutation rates.

Parameters:

mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Mutation rates. Multiple sets of mutation rates can be given as columns of a multidimensional array or DataFrame.

Returns:

Number of positions in the array of mutation rates.

Return type:

int

seismicrna.core.mu.dim.counts_pos(*mus: ndarray | Series | DataFrame)

Count the positions in each array of mutation rates.

Parameters:

*mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Groups of mutation rates; each can contain multiple sets as the columns of a multidimensional array or DataFrame.

Returns:

Number of positions in each array of mutation rates.

Return type:

tuple[int, ...]

seismicrna.core.mu.dim.counts_pos_consensus(*mus: ndarray | Series | DataFrame)

Find the number of positions in every array of mutation rates; every array must have the same number of positions.

Parameters:

*mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Groups of mutation rates; each can contain multiple sets as the columns of a multidimensional array.

Returns:

Number of positions in every array of mutation rates.

Return type:

int
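The three counting functions can be sketched as follows, assuming that positions lie along axis 0 (consistent with sets being stored as columns). The function names and the choice of ValueError for mismatched lengths are assumptions, not taken from the package.

```python
import numpy as np

def count_pos_sketch(mus):
    """Number of positions, taken as the length of axis 0."""
    return np.shape(mus)[0]

def counts_pos_sketch(*mus):
    """Number of positions in each group."""
    return tuple(count_pos_sketch(m) for m in mus)

def counts_pos_consensus_sketch(*mus):
    """Shared number of positions; fail if the groups disagree."""
    counts = set(counts_pos_sketch(*mus))
    if len(counts) != 1:
        raise ValueError(f"Groups differ in number of positions: {sorted(counts)}")
    return counts.pop()

a = np.zeros((5, 2))  # 5 positions, 2 sets
b = np.zeros(5)       # 5 positions, 1 set
c = np.zeros((4, 3))  # 4 positions, 3 sets

each = counts_pos_sketch(a, b, c)
shared = counts_pos_consensus_sketch(a, b)
```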

seismicrna.core.mu.frame.auto_reframe(func: Callable)

Decorate a function with one positional argument of data so that it converts the input data to a NumPy array, runs, and then reframes the return value using the original argument as the target.

Note that if @auto_reframe and @auto_remove_nan are used to decorate the same function, then auto_reframe should be the inner decorator. If auto_remove_nan is the inner decorator and removes any NaNs, then auto_reframe will attempt to broadcast the NaN-less axis 0 over the original (longer) axis 0. This operation would raise a ValueError or, worse, if the NaN-less axis 0 happened to have length 1, would still broadcast to the original axis, causing a silent bug.
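The broadcasting hazard described in the note can be reproduced with plain NumPy, independent of the decorators themselves. The arrays below are made-up data for illustration.

```python
import numpy as np

original = np.array([0.1, np.nan, 0.3, np.nan, 0.5])

# Removing NaNs shortens axis 0 from 5 to 3; broadcasting back fails loudly.
shortened = original[~np.isnan(original)]
try:
    np.broadcast_to(shortened, original.shape)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# If only one value survives, broadcasting succeeds and repeats that value
# at every position: this is the silent bug.
single = np.array([0.1])
silently_broadcast = np.broadcast_to(single, original.shape)
```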

seismicrna.core.mu.frame.reframe(values: Number | ndarray | Series | DataFrame, axes: Iterable[int | ndarray | Index] | None = None)

Place the values in an array object with the given axes.

Parameters:
  • values (Number | numpy.ndarray | pandas.Series | pandas.DataFrame) – Value(s) to place in a new array-like object.

  • axes (tuple[int | numpy.ndarray | pandas.Index, ...] | None) –

    Axes of the new array-like object, specified as follows:

    • If None, then return just the values as a NumPy array.

    • If a tuple, then each element creates an axis as follows:

      • If an integer, then force the corresponding axis to be of that length.

      • If an array-like, then assign the axis a Pandas Index from the values in the element.

      Then, the array and index types are determined as follows:

      • If all elements are integers, then return a NumPy array in which the values are broadcast to the shape given by axes.

      • If at least one element is array-like, then return a Pandas object (a Series if axes has one item, a DataFrame if two).

      • If integers and array-like items are mixed, then replace each integer with a Pandas RangeIndex.

Returns:

Value(s) in their new array-like object.

Return type:

numpy.ndarray | pandas.Series | pandas.DataFrame
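The rules for axes can be sketched as below. This is one reading of the description above, not the package's implementation; reframe_sketch is a hypothetical name, and the behavior for more than two axes with labels is an assumption.

```python
import numpy as np
import pandas as pd

def reframe_sketch(values, axes=None):
    if axes is None:
        # No axes given: return just the values as a NumPy array.
        return np.asarray(values)
    axes = tuple(axes)
    if all(isinstance(ax, (int, np.integer)) for ax in axes):
        # All integers: broadcast the values to that shape.
        return np.broadcast_to(values, axes)
    # At least one array-like: build Pandas indexes, replacing each
    # integer with a RangeIndex of that length.
    indexes = [pd.RangeIndex(ax) if isinstance(ax, (int, np.integer))
               else pd.Index(ax)
               for ax in axes]
    shape = tuple(len(index) for index in indexes)
    data = np.broadcast_to(np.asarray(values), shape).copy()
    if len(indexes) == 1:
        return pd.Series(data, index=indexes[0])
    if len(indexes) == 2:
        return pd.DataFrame(data, index=indexes[0], columns=indexes[1])
    raise ValueError("A Pandas object needs one or two axes")

array_out = reframe_sketch(1.5, (2, 3))
series_out = reframe_sketch([0.1, 0.2], (["a", "b"],))
frame_out = reframe_sketch(0.5, (3, ["x", "y"]))
```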

seismicrna.core.mu.frame.reframe_like(values: Number | ndarray | Series | DataFrame, target: ndarray | Series | DataFrame, drop: int = 0)

Place the values in an array object with the same type and axes as target.

Parameters:
  • values (Number | numpy.ndarray | pandas.Series | pandas.DataFrame) – Value(s) to place in a new array-like object.

  • target (numpy.ndarray | pandas.Series | pandas.DataFrame) – Array object whose type and axes are to be used for constructing the returned array.

  • drop (int = 0) – Reduce the dimensionality of the target by dropping this number of axes, starting from axis 0 and continuing upwards.

Returns:

Value(s) in their new array-like object.

Return type:

numpy.ndarray | pandas.Series | pandas.DataFrame

seismicrna.core.mu.measure.calc_gini(mus: ndarray | Series | DataFrame)

Calculate the Gini coefficient of mutation rates, ignoring NaNs.

Parameters:

mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Mutation rates. Multiple sets of mutation rates can be given as columns of a multidimensional array or DataFrame.

Returns:

Value of the Gini coefficient.

Return type:

float | numpy.ndarray | pandas.Series
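For orientation, the sketch below computes a Gini coefficient from the mean absolute difference between all pairs of values, after dropping NaNs. It covers one-dimensional input only and uses the standard textbook definition; the package may use an equivalent or slightly different formula, and gini_sketch is a hypothetical name.

```python
import numpy as np

def gini_sketch(mus):
    """Gini coefficient of a 1-D array of non-negative rates, skipping NaNs."""
    x = np.asarray(mus, dtype=float)
    x = x[~np.isnan(x)]
    # Sum of |x_i - x_j| over all ordered pairs (i, j).
    pair_diffs = np.abs(x[:, None] - x[None, :]).sum()
    return float(pair_diffs / (2.0 * x.size ** 2 * x.mean()))

uniform = gini_sketch([0.2, 0.2, np.nan, 0.2])  # all rates equal
skewed = gini_sketch([0.0, 0.0, 0.0, 1.0])      # one position carries everything
```

A value near 0 means the mutation rates are spread evenly across positions; larger values mean they are concentrated at a few positions.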

seismicrna.core.mu.measure.calc_signal_noise(mus: ndarray | Series | DataFrame, is_signal: ndarray | Series)

Calculate the signal-to-noise ratio of mutation rates.

Parameters:
  • mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Mutation rates. Multiple sets of mutation rates can be given as columns of a DataFrame.

  • is_signal (np.ndarray | pd.Series) – Whether to count each position as signal.

Returns:

Signal-to-noise ratio.

Return type:

float | numpy.ndarray | pandas.Series

Comparisons of arbitrary numbers of mutation rates.

seismicrna.core.mu.nan.any_nan(mus: ndarray | Series | DataFrame)

Boolean array of positions where any mutation rate is NaN.

Parameters:

mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Mutation rates. Multiple sets of mutation rates can be given as columns of a multidimensional array or DataFrame.

Returns:

Boolean array of positions where any mutation rate is NaN.

Return type:

numpy.ndarray | pandas.Series

seismicrna.core.mu.nan.auto_remove_nan(func: Callable)

Decorate a function with one positional argument of mutation rates so that it automatically removes positions with NaNs from the input argument. If the function produces any new NaNs while using the NaN-less input, then those NaNs will be returned.

Note that if @auto_reframe and @auto_remove_nan are used to decorate the same function, then auto_reframe should be the inner decorator. If auto_remove_nan is the inner decorator and removes any NaNs, then auto_reframe will attempt to broadcast the NaN-less axis 0 over the original (longer) axis 0. This operation would raise a ValueError or, worse, if the NaN-less axis 0 happened to have length 1, would still broadcast to the original axis, causing a silent bug.

seismicrna.core.mu.nan.auto_removes_nan(func: Callable)

Decorate a function with positional argument(s) of mutation rates so that it automatically removes positions with NaNs from the input argument(s). If the function produces any new NaNs while using the NaN-less input, then those NaNs will be returned.

seismicrna.core.mu.nan.no_nan(mus: ndarray | Series | DataFrame)

Boolean array of positions where no mutation rate is NaN.

Parameters:

mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Mutation rates. Multiple sets of mutation rates can be given as columns of a multidimensional array or DataFrame.

Returns:

Boolean array of positions where no mutation rate is NaN.

Return type:

numpy.ndarray | pandas.Series
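The two masks are complements of each other. A minimal NumPy sketch, assuming positions lie along axis 0 and using hypothetical function names:

```python
import numpy as np

def any_nan_sketch(mus):
    """True at each position (row) where at least one rate is NaN."""
    mus = np.asarray(mus, dtype=float)
    is_nan = np.isnan(mus)
    if mus.ndim == 1:
        return is_nan
    # Collapse every axis except axis 0 (the positions).
    return is_nan.any(axis=tuple(range(1, mus.ndim)))

def no_nan_sketch(mus):
    """True at each position (row) where no rate is NaN."""
    return ~any_nan_sketch(mus)

mus = np.array([[0.1, np.nan],
                [0.2, 0.3],
                [np.nan, np.nan]])
has_nan = any_nan_sketch(mus)
complete = no_nan_sketch(mus)
```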

seismicrna.core.mu.nan.remove_nan(mus: ndarray | Series | DataFrame)

Remove positions at which any mutation rate is NaN.

Parameters:

mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Mutation rates. Multiple sets of mutation rates can be given as columns of a multidimensional array or DataFrame.

Returns:

Mutation rates without NaN values.

Return type:

tuple[numpy.ndarray | pandas.Series | pandas.DataFrame, ...]

seismicrna.core.mu.nan.removes_nan(*mus: ndarray | Series | DataFrame)

Remove positions at which any mutation rate in any group is NaN.

Parameters:

*mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Groups of mutation rates; each can contain multiple sets as the columns of a multidimensional array.

Returns:

Mutation rates without NaN values.

Return type:

tuple[numpy.ndarray | pandas.Series | pandas.DataFrame, ...]
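Removing NaNs jointly means that one shared mask is applied to every group, so the remaining positions stay aligned. A minimal NumPy sketch under that reading (the function name is hypothetical, and all groups are assumed to have the same number of positions along axis 0):

```python
import numpy as np

def removes_nan_sketch(*mus):
    """Drop each position that is NaN in any of the groups."""
    arrays = [np.asarray(m, dtype=float) for m in mus]
    keep = np.ones(arrays[0].shape[0], dtype=bool)
    for array in arrays:
        # Flatten everything except axis 0, then flag rows containing NaN.
        keep &= ~np.isnan(array.reshape(array.shape[0], -1)).any(axis=1)
    return tuple(array[keep] for array in arrays)

group1 = np.array([0.1, np.nan, 0.3, 0.4])
group2 = np.array([0.2, 0.5, np.nan, 0.8])
clean1, clean2 = removes_nan_sketch(group1, group2)
```

Positions 1 and 2 are dropped from both groups, even though each group has a NaN at only one of them.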

Scale mutation rates.

seismicrna.core.mu.scale.calc_quantile(mus: ndarray | Series | DataFrame, quantile: float)

Calculate the mutation rate at a quantile, ignoring NaNs.

Parameters:
  • mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Mutation rates. Multiple sets of mutation rates can be given as columns of a multidimensional array or DataFrame.

  • quantile (float) – Quantile to return from the mutation rates; must be in [0, 1].

Returns:

Value of the quantile from the mutation rates.

Return type:

float | numpy.ndarray | pandas.Series

seismicrna.core.mu.scale.calc_ranks(mus: ndarray | Series | DataFrame)

Rank the mutation rates.

Parameters:

mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Mutation rates. Multiple sets of mutation rates can be given as columns of a multidimensional array or DataFrame.

Returns:

Ranks of the mutation rates.

Return type:

numpy.ndarray | pandas.Series | pandas.DataFrame

seismicrna.core.mu.scale.normalize(mus: ndarray | Series | DataFrame, quantile: float)

Normalize the mutation rates to a quantile, so that the value of the quantile is scaled to 1 and all other mutation rates are scaled by the same factor. If quantile is 0, then do not normalize.

Parameters:
  • mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Mutation rates. Multiple sets of mutation rates can be given as columns of a multidimensional array or DataFrame.

  • quantile (float) – Quantile for normalizing the mutation rates; must be in [0, 1].

Returns:

Normalized mutation rates.

Return type:

numpy.ndarray | pandas.Series | pandas.DataFrame
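Normalization as described can be sketched with numpy.nanquantile. The sketch assumes NumPy's default linear interpolation for the quantile, which the source does not specify, and normalize_sketch is a hypothetical name.

```python
import numpy as np

def normalize_sketch(mus, quantile):
    """Divide the rates by their value at the given quantile (NaNs ignored)."""
    mus = np.asarray(mus, dtype=float)
    if quantile == 0:
        # A quantile of 0 means: do not normalize.
        return mus
    return mus / np.nanquantile(mus, quantile, axis=0)

mus = np.array([0.1, 0.2, np.nan, 0.4])
scaled_to_max = normalize_sketch(mus, 1.0)  # the maximum becomes 1
unchanged = normalize_sketch(mus, 0.0)
```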

seismicrna.core.mu.scale.winsorize(mus: ndarray | Series | DataFrame, quantile: float)

Normalize and winsorize the mutation rates to a quantile so that all mutation rates greater than or equal to the mutation rate at the quantile are set to 1, and lesser mutation rates are normalized.

Parameters:
  • mus (numpy.ndarray | pandas.Series | pandas.DataFrame) – Mutation rates. Multiple sets of mutation rates can be given as columns of a multidimensional array or DataFrame.

  • quantile (float) – Quantile for normalizing the mutation rates; must be in [0, 1].

Returns:

Normalized and winsorized mutation rates.

Return type:

numpy.ndarray | pandas.Series | pandas.DataFrame
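Winsorizing can be pictured as normalizing to the quantile and then capping the result at 1. The sketch below assumes NumPy's default linear interpolation for the quantile and does not handle a quantile value of zero; winsorize_sketch is a hypothetical name.

```python
import numpy as np

def winsorize_sketch(mus, quantile):
    """Normalize to the quantile, then cap every value at 1."""
    mus = np.asarray(mus, dtype=float)
    scaled = mus / np.nanquantile(mus, quantile, axis=0)
    # np.minimum propagates NaN, so missing rates stay missing.
    return np.minimum(scaled, 1.0)

mus = np.array([0.0, 0.1, 0.2, 0.3, 1.0])
capped = winsorize_sketch(mus, 0.75)  # the 0.75 quantile of mus is 0.3
```

The outlier at 1.0 is capped at 1 rather than dominating the scale, which is the usual motivation for winsorizing.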