dynast.estimation.pi

Module Contents

Functions

read_pi(pi_path, group_by=None)

Read pi CSV as a dictionary.

initializer(model)

Multiprocessing initializer.

beta_mean(alpha, beta)

Calculate the mean of a beta distribution.

beta_mode(alpha, beta)

Calculate the mode of a beta distribution.

guess_beta_parameters(guess, strength=5)

Given a guess of the mean of a beta distribution, calculate beta distribution parameters such that the distribution is skewed by some strength toward the guess.

fit_stan_mcmc(values, p_e, p_c, guess=0.5, model=None, n_chains=1, n_warmup=1000, n_iters=1000, seed=None)

Run MCMC to estimate the fraction of labeled RNA.

estimate_pi(df_aggregates, p_e, p_c, pi_path, group_by=None, p_group_by=None, n_threads=8, threshold=16, seed=None, nasc=False, model=None)

Estimate the fraction of labeled RNA.

Attributes

_model

dynast.estimation.pi.read_pi(pi_path, group_by=None)

Read pi CSV as a dictionary.

Parameters
  • pi_path (str) – path to CSV containing pi values

  • group_by (list, optional) – columns that were used to group estimation, defaults to None

Returns

dictionary with barcodes and genes as keys

Return type

dictionary
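As an illustration of the kind of mapping described above, here is a minimal sketch that reads a pi CSV into a dictionary using only the standard library. The column names (`barcode`, `GX`, `pi`), the tuple-shaped keys, and the helper name `read_pi_sketch` are all assumptions made for this example; the actual file layout and key structure used by dynast may differ.

```python
import csv
import io

# Hypothetical CSV contents; the real column names may differ.
csv_text = (
    "barcode,GX,pi\n"
    "AAAC,GeneA,0.25\n"
    "AAAC,GeneB,0.5\n"
    "TTTG,GeneA,0.75\n"
)


def read_pi_sketch(handle):
    """Map (barcode, gene) tuples to pi values (assumed key structure)."""
    pis = {}
    for row in csv.DictReader(handle):
        pis[(row["barcode"], row["GX"])] = float(row["pi"])
    return pis


pis = read_pi_sketch(io.StringIO(csv_text))
print(pis[("AAAC", "GeneB")])
```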

dynast.estimation.pi._model

dynast.estimation.pi.initializer(model)

Multiprocessing initializer. https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor

This initializer performs a one-time expensive initialization for each process.

dynast.estimation.pi.beta_mean(alpha, beta)

Calculate the mean of a beta distribution. https://en.wikipedia.org/wiki/Beta_distribution

Parameters
  • alpha (float) – first parameter of the beta distribution

  • beta (float) – second parameter of the beta distribution

Returns

mean of the beta distribution

Return type

float
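For reference, the mean of a Beta(alpha, beta) distribution is alpha / (alpha + beta). A minimal sketch of that formula, with the hypothetical name `beta_mean_sketch` to keep it distinct from the library function:

```python
def beta_mean_sketch(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution: alpha / (alpha + beta)."""
    return alpha / (alpha + beta)


print(beta_mean_sketch(2.0, 6.0))  # 0.25
```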

dynast.estimation.pi.beta_mode(alpha, beta)

Calculate the mode of a beta distribution. https://en.wikipedia.org/wiki/Beta_distribution

When the distribution is bimodal (alpha < 1 and beta < 1), this function returns nan.

Parameters
  • alpha (float) – first parameter of the beta distribution

  • beta (float) – second parameter of the beta distribution

Returns

mode of the beta distribution

Return type

float
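The standard mode formula can be sketched as below. For alpha > 1 and beta > 1 the mode is (alpha - 1) / (alpha + beta - 2), and the bimodal case returns nan as described above. How the remaining boundary cases are handled here is an assumption of this sketch, not a statement about the library's behavior, and `beta_mode_sketch` is a hypothetical name.

```python
import math


def beta_mode_sketch(alpha: float, beta: float) -> float:
    """Mode of a Beta(alpha, beta) distribution, or nan if not unique."""
    if alpha > 1 and beta > 1:
        # Single interior mode.
        return (alpha - 1) / (alpha + beta - 2)
    if alpha <= 1 and beta > 1:
        # Density is highest at the left edge (assumed handling).
        return 0.0
    if alpha > 1 and beta <= 1:
        # Density is highest at the right edge (assumed handling).
        return 1.0
    # Both parameters <= 1: bimodal when both are < 1. The remaining
    # boundary cases are also mapped to nan here as a simplification.
    return math.nan


print(beta_mode_sketch(3.0, 2.0))
print(beta_mode_sketch(0.5, 0.5))
```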

dynast.estimation.pi.guess_beta_parameters(guess, strength=5)

Given a guess of the mean of a beta distribution, calculate beta distribution parameters such that the distribution is skewed by some strength toward the guess.

Parameters
  • guess (float) – guess of the mean of the beta distribution

  • strength (int) – strength of the skew, defaults to 5

Returns

beta distribution parameters (alpha, beta)

Return type

(float, float)
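One simple way to turn a guessed mean into beta parameters is to treat `strength` as the total alpha + beta and split it in proportion to the guess, so that the resulting mean equals the guess. The sketch below shows that parameterization; it is an assumption for illustration, and the library's formula (including any clamping of very small values) may differ. The name `guess_beta_parameters_sketch` is hypothetical.

```python
def guess_beta_parameters_sketch(guess: float, strength: float = 5):
    """Return (alpha, beta) with alpha + beta == strength and mean == guess.

    Assumed parameterization for illustration only.
    """
    alpha = strength * guess
    beta = strength * (1 - guess)
    return alpha, beta


alpha, beta = guess_beta_parameters_sketch(0.2, strength=5)
# The mean alpha / (alpha + beta) recovers the guess.
print(alpha, beta, alpha / (alpha + beta))
```

A larger `strength` concentrates the distribution more tightly around the guess, since the same mean is spread over a larger alpha + beta.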

dynast.estimation.pi.fit_stan_mcmc(values, p_e, p_c, guess=0.5, model=None, n_chains=1, n_warmup=1000, n_iters=1000, seed=None)

Run MCMC to estimate the fraction of labeled RNA.

Parameters
  • values (numpy.ndarray) –

    array of three columns encoding a sparse array in (row, column, value) format, zero-indexed, where

    row: number of conversions
    column: nucleotide content
    value: number of reads

  • p_e (float) – average mutation rate in unlabeled RNA

  • p_c (float) – average mutation rate in labeled RNA

  • guess (float, optional) – guess for the fraction of labeled RNA, defaults to 0.5

  • model (pystan.StanModel, optional) – pyStan model to run MCMC with, defaults to None; if not provided, the function will try to use the _model global variable

  • n_chains (int, optional) – number of MCMC chains, defaults to 1

  • n_warmup (int, optional) – number of warmup iterations, defaults to 1000

  • n_iters (int, optional) – number of MCMC iterations, excluding any warmups, defaults to 1000

  • seed (int, optional) – random seed used for MCMC, defaults to None

Returns

(guess, alpha, beta, pi)

Return type

(float, float, float, float)
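The (row, column, value) layout expected for `values` can be built from a dense count matrix as in the sketch below. The matrix `counts` and its numbers are made up for this example; only the three-column, zero-indexed layout comes from the parameter description above.

```python
import numpy as np

# Hypothetical dense matrix: counts[k, n] is the number of reads with
# k conversions (row) and nucleotide content n (column).
counts = np.array([
    [0, 5, 12],
    [0, 2, 7],
    [0, 0, 3],
])

# Keep only the non-zero cells and stack them as (row, column, value).
rows, cols = np.nonzero(counts)
values = np.column_stack([rows, cols, counts[rows, cols]])

print(values)
```

Each row of `values` is one non-zero cell of the dense matrix, so cells with zero reads take up no space.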

dynast.estimation.pi.estimate_pi(df_aggregates, p_e, p_c, pi_path, group_by=None, p_group_by=None, n_threads=8, threshold=16, seed=None, nasc=False, model=None)

Estimate the fraction of labeled RNA.

Parameters
  • df_aggregates (pandas.DataFrame) – Pandas dataframe containing aggregate values

  • p_e (float) – average mutation rate in unlabeled RNA

  • p_c (float) – average mutation rate in labeled RNA

  • pi_path (str) – path to write pi estimates

  • group_by (list, optional) – columns that were used to group cells, defaults to None

  • p_group_by (list, optional) – columns that p_e/p_c estimation was grouped by, defaults to None

  • n_threads (int, optional) – number of threads, defaults to 8

  • threshold (int, optional) – any conversion-content pairs with fewer than this many reads will not be processed, defaults to 16

  • seed (int, optional) – random seed, defaults to None

  • nasc (bool, optional) – flag to match the behavior of the NASC-seq pipeline: the mode of the estimated beta distribution is used as pi, defaults to False

  • model (pystan.StanModel, optional) – pyStan model to run MCMC with, defaults to None; if not provided, the function will try to compile the model manually

Returns

path to pi output

Return type

str
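The effect of `threshold` can be pictured as a filter on the read-count column of a sparse (conversions, content, reads) array, as in the sketch below. The array contents are made up, and applying the filter at exactly this point is an assumption of the example rather than a description of the library's internals.

```python
import numpy as np

# Hypothetical sparse array: (conversions, nucleotide content, reads).
values = np.array([
    [0, 10, 40],
    [1, 10, 3],
    [0, 12, 16],
    [2, 12, 15],
])
threshold = 16

# Drop conversion-content pairs with fewer than `threshold` reads.
kept = values[values[:, 2] >= threshold]

print(kept)
```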