`dynast.estimation.pi`

Module Contents

Functions

`read_pi`(pi_path, group_by=None)	Read pi CSV as a dictionary.
`initializer`(model)	Multiprocessing initializer.
`beta_mean`(alpha, beta)	Calculate the mean of a beta distribution.
`beta_mode`(alpha, beta)	Calculate the mode of a beta distribution.
`guess_beta_parameters`(guess, strength=5)	Given a guess of the mean of a beta distribution, calculate beta distribution parameters such that the distribution is skewed by some strength toward the guess.
`fit_stan_mcmc`(values, p_e, p_c, guess=0.5, model=None, n_chains=1, n_warmup=1000, n_iters=1000, seed=None)	Run MCMC to estimate the fraction of labeled RNA.
`estimate_pi`(df_aggregates, p_e, p_c, pi_path, group_by=None, p_group_by=None, n_threads=8, threshold=16, seed=None, nasc=False, model=None)	Estimate the fraction of labeled RNA.

Attributes

_model

dynast.estimation.pi.read_pi(pi_path, group_by=None)

Read pi CSV as a dictionary.

Parameters

pi_path (str) – path to CSV containing pi values
group_by (list, optional) – columns that were used to group estimation, defaults to None

Returns

dictionary with barcodes and genes as keys

Return type

dictionary
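
As an illustration of the returned structure, the sketch below reads a small CSV into a dictionary keyed by tuples. The column names `barcode`, `GX` and `pi`, and the helper name `read_pi_sketch`, are assumptions made for this example; the actual file layout written by dynast is not described here.

```python
import csv
import io


def read_pi_sketch(handle, key_columns=("barcode", "GX"), value_column="pi"):
    """Read a CSV handle into {(key columns...): pi}.

    The column names are hypothetical defaults chosen for this sketch.
    """
    reader = csv.DictReader(handle)
    return {
        tuple(row[column] for column in key_columns): float(row[value_column])
        for row in reader
    }


# Small in-memory CSV standing in for a pi file.
example_csv = io.StringIO("barcode,GX,pi\nAAAC,GENE1,0.25\nAAAC,GENE2,0.75\n")
pis = read_pi_sketch(example_csv)
```

With this layout, `pis[("AAAC", "GENE1")]` looks up the value for one barcode and gene pair.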

dynast.estimation.pi._model

dynast.estimation.pi.initializer(model)

Multiprocessing initializer. https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor

This initializer performs a one-time expensive initialization for each process.
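
The general pattern can be sketched with the standard library as follows. This is a minimal illustration, not dynast's code: the `work` and `run_all` helpers and the dictionary used as a stand-in model are hypothetical. `ThreadPoolExecutor` is used because the linked page documents it; `ProcessPoolExecutor` accepts the same `initializer` and `initargs` arguments.

```python
from concurrent.futures import ThreadPoolExecutor

# Module-level slot that workers read from, mirroring the `_model` attribute.
_model = None


def initializer(model):
    """Store the shared object once per worker instead of once per task."""
    global _model
    _model = model


def work(x):
    # Hypothetical task: uses the object stored by the initializer.
    return _model["scale"] * x


def run_all(inputs, model):
    # The executor calls initializer(*initargs) when each worker starts.
    with ThreadPoolExecutor(
        max_workers=2, initializer=initializer, initargs=(model,)
    ) as pool:
        return list(pool.map(work, inputs))


results = run_all([1, 2, 3], {"scale": 10})
```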

dynast.estimation.pi.beta_mean(alpha, beta)

Calculate the mean of a beta distribution. https://en.wikipedia.org/wiki/Beta_distribution

Parameters

alpha (float) – first parameter of the beta distribution
beta (float) – second parameter of the beta distribution

Returns

mean of the beta distribution

Return type

float
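
For reference, the mean of a beta distribution with parameters alpha and beta is alpha / (alpha + beta). A minimal sketch of that formula (the function name `beta_mean_sketch` is chosen for this example):

```python
def beta_mean_sketch(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution: alpha / (alpha + beta)."""
    return alpha / (alpha + beta)


mean = beta_mean_sketch(2.0, 6.0)
```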

dynast.estimation.pi.beta_mode(alpha, beta)

Calculate the mode of a beta distribution. https://en.wikipedia.org/wiki/Beta_distribution

When the distribution is bimodal (alpha, beta < 1), this function returns nan.

Parameters

alpha (float) – first parameter of the beta distribution
beta (float) – second parameter of the beta distribution

Returns

mode of the beta distribution

Return type

float
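
For alpha, beta > 1 the mode of a beta distribution is (alpha - 1) / (alpha + beta - 2). The sketch below applies that formula and returns nan in the bimodal case described above. Returning nan for the remaining edge cases (one or both parameters equal to 1, or only one of them below 1) is a simplification made for this example; the library may treat them differently.

```python
import math


def beta_mode_sketch(alpha, beta):
    """Mode of Beta(alpha, beta) when it is unique and interior, else nan."""
    if alpha > 1 and beta > 1:
        return (alpha - 1) / (alpha + beta - 2)
    # Bimodal case (alpha, beta < 1) and, as a simplification in this
    # sketch, every other edge case.
    return float("nan")


mode = beta_mode_sketch(3.0, 5.0)
bimodal = beta_mode_sketch(0.5, 0.5)
```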

dynast.estimation.pi.guess_beta_parameters(guess, strength=5)

Given a guess of the mean of a beta distribution, calculate beta distribution parameters such that the distribution is skewed by some strength toward the guess.

Parameters

guess (float) – guess of the mean of the beta distribution
strength (int) – strength of the skew, defaults to 5

Returns

beta distribution parameters (alpha, beta)

Return type

(float, float)
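
One simple way to build such parameters, shown here as an assumption rather than as dynast's actual formula, is to split `strength` between the two parameters in proportion to the guess. The resulting distribution then has mean equal to `guess`, and a larger `strength` concentrates it more tightly around that value.

```python
def guess_beta_parameters_sketch(guess, strength=5):
    """Return (alpha, beta) with alpha + beta == strength and mean == guess.

    Hypothetical construction for illustration; valid for 0 < guess < 1.
    """
    alpha = strength * guess
    beta = strength * (1 - guess)
    return alpha, beta


alpha, beta = guess_beta_parameters_sketch(0.2, strength=5)
implied_mean = alpha / (alpha + beta)
```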

dynast.estimation.pi.fit_stan_mcmc(values, p_e, p_c, guess=0.5, model=None, n_chains=1, n_warmup=1000, n_iters=1000, seed=None)

Run MCMC to estimate the fraction of labeled RNA.

Parameters

values (numpy.ndarray) – array of three columns encoding a sparse array in (row, column, value) format, zero-indexed, where
row: number of conversions
column: nucleotide content
value: number of reads
p_e (float) – average mutation rate in unlabeled RNA
p_c (float) – average mutation rate in labeled RNA
guess (float, optional) – guess for the fraction of labeled RNA, defaults to 0.5
model (pystan.StanModel, optional) – pyStan model to run MCMC with, defaults to None; if not provided, will try to use the _model global variable
n_chains (int, optional) – number of MCMC chains, defaults to 1
n_warmup (int, optional) – number of warmup iterations, defaults to 1000
n_iters (int, optional) – number of MCMC iterations, excluding any warmups, defaults to 1000
seed (int, optional) – random seed used for MCMC, defaults to None

Returns

(guess, alpha, beta, pi)

Return type

(float, float, float, float)
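
To make the inputs concrete, the sketch below evaluates a two-component binomial mixture likelihood over the (conversions, content, reads) triples and picks the best value of pi on a grid. The mixture form is a common formulation for this kind of estimate and is an assumption here; the Stan model that dynast actually samples from is not shown in this reference, and a grid search is a stand-in for MCMC, not a replacement for it. Plain tuples are used instead of a numpy.ndarray to keep the example dependency-free, and all function names are hypothetical.

```python
import math


def binom_pmf(k, n, p):
    """Probability of k conversions among n convertible nucleotides."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def log_likelihood(values, p_e, p_c, pi):
    """Log-likelihood of pi under an assumed binomial mixture.

    Each read is labeled with probability pi (conversion rate p_c)
    or unlabeled with probability 1 - pi (conversion rate p_e).
    """
    total = 0.0
    for conversions, content, reads in values:
        labeled = binom_pmf(conversions, content, p_c)
        unlabeled = binom_pmf(conversions, content, p_e)
        total += reads * math.log(pi * labeled + (1 - pi) * unlabeled)
    return total


def grid_estimate(values, p_e, p_c, steps=99):
    """Return the grid point in (0, 1) with the highest log-likelihood."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(grid, key=lambda pi: log_likelihood(values, p_e, p_c, pi))


# Sparse triples: (number of conversions, nucleotide content, number of reads).
values = [(0, 20, 50), (3, 20, 50)]
pi_hat = grid_estimate(values, p_e=0.001, p_c=0.05)
```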

dynast.estimation.pi.estimate_pi(df_aggregates, p_e, p_c, pi_path, group_by=None, p_group_by=None, n_threads=8, threshold=16, seed=None, nasc=False, model=None)

Estimate the fraction of labeled RNA.

Parameters

df_aggregates (pandas.DataFrame) – Pandas dataframe containing aggregate values
p_e (float) – average mutation rate in unlabeled RNA
p_c (float) – average mutation rate in labeled RNA
pi_path (str) – path to write pi estimates
group_by (list, optional) – columns that were used to group cells, defaults to None
p_group_by (list, optional) – columns that p_e/p_c estimation was grouped by, defaults to None
n_threads (int, optional) – number of threads, defaults to 8
threshold (int, optional) – any conversion-content pairs with fewer than this many reads will not be processed, defaults to 16
seed (int, optional) – random seed, defaults to None
nasc (bool, optional) – flag to change behavior to match the NASC-seq pipeline: the mode of the estimated beta distribution is used as pi; defaults to False
model (pystan.StanModel, optional) – pyStan model to run MCMC with, defaults to None; if not provided, will try to compile the model manually

Returns

path to pi output

Return type

str
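
The `threshold` parameter can be read literally as a per-pair cutoff on read counts. The sketch below applies that reading to a list of (conversions, content, reads) triples. Whether dynast applies the cutoff per pair or per group is not specified here, so treat this as one possible interpretation; the helper name is hypothetical.

```python
def drop_low_count_pairs(values, threshold=16):
    """Keep only (conversions, content, reads) triples with enough reads.

    Pairs with fewer than `threshold` reads are dropped; pairs with
    exactly `threshold` reads are kept.
    """
    return [
        (conversions, content, reads)
        for conversions, content, reads in values
        if reads >= threshold
    ]


kept = drop_low_count_pairs([(0, 20, 50), (1, 20, 15), (2, 18, 16)])
```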
