dynast.estimation

Submodules

Package Contents

Functions

estimate_p_c(df_aggregates, p_e, p_c_path, group_by=None, threshold=1000, n_threads=8, nasc=False)

Estimate the average conversion rate in labeled RNA.

read_p_c(p_c_path, group_by=None)

Read p_c CSV as a dictionary, with group_by columns as keys.

estimate_p_e(df_counts, p_e_path, conversions=frozenset([('TC', )]), group_by=None)

Estimate background mutation rate of unlabeled RNA by calculating the average mutation rate of all three nucleotides other than conversion[0].

estimate_p_e_control(df_counts, p_e_path, conversions=frozenset([('TC', )]))

Estimate background mutation rate of unlabeled RNA for a control sample by simply calculating the average mutation rate.

estimate_p_e_nasc(df_rates, p_e_path, group_by=None)

Estimate background mutation rate of unlabeled RNA by calculating the average CT and GA mutation rates.

read_p_e(p_e_path, group_by=None)

Read p_e CSV as a dictionary, with group_by columns as keys.

estimate_pi(df_aggregates, p_e, p_c, pi_path, group_by=None, p_group_by=None, n_threads=8, threshold=16, seed=None, nasc=False, model=None)

Estimate the fraction of labeled RNA.

read_pi(pi_path, group_by=None)

Read pi CSV as a dictionary.

dynast.estimation.estimate_p_c(df_aggregates, p_e, p_c_path, group_by=None, threshold=1000, n_threads=8, nasc=False)

Estimate the average conversion rate in labeled RNA.

Parameters
  • df_aggregates (pandas.DataFrame) – Pandas dataframe containing aggregate values

  • p_e (float) – background mutation rate of unlabeled RNA

  • p_c_path (str) – path to output CSV containing p_c estimates

  • group_by (list, optional) – columns to group by, defaults to None

  • threshold (int, optional) – read count threshold, defaults to 1000

  • n_threads (int, optional) – number of threads, defaults to 8

  • nasc (bool, optional) – flag to indicate whether to use NASC-seq pipeline variant of the EM algorithm, defaults to False

Returns

path to output CSV containing p_c estimates

Return type

str
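The idea behind this estimate can be illustrated with a generic expectation-maximization (EM) loop for a two-component binomial mixture. This is a minimal sketch, not dynast's implementation: it assumes that a read with `n` convertible nucleotides shows `k` conversions drawn from Binomial(n, p_e) if it is unlabeled and from Binomial(n, p_c) if it is labeled, holds p_e fixed, and fits p_c together with the labeled fraction. The names `em_p_c` and `binom_pmf` and the `(k, n, weight)` tuple layout are hypothetical.

```python
from math import comb


def em_p_c(aggregates, p_e, p_c=0.1, labeled_fraction=0.5, n_iter=200):
    """Fit p_c by EM on a two-component binomial mixture with p_e held fixed.

    aggregates: iterable of (k, n, weight) tuples, meaning k conversions were
    observed among n convertible nucleotides, `weight` times.
    """
    aggregates = list(aggregates)
    for _ in range(n_iter):
        total = labeled = labeled_k = labeled_n = 0.0
        for k, n, weight in aggregates:
            # E-step: posterior probability that this observation is labeled.
            # The binomial coefficient cancels in the ratio, so it is omitted.
            like_c = labeled_fraction * p_c**k * (1 - p_c) ** (n - k)
            like_e = (1 - labeled_fraction) * p_e**k * (1 - p_e) ** (n - k)
            resp = like_c / (like_c + like_e)
            total += weight
            labeled += weight * resp
            labeled_k += weight * resp * k
            labeled_n += weight * resp * n
        # M-step: update the labeled fraction and the conversion rate.
        labeled_fraction = labeled / total
        p_c = labeled_k / labeled_n
    return p_c, labeled_fraction


def binom_pmf(k, n, p):
    """Binomial probability mass function."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Noise-free toy data: expected frequencies for a mixture with
# p_c = 0.08, p_e = 0.001 and 30% labeled reads (illustrative values only).
n = 50
aggregates = [
    (k, n, 0.3 * binom_pmf(k, n, 0.08) + 0.7 * binom_pmf(k, n, 0.001))
    for k in range(n + 1)
]
p_c_hat, fraction_hat = em_p_c(aggregates, p_e=0.001)
```

On this toy input the loop should recover values close to the ones used to generate the data. Real data are noisy, and the `threshold` and `nasc` options documented above are not modeled here.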

dynast.estimation.read_p_c(p_c_path, group_by=None)

Read p_c CSV as a dictionary, with group_by columns as keys.

Parameters
  • p_c_path (str) – path to CSV containing p_c values

  • group_by (list, optional) – columns to group by, defaults to None

Returns

dictionary with group_by columns as keys (tuple if multiple)

Return type

dictionary
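The keyed-dictionary layout described above can be sketched with the standard library alone. This is an illustration of the return shape, not the library's code: the helper `read_rates` and the column names `barcode`, `gene`, `p_c`, and `pi` are assumptions, and the case `group_by=None` is deliberately left out because its behavior is not specified here.

```python
import csv
import io


def read_rates(handle, value_column, group_by):
    """Map each group to its value; keys are tuples when several columns are used."""
    rates = {}
    for row in csv.DictReader(handle):
        key = tuple(row[column] for column in group_by)
        if len(key) == 1:
            # A single grouping column yields a plain (non-tuple) key.
            key = key[0]
        rates[key] = float(row[value_column])
    return rates


# Hypothetical CSV contents, grouped by one column.
single = read_rates(
    io.StringIO("barcode,p_c\nAAAC,0.04\nAAAG,0.05\n"), "p_c", ["barcode"]
)

# Hypothetical CSV contents, grouped by two columns.
multiple = read_rates(
    io.StringIO("barcode,gene,pi\nAAAC,G1,0.25\n"), "pi", ["barcode", "gene"]
)
```

Here `single` is keyed by barcode strings, while `multiple` is keyed by `(barcode, gene)` tuples.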

dynast.estimation.estimate_p_e(df_counts, p_e_path, conversions=frozenset([('TC',)]), group_by=None)

Estimate background mutation rate of unlabeled RNA by calculating the average mutation rate of all three nucleotides other than conversion[0].

Parameters
  • df_counts (pandas.DataFrame) – Pandas dataframe containing number of each conversion and nucleotide content of each read

  • p_e_path (str) – path to output CSV containing p_e estimates

  • conversions (list, optional) – conversion(s) in question, defaults to frozenset([('TC',)])

  • group_by (list, optional) – columns to group by, defaults to None

Returns

path to output CSV containing p_e estimates

Return type

str
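One way to read "average mutation rate of all three nucleotides other than conversion[0]" is as a pooled rate: all mismatches that start from a base other than the labeled one, divided by the total content of those bases. The sketch below follows that reading. It is an assumption about the calculation, and the per-read dictionary layout (base content under `A`, `C`, `G`, `T`; conversion counts under two-letter keys such as `AG`) and the name `background_rate` are hypothetical.

```python
BASES = "ACGT"


def background_rate(reads, conversion="TC"):
    """Pooled mutation rate over the three bases other than conversion[0]."""
    labeled_base = conversion[0]
    other_bases = [base for base in BASES if base != labeled_base]
    # Count every mismatch whose source base is not the labeled base.
    mutated = sum(
        read.get(source + target, 0)
        for read in reads
        for source in other_bases
        for target in BASES
        if target != source
    )
    # Total number of positions where such a mismatch could have occurred.
    content = sum(read.get(source, 0) for read in reads for source in other_bases)
    return mutated / content


# Two hypothetical reads; the T>C conversions are ignored for a 'TC' experiment.
reads = [
    {"A": 20, "C": 15, "G": 15, "T": 10, "AG": 1, "TC": 2},
    {"A": 10, "C": 20, "G": 20, "T": 25, "CT": 1},
]
p_e = background_rate(reads)  # 2 mismatches over 100 non-T bases
```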

dynast.estimation.estimate_p_e_control(df_counts, p_e_path, conversions=frozenset([('TC',)]))

Estimate background mutation rate of unlabeled RNA for a control sample by simply calculating the average mutation rate.

Parameters
  • df_counts (pandas.DataFrame) – Pandas dataframe containing number of each conversion and nucleotide content of each read

  • p_e_path (str) – path to output CSV containing p_e estimates

  • conversions (list, optional) – conversion(s) in question, defaults to frozenset([('TC',)])

Returns

path to output CSV containing p_e estimates

Return type

str
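In a control sample without labeling, every observed conversion of the type in question can be attributed to background, so the rate can be taken directly from that conversion. The sketch below assumes this interpretation; the per-read dictionary layout and the name `control_rate` are hypothetical and not taken from the library.

```python
def control_rate(reads, conversion="TC"):
    """Rate of `conversion` among all occurrences of its source base."""
    source = conversion[0]
    converted = sum(read.get(conversion, 0) for read in reads)
    content = sum(read.get(source, 0) for read in reads)
    return converted / content


# Two hypothetical control reads: 1 T>C conversion among 100 T positions.
control_reads = [{"T": 40, "TC": 1}, {"T": 60, "TC": 0}]
p_e_control = control_rate(control_reads)
```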

dynast.estimation.estimate_p_e_nasc(df_rates, p_e_path, group_by=None)

Estimate background mutation rate of unlabeled RNA by calculating the average CT and GA mutation rates. This function imitates the procedure implemented in the NASC-seq pipeline (DOI: 10.1038/s41467-019-11028-9).

Parameters
  • df_rates (pandas.DataFrame) – Pandas dataframe containing the mutation rate of each conversion (including CT and GA)

  • p_e_path (str) – path to output CSV containing p_e estimates

  • group_by (list, optional) – columns to group by, defaults to None

Returns

path to output CSV containing p_e estimates

Return type

str
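The averaging step can be sketched as follows. This is a simplified illustration under assumptions: each input row already holds per-group `CT` and `GA` rates, and the row layout, the `barcode` column, and the name `nasc_background_rate` are hypothetical.

```python
def nasc_background_rate(rows, group_column="barcode"):
    """Average the CT and GA mutation rates of each row, keyed by group."""
    return {row[group_column]: (row["CT"] + row["GA"]) / 2 for row in rows}


# Hypothetical per-barcode mutation rates.
rows = [
    {"barcode": "AAAC", "CT": 0.002, "GA": 0.004},
    {"barcode": "AAAG", "CT": 0.001, "GA": 0.003},
]
p_e_by_barcode = nasc_background_rate(rows)
```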

dynast.estimation.read_p_e(p_e_path, group_by=None)

Read p_e CSV as a dictionary, with group_by columns as keys.

Parameters
  • p_e_path (str) – path to CSV containing p_e values

  • group_by (list, optional) – columns to group by, defaults to None

Returns

dictionary with group_by columns as keys (tuple if multiple)

Return type

dictionary

dynast.estimation.estimate_pi(df_aggregates, p_e, p_c, pi_path, group_by=None, p_group_by=None, n_threads=8, threshold=16, seed=None, nasc=False, model=None)

Estimate the fraction of labeled RNA.

Parameters
  • df_aggregates (pandas.DataFrame) – Pandas dataframe containing aggregate values

  • p_e (float) – average mutation rate in unlabeled RNA

  • p_c (float) – average mutation rate in labeled RNA

  • pi_path (str) – path to write pi estimates

  • group_by (list, optional) – columns that were used to group cells, defaults to None

  • p_group_by (list, optional) – columns that p_e/p_c estimation was grouped by, defaults to None

  • n_threads (int, optional) – number of threads, defaults to 8

  • threshold (int, optional) – any conversion-content pairs with fewer than this many reads will not be processed, defaults to 16

  • seed (int, optional) – random seed, defaults to None

  • nasc (bool, optional) – flag to change behavior to match NASC-seq pipeline. Specifically, the mode of the estimated Beta distribution is used as pi, defaults to False

  • model (pystan.StanModel, optional) – pyStan model to run MCMC with, defaults to None; if not provided, the function will try to compile the model itself

Returns

path to pi output

Return type

str
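The `nasc` option refers to the mode of a Beta distribution. For shape parameters alpha and beta that are both greater than 1, the mode is (alpha - 1) / (alpha + beta - 2). The sketch below computes it next to the mean for comparison; the helper names are hypothetical, and which summary is used when `nasc=False` is not stated in this reference.

```python
def beta_mode(alpha, beta):
    """Mode of Beta(alpha, beta); only unique and interior when both exceed 1."""
    if alpha <= 1 or beta <= 1:
        raise ValueError("mode formula requires alpha > 1 and beta > 1")
    return (alpha - 1) / (alpha + beta - 2)


def beta_mean(alpha, beta):
    """Mean of Beta(alpha, beta), shown only for comparison."""
    return alpha / (alpha + beta)


# Illustrative shape parameters, not values produced by the library.
mode = beta_mode(3, 5)  # 2 / 6
mean = beta_mean(3, 5)  # 3 / 8
```

For a skewed distribution such as this one the two summaries differ, so estimates obtained with and without `nasc` are not expected to match exactly.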

dynast.estimation.read_pi(pi_path, group_by=None)

Read pi CSV as a dictionary.

Parameters
  • pi_path (str) – path to CSV containing pi values

  • group_by (list, optional) – columns that were used to group estimation, defaults to None

Returns

dictionary with barcodes and genes as keys

Return type

dictionary