dynast.preprocessing.aggregation

Module Contents

Functions

read_rates(rates_path)

Read mutation rates CSV as a pandas dataframe.

read_aggregates(aggregates_path)

Read aggregates CSV as a pandas dataframe.

merge_aggregates(*dfs)

Merge multiple aggregate dataframes into one.

calculate_mutation_rates(df_counts, rates_path, group_by=None)

Calculate mutation rate for each pair of bases.

aggregate_counts(df_counts, aggregates_path, conversions=frozenset([('TC',)]))

Aggregate conversion counts for each pair of bases.

dynast.preprocessing.aggregation.read_rates(rates_path)

Read mutation rates CSV as a pandas dataframe.

Parameters

rates_path (str) – path to rates CSV

Returns

rates dataframe

Return type

pandas.DataFrame

dynast.preprocessing.aggregation.read_aggregates(aggregates_path)

Read aggregates CSV as a pandas dataframe.

Parameters

aggregates_path (str) – path to aggregates CSV

Returns

aggregates dataframe

Return type

pandas.DataFrame

dynast.preprocessing.aggregation.merge_aggregates(*dfs)

Merge multiple aggregate dataframes into one.

Parameters

*dfs (pandas.DataFrame) – dataframes to merge

Returns

merged dataframe

Return type

pandas.DataFrame

dynast.preprocessing.aggregation.calculate_mutation_rates(df_counts, rates_path, group_by=None)

Calculate mutation rate for each pair of bases.

Parameters
  • df_counts (pandas.DataFrame) – counts dataframe, with complemented reverse strand bases

  • rates_path (str) – path to write rates CSV

  • group_by (list, optional) – column(s) to group calculations by; defaults to None, which combines all rows into a single group

Returns

path to rates CSV that was written

Return type

str
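
The arithmetic behind a per-pair mutation rate can be sketched in plain Python. This is an illustration only, not the module's implementation: the row layout (a pair column such as "TC" counting reference T bases read as C, and a base column such as "T" counting covered reference T bases), the "barcode" column, and the helper name mutation_rates are all assumptions made for this example. It also shows the likely effect of group_by: None pools every row, while a list of columns yields one set of rates per group.

```python
from collections import defaultdict

BASES = "ACGT"


def mutation_rates(rows, group_by=None):
    """Return {group_key: {pair: rate}}, e.g. rate of 'TC' = count(TC) / count(T).

    Hypothetical helper: rows are dicts with integer count columns.
    """
    # Sum every integer column within each group; with group_by=None all
    # rows fall into one shared group keyed by the empty tuple.
    sums = defaultdict(lambda: defaultdict(int))
    for row in rows:
        key = tuple(row[c] for c in group_by) if group_by else ()
        for column, value in row.items():
            if isinstance(value, int):
                sums[key][column] += value

    rates = {}
    for key, total in sums.items():
        rates[key] = {
            ref + obs: total.get(ref + obs, 0) / total[ref]
            for ref in BASES
            for obs in BASES
            # Skip identical pairs and bases that were never observed.
            if ref != obs and total.get(ref, 0) > 0
        }
    return rates


# Made-up counts for three reads from two barcodes.
rows = [
    {"barcode": "AAAC", "T": 10, "TC": 1, "A": 8, "AG": 0},
    {"barcode": "AAAC", "T": 30, "TC": 3, "A": 12, "AG": 1},
    {"barcode": "GGGT", "T": 20, "TC": 5, "A": 10, "AG": 0},
]

overall = mutation_rates(rows)                       # all rows combined
per_barcode = mutation_rates(rows, group_by=["barcode"])

print(overall[()]["TC"])             # 9 T>C events over 60 covered T bases
print(per_barcode[("GGGT",)]["TC"])  # 5 T>C events over 20 covered T bases
```

Dividing by the number of covered reference bases, rather than by the number of reads, keeps the rate comparable between groups with different read lengths or coverage.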

dynast.preprocessing.aggregation.aggregate_counts(df_counts, aggregates_path, conversions=frozenset([('TC',)]))

Aggregate conversion counts for each pair of bases.

Parameters
  • df_counts (pandas.DataFrame) – counts dataframe, with complemented reverse strand bases

  • aggregates_path (str) – path to write aggregate CSV

  • conversions (list, optional) – conversion(s) in question, defaults to frozenset([('TC',)])

Returns

path to aggregate CSV that was written

Return type

str
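
One plausible reading of "aggregate" here is a tally of how many reads share the same (conversion count, base count) pair. The sketch below illustrates that idea in plain Python under stated assumptions: the read layout ("T" for covered reference T bases, "TC" for T-to-C conversions), the interpretation of each tuple in conversions as a group of conversions counted together, and the helper name aggregate are hypothetical and not taken from the module.

```python
from collections import Counter


def aggregate(reads, conversions=frozenset([("TC",)])):
    """Return {group_name: Counter} mapping (conversion_count, base_count) -> reads.

    Hypothetical helper: each tuple in `conversions` is treated as one group.
    """
    aggregates = {}
    for group in conversions:
        # Reference bases involved in this group, e.g. ("TC",) -> ["T"].
        bases = sorted({conversion[0] for conversion in group})
        counts = Counter()
        for read in reads:
            n_conversions = sum(read[c] for c in group)
            n_bases = sum(read[b] for b in bases)
            counts[(n_conversions, n_bases)] += 1
        aggregates["_".join(group)] = counts
    return aggregates


# Made-up reads: two are identical, so they collapse into one aggregate row.
reads = [
    {"T": 20, "TC": 0},
    {"T": 20, "TC": 0},
    {"T": 18, "TC": 2},
    {"T": 25, "TC": 2},
]

aggregates = aggregate(reads)
for (n_conversions, n_bases), n_reads in sorted(aggregates["TC"].items()):
    print(n_conversions, n_bases, n_reads)
```

Collapsing identical reads this way keeps the output compact, since many reads typically share the same pair of counts.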