dynast.utils

Module Contents

Classes

suppress_stdout_stderr

A context manager for doing a "deep suppression" of stdout and stderr in Python.

Functions

get_STAR_binary_path()

Get the path to the platform-dependent STAR binary included with the installation.

get_STAR_version()

Get the provided STAR version.

combine_arguments(args, additional)

Combine two dictionaries representing command-line arguments.

arguments_to_list(args)

Convert a dictionary of command-line arguments to a list.

get_file_descriptor_limit()

Get the current value for the maximum number of open file descriptors in a platform-dependent way.

get_max_file_descriptor_limit()

Get the maximum allowed value for the maximum number of open file descriptors.

increase_file_descriptor_limit(limit)

Context manager that can be used to temporarily increase the maximum number of open file descriptors for the current process.

get_available_memory()

Get total amount of available memory (total memory - used memory) in bytes.

make_pool_with_counter(n_threads)

Create a new Process pool with a shared progress counter.

display_progress_with_counter(counter, total, *async_results, desc=None)

Display a progress bar for multiprocessing progress.

as_completed_with_progress(futures)

Wrapper around concurrent.futures.as_completed that displays a progress bar.

split_index(index, n=8)

Split a conversions index, which is a list of tuples (file position, number of lines, alignment position), one for each read, into n approximately equal parts.

downsample_counts(df_counts, proportion=None, count=None, seed=None, group_by=None)

Downsample the given counts dataframe according to the proportion or count arguments.

counts_to_matrix(df_counts, barcodes, features, barcode_column='barcode', feature_column='GX')

Convert a counts dataframe to a sparse counts matrix.

split_counts(df_counts, barcodes, features, barcode_column='barcode', feature_column='GX', conversions=('TC', ))

Split counts dataframe into two count matrices by a column.

split_matrix(matrix, pis, barcodes, features)

Split the given matrix based on provided fraction of new RNA.

results_to_adata(df_counts, conversions=frozenset([('TC', )]), gene_infos=None, pis=None)

Compile all results to a single anndata.

patch_mp_connection_bpo_17560()

Apply PR-10305 / bpo-17560 connection send/receive max size update

Attributes

run_executable

open_as_text

decompress_gzip

flatten_dict_values

mkstemp

all_exists

flatten_dictionary

flatten_iter

merge_dictionaries

write_pickle

read_pickle

dynast.utils.run_executable
dynast.utils.open_as_text
dynast.utils.decompress_gzip
dynast.utils.flatten_dict_values
dynast.utils.mkstemp
dynast.utils.all_exists
dynast.utils.flatten_dictionary
dynast.utils.flatten_iter
dynast.utils.merge_dictionaries
dynast.utils.write_pickle
dynast.utils.read_pickle
exception dynast.utils.UnsupportedOSException

Bases: Exception

Common base class for all non-exit exceptions.

class dynast.utils.suppress_stdout_stderr

A context manager for doing a “deep suppression” of stdout and stderr in Python, i.e., it suppresses all printed output, even if the print originates in a compiled C/Fortran sub-function.

This will not suppress raised exceptions, since exceptions are printed to stderr just before a script exits, and after the context manager has exited (at least, I think that is why it lets exceptions through). https://github.com/facebook/prophet/issues/223

__enter__(self)
__exit__(self, *_)
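The following is a minimal sketch of how such a "deep suppression" is commonly done, not necessarily how this class is implemented. It assumes a POSIX-like platform and redirects the OS-level file descriptors 1 and 2 to the null device, which is why output from compiled code is silenced as well. The class name `SuppressOutputSketch` is hypothetical.

```python
import os


class SuppressOutputSketch:
    """Hypothetical illustration: redirect file descriptors 1 and 2 to the null device."""

    def __enter__(self):
        # Open the null device twice, once for stdout and once for stderr.
        self._null_fds = [os.open(os.devnull, os.O_RDWR) for _ in range(2)]
        # Keep duplicates of the real stdout/stderr so they can be restored later.
        self._saved_fds = [os.dup(1), os.dup(2)]
        # Point descriptors 1 and 2 at the null device.
        os.dup2(self._null_fds[0], 1)
        os.dup2(self._null_fds[1], 2)
        return self

    def __exit__(self, *_):
        # Restore the original stdout/stderr descriptors.
        os.dup2(self._saved_fds[0], 1)
        os.dup2(self._saved_fds[1], 2)
        # Close every helper descriptor opened in __enter__.
        for fd in self._null_fds + self._saved_fds:
            os.close(fd)
```

Because the redirection happens at the descriptor level, text that Python has buffered but not yet flushed when the context exits may still appear afterwards; flushing `sys.stdout` inside the context avoids that.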
dynast.utils.get_STAR_binary_path()

Get the path to the platform-dependent STAR binary included with the installation.

Returns

path to the binary

Return type

str

dynast.utils.get_STAR_version()

Get the provided STAR version.

Returns

version string

Return type

str

dynast.utils.combine_arguments(args, additional)

Combine two dictionaries representing command-line arguments.

Any duplicate keys will be merged according to the following procedure:

  1. If the values in both dictionaries are lists, the two lists are combined.

  2. Otherwise, the value in the first dictionary is overwritten by the value in the second.

Parameters
  • args (dictionary) – original command-line arguments

  • additional (dictionary) – additional command-line arguments

Returns

combined command-line arguments

Return type

dictionary
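As an illustration of the merge rule described for duplicate keys, here is a small sketch. The function name `combine_arguments_sketch` and the STAR-style flags in the usage lines are assumptions made for the example; the real function may differ in details such as whether it copies its inputs.

```python
def combine_arguments_sketch(args, additional):
    """Hypothetical re-statement of the documented merge rule."""
    combined = dict(args)  # shallow copy so the caller's dictionary is untouched
    for key, value in additional.items():
        if key in combined and isinstance(combined[key], list) and isinstance(value, list):
            # Both values are lists: concatenate them.
            combined[key] = combined[key] + value
        else:
            # Otherwise the value from `additional` replaces the original.
            combined[key] = value
    return combined


original = {"--outSAMattributes": ["NH", "HI"], "--runThreadN": 4}
extra = {"--outSAMattributes": ["MD"], "--runThreadN": 8}
merged = combine_arguments_sketch(original, extra)
```

Here `merged["--outSAMattributes"]` holds all three attributes, while `merged["--runThreadN"]` takes the value from `extra`.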

dynast.utils.arguments_to_list(args)

Convert a dictionary of command-line arguments to a list.

Parameters

args (dictionary) – command-line arguments

Returns

list of command-line arguments

Return type

list
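The reference does not spell out the exact layout of the resulting list. One plausible reading, sketched below under that assumption, is that each key is emitted as a flag followed by its value, with list values expanded into several items. The name `arguments_to_list_sketch`, the string conversion, and the example flags are hypothetical.

```python
def arguments_to_list_sketch(args):
    """Hypothetical flattening of {flag: value} into [flag, value, ...]."""
    arguments = []
    for key, value in args.items():
        arguments.append(key)
        if isinstance(value, list):
            # A list value contributes one item per element.
            arguments.extend(str(item) for item in value)
        else:
            # Assumption: values are converted to strings for use on a command line.
            arguments.append(str(value))
    return arguments


flat = arguments_to_list_sketch({"--runThreadN": 4, "--readFilesIn": ["r1.fastq", "r2.fastq"]})
```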

dynast.utils.get_file_descriptor_limit()

Get the current value for the maximum number of open file descriptors in a platform-dependent way.

Returns

the current value of the maximum number of open file descriptors.

Return type

int

dynast.utils.get_max_file_descriptor_limit()

Get the maximum allowed value for the maximum number of open file descriptors.

Note that for Windows, there is not an easy way to get this, as it requires reading from the registry. So, we just return the maximum for a vanilla Windows installation, which is 8192. https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setmaxstdio?view=vs-2019

Similarly, on macOS, we return a hardcoded 10240.

Returns

maximum allowed value for the maximum number of open file descriptors

Return type

int

dynast.utils.increase_file_descriptor_limit(limit)

Context manager that can be used to temporarily increase the maximum number of open file descriptors for the current process. The original value is restored when execution exits the context.

This is required when running STAR with many threads.

Parameters

limit (int) – maximum number of open file descriptors will be increased to this value for the duration of the context
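A rough sketch of the set-then-restore pattern is shown below. It is Unix-only because it relies on the standard-library `resource` module, whereas the documented function is described as platform-dependent, so treat this as an approximation rather than the actual implementation. The name `temporary_fd_limit_sketch` is hypothetical.

```python
import resource
from contextlib import contextmanager


@contextmanager
def temporary_fd_limit_sketch(limit):
    """Hypothetical Unix-only sketch: set the soft RLIMIT_NOFILE, then restore it."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The soft limit may not exceed the hard limit; setrlimit raises ValueError if it does.
    resource.setrlimit(resource.RLIMIT_NOFILE, (limit, hard))
    try:
        yield
    finally:
        # Restore the original soft limit even if the body raised.
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
```

The `try`/`finally` is what guarantees the original limit comes back when the `with` block ends, whether normally or through an exception.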

dynast.utils.get_available_memory()

Get total amount of available memory (total memory - used memory) in bytes.

Returns

available memory in bytes

Return type

int

dynast.utils.make_pool_with_counter(n_threads)

Create a new Process pool with a shared progress counter.

Parameters

n_threads (int) – number of processes

Returns

(Process pool, progress counter, lock)

Return type

(multiprocessing.Pool, multiprocessing.Value, multiprocessing.Lock)

dynast.utils.display_progress_with_counter(counter, total, *async_results, desc=None)

Display a progress bar for multiprocessing progress.

Parameters
  • counter (multiprocessing.Value) – progress counter

  • total (int) – maximum number of units of processing

  • *async_results

    multiprocessing results to monitor. These are used to determine when all processes are done.

  • desc (str, optional) – progress bar description, defaults to None

dynast.utils.as_completed_with_progress(futures)

Wrapper around concurrent.futures.as_completed that displays a progress bar.

Parameters

futures (iterable) – iterator of concurrent.futures.Future objects
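To show the general shape of such a wrapper, here is a standard-library-only sketch. It writes a plain counter to stderr instead of drawing a real progress bar; the progress-bar library used by the actual function is not stated here. The name `as_completed_with_progress_sketch` is hypothetical.

```python
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed


def as_completed_with_progress_sketch(futures):
    """Hypothetical wrapper: yield futures as they finish and report a running count."""
    futures = list(futures)  # materialize so the total is known up front
    total = len(futures)
    for done, future in enumerate(as_completed(futures), start=1):
        sys.stderr.write(f"\r{done}/{total} completed")
        sys.stderr.flush()
        yield future
    sys.stderr.write("\n")


with ThreadPoolExecutor(max_workers=2) as executor:
    submitted = [executor.submit(pow, i, 2) for i in range(5)]
    squares = sorted(f.result() for f in as_completed_with_progress_sketch(submitted))
```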

dynast.utils.split_index(index, n=8)

Split a conversions index, which is a list of tuples (file position, number of lines, alignment position), one for each read, into n approximately equal parts. This function is used to split the conversions CSV for multiprocessing.

Parameters
  • index (list) – index

  • n (int, optional) – number of splits, defaults to 8

Returns

list of parts, where each part is a list of (file position, number of lines, alignment position) tuples

Return type

list
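The sketch below shows one way to cut an index into n approximately equal parts. It balances parts by number of entries and keeps them contiguous; whether the real function balances by entries or by some other measure (for example, number of lines) is not stated, so that choice is an assumption. The name `split_index_sketch` is hypothetical.

```python
def split_index_sketch(index, n=8):
    """Hypothetical split of a list into up to n contiguous, near-equal parts."""
    # Never create more parts than there are entries, and always at least one part.
    n = max(1, min(n, len(index)))
    base, extra = divmod(len(index), n)
    parts = []
    start = 0
    for i in range(n):
        # The first `extra` parts receive one additional entry.
        size = base + (1 if i < extra else 0)
        parts.append(index[start:start + size])
        start += size
    return parts


# Made-up index entries: (file position, number of lines, alignment position).
example_index = [(i * 100, 2, i) for i in range(10)]
example_parts = split_index_sketch(example_index, n=3)
```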

dynast.utils.downsample_counts(df_counts, proportion=None, count=None, seed=None, group_by=None)

Downsample the given counts dataframe according to the proportion or count arguments. One of these two must be provided, but not both. The dataframe is assumed to be UMI-deduplicated.

Parameters
  • df_counts (pandas.DataFrame) – counts dataframe

  • proportion (float, optional) – proportion of reads (UMIs) to keep, defaults to None

  • count (int, optional) – absolute number of reads (UMIs) to keep, defaults to None

  • seed (int, optional) – random seed, defaults to None

  • group_by (list, optional) – Columns in the counts dataframe to use to group entries. When this is provided, UMIs are no longer sampled at random, but instead grouped by this argument, and only groups that have more than count UMIs are downsampled.

Returns

downsampled counts dataframe

Return type

pandas.DataFrame
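For the simple case without `group_by`, random downsampling of a deduplicated counts dataframe can be sketched with `pandas.DataFrame.sample`. This is an assumption about the approach, not a description of the actual implementation, and it deliberately omits the `group_by` behaviour. The name `downsample_counts_sketch` and the example data are hypothetical.

```python
import pandas as pd


def downsample_counts_sketch(df_counts, proportion=None, count=None, seed=None):
    """Hypothetical random downsampling; exactly one of proportion/count is required."""
    if (proportion is None) == (count is None):
        raise ValueError("Provide exactly one of `proportion` or `count`.")
    if proportion is not None:
        # Keep a fraction of the rows (each row is assumed to be one UMI).
        return df_counts.sample(frac=proportion, random_state=seed)
    # Keep an absolute number of rows.
    return df_counts.sample(n=count, random_state=seed)


example_counts = pd.DataFrame({
    "barcode": ["AAA"] * 6 + ["CCC"] * 4,
    "GX": ["G1", "G2"] * 5,
})
half = downsample_counts_sketch(example_counts, proportion=0.5, seed=0)
three = downsample_counts_sketch(example_counts, count=3, seed=0)
```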

dynast.utils.counts_to_matrix(df_counts, barcodes, features, barcode_column='barcode', feature_column='GX')

Convert a counts dataframe to a sparse counts matrix.

Counts are assumed to be appropriately deduplicated.

Parameters
  • df_counts (pandas.DataFrame) – counts dataframe

  • barcodes (list) – list of barcodes that will map to the rows

  • features (list) – list of features (i.e. genes) that will map to the columns

  • barcode_column (str) – column in counts dataframe to use as barcodes, defaults to barcode

  • feature_column (str) – column in counts dataframe to use as features, defaults to GX

Returns

sparse counts matrix

Return type

scipy.sparse.csr_matrix
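A minimal sketch of the dataframe-to-matrix conversion follows. It assumes that every row of the dataframe is one deduplicated read or UMI and therefore contributes a count of one, and that every barcode and feature in the dataframe appears in the supplied lists. The name `counts_to_matrix_sketch` and the example data are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import sparse


def counts_to_matrix_sketch(df_counts, barcodes, features,
                            barcode_column="barcode", feature_column="GX"):
    """Hypothetical conversion: rows are barcodes, columns are features."""
    barcode_index = {barcode: i for i, barcode in enumerate(barcodes)}
    feature_index = {feature: j for j, feature in enumerate(features)}
    rows = df_counts[barcode_column].map(barcode_index).to_numpy()
    cols = df_counts[feature_column].map(feature_index).to_numpy()
    # Assumption: each dataframe row counts once.
    data = np.ones(len(df_counts), dtype=np.int64)
    # Duplicate (row, column) pairs are summed when converting COO to CSR.
    matrix = sparse.coo_matrix((data, (rows, cols)), shape=(len(barcodes), len(features)))
    return matrix.tocsr()


example_df = pd.DataFrame({
    "barcode": ["AAA", "AAA", "AAA", "CCC"],
    "GX": ["G1", "G1", "G2", "G2"],
})
example_matrix = counts_to_matrix_sketch(example_df, ["AAA", "CCC"], ["G1", "G2"])
```

With this input, barcode `AAA` has two counts for `G1` and one for `G2`, and barcode `CCC` has a single count for `G2`.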

dynast.utils.split_counts(df_counts, barcodes, features, barcode_column='barcode', feature_column='GX', conversions=('TC',))

Split counts dataframe into two count matrices by a column.

Parameters
  • df_counts (pandas.DataFrame) – counts dataframe

  • barcodes (list) – list of barcodes that will map to the rows

  • features (list) – list of features (i.e. genes) that will map to the columns

  • barcode_column (str, optional) – column in counts dataframe to use as barcodes, defaults to barcode

  • feature_column (str, optional) – column in counts dataframe to use as features, defaults to GX

  • conversions (tuple, optional) – conversion(s) in question, defaults to ('TC',)

Returns

(count matrix of conversion==0, count matrix of conversion>0)

Return type

(scipy.sparse.csr_matrix, scipy.sparse.csr_matrix)

dynast.utils.split_matrix(matrix, pis, barcodes, features)

Split the given matrix based on provided fraction of new RNA.

Parameters
  • matrix (numpy.ndarray or scipy.sparse.spmatrix) – matrix to split

  • pis (dictionary) – dictionary containing pi estimates

  • barcodes (list) – all barcodes

  • features (list) – all features (i.e. genes)

Returns

(matrix of pi masks, matrix of unlabeled RNA, matrix of labeled RNA)

Return type

(scipy.sparse.spmatrix, scipy.sparse.spmatrix, scipy.sparse.spmatrix)

dynast.utils.results_to_adata(df_counts, conversions=frozenset([('TC',)]), gene_infos=None, pis=None)

Compile all results to a single anndata.

Parameters
  • df_counts (pandas.DataFrame) – counts dataframe, with complemented reverse strand bases

  • conversions (list, optional) – conversion(s) in question, defaults to frozenset([('TC',)])

  • gene_infos (dict, optional) – dictionary containing gene information, defaults to None

  • pis (dict, optional) – dictionary of estimated pis, defaults to None

Returns

anndata containing all results

Return type

anndata.AnnData

dynast.utils.patch_mp_connection_bpo_17560()

Apply PR-10305 / bpo-17560 connection send/receive max size update

See the original issue at https://bugs.python.org/issue17560 and https://github.com/python/cpython/pull/10305 for the pull request.

This only supports Python versions 3.3 - 3.7; the function does nothing for Python versions outside of that range.

Taken from https://stackoverflow.com/a/47776649