dynast.utils
Module Contents
Classes
- dynast.utils.suppress_stdout_stderr: A context manager for doing a "deep suppression" of stdout and stderr in Python.
Functions
- dynast.utils.get_STAR_binary_path: Get the path to the platform-dependent STAR binary included with the installation.
- dynast.utils.get_STAR_version: Get the provided STAR version.
- dynast.utils.combine_arguments: Combine two dictionaries representing command-line arguments.
- dynast.utils.arguments_to_list: Convert a dictionary of command-line arguments to a list.
- dynast.utils.get_file_descriptor_limit: Get the current value for the maximum number of open file descriptors.
- dynast.utils.get_max_file_descriptor_limit: Get the maximum allowed value for the maximum number of open file descriptors.
- dynast.utils.increase_file_descriptor_limit: Context manager that can be used to temporarily increase the maximum number of open file descriptors.
- dynast.utils.get_available_memory: Get total amount of available memory (total memory - used memory) in bytes.
- dynast.utils.make_pool_with_counter: Create a new Process pool with a shared progress counter.
- dynast.utils.display_progress_with_counter: Display progress bar for displaying multiprocessing progress.
- dynast.utils.as_completed_with_progress: Wrapper around concurrent.futures.as_completed that displays a progress bar.
- dynast.utils.split_index: Split a conversions index into approximately equal parts.
- dynast.utils.downsample_counts: Downsample the given counts dataframe according to the proportion or count arguments.
- dynast.utils.counts_to_matrix: Convert a counts dataframe to a sparse counts matrix.
- dynast.utils.split_counts: Split counts dataframe into two count matrices by a column.
- dynast.utils.split_matrix: Split the given matrix based on provided fraction of new RNA.
- dynast.utils.results_to_adata: Compile all results to a single anndata.
- dynast.utils.patch_mp_connection_bpo_17560: Apply PR-10305 / bpo-17560 connection send/receive max size update.
Attributes
- dynast.utils.run_executable
- dynast.utils.open_as_text
- dynast.utils.decompress_gzip
- dynast.utils.flatten_dict_values
- dynast.utils.mkstemp
- dynast.utils.all_exists
- dynast.utils.flatten_dictionary
- dynast.utils.flatten_iter
- dynast.utils.merge_dictionaries
- dynast.utils.write_pickle
- dynast.utils.read_pickle
- exception dynast.utils.UnsupportedOSException
Bases:
Exception
Common base class for all non-exit exceptions.
- class dynast.utils.suppress_stdout_stderr
A context manager for doing a “deep suppression” of stdout and stderr in Python, i.e. will suppress all print, even if the print originates in a compiled C/Fortran sub-function.
This will not suppress raised exceptions, since exceptions are printed
to stderr just before a script exits, and after the context manager has exited (at least, I think that is why it lets exceptions through). https://github.com/facebook/prophet/issues/223
- __enter__(self)
- __exit__(self, *_)
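A minimal sketch of how such a "deep" suppressor can be implemented using `os.dup2` to redirect the OS-level file descriptors 1 and 2 to `os.devnull` (the class name here is hypothetical; the actual class is `dynast.utils.suppress_stdout_stderr`):

```python
import os
import sys

class suppress_output:
    """Sketch of a deep stdout/stderr suppressor: swaps the OS-level file
    descriptors 1 and 2 for /dev/null, so even writes from compiled
    C/Fortran code vanish. Hypothetical name, for illustration only."""

    def __enter__(self):
        sys.stdout.flush()
        sys.stderr.flush()
        # One /dev/null descriptor per stream
        self.null_fds = [os.open(os.devnull, os.O_RDWR) for _ in range(2)]
        # Save the original descriptors so they can be restored on exit
        self.saved_fds = [os.dup(1), os.dup(2)]
        os.dup2(self.null_fds[0], 1)
        os.dup2(self.null_fds[1], 2)
        return self

    def __exit__(self, *_):
        sys.stdout.flush()
        sys.stderr.flush()
        # Restore the original descriptors and close the temporaries
        os.dup2(self.saved_fds[0], 1)
        os.dup2(self.saved_fds[1], 2)
        for fd in self.null_fds + self.saved_fds:
            os.close(fd)

devnull_ino = os.stat(os.devnull).st_ino
with suppress_output():
    inside_ino = os.fstat(1).st_ino  # fd 1 now points at /dev/null
    os.write(1, b"this write is swallowed\n")
```

Because the swap happens at the file-descriptor level rather than by reassigning `sys.stdout`, output written directly to fd 1 (as C extensions do) is suppressed as well.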
- dynast.utils.get_STAR_binary_path()
Get the path to the platform-dependent STAR binary included with the installation.
- Returns
path to the binary
- Return type
str
- dynast.utils.get_STAR_version()
Get the provided STAR version.
- Returns
version string
- Return type
str
- dynast.utils.combine_arguments(args, additional)
Combine two dictionaries representing command-line arguments.
Any duplicate keys will be merged according to the following procedure:
1. If the values in both dictionaries are lists, the two lists are combined.
2. Otherwise, the value in the first dictionary is OVERWRITTEN.
- Parameters
args (dictionary) – original command-line arguments
additional (dictionary) – additional command-line arguments
- Returns
combined command-line arguments
- Return type
dictionary
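The merge rule above can be sketched as follows (an illustrative reimplementation, not necessarily the library's exact code):

```python
def combine_arguments(args, additional):
    """Sketch of the documented merge rule: list values are concatenated,
    any other duplicate value is overwritten by `additional`."""
    combined = dict(args)
    for key, value in additional.items():
        if key in combined and isinstance(combined[key], list) and isinstance(value, list):
            combined[key] = combined[key] + value
        else:
            combined[key] = value
    return combined

merged = combine_arguments(
    {"--readFilesIn": ["R1.fastq"], "--runThreadN": 4},
    {"--readFilesIn": ["R2.fastq"], "--runThreadN": 8},
)
# lists combined, scalar overwritten:
# {"--readFilesIn": ["R1.fastq", "R2.fastq"], "--runThreadN": 8}
```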
- dynast.utils.arguments_to_list(args)
Convert a dictionary of command-line arguments to a list.
- Parameters
args (dictionary) – command-line arguments
- Returns
list of command-line arguments
- Return type
list
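A sketch of the dictionary-to-argv conversion, assuming list values expand to multiple entries after their flag (the string coercion here is an assumption for illustration):

```python
def arguments_to_list(args):
    """Sketch: flatten {flag: value} into a flat argv-style list; list
    values contribute multiple entries after their flag."""
    arguments = []
    for key, value in args.items():
        arguments.append(key)
        if isinstance(value, list):
            arguments.extend(str(v) for v in value)  # one entry per element
        else:
            arguments.append(str(value))
    return arguments

argv = arguments_to_list({"--runThreadN": 8, "--readFilesIn": ["R1.fastq", "R2.fastq"]})
```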
- dynast.utils.get_file_descriptor_limit()
Get the current value for the maximum number of open file descriptors in a platform-dependent way.
- Returns
the current value of the maximum number of open file descriptors.
- Return type
int
- dynast.utils.get_max_file_descriptor_limit()
Get the maximum allowed value for the maximum number of open file descriptors.
Note that for Windows, there is no easy way to get this, as it requires reading from the registry. So, we just return the maximum for a vanilla Windows installation, which is 8192. https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setmaxstdio?view=vs-2019
Similarly, on macOS, we return a hardcoded 10240.
- Returns
maximum allowed value for the maximum number of open file descriptors
- Return type
int
- dynast.utils.increase_file_descriptor_limit(limit)
Context manager that can be used to temporarily increase the maximum number of open file descriptors for the current process. The original value is restored when the context exits.
This is required when running STAR with many threads.
- Parameters
limit (int) – maximum number of open file descriptors will be increased to this value for the duration of the context
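On Unix-like systems, such a context manager can be sketched with the standard `resource` module; the function name here is hypothetical, and the real implementation may additionally handle Windows:

```python
import resource
from contextlib import contextmanager

@contextmanager
def increase_fd_limit(limit):
    """Sketch (Unix-only): raise the soft RLIMIT_NOFILE up to `limit`,
    capped at the hard limit, and restore the original soft limit on
    exit. Hypothetical name, for illustration only."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    target = limit if hard == resource.RLIM_INFINITY else min(limit, hard)
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        yield
    finally:
        # Always restore the original soft limit
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))

before, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
with increase_fd_limit(before):  # using the current soft limit keeps this demo a no-op
    during, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
after, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
```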
- dynast.utils.get_available_memory()
Get total amount of available memory (total memory - used memory) in bytes.
- Returns
available memory in bytes
- Return type
int
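One portable-ish way to estimate available memory from the standard library alone is via `os.sysconf` on Linux; the real implementation may instead rely on a library such as psutil (`psutil.virtual_memory().available`):

```python
import os

def get_available_memory():
    """Sketch: estimate available memory in bytes from sysconf values.
    Returns None on platforms where these names are not defined."""
    try:
        return os.sysconf("SC_AVPHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    except (ValueError, OSError, AttributeError):
        return None  # sysconf name unavailable on this platform

available = get_available_memory()
```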
- dynast.utils.make_pool_with_counter(n_threads)
Create a new Process pool with a shared progress counter.
- Parameters
n_threads (int) – number of processes
- Returns
(Process pool, progress counter, lock)
- Return type
(multiprocessing.Pool, multiprocessing.Value, multiprocessing.Lock)
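The (pool, counter, lock) triple can be sketched as below. A thread-backed pool is used here so the example stays self-contained (worker functions need not be picklable); the real function returns a process pool:

```python
import multiprocessing
from multiprocessing.dummy import Pool  # thread-backed pool with the same API

def make_pool_with_counter(n_threads):
    """Sketch: a pool plus a shared progress counter and a lock guarding it."""
    counter = multiprocessing.Value("I", 0)  # shared unsigned int
    lock = multiprocessing.Lock()
    return Pool(n_threads), counter, lock

pool, counter, lock = make_pool_with_counter(4)

def work(x):
    with lock:               # guard the shared counter
        counter.value += 1   # one tick per completed unit of work
    return x * x

results = pool.map(work, range(10))
pool.close()
pool.join()
```

A monitoring thread (or `display_progress_with_counter` in this module) can then read `counter.value` periodically to report progress.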
- dynast.utils.display_progress_with_counter(counter, total, *async_results, desc=None)
Display progress bar for displaying multiprocessing progress.
- Parameters
counter (multiprocessing.Value) – progress counter
total (int) – maximum number of units of processing
*async_results –
multiprocessing results to monitor. These are used to determine when all processes are done.
desc (str, optional) – progress bar description, defaults to None
- dynast.utils.as_completed_with_progress(futures)
Wrapper around concurrent.futures.as_completed that displays a progress bar.
- Parameters
futures (iterable) – iterator of concurrent.futures.Future objects
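A sketch of such a wrapper, using a plain printed counter in place of a progress-bar library (the real wrapper presumably uses something like tqdm):

```python
import concurrent.futures

def as_completed_with_progress(futures):
    """Sketch: yield futures as they finish while printing a simple
    completed/total counter."""
    futures = list(futures)
    total = len(futures)
    for i, future in enumerate(concurrent.futures.as_completed(futures), 1):
        print(f"\r{i}/{total}", end="")
        yield future
    print()

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(pow, n, 2) for n in range(5)]
    results = sorted(f.result() for f in as_completed_with_progress(futures))
```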
- dynast.utils.split_index(index, n=8)
Split a conversions index, which is a list of tuples (file position, number of lines, alignment position), one for each read, into n approximately equal parts. This function is used to split the conversions CSV for multiprocessing.
- Parameters
index (list) – index
n (int, optional) – number of splits, defaults to 8
- Returns
list of parts, where each part is a list of (file position, number of lines, alignment position) tuples
- Return type
list
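One way to produce approximately equal parts is to split greedily on the cumulative line count, as sketched below; the library's actual partitioning strategy may differ:

```python
def split_index(index, n=8):
    """Sketch: greedily split (file position, n_lines, alignment position)
    tuples into at most n parts with roughly equal total line counts."""
    total_lines = sum(n_lines for _, n_lines, _ in index)
    target = total_lines / n
    parts, current, current_lines = [], [], 0
    for entry in index:
        current.append(entry)
        current_lines += entry[1]
        # Close this part once it reaches the target, keeping room for the rest
        if current_lines >= target and len(parts) < n - 1:
            parts.append(current)
            current, current_lines = [], 0
    if current:
        parts.append(current)
    return parts

index = [(i * 100, 2, i) for i in range(10)]  # 10 reads, 2 lines each
parts = split_index(index, n=4)
```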
- dynast.utils.downsample_counts(df_counts, proportion=None, count=None, seed=None, group_by=None)
Downsample the given counts dataframe according to the proportion or count arguments. One of these two must be provided, but not both. The dataframe is assumed to be UMI-deduplicated.
- Parameters
df_counts (pandas.DataFrame) – counts dataframe
proportion (float, optional) – proportion of reads (UMIs) to keep, defaults to None
count (int, optional) – absolute number of reads (UMIs) to keep, defaults to None
seed (int, optional) – random seed, defaults to None
group_by (list, optional) – Columns in the counts dataframe to use to group entries. When this is provided, UMIs are no longer sampled at random, but instead grouped by this argument, and only groups that have more than count UMIs are downsampled.
- Returns
downsampled counts dataframe
- Return type
pandas.DataFrame
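The simple (ungrouped) case can be sketched with `pandas.DataFrame.sample`, where each row is one deduplicated UMI; the grouped behavior described above would need an additional `groupby` step:

```python
import pandas as pd

def downsample_counts(df_counts, proportion=None, count=None, seed=None):
    """Sketch of the ungrouped case: sample whole rows (UMIs), either by
    proportion or to an absolute count. Exactly one of the two must be given."""
    assert (proportion is None) != (count is None), "provide proportion or count, not both"
    if proportion is not None:
        return df_counts.sample(frac=proportion, random_state=seed)
    return df_counts.sample(n=count, random_state=seed)

df = pd.DataFrame({"barcode": list("AABBB"), "GX": ["g1", "g2", "g1", "g2", "g2"]})
downsampled = downsample_counts(df, proportion=0.6, seed=0)  # keeps 3 of 5 rows
```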
- dynast.utils.counts_to_matrix(df_counts, barcodes, features, barcode_column='barcode', feature_column='GX')
Convert a counts dataframe to a sparse counts matrix.
Counts are assumed to be appropriately deduplicated.
- Parameters
df_counts (pandas.DataFrame) – counts dataframe
barcodes (list) – list of barcodes that will map to the rows
features (list) – list of features (i.e. genes) that will map to the columns
barcode_column (str) – column in counts dataframe to use as barcodes, defaults to barcode
feature_column (str) – column in counts dataframe to use as features, defaults to GX
- Returns
sparse counts matrix
- Return type
scipy.sparse.csr_matrix
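The core of such a conversion is mapping each row's barcode and feature to matrix coordinates and building a sparse matrix from those coordinates, as in this sketch (counting one occurrence per dataframe row; the real function may aggregate a count column instead):

```python
import pandas as pd
from scipy import sparse

def counts_to_matrix(df_counts, barcodes, features,
                     barcode_column="barcode", feature_column="GX"):
    """Sketch: build a barcodes x features CSR matrix of occurrence counts."""
    barcode_index = {bc: i for i, bc in enumerate(barcodes)}
    feature_index = {f: i for i, f in enumerate(features)}
    rows = df_counts[barcode_column].map(barcode_index)
    cols = df_counts[feature_column].map(feature_index)
    data = [1] * len(df_counts)  # duplicate (row, col) pairs are summed
    return sparse.csr_matrix(
        (data, (rows, cols)), shape=(len(barcodes), len(features))
    )

df = pd.DataFrame({"barcode": ["A", "A", "B"], "GX": ["g1", "g2", "g1"]})
matrix = counts_to_matrix(df, barcodes=["A", "B"], features=["g1", "g2"])
```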
- dynast.utils.split_counts(df_counts, barcodes, features, barcode_column='barcode', feature_column='GX', conversions=('TC',))
Split counts dataframe into two count matrices by a column.
- Parameters
df_counts (pandas.DataFrame) – counts dataframe
barcodes (list) – list of barcodes that will map to the rows
features (list) – list of features (i.e. genes) that will map to the columns
barcode_column (str, optional) – column in counts dataframe to use as barcodes, defaults to barcode
feature_column (str, optional) – column in counts dataframe to use as features, defaults to GX
conversions (tuple, optional) – conversion(s) in question, defaults to (‘TC’,)
- Returns
(count matrix of conversion==0, count matrix of conversion>0)
- Return type
(scipy.sparse.csr_matrix, scipy.sparse.csr_matrix)
- dynast.utils.split_matrix(matrix, pis, barcodes, features)
Split the given matrix based on provided fraction of new RNA.
- Parameters
matrix (numpy.ndarray or scipy.sparse.spmatrix) – matrix to split
pis (dictionary) – dictionary containing pi estimates
barcodes (list) – all barcodes
features (list) – all features (i.e. genes)
- Returns
(matrix of pi masks, matrix of unlabeled RNA, matrix of labeled RNA)
- Return type
(scipy.sparse.spmatrix, scipy.sparse.spmatrix, scipy.sparse.spmatrix)
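A dense sketch of the split: for each (barcode, feature) pair with a pi estimate, the total count is divided into an unlabeled fraction (1 - pi) and a labeled fraction (pi). The real function works with sparse matrices, and the exact shape of the `pis` dictionary is an assumption here:

```python
import numpy as np

def split_matrix(matrix, pis, barcodes, features):
    """Sketch: split counts into unlabeled/labeled fractions by pi,
    masking entries without an estimate."""
    pi_mask = np.zeros(matrix.shape, dtype=bool)
    pi_matrix = np.zeros(matrix.shape)
    for (barcode, feature), pi in pis.items():
        i, j = barcodes.index(barcode), features.index(feature)
        pi_mask[i, j] = True
        pi_matrix[i, j] = pi
    unlabeled = matrix * (1 - pi_matrix) * pi_mask
    labeled = matrix * pi_matrix * pi_mask
    return pi_mask, unlabeled, labeled

matrix = np.array([[10.0, 4.0], [0.0, 8.0]])
pis = {("A", "g1"): 0.25, ("B", "g2"): 0.5}
mask, unlabeled, labeled = split_matrix(matrix, pis, ["A", "B"], ["g1", "g2"])
```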
- dynast.utils.results_to_adata(df_counts, conversions=frozenset([('TC',)]), gene_infos=None, pis=None)
Compile all results to a single anndata.
- Parameters
df_counts (pandas.DataFrame) – counts dataframe, with complemented reverse strand bases
conversions (list, optional) – conversion(s) in question, defaults to frozenset([(‘TC’,)])
gene_infos (dict, optional) – dictionary containing gene information, defaults to None
pis (dict, optional) – dictionary of estimated pis, defaults to None
- Returns
anndata containing all results
- Return type
anndata.AnnData
- dynast.utils.patch_mp_connection_bpo_17560()
Apply PR-10305 / bpo-17560 connection send/receive max size update
See the original issue at https://bugs.python.org/issue17560 and https://github.com/python/cpython/pull/10305 for the pull request.
This only supports Python versions 3.3 to 3.7; the function does nothing for Python versions outside of that range.
Taken from https://stackoverflow.com/a/47776649