dynast.preprocessing.coverage

Module Contents

Functions

read_coverage(coverage_path)

Read coverage CSV as a dictionary.

calculate_coverage_contig(counter, lock, bam_path, contig, indices, alignments=None, umi_tag=None, barcode_tag=None, gene_tag='GX', barcodes=None, temp_dir=None, update_every=50000, velocity=True)

Calculate coverage for a specific contig. This function is designed to be called as a separate process.

calculate_coverage(bam_path, conversions, coverage_path, alignments=None, umi_tag=None, barcode_tag=None, gene_tag='GX', barcodes=None, temp_dir=None, velocity=True)

Calculate coverage of each genomic position per barcode.

Attributes

COVERAGE_PARSER

dynast.preprocessing.coverage.COVERAGE_PARSER

dynast.preprocessing.coverage.read_coverage(coverage_path)

Read coverage CSV as a dictionary.

Parameters

coverage_path (str) – path to coverage CSV

Returns

coverage as a nested dictionary

Return type

dict
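The "nested dictionary" return value can be pictured with a small standalone sketch. It does not call dynast. The column names (barcode, contig, genome_i, coverage) and the nesting order are assumptions made for illustration only, since the CSV layout is not documented in this entry.

```python
import csv
import io
from collections import defaultdict


def read_coverage_sketch(handle):
    """Parse coverage rows into {barcode: {(contig, position): coverage}}.

    Hypothetical helper: the column names used here are assumptions,
    not the documented dynast file format.
    """
    coverage = defaultdict(dict)
    for row in csv.DictReader(handle):
        key = (row["contig"], int(row["genome_i"]))
        coverage[row["barcode"]][key] = int(row["coverage"])
    return dict(coverage)


# A tiny in-memory CSV stands in for the file at coverage_path.
example = io.StringIO(
    "barcode,contig,genome_i,coverage\n"
    "AAAC,chr1,100,3\n"
    "AAAC,chr1,101,2\n"
    "TTTG,chr2,5,1\n"
)
nested = read_coverage_sketch(example)
print(nested["AAAC"][("chr1", 100)])
```

Keying the outer level by barcode keeps per-cell lookups cheap; the real function may nest its keys differently.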

dynast.preprocessing.coverage.calculate_coverage_contig(counter, lock, bam_path, contig, indices, alignments=None, umi_tag=None, barcode_tag=None, gene_tag='GX', barcodes=None, temp_dir=None, update_every=50000, velocity=True)

Calculate coverage for a specific contig. This function is designed to be called as a separate process.

Parameters
  • counter (multiprocessing.Value) – counter that keeps track of how many reads have been processed

  • lock (multiprocessing.Lock) – lock that guards the counter so that multiple processes do not modify it at the same time

  • bam_path (str) – path to alignment BAM file

  • contig (str) – only reads that map to this contig will be processed

  • indices (list) – genomic positions to consider

  • alignments (set, optional) – set of (read_id, alignment_index) tuples to process. All alignments are processed if this option is not provided.

  • umi_tag (str, optional) – BAM tag that encodes UMI. If not provided, NA is output in the umi column, defaults to None

  • barcode_tag (str, optional) – BAM tag that encodes cell barcode. If not provided, NA is output in the barcode column, defaults to None

  • gene_tag (str, optional) – BAM tag that encodes gene assignment, defaults to GX

  • barcodes (list, optional) – list of barcodes to be considered. All barcodes are considered if not provided, defaults to None

  • temp_dir (str, optional) – path to temporary directory, defaults to None

  • update_every (int, optional) – update the counter every this many reads, defaults to 50000

  • velocity (bool, optional) – whether or not velocities were assigned, defaults to True

Returns

coverage

Return type

dict
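The counter, lock, and update_every parameters describe a batched progress-reporting pattern. The sketch below shows that pattern on its own, under the assumption that the worker accumulates a local count and takes the lock only once per batch. process_reads is a hypothetical stand-in, not the dynast implementation, and it runs in the current process for simplicity.

```python
import multiprocessing


def process_reads(counter, lock, n_reads, update_every=50000):
    """Pretend to process n_reads, reporting progress in batches.

    Hypothetical worker: the per-read work is omitted; only the
    shared-counter bookkeeping is shown.
    """
    pending = 0
    for _ in range(n_reads):
        # ... per-read work would go here ...
        pending += 1
        if pending == update_every:
            # Take the lock once per batch rather than once per read.
            with lock:
                counter.value += pending
            pending = 0
    # Flush whatever is left after the last full batch.
    if pending:
        with lock:
            counter.value += pending


demo_counter = multiprocessing.Value("i", 0)
demo_lock = multiprocessing.Lock()
process_reads(demo_counter, demo_lock, 125, update_every=50)
print(demo_counter.value)
```

In a real run, each contig's worker would typically be started with multiprocessing.Process or a pool and share the same counter and lock, while the parent polls counter.value to display progress.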

dynast.preprocessing.coverage.calculate_coverage(bam_path, conversions, coverage_path, alignments=None, umi_tag=None, barcode_tag=None, gene_tag='GX', barcodes=None, temp_dir=None, velocity=True)

Calculate coverage of each genomic position per barcode.

Parameters
  • bam_path (str) – path to alignment BAM file

  • conversions (dict) – dictionary with contigs as keys and sets of genomic positions as values, indicating the positions where conversions were observed

  • coverage_path (str) – path to write coverage CSV

  • alignments (set, optional) – set of (read_id, alignment_index) tuples to process. All alignments are processed if this option is not provided.

  • umi_tag (str, optional) – BAM tag that encodes UMI. If not provided, NA is output in the umi column, defaults to None

  • barcode_tag (str, optional) – BAM tag that encodes cell barcode. If not provided, NA is output in the barcode column, defaults to None

  • gene_tag (str, optional) – BAM tag that encodes gene assignment, defaults to GX

  • barcodes (list, optional) – list of barcodes to be considered. All barcodes are considered if not provided, defaults to None

  • temp_dir (str, optional) – path to temporary directory, defaults to None

  • velocity (bool, optional) – whether or not velocities were assigned, defaults to True

Returns

coverage CSV path

Return type

str
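To make the shape of conversions and the idea of per-barcode coverage concrete, here is a minimal pure-Python sketch. It replaces BAM alignments with hypothetical (barcode, contig, start, end) tuples, with end exclusive, and it is not how dynast computes coverage internally.

```python
from collections import Counter


def coverage_sketch(reads, conversions):
    """Count, per barcode, how many reads cover each position of interest.

    reads: iterable of (barcode, contig, start, end), end exclusive
           (a hypothetical stand-in for BAM alignments).
    conversions: dict mapping contig -> set of genomic positions.
    """
    coverage = {}
    for barcode, contig, start, end in reads:
        counts = coverage.setdefault(barcode, Counter())
        # Only positions where conversions were observed are counted.
        for pos in conversions.get(contig, set()):
            if start <= pos < end:
                counts[(contig, pos)] += 1
    return coverage


conversions = {"chr1": {100, 105}, "chr2": {7}}
reads = [
    ("AAAC", "chr1", 95, 103),  # covers chr1:100
    ("AAAC", "chr1", 99, 110),  # covers chr1:100 and chr1:105
    ("TTTG", "chr2", 0, 5),     # covers no position of interest
]
result = coverage_sketch(reads, conversions)
print(result["AAAC"][("chr1", 100)])
```

Restricting the count to positions listed in conversions keeps the output small, which matches the role that parameter plays in the signature above.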