`dynast.preprocessing.snp`

Module Contents

Functions

`read_snps`(snps_path)	Read SNPs CSV as a dictionary
`read_snp_csv`(snp_csv)	Read a user-provided SNPs CSV
`extract_conversions_part`(conversions_path, counter, lock, index, alignments=None, conversions=None, quality=27, update_every=5000)	Extract number of conversions for every genomic position.
`extract_conversions`(conversions_path, index_path, alignments=None, conversions=None, quality=27, n_threads=8)	Wrapper around extract_conversions_part that works in parallel
`detect_snps`(conversions_path, index_path, coverage, snps_path, alignments=None, conversions=None, quality=27, threshold=0.5, min_coverage=1, n_threads=8)	Detect SNPs.

Attributes

SNP_COLUMNS

dynast.preprocessing.snp.SNP_COLUMNS = ['contig', 'genome_i', 'conversion']

dynast.preprocessing.snp.read_snps(snps_path)

Read SNPs CSV as a dictionary

Parameters: snps_path (str) – path to SNPs CSV
Returns: dictionary of contigs as keys and sets of genomic positions with SNPs as values
Return type: dictionary
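
The return shape described above can be illustrated with a minimal sketch. It is not the library's implementation: `read_snps_sketch` is a hypothetical name, and the sketch assumes the CSV has a header row matching `SNP_COLUMNS` and that conversions are written as two-letter strings such as `TC`.

```python
import csv
import io
from collections import defaultdict

# Column names as listed in SNP_COLUMNS above.
SNP_COLUMNS = ["contig", "genome_i", "conversion"]


def read_snps_sketch(handle):
    """Group SNP positions by contig from an open CSV handle (hypothetical helper)."""
    snps = defaultdict(set)
    for row in csv.DictReader(handle):
        # Assumption: genome_i is an integer genomic position.
        snps[row["contig"]].add(int(row["genome_i"]))
    return dict(snps)


# Illustrative data only; not taken from a real dataset.
example = io.StringIO(
    "contig,genome_i,conversion\n"
    "chr1,100,TC\n"
    "chr1,250,AG\n"
    "chr2,7,TC\n"
)
snps = read_snps_sketch(example)
```

A membership test such as `250 in snps.get("chr1", set())` then answers whether a given position is a known SNP.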

dynast.preprocessing.snp.read_snp_csv(snp_csv)

Read a user-provided SNPs CSV

Parameters: snp_csv (str) – path to SNPs CSV
Returns: dictionary of contigs as keys and sets of genomic positions with SNPs as values
Return type: dictionary

dynast.preprocessing.snp.extract_conversions_part(conversions_path, counter, lock, index, alignments=None, conversions=None, quality=27, update_every=5000)

Extract number of conversions for every genomic position.

Parameters

conversions_path (str) – path to conversions CSV
counter (multiprocessing.Value) – counter that keeps track of how many reads have been processed
lock (multiprocessing.Lock) – lock guarding the counter so that multiple processes do not modify it at the same time
index (list) – list of (file position, number of lines) tuples to process
alignments (set, optional) – set of (read_id, alignment_index) tuples to process. All alignments are processed if this option is not provided.
conversions (set, optional) – set of conversions to consider
quality (int, optional) – only count conversions with PHRED quality greater than this value, defaults to 27
update_every (int, optional) – update the counter every this many reads, defaults to 5000

Returns

nested dictionary that contains number of conversions for each contig and position

Return type

dictionary
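
The interplay of `counter`, `lock`, `quality`, and `update_every` can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `process_reads` is a hypothetical function, each input tuple stands in for one read carrying a single conversion, and flushing the remainder at the end is an assumption about how progress might be reported.

```python
import multiprocessing


def process_reads(reads, counter, lock, quality=27, update_every=5000):
    """Count conversions per contig and position, reporting progress in batches.

    Hypothetical sketch: `reads` is an iterable of (contig, genome_i, phred)
    tuples, one conversion per read.
    """
    counts = {}
    n = 0
    for n, (contig, genome_i, phred) in enumerate(reads, start=1):
        # Only count conversions with PHRED quality strictly greater than `quality`.
        if phred > quality:
            positions = counts.setdefault(contig, {})
            positions[genome_i] = positions.get(genome_i, 0) + 1
        # Touch the shared counter only once per batch to limit lock contention.
        if n % update_every == 0:
            with lock:
                counter.value += update_every
    # Assumption: report the reads left over after the last full batch.
    with lock:
        counter.value += n % update_every
    return counts


counter = multiprocessing.Value("I", 0)
lock = multiprocessing.Lock()
reads = [("chr1", 10, 30), ("chr1", 10, 35), ("chr1", 11, 20), ("chr2", 5, 28)]
counts = process_reads(reads, counter, lock, quality=27, update_every=3)
```

Updating the counter in batches rather than per read keeps the lock from becoming a bottleneck when many processes run at once.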

dynast.preprocessing.snp.extract_conversions(conversions_path, index_path, alignments=None, conversions=None, quality=27, n_threads=8)

Wrapper around extract_conversions_part that works in parallel

Parameters

conversions_path (str) – path to conversions CSV
index_path (str) – path to conversions index
alignments (set, optional) – set of (read_id, alignment_index) tuples to process. All alignments are processed if this option is not provided.
conversions (set, optional) – set of conversions to consider
quality (int, optional) – only count conversions with PHRED quality greater than this value, defaults to 27
n_threads (int, optional) – number of threads, defaults to 8

Returns

nested dictionary that contains number of conversions for each contig and position

Return type

dictionary
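
One way to picture the parallel wrapper is to split the index into parts, process each part independently, and merge the per-part results. The sketch below covers only the splitting and merging steps. `split_index` and `merge_counts` are hypothetical helpers, and the `contig -> position -> count` layout is an assumption based on the return description above.

```python
from collections import Counter, defaultdict


def split_index(index, n_parts):
    """Split a list of (file position, number of lines) tuples into interleaved parts."""
    return [index[i::n_parts] for i in range(n_parts)]


def merge_counts(parts):
    """Sum nested {contig: {position: count}} dictionaries from several workers."""
    merged = defaultdict(Counter)
    for part in parts:
        for contig, positions in part.items():
            # Counter.update adds counts instead of overwriting them.
            merged[contig].update(positions)
    return {contig: dict(counter) for contig, counter in merged.items()}


# Illustrative index entries and per-worker results.
index = [(0, 2), (10, 2), (20, 2), (30, 2), (40, 1)]
chunks = split_index(index, 2)

worker_results = [
    {"chr1": {10: 2}},
    {"chr1": {10: 1, 11: 4}, "chr2": {5: 1}},
]
merged = merge_counts(worker_results)
```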

dynast.preprocessing.snp.detect_snps(conversions_path, index_path, coverage, snps_path, alignments=None, conversions=None, quality=27, threshold=0.5, min_coverage=1, n_threads=8)

Detect SNPs.

Parameters

conversions_path (str) – path to conversions CSV
index_path (str) – path to conversions index
coverage (dict) – dictionary containing genomic coverage
snps_path (str) – path to output SNPs
alignments (set, optional) – set of (read_id, alignment_index) tuples to process. All alignments are processed if this option is not provided.
conversions (set, optional) – set of conversions to consider
quality (int, optional) – only count conversions with PHRED quality greater than this value, defaults to 27
threshold (float, optional) – positions with conversions / coverage > threshold will be considered as SNPs, defaults to 0.5
min_coverage (int, optional) – only positions with at least this many mapping reads are considered, defaults to 1
n_threads (int, optional) – number of threads, defaults to 8
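
The `threshold` and `min_coverage` rules can be sketched as a simple filter. This is a hedged illustration, not the library's implementation: `call_snps` is a hypothetical name, and both inputs are assumed to be nested `contig -> position -> count` dictionaries.

```python
def call_snps(conversions, coverage, threshold=0.5, min_coverage=1):
    """Return {contig: set of positions} where conversions / coverage > threshold.

    Hypothetical sketch; positions covered by fewer than `min_coverage`
    reads are skipped.
    """
    snps = {}
    for contig, positions in conversions.items():
        for genome_i, count in positions.items():
            cov = coverage.get(contig, {}).get(genome_i, 0)
            # Skip uncovered or poorly covered positions (also avoids dividing by zero).
            if cov <= 0 or cov < min_coverage:
                continue
            if count / cov > threshold:
                snps.setdefault(contig, set()).add(genome_i)
    return snps


# Illustrative counts only.
conversions = {"chr1": {10: 6, 11: 5, 12: 1}}
coverage = {"chr1": {10: 10, 11: 10, 12: 1}}
snps = call_snps(conversions, coverage, threshold=0.5, min_coverage=2)
```

Here position 10 is called (6 / 10 exceeds 0.5), position 11 is not (5 / 10 equals the threshold and the comparison is strict), and position 12 is skipped because its coverage is below `min_coverage`.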
