dynast.preprocessing.snp
Module Contents
Functions
- read_snps: Read SNPs CSV as a dictionary
- read_snp_csv: Read a user-provided SNPs CSV
- extract_conversions_part: Extract number of conversions for every genomic position.
- extract_conversions: Wrapper around extract_conversions_part that works in parallel
- detect_snps: Detect SNPs.
Attributes
- dynast.preprocessing.snp.SNP_COLUMNS = ['contig', 'genome_i', 'conversion']
- dynast.preprocessing.snp.read_snps(snps_path)
Read SNPs CSV as a dictionary
- Parameters
snps_path (str) – path to SNPs CSV
- Returns
dictionary of contigs as keys and sets of genomic positions with SNPs as values
- Return type
dictionary
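The return shape described above can be sketched as follows. This is a minimal illustration, not dynast's implementation: it assumes the CSV has a header row using the SNP_COLUMNS names, and the helper name `load_snp_positions` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Column names taken from the SNP_COLUMNS attribute documented above.
SNP_COLUMNS = ["contig", "genome_i", "conversion"]


def load_snp_positions(handle):
    """Hypothetical helper: map each contig to the set of positions with SNPs.

    Assumes a header row whose names match SNP_COLUMNS.
    """
    snps = defaultdict(set)
    for row in csv.DictReader(handle):
        snps[row["contig"]].add(int(row["genome_i"]))
    return dict(snps)


# In-memory CSV standing in for a file on disk (made-up rows).
example = io.StringIO(
    "contig,genome_i,conversion\n"
    "chr1,100,TC\n"
    "chr1,250,AG\n"
    "chr2,7,TC\n"
)
positions = load_snp_positions(example)
```

Storing positions in sets makes membership checks such as `100 in positions["chr1"]` cheap when filtering conversions later.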
- dynast.preprocessing.snp.read_snp_csv(snp_csv)
Read a user-provided SNPs CSV
- Parameters
snp_csv (str) – path to SNPs CSV
- Returns
dictionary of contigs as keys and sets of genomic positions with SNPs as values
- Return type
dictionary
- dynast.preprocessing.snp.extract_conversions_part(conversions_path, counter, lock, index, alignments=None, conversions=None, quality=27, update_every=5000)
Extract number of conversions for every genomic position.
- Parameters
conversions_path (str) – path to conversions CSV
counter (multiprocessing.Value) – counter that keeps track of how many reads have been processed
lock (multiprocessing.Lock) – lock guarding the counter so that multiple processes do not modify it at the same time
index (list) – list of (file position, number of lines) tuples to process
alignments (set, optional) – set of (read_id, alignment_index) tuples to process. All alignments are processed if this option is not provided.
conversions (set, optional) – set of conversions to consider
quality (int, optional) – only count conversions with PHRED quality greater than this value, defaults to 27
update_every (int, optional) – update the counter every this many reads, defaults to 5000
- Returns
nested dictionary that contains number of conversions for each contig and position
- Return type
dictionary
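The roles of counter, lock and update_every can be illustrated with a small sketch. It is not the library's code: the function `count_reads` is hypothetical, and it only shows the general pattern of batching updates to a shared counter so that the lock is acquired once every update_every reads rather than once per read.

```python
import multiprocessing


def count_reads(reads, counter, lock, update_every=5000):
    """Hypothetical worker loop: add to a shared counter in batches."""
    pending = 0
    for _ in reads:
        pending += 1
        if pending == update_every:
            # Take the lock only once per batch to limit contention.
            with lock:
                counter.value += pending
            pending = 0
    # Flush whatever is left after the last full batch.
    if pending:
        with lock:
            counter.value += pending


counter = multiprocessing.Value("I", 0)  # unsigned int shared between processes
lock = multiprocessing.Lock()
count_reads(range(12), counter, lock, update_every=5)
processed = counter.value
```

Here the sketch runs in a single process for simplicity; with several workers, each would receive the same counter and lock objects.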
- dynast.preprocessing.snp.extract_conversions(conversions_path, index_path, alignments=None, conversions=None, quality=27, n_threads=8)
Wrapper around extract_conversions_part that works in parallel
- Parameters
conversions_path (str) – path to conversions CSV
index_path (str) – path to conversions index
alignments (set, optional) – set of (read_id, alignment_index) tuples to process. All alignments are processed if this option is not provided.
conversions (set, optional) – set of conversions to consider
quality (int, optional) – only count conversions with PHRED quality greater than this value, defaults to 27
n_threads (int, optional) – number of threads, defaults to 8
- Returns
nested dictionary that contains number of conversions for each contig and position
- Return type
dictionary
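The documentation does not state how the per-part results are combined. One plausible approach, shown here purely as an assumption, is to sum the nested contig-to-position counts returned by each part; `merge_parts` is a hypothetical name.

```python
from collections import Counter, defaultdict


def merge_parts(parts):
    """Hypothetical merge: sum {contig: {position: count}} dictionaries."""
    merged = defaultdict(Counter)
    for part in parts:
        for contig, positions in part.items():
            # Counter.update adds counts instead of overwriting them.
            merged[contig].update(positions)
    return {contig: dict(counts) for contig, counts in merged.items()}


# Made-up results from two workers.
part_a = {"chr1": {100: 2, 250: 1}}
part_b = {"chr1": {100: 3}, "chr2": {7: 1}}
merged = merge_parts([part_a, part_b])
```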
- dynast.preprocessing.snp.detect_snps(conversions_path, index_path, coverage, snps_path, alignments=None, conversions=None, quality=27, threshold=0.5, min_coverage=1, n_threads=8)
Detect SNPs.
- Parameters
conversions_path (str) – path to conversions CSV
index_path (str) – path to conversions index
coverage (dict) – dictionary containing genomic coverage
snps_path (str) – path to output SNPs
alignments (set, optional) – set of (read_id, alignment_index) tuples to process. All alignments are processed if this option is not provided.
conversions (set, optional) – set of conversions to consider
quality (int, optional) – only count conversions with PHRED quality greater than this value, defaults to 27
threshold (float, optional) – positions with conversions / coverage > threshold will be considered as SNPs, defaults to 0.5
min_coverage (int, optional) – only positions with at least this many mapping reads are considered, defaults to 1
n_threads (int, optional) – number of threads, defaults to 8
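The threshold and min_coverage rules can be sketched in a few lines. This is an illustration under stated assumptions, not dynast's implementation: `call_snps` is a hypothetical name, and both inputs are assumed to be nested dictionaries of the form {contig: {position: count}}, which the documentation does not confirm for coverage.

```python
def call_snps(conversions, coverage, threshold=0.5, min_coverage=1):
    """Hypothetical SNP call: keep positions where conversions / coverage > threshold."""
    snps = {}
    for contig, positions in conversions.items():
        for position, count in positions.items():
            depth = coverage.get(contig, {}).get(position, 0)
            # Skip positions with too little coverage (also avoids dividing by zero).
            if depth < min_coverage or depth == 0:
                continue
            if count / depth > threshold:
                snps.setdefault(contig, set()).add(position)
    return snps


# Made-up counts: 8/10 passes, 5/10 does not (strictly greater), 300 has no coverage.
conversions = {"chr1": {100: 8, 250: 5, 300: 1}}
coverage = {"chr1": {100: 10, 250: 10, 300: 0}}
snps = call_snps(conversions, coverage)
```

Because the comparison is strict, a position where exactly half of the covering reads carry the conversion is not reported at the default threshold of 0.5.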