seismicrna.demult package

Submodules

class seismicrna.demult.demultiplex.Sequence_Obj(sequence, name, fastq1_path, fastq2_path, workspace, paired=True, fwd_primer='', rev_primer='', secondary_signature='', secondary_signature_start='', secondary_signature_end='', rev_secondary_signature='', rev_secondary_signature_start='', rev_secondary_signature_end='', barcode_start=-1, barcode_end=-1, barcode='', rev_barcode='', rev_barcode_start='', rev_barcode_end='')

Bases: object

seismicrna.demult.demultiplex.append_files(files, new_file_name)

seismicrna.demult.demultiplex.check_all_done(seq_objects: {})

seismicrna.demult.demultiplex.check_done(sequence_folder: str) → bool

seismicrna.demult.demultiplex.create_report(sequence_objects: dict, fq1: str, fq2: str, working_directory: str, unioned_sets: dict, sample_name: str)

seismicrna.demult.demultiplex.demultiplex_run(refs_file_csv, demulti_workspace, report_folder, fq_unit: FastqUnit, fasta, barcode_start=0, barcode_end=0, split: int = 10, clipped: int = 0, rev_clipped: int = 0, index_tolerance: int = 0, parallel: bool = False, mismatch_tolerence: int = 0, overwrite: bool = False, keep_tmp: bool = True)

seismicrna.demult.demultiplex.finds_multigrepped_reads(sequence_objects: dict, remove: bool = True, resolve: bool = False, print_multi_grep_dict: bool = True, demultiplex_workspace: str = None) → dict: Filters reads based on whether they map to multiple constructs; returns a dictionary mapping each read to a list of the reads it mapped to.
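The idea of a read that is grepped under more than one construct can be illustrated with a minimal sketch. It assumes each construct has already been reduced to a set of read IDs. The function name `find_multi_assigned` and the input layout are hypothetical and are not taken from the package.

```python
from collections import defaultdict


def find_multi_assigned(reads_by_construct):
    """Map each read ID found under more than one construct to those constructs.

    `reads_by_construct` is a hypothetical {construct name: set of read IDs}.
    """
    constructs_by_read = defaultdict(list)
    for construct, read_ids in reads_by_construct.items():
        for read_id in read_ids:
            constructs_by_read[read_id].append(construct)
    # Keep only the reads that were matched by two or more constructs.
    return {read_id: sorted(constructs)
            for read_id, constructs in constructs_by_read.items()
            if len(constructs) > 1}


example = {
    "refA": {"read1", "read2"},
    "refB": {"read2", "read3"},
}
multi = find_multi_assigned(example)
```

Here `multi` contains only `read2`, the single read shared by both constructs. Whether such reads are then removed or resolved would correspond to the `remove` and `resolve` flags.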

seismicrna.demult.demultiplex.grep_both_fastq(sequence_object: Sequence_Obj, clipped: int, rev_clipped: int, index_tolerence: int, delete_fastqs: bool, mismatches_allowed: int)

seismicrna.demult.demultiplex.make_dict_from_fasta(fasta_path) → dict
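A FASTA-to-dictionary conversion of this kind might look like the sketch below. It is not the package's implementation: the helper name `fasta_to_dict`, the choice to take an iterable of lines rather than a path, and the use of the first word of each header as the key are all assumptions made for illustration.

```python
def fasta_to_dict(lines):
    """Build {name: sequence} from an iterable of FASTA lines."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Assumption: the record name is the first word of the header.
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = ""
        elif name is not None:
            # Sequence lines may be wrapped, so concatenate them.
            records[name] += line
    return records


fasta_text = ">refA example\nACGT\nTTGA\n>refB\nGGCC\n"
refs = fasta_to_dict(fasta_text.splitlines())
```

To read from a file, the same function can be given an open file handle, since iterating over a handle yields lines.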

seismicrna.demult.demultiplex.make_sequence_objects_from_csv(input_csv, barcode_start, barcode_end, fasta, fastq1_path, fastq2_path, paired, workspace) → dict

seismicrna.demult.demultiplex.makes_dict_from_fastq(fpath)

seismicrna.demult.demultiplex.parallel_grepping(sequence_objects: dict, fwd_clips: int, rev_clips: int, index_tolerence: int, delete_fastq: bool, paired: bool = True, mismatches: int = 0, threads=10, iteration: int = 0, overwrite: bool = True): runs grep in parallel

seismicrna.demult.demultiplex.regular_grepping(sequence_objects: dict, fwd_clips: int, rev_clips: int, index_tolerence: int, delete_fastq: bool, paired: bool = True, mismatches: int = 0, iteration: int = 0, overwrite: bool = False): runs grep in parallel

seismicrna.demult.demultiplex.resolve_or_analyze_multigrepped_reads(union_sets: dict, remove: bool = True, resolve: bool = False)

seismicrna.demult.demultiplex.reverse_compliment(sequence)
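For reference, a reverse complement of a DNA sequence can be computed as in the following sketch. This is a generic illustration under the assumption of an A/C/G/T/N alphabet. It is not the body of `reverse_compliment`, and the name `reverse_complement_sketch` is hypothetical.

```python
# Translation table for DNA bases; N stays N, and case is preserved.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_sketch(sequence):
    """Complement every base, then reverse the string."""
    return sequence.translate(_COMPLEMENT)[::-1]


rc = reverse_complement_sketch("ATGC")
```

Characters outside the table are passed through unchanged by `str.translate`, so a stricter version would validate the input first.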

seismicrna.demult.demultiplex.run_multi_greps(read_id_dict: dict, clipped: int, index_tolerence: int, delete_fastqs: bool, mismatches_allowed: int, pattern_type: str, pattern: str, pattern_start: int, pattern_end: int, fastq: str, seq_folder: str, front: bool = False)

seismicrna.demult.demultiplex.run_seqkit_grep(sequence_object: Sequence_Obj, clipped: int, rev_clipped: int, index_tolerence: int, delete_fastqs: bool, fastq_id: int, mismatches_allowed: int)

seismicrna.demult.demultiplex.run_seqkit_grep_function(pattern: str, search_start_ind: int, search_end_index: int, fastq_to_search: str, fastq_to_write: str, threads: int = 20, mismatch_threshhold: int = 0, append_bool: bool = False, tolerance: int = 0, delete_fq: bool = False): 1 indexed?

class seismicrna.demult.demultiplex.super_fastq(fpath: str, split_count=10, fastq_name: str = None, super_dir='')

Bases: object

check_exists()

check_set()

destroy_temp_data()

fastq_to_dict(fpath)

split_fastq(delete_text_fastqs: bool, temp_delete_idSets_to_pickle_dict: bool)

super_write_fastqs(union_dict: dict, directoy_to_write_to: str, fastq_id: int, sequence_objects: dict, sample_name: str): organizes reads into sets per k, based on which pickle the read is in

seismicrna.demult.main.run_dm(fasta: str | Path, refs_meta: str = None, out_dir: str | Path = './out', fastqx: Iterable[str | Path] = (), phred_enc: int = 33, barcode_start: int = 0, barcode_end: int = 0, clipped: int = 0, index_tolerance: int = 0, parallel_demultiplexing: bool = False, mismatch_tolerence: int = 0, demulti_overwrite: bool = False, keep_tmp: bool = False, *, tmp_pfx='./tmp')

Split multiplexed FASTQ files by their barcodes.

Parameters:

refs_meta (str) – Add reference metadata from this CSV file to exported results [positional or keyword, default: None]
out_dir (str | pathlib._local.Path) – Write all output files to this directory [positional or keyword, default: ‘./out’]
fastqx (Iterable) – FASTQ files of paired-end reads with mates 1 and 2 in separate files [positional or keyword, default: ()]
phred_enc (int) – Specify the Phred score encoding of FASTQ and SAM/BAM/CRAM files [positional or keyword, default: 33]
barcode_start (int) – Index of start of barcode [positional or keyword, default: 0]
barcode_end (int) – Length of barcode [positional or keyword, default: 0]
clipped (int) – Number of clipped patterns to search for in the sample; higher values increase computation time [positional or keyword, default: 0]
index_tolerance (int) – Maximum distance from the reference index at which the pattern may still be found in a read [positional or keyword, default: 0]
parallel_demultiplexing (bool) – Whether to run demultiplexing at maximum speed by submitting multithreaded grep functions [positional or keyword, default: False]
mismatch_tolerence (int) – Maximum number of mismatches allowed in a string for it to still count as a valid pattern match. Increases non-parallel computation at a factorial rate; use caution above 2 mismatches. Does not apply to clipped sequences. [positional or keyword, default: 0]
demulti_overwrite (bool) – Whether to overwrite the grepped FASTQ. Should only be used when changing settings on the same sample [positional or keyword, default: False]
keep_tmp (bool) – Keep temporary files after finishing [positional or keyword, default: False]
tmp_pfx – Write all temporary files to a directory with this prefix [keyword-only, default: ‘./tmp’]
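How `index_tolerance` and `mismatch_tolerence` interact can be pictured with a small pure-Python sketch. The function names in the module suggest that the actual search is delegated to seqkit grep, so the code below only illustrates the two tolerances and is not the package's algorithm. The helper names and the 0-based `start` are assumptions.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def barcode_matches(read, barcode, start, index_tolerance=0, mismatch_tolerance=0):
    """Check whether `barcode` occurs in `read` near `start` (assumed 0-based).

    The barcode may begin up to `index_tolerance` positions away from `start`
    and may differ from the read by at most `mismatch_tolerance` bases.
    """
    length = len(barcode)
    first = max(0, start - index_tolerance)
    last = min(len(read) - length, start + index_tolerance)
    for pos in range(first, last + 1):
        if count_mismatches(read[pos:pos + length], barcode) <= mismatch_tolerance:
            return True
    return False


read = "GGACGTACGTTT"  # the barcode ACGTACGT begins at position 2
exact = barcode_matches(read, "ACGTACGT", start=2)
shifted = barcode_matches(read, "ACGTACGT", start=0, index_tolerance=2)
fuzzy = barcode_matches(read, "ACGAACGT", start=2, mismatch_tolerance=1)
```

In this sketch each extra position allowed by `index_tolerance` adds one more window to compare, while each allowed mismatch loosens the comparison within every window.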