seismicrna.demult package
Submodules
- class seismicrna.demult.demultiplex.Sequence_Obj(sequence, name, fastq1_path, fastq2_path, workspace, paired=True, fwd_primer='', rev_primer='', secondary_signature='', secondary_signature_start='', secondary_signature_end='', rev_secondary_signature='', rev_secondary_signature_start='', rev_secondary_signature_end='', barcode_start=-1, barcode_end=-1, barcode='', rev_barcode='', rev_barcode_start='', rev_barcode_end='')
Bases: object
- seismicrna.demult.demultiplex.append_files(files, new_file_name)
- seismicrna.demult.demultiplex.check_all_done(seq_objects: {})
- seismicrna.demult.demultiplex.create_report(sequence_objects: dict, fq1: str, fq2: str, working_directory: str, unioned_sets: dict, sample_name: str)
- seismicrna.demult.demultiplex.demultiplex_run(refs_file_csv, demulti_workspace, report_folder, fq_unit: FastqUnit, fasta, barcode_start=0, barcode_end=0, split: int = 10, clipped: int = 0, rev_clipped: int = 0, index_tolerance: int = 0, parallel: bool = False, mismatch_tolerence: int = 0, overwrite: bool = False, keep_tmp: bool = True)
- seismicrna.demult.demultiplex.finds_multigrepped_reads(sequence_objects: dict, remove: bool = True, resolve: bool = False, print_multi_grep_dict: bool = True, demultiplex_workspace: str = None) dict
Filters reads based on whether or not they map to multiple constructs. Returns a dictionary mapping each read to a list of the reads it mapped to.
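The filtering described above can be illustrated with a minimal sketch. It is not the package's implementation: the function names and the input layout (a dictionary from construct name to a set of read IDs) are assumptions made for illustration only.

```python
from collections import defaultdict


def find_multimapped_reads(reads_by_construct):
    """Return {read_id: [constructs]} for reads found under more than one construct.

    `reads_by_construct` is a hypothetical input layout:
    construct name -> set of read IDs that matched that construct.
    """
    constructs_by_read = defaultdict(list)
    for construct, read_ids in reads_by_construct.items():
        for read_id in read_ids:
            constructs_by_read[read_id].append(construct)
    # Keep only the reads that matched two or more constructs.
    return {
        read_id: sorted(constructs)
        for read_id, constructs in constructs_by_read.items()
        if len(constructs) > 1
    }


def remove_multimapped_reads(reads_by_construct, multimapped):
    """Return a copy of the input with every multi-mapped read removed."""
    ambiguous = set(multimapped)
    return {
        construct: set(read_ids) - ambiguous
        for construct, read_ids in reads_by_construct.items()
    }


# Made-up example data.
reads = {
    "refA": {"r1", "r2", "r3"},
    "refB": {"r2", "r4"},
    "refC": {"r2", "r3"},
}
multi = find_multimapped_reads(reads)
cleaned = remove_multimapped_reads(reads, multi)
```

Removing ambiguous reads outright corresponds loosely to the `remove=True` default in the signature; how the package resolves such reads when `resolve=True` is not documented here.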
- seismicrna.demult.demultiplex.grep_both_fastq(sequence_object: Sequence_Obj, clipped: int, rev_clipped: int, index_tolerence: int, delete_fastqs: bool, mismatches_allowed: int)
- seismicrna.demult.demultiplex.make_sequence_objects_from_csv(input_csv, barcode_start, barcode_end, fasta, fastq1_path, fastq2_path, paired, workspace) dict
- seismicrna.demult.demultiplex.makes_dict_from_fastq(fpath)
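The name `makes_dict_from_fastq` suggests that the function loads a FASTQ file into a dictionary, but its exact return layout is not documented here. The sketch below shows one plausible approach under stated assumptions: records span exactly four lines, and the dictionary is keyed by the read ID (the header text up to the first whitespace). The helper name `parse_fastq_lines` is hypothetical.

```python
import io


def parse_fastq_lines(lines):
    """Parse four-line FASTQ records into {read_id: (sequence, quality)}.

    Assumes unwrapped records: header, sequence, '+', quality.
    """
    records = {}
    stripped = (line.rstrip("\n") for line in lines)
    for header in stripped:
        if not header:
            continue  # Skip blank lines between records.
        fields = header[1:].split()
        if not header.startswith("@") or not fields:
            raise ValueError(f"Malformed FASTQ header: {header!r}")
        try:
            sequence = next(stripped)
            next(stripped)  # The '+' separator line is not stored.
            quality = next(stripped)
        except StopIteration:
            raise ValueError(f"Truncated record for {header!r}") from None
        records[fields[0]] = (sequence, quality)
    return records


# Made-up example data.
example = "@read1 1:N:0\nACGT\n+\nIIII\n@read2 1:N:0\nTTGA\n+\nFFFF\n"
records = parse_fastq_lines(io.StringIO(example))
```

Because the parser accepts any iterable of lines, an open file handle works the same way as the `io.StringIO` object used above.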
- seismicrna.demult.demultiplex.parallel_grepping(sequence_objects: dict, fwd_clips: int, rev_clips: int, index_tolerence: int, delete_fastq: bool, paired: bool = True, mismatches: int = 0, threads=10, iteration: int = 0, overwrite: bool = True)
Runs grep in parallel.
- seismicrna.demult.demultiplex.regular_grepping(sequence_objects: dict, fwd_clips: int, rev_clips: int, index_tolerence: int, delete_fastq: bool, paired: bool = True, mismatches: int = 0, iteration: int = 0, overwrite: bool = False)
Runs grep without parallelization (the signature has no `threads` parameter, unlike `parallel_grepping`).
- seismicrna.demult.demultiplex.resolve_or_analyze_multigrepped_reads(union_sets: dict, remove: bool = True, resolve: bool = False)
- seismicrna.demult.demultiplex.reverse_compliment(sequence)
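For reference, reverse complementing a DNA sequence means complementing each base and reversing the result. The sketch below is a generic illustration, not the package's code; the name `reverse_complement_sketch` and the handling of `N` and lowercase bases are assumptions.

```python
# Translation table for DNA bases; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_sketch(sequence):
    """Complement every base, then reverse the string."""
    return sequence.translate(_COMPLEMENT)[::-1]


example_rc = reverse_complement_sketch("ATGC")  # "GCAT"
```

Applying the function twice returns the original sequence, which makes a convenient sanity check.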
- seismicrna.demult.demultiplex.run_multi_greps(read_id_dict: dict, clipped: int, index_tolerence: int, delete_fastqs: bool, mismatches_allowed: int, pattern_type: str, pattern: str, pattern_start: int, pattern_end: int, fastq: str, seq_folder: str, front: bool = False)
- seismicrna.demult.demultiplex.run_seqkit_grep(sequence_object: Sequence_Obj, clipped: int, rev_clipped: int, index_tolerence: int, delete_fastqs: bool, fastq_id: int, mismatches_allowed: int)
- seismicrna.demult.demultiplex.run_seqkit_grep_function(pattern: str, search_start_ind: int, search_end_index: int, fastq_to_search: str, fastq_to_write: str, threads: int = 20, mismatch_threshhold: int = 0, append_bool: bool = False, tolerance: int = 0, delete_fq: bool = False)
Open question: are the search indices 1-indexed?
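Judging by its name and parameters, this function wraps the `seqkit grep` command, whose `--region` flag takes 1-based positions (for example, `1:12` for the first 12 bases). The sketch below only assembles such a command line and does not run it. The helper name, its arguments, and the choice of flags are assumptions and may differ from what the package actually passes to seqkit.

```python
import shlex


def build_seqkit_grep_command(pattern, start, end, fastq_in, fastq_out,
                              threads=1, max_mismatches=0):
    """Assemble a `seqkit grep` call that searches a region of each read.

    `start` and `end` are 1-based and inclusive, following seqkit's
    --region convention.
    """
    if start < 1 or end < start:
        raise ValueError("Expected 1 <= start <= end")
    return [
        "seqkit", "grep",
        "--by-seq",                          # Match against the sequence, not the ID.
        "--pattern", pattern,
        "--region", f"{start}:{end}",        # Restrict the search to this window.
        "--max-mismatch", str(max_mismatches),
        "--threads", str(threads),
        "--out-file", fastq_out,
        fastq_in,
    ]


# Hypothetical file names, for illustration only.
command = build_seqkit_grep_command(
    "ACGTACGT", 1, 8, "sample_R1.fastq", "barcode_R1.fastq", threads=4
)
command_text = shlex.join(command)
```

The resulting list can be passed to `subprocess.run` on a system where seqkit is installed.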
- class seismicrna.demult.demultiplex.super_fastq(fpath: str, split_count=10, fastq_name: str = None, super_dir='')
Bases: object
- check_exists()
- check_set()
- destroy_temp_data()
- fastq_to_dict(fpath)
- seismicrna.demult.main.run_dm(fasta: str | Path, refs_meta: str = None, out_dir: str | Path = './out', fastqx: Iterable[str | Path] = (), phred_enc: int = 33, barcode_start: int = 0, barcode_end: int = 0, clipped: int = 0, index_tolerance: int = 0, parallel_demultiplexing: bool = False, mismatch_tolerence: int = 0, demulti_overwrite: bool = False, keep_tmp: bool = False, *, tmp_pfx='./tmp')
Split multiplexed FASTQ files by their barcodes.
- Parameters:
refs_meta (str) – Add reference metadata from this CSV file to exported results [positional or keyword, default: None]
out_dir (str | pathlib._local.Path) – Write all output files to this directory [positional or keyword, default: ‘./out’]
fastqx (Iterable) – FASTQ files of paired-end reads with mates 1 and 2 in separate files [positional or keyword, default: ()]
phred_enc (int) – Specify the Phred score encoding of FASTQ and SAM/BAM/CRAM files [positional or keyword, default: 33]
barcode_start (int) – Index of start of barcode [positional or keyword, default: 0]
barcode_end (int) – Length of barcode [positional or keyword, default: 0]
clipped (int) – Number of clipped patterns to search for in the sample; higher values increase computation time [positional or keyword, default: 0]
index_tolerance (int) – Maximum distance from the reference index at which the pattern may still be found in a read [positional or keyword, default: 0]
parallel_demultiplexing (bool) – Whether to run demultiplexing at maximum speed by submitting multithreaded grep functions [positional or keyword, default: False]
mismatch_tolerence (int) – Maximum number of mismatches allowed in a string for it to still count as a valid pattern match. Increases non-parallel computation at a factorial rate; use caution above 2 mismatches. Does not apply to clipped sequences. [positional or keyword, default: 0]
demulti_overwrite (bool) – Whether to overwrite the grepped FASTQ files. Use only when changing settings on the same sample [positional or keyword, default: False]
keep_tmp (bool) – Keep temporary files after finishing [positional or keyword, default: False]
tmp_pfx – Write all temporary files to a directory with this prefix [keyword-only, default: ‘./tmp’]
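To illustrate how an index tolerance and a mismatch tolerance can interact, here is a minimal pure-Python sketch of position-tolerant barcode matching. It is not how the package performs the search. The function names, the 0-based offsets, and the use of Hamming distance are all assumptions.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_barcode(read, barcode, expected_start,
                 index_tolerance=0, mismatch_tolerance=0):
    """Return the 0-based offset of the barcode in the read, or None.

    Offsets within `index_tolerance` of `expected_start` are tried, nearest
    first, and a window matches if it differs from the barcode at no more
    than `mismatch_tolerance` positions.
    """
    length = len(barcode)
    shifts = sorted(range(-index_tolerance, index_tolerance + 1), key=abs)
    for shift in shifts:
        start = expected_start + shift
        if start < 0 or start + length > len(read):
            continue  # The window would fall outside the read.
        if hamming_distance(read[start:start + length], barcode) <= mismatch_tolerance:
            return start
    return None


# Made-up example data.
barcode = "ACGTACGT"
exact_read = "GGACGTACGTTT"     # Barcode starts at offset 2.
mutated_read = "GGACGAACGTTT"   # Same read with one substitution in the barcode.

shifted_hit = find_barcode(exact_read, barcode, expected_start=0, index_tolerance=2)
strict_miss = find_barcode(mutated_read, barcode, expected_start=2)
lenient_hit = find_barcode(mutated_read, barcode, expected_start=2, mismatch_tolerance=1)
```

Each additional unit of either tolerance enlarges the set of candidate matches that must be checked, consistent with the caution above about raising the mismatch tolerance.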