seismicrna.relate package
Subpackages
- seismicrna.relate.aux package
- seismicrna.relate.cx package
- seismicrna.relate.py package
- Subpackages
- Submodules
 
- seismicrna.relate.tests package
- Submodules
- TestFromReads
- TestFromReads.test_from_0_reads()
- TestFromReads.test_from_1_read_0_segs_drop_empty()
- TestFromReads.test_from_1_read_0_segs_keep_empty()
- TestFromReads.test_from_1_read_1_segs_no_cover_drop_empty()
- TestFromReads.test_from_1_read_1_segs_no_cover_keep_empty()
- TestFromReads.test_from_1_read_1_segs_no_muts()
- TestFromReads.test_from_2_reads_1_2_segs()
- TestFromReads.test_from_2_reads_1_segs()
- TestFromReads.test_from_2_reads_2_1_segs()
- TestFromReads.test_from_2_reads_2_segs()
- TestFromReads.test_from_4_reads_varied_segs_drop_empty()
- TestFromReads.test_from_4_reads_varied_segs_keep_empty()
 
- TestRelate
- TestRelateEmpty
- TestRelatePaired
- TestRelatePaired.get_sam_data()
- TestRelatePaired.test_batch_size_1()
- TestRelatePaired.test_batch_size_4()
- TestRelatePaired.test_batch_size_5()
- TestRelatePaired.test_batch_size_6()
- TestRelatePaired.test_clip()
- TestRelatePaired.test_min_phred()
- TestRelatePaired.test_min_reads()
- TestRelatePaired.test_noargs()
 
- TestRelateSingle
- extract_batches()
- load_refseq()
- write_fasta_file()
- write_sam_file()
- TestCalcRelsLinesPaired
- TestCalcRelsLinesPaired.evaluate()
- TestCalcRelsLinesPaired.relate()
- TestCalcRelsLinesPaired.relate_error()
- TestCalcRelsLinesPaired.test_abut()
- TestCalcRelsLinesPaired.test_abut_int()
- TestCalcRelsLinesPaired.test_contain()
- TestCalcRelsLinesPaired.test_contain_con_int()
- TestCalcRelsLinesPaired.test_contain_flush3()
- TestCalcRelsLinesPaired.test_contain_flush3_con_int()
- TestCalcRelsLinesPaired.test_contain_flush3_inc_int()
- TestCalcRelsLinesPaired.test_contain_flush5()
- TestCalcRelsLinesPaired.test_contain_flush53()
- TestCalcRelsLinesPaired.test_contain_flush53_con_int()
- TestCalcRelsLinesPaired.test_contain_flush53_inc_int()
- TestCalcRelsLinesPaired.test_contain_flush5_con_int()
- TestCalcRelsLinesPaired.test_contain_flush5_inc_int()
- TestCalcRelsLinesPaired.test_contain_inc_int()
- TestCalcRelsLinesPaired.test_diff_names()
- TestCalcRelsLinesPaired.test_gap()
- TestCalcRelsLinesPaired.test_gap_int()
- TestCalcRelsLinesPaired.test_improper()
- TestCalcRelsLinesPaired.test_read_marks()
- TestCalcRelsLinesPaired.test_read_orientation()
- TestCalcRelsLinesPaired.test_staggered()
- TestCalcRelsLinesPaired.test_staggered_con_int()
- TestCalcRelsLinesPaired.test_staggered_inc_int()
- TestCalcRelsLinesPaired.test_unpaired()
 
- TestCalcRelsLinesSingle
- TestCalcRelsLinesSingle.iter_cases()
- TestCalcRelsLinesSingle.iter_cases_insert3()
- TestCalcRelsLinesSingle.relate()
- TestCalcRelsLinesSingle.relate_error()
- TestCalcRelsLinesSingle.relate_truncated()
- TestCalcRelsLinesSingle.test_4nt_2ins()
- TestCalcRelsLinesSingle.test_4nt_2ins_paired()
- TestCalcRelsLinesSingle.test_5nt_2ins()
- TestCalcRelsLinesSingle.test_6nt_2ins()
- TestCalcRelsLinesSingle.test_7nt_0ins()
- TestCalcRelsLinesSingle.test_8nt_0ins()
- TestCalcRelsLinesSingle.test_all_matches()
- TestCalcRelsLinesSingle.test_ambig_delet_low_qual()
- TestCalcRelsLinesSingle.test_error_cigar_adj_ins_del()
- TestCalcRelsLinesSingle.test_error_cigar_adj_int_del()
- TestCalcRelsLinesSingle.test_error_cigar_consecutive()
- TestCalcRelsLinesSingle.test_error_cigar_del_first_rel()
- TestCalcRelsLinesSingle.test_error_cigar_del_last_rel()
- TestCalcRelsLinesSingle.test_error_cigar_empty()
- TestCalcRelsLinesSingle.test_error_cigar_ins_first_rel()
- TestCalcRelsLinesSingle.test_error_cigar_ins_last_rel()
- TestCalcRelsLinesSingle.test_error_cigar_int_first_rel()
- TestCalcRelsLinesSingle.test_error_cigar_int_last_rel()
- TestCalcRelsLinesSingle.test_error_cigar_missing()
- TestCalcRelsLinesSingle.test_error_cigar_op_read_diff()
- TestCalcRelsLinesSingle.test_error_cigar_op_ref_long()
- TestCalcRelsLinesSingle.test_error_cigar_op_ref_zero()
- TestCalcRelsLinesSingle.test_error_cigar_parse()
- TestCalcRelsLinesSingle.test_error_cigar_soft_clips()
- TestCalcRelsLinesSingle.test_error_flag_large()
- TestCalcRelsLinesSingle.test_error_flag_missing()
- TestCalcRelsLinesSingle.test_error_flag_parse()
- TestCalcRelsLinesSingle.test_error_line_improper_flag_proper()
- TestCalcRelsLinesSingle.test_error_line_paired_flag_unpaired()
- TestCalcRelsLinesSingle.test_error_line_unpaired_flag_paired()
- TestCalcRelsLinesSingle.test_error_mapq()
- TestCalcRelsLinesSingle.test_error_mapq_insufficient()
- TestCalcRelsLinesSingle.test_error_mapq_missing()
- TestCalcRelsLinesSingle.test_error_name_missing()
- TestCalcRelsLinesSingle.test_error_pos_large()
- TestCalcRelsLinesSingle.test_error_pos_missing()
- TestCalcRelsLinesSingle.test_error_pos_parse()
- TestCalcRelsLinesSingle.test_error_pos_zero()
- TestCalcRelsLinesSingle.test_error_qual_missing()
- TestCalcRelsLinesSingle.test_error_read_missing()
- TestCalcRelsLinesSingle.test_error_read_qual_diff()
- TestCalcRelsLinesSingle.test_error_ref_mismatch()
- TestCalcRelsLinesSingle.test_error_ref_missing()
- TestCalcRelsLinesSingle.test_example_1()
- TestCalcRelsLinesSingle.test_example_2()
- TestCalcRelsLinesSingle.test_example_3()
- TestCalcRelsLinesSingle.test_example_4()
- TestCalcRelsLinesSingle.test_long_ambindels()
- TestCalcRelsLinesSingle.test_n_read()
- TestCalcRelsLinesSingle.test_n_ref()
- TestCalcRelsLinesSingle.test_soft_clips()
 
- TestMergeMates
- as_sam()
- TestIterRecordsPaired
- TestIterRecordsPaired.run_test_invalid()
- TestIterRecordsPaired.run_test_valid()
- TestIterRecordsPaired.test_blank()
- TestIterRecordsPaired.test_one_improper()
- TestIterRecordsPaired.test_one_proper()
- TestIterRecordsPaired.test_one_single()
- TestIterRecordsPaired.test_two_mated_improper()
- TestIterRecordsPaired.test_two_mated_improper_1()
- TestIterRecordsPaired.test_two_mated_improper_2()
- TestIterRecordsPaired.test_two_mated_proper()
- TestIterRecordsPaired.test_two_unmated_improper()
- TestIterRecordsPaired.test_two_unmated_proper()
 
- TestLineAttrs
- delete_sam()
- write_sam()
 
 
- Submodules
Submodules
- class seismicrna.relate.batch.FullReadBatch(*, batch: int, **kwargs)
- property max_read
- Maximum possible value for a read index. 
 - property read_indexes
- Map each read number to its index in self.read_nums. 
 - property read_nums
- Read numbers. 
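The three properties above fit together as a lookup table. The following stdlib-only sketch assumes that `max_read` is the largest read number in `read_nums` and that `read_indexes` has one slot per possible read number; `build_read_indexes` is a hypothetical helper for illustration, not SEISMIC-RNA's own code.

```python
from __future__ import annotations


def build_read_indexes(read_nums: list[int]) -> list[int]:
    """Map each read number to its position in read_nums (hypothetical helper).

    Slots for read numbers that are absent from read_nums hold -1.
    """
    # Assumption: the maximum possible read index is the largest read number.
    max_read = max(read_nums, default=-1)
    read_indexes = [-1] * (max_read + 1)
    for index, read_num in enumerate(read_nums):
        read_indexes[read_num] = index
    return read_indexes


read_nums = [3, 0, 5]
read_indexes = build_read_indexes(read_nums)
# Read number 5 sits at position 2 of read_nums.
print(read_indexes[5])  # 2
```

Precomputing the table trades memory proportional to `max_read` for constant-time lookups of a read's position.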
 
- class seismicrna.relate.batch.ReadNamesBatch(*, names: list[str] | ndarray, **kwargs)
- Bases: FullReadBatch
- property num_reads
- Number of reads. 
- classmethod simulate(branches: dict[str, str], batch: int, num_reads: int, formatter: Callable[[int, int], str] = format_read_name, **kwargs)
- Simulate a batch. 
 
- class seismicrna.relate.batch.RelateMutsBatch(*, region: Region, sanitize: bool = True, muts: dict[int, dict[int, list[int] | ndarray]], masked_read_nums: ndarray | list[int] | None = None, **kwargs)
- Bases: FullReadBatch, MutsBatch, ABC
- property read_weights
- Weights for each read when computing counts. 
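`RelateMutsBatch` stores mutations as `muts: dict[int, dict[int, list[int]]]`, that is, position to relationship code to read numbers. A minimal sketch of how per-read weights could enter per-position counts follows. It assumes that weights map read numbers to floats and that unlisted reads weigh 1; `weighted_counts` is hypothetical, and the relationship codes used in the example are arbitrary placeholders.

```python
from __future__ import annotations

from collections import defaultdict


def weighted_counts(
    muts: dict[int, dict[int, list[int]]],
    read_weights: dict[int, float] | None = None,
) -> dict[tuple[int, int], float]:
    """Sum read weights for every (position, relationship) pair (hypothetical)."""
    weights = read_weights or {}
    counts: dict[tuple[int, int], float] = defaultdict(float)
    for pos, rels in muts.items():
        for rel, read_nums in rels.items():
            for read_num in read_nums:
                # Assumption: reads without an explicit weight count as 1.
                counts[pos, rel] += weights.get(read_num, 1.0)
    return dict(counts)


# Placeholder relationship codes 1 and 2 at positions 10 and 12.
muts = {10: {1: [0, 2]}, 12: {2: [1]}}
print(weighted_counts(muts, {2: 0.5}))
```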
 
- class seismicrna.relate.batch.RelateRegionMutsBatch(*, region: Region, **kwargs)
- Bases: RelateMutsBatch, RegionMutsBatch
- classmethod simulate(ref: str, pmut: DataFrame, uniq_end5s: ndarray, uniq_end3s: ndarray, pends: ndarray, paired: bool, read_length: int, p_rev: float, min_mut_gap: int, num_reads: int, **kwargs)
- Simulate a batch.
- Parameters:
- ref (str) – Name of the reference.
- pmut (pd.DataFrame) – Rate of each type of mutation at each position.
- uniq_end5s (np.ndarray) – Unique read 5’ end coordinates.
- uniq_end3s (np.ndarray) – Unique read 3’ end coordinates.
- pends (np.ndarray) – Probability of each set of unique end coordinates.
- paired (bool) – Whether to simulate paired-end or single-end reads.
- read_length (int) – Length of each read segment (paired-end reads only).
- p_rev (float) – Probability that mate 1 is reversed (paired-end reads only).
- min_mut_gap (int) – Minimum number of positions between two mutations.
- num_reads (int) – Number of reads in the batch.
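A simplified sketch of the sampling that these parameters describe, for one single-end read, follows. Assumptions: `pmut` is reduced to one overall mutation rate per 1-based position (the real parameter is a DataFrame with one column per mutation type), end coordinates are inclusive, and reads that violate `min_mut_gap` are redrawn by rejection sampling. `too_close` and `simulate_read` are hypothetical names, and the paired-end options (`read_length`, `p_rev`) are omitted.

```python
from __future__ import annotations

import random


def too_close(positions: list[int], min_mut_gap: int) -> bool:
    """True if two consecutive sorted mutations have fewer than
    min_mut_gap positions between them."""
    return any(b - a - 1 < min_mut_gap for a, b in zip(positions, positions[1:]))


def simulate_read(
    pmut: dict[int, float],
    uniq_end5s: list[int],
    uniq_end3s: list[int],
    pends: list[float],
    min_mut_gap: int,
    rng: random.Random,
    max_tries: int = 1000,
) -> tuple[int, int, list[int]]:
    """Draw end coordinates and mutated positions for one read (hypothetical)."""
    for _ in range(max_tries):
        # Pick one pair of unique end coordinates, weighted by pends.
        i = rng.choices(range(len(pends)), weights=pends)[0]
        end5, end3 = uniq_end5s[i], uniq_end3s[i]
        # Mutate each covered position independently at its own rate.
        muts = [pos for pos in range(end5, end3 + 1)
                if rng.random() < pmut.get(pos, 0.0)]
        # Rejection sampling: redraw any read whose mutations are too close.
        if not too_close(muts, min_mut_gap):
            return end5, end3, muts
    raise RuntimeError("could not satisfy min_mut_gap within max_tries")
```

Rejection sampling keeps the sketch short but slows down sharply when mutation rates are high relative to `min_mut_gap`.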
 
 
 
- class seismicrna.relate.dataset.AverageDataset(report_file: str | Path, verify_times: bool = True)
- Dataset of population average data.
- property best_k
- Best number of clusters. 
 - property ks
- Numbers of clusters. 
 
- class seismicrna.relate.dataset.NamesDataset(report_file: str | Path, verify_times: bool = True)
- Bases: AverageDataset, ABC
- classmethod kind()
 
- class seismicrna.relate.dataset.PoolDataset(*args, **kwargs)
- Bases: RelateDataset, TallDataset, MutsDataset, MergedRegionDataset
- Load pooled batches of relationships.
- classmethod get_dataset_load_func()
- Function to load one constituent dataset. 
 - classmethod get_report_type()
- Type of report. 
 - property region
- Region of the dataset. 
 
- class seismicrna.relate.dataset.PoolReadNamesDataset(*args, **kwargs)
- Bases: NamesDataset, TallDataset
- Pooled Dataset of read names.
- classmethod get_dataset_load_func()
- Function to load one constituent dataset. 
 - classmethod get_report_type()
- Type of report. 
 
- class seismicrna.relate.dataset.ReadNamesDataset(report_file: str | Path, verify_times: bool = True)
- Bases: NamesDataset, LoadedDataset
- Dataset of read names from the Relate step.
- classmethod get_batch_type()
- Type of batch. 
 - classmethod get_report_type()
- Type of report. 
 - property pattern
- Pattern of mutations to count. 
 
- class seismicrna.relate.dataset.RelateDataset(report_file: str | Path, verify_times: bool = True)
- Bases: AverageDataset, ABC
- Dataset of relationships.
- class seismicrna.relate.dataset.RelateMutsDataset(report_file: str | Path, verify_times: bool = True)
- Bases: RelateDataset, LoadedDataset, MutsDataset
- Dataset of mutations from the Relate step.
- classmethod get_batch_type()
- Type of batch. 
 - classmethod get_report_type()
- Type of report. 
 - property paired
- Whether the reads are paired-end. 
 - property pattern
- Pattern of mutations to count. 
 - property refseq
- Sequence of the reference. 
 - property region
- Region of the dataset. 
 
- class seismicrna.relate.io.ReadNamesBatchIO(*, names: list[str] | ndarray, **kwargs)
- Bases: ReadNamesBatch, ReadBatchIO, RefBrickleIO, RelateIO
- classmethod get_file_seg_type()
- Type of the last segment in the path. 
 
- class seismicrna.relate.io.RefseqIO(*args, refseq: DNA, **kwargs)
- Bases: RefBrickleIO, RelateIO
- classmethod get_file_seg_type()
- Type of the last segment in the path. 
 - property refseq
 
- class seismicrna.relate.io.RelateBatchIO(*args, region: Region, **kwargs)
- Bases: RelateMutsBatch, MutsBatchIO, RefBrickleIO, RelateIO
- classmethod from_region_batch(batch: RelateRegionMutsBatch, *, sample: str, branches: dict[str, str])
- Create an instance from a RelateRegionMutsBatch. 
 - classmethod get_file_seg_type()
- Type of the last segment in the path. 
 
- class seismicrna.relate.io.RelateFile
- Bases: HasRefFilePath, ABC
- classmethod get_step()
- Step of the workflow. 
 
- class seismicrna.relate.io.RelateIO
- Bases: RelateFile, RefFileIO, ABC
- seismicrna.relate.io.from_reads(reads: Iterable[tuple[str, tuple[tuple[list[int], list[int]], dict[int, int]]]], *, sample: str, branches: dict[str, str], ref: str, refseq: DNA, batch: int, write_read_names: bool, drop_empty_reads: bool = True)
- Gather reads into a batch of relationships. 
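The type of `reads` suggests that each item pairs a read name with that read's segment 5' and 3' ends and a mapping from position to relationship code. Under that assumption, the sketch below shows one way to gather such items into the `muts` layout used by `RelateMutsBatch` (position to relationship code to read numbers), treating a read with no segments as empty. `gather_reads` is a hypothetical helper, not the body of `from_reads`, and the codes in the example are placeholders.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Iterable

# Assumed item layout: (name, ((segment 5' ends, segment 3' ends), {pos: rel}))
ReadRecord = tuple[str, tuple[tuple[list[int], list[int]], dict[int, int]]]


def gather_reads(reads: Iterable[ReadRecord], drop_empty_reads: bool = True):
    """Collect per-read relationships into batch-level structures (hypothetical)."""
    names: list[str] = []
    seg_end5s: list[list[int]] = []
    seg_end3s: list[list[int]] = []
    muts: dict[int, dict[int, list[int]]] = defaultdict(lambda: defaultdict(list))
    for name, ((end5s, end3s), rels) in reads:
        # Assumption: a read with no segments counts as empty.
        if drop_empty_reads and not end5s:
            continue
        read_num = len(names)  # kept reads are numbered consecutively from 0
        names.append(name)
        seg_end5s.append(end5s)
        seg_end3s.append(end3s)
        for pos, rel in rels.items():
            muts[pos][rel].append(read_num)
    # Convert the nested defaultdicts to plain dicts.
    return names, seg_end5s, seg_end3s, {pos: dict(r) for pos, r in muts.items()}
```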
- class seismicrna.relate.lists.RelateList(*, sample: str, branches: Iterable[str], ref: str, data: DataFrame, **kwargs)
- Bases: List, RelateFile, ABC
- class seismicrna.relate.lists.RelatePositionList(*, sample: str, branches: Iterable[str], ref: str, data: DataFrame, **kwargs)
- Bases: PositionList, RelateList
- classmethod get_table_type()
- Type of table that this type of list can process. 
 
- seismicrna.relate.main.check_duplicates(xam_files: list[Path])
- Check if any combination of sample, reference, and branches occurs more than once. 
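The check amounts to counting keys. The real function receives XAM paths and derives the sample, reference, and branches from them in a way not documented here, so this sketch takes those keys directly. Raising `ValueError` on a duplicate is an assumption, and both helpers are hypothetical.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical key: (sample, reference, branches); branches is a tuple so it is hashable.
Key = tuple[str, str, tuple[str, ...]]


def find_duplicates(keys: list[Key]) -> list[Key]:
    """Return every key that occurs more than once, sorted for stable output."""
    counts = Counter(keys)
    return sorted(key for key, n in counts.items() if n > 1)


def check_duplicates_sketch(keys: list[Key]) -> None:
    """Raise if any (sample, reference, branches) combination repeats."""
    duplicates = find_duplicates(keys)
    if duplicates:
        raise ValueError(f"Duplicate sample/reference/branches: {duplicates}")
```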
- seismicrna.relate.main.run(fasta: str | Path, input_path: Iterable[str | Path], *, out_dir: str | Path = './out', branch: str = '', min_reads: int = 1000, min_mapq: int = 25, phred_enc: int = 33, min_phred: int = 25, batch_size: int = 65536, insert3: bool = True, ambindel: bool = True, overhangs: bool = True, clip_end5: int = 4, clip_end3: int = 4, sep_strands: bool = False, rev_label: str = '-rev', write_read_names: bool = False, relate_pos_table: bool = True, relate_read_table: bool = False, relate_cx: bool = True, num_cpus: int = 4, brotli_level: int = 10, force: bool = False, keep_tmp: bool = False, tmp_pfx='./tmp')
- Compute relationships between references and aligned reads.
- Parameters:
- out_dir (str | pathlib.Path) – Write all output files to this directory [keyword-only, default: ‘./out’]
- branch (str) – Create a new branch of the workflow with this name [keyword-only, default: ‘’]
- min_reads (int) – Discard alignment maps with fewer than this many reads [keyword-only, default: 1000]
- min_mapq (int) – Discard reads with mapping qualities below this threshold [keyword-only, default: 25]
- phred_enc (int) – Specify the Phred score encoding of FASTQ and SAM/BAM/CRAM files [keyword-only, default: 33]
- min_phred (int) – Mark base calls with Phred scores lower than this threshold as ambiguous [keyword-only, default: 25]
- batch_size (int) – Limit batches to at most this many reads [keyword-only, default: 65536]
- insert3 (bool) – Mark each insertion on the base to its 3’ (True) or 5’ (False) side [keyword-only, default: True]
- ambindel (bool) – Mark all ambiguous insertions and deletions (indels) [keyword-only, default: True]
- overhangs (bool) – Retain the overhangs of paired-end mates that dovetail [keyword-only, default: True]
- clip_end5 (int) – Clip this many bases from the 5’ end of each read [keyword-only, default: 4]
- clip_end3 (int) – Clip this many bases from the 3’ end of each read [keyword-only, default: 4]
- sep_strands (bool) – Separate each alignment map into forward- and reverse-strand reads [keyword-only, default: False]
- rev_label (str) – With --sep-strands, add this label to each reverse-strand reference [keyword-only, default: ‘-rev’]
- write_read_names (bool) – Write the name of each read in a second set of batches (necessary for the options --mask-read or --mask-read-file) [keyword-only, default: False]
- relate_pos_table (bool) – Tabulate relationships per position for relate data [keyword-only, default: True]
- relate_read_table (bool) – Tabulate relationships per read for relate data [keyword-only, default: False]
- relate_cx (bool) – Use a fast (C extension module) version of the relate algorithm; the slow (Python) version remains available as a fallback when the C extension cannot be loaded, and for debugging or benchmarking [keyword-only, default: True]
- num_cpus (int) – Use up to this many CPUs simultaneously [keyword-only, default: 4]
- brotli_level (int) – Compress pickle files with this level of Brotli (0-11) [keyword-only, default: 10]
- force (bool) – Force all tasks to run, overwriting any existing output files [keyword-only, default: False]
- keep_tmp (bool) – Keep temporary files after finishing [keyword-only, default: False]
- tmp_pfx – Write all temporary files to a directory with this prefix [keyword-only, default: ‘./tmp’]
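Several of these options act on individual reads. The sketch below shows the plain-string meaning of `phred_enc`, `min_phred`, `clip_end5`, and `clip_end3`: a quality character's Phred score is its ASCII code minus `phred_enc`, and clipping removes a fixed number of bases from each end. It illustrates parameter semantics only; `ambiguous_positions` and `clip_read` are hypothetical and do not reflect how SEISMIC-RNA stores or applies these settings internally.

```python
def ambiguous_positions(qual: str, phred_enc: int = 33, min_phred: int = 25) -> list[int]:
    """0-based indexes of base calls whose Phred score is below min_phred."""
    return [i for i, char in enumerate(qual) if ord(char) - phred_enc < min_phred]


def clip_read(seq: str, clip_end5: int = 4, clip_end3: int = 4) -> str:
    """Drop clip_end5 bases from the 5' end and clip_end3 bases from the 3' end."""
    # max(..., 0) keeps an over-long clip from wrapping around as a negative index.
    return seq[clip_end5:max(len(seq) - clip_end3, 0)]


# 'I' encodes Phred 40 and '#' encodes Phred 2 with phred_enc=33.
print(ambiguous_positions("II#I"))  # [2]
print(clip_read("AAAACGTTTTT"))     # CGT
```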
 
 
- class seismicrna.relate.report.BaseRelateReport(**kwargs: Any | Callable[[Report], Any])
- Bases: RefReport, RelateIO, ABC
- classmethod get_file_seg_type()
- Type of the last segment in the path. 
 
- class seismicrna.relate.report.PoolReport(**kwargs: Any | Callable[[Report], Any])
- Bases: BaseRelateReport
- classmethod get_param_report_fields()
- Parameter fields of the report. 
 
- class seismicrna.relate.report.RelateReport(**kwargs: Any | Callable[[Report], Any])
- Bases: BatchedReport, BaseRelateReport
- classmethod get_checksum_report_fields()
- Checksum fields of the report. 
 - classmethod get_param_report_fields()
- Parameter fields of the report. 
 - classmethod get_result_report_fields()
- Result fields of the report. 
 
- class seismicrna.relate.sam.XamViewer(xam_input: Path, tmp_dir: Path, branch: str, batch_size: int, num_cpus: int = 1)
- Bases: object
- property ancestors
 - property branches
 - create_tmp_sam()
- Create the temporary SAM file. 
 - delete_tmp_sam()
- Delete the temporary SAM file. 
 - property flagstats
 - property indexes
 - property n_reads
- Total number of reads. 
 - property paired
- Whether the reads are paired. 
 - property ref
 - property sample
 - property tmp_sam_path
- Get the path to the temporary SAM file. 
 
- seismicrna.relate.sam.get_line_attrs(line: str) → tuple[str, bool, bool]
- Read attributes from a line in a SAM file. 
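The return type suggests three attributes per line. The sketch assumes they are the read name (QNAME, field 1) and two FLAG bits defined by the SAM specification: 0x1 (the template has multiple segments, i.e. the read is paired) and 0x2 (each segment is properly aligned). That interpretation and the helper `parse_line_attrs` are assumptions, not the library's code.

```python
FLAG_PAIRED = 0x1  # SAM spec: template having multiple segments in sequencing
FLAG_PROPER = 0x2  # SAM spec: each segment properly aligned according to the aligner


def parse_line_attrs(line: str) -> tuple[str, bool, bool]:
    """Return (read name, paired, proper) from one SAM alignment line (hypothetical)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        raise ValueError(f"SAM line has fewer than 2 fields: {line!r}")
    name, flag = fields[0], int(fields[1])
    return name, bool(flag & FLAG_PAIRED), bool(flag & FLAG_PROPER)


# FLAG 99 = 64 + 32 + 2 + 1, so the read is paired and properly paired.
print(parse_line_attrs("read1\t99\tref\t1\t42\t5M\t=\t10\t14\tACGTA\tIIIII\n"))
```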
- seismicrna.relate.sam.tmp_xam_cmd(xam_in: Path, xam_out: Path, paired: bool, num_cpus: int = 1)
- Collate and create a temporary XAM file. 
- seismicrna.relate.sim.simulate_batch(sample: str, branches: dict[str, str], ref: str, batch: int, write_read_names: bool, pmut: DataFrame, uniq_end5s: ndarray, uniq_end3s: ndarray, pends: ndarray, paired: bool, read_length: int, p_rev: float, min_mut_gap: int, num_reads: int, formatter: Callable[[int, int], str] = format_read_name)
- Simulate a pair of RelateBatchIO and ReadNamesBatchIO. 
- seismicrna.relate.sim.simulate_batches(batch_size: int, pmut: DataFrame, pclust: Series, num_reads: int, **kwargs)
- seismicrna.relate.sim.simulate_cluster(first_batch: int, batch_size: int, num_reads: int, **kwargs)
- Simulate all batches for one cluster. 
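`simulate_batches` takes cluster proportions (`pclust`) and a total `num_reads`, and `simulate_cluster` starts at `first_batch`, which suggests that reads are divided among clusters and that each cluster's batches are numbered after the previous cluster's. The sketch below assumes exactly that, using largest-remainder rounding so the per-cluster counts sum to `num_reads`. Plain lists stand in for the pandas Series, and `split_reads` and `plan_batches` are hypothetical.

```python
from __future__ import annotations

import math


def split_reads(num_reads: int, pclust: list[float]) -> list[int]:
    """Largest-remainder rounding of num_reads * pclust so the total is exact."""
    total = sum(pclust)
    raw = [num_reads * p / total for p in pclust]
    counts = [math.floor(x) for x in raw]
    # Give leftover reads to the clusters with the largest fractional parts.
    leftover = num_reads - sum(counts)
    order = sorted(range(len(raw)), key=lambda k: raw[k] - counts[k], reverse=True)
    for k in order[:leftover]:
        counts[k] += 1
    return counts


def plan_batches(num_reads: int, pclust: list[float], batch_size: int):
    """Yield (cluster, batch number, reads in batch); no batch mixes clusters."""
    first_batch = 0
    for cluster, n in enumerate(split_reads(num_reads, pclust)):
        num_batches = math.ceil(n / batch_size)
        for offset in range(num_batches):
            size = min(batch_size, n - offset * batch_size)
            yield cluster, first_batch + offset, size
        first_batch += num_batches
```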
- seismicrna.relate.sim.simulate_relate(*, out_dir: Path, tmp_dir: Path, branch: str, sample: str, ref: str, refseq: DNA, batch_size: int, num_reads: int, write_read_names: bool, pmut: DataFrame, uniq_end5s: ndarray, uniq_end3s: ndarray, pends: ndarray, pclust: Series, brotli_level: int, force: bool, **kwargs)
- Simulate an entire relate step. 
- seismicrna.relate.strands.generate_both_strands(ref: str, seq: DNA, rev_label: str)
- Yield both the forward and reverse strand for each sequence. 
- seismicrna.relate.strands.write_both_strands(fasta_in: Path, fasta_out: Path, rev_label: str)
- Write a FASTA file of both forward and reverse strands. 
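A minimal sketch of what these two functions describe follows, assuming that the reverse strand is the reverse complement of the forward sequence and that `rev_label` is appended to the reference name (as the default `'-rev'` suggests). Plain strings stand in for the `DNA` type, and all three helpers are hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse to read the other strand 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def both_strands(ref: str, seq: str, rev_label: str) -> Iterator[tuple[str, str]]:
    """Yield the forward strand, then the reverse strand renamed with rev_label."""
    yield ref, seq
    yield f"{ref}{rev_label}", reverse_complement(seq)


def fasta_text(records: Iterable[tuple[str, str]]) -> str:
    """Format (name, sequence) pairs as FASTA text, one sequence per line."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


print(fasta_text(both_strands("ref", "ACGTT", "-rev")), end="")
```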
- class seismicrna.relate.table.AverageTable
- Bases: RelTypeTable, ABC
- Average over an ensemble of RNA structures.
- classmethod get_header_type()
- Type of the header for the table. 
 
- class seismicrna.relate.table.AverageTabulator(*, top: Path, branches: dict[str, str], sample: str, region: Region, count_ends: bool, count_pos: bool, count_read: bool, validate: bool = True)
- property data_per_clust
- Series of per-cluster data (or None if no clusters). 
 
- class seismicrna.relate.table.FullTabulator(*, ref: str, refseq: DNA, count_ends: bool = False, **kwargs)
- classmethod get_null_value()
- The null value for a count: either 0 or NaN. 
 
- class seismicrna.relate.table.RelateBatchTabulator(*, get_batch_count_all: Callable, num_batches: int, num_cpus: int = 1, **kwargs)
- Bases: BatchTabulator, RelateTabulator
- class seismicrna.relate.table.RelateCountTabulator(*, batch_counts: Iterable[tuple[Any, Any, Any, Any]], **kwargs)
- Bases: CountTabulator, RelateTabulator
- class seismicrna.relate.table.RelateDatasetTabulator(*, dataset: MutsDataset, validate: bool = False, **kwargs)
- Bases: DatasetTabulator, RelateTabulator
- classmethod init_kws()
- Attributes of the dataset to use as keyword arguments in super().__init__(). 
 
- class seismicrna.relate.table.RelatePositionTable
- Bases: RelateTable, PositionTable, ABC
- class seismicrna.relate.table.RelatePositionTableLoader(table_file: str | Path, **kwargs)
- Bases: PositionTableLoader, RelatePositionTable
- Load relate data indexed by position.
- class seismicrna.relate.table.RelateReadTable
- Bases: RelateTable, ReadTable, ABC
- class seismicrna.relate.table.RelateReadTableLoader(table_file: str | Path, **kwargs)
- Bases: ReadTableLoader, RelateReadTable
- Load relate data indexed by read.
- class seismicrna.relate.table.RelateReadTableWriter(tabulator: Tabulator)
- Bases: ReadTableWriter, RelateReadTable
- class seismicrna.relate.table.RelateTable
- Bases: AverageTable, RelateFile, ABC
- classmethod get_load_function()
- LoadFunction for all Dataset types for this Table. 
 
- class seismicrna.relate.table.RelateTabulator(*, ref: str, refseq: DNA, count_ends: bool = False, **kwargs)
- Bases: FullTabulator, AverageTabulator, ABC
- classmethod table_types()
- Types of tables that this tabulator can write. 
 
- class seismicrna.relate.write.RelationWriter(xam_view: XamViewer, fasta_file: str | Path)
- Bases: object
- Compute and write relationships for all reads from one sample aligned to one reference sequence.
- property branches
 - property num_reads
 - property ref
 - property refseq
 - property sample
 - write(*, out_dir: Path, release_dir: Path, min_mapq: int, min_reads: int, min_phred: int, phred_enc: int, insert3: bool, ambindel: bool, overhangs: bool, clip_end5: int, clip_end3: int, relate_pos_table: bool, relate_read_table: bool, brotli_level: int, force: bool, num_cpus: int, **kwargs)
- Compute relationships for every record in a XAM file. 
 
- seismicrna.relate.write.generate_batch(batch: int, *, xam_view: XamViewer, top: Path, refseq: DNA, brotli_level: int, count_pos: bool, count_read: bool, write_read_names: bool, **kwargs)
- Compute relationships for every SAM record in one batch.
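`tmp_xam_cmd` collates the alignment map, which keeps both mates of a pair on adjacent lines. Under that assumption, the sketch below groups SAM lines by read name and cuts the stream into batches of at most `batch_size` reads. Counting a mate pair as one read and numbering batches from 0 are assumptions, and `iter_batches` is a hypothetical helper rather than the logic of `generate_batch`.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator
from itertools import groupby, islice


def read_name(line: str) -> str:
    """QNAME is the first tab-separated field of a SAM alignment line."""
    return line.split("\t", 1)[0]


def iter_batches(
    lines: Iterable[str], batch_size: int
) -> Iterator[tuple[int, list[list[str]]]]:
    """Yield (batch number, reads), where each read is a list of its SAM lines."""
    if batch_size < 1:
        raise ValueError(f"batch_size must be at least 1, but got {batch_size}")
    # Skip header lines, which start with "@".
    records = (line for line in lines if not line.startswith("@"))
    # Assumes a collated file, so both mates of a pair are adjacent.
    reads = (list(group) for _, group in groupby(records, key=read_name))
    batch = 0
    while chunk := list(islice(reads, batch_size)):
        yield batch, chunk
        batch += 1
```

Grouping before batching guarantees that the two mates of a pair never land in different batches.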