seismicrna.relate package
Subpackages
- seismicrna.relate.aux package
- seismicrna.relate.cx package
- seismicrna.relate.py package
- Subpackages
- Submodules
- seismicrna.relate.tests package
- Submodules
TestFromReads
TestFromReads.test_from_0_reads()
TestFromReads.test_from_1_read_0_segs_drop_empty()
TestFromReads.test_from_1_read_0_segs_keep_empty()
TestFromReads.test_from_1_read_1_segs_no_cover_drop_empty()
TestFromReads.test_from_1_read_1_segs_no_cover_keep_empty()
TestFromReads.test_from_1_read_1_segs_no_muts()
TestFromReads.test_from_2_reads_1_2_segs()
TestFromReads.test_from_2_reads_1_segs()
TestFromReads.test_from_2_reads_2_1_segs()
TestFromReads.test_from_2_reads_2_segs()
TestFromReads.test_from_4_reads_varied_segs_drop_empty()
TestFromReads.test_from_4_reads_varied_segs_keep_empty()
TestRelate
TestRelateEmpty
TestRelatePaired
TestRelatePaired.get_sam_data()
TestRelatePaired.test_batch_size_1()
TestRelatePaired.test_batch_size_4()
TestRelatePaired.test_batch_size_5()
TestRelatePaired.test_batch_size_6()
TestRelatePaired.test_clip()
TestRelatePaired.test_min_phred()
TestRelatePaired.test_min_reads()
TestRelatePaired.test_noargs()
TestRelateSingle
extract_batches()
load_refseq()
write_fasta_file()
write_sam_file()
TestCalcRelsLinesPaired
TestCalcRelsLinesPaired.evaluate()
TestCalcRelsLinesPaired.relate()
TestCalcRelsLinesPaired.relate_error()
TestCalcRelsLinesPaired.test_abut()
TestCalcRelsLinesPaired.test_abut_int()
TestCalcRelsLinesPaired.test_contain()
TestCalcRelsLinesPaired.test_contain_con_int()
TestCalcRelsLinesPaired.test_contain_flush3()
TestCalcRelsLinesPaired.test_contain_flush3_con_int()
TestCalcRelsLinesPaired.test_contain_flush3_inc_int()
TestCalcRelsLinesPaired.test_contain_flush5()
TestCalcRelsLinesPaired.test_contain_flush53()
TestCalcRelsLinesPaired.test_contain_flush53_con_int()
TestCalcRelsLinesPaired.test_contain_flush53_inc_int()
TestCalcRelsLinesPaired.test_contain_flush5_con_int()
TestCalcRelsLinesPaired.test_contain_flush5_inc_int()
TestCalcRelsLinesPaired.test_contain_inc_int()
TestCalcRelsLinesPaired.test_diff_names()
TestCalcRelsLinesPaired.test_gap()
TestCalcRelsLinesPaired.test_gap_int()
TestCalcRelsLinesPaired.test_improper()
TestCalcRelsLinesPaired.test_read_marks()
TestCalcRelsLinesPaired.test_read_orientation()
TestCalcRelsLinesPaired.test_staggered()
TestCalcRelsLinesPaired.test_staggered_con_int()
TestCalcRelsLinesPaired.test_staggered_inc_int()
TestCalcRelsLinesPaired.test_unpaired()
TestCalcRelsLinesSingle
TestCalcRelsLinesSingle.iter_cases()
TestCalcRelsLinesSingle.iter_cases_insert3()
TestCalcRelsLinesSingle.relate()
TestCalcRelsLinesSingle.relate_error()
TestCalcRelsLinesSingle.relate_truncated()
TestCalcRelsLinesSingle.test_4nt_2ins()
TestCalcRelsLinesSingle.test_4nt_2ins_paired()
TestCalcRelsLinesSingle.test_5nt_2ins()
TestCalcRelsLinesSingle.test_6nt_2ins()
TestCalcRelsLinesSingle.test_7nt_0ins()
TestCalcRelsLinesSingle.test_8nt_0ins()
TestCalcRelsLinesSingle.test_all_matches()
TestCalcRelsLinesSingle.test_ambig_delet_low_qual()
TestCalcRelsLinesSingle.test_error_cigar_adj_ins_del()
TestCalcRelsLinesSingle.test_error_cigar_adj_int_del()
TestCalcRelsLinesSingle.test_error_cigar_consecutive()
TestCalcRelsLinesSingle.test_error_cigar_del_first_rel()
TestCalcRelsLinesSingle.test_error_cigar_del_last_rel()
TestCalcRelsLinesSingle.test_error_cigar_empty()
TestCalcRelsLinesSingle.test_error_cigar_ins_first_rel()
TestCalcRelsLinesSingle.test_error_cigar_ins_last_rel()
TestCalcRelsLinesSingle.test_error_cigar_int_first_rel()
TestCalcRelsLinesSingle.test_error_cigar_int_last_rel()
TestCalcRelsLinesSingle.test_error_cigar_missing()
TestCalcRelsLinesSingle.test_error_cigar_op_read_diff()
TestCalcRelsLinesSingle.test_error_cigar_op_ref_long()
TestCalcRelsLinesSingle.test_error_cigar_op_ref_zero()
TestCalcRelsLinesSingle.test_error_cigar_parse()
TestCalcRelsLinesSingle.test_error_cigar_soft_clips()
TestCalcRelsLinesSingle.test_error_flag_large()
TestCalcRelsLinesSingle.test_error_flag_missing()
TestCalcRelsLinesSingle.test_error_flag_parse()
TestCalcRelsLinesSingle.test_error_line_improper_flag_proper()
TestCalcRelsLinesSingle.test_error_line_paired_flag_unpaired()
TestCalcRelsLinesSingle.test_error_line_unpaired_flag_paired()
TestCalcRelsLinesSingle.test_error_mapq()
TestCalcRelsLinesSingle.test_error_mapq_insufficient()
TestCalcRelsLinesSingle.test_error_mapq_missing()
TestCalcRelsLinesSingle.test_error_name_missing()
TestCalcRelsLinesSingle.test_error_pos_large()
TestCalcRelsLinesSingle.test_error_pos_missing()
TestCalcRelsLinesSingle.test_error_pos_parse()
TestCalcRelsLinesSingle.test_error_pos_zero()
TestCalcRelsLinesSingle.test_error_qual_missing()
TestCalcRelsLinesSingle.test_error_read_missing()
TestCalcRelsLinesSingle.test_error_read_qual_diff()
TestCalcRelsLinesSingle.test_error_ref_mismatch()
TestCalcRelsLinesSingle.test_error_ref_missing()
TestCalcRelsLinesSingle.test_example_1()
TestCalcRelsLinesSingle.test_example_2()
TestCalcRelsLinesSingle.test_example_3()
TestCalcRelsLinesSingle.test_example_4()
TestCalcRelsLinesSingle.test_long_ambindels()
TestCalcRelsLinesSingle.test_n_read()
TestCalcRelsLinesSingle.test_n_ref()
TestCalcRelsLinesSingle.test_soft_clips()
TestMergeMates
as_sam()
TestIterRecordsPaired
TestIterRecordsPaired.run_test_invalid()
TestIterRecordsPaired.run_test_valid()
TestIterRecordsPaired.test_blank()
TestIterRecordsPaired.test_one_improper()
TestIterRecordsPaired.test_one_proper()
TestIterRecordsPaired.test_one_single()
TestIterRecordsPaired.test_two_mated_improper()
TestIterRecordsPaired.test_two_mated_improper_1()
TestIterRecordsPaired.test_two_mated_improper_2()
TestIterRecordsPaired.test_two_mated_proper()
TestIterRecordsPaired.test_two_unmated_improper()
TestIterRecordsPaired.test_two_unmated_proper()
TestLineAttrs
delete_sam()
write_sam()
- Submodules
Submodules
- class seismicrna.relate.batch.FullReadBatch(*, batch: int, **kwargs)
- property max_read
Maximum possible value for a read index.
- property read_indexes
Map each read number to its index in self.read_nums.
- property read_nums
Read numbers.
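To illustrate how read_nums, max_read, and read_indexes relate, here is a minimal sketch. It assumes that read numbers are non-negative integers and that the index map is a dense array with one slot per possible read number. The helper build_read_indexes is hypothetical and is not part of seismicrna.

```python
import numpy as np


def build_read_indexes(read_nums: np.ndarray, max_read: int) -> np.ndarray:
    """Return an array mapping each read number to its position in read_nums.

    Hypothetical helper: slots for read numbers absent from read_nums hold -1.
    """
    read_nums = np.asarray(read_nums, dtype=int)
    # One slot per possible read number, from 0 to max_read inclusive.
    indexes = np.full(max_read + 1, -1, dtype=int)
    # Scatter the position of each read number into its slot.
    indexes[read_nums] = np.arange(read_nums.size)
    return indexes


read_nums = np.array([3, 7, 8])
indexes = build_read_indexes(read_nums, max_read=int(read_nums.max()))
print(indexes[7])  # 1
```

A dense array gives constant-time lookups at the cost of memory proportional to max_read, which is one plausible reason to expose max_read alongside read_nums.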
- class seismicrna.relate.batch.ReadNamesBatch(*, names: list[str] | ndarray, **kwargs)
Bases: FullReadBatch
- property num_reads
Number of reads.
- classmethod simulate(branches: dict[str, str], batch: int, num_reads: int, formatter: Callable[[int, int], str] = <function format_read_name>, **kwargs)
Simulate a batch.
- class seismicrna.relate.batch.RelateMutsBatch(*, region: Region, sanitize: bool = True, muts: dict[int, dict[int, list[int] | ndarray]], masked_read_nums: ndarray | list[int] | None = None, **kwargs)
Bases: FullReadBatch, MutsBatch, ABC
- property read_weights
Weights for each read when computing counts.
- class seismicrna.relate.batch.RelateRegionMutsBatch(*, region: Region, **kwargs)
Bases: RelateMutsBatch, RegionMutsBatch
- classmethod simulate(ref: str, pmut: DataFrame, uniq_end5s: ndarray, uniq_end3s: ndarray, pends: ndarray, paired: bool, read_length: int, p_rev: float, min_mut_gap: int, num_reads: int, **kwargs)
Simulate a batch.
- Parameters:
ref (str) – Name of the reference.
pmut (pd.DataFrame) – Rate of each type of mutation at each position.
uniq_end5s (np.ndarray) – Unique read 5’ end coordinates.
uniq_end3s (np.ndarray) – Unique read 3’ end coordinates.
pends (np.ndarray) – Probability of each set of unique end coordinates.
paired (bool) – Whether to simulate paired-end or single-end reads.
read_length (int) – Length of each read segment (paired-end reads only).
p_rev (float) – Probability that mate 1 is reversed (paired-end reads only).
min_mut_gap (int) – Minimum number of positions between two mutations.
num_reads (int) – Number of reads in the batch.
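The parameters uniq_end5s, uniq_end3s, and pends together describe a categorical distribution over read end coordinates. As a sketch, assuming each read independently draws one (5’ end, 3’ end) pair with probability pends[i], the ends for num_reads reads could be sampled as follows. The helper sample_ends is hypothetical and does not reproduce the rest of simulate (mutations, mates, min_mut_gap).

```python
import numpy as np


def sample_ends(uniq_end5s, uniq_end3s, pends, num_reads, seed=None):
    """Draw 5' and 3' end coordinates for num_reads reads.

    Hypothetical helper: each read picks pair i with probability pends[i].
    """
    uniq_end5s = np.asarray(uniq_end5s)
    uniq_end3s = np.asarray(uniq_end3s)
    pends = np.asarray(pends, dtype=float)
    if not (uniq_end5s.shape == uniq_end3s.shape == pends.shape):
        raise ValueError("uniq_end5s, uniq_end3s, and pends must have equal shapes")
    if not np.isclose(pends.sum(), 1.0):
        raise ValueError("pends must sum to 1")
    rng = np.random.default_rng(seed)
    # Choose the index of one unique pair of ends for every read.
    choices = rng.choice(pends.size, size=num_reads, p=pends)
    return uniq_end5s[choices], uniq_end3s[choices]


end5s, end3s = sample_ends([1, 1, 11], [40, 60, 60], [0.5, 0.3, 0.2],
                           num_reads=1000, seed=0)
```

Sampling indexes rather than coordinates keeps each 5’ end paired with its own 3’ end.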
- class seismicrna.relate.dataset.AverageDataset(report_file: str | Path, verify_times: bool = True)
Dataset of population average data.
- property best_k
Best number of clusters.
- property ks
Numbers of clusters.
- class seismicrna.relate.dataset.NamesDataset(report_file: str | Path, verify_times: bool = True)
Bases: AverageDataset, ABC
- classmethod kind()
- class seismicrna.relate.dataset.PoolDataset(*args, **kwargs)
Bases: RelateDataset, TallDataset, MutsDataset, MergedRegionDataset
Load pooled batches of relationships.
- classmethod get_dataset_load_func()
Function to load one constituent dataset.
- classmethod get_report_type()
Type of report.
- property region
Region of the dataset.
- class seismicrna.relate.dataset.PoolReadNamesDataset(*args, **kwargs)
Bases: NamesDataset, TallDataset
Pooled Dataset of read names.
- classmethod get_dataset_load_func()
Function to load one constituent dataset.
- classmethod get_report_type()
Type of report.
- class seismicrna.relate.dataset.ReadNamesDataset(report_file: str | Path, verify_times: bool = True)
Bases: NamesDataset, LoadedDataset
Dataset of read names from the Relate step.
- classmethod get_batch_type()
Type of batch.
- classmethod get_report_type()
Type of report.
- property pattern
Pattern of mutations to count.
- class seismicrna.relate.dataset.RelateDataset(report_file: str | Path, verify_times: bool = True)
Bases: AverageDataset, ABC
Dataset of relationships.
- class seismicrna.relate.dataset.RelateMutsDataset(report_file: str | Path, verify_times: bool = True)
Bases: RelateDataset, LoadedDataset, MutsDataset
Dataset of mutations from the Relate step.
- classmethod get_batch_type()
Type of batch.
- classmethod get_report_type()
Type of report.
- property paired
Whether the reads are paired-end.
- property pattern
Pattern of mutations to count.
- property refseq
Sequence of the reference.
- property region
Region of the dataset.
- class seismicrna.relate.io.ReadNamesBatchIO(*, names: list[str] | ndarray, **kwargs)
Bases: ReadNamesBatch, ReadBatchIO, RefBrickleIO, RelateIO
- classmethod get_file_seg_type()
Type of the last segment in the path.
- class seismicrna.relate.io.RefseqIO(*args, refseq: DNA, **kwargs)
Bases: RefBrickleIO, RelateIO
- classmethod get_file_seg_type()
Type of the last segment in the path.
- property refseq
- class seismicrna.relate.io.RelateBatchIO(*args, region: Region, **kwargs)
Bases: RelateMutsBatch, MutsBatchIO, RefBrickleIO, RelateIO
- classmethod from_region_batch(batch: RelateRegionMutsBatch, *, sample: str, branches: dict[str, str])
Create an instance from a RelateRegionMutsBatch.
- classmethod get_file_seg_type()
Type of the last segment in the path.
- class seismicrna.relate.io.RelateFile
Bases: HasRefFilePath, ABC
- classmethod get_step()
Step of the workflow.
- class seismicrna.relate.io.RelateIO
Bases: RelateFile, RefFileIO, ABC
- seismicrna.relate.io.from_reads(reads: Iterable[tuple[str, tuple[tuple[list[int], list[int]], dict[int, int]]]], *, sample: str, branches: dict[str, str], ref: str, refseq: DNA, batch: int, write_read_names: bool, drop_empty_reads: bool = True)
Gather reads into a batch of relationships.
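Each element of reads pairs a read name with its segment ends and its relationships. The sketch below mirrors that annotation and shows one way drop_empty_reads could behave, assuming a read counts as empty when none of its segments covers any position (no segments at all, or every segment has its 5’ end past its 3’ end). The helpers covers_any_position and drop_empty are hypothetical, and relationship code 64 is a placeholder.

```python
from typing import Iterable, Iterator

# Hypothetical alias mirroring the annotation of from_reads:
# (read name, ((5' ends, 3' ends), {position: relationship code}))
ReadRecord = tuple[str, tuple[tuple[list[int], list[int]], dict[int, int]]]


def covers_any_position(end5s: list[int], end3s: list[int]) -> bool:
    """Return True if at least one segment spans one or more positions."""
    if len(end5s) != len(end3s):
        raise ValueError("end5s and end3s must have one entry per segment")
    return any(end5 <= end3 for end5, end3 in zip(end5s, end3s))


def drop_empty(reads: Iterable[ReadRecord]) -> Iterator[ReadRecord]:
    """Yield only the reads that cover at least one position."""
    for name, ((end5s, end3s), rels) in reads:
        if covers_any_position(end5s, end3s):
            yield name, ((end5s, end3s), rels)


reads = [
    ("read0", (([], []), {})),                # no segments
    ("read1", (([10], [9]), {})),             # one segment, no coverage
    ("read2", (([1], [50]), {17: 64})),       # one segment, placeholder code
    ("read3", (([1, 30], [40, 70]), {})),     # two segments (mates)
]
print([name for name, _ in drop_empty(reads)])  # ['read2', 'read3']
```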
- class seismicrna.relate.lists.RelateList(*, sample: str, branches: Iterable[str], ref: str, data: DataFrame, **kwargs)
Bases: List, RelateFile, ABC
- class seismicrna.relate.lists.RelatePositionList(*, sample: str, branches: Iterable[str], ref: str, data: DataFrame, **kwargs)
Bases: PositionList, RelateList
- classmethod get_table_type()
Type of table that this type of list can process.
- seismicrna.relate.main.check_duplicates(xam_files: list[Path])
Check if any combination of sample, reference, and branches occurs more than once.
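The duplicate check amounts to counting composite keys. This sketch assumes the sample, reference, and branches have already been extracted from each XAM path (that step is not described here) and represents branches as a tuple so that each key is hashable. The helper find_duplicates is hypothetical; it only reports duplicates and makes no claim about how check_duplicates signals them.

```python
from collections import Counter


def find_duplicates(keys):
    """Return the (sample, ref, branches) keys that occur more than once.

    Hypothetical helper; branches must be a tuple so each key is hashable.
    """
    counts = Counter(keys)
    return sorted(key for key, count in counts.items() if count > 1)


keys = [
    ("sampleA", "ref1", ()),
    ("sampleA", "ref2", ()),
    ("sampleA", "ref1", ()),            # same sample, reference, and branches
    ("sampleA", "ref1", ("branch1",)),  # differs only in branches
]
print(find_duplicates(keys))  # [('sampleA', 'ref1', ())]
```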
- seismicrna.relate.main.run(fasta: str | Path, input_path: Iterable[str | Path], *, out_dir: str | Path = './out', branch: str = '', min_reads: int = 1000, min_mapq: int = 25, phred_enc: int = 33, min_phred: int = 25, batch_size: int = 65536, insert3: bool = True, ambindel: bool = True, overhangs: bool = True, clip_end5: int = 4, clip_end3: int = 4, sep_strands: bool = False, rev_label: str = '-rev', write_read_names: bool = False, relate_pos_table: bool = True, relate_read_table: bool = False, relate_cx: bool = True, num_cpus: int = 4, brotli_level: int = 10, force: bool = False, keep_tmp: bool = False, tmp_pfx='./tmp')
Compute relationships between references and aligned reads.
- Parameters:
out_dir (str | pathlib.Path) – Write all output files to this directory [keyword-only, default: ‘./out’]
branch (str) – Create a new branch of the workflow with this name [keyword-only, default: ‘’]
min_reads (int) – Discard alignment maps with fewer than this many reads [keyword-only, default: 1000]
min_mapq (int) – Discard reads with mapping qualities below this threshold [keyword-only, default: 25]
phred_enc (int) – Specify the Phred score encoding of FASTQ and SAM/BAM/CRAM files [keyword-only, default: 33]
min_phred (int) – Mark base calls with Phred scores lower than this threshold as ambiguous [keyword-only, default: 25]
batch_size (int) – Limit batches to at most this many reads [keyword-only, default: 65536]
insert3 (bool) – Mark each insertion on the base to its 3’ (True) or 5’ (False) side [keyword-only, default: True]
ambindel (bool) – Mark all ambiguous insertions and deletions (indels) [keyword-only, default: True]
overhangs (bool) – Retain the overhangs of paired-end mates that dovetail [keyword-only, default: True]
clip_end5 (int) – Clip this many bases from the 5’ end of each read [keyword-only, default: 4]
clip_end3 (int) – Clip this many bases from the 3’ end of each read [keyword-only, default: 4]
sep_strands (bool) – Separate each alignment map into forward- and reverse-strand reads [keyword-only, default: False]
rev_label (str) – With --sep-strands, add this label to each reverse-strand reference [keyword-only, default: ‘-rev’]
write_read_names (bool) – Write the name of each read in a second set of batches (necessary for the options --mask-read or --mask-read-file) [keyword-only, default: False]
relate_pos_table (bool) – Tabulate relationships per position for relate data [keyword-only, default: True]
relate_read_table (bool) – Tabulate relationships per read for relate data [keyword-only, default: False]
relate_cx (bool) – Use a fast (C extension module) version of the relate algorithm; the slow (Python) version is still available as a fallback if the C extension cannot be loaded, and for debugging/benchmarking [keyword-only, default: True]
num_cpus (int) – Use up to this many CPUs simultaneously [keyword-only, default: 4]
brotli_level (int) – Compress pickle files with this level of Brotli (0-11) [keyword-only, default: 10]
force (bool) – Force all tasks to run, overwriting any existing output files [keyword-only, default: False]
keep_tmp (bool) – Keep temporary files after finishing [keyword-only, default: False]
tmp_pfx – Write all temporary files to a directory with this prefix [keyword-only, default: ‘./tmp’]
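The phred_enc and min_phred parameters interact through the standard offset encoding used by FASTQ and SAM, in which each quality character stores its Phred score as its ASCII code minus the offset. A minimal sketch of the threshold test follows. The function ambiguous_positions is hypothetical and does not show how relate actually records ambiguous base calls.

```python
def ambiguous_positions(quals: str, phred_enc: int = 33, min_phred: int = 25) -> list[int]:
    """Return 0-based indexes of base calls whose Phred score is below min_phred.

    Each quality character encodes ord(char) - phred_enc as its Phred score.
    """
    scores = [ord(char) - phred_enc for char in quals]
    return [i for i, score in enumerate(scores) if score < min_phred]


# With the default Phred+33 encoding: 'I' = 40, '5' = 20, '#' = 2, ':' = 25.
print(ambiguous_positions("II5#:I"))  # [2, 3]
```

A score equal to min_phred (':' above) is not marked ambiguous, because the documented rule applies only to scores strictly lower than the threshold.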
- class seismicrna.relate.report.BaseRelateReport(**kwargs: Any | Callable[[Report], Any])
Bases: RefReport, RelateIO, ABC
- classmethod get_file_seg_type()
Type of the last segment in the path.
- class seismicrna.relate.report.PoolReport(**kwargs: Any | Callable[[Report], Any])
Bases: BaseRelateReport
- classmethod get_param_report_fields()
Parameter fields of the report.
- class seismicrna.relate.report.RelateReport(**kwargs: Any | Callable[[Report], Any])
Bases: BatchedReport, BaseRelateReport
- classmethod get_checksum_report_fields()
Checksum fields of the report.
- classmethod get_param_report_fields()
Parameter fields of the report.
- classmethod get_result_report_fields()
Result fields of the report.
- class seismicrna.relate.sam.XamViewer(xam_input: Path, tmp_dir: Path, branch: str, batch_size: int, num_cpus: int = 1)
Bases: object
- property ancestors
- property branches
- create_tmp_sam()
Create the temporary SAM file.
- delete_tmp_sam()
Delete the temporary SAM file.
- property flagstats
- property indexes
- property n_reads
Total number of reads.
- property paired
Whether the reads are paired.
- property ref
- property sample
- property tmp_sam_path
Get the path to the temporary SAM file.
- seismicrna.relate.sam.get_line_attrs(line: str) → tuple[str, bool, bool]
Read attributes from a line in a SAM file.
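In a SAM line the first two tab-separated fields are QNAME and FLAG. FLAG bit 0x1 marks a template with multiple segments (paired), and bit 0x2 marks that each segment is properly aligned. The sketch below assumes, without confirmation from this page, that the returned tuple is (read name, paired, properly paired). The function parse_line_attrs is hypothetical.

```python
def parse_line_attrs(line: str) -> tuple[str, bool, bool]:
    """Return (read name, paired flag, proper-pair flag) from one SAM line.

    Hypothetical re-implementation; the order of the returned tuple is an
    assumption.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        raise ValueError(f"SAM line has fewer than 2 fields: {line!r}")
    name = fields[0]           # QNAME
    flag = int(fields[1])      # FLAG
    paired = bool(flag & 0x1)  # template has multiple segments
    proper = bool(flag & 0x2)  # each segment properly aligned
    return name, paired, proper


line = "read1\t99\tref\t1\t42\t4M\t=\t5\t8\tACGT\tIIII\n"
print(parse_line_attrs(line))  # ('read1', True, True)
```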
- seismicrna.relate.sam.tmp_xam_cmd(xam_in: Path, xam_out: Path, paired: bool, num_cpus: int = 1)
Collate and create a temporary XAM file.
- seismicrna.relate.sim.simulate_batch(sample: str, branches: dict[str, str], ref: str, batch: int, write_read_names: bool, pmut: DataFrame, uniq_end5s: ndarray, uniq_end3s: ndarray, pends: ndarray, paired: bool, read_length: int, p_rev: float, min_mut_gap: int, num_reads: int, formatter: Callable[[int, int], str] = <function format_read_name>)
Simulate a pair of RelateBatchIO and ReadNamesBatchIO.
- seismicrna.relate.sim.simulate_batches(batch_size: int, pmut: DataFrame, pclust: Series, num_reads: int, **kwargs)
- seismicrna.relate.sim.simulate_cluster(first_batch: int, batch_size: int, num_reads: int, **kwargs)
Simulate all batches for one cluster.
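The split between simulate_batches and simulate_cluster suggests that reads are divided into batches of at most batch_size, with numbering that continues from first_batch. The sketch below assumes that batch numbers keep increasing across clusters. Both split_into_batches and split_clusters are hypothetical helpers that only compute batch numbers and sizes, with no simulation.

```python
def split_into_batches(num_reads: int, batch_size: int, first_batch: int = 0):
    """Yield (batch number, reads in batch) so no batch exceeds batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = first_batch
    remaining = num_reads
    while remaining > 0:
        size = min(batch_size, remaining)
        yield batch, size
        batch += 1
        remaining -= size


def split_clusters(reads_per_cluster, batch_size):
    """Assign consecutive batch numbers to each cluster in turn (assumption)."""
    first_batch = 0
    result = []
    for num_reads in reads_per_cluster:
        batches = list(split_into_batches(num_reads, batch_size, first_batch))
        result.append(batches)
        # The next cluster starts numbering where this one stopped.
        first_batch += len(batches)
    return result


print(split_clusters([10, 5], batch_size=4))
# [[(0, 4), (1, 4), (2, 2)], [(3, 4), (4, 1)]]
```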
- seismicrna.relate.sim.simulate_relate(*, out_dir: Path, tmp_dir: Path, branch: str, sample: str, ref: str, refseq: DNA, batch_size: int, num_reads: int, write_read_names: bool, pmut: DataFrame, uniq_end5s: ndarray, uniq_end3s: ndarray, pends: ndarray, pclust: Series, brotli_level: int, force: bool, **kwargs)
Simulate an entire relate step.
- seismicrna.relate.strands.generate_both_strands(ref: str, seq: DNA, rev_label: str)
Yield both the forward and reverse strand for each sequence.
- seismicrna.relate.strands.write_both_strands(fasta_in: Path, fasta_out: Path, rev_label: str)
Write a FASTA file of both forward and reverse strands.
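Both strand functions rely on the reverse complement of each reference, with rev_label appended to the name of the reverse strand. A minimal sketch follows. It uses plain strings in place of seismicrna's DNA type and assumes that the forward strand is yielded before the reverse strand. The functions reverse_complement, both_strands, and to_fasta are hypothetical.

```python
from typing import Iterable, Iterator

# Complement of each base; 'N' pairs with itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the order."""
    return seq.translate(_COMPLEMENT)[::-1]


def both_strands(ref: str, seq: str, rev_label: str) -> Iterator[tuple[str, str]]:
    """Yield the forward strand, then the reverse strand with rev_label appended."""
    yield ref, seq
    yield f"{ref}{rev_label}", reverse_complement(seq)


def to_fasta(records: Iterable[tuple[str, str]]) -> str:
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


fasta_text = to_fasta(both_strands("ref1", "AACCGT", "-rev"))
# fasta_text holds ">ref1\nAACCGT\n>ref1-rev\nACGGTT\n"
```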
- class seismicrna.relate.table.AverageTable
Bases: RelTypeTable, ABC
Average over an ensemble of RNA structures.
- classmethod get_header_type()
Type of the header for the table.
- class seismicrna.relate.table.AverageTabulator(*, top: Path, branches: dict[str, str], sample: str, region: Region, count_ends: bool, count_pos: bool, count_read: bool, validate: bool = True)
- property data_per_clust
Series of per-cluster data (or None if no clusters).
- class seismicrna.relate.table.FullTabulator(*, ref: str, refseq: DNA, count_ends: bool = False, **kwargs)
- classmethod get_null_value()
The null value for a count: either 0 or NaN.
- class seismicrna.relate.table.RelateBatchTabulator(*, get_batch_count_all: Callable, num_batches: int, num_cpus: int = 1, **kwargs)
Bases: BatchTabulator, RelateTabulator
- class seismicrna.relate.table.RelateCountTabulator(*, batch_counts: Iterable[tuple[Any, Any, Any, Any]], **kwargs)
Bases: CountTabulator, RelateTabulator
- class seismicrna.relate.table.RelateDatasetTabulator(*, dataset: MutsDataset, validate: bool = False, **kwargs)
Bases: DatasetTabulator, RelateTabulator
- classmethod init_kws()
Attributes of the dataset to use as keyword arguments in super().__init__().
- class seismicrna.relate.table.RelatePositionTable
Bases: RelateTable, PositionTable, ABC
- class seismicrna.relate.table.RelatePositionTableLoader(table_file: str | Path, **kwargs)
Bases: PositionTableLoader, RelatePositionTable
Load relate data indexed by position.
- class seismicrna.relate.table.RelateReadTable
Bases: RelateTable, ReadTable, ABC
- class seismicrna.relate.table.RelateReadTableLoader(table_file: str | Path, **kwargs)
Bases: ReadTableLoader, RelateReadTable
Load relate data indexed by read.
- class seismicrna.relate.table.RelateReadTableWriter(tabulator: Tabulator)
Bases: ReadTableWriter, RelateReadTable
- class seismicrna.relate.table.RelateTable
Bases: AverageTable, RelateFile, ABC
- classmethod get_load_function()
LoadFunction for all Dataset types for this Table.
- class seismicrna.relate.table.RelateTabulator(*, ref: str, refseq: DNA, count_ends: bool = False, **kwargs)
Bases: FullTabulator, AverageTabulator, ABC
- classmethod table_types()
Types of tables that this tabulator can write.
- class seismicrna.relate.write.RelationWriter(xam_view: XamViewer, fasta_file: str | Path)
Bases: object
Compute and write relationships for all reads from one sample aligned to one reference sequence.
- property branches
- property num_reads
- property ref
- property refseq
- property sample
- write(*, out_dir: Path, release_dir: Path, min_mapq: int, min_reads: int, min_phred: int, phred_enc: int, insert3: bool, ambindel: bool, overhangs: bool, clip_end5: int, clip_end3: int, relate_pos_table: bool, relate_read_table: bool, brotli_level: int, force: bool, num_cpus: int, **kwargs)
Compute relationships for every record in a XAM file.
- seismicrna.relate.write.generate_batch(batch: int, *, xam_view: XamViewer, top: Path, refseq: DNA, brotli_level: int, count_pos: bool, count_read: bool, write_read_names: bool, **kwargs)
Compute relationships for every SAM record in one batch.