seismicrna.relate package
Subpackages
- seismicrna.relate.aux package
- seismicrna.relate.cx package
- seismicrna.relate.py package
- seismicrna.relate.tests package
- Submodules
TestRelate
TestRelateEmpty
TestRelatePaired
TestRelatePaired.get_sam_data()
TestRelatePaired.test_batch_size_1()
TestRelatePaired.test_batch_size_4()
TestRelatePaired.test_batch_size_5()
TestRelatePaired.test_batch_size_6()
TestRelatePaired.test_clip()
TestRelatePaired.test_min_phred()
TestRelatePaired.test_min_reads()
TestRelatePaired.test_noargs()
TestRelateSingle
extract_batches()
load_refseq()
write_fasta_file()
write_sam_file()
TestCalcRelsLinesPaired
TestCalcRelsLinesPaired.evaluate()
TestCalcRelsLinesPaired.relate()
TestCalcRelsLinesPaired.relate_error()
TestCalcRelsLinesPaired.test_abut()
TestCalcRelsLinesPaired.test_contain()
TestCalcRelsLinesPaired.test_contain_flush3()
TestCalcRelsLinesPaired.test_contain_flush5()
TestCalcRelsLinesPaired.test_contain_flush53()
TestCalcRelsLinesPaired.test_diff_names()
TestCalcRelsLinesPaired.test_gap()
TestCalcRelsLinesPaired.test_improper()
TestCalcRelsLinesPaired.test_read_marks()
TestCalcRelsLinesPaired.test_read_orientation()
TestCalcRelsLinesPaired.test_staggered()
TestCalcRelsLinesPaired.test_unpaired()
TestCalcRelsLinesSingle
TestCalcRelsLinesSingle.iter_cases()
TestCalcRelsLinesSingle.iter_cases_insert3()
TestCalcRelsLinesSingle.relate()
TestCalcRelsLinesSingle.relate_error()
TestCalcRelsLinesSingle.relate_truncated()
TestCalcRelsLinesSingle.test_4nt_2ins()
TestCalcRelsLinesSingle.test_4nt_2ins_paired()
TestCalcRelsLinesSingle.test_5nt_2ins()
TestCalcRelsLinesSingle.test_6nt_2ins()
TestCalcRelsLinesSingle.test_7nt_0ins()
TestCalcRelsLinesSingle.test_8nt_0ins()
TestCalcRelsLinesSingle.test_all_matches()
TestCalcRelsLinesSingle.test_ambig_delet_low_qual()
TestCalcRelsLinesSingle.test_error_cigar_adj_ins_del()
TestCalcRelsLinesSingle.test_error_cigar_consecutive()
TestCalcRelsLinesSingle.test_error_cigar_del_first_rel()
TestCalcRelsLinesSingle.test_error_cigar_del_last_rel()
TestCalcRelsLinesSingle.test_error_cigar_empty()
TestCalcRelsLinesSingle.test_error_cigar_ins_first_rel()
TestCalcRelsLinesSingle.test_error_cigar_ins_last_rel()
TestCalcRelsLinesSingle.test_error_cigar_missing()
TestCalcRelsLinesSingle.test_error_cigar_op_read_diff()
TestCalcRelsLinesSingle.test_error_cigar_op_ref_long()
TestCalcRelsLinesSingle.test_error_cigar_op_ref_zero()
TestCalcRelsLinesSingle.test_error_cigar_parse()
TestCalcRelsLinesSingle.test_error_cigar_soft_clips()
TestCalcRelsLinesSingle.test_error_flag_large()
TestCalcRelsLinesSingle.test_error_flag_missing()
TestCalcRelsLinesSingle.test_error_flag_parse()
TestCalcRelsLinesSingle.test_error_line_improper_flag_proper()
TestCalcRelsLinesSingle.test_error_line_paired_flag_unpaired()
TestCalcRelsLinesSingle.test_error_line_unpaired_flag_paired()
TestCalcRelsLinesSingle.test_error_mapq()
TestCalcRelsLinesSingle.test_error_mapq_insufficient()
TestCalcRelsLinesSingle.test_error_mapq_missing()
TestCalcRelsLinesSingle.test_error_name_missing()
TestCalcRelsLinesSingle.test_error_pos_large()
TestCalcRelsLinesSingle.test_error_pos_missing()
TestCalcRelsLinesSingle.test_error_pos_parse()
TestCalcRelsLinesSingle.test_error_pos_zero()
TestCalcRelsLinesSingle.test_error_qual_missing()
TestCalcRelsLinesSingle.test_error_read_missing()
TestCalcRelsLinesSingle.test_error_read_qual_diff()
TestCalcRelsLinesSingle.test_error_ref_mismatch()
TestCalcRelsLinesSingle.test_error_ref_missing()
TestCalcRelsLinesSingle.test_example_1()
TestCalcRelsLinesSingle.test_example_2()
TestCalcRelsLinesSingle.test_example_3()
TestCalcRelsLinesSingle.test_long_ambindels()
TestCalcRelsLinesSingle.test_n_read()
TestCalcRelsLinesSingle.test_n_ref()
TestCalcRelsLinesSingle.test_soft_clips()
TestMergeMates
as_sam()
TestIterRecordsPaired
TestIterRecordsPaired.run_test_invalid()
TestIterRecordsPaired.run_test_valid()
TestIterRecordsPaired.test_blank()
TestIterRecordsPaired.test_one_improper()
TestIterRecordsPaired.test_one_proper()
TestIterRecordsPaired.test_one_single()
TestIterRecordsPaired.test_two_mated_improper()
TestIterRecordsPaired.test_two_mated_improper_1()
TestIterRecordsPaired.test_two_mated_improper_2()
TestIterRecordsPaired.test_two_mated_proper()
TestIterRecordsPaired.test_two_unmated_improper()
TestIterRecordsPaired.test_two_unmated_proper()
TestLineAttrs
delete_sam()
write_sam()
- Submodules
Submodules
- class seismicrna.relate.batch.FullReadBatch(*, batch: int)
- property max_read
Maximum possible value for a read index.
- property read_indexes
Map each read number to its index in self.read_nums.
- property read_nums
Read numbers.
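The three properties above can be pictured with plain Python containers. This is only an illustrative sketch: the helper `build_read_indexes` is hypothetical, and the idea that a "full" batch numbers its reads 0 to n - 1 is an assumption made for the example, not something stated in this reference.

```python
def build_read_indexes(read_nums):
    """Map each read number to its position in read_nums (hypothetical helper)."""
    return {num: idx for idx, num in enumerate(read_nums)}


# Assumption for this sketch: a full batch contains every read, numbered 0..n-1.
num_reads = 5
read_nums = list(range(num_reads))
read_indexes = build_read_indexes(read_nums)
max_read = num_reads - 1  # largest read number that can appear in the batch
```

Under that assumption the mapping is the identity, which is why a full batch can answer index lookups without searching.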
- class seismicrna.relate.batch.FullRegionMutsBatch(*, region: Region, **kwargs)
Bases: FullReadBatch, RegionMutsBatch, ABC
- class seismicrna.relate.batch.ReadNamesBatch(*, names: list[str] | ndarray, **kwargs)
Bases: FullReadBatch
- property num_reads
Number of reads.
- classmethod simulate(batch: int, num_reads: int, formatter: Callable[[int, int], str] = format_read_name, **kwargs)
Simulate a batch.
- class seismicrna.relate.batch.RelateBatch(*, region: Region, **kwargs)
Bases: FullRegionMutsBatch
- property read_weights
Weights for each read when computing counts.
- classmethod simulate(batch: int, ref: str, pmut: DataFrame, uniq_end5s: ndarray, uniq_end3s: ndarray, pends: ndarray, paired: bool, read_length: int, p_rev: float, min_mut_gap: int, num_reads: int, **kwargs)
Simulate a batch.
- Parameters:
  - batch (int) – Batch number.
  - ref (str) – Name of the reference.
  - pmut (pd.DataFrame) – Rate of each type of mutation at each position.
  - uniq_end5s (np.ndarray) – Unique read 5’ end coordinates.
  - uniq_end3s (np.ndarray) – Unique read 3’ end coordinates.
  - pends (np.ndarray) – Probability of each set of unique end coordinates.
  - paired (bool) – Whether to simulate paired-end or single-end reads.
  - read_length (int) – Length of each read segment (paired-end reads only).
  - p_rev (float) – Probability that mate 1 is reversed (paired-end reads only).
  - min_mut_gap (int) – Minimum number of positions between two mutations.
  - num_reads (int) – Number of reads in the batch.
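To make the roles of uniq_end5s, uniq_end3s and pends concrete, the sketch below draws read end coordinates in proportion to their probabilities. It uses only the standard library and plain lists; the function `sample_read_ends` and its `seed` argument are hypothetical and are not part of SEISMIC-RNA, which takes NumPy arrays for these arguments.

```python
import random


def sample_read_ends(uniq_end5s, uniq_end3s, pends, num_reads, seed=0):
    """Draw (end5, end3) pairs with probabilities pends (hypothetical helper)."""
    if not (len(uniq_end5s) == len(uniq_end3s) == len(pends)):
        raise ValueError("end coordinates and probabilities must have equal length")
    rng = random.Random(seed)
    # Each index i describes one unique pair of ends and its probability.
    pairs = list(zip(uniq_end5s, uniq_end3s))
    return rng.choices(pairs, weights=pends, k=num_reads)


ends = sample_read_ends([1, 1, 10], [20, 30, 30], [0.5, 0.3, 0.2], num_reads=4)
```

The i-th entries of the three sequences belong together, so they must have the same length.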
- class seismicrna.relate.dataset.AverageDataset(report_file: Path, verify_times: bool = True)
Dataset of population average data.
- property best_k
Best number of clusters.
- property ks
Numbers of clusters.
- class seismicrna.relate.dataset.NamesDataset(report_file: Path, verify_times: bool = True)
Bases: AverageDataset, ABC
- classmethod kind()
- class seismicrna.relate.dataset.PoolDataset(report_file: Path, verify_times: bool = True)
Bases: RelateDataset, TallDataset, MutsDataset, MergedRegionDataset
Load pooled batches of relationships.
- classmethod get_dataset_load_func()
Function to load one constituent dataset.
- classmethod get_report_type()
Type of report.
- property region
Region of the dataset.
- class seismicrna.relate.dataset.PoolReadNamesDataset(report_file: Path, verify_times: bool = True)
Bases: NamesDataset, TallDataset
Pooled dataset of read names.
- classmethod get_dataset_load_func()
Function to load one constituent dataset.
- classmethod get_report_type()
Type of report.
- class seismicrna.relate.dataset.ReadNamesDataset(report_file: Path, verify_times: bool = True)
Bases: NamesDataset, LoadedDataset
Dataset of read names from the Relate step.
- classmethod get_batch_type()
Type of batch.
- classmethod get_report_type()
Type of report.
- property pattern
Pattern of mutations to count.
- class seismicrna.relate.dataset.RelateDataset(report_file: Path, verify_times: bool = True)
Bases: AverageDataset, ABC
Dataset of relationships.
- class seismicrna.relate.dataset.RelateMutsDataset(report_file: Path, verify_times: bool = True)
Bases: RelateDataset, LoadedDataset, MutsDataset
Dataset of mutations from the Relate step.
- classmethod get_batch_type()
Type of batch.
- classmethod get_report_type()
Type of report.
- property paired
Whether the reads are paired-end.
- property pattern
Pattern of mutations to count.
- property refseq
Sequence of the reference.
- property region
Region of the dataset.
- class seismicrna.relate.io.ReadNamesBatchIO(*, sample: str, ref: str, **kwargs)
Bases: ReadBatchIO, RelateIO, ReadNamesBatch
- classmethod file_seg_type()
Type of the last segment in the path.
- class seismicrna.relate.io.RelateBatchIO(*args, region: Region, **kwargs)
Bases: MutsBatchIO, RelateIO, RelateBatch
- classmethod file_seg_type()
Type of the last segment in the path.
- class seismicrna.relate.io.RelateIO(*, sample: str, ref: str, **kwargs)
- classmethod auto_fields()
Names and automatic values of selected fields.
- seismicrna.relate.io.from_reads(reads: Iterable[tuple[str, tuple[tuple[list[int], list[int]], dict[int, int]]]], sample: str, ref: str, refseq: DNA, batch: int, write_read_names: bool)
Accumulate reads into relation vectors.
- seismicrna.relate.main.check_duplicates(xam_files: list[Path])
Check if any sample-reference pair occurs more than once.
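A duplicate check of this kind can be sketched with a counter over (sample, reference) pairs. The function `find_duplicate_pairs` is hypothetical, and the directory layout it assumes (`<out>/<sample>/<step>/<ref>.bam`) is a guess made for this example, not a description of how SEISMIC-RNA parses paths.

```python
from collections import Counter
from pathlib import Path


def find_duplicate_pairs(xam_files):
    """Return the (sample, ref) pairs that occur more than once.

    Assumes paths shaped like <out>/<sample>/<step>/<ref>.bam (illustrative only).
    """
    pairs = Counter(
        (path.parent.parent.name, path.stem) for path in map(Path, xam_files)
    )
    return sorted(pair for pair, count in pairs.items() if count > 1)


dups = find_duplicate_pairs([
    "out/s1/align/refA.bam",
    "other/s1/align/refA.bam",  # same sample and reference, different top directory
    "out/s2/align/refA.bam",
])
```

Two files count as duplicates here even when they live under different top-level directories, because only the sample and reference names are compared.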
- seismicrna.relate.main.run(fasta: str | Path, input_path: Iterable[str | Path], *, out_dir: str | Path = './out', min_reads: int = 1000, min_mapq: int = 25, phred_enc: int = 33, min_phred: int = 25, batch_size: int = 65536, insert3: bool = True, ambindel: bool = True, overhangs: bool = True, clip_end5: int = 4, clip_end3: int = 4, sep_strands: bool = False, rev_label: str = '-rev', write_read_names: bool = False, relate_pos_table: bool = True, relate_read_table: bool = False, relate_cx: bool = True, max_procs: int = 4, brotli_level: int = 10, force: bool = False, keep_tmp: bool = False, tmp_pfx='./tmp')
Compute relationships between references and aligned reads.
- Parameters:
  - out_dir (str | Path) – Write all output files to this directory [keyword-only, default: './out']
  - min_reads (int) – Discard alignment maps with fewer than this many reads [keyword-only, default: 1000]
  - min_mapq (int) – Discard reads with mapping qualities below this threshold [keyword-only, default: 25]
  - phred_enc (int) – Specify the Phred score encoding of FASTQ and SAM/BAM/CRAM files [keyword-only, default: 33]
  - min_phred (int) – Mark base calls with Phred scores lower than this threshold as ambiguous [keyword-only, default: 25]
  - batch_size (int) – Limit batches to at most this many reads [keyword-only, default: 65536]
  - insert3 (bool) – Mark each insertion on the base to its 3’ (True) or 5’ (False) side [keyword-only, default: True]
  - ambindel (bool) – Mark all ambiguous insertions and deletions (indels) [keyword-only, default: True]
  - overhangs (bool) – Retain the overhangs of paired-end mates that dovetail [keyword-only, default: True]
  - clip_end5 (int) – Clip this many bases from the 5’ end of each read [keyword-only, default: 4]
  - clip_end3 (int) – Clip this many bases from the 3’ end of each read [keyword-only, default: 4]
  - sep_strands (bool) – Separate each alignment map into forward- and reverse-strand reads [keyword-only, default: False]
  - rev_label (str) – With --sep-strands, add this label to each reverse-strand reference [keyword-only, default: '-rev']
  - write_read_names (bool) – Write the name of each read in a second set of batches (necessary for the options --mask-read or --mask-read-file) [keyword-only, default: False]
  - relate_pos_table (bool) – Tabulate relationships per position for relate data [keyword-only, default: True]
  - relate_read_table (bool) – Tabulate relationships per read for relate data [keyword-only, default: False]
  - relate_cx (bool) – Use a fast (C extension module) version of the relate algorithm; the slow (Python) version is still available as a fallback if the C extension cannot be loaded, and for debugging/benchmarking [keyword-only, default: True]
  - max_procs (int) – Run up to this many processes simultaneously [keyword-only, default: 4]
  - brotli_level (int) – Compress pickle files with this level of Brotli (0-11) [keyword-only, default: 10]
  - force (bool) – Force all tasks to run, overwriting any existing output files [keyword-only, default: False]
  - keep_tmp (bool) – Keep temporary files after finishing [keyword-only, default: False]
  - tmp_pfx – Write all temporary files to a directory with this prefix [keyword-only, default: './tmp']
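A call to run() from Python might look like the sketch below. The keyword names and defaults are taken from the signature above; the file names `refs.fa` and `out/sample1/align` are hypothetical placeholders, and the import is guarded so the snippet does nothing when SEISMIC-RNA is not installed or the FASTA file is absent.

```python
from pathlib import Path

# Keyword arguments mirror the documented signature; values are the
# documented defaults unless a comment says otherwise.
relate_kwargs = dict(
    out_dir=Path("out"),
    min_reads=1000,
    min_mapq=25,
    min_phred=25,
    batch_size=65536,
    clip_end5=4,
    clip_end3=4,
    write_read_names=True,  # non-default: required for --mask-read
    max_procs=4,
)

fasta = Path("refs.fa")                    # hypothetical reference FASTA
input_path = [Path("out/sample1/align")]   # hypothetical alignment directory

try:
    from seismicrna.relate.main import run
except ImportError:
    run = None  # SEISMIC-RNA is not available in this environment

if run is not None and fasta.is_file():
    run(fasta, input_path, **relate_kwargs)
```

Only fasta and input_path are positional; every other argument is keyword-only, so collecting the options in a dictionary keeps the call readable.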
- class seismicrna.relate.report.PoolReport(**kwargs: Any | Callable[[Report], Any])
- classmethod auto_fields()
Names and automatic values of selected fields.
- classmethod fields()
All fields of the report.
- classmethod file_seg_type()
Type of the last segment in the path.
- classmethod path_segs()
- class seismicrna.relate.report.RelateReport(**kwargs: Any | Callable[[Report], Any])
Bases: BatchedRefseqReport, RelateIO
- classmethod fields()
All fields of the report.
- classmethod file_seg_type()
Type of the last segment in the path.
- refseq_file(top: Path)
- seismicrna.relate.report.refseq_file_auto_fields()
- seismicrna.relate.report.refseq_file_seg_types()
- class seismicrna.relate.sam.XamViewer(xam_input: Path, tmp_dir: Path, batch_size: int, n_procs: int = 1)
Bases: object
- create_tmp_sam()
Create the temporary SAM file.
- delete_tmp_sam()
Delete the temporary SAM file.
- property flagstats
- property indexes
- property n_reads
Total number of reads.
- property paired
Whether the reads are paired.
- property ref
- property sample
- property tmp_sam_path
Get the path to the temporary SAM file.
- seismicrna.relate.sam.line_attrs(line: str) → tuple[str, bool, bool]
Read attributes from a line in a SAM file.
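The return type suggests a read name plus two flags. The sketch below shows one plausible reading, in which the flags say whether the read is paired and whether it is properly paired. That interpretation is an assumption, and `sketch_line_attrs` is a hypothetical stand-in rather than the library's code. The field positions (QNAME first, FLAG second) and the flag bits 0x1 and 0x2 come from the SAM format.

```python
def sketch_line_attrs(line):
    """Return (name, paired, proper) from one SAM alignment line (sketch)."""
    fields = line.rstrip("\n").split("\t")
    name = fields[0]              # QNAME: name of the read
    flag = int(fields[1])         # FLAG: bitwise flags
    paired = bool(flag & 0x1)     # template has multiple segments
    proper = bool(flag & 0x2)     # each segment is properly aligned
    return name, paired, proper


attrs = sketch_line_attrs("read1\t99\tref\t1\t42\t4M\t=\t10\t13\tACGT\tIIII")
# attrs == ("read1", True, True), because 99 = 64 + 32 + 2 + 1
```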
- seismicrna.relate.sam.tmp_xam_cmd(xam_in: Path, xam_out: Path, paired: bool, n_procs: int = 1)
Collate and create a temporary XAM file.
- seismicrna.relate.sim.simulate_batch(sample: str, ref: str, batch: int, write_read_names: bool, pmut: DataFrame, uniq_end5s: ndarray, uniq_end3s: ndarray, pends: ndarray, paired: bool, read_length: int, p_rev: float, min_mut_gap: int, num_reads: int, formatter: Callable[[int, int], str] = format_read_name)
Simulate a pair of RelateBatchIO and ReadNamesBatchIO.
- seismicrna.relate.sim.simulate_batches(batch_size: int, pmut: DataFrame, pclust: Series, num_reads: int, **kwargs)
- seismicrna.relate.sim.simulate_cluster(first_batch: int, batch_size: int, num_reads: int, **kwargs)
Simulate all batches for one cluster.
- seismicrna.relate.sim.simulate_relate(*, out_dir: Path, tmp_dir: Path, sample: str, ref: str, refseq: DNA, batch_size: int, num_reads: int, write_read_names: bool, pmut: DataFrame, uniq_end5s: ndarray, uniq_end3s: ndarray, pends: ndarray, pclust: Series, brotli_level: int, force: bool, **kwargs)
Simulate an entire relate step.
- seismicrna.relate.strands.generate_both_strands(ref: str, seq: DNA, rev_label: str)
Yield both the forward and reverse strand for each sequence.
- seismicrna.relate.strands.write_both_strands(fasta_in: Path, fasta_out: Path, rev_label: str)
Write a FASTA file of both forward and reverse strands.
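The strand functions above can be pictured as pairing each reference with its reverse complement. In the sketch below, sequences are plain strings rather than the library's DNA type, and appending rev_label as a suffix to the reference name is an assumption; `sketch_both_strands` is a hypothetical name.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def sketch_both_strands(ref, seq, rev_label):
    """Yield (name, sequence) for the forward strand, then the reverse strand."""
    yield ref, seq
    # Reverse complement: complement every base, then reverse the order.
    yield f"{ref}{rev_label}", seq.translate(COMPLEMENT)[::-1]


strands = list(sketch_both_strands("myref", "AACG", "-rev"))
# strands == [("myref", "AACG"), ("myref-rev", "CGTT")]
```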
- class seismicrna.relate.table.AverageTable
Bases: RelTypeTable, ABC
Average over an ensemble of RNA structures.
- classmethod header_type()
Type of the header for the table.
- class seismicrna.relate.table.AverageTabulator(*, top: Path, sample: str, region: Region, count_ends: bool, count_pos: bool, count_read: bool, validate: bool = True)
- property data_per_clust
Series of per-cluster data (or None if no clusters).
- class seismicrna.relate.table.FullPositionTable
Bases: FullTable, PositionTable, ABC
- classmethod path_segs()
Table’s path segments.
- class seismicrna.relate.table.FullReadTable
Bases: FullTable, ReadTable, ABC
- classmethod path_segs()
Table’s path segments.
- class seismicrna.relate.table.FullTabulator(*, ref: str, refseq: DNA, count_ends: bool = False, **kwargs)
- classmethod get_null_value()
The null value for a count: either 0 or NaN.
- class seismicrna.relate.table.PositionTableLoader(table_file: Path)
Bases: RelTypeTableLoader, PositionTable, ABC
Load data indexed by position.
- class seismicrna.relate.table.ReadTableLoader(table_file: Path)
Bases: RelTypeTableLoader, ReadTable, ABC
Load data indexed by read.
- class seismicrna.relate.table.RelTypeTableLoader(table_file: Path)
Bases: TableLoader, RelTypeTable, ABC
Load a table of relationship types.
- property data: DataFrame
Table’s data.
- class seismicrna.relate.table.RelateBatchTabulator(*, get_batch_count_all: Callable, num_batches: int, max_procs: int = 1, **kwargs)
Bases: BatchTabulator, RelateTabulator
- class seismicrna.relate.table.RelateCountTabulator(*, batch_counts: Iterable[tuple[Any, Any, Any, Any]], **kwargs)
Bases: CountTabulator, RelateTabulator
- class seismicrna.relate.table.RelateDatasetTabulator(*, dataset: MutsDataset, validate: bool = False, **kwargs)
Bases: DatasetTabulator, RelateTabulator
- classmethod init_kws()
Attributes of the dataset to use as keyword arguments in super().__init__().
- classmethod load_function()
LoadFunction for all Dataset types for this Tabulator.
- class seismicrna.relate.table.RelatePositionTable
Bases: RelateTable, FullPositionTable, ABC
- class seismicrna.relate.table.RelatePositionTableLoader(table_file: Path)
Bases: PositionTableLoader, RelatePositionTable
Load relate data indexed by position.
- class seismicrna.relate.table.RelateReadTable
Bases: RelateTable, FullReadTable, ABC
- class seismicrna.relate.table.RelateReadTableLoader(table_file: Path)
Bases: ReadTableLoader, RelateReadTable
Load relate data indexed by read.
- class seismicrna.relate.table.RelateReadTableWriter(tabulator: Tabulator)
Bases: ReadTableWriter, RelateReadTable
- class seismicrna.relate.table.RelateTable
Bases: AverageTable, ABC
- classmethod kind()
Kind of table.
- class seismicrna.relate.table.RelateTabulator(*, ref: str, refseq: DNA, count_ends: bool = False, **kwargs)
Bases: FullTabulator, AverageTabulator, ABC
- classmethod table_types()
Types of tables that this tabulator can write.
- class seismicrna.relate.table.TableLoader(table_file: Path)
Load a table from a file.
- classmethod find_tables(paths: Iterable[str | Path])
Yield files of the tables within the given paths.
- property refseq
Reference sequence.
- property top: Path
Path of the table’s output directory.
- class seismicrna.relate.write.RelationWriter(xam_view: XamViewer, refseq: DNA)
Bases: object
Compute and write relationships for all reads from one sample aligned to one reference sequence.
- property num_reads
- property ref
- property sample
- write(*, out_dir: Path, release_dir: Path, min_mapq: int, min_reads: int, min_phred: int, phred_enc: int, insert3: bool, ambindel: bool, overhangs: bool, clip_end5: int, clip_end3: int, relate_pos_table: bool, relate_read_table: bool, brotli_level: int, force: bool, n_procs: int, **kwargs)
Compute relationships for every record in a XAM file.
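The min_phred and phred_enc arguments of write() work together: a quality character is decoded by subtracting phred_enc from its ASCII code, and a base call scoring below min_phred is treated as ambiguous. The sketch below illustrates that arithmetic only. Replacing low-quality bases with "N" is a stand-in chosen for readability, not how the library encodes ambiguity, and `mark_low_quality` is a hypothetical helper.

```python
def mark_low_quality(read, qual, min_phred=25, phred_enc=33):
    """Replace base calls whose Phred score is below min_phred with 'N' (sketch)."""
    if len(read) != len(qual):
        raise ValueError("read and quality strings must have equal length")
    return "".join(
        base if ord(q) - phred_enc >= min_phred else "N"
        for base, q in zip(read, qual)
    )


marked = mark_low_quality("ACGT", "II#I")
# 'I' decodes to 73 - 33 = 40 (kept); '#' decodes to 35 - 33 = 2 (masked)
# marked == "ACNT"
```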
- seismicrna.relate.write.generate_batch(batch: int, *, xam_view: XamViewer, top: Path, refseq: DNA, brotli_level: int, count_pos: bool, count_read: bool, write_read_names: bool, **kwargs)
Compute relationships for every SAM record in one batch.