Relate Batch
Each batch of relation vectors is a RelateBatchIO object.
Relate batch: Structure
The following attributes encode the relationships between each read and each position in the reference sequence:
| Attribute | Data Type | Description |
|---|---|---|
| `end5s` | `numpy.ndarray` | array of the first position of the most upstream mate in each read |
| `mid5s` | `numpy.ndarray` | array of the first position of the most downstream mate in each read |
| `mid3s` | `numpy.ndarray` | array of the last position of the most upstream mate in each read |
| `end3s` | `numpy.ndarray` | array of the last position of the most downstream mate in each read |
| `muts` | `dict` | numbers of the reads with each type of mutation at each position |
Note
The positions of the first and last bases in the reference sequence are defined to be 1 and the length of the sequence, respectively.
Relate batch: Structure of read numbers
Each read, or pair of paired-end reads, is labeled with a non-negative integer: 0 for the first read in each batch, and incrementing by 1 for each subsequent read. Within one batch, all read numbers are unique. However, two different batches can have reads that share numbers.
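Because read numbers restart at 0 in every batch, a read number alone does not identify a read across batches. The sketch below illustrates one way to build unique keys by pairing each read number with a batch index. The helper `read_numbers` and the batch index are assumptions made for illustration, not part of the documented format.

```python
import numpy as np

def read_numbers(num_reads: int) -> np.ndarray:
    """Read numbers for one batch: 0, 1, ..., num_reads - 1."""
    return np.arange(num_reads)

# Two hypothetical batches containing 3 and 2 reads, respectively.
batch_sizes = [3, 2]

# Pair every read number with the (hypothetical) index of its batch so that
# reads from different batches never share a key.
global_keys = [(batch, int(read))
               for batch, size in enumerate(batch_sizes)
               for read in read_numbers(size)]
```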
Relate batch: Structure of 5’ and 3’ end positions
- `end5s`, `mid5s`, `mid3s`, and `end3s` are all 1-dimensional `numpy.ndarray` objects.
- For any relate batch, `end5s`, `mid5s`, `mid3s`, and `end3s` all have the same length (which may be any integer ≥ 0).
- A read with index `i` corresponds to the `i`th values of `end5s`, `mid5s`, `mid3s`, and `end3s`; denoted (respectively) `end5s[i]`, `mid5s[i]`, `mid3s[i]`, and `end3s[i]`.
- For every read `i`:
  - 1 ≤ `end5s[i]` ≤ `end3s[i]` ≤ length of reference sequence
  - If paired-end and there is a gap of ≥ 1 nt between mates 1 and 2: `end5s[i]` ≤ `mid3s[i]` < `mid5s[i]` ≤ `end3s[i]`
  - Otherwise: `end5s[i]` = `mid5s[i]` ≤ `mid3s[i]` = `end3s[i]`
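
The constraints above can be checked mechanically. The following is a minimal sketch, not part of the documented API: the function name `check_end_positions` and its `ref_length` parameter are hypothetical, and a read is accepted if it satisfies either the gapped or the ungapped ordering.

```python
import numpy as np

def check_end_positions(end5s, mid5s, mid3s, end3s, ref_length):
    """Return True if the four arrays satisfy the constraints listed above."""
    arrays = [np.asarray(a) for a in (end5s, mid5s, mid3s, end3s)]
    # All four arrays must be 1-dimensional and share one length.
    if any(a.ndim != 1 for a in arrays):
        return False
    if len({a.shape[0] for a in arrays}) != 1:
        return False
    end5s, mid5s, mid3s, end3s = arrays
    # 1 <= end5s[i] <= end3s[i] <= length of the reference sequence.
    in_bounds = (1 <= end5s) & (end5s <= end3s) & (end3s <= ref_length)
    # Gap between the mates: end5s[i] <= mid3s[i] < mid5s[i] <= end3s[i].
    gapped = (end5s <= mid3s) & (mid3s < mid5s) & (mid5s <= end3s)
    # No gap (or single-end): end5s[i] = mid5s[i] <= mid3s[i] = end3s[i].
    ungapped = (end5s == mid5s) & (mid5s <= mid3s) & (mid3s == end3s)
    return bool(np.all(in_bounds & (gapped | ungapped)))
```

For instance, a read covering positions 2 to 4 and 6 to 8 of an 8-nt reference passes with `end5s=[2]`, `mid5s=[6]`, `mid3s=[4]`, `end3s=[8]`, while any read with `end5s[i]` below 1 fails.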
 
 
Relate batch: Structure of mutations
`muts` is a `dict` wherein
- each key is a position in the reference sequence (`int`)
- each value is a `dict` wherein
  - each key is a type of mutation (`int`, see Relation Vectors for more information)
  - each value is an array of the numbers of the reads that have the given type of mutation at the given position (`numpy.ndarray`)
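
To make the nesting concrete, the sketch below builds a small `muts` dictionary and queries it. The data are hypothetical (a 3-nt reference, with 16 and 128 used only as example mutation codes), and the helper names `reads_with_any_mutation` and `count_mutated_reads` are illustrative rather than part of the documented API.

```python
import numpy as np

# Hypothetical muts for a 3-nt reference: position -> mutation type -> read numbers.
muts = {
    1: {},
    2: {16: np.array([3]), 128: np.array([0, 5])},
    3: {128: np.array([5])},
}

def reads_with_any_mutation(muts, position):
    """Sorted, unique numbers of the reads mutated at one position."""
    arrays = list(muts.get(position, {}).values())
    if not arrays:
        # No mutation types recorded at this position.
        return np.array([], dtype=int)
    return np.unique(np.concatenate(arrays))

def count_mutated_reads(muts):
    """Map each position to the number of reads with any mutation there."""
    return {pos: int(reads_with_any_mutation(muts, pos).size) for pos in muts}
```

Keeping an entry (possibly an empty `dict`) for every position means a lookup such as `muts[1]` succeeds even where no read is mutated.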
 
Relate batch: Example
For example, suppose that the reference sequence is TCAGAACC and a
batch contains five paired-end reads, numbered 0 to 4:
| Read | Mate | Alignment |
|---|---|---|
| 0 | 1 | |
| 0 | 2 | |
| 1 | 1 | |
| 1 | 2 | |
| 2 | 1 | |
| 2 | 2 | |
| 3 | 1 | |
| 3 | 2 | |
| 4 | 1 | |
| 4 | 2 | |
| Ref | | TCAGAACC |
The positions, reads, and relationships can be shown explicitly as a matrix (see Relation Vectors for information on the relationship codes):
| Read | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 
|---|---|---|---|---|---|---|---|---|
| 0 | 255 | 1 | 1 | 1 | 255 | 1 | 64 | 1 | 
| 1 | 1 | 1 | 128 | 1 | 128 | 1 | 255 | 255 | 
| 2 | 255 | 1 | 1 | 255 | 1 | 1 | 1 | 255 | 
| 3 | 1 | 16 | 1 | 1 | 128 | 255 | 1 | 1 | 
| 4 | 255 | 255 | 1 | 1 | 3 | 3 | 1 | 255 | 
In a relate batch, they would be encoded as follows:
- `end5s`: `[2, 1, 2, 1, 3]`
- `mid5s`: `[6, 1, 5, 7, 3]`
- `mid3s`: `[4, 6, 3, 5, 7]`
- `end3s`: `[8, 6, 7, 8, 7]`
- `muts`: `{1: {}, 2: {16: [3]}, 3: {128: [1]}, 4: {}, 5: {3: [4], 128: [1, 3]}, 6: {3: [4]}, 7: {64: [0]}, 8: {}}`

Note that the numbers are shown here as `list` objects for visual simplicity, but would really be `numpy.ndarray` objects.
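
As a cross-check, `end5s`, `end3s`, and `muts` can be derived from the matrix above. This is an illustrative sketch rather than the library's own encoder: the function `encode` is hypothetical, and it assumes that 255 marks positions a read does not cover and that 1 marks a match, as the matrix and the encoded `muts` imply. It omits `mid5s` and `mid3s` for brevity and assumes every read covers at least one position.

```python
import numpy as np

NOCOV = 255  # assumed code for a position that the read does not cover
MATCH = 1    # assumed code for a match

matrix = np.array([
    [255,   1,   1,   1, 255,   1,  64,   1],
    [  1,   1, 128,   1, 128,   1, 255, 255],
    [255,   1,   1, 255,   1,   1,   1, 255],
    [  1,  16,   1,   1, 128, 255,   1,   1],
    [255, 255,   1,   1,   3,   3,   1, 255],
])

def encode(matrix):
    """Derive end5s, end3s, and muts from a reads-by-positions matrix."""
    num_reads, num_pos = matrix.shape
    covered = matrix != NOCOV
    # First and last covered columns of each read, as 1-indexed positions.
    end5s = covered.argmax(axis=1) + 1
    end3s = num_pos - covered[:, ::-1].argmax(axis=1)
    muts = {}
    for pos in range(1, num_pos + 1):
        column = matrix[:, pos - 1]
        # Every code other than "not covered" and "match" is stored in muts.
        codes = np.unique(column[(column != NOCOV) & (column != MATCH)])
        muts[pos] = {int(code): np.flatnonzero(column == code) for code in codes}
    return end5s, end3s, muts

end5s, end3s, muts = encode(matrix)
print(end5s.tolist())  # [2, 1, 2, 1, 3]
print(end3s.tolist())  # [8, 6, 7, 8, 7]
```

Here the values in `muts` are `numpy.ndarray` objects, so call `.tolist()` on them to compare against the lists shown above.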