Array operations
Low-level operations used to implement the genomic interval operations.
Functions:
- Create concatenated ranges of integers for multiple start/length.
- For every interval in set 1, return the indices of k closest intervals from set 2.
- Interweave two arrays.
- Merge overlapping intervals.
- Take two sets of intervals and return the indices of pairs of overlapping intervals.
- Take two sets of intervals and return the indices of pairs of overlapping intervals, as well as the indices of the intervals that do not overlap any other interval.
- Calculate sums of slices of an array.
- arange_multi(starts, stops=None, lengths=None)
Create concatenated ranges of integers for multiple start/length.
- Parameters:
starts (numpy.ndarray) – Starts for each range
stops (numpy.ndarray) – Stops for each range
lengths (numpy.ndarray) – Lengths for each range. Either stops or lengths must be provided.
- Returns:
concat_ranges – Concatenated ranges.
- Return type:
numpy.ndarray
Notes
See the following illustrative example:
import numpy as np
starts = np.array([1, 3, 4, 6])
stops = np.array([1, 5, 7, 6])
print(arange_multi(starts, stops))
>>> [3 4 4 5 6]
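The behaviour described above can be reproduced with plain NumPy. The following is a minimal sketch for illustration, not the library's implementation; the name arange_multi_sketch is hypothetical.

```python
import numpy as np


def arange_multi_sketch(starts, stops=None, lengths=None):
    """Hypothetical re-implementation, for illustration only."""
    starts = np.asarray(starts)
    if lengths is None:
        if stops is None:
            raise ValueError("Either stops or lengths must be provided.")
        lengths = np.asarray(stops) - starts
    lengths = np.asarray(lengths)
    # Index at which each range begins inside the concatenated output.
    range_offsets = np.cumsum(lengths) - lengths
    # Position of every output element within its own range: 0, 1, 2, ...
    within = np.arange(lengths.sum()) - np.repeat(range_offsets, lengths)
    return np.repeat(starts, lengths) + within


starts = np.array([1, 3, 4, 6])
stops = np.array([1, 5, 7, 6])
print(arange_multi_sketch(starts, stops))  # [3 4 4 5 6]
```

Ranges with zero length (here the first and the last) contribute nothing to the output.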
- closest_intervals(starts1, ends1, starts2=None, ends2=None, k=1, tie_arr=None, ignore_overlaps=False, ignore_upstream=False, ignore_downstream=False, direction=None)
For every interval in set 1, return the indices of k closest intervals from set 2.
- Parameters:
starts1 (numpy.ndarray) – Start coordinates of the intervals in set 1. Warning: if provided as pandas.Series, indices will be ignored.
ends1 (numpy.ndarray) – End coordinates of the intervals in set 1. Warning: if provided as pandas.Series, indices will be ignored.
starts2 (numpy.ndarray) – Start coordinates of the intervals in set 2. Warning: if provided as pandas.Series, indices will be ignored. If starts2 and ends2 are None, find closest intervals within the same set.
ends2 (numpy.ndarray) – End coordinates of the intervals in set 2. Warning: if provided as pandas.Series, indices will be ignored. If starts2 and ends2 are None, find closest intervals within the same set.
k (int) – The number of neighbors to report.
tie_arr (numpy.ndarray or None) – Extra data describing intervals in set 2 to break ties when multiple intervals are located at the same distance. Intervals with lower tie_arr values will be given priority.
ignore_overlaps (bool) – If True, ignore set 2 intervals that overlap with set 1 intervals.
ignore_upstream (bool) – If True, ignore set 2 intervals upstream of set 1 intervals.
ignore_downstream (bool) – If True, ignore set 2 intervals downstream of set 1 intervals.
direction (numpy.ndarray with dtype bool or None) – Strand vector to define the upstream/downstream orientation of the intervals.
- Returns:
closest_ids – An Nx2 array containing the indices of pairs of closest intervals. The 1st column contains ids from the 1st set, the 2nd column has ids from the 2nd set.
- Return type:
numpy.ndarray
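To make the shape of the returned array concrete, here is a brute-force sketch of a k-nearest search between two sets of semi-open intervals. It is an assumption-laden illustration, not the library's algorithm: the name closest_intervals_bruteforce is hypothetical, overlapping intervals are treated as being at distance 0, ties are broken by position in set 2, and the tie_arr, ignore_* and direction options are not modelled.

```python
import numpy as np


def closest_intervals_bruteforce(starts1, ends1, starts2, ends2, k=1):
    """Hypothetical O(N*M) illustration; not the library's implementation."""
    starts1, ends1 = np.asarray(starts1), np.asarray(ends1)
    starts2, ends2 = np.asarray(starts2), np.asarray(ends2)
    # gap[i, j]: distance between interval i of set 1 and interval j of set 2.
    # A negative value means the intervals overlap, so it is clipped to 0.
    gap = np.maximum(
        starts2[None, :] - ends1[:, None],
        starts1[:, None] - ends2[None, :],
    )
    gap = np.maximum(gap, 0)
    k = min(k, len(starts2))
    # A stable sort keeps the original order of set 2 among equal distances.
    nearest = np.argsort(gap, axis=1, kind="stable")[:, :k]
    ids1 = np.repeat(np.arange(len(starts1)), k)
    # 1st column: ids from set 1, 2nd column: ids from set 2.
    return np.column_stack([ids1, nearest.ravel()])


pairs = closest_intervals_bruteforce([0, 10], [2, 12], [3, 20, 11], [5, 25, 13])
print(pairs.tolist())  # [[0, 0], [1, 2]]
```

The row order of the real function's output may differ from this sketch; only the two-column layout is taken from the description above.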
- interweave(a, b)
Interweave two arrays.
- Parameters:
a (numpy.ndarray) – Array to interweave; must have the same length as b.
b (numpy.ndarray) – Array to interweave; must have the same length as a.
- Returns:
out – Array of interweaved values from a and b.
- Return type:
numpy.ndarray
Notes
From https://stackoverflow.com/questions/5347065/interweaving-two-numpy-arrays
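A common NumPy idiom for interweaving is to allocate the output once and fill the even and odd positions with strided assignment. The sketch below assumes this approach and uses the hypothetical name interweave_sketch; it is not claimed to be the library's code.

```python
import numpy as np


def interweave_sketch(a, b):
    """Hypothetical illustration: a[0], b[0], a[1], b[1], ..."""
    a, b = np.asarray(a), np.asarray(b)
    if a.shape != b.shape or a.ndim != 1:
        raise ValueError("a and b must be 1D arrays of the same length.")
    out = np.empty(a.size + b.size, dtype=np.result_type(a, b))
    out[0::2] = a  # even positions come from a
    out[1::2] = b  # odd positions come from b
    return out


print(interweave_sketch([1, 3, 5], [2, 4, 6]))  # [1 2 3 4 5 6]
```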
- merge_intervals(starts, ends, min_dist=0)
Merge overlapping intervals.
- Parameters:
starts (numpy.ndarray) – Interval coordinates. Warning: if provided as pandas.Series, indices will be ignored.
ends (numpy.ndarray) – Interval coordinates. Warning: if provided as pandas.Series, indices will be ignored.
min_dist (float or None) – If provided, merge intervals separated by this distance or less. If None, do not merge non-overlapping intervals. Note that min_dist=0 and min_dist=None give different results: bioframe uses semi-open intervals, so the intervals [0,1) and [1,2) do not overlap but are separated by a distance of 0. Such intervals are merged when min_dist=0, but not when min_dist=None.
- Returns:
cluster_ids (numpy.ndarray) – The indices of interval clusters that each interval belongs to.
cluster_starts (numpy.ndarray)
cluster_ends (numpy.ndarray) – The spans of the merged intervals.
Notes
From https://stackoverflow.com/questions/43600878/merging-overlapping-intervals/58976449#58976449
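The difference between min_dist=0 and min_dist=None can be illustrated with a sort-and-scan sketch in plain NumPy. This is a minimal sketch under stated assumptions, not the library's implementation: the name merge_intervals_sketch is hypothetical, and clusters are numbered in order of their start coordinate.

```python
import numpy as np


def merge_intervals_sketch(starts, ends, min_dist=0):
    """Hypothetical illustration of merging semi-open intervals."""
    starts, ends = np.asarray(starts), np.asarray(ends)
    n = len(starts)
    if n == 0:
        empty = np.array([], dtype=int)
        return empty, starts[:0], ends[:0]
    order = np.argsort(starts, kind="stable")
    s, e = starts[order], ends[order]
    # Furthest end seen so far among the sorted intervals.
    run_end = np.maximum.accumulate(e)
    if min_dist is None:
        # Only strictly overlapping intervals share a cluster.
        new_cluster = s[1:] >= run_end[:-1]
    else:
        # Gaps of min_dist or less are bridged.
        new_cluster = s[1:] > run_end[:-1] + min_dist
    borders = np.r_[True, new_cluster]
    cluster_ids = np.empty(n, dtype=int)
    cluster_ids[order] = np.cumsum(borders) - 1  # back to input order
    cluster_starts = s[borders]
    cluster_ends = np.maximum.reduceat(e, np.flatnonzero(borders))
    return cluster_ids, cluster_starts, cluster_ends


ids, cs, ce = merge_intervals_sketch([0, 1, 5], [1, 2, 6], min_dist=0)
print(ids.tolist(), cs.tolist(), ce.tolist())  # [0, 0, 1] [0, 5] [2, 6]
ids, cs, ce = merge_intervals_sketch([0, 1, 5], [1, 2, 6], min_dist=None)
print(ids.tolist(), cs.tolist(), ce.tolist())  # [0, 1, 2] [0, 1, 5] [1, 2, 6]
```

With min_dist=0 the touching intervals [0,1) and [1,2) fall into one cluster; with min_dist=None they stay separate.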
- overlap_intervals(starts1, ends1, starts2, ends2, closed=False, sort=False)
Take two sets of intervals and return the indices of pairs of overlapping intervals.
- Parameters:
starts1 (numpy.ndarray) – Interval coordinates. Warning: if provided as pandas.Series, indices will be ignored.
ends1 (numpy.ndarray) – Interval coordinates. Warning: if provided as pandas.Series, indices will be ignored.
starts2 (numpy.ndarray) – Interval coordinates. Warning: if provided as pandas.Series, indices will be ignored.
ends2 (numpy.ndarray) – Interval coordinates. Warning: if provided as pandas.Series, indices will be ignored.
closed (bool) – If True, then treat intervals as closed and report single-point overlaps.
- Returns:
overlap_ids – An Nx2 array containing the indices of pairs of overlapping intervals. The 1st column contains ids from the 1st set, the 2nd column has ids from the 2nd set.
- Return type:
numpy.ndarray
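The effect of the closed flag is easiest to see in a brute-force comparison of every pair. The sketch below is for illustration only and assumes an all-pairs test rather than whatever algorithm the library uses; the name overlap_intervals_bruteforce is hypothetical.

```python
import numpy as np


def overlap_intervals_bruteforce(starts1, ends1, starts2, ends2, closed=False):
    """Hypothetical O(N*M) illustration; not the library's implementation."""
    s1 = np.asarray(starts1)[:, None]
    e1 = np.asarray(ends1)[:, None]
    s2 = np.asarray(starts2)[None, :]
    e2 = np.asarray(ends2)[None, :]
    if closed:
        # Closed intervals: sharing a single point counts as an overlap.
        mask = (s1 <= e2) & (s2 <= e1)
    else:
        # Semi-open intervals: [0, 3) and [3, 5) do not overlap.
        mask = (s1 < e2) & (s2 < e1)
    # Rows are (id in set 1, id in set 2).
    return np.argwhere(mask)


args = ([0, 5], [3, 8], [3, 6], [5, 7])
print(overlap_intervals_bruteforce(*args).tolist())
# [[1, 1]]
print(overlap_intervals_bruteforce(*args, closed=True).tolist())
# [[0, 0], [1, 0], [1, 1]]
```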
- overlap_intervals_outer(starts1, ends1, starts2, ends2, closed=False)
Take two sets of intervals and return the indices of pairs of overlapping intervals, as well as the indices of the intervals that do not overlap any other interval.
- Parameters:
starts1 (numpy.ndarray) – Interval coordinates. Warning: if provided as pandas.Series, indices will be ignored.
ends1 (numpy.ndarray) – Interval coordinates. Warning: if provided as pandas.Series, indices will be ignored.
starts2 (numpy.ndarray) – Interval coordinates. Warning: if provided as pandas.Series, indices will be ignored.
ends2 (numpy.ndarray) – Interval coordinates. Warning: if provided as pandas.Series, indices will be ignored.
closed (bool) – If True, then treat intervals as closed and report single-point overlaps.
- Returns:
overlap_ids (numpy.ndarray) – An Nx2 array containing the indices of pairs of overlapping intervals. The 1st column contains ids from the 1st set, the 2nd column has ids from the 2nd set.
no_overlap_ids1, no_overlap_ids2 (numpy.ndarray) – Two 1D arrays containing the indices of intervals in sets 1 and 2 respectively that do not overlap with any interval in the other set.
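The three return values can be derived from a single pairwise overlap mask. The following brute-force sketch is an illustration under that assumption, with the hypothetical name overlap_intervals_outer_bruteforce; it does not reflect how the library computes the result.

```python
import numpy as np


def overlap_intervals_outer_bruteforce(starts1, ends1, starts2, ends2, closed=False):
    """Hypothetical O(N*M) illustration; not the library's implementation."""
    s1 = np.asarray(starts1)[:, None]
    e1 = np.asarray(ends1)[:, None]
    s2 = np.asarray(starts2)[None, :]
    e2 = np.asarray(ends2)[None, :]
    if closed:
        mask = (s1 <= e2) & (s2 <= e1)
    else:
        mask = (s1 < e2) & (s2 < e1)
    overlap_ids = np.argwhere(mask)
    # Set 1 intervals whose row in the mask has no True entry.
    no_overlap_ids1 = np.flatnonzero(~mask.any(axis=1))
    # Set 2 intervals whose column in the mask has no True entry.
    no_overlap_ids2 = np.flatnonzero(~mask.any(axis=0))
    return overlap_ids, no_overlap_ids1, no_overlap_ids2


ids, no1, no2 = overlap_intervals_outer_bruteforce([0, 5], [3, 8], [3, 6], [5, 7])
print(ids.tolist(), no1.tolist(), no2.tolist())  # [[1, 1]] [0] [0]
```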