Segment Tree¶
Overview¶
Segment tree is a data structure designed for storing one-dimensional intervals or segments, either overlapping or non-overlapping. It is useful for detecting all the segments that contain a specific point. Each segment has start and end positions where the start position is inclusive while the end position is not. The version of segment tree implemented in mdds allows associating a value with each segment so that you can use it as an associative container.
Quickstart¶
In this section we are going to demonstrate how to use
segment_tree
with some code examples. First, let’s include
some headers:
#include <mdds/segment_tree.hpp>
#include <string>
#include <iostream>
The <mdds/segment_tree.hpp>
header defines the
mdds::segment_tree
template class, and we need to include the
<string>
so that we can use std::string
as its value type. We also need
the <iostream>
header to write some outputs. The next step is to define a
type alias for the concrete type instance of segment_tree
:
using tree_type = mdds::segment_tree<long, std::string>;
Here, we are using long
as the key type and std::string
as the value type.
Let’s create an instance of this type and insert some segments with values:
tree_type tree;
tree.insert(0, 10, "A");
tree.insert(2, 15, "B");
tree.insert(-2, 5, "C");
tree.insert(5, 22, "D");
tree.insert(20, 35, "E");
We have created a new instance called tree
, and inserted five segment values
into it using the insert()
method, which takes:
start position of the segment,
end position of the segment, and
the value associated with the segment
as its arguments. As mentioned earlier, the start position of a segment is
inclusive but the end position isn’t. Once all the segment data have been
inserted, the next step is to build the tree by simply calling the
build_tree()
method:
tree.build_tree();
Building the tree is required before you can perform any queries. Now that the tree is built, let’s run some queries. But first we are going to define the following helper function:
void search_and_print(const tree_type& tree, long pos)
{
auto results = tree.search(pos);
std::cout << "--" << std::endl;
std::cout << "search at " << pos << std::endl;
std::cout << "number of results: " << results.size() << std::endl;
for (const auto& [start, end, value] : results)
std::cout << "range: [" << start << ":" << end << "); value: " << value << std::endl;
}
This helper function takes the segment tree instance and a search point,
performs a search by calling search()
, and
iterates through and prints its results. The return value from
search()
is of type
search_results
, which provides the
size()
method to quickly find
the total number of the results. It also allows you to iterate through the
results through its begin()
end()
methods, or simply by
plugging it into a range-based for loop as you see in this
function.
With this help function in place, let’s find all the segments that contain 5 by calling:
search_and_print(tree, 5);
which will print the following output:
--
search at 5
number of results: 3
range: [2:15); value: B
range: [0:10); value: A
range: [5:22); value: D
It’s worth pointing out that the results here don’t include the “C” segment,
which extends from -2 to 5, because the end point is not inclusive. Given a
search point of x
, For a segment to be included in the results, it must
satisfy start <= x < end
. On the other hand, the results do include the
“D” segment because the start point is inclusive.
Let’s do another search, and this time let’s find all the segments that contain 0:
search_and_print(tree, 0);
This will print the following output:
--
search at 0
number of results: 2
range: [-2:5); value: C
range: [0:10); value: A
The results look just about right. As an aside, it is entirely safe to call
search()
with points that are below the minimum
position or above the maximum position in the tree; you will simply get empty
results in such cases.
Removing segments¶
So far we have covered how to insert segment values and perform searches, but you can also remove values from the tree. There are two ways to remove values: one is to use an iterator from the search results object, and another is to specify a match condition predicate and remove all values that the predicate evaluates to true.
Removing with iterator¶
Let’s first cover how to remove a value with an iterator. Our goal here is to
remove the segment value “A” that extends from 0 to 10. To obtain an iterator,
you first need to perform a search then get an iterator from the results object.
Once you have an iterator, iterate through the result set until it finds the
right iterator position, then call erase()
to
remove that value, as the following code illustrates:
auto results = tree.search(5);
for (auto it = results.begin(); it != results.end(); ++it)
{
if (it->value == "A")
{
tree.erase(it);
break;
}
}
Let’s run the same search with the search point of 5:
search_and_print(tree, 5);
This time it will produce:
--
search at 5
number of results: 2
range: [2:15); value: B
range: [5:22); value: D
As you can see, the “A” segment has been removed.
One thing to note is that removing a value does not invalidate the tree itself; you can continue to perform follow-up searches without having to re-build the tree. However, it does invalidate the iterators, which necessitates you to exit your iteration once a value has been removed using an iterator. Note that the search results object itself remains valid even after the value removal.
Even though removing a value does not invalidate the tree, if you remove a large enough number of values re-building it may reduce the overall size of the tree, as the size of the tree is dependent upon the number of unique end points of all the stored segments. And smaller the tree, the better the search performance.
Removing with predicate¶
Another way to remove values is to call erase_if()
with a predicate that matches the value to be removed. The following code removes
all the segments that contains 5:
auto pred = [](long start, long end, const std::string& /*value*/)
{
return start <= 5 && 5 < end;
};
auto n = tree.erase_if(pred);
std::cout << "--" << std::endl;
std::cout << n << " segments have been removed" << std::endl;
The predicate function takes three parameters that are start position, end position, and a value of a segment. Running this code produces the following output:
--
3 segments have been removed
indicating that a total of 3 segments have been removed with this call. Running the same search again after the value removal:
search_and_print(tree, 5);
yields the following output:
--
search at 5
number of results: 0
indicating that the tree no longer stores segments that contain 5.
Performance considerations¶
Given a set of segments, the cost of building a tree increases with the number of unique start and end positions of all of the segments combined, as that determines how deep the tree has to be. The tree building process involves building the tree structure itself, followed by the “insertions” of individual segments into the tree. The insertion of a segment involves descending the tree from the root node and marking the appropriate nodes. This gets repeated for the number of segments to be inserted. This process is moderately expensive.
The search process, on the other hand, typically consists of two steps: 1) retrieving the results set, and 2) iterating through the results. Step 1 is proportional to the depth of the tree, and should be a quick process even with a sufficiently large number of segments. Step 2, on the other hand, is proportional to the number of the results one needs to iterate through, regardless of the size of the results set.
Removing a segment can be considerably expensive especially when a large number
of segments need to be removed from the tree. Excessive use of either
erase()
or
erase_if()
is therefore not recommended. Use
them sparingly.
Given these performance characteristics of segment_tree
, an
ideal use case for this data structure would be one where:
you already have all or most of the segments known ahead of time,
you can only build the tree once, and mostly need to run queries afterward, and
you need to remove segments only in some rare occasions, or none at all.
Note that occasionally updating the data set and re-building the tree is reasonable, as long as your use case can tolerate the cost of building the tree.
API Reference¶
-
template<typename KeyT, typename ValueT>
class segment_tree¶ Segment tree is a data structure designed for storing one-dimensional intervals or segments, either overlapping or non-overlapping. It is useful for detecting all the segments that contain a specific point.
Each segment has start and end positions where the start position is inclusive while the end position is not. This segment tree implementation allows associating a value with each segment so that you can use it as an associative container.
- Template Parameters:
KeyT – Key type. The key type must be copyable. If it’s moveable then it gets moved instead of copied where possible. Additionally, the key type must support the
<
,<=
and==
operators. To use to_string(), the key type must support the ostream operator.ValueT – Value type. The value type does not need to be copyable but must be at least moveable. Additionally, the value type must support the
==
operator. To use to_string(), the value type must support the ostream operator.
Public Types
-
using size_type = std::size_t¶
Public Functions
-
segment_tree()¶
-
segment_tree(const segment_tree &r)¶
-
segment_tree(segment_tree &&r)¶
-
~segment_tree()¶
-
segment_tree &operator=(const segment_tree &r)¶
-
segment_tree &operator=(segment_tree &&r)¶
-
bool operator==(const segment_tree &r) const¶
Check equality with another instance.
Note
Equality between two segment_tree instances is evaluated by comparing the logically stored segments only; the tree parts of the structures are not compared.
-
bool operator!=(const segment_tree &r) const¶
-
inline bool valid_tree() const noexcept¶
Check whether or not the internal tree is in a valid state. The tree must be valid in order to perform searches.
- Returns:
true if the tree is valid, false otherwise.
-
void build_tree()¶
Build or re-build a tree based on the current set of segments.
Note
Building a tree with no logical stored segments will not result in a valid tree.
-
void insert(key_type start_key, key_type end_key, value_type value)¶
Insert a new segment. Duplicate segments are allowed.
- Parameters:
start_key – start key of a segment. The value is inclusive.
end_key – end key of a segment. The value is non-inclusive. It must be greater than the start key.
value – value to associate with this segment.
-
search_results search(const key_type &point) const¶
Search the tree to find all the segments that contain a specified point.
Note
You need to have a valid tree prior to calling this method. Call build_tree() to build one. To check if you have a valid tree, call valid_tree() and see if it returns true.
Note
A point value can be any value allowed by the key_type. If the point is outside of the range of the tree, empty results will be returned.
- Parameters:
point – A point to search the tree with.
- Returns:
Results object containing the segments that contain the specified point.
-
void erase(const typename search_results::const_iterator &pos)¶
Remove a segment referenced by an iterator. Calling this method will not invalidate the tree; however, you might want to re-build the tree to compact its size if you have removed a large number of segments.
- Parameters:
pos – Position that references the segment to be removed.
-
template<typename Pred>
size_type erase_if(Pred pred)¶ Remove all segments that satisfy a predicate.
The predicate must take three parameters that are a start position, end position and a value of a segment in this order. The first two parameters are of the same type as the key_type while the last parameter is of the same type as the value_type.
// Given int as key_type and std::string as value_type, this predicate removes // all the segments that 1) contain 5 and 2) store a value of "A". auto pred = [](int start, int end, const std::string& value) { return start <= 5 && 5 < end && value == "A"; };
- Parameters:
pred – Predicate that tests each segment. A segment that evaluates true by the predicate gets removed.
- Returns:
Number of erased elements.
-
void swap(segment_tree &r) noexcept¶
Exchange the content of the tree with that of another.
- Parameters:
r – Another tree instance to swap the contents with.
-
void clear()¶
Remove all segments data.
-
bool empty() const¶
Return whether or not the container stores any segments or none at all.
-
std::string to_string() const¶
Create a string representation of the internal state of a tree.
- Returns:
String representation of the internal state of a tree.
-
std::vector<key_type> boundary_keys() const¶
Create a sorted sequence of unique boundary keys. A boundary key is a key that is either the start or the end key of a stored segment.
- Returns:
A sorted sequence of unique boundary keys.
-
void check_integrity(const integrity_check_properties &props) const¶
-
class search_results : public mdds::segment_tree<KeyT, ValueT>::search_results_base¶
Public Functions
-
inline bool empty() const¶
Check if this results pool is empty or not.
- Returns:
true if this results pool is empty, otherwise false.
-
inline size_type size() const¶
Count the number of results in this results pool.
- Returns:
number of results in this results pool.
-
inline search_results::const_iterator begin() const¶
-
inline search_results::const_iterator end() const¶
-
class const_iterator : public mdds::segment_tree<KeyT, ValueT>::const_iterator_base¶
Public Functions
-
inline const_iterator()¶
-
inline const_iterator(const const_iterator &r)¶
-
inline const_iterator &operator=(const const_iterator &r)¶
Friends
- friend class segment_tree< KeyT, ValueT >::search_results
-
inline const_iterator()¶
-
inline bool empty() const¶