Segment Tree

Overview

Segment tree is a data structure designed for storing one-dimensional intervals or segments, either overlapping or non-overlapping. It is useful for detecting all the segments that contain a specific point. Each segment has start and end positions where the start position is inclusive while the end position is not. The version of segment tree implemented in mdds allows associating a value with each segment so that you can use it as an associative container.

Quickstart

In this section we are going to demonstrate how to use segment_tree with some code examples. First, let’s include some headers:

#include <mdds/segment_tree.hpp>
#include <string>
#include <iostream>

The <mdds/segment_tree.hpp> header defines the mdds::segment_tree template class, and we need to include the <string> so that we can use std::string as its value type. We also need the <iostream> header to write some outputs. The next step is to define a type alias for the concrete type instance of segment_tree:

using tree_type = mdds::segment_tree<long, std::string>;

Here, we are using long as the key type and std::string as the value type. Let’s create an instance of this type and insert some segments with values:

tree_type tree;

tree.insert(0, 10, "A");
tree.insert(2, 15, "B");
tree.insert(-2, 5, "C");
tree.insert(5, 22, "D");
tree.insert(20, 35, "E");

We have created a new instance called tree, and inserted five segment values into it using the insert() method, which takes:

  • start position of the segment,

  • end position of the segment, and

  • the value associated with the segment

as its arguments. As mentioned earlier, the start position of a segment is inclusive but the end position isn’t. Once all the segment data have been inserted, the next step is to build the tree by simply calling the build_tree() method:

tree.build_tree();

Building the tree is required before you can perform any queries. Now that the tree is built, let’s run some queries. But first we are going to define the following helper function:

void search_and_print(const tree_type& tree, long pos)
{
    auto results = tree.search(pos);

    std::cout << "--" << std::endl;
    std::cout << "search at " << pos << std::endl;
    std::cout << "number of results: " << results.size() << std::endl;

    for (const auto& [start, end, value] : results)
        std::cout << "range: [" << start << ":" << end << "); value: " << value << std::endl;
}

This helper function takes the segment tree instance and a search point, performs a search by calling search(), and iterates through and prints its results. The return value from search() is of type search_results, which provides the size() method to quickly find the total number of the results. It also allows you to iterate through the results through its begin() end() methods, or simply by plugging it into a range-based for loop as you see in this function.

With this help function in place, let’s find all the segments that contain 5 by calling:

search_and_print(tree, 5);

which will print the following output:

--
search at 5
number of results: 3
range: [2:15); value: B
range: [0:10); value: A
range: [5:22); value: D

It’s worth pointing out that the results here don’t include the “C” segment, which extends from -2 to 5, because the end point is not inclusive. Given a search point of x, For a segment to be included in the results, it must satisfy start <= x < end. On the other hand, the results do include the “D” segment because the start point is inclusive.

Let’s do another search, and this time let’s find all the segments that contain 0:

search_and_print(tree, 0);

This will print the following output:

--
search at 0
number of results: 2
range: [-2:5); value: C
range: [0:10); value: A

The results look just about right. As an aside, it is entirely safe to call search() with points that are below the minimum position or above the maximum position in the tree; you will simply get empty results in such cases.

Removing segments

So far we have covered how to insert segment values and perform searches, but you can also remove values from the tree. There are two ways to remove values: one is to use an iterator from the search results object, and another is to specify a match condition predicate and remove all values that the predicate evaluates to true.

Removing with iterator

Let’s first cover how to remove a value with an iterator. Our goal here is to remove the segment value “A” that extends from 0 to 10. To obtain an iterator, you first need to perform a search then get an iterator from the results object. Once you have an iterator, iterate through the result set until it finds the right iterator position, then call erase() to remove that value, as the following code illustrates:

auto results = tree.search(5);
for (auto it = results.begin(); it != results.end(); ++it)
{
    if (it->value == "A")
    {
        tree.erase(it);
        break;
    }
}

Let’s run the same search with the search point of 5:

search_and_print(tree, 5);

This time it will produce:

--
search at 5
number of results: 2
range: [2:15); value: B
range: [5:22); value: D

As you can see, the “A” segment has been removed.

One thing to note is that removing a value does not invalidate the tree itself; you can continue to perform follow-up searches without having to re-build the tree. However, it does invalidate the iterators, which necessitates you to exit your iteration once a value has been removed using an iterator. Note that the search results object itself remains valid even after the value removal.

Even though removing a value does not invalidate the tree, if you remove a large enough number of values re-building it may reduce the overall size of the tree, as the size of the tree is dependent upon the number of unique end points of all the stored segments. And smaller the tree, the better the search performance.

Removing with predicate

Another way to remove values is to call erase_if() with a predicate that matches the value to be removed. The following code removes all the segments that contains 5:

auto pred = [](long start, long end, const std::string& /*value*/)
{
    return start <= 5 && 5 < end;
};

auto n = tree.erase_if(pred);

std::cout << "--" << std::endl;
std::cout << n << " segments have been removed" << std::endl;

The predicate function takes three parameters that are start position, end position, and a value of a segment. Running this code produces the following output:

--
3 segments have been removed

indicating that a total of 3 segments have been removed with this call. Running the same search again after the value removal:

search_and_print(tree, 5);

yields the following output:

--
search at 5
number of results: 0

indicating that the tree no longer stores segments that contain 5.

Performance considerations

Given a set of segments, the cost of building a tree increases with the number of unique start and end positions of all of the segments combined, as that determines how deep the tree has to be. The tree building process involves building the tree structure itself, followed by the “insertions” of individual segments into the tree. The insertion of a segment involves descending the tree from the root node and marking the appropriate nodes. This gets repeated for the number of segments to be inserted. This process is moderately expensive.

The search process, on the other hand, typically consists of two steps: 1) retrieving the results set, and 2) iterating through the results. Step 1 is proportional to the depth of the tree, and should be a quick process even with a sufficiently large number of segments. Step 2, on the other hand, is proportional to the number of the results one needs to iterate through, regardless of the size of the results set.

Removing a segment can be considerably expensive especially when a large number of segments need to be removed from the tree. Excessive use of either erase() or erase_if() is therefore not recommended. Use them sparingly.

Given these performance characteristics of segment_tree, an ideal use case for this data structure would be one where:

  • you already have all or most of the segments known ahead of time,

  • you can only build the tree once, and mostly need to run queries afterward, and

  • you need to remove segments only in some rare occasions, or none at all.

Note that occasionally updating the data set and re-building the tree is reasonable, as long as your use case can tolerate the cost of building the tree.

API Reference

template<typename KeyT, typename ValueT>
class segment_tree

Segment tree is a data structure designed for storing one-dimensional intervals or segments, either overlapping or non-overlapping. It is useful for detecting all the segments that contain a specific point.

Each segment has start and end positions where the start position is inclusive while the end position is not. This segment tree implementation allows associating a value with each segment so that you can use it as an associative container.

Template Parameters:
  • KeyT – Key type. The key type must be copyable. If it’s moveable then it gets moved instead of copied where possible. Additionally, the key type must support the <, <= and == operators. To use to_string(), the key type must support the ostream operator.

  • ValueT – Value type. The value type does not need to be copyable but must be at least moveable. Additionally, the value type must support the == operator. To use to_string(), the value type must support the ostream operator.

Public Types

using key_type = KeyT

The key type must be copyable, and may be moveable.

using value_type = ValueT

The value type must be either copyable or moveable.

using size_type = std::size_t
using node = st::detail::node<key_type, leaf_value_type>
using node_ptr = typename node::node_ptr
using nonleaf_node = typename st::detail::nonleaf_node<key_type, nonleaf_value_type>

Public Functions

segment_tree()
segment_tree(const segment_tree &r)
segment_tree(segment_tree &&r)
~segment_tree()
segment_tree &operator=(const segment_tree &r)
segment_tree &operator=(segment_tree &&r)
bool operator==(const segment_tree &r) const

Check equality with another instance.

Note

Equality between two segment_tree instances is evaluated by comparing the logically stored segments only; the tree parts of the structures are not compared.

bool operator!=(const segment_tree &r) const
inline bool valid_tree() const noexcept

Check whether or not the internal tree is in a valid state. The tree must be valid in order to perform searches.

Returns:

true if the tree is valid, false otherwise.

void build_tree()

Build or re-build a tree based on the current set of segments.

Note

Building a tree with no logical stored segments will not result in a valid tree.

void insert(key_type start_key, key_type end_key, value_type value)

Insert a new segment. Duplicate segments are allowed.

Parameters:
  • start_key – start key of a segment. The value is inclusive.

  • end_key – end key of a segment. The value is non-inclusive. It must be greater than the start key.

  • value – value to associate with this segment.

search_results search(const key_type &point) const

Search the tree to find all the segments that contain a specified point.

Note

You need to have a valid tree prior to calling this method. Call build_tree() to build one. To check if you have a valid tree, call valid_tree() and see if it returns true.

Note

A point value can be any value allowed by the key_type. If the point is outside of the range of the tree, empty results will be returned.

Parameters:

point – A point to search the tree with.

Returns:

Results object containing the segments that contain the specified point.

void erase(const typename search_results::const_iterator &pos)

Remove a segment referenced by an iterator. Calling this method will not invalidate the tree; however, you might want to re-build the tree to compact its size if you have removed a large number of segments.

Parameters:

pos – Position that references the segment to be removed.

template<typename Pred>
size_type erase_if(Pred pred)

Remove all segments that satisfy a predicate.

The predicate must take three parameters that are a start position, end position and a value of a segment in this order. The first two parameters are of the same type as the key_type while the last parameter is of the same type as the value_type.

// Given int as key_type and std::string as value_type, this predicate removes
// all the segments that 1) contain 5 and 2) store a value of "A".
auto pred = [](int start, int end, const std::string& value) {
    return start <= 5 && 5 < end && value == "A";
};
Parameters:

pred – Predicate that tests each segment. A segment that evaluates true by the predicate gets removed.

Returns:

Number of erased elements.

void swap(segment_tree &r) noexcept

Exchange the content of the tree with that of another.

Parameters:

r – Another tree instance to swap the contents with.

void clear()

Remove all segments data.

size_type size() const

Return the number of segments currently stored in this container.

bool empty() const

Return whether or not the container stores any segments or none at all.

size_type leaf_size() const

Return the number of leaf nodes.

Returns:

number of leaf nodes.

std::string to_string() const

Create a string representation of the internal state of a tree.

Returns:

String representation of the internal state of a tree.

std::vector<key_type> boundary_keys() const

Create a sorted sequence of unique boundary keys. A boundary key is a key that is either the start or the end key of a stored segment.

Returns:

A sorted sequence of unique boundary keys.

void check_integrity(const integrity_check_properties &props) const
struct integrity_check_properties

Public Members

std::vector<leaf_node> leaf_nodes
struct leaf_node

Public Members

key_type key = {}
std::vector<value_type> value_chain
class search_results : public mdds::segment_tree<KeyT, ValueT>::search_results_base

Public Functions

inline bool empty() const

Check if this results pool is empty or not.

Returns:

true if this results pool is empty, otherwise false.

inline size_type size() const

Count the number of results in this results pool.

Returns:

number of results in this results pool.

inline search_results::const_iterator begin() const
inline search_results::const_iterator end() const
class const_iterator : public mdds::segment_tree<KeyT, ValueT>::const_iterator_base

Public Functions

inline const_iterator()
inline const_iterator(const const_iterator &r)
inline const_iterator &operator=(const const_iterator &r)

Friends

friend class segment_tree< KeyT, ValueT >::search_results