The STL provides several standard algorithms for the processing of elements
of collections. These algorithms offer general fundamental services, such as
searching, sorting, copying, reordering, modifying, and numeric processing.
Algorithms are not member functions of the container classes. Instead, they
are global functions that operate with iterators. This has an important
advantage: Instead of each algorithm being implemented for each container type,
all are implemented only once for any container type. The algorithm might even
operate on elements of different container types. You can also use the
algorithms for user-defined container types. All in all, this concept reduces
the amount of code and increases the power and the flexibility of the
library.
Note that this is not an object-oriented programming paradigm; it is a
generic functional programming paradigm. Instead of data and operations being
unified, as in object-oriented programming, they are separated into distinct
parts that can interact via a certain interface. However, this concept also has
its price: First, the usage is not intuitive. Second, some combinations of data
structures and algorithms might not work. Even worse, a combination of a
container type and an algorithm might be possible but not useful (for example,
it may lead to bad performance). Thus, it is important to learn the concepts and
the pitfalls of the STL to benefit from it without abusing it. I provide
examples and more details about this throughout the rest of this post.
Let's start with a simple example of the use of STL algorithms. Consider the
following program, which shows some algorithms and their usage:
To be able to call the algorithms, you must include the header file <algorithm>:
The first two algorithms called are min_element() and
max_element(). They are called with two parameters that
define the range of the processed elements. To process all elements of a
container you simply use begin() and end(). Both algorithms return an iterator for the minimum
and maximum elements respectively. Thus, in the statement
the min_element() algorithm returns the position of
the minimum element (if there is more than one, the algorithm returns the
first). The next statement prints that element:
Of course, you could do both in one statement:
The next algorithm called is sort(). As the name
indicates, it sorts the elements of the range defined by the two arguments. As
usual, you could pass an optional sorting criterion. The default sorting
criterion is operator <. Thus, in this example all
elements of the container are sorted in ascending order:
So afterward, the vector contains the elements in this order:
The find() algorithm searches for a value inside the
given range. In this example, it searches the first element that is equal to the
value 3 in the whole container:
If the find() algorithm is successful, it returns the
iterator position of the element found. If it fails, it returns the end of the
range, the past-the-end iterator, which is passed as the second argument. In
this example, the value 3 is found as the third element,
so afterward pos refers to the third element of coll.
The last algorithm called in the example is reverse(), which reverses the elements of the passed range.
Here the third element that was found by the find()
algorithms and the past-the-end iterator are passed as arguments:
This call reverses the order of the third element up to the last one. The
output of the program is as follows:
Ranges
All algorithms process one or more ranges of elements. Such a range
might, but is not required to, embrace all elements of a container. Therefore,
to be able to handle subsets of container elements, you pass the beginning and
the end of the range as two separate arguments rather than the whole collection
as one argument.
This interface is flexible but dangerous. The caller must ensure that the
first and second arguments define a valid range. This is the case if the
end of the range is reachable from the beginning by iterating through the
elements. This means, it is up to the programmer to ensure that both iterators
belong to the same container and that the beginning is not behind the end. If
this is not the case, the behavior is undefined and endless loops or forbidden
memory access may result. In this respect, iterators are just as unsafe as
ordinary pointers. But note that undefined behavior also means that an
implementation of the STL is free to find such kinds of errors and handle them
accordingly. The following paragraphs show that ensuring that ranges are valid
is not always as easy as it sounds.
Every algorithm processes half-open ranges. Thus, a range is defined
so that it includes the position used as the beginning of the range but excludes
the position used as the end. This concept is often described by using the
traditional mathematical notations for half-open ranges:
[begin,end)or
[begin,end[
We use the first alternative in this series
The half-open range concept has the advantages that were described on page 84
(it is simple and avoids special handling for empty collections). However, it
also has some disadvantages. Consider the following example:
In this example, the collection is initialized with integral values from 20 to 40. When the search for an
element with the value 3 fails, find() returns the end of the processed range (coll.end() in this example) and assigns it to pos. Using
that return value as the beginning of the range in the following call of reverse() poses no problem because it results in the
following call:
This is simply a call to reverse an empty range. Thus, it is an operation
that has no effect (a so-called "no-op").
However, if find() is used to find the first and the
last elements of a subset, you should consider that passing these iterator
positions as a range will exclude the last element. So, the first call of max_element()
finds 34 and not 35:
To process the last element, you have to pass the position that is one past the last element:
Doing this yields the correct result:
Note that this example uses a list as the container. Thus, you must use
operator ++ to get the position that is behind pos35. If you have random access iterators, as with vectors
and deques, you also could use the expression pos35 + 1.
This is because random access iterators allow "iterator arithmetic".
Of course, you could use pos25 and pos35 to find something in that subrange. Again, to search
including pos35 you have to pass the position after pos35. For example:
All the examples in this section work only because you know that pos25 is in front of pos35.
Otherwise, [pos25,pos35) would not be a valid range. If
you are not sure which element is in front, things are getting more complicated
and undefined behavior may occur.
Suppose you don't know whether the element that has value 25 is in front of the element that has value 35. It might even be possible that one or both values are
not present. By using random access iterators, you can call operator < to check this:
However, without random access iterators you have no simple, fast way to find
out which iterator is in front. You can only search for one iterator in the
range of the beginning to the other iterator or in the range of the other
iterator to the end. In this case, you should change your algorithm as follows:
Instead of searching for both values in the whole source range, you should try
to find out, while searching for them, which value comes first. For example:
In contrast to the previous version, here you don't search for pos35 in the full range of all elements of coll. Instead, you first search for it from the beginning to
pos25. Then, if it's not found, you search for it in the
part that contains the remaining elements after pos25.
As a result you know which iterator position comes first and which subrange is
valid.
This implementation is not very efficient. A more efficient way to find the
first element that either has value 25 or value 35 is to search exactly for that. You could do this by using
some abilities of the STL that are not introduced yet as follows:
Here, a special expression is used as a sorting criterion that allows a search of the first element that has either value 25 or value 35.
Handling Multiple Ranges
Several algorithms process more than one range. In this case you usually must
define both the beginning and the end only for the first range. For all other
ranges you need to pass only their beginnings. The ends of the other ranges
follow from the number of elements of the first range. For example, the
following call of equal() compares all elements of the
collection coll1 element-by-element with the elements of
coll2 beginning with its first element:
Thus, the number of elements of coll2 that are
compared with the elements of coll1 is specified
indirectly by the number of elements in coll1.
This leads to an important consequence: When you call algorithms for multiple
ranges, make sure the second and additional ranges have at least as many
elements as the first range. In particular, make sure that destination ranges
are big enough for algorithms that write to collections!
Consider the following program:
Here, the copy() algorithm is called. It simply
copies all elements of the first range into the destination range. As usual, for
the first range, the beginning and the end are defined, whereas for the second
range, only the beginning is specified. However, the algorithm overwrites rather
than inserts. So, the algorithm requires that the destination has enough
elements to be overwritten. If there is not enough room, as in this case, the
result is undefined behavior. In practice, this often means that you overwrite
whatever comes after the coll2.end(). If you're in luck,
you'll get a crash; at least then you'll know that you did something wrong.
However, you can force your luck by using a safe version of the STL for which
the undefined behavior is defined as leading to a certain error handling
procedure.
To avoid these errors, you can (1) ensure that the destination has enough
elements on entry, or (2) use insert iterators. Insert iterators are
covered incoming posts, We'll first
explain how to modify the destination so that it is big enough on entry.
To make the destination big enough, you must either create it with the
correct size or change its size explicitly. Both alternatives apply only to
sequence containers (vectors, deques, and lists). This is not really a problem
because associative containers cannot be used as a destination for purposes for
overwriting algorithms. The following program shows how to increase the size of
containers:
Here, resize() is used to change the number of
elements in the existing container coll2:
coll3 is initialized with a special initial size so that it has enough room for all elements of coll1:
Note that both resizing and initializing the size create new elements. These
elements are initialized by their default constructor because no arguments are
passed to them. You can pass an additional argument both for the constructor and
for resize() to initialize the new elements.
-----------------------------------------------------------------
See Also:
-----------------------------------------------------------------
See Also:
-----------------------------------------------------------------
- Complete Tutorial of C++ Template's
- Standard Template Library Tutorial
- Inter Process Communication Tutorial
- Advance Programming in C & C++
-----------------------------------------------------------------
No comments:
Post a Comment