A updated version of this article is available: C++ benchmark – std::vector VS std::list VS std::deque
In C++, the two most used data structures are the std::vector and the std::list. In this article, we will compare the performance in practice of these two data structures on several different workloads. In this article, when I talk about a list it is the std::list implementation and vector refers to the std::vector implementation.
It is generally said that a list should be used when random insert and remove will be performed (performed in O(1) versus O(n) for a vector). If we look only at the complexity, search in both data structures should be roughly equivalent, complexity being in O(n). When random insert/replace operations are performed on a vector, all the subsequent data needs to be moved and so each element will be copied. That is why the size of the data type is an important factor when comparing those two data structures.
However, in practice, there is a huge difference, the usage of the memory caches. All the data in a vector is contiguous where the std::list allocates separately memory for each element. How does that change the results in practice ?
Keep in mind that all the tests performed are made on vector and list even if other data structures could be better suited to the given workload.
In the graphs and in the text, n is used to refer to the number of elements of the collection.
All the tests performed have been performed on an Intel Core i7 Q 820 @ 1.73GHz. The code has been compiled in 64 bits with GCC 4.7.2 with -02 and -march=native. The code has been compiled with C++11 support (-std=c++11).
The first test that is performed is to fill the data structures by adding elements to the back of the container. Two variations of vector are used, vector_pre being a std::vector with the size passed in parameters to the constructor, resulting in only one allocation of memory.
All data structures are impacted the same way when the data size increases, because there will be more memory to allocate. The vector_pre is clearly the winner of this test, being one order of magnitude faster than a list and about twice faster than a vector without pre-allocation. The result are directly linked to the allocations that have to be performed, allocation being slow. Whatever the data size is, push_back to a vector will always be faster than to a list. This is logical becomes vector allocates more memory than necessary and so does not need to allocate memory for each element.
But this test is not very interesting, generally building the data structure is not critical. What is critical is the operations that are performed on the data structure. That will be tested in the next sections.
The first operation is that is tested is the search. The container is filled with all the numbers in [0, N] and shuffled. Then, each number in [0,N] is searched in the container with std::find that performs a simple linear search.
Yes, vector is present in the graph, its line is the same as the x line ! Performing a linear search in a vector is several orders of magnitude faster than in a list.
The only reason is the usage of the cache line. When a data is accessed, the data is fetched from the main memory to the cache. Not only the accessed data is accessed, but a whole cacheline is fetched. As the elements in a vector are contiguous, when you access an element, the next element is automatically in the cache. As the main memory is orders of magnitude slower than the cache, this makes a huge difference. In the list case, the processor spends its whole time waiting for data being fetched from memory to the cache.
If we augment the size of the data type to 1KB, the results remain the same, but slower:
In the case of random insert, in theory, the list should be much faster, its insert operation being in O(1) versus O(n) for a vector.
The container is filled with all the numbers in [0, N] and shuffled. Then, 1000 random values are inserted at a random position in the container. The random position is found by linear search. In both cases, the complexity of the search is O(n), the only difference comes from the insert that follow the search.
When, the vector should be slower than the list, it is almost an order of magnitude faster. Again, this is because finding the position in a list is much slower than copying a lot of small elements.
If we increase the size:
The two lines are getting closer, but vector is still faster.
Increase it to 1KB:
This time, list outperforms vector by an order of magnitude ! The performance of random insert in a list are not impacted much by the size of the data type, where vector suffers a lot when big sizes are used. We can also see that list doesn't seem to care about the size of the collection. It is because the size of the collection only impact the search and not the insertion and as few search are performed, it does not change the results a lot.
If the iterator was already known (no need for linear search), it would be faster to insert into a list than into the vector.
In theory, random remove is the same case than random insert. Now that we've seen the results with random insert, we could expect the same behavior for random remove.
The container is filled with all the numbers in [0, N] and shuffled. Then, 1000 random values are removed from a random position in the container.
Again, vector is several times faster and looks to scale better. Again, this is because it is very cheap to copy small elements.
Let's increase it directly to 1KB element.
The two lines have been reversed !
The behavior of random remove is the same as the behavior of random insert, for the same reasons.
The next operation that we will compare is inserting elements in front of the collection. This is the worst case for vector, because after each insertion, all the previously inserted will be moved and copied. For a list, it does not make a difference compared to pushing to the back.
The results are crystal-clear and as expected. vector is very bad at inserting elements to the front. This does not need further explanations. There is no need to change the data size, it will only make vector much slower.
The next operation that is tested is the performance of sorting a vector or a list. For a vector std::sort is used and for a list the member function sort is used.
We can see that sorting a list is several times slower. It comes from the poor usage of the cache.
If we increase the size of the element to 1KB:
This time the list is faster than the vector. It is not very clear on the graph, but the values for the list are almost the same as for the previous results. That is because std::list::sort() does not perform any copy, only pointers to the elements are changed. On the other hand, swapping two elements in a vector involves at least three copies, so the cost of sorting will increase as the cost of copying increases.
Finally, we can also test a number crunching operation. Here, random elements are inserted into the container that is kept sorted. It means, that the position where the element has to be inserted is first searched by iterating through elements and the inserted. As we talk about number crunching, only 8 bytes elements are tested.
We can clearly see that vector is more than an order of magnitude faster than list and this will only be more as the size of the collection increase. This is because traversing the list is much more expensive than copying the elements of the vector.
To conclude, we can get some facts about each data structure:
- std::vector is insanely faster than std::list to find an element
- std::vector performs always faster than std::list with very small data
- std::vector is always faster to push elements at the back than std::list
- std::list handles very well large elements, especially for sorting or inserting in the front
This draw simple conclusions on usage of each data structure:
- Number crunching: use std::vector
- Linear search: use std::vector
- Random Insert/Remove: use std::list (if data size very small (< 64B on my computer), use std::vector)
- Big data size: use std::list (not if intended for searching)
If you have the time, in practice, the best way to decide is always to benchmark both versions, or even to try another data structures.
I hope that you found this article interesting. If you have any comment or have an idea about an other workload that you would like to test, don't hesitate to post a comment ;) If you have a question on results, don't hesitate as well.
The code source of the benchmark is available online: https://github.com/wichtounet/articles/blob/master/src/vector_list/bench.cpp