This post introduces implementations of queue and deque data structures using dynamic arrays. The deque implementation is suitable as a default container to replace standard dynamic arrays, as it offers constant time complexity for front push/pop operations with little or no speed/locality overhead. This may especially be a game changer for dynamic languages such as Python or Ruby, where people care less about complexity and use lists/arrays for everything.

To my knowledge, this is a novel way of implementing a deque: usual implementations use either linked lists, which have poor locality and random access performance but excel at random insertions, or multiple fixed-size arrays, which again have poor locality but avoid moving data while resizing. Results from simple benchmarks are given at the end.

Introduction

Deques are the unknown giants of collection libraries. They can be used as vectors, stacks or queues. Of course, flexibility almost always comes with a cost, and for deques it is a performance overhead caused either by poor locality or by poor random access performance. This post introduces an alternative implementation that either minimizes or completely eliminates these drawbacks.

Example implementations are written in C++ while the algorithms should be language independent. Technically speaking, these implementations are not STL containers as they do not have all the methods required by the standard, most notably iterators. Making complete implementations is beyond the scope of this post but encouraged for the curious reader.

Data Structures

In this section, push and pop operations are described and visualized for different data structures. Growth factor is chosen to be $2$ and the initial size is set to $1$ for the examples.

Vector

Just to get you started with the notation, here is how a classical vector works. To sum up briefly, vectors actually have a fixed-size memory block; when they are full they allocate a bigger block, move the existing data to the new location and then release the old memory. Reallocations occur when pushing 2, 3, 5 and 9 in the example:

| |
|1|
|1|2|
|1|2|3| |
|1|2|3|4|
|1|2|3|4|5| | | |
|1|2|3|4|5|6| | |
|1|2|3|4|5|6|7| |
|1|2|3|4|5|6|7|8|
|1|2|3|4|5|6|7|8|9| | | | | | | |

Complexity of push back operation is amortized O(1).
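As a minimal sketch of this mechanism (the names are hypothetical and this is not a real STL container), push back boils down to the following:

    #include <cstddef>
    #include <utility>

    // Minimal sketch of the geometric growth behind push back; `Vec` and
    // its members are hypothetical names, not std::vector internals.
    template <typename T>
    struct Vec {
        T* data = nullptr;
        std::size_t size = 0, cap = 0;

        void push_back(const T& value) {
            if (size == cap) {                            // full: reallocate
                std::size_t new_cap = cap ? cap * 2 : 1;  // growth factor g = 2
                T* bigger = new T[new_cap];
                for (std::size_t i = 0; i < size; ++i)
                    bigger[i] = std::move(data[i]);       // move existing data
                delete[] data;                            // release the old block
                data = bigger;
                cap = new_cap;
            }
            data[size++] = value;                         // O(1) when not full
        }

        ~Vec() { delete[] data; }
    };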

Popping an element from the back usually requires only changing an internal counter holding the number of elements in the vector:

|1|2|3|4|5|6|7|8|9| | | | | | | |
|1|2|3|4|5|6|7|8| | | | | | | | |
|1|2|3|4|5|6|7| | | | | | | | | |
|1|2|3|4|5|6| | | | | | | | | | |
|1|2|3|4|5| | | | | | | | | | | |
|1|2|3|4| | | | | | | | | | | | |
|1|2|3| | | | | | | | | | | | | |
|1|2| | | | | | | | | | | | | | |
|1| | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | |

Most implementations I know (C++, Java) do not reclaim memory and require manual intervention to shrink the underlying array. One of the reasons for this is that operating systems do not usually provide a guaranteed method to shrink memory without copying; for instance, the GNU specification has the following to say about realloc:

You can also use realloc to make a block smaller. The reason you would do this is to avoid tying up a lot of memory space when only a little is needed. In several allocation implementations, making a block smaller sometimes necessitates copying it, so it can fail if no other space is available.

Complexity of pop back operation is O(1).

Vector interfaces usually do not provide a push front operation, hence it is simulated here by inserting an element at the beginning (e.g. insert(v.begin(), ..);). In this case, a vector needs to shift all elements forward to make a spot for the new one, and do the reallocation trick if it runs out of space while doing so:

| |
|1|
|2|1|
|3|2|1| |
|4|3|2|1|
|5|4|3|2|1| | | |
|6|5|4|3|2|1| | |
|7|6|5|4|3|2|1| |
|8|7|6|5|4|3|2|1|
|9|8|7|6|5|4|3|2|1| | | | | | | |

Complexity of push front operation is O(n).

Popping an element from the front again requires shifting all elements, much like push front, and does not shrink the array in most implementations:

|9|8|7|6|5|4|3|2|1| | | | | | | |
|8|7|6|5|4|3|2|1| | | | | | | | |
|7|6|5|4|3|2|1| | | | | | | | | |
|6|5|4|3|2|1| | | | | | | | | | |
|5|4|3|2|1| | | | | | | | | | | |
|4|3|2|1| | | | | | | | | | | | |
|3|2|1| | | | | | | | | | | | | |
|2|1| | | | | | | | | | | | | | |
|1| | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | |

Complexity of pop front operation is O(n).

Queue

The queue is my first proposal to solve the element-shifting problem when popping from the front. The idea is simply to hold an offset variable and do the indexing accordingly instead of shifting elements. Queues are actually quite similar to golang slices, with some of the differences given below:

  • Multiple slices can refer to the same underlying array.
  • Reallocations in slices (through the built-in append function) do not release the old array but instead leave it to the garbage collector. This is necessary since there might be other slices referring to the old array.
  • Slices provide an upper bound (a capacity) as well.
  • Queues can grow backward while pushing front provided there is some offset, whereas this is not possible in slices as far as I can tell, since there is no way to reach the underlying array from a slice. It may still be possible by explicitly saving a pointer to the underlying array or a bigger slice, but then reallocations would be cumbersome since they might change the underlying array.

If you're into this stuff, I highly encourage you to read about golang slice internals. While you're at it, you may also want to read about dlang arrays and slices, which are very similar to golang slices, at least for the sake of this discussion. To my understanding, the difference between the two is that multiple dlang slices derived from the same array are meant to represent different things while minimizing the need for copying by sharing the common underlying array as much as possible. In order to simplify the management, the responsibility of not stomping on each other's elements is handled by the runtime rather than the programmer. There is also a more comprehensive discussion about slices, comparing golang and dlang through their behaviors in different cases. A word of advice: when you see the term dynamic array, you should first assume that it refers to dynamically allocated arrays, not dynamic arrays from an algorithms perspective.

Pushing back for queue is identical to vector:

| |
|1|
|1|2|
|1|2|3| |
|1|2|3|4|
|1|2|3|4|5| | | |
|1|2|3|4|5|6| | |
|1|2|3|4|5|6|7| |
|1|2|3|4|5|6|7|8|
|1|2|3|4|5|6|7|8|9| | | | | | | |

Complexity of push back operation is amortized O(1).

Popping back is also identical to vector:

|1|2|3|4|5|6|7|8|9| | | | | | | |
|1|2|3|4|5|6|7|8| | | | | | | | |
|1|2|3|4|5|6|7| | | | | | | | | |
|1|2|3|4|5|6| | | | | | | | | | |
|1|2|3|4|5| | | | | | | | | | | |
|1|2|3|4| | | | | | | | | | | | |
|1|2|3| | | | | | | | | | | | | |
|1|2| | | | | | | | | | | | | | |
|1| | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | |

Complexity of pop back operation is O(1).

There are two possibilities when pushing an element to the front. If the offset is $0$ then the behavior is identical to vectors (i.e. shifting all elements):

| |
|1|
|2|1|
|3|2|1| |
|4|3|2|1|
|5|4|3|2|1| | | |
|6|5|4|3|2|1| | |
|7|6|5|4|3|2|1| |
|8|7|6|5|4|3|2|1|
|9|8|7|6|5|4|3|2|1| | | | | | | |

If it's not zero, then we can just decrease the offset by $1$ and use the corresponding space for the new element, until the offset is $0$ again:

| | | | .2|1| | |  (arbitrary state)
| | | .3|2|1| | |
| | .4|3|2|1| | |
| .5|4|3|2|1| | |
|6|5|4|3|2|1| | |
|7|6|5|4|3|2|1| |
|8|7|6|5|4|3|2|1|
|9|8|7|6|5|4|3|2|1| | | | | | | |

Complexity of push front operation is either O(n) or O(1) depending on whether the offset is 0 or not.

Popping front is the case where the queue really shines. Instead of shifting all elements, we just increase the offset by one (marked with . in the diagram):

|9|8|7|6|5|4|3|2|1| | | | | | | |
| .8|7|6|5|4|3|2|1| | | | | | | |
| | .7|6|5|4|3|2|1| | | | | | | |
| | | .6|5|4|3|2|1| | | | | | | |
| | | | .5|4|3|2|1| | | | | | | |
| | | | | .4|3|2|1| | | | | | | |
| | | | | | .3|2|1| | | | | | | |
| | | | | | | .2|1| | | | | | | |
| | | | | | | | .1| | | | | | | |
| | | | | | | | | . | | | | | | |

Complexity of pop front operation is O(1).
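To make the mechanics concrete, here is a minimal sketch of such a queue; the names and the resizing details are hypothetical simplifications of what an actual implementation would do:

    #include <cstddef>

    // Minimal sketch of the offset-based queue (naive strategy, no memory
    // reclaiming yet). Reallocations keep elements at their old positions.
    template <typename T>
    struct Queue {
        T* data = nullptr;
        std::size_t offset = 0;  // index of the first element
        std::size_t size = 0;    // number of stored elements
        std::size_t cap = 0;

        T& operator[](std::size_t i) { return data[offset + i]; }

        void push_back(const T& v) {
            if (offset + size == cap) reallocate(cap ? cap * 2 : 1);
            data[offset + size++] = v;                  // amortized O(1)
        }
        void pop_back() { --size; }                     // O(1)

        void push_front(const T& v) {
            if (offset == 0) {                          // vector-like case: O(n)
                if (size == cap) reallocate(cap ? cap * 2 : 1);
                for (std::size_t i = size; i > 0; --i)  // shift right by one
                    data[i] = data[i - 1];
                offset = 1;
            }
            data[--offset] = v;                         // O(1) case
            ++size;
        }
        void pop_front() { ++offset; --size; }          // O(1): bump the offset

        void reallocate(std::size_t new_cap) {
            T* bigger = new T[new_cap];
            for (std::size_t i = 0; i < size; ++i)
                bigger[offset + i] = data[offset + i];  // naive: keep positions
            delete[] data;
            data = bigger;
            cap = new_cap;
        }
        ~Queue() { delete[] data; }
    };

Note how pop_front touches no elements at all, while push_front degenerates to the vector behavior exactly when the offset is $0$.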

Deque

Queues are good, but they still have an Achilles' heel: pushing front. We cannot rely on having a non-zero offset all the time. That is why I propose a second data structure, which is appropriate when operations involve pushing and popping at both the front and the back.

The idea behind this implementation is that we can use two structures to hold the data in the front and the back, respectively. Instead of vectors, it is better to use our queue implementation so that things stay predictable when pushes and pops cross the middle boundary (marked with : in the diagrams).

Note that in the diagrams below the queues are drawn attached together, with the front queue reversed, in order to simulate the reverse indexing effect for elements at the front. In reality these are just two separate regular queues that reside in different parts of memory.

Although this means a small locality drawback, much of the overhead of deques over vectors should instead come from either comparisons to determine in which queue an element resides or from adding offsets to the requested indexes. That said, these are mostly compensated for by the CPU, as modern branch predictors are extremely good at these things.

Pushing back to an empty deque involves only the back queue and looks like this:

                              | : |
                              | :1|
                              | :1|2|
                              | :1|2|3| |
                              | :1|2|3|4|
                              | :1|2|3|4|5| | | |
                              | :1|2|3|4|5|6| | |
                              | :1|2|3|4|5|6|7| |
                              | :1|2|3|4|5|6|7|8|
                              | :1|2|3|4|5|6|7|8|9| | | | | | | |

Complexity of push back operation is amortized O(1).

Popping back in the same example also involves only the back queue and looks like this:

                              | :1|2|3|4|5|6|7|8|9| | | | | | | |
                              | :1|2|3|4|5|6|7|8| | | | | | | | |
                              | :1|2|3|4|5|6|7| | | | | | | | | |
                              | :1|2|3|4|5|6| | | | | | | | | | |
                              | :1|2|3|4|5| | | | | | | | | | | |
                              | :1|2|3|4| | | | | | | | | | | | |
                              | :1|2|3| | | | | | | | | | | | | |
                              | :1|2| | | | | | | | | | | | | | |
                              | :1| | | | | | | | | | | | | | | |
                              | : | | | | | | | | | | | | | | | |

Complexity of pop back operation is O(1).

Pushing front to an empty deque involves only the front queue symmetrically and looks like this:

                              | : |
                              |1: |
                            |2|1: |
                        | |3|2|1: |
                        |4|3|2|1: |
                | | | |5|4|3|2|1: |
                | | |6|5|4|3|2|1: |
                | |7|6|5|4|3|2|1: |
                |8|7|6|5|4|3|2|1: |
| | | | | | | |9|8|7|6|5|4|3|2|1: |

Complexity of push front operation is amortized O(1).

Popping front works just as expected and looks like this:

| | | | | | | |9|8|7|6|5|4|3|2|1: |
| | | | | | | | |8|7|6|5|4|3|2|1: |
| | | | | | | | | |7|6|5|4|3|2|1: |
| | | | | | | | | | |6|5|4|3|2|1: |
| | | | | | | | | | | |5|4|3|2|1: |
| | | | | | | | | | | | |4|3|2|1: |
| | | | | | | | | | | | | |3|2|1: |
| | | | | | | | | | | | | | |2|1: |
| | | | | | | | | | | | | | | |1: |
| | | | | | | | | | | | | | | | : |

Complexity of pop front operation is O(1).

Since there are many different possibilities with deques, the following example is given to visualize what happens in more complex cases. In the example we start with an empty deque and then:

  • push 5 elements to back
  • push 5 elements to front
  • pop 8 elements from front
  • push 5 elements to back

The most recent element is marked with o while the existing ones are marked with x:

                              | : |
                              | :o|
                              | :x|o|
                              | :x|x|o| |
                              | :x|x|x|o|
                              | :x|x|x|x|o| | | |
                              |o:x|x|x|x|x| | | |
                            |o|x:x|x|x|x|x| | | |
                        | |o|x|x:x|x|x|x|x| | | |
                        |o|x|x|x:x|x|x|x|x| | | |
                | | | |o|x|x|x|x:x|x|x|x|x| | | |
                | | | | |x|x|x|x:x|x|x|x|x| | | |
                | | | | | |x|x|x:x|x|x|x|x| | | |
                | | | | | | |x|x:x|x|x|x|x| | | |
                | | | | | | | |x:x|x|x|x|x| | | |
                | | | | | | | | :x|x|x|x|x| | | |
                | | | | | | | | : .x|x|x|x| | | |
                | | | | | | | | : | .x|x|x| | | |
                | | | | | | | | : | | .x|x| | | |
                | | | | | | | | : | | .x|x|o| | |
                | | | | | | | | : | | .x|x|x|o| |
                | | | | | | | | : | | .x|x|x|x|o|
                | | | | | | | | :x|x|x|x|x|o| | | | | | | | | | | (see next section)
                | | | | | | | | :x|x|x|x|x|x|o| | | | | | | | | |

All operations above take amortized O(1) time.
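To tie the pieces together, a minimal sketch of this two-queue composition is given below; it reuses the Queue type sketched in the previous section and the names are again hypothetical:

    // Minimal sketch of a deque composed of two queues: `front` holds its
    // elements in reverse order, `back` in normal order. Reuses the Queue
    // sketch (and its <cstddef> include) from the previous section.
    template <typename T>
    struct Deque {
        Queue<T> front, back;

        std::size_t size() const { return front.size + back.size; }

        // One comparison plus an offset per access: indexes below
        // front.size map (reversed) into `front`, the rest into `back`.
        T& operator[](std::size_t i) {
            return i < front.size ? front[front.size - 1 - i]
                                  : back[i - front.size];
        }

        void push_front(const T& v) { front.push_back(v); }  // amortized O(1)
        void push_back(const T& v)  { back.push_back(v); }   // amortized O(1)

        void pop_front() {  // crossing the middle: take from the other queue
            if (front.size > 0) front.pop_back(); else back.pop_front();
        }
        void pop_back() {
            if (back.size > 0) back.pop_back(); else front.pop_front();
        }
    };

Pushing front is now just a push back on the reversed front queue, which is where the amortized O(1) behavior comes from.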

Memory Reclaiming

One of my main concerns with the queue implementation was the extra memory used. In this section different solutions are discussed to minimize the wasted space in queues.

Strategies

Consider the case where this structure is used as a FIFO queue, that is, elements arrive at one end and exit at the other:

| |
|1|
|1|2|
| .2|
| .2|3| |
| | .3| |
| | .3|4|
| | | .4|
| | | .4|5| | | |
| | | | .5| | | |
| | | | .5|6| | |
| | | | | .6| | |
| | | | | .6|7| |
| | | | | | .7| |
| | | | | | .7|8|
| | | | | | | .8|
| | | | | | | .8|9| | | | | | | |

The structure will get larger and larger over time even though the total number of elements stays small (at most $2$ in the example above), meaning much of the space is wasted. In order to avoid this, we need to reclaim empty memory periodically. An intuitive idea is to overlap reclaims with resizes so that reclaiming won't incur any extra overhead:

| |
|1|
|1|2|
| .2|
|2|3| | |
| .3| | |
| .3|4| |
| | .4| |
| | .4|5|
| | | .5|
|5|6| | | | | | |
| .6| | | | | | |
| .6|7| | | | | |
| | .7| | | | | |
| | .7|8| | | | |
| | | .8| | | | |
| | | .8|9| | | |

As you can see, memory usage has gotten a little better, but it will still increase over time. So the next idea is to check the load factor at resizes and only increase the size if it is more than the multiplicative inverse of the growth factor; otherwise, just reclaim the empty space by moving the data within the existing array:

| |
|1|
|1|2|
| .2|
|2|3| | |
| .3| | |
| .3|4| |
| | .4| |
| | .4|5|
| | | .5|
|5|6| | |
| .6| | |
| .6|7| |
| | .7| |
| | .7|8|
| | | .8|
|8|9| | |

This time, assuming the number of elements in the array will be at most $2$, the size of the array gets fixed at $4$.
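A sketch of how this decision might be hooked into the reallocation path of the Queue sketched earlier is given below; dropping the load factor condition and always growing would give back the previous reclaiming strategy:

    // Sketch of the conservative strategy: called when the array runs out
    // of room. Grow only if the load factor exceeds 1/g; otherwise keep
    // the capacity and just slide elements back to the start of the array.
    void compact_or_grow() {
        const std::size_t g = 2;                 // growth factor
        if (size * g > cap) {                    // load factor > 1/g: grow
            std::size_t new_cap = cap ? cap * g : 1;
            T* bigger = new T[new_cap];
            for (std::size_t i = 0; i < size; ++i)
                bigger[i] = data[offset + i];    // compact while copying
            delete[] data;
            data = bigger;
            cap = new_cap;
        } else {                                 // reclaim in place
            for (std::size_t i = 0; i < size; ++i)
                data[i] = data[offset + i];      // safe: copies leftward
        }
        offset = 0;
    }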

Amortized Analysis

Since reclaims will occur periodically rather than geometrically in the last strategy, we need to be sure that we still have constant amortized time complexity. In order to do this, we start with regular vectors and iteratively progress to the other strategies. Most of the analysis will be verbal rather than formal.

Let's look at the classical vector push back analysis. We start by filling in some parameters:

 i   s   c
--- --- ---
 0   1   0  | |              i: step
 1   1   1  |1|              s: size
 2   2   2  |1|2|            c: cost
 3   4   3  |1|2|3| |        g: growth factor = 2
 4   4   1  |1|2|3|4|
 5   8   5  |1|2|3|4|5| | | |
 6   8   1  |1|2|3|4|5|6| | |
 7   8   1  |1|2|3|4|5|6|7| |
 8   8   1  |1|2|3|4|5|6|7|8|
 9  16   9  |1|2|3|4|5|6|7|8|9| | | | | | | |
10  16   1  |1|2|3|4|5|6|7|8|9|0| | | | | | |
11  16   1  |1|2|3|4|5|6|7|8|9|0|1| | | | | |
12  16   1  |1|2|3|4|5|6|7|8|9|0|1|2| | | | |
13  16   1  |1|2|3|4|5|6|7|8|9|0|1|2|3| | | |
14  16   1  |1|2|3|4|5|6|7|8|9|0|1|2|3|4| | |
15  16   1  |1|2|3|4|5|6|7|8|9|0|1|2|3|4|5| |
16  16   1  |1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|

By following the aggregate method, if we write the cost of inserting $n$ elements we get:

\begin{align} Total &= n + \sum_{k=0}^{\lfloor log(n) - 1 \rfloor}2^{k} \\ \end{align}

If we then substitute the summation term with the rudimentary geometric series formula we get:

\begin{align} &= n + (\frac{1 - 2^{\lfloor log(n) \rfloor}}{1 - 2}) \\ &= n + (2^{\lfloor log(n) \rfloor} - 1) < 2n \\ &= O(n) \end{align}

This means inserting one element is amortized O(1).

Now we look at how our first queue implementation, the naive queue, behaves with two elements in the queue:

 i   s   c
--- --- ---
 0   1   0  | |              i: step
 1   1   1  |1|              s: size
 2   2   2  |1|2|            c: cost
 3   2   1  | .2|            g: growth factor      = 2
 4   4   2  | .2|3| |        k: number of elements = 2
 5   4   1  | | .3| |              in the queue
 6   4   1  | | .3|4|
 7   4   1  | | | .4|
 8   8   2  | | | .4|5| | | |
 9   8   1  | | | | .5| | | |
10   8   1  | | | | .5|6| | |
11   8   1  | | | | | .6| | |
12   8   1  | | | | | .6|7| |
13   8   1  | | | | | | .7| |
14   8   1  | | | | | | .7|8|
15   8   1  | | | | | | | .8|
16  16   2  | | | | | | | .8|9| | | | | | | |

Naive queue behavior is very similar to that of vectors, with two differences:

  • After allocations we only need to copy $k$ elements instead of $n$.
  • Allocations occur half as frequently as in vectors after the first $k$ elements are filled into the queue.

Since both of these differences are in favor of our naive queue, and the total cost is already at least $n$ since each step costs at least $1$, it is fair to say naive queues have amortized O(1) complexity as well.

Next we look at our second queue implementation, the reclaiming queue, which looks like this:

 i   s   c
--- --- ---
 0   1   0  | |              i: step
 1   1   1  |1|              s: size
 2   2   2  |1|2|            c: cost
 3   2   1  | .2|            g: growth factor      = 2
 4   4   2  |2|3| | |        k: number of elements = 2
 5   4   1  | .3| | |              in the queue
 6   4   1  | .3|4| |
 7   4   1  | | .4| |
 8   4   1  | | .4|5|
 9   4   1  | | | .5|
10   8   2  |5|6| | | | | | |
11   8   1  | .6| | | | | | |
12   8   1  | .6|7| | | | | |
13   8   1  | | .7| | | | | |
14   8   1  | | .7|8| | | | |
15   8   1  | | | .8| | | | |
16   8   1  | | | .8|9| | | |

The only difference between this and the previous implementation is that reallocations occur even less frequently here, hence we can say that the complexity of this strategy is also amortized O(1).

Lastly we look at our third queue implementation, the conservative queue, which looks like this:

 i   s   c
--- --- ---
 0   1   0  | |              i: step
 1   1   1  |1|              s: size
 2   2   2  |1|2|            c: cost
 3   2   1  | .2|            g: growth factor      = 2
 4   4   2  |2|3| | |        k: number of elements = 2
 5   4   1  | .3| | |              in the queue
 6   4   1  | .3|4| |
 7   4   1  | | .4| |
 8   4   1  | | .4|5|
 9   4   1  | | | .5|
10   4   2  |5|6| | |
11   4   1  | .6| | |
12   4   1  | .6|7| |
13   4   1  | | .7| |
14   4   1  | | .7|8|
15   4   1  | | | .8|
16   4   2  |8|9| | |

The trick to analyzing this strategy is to divide it into two phases.

The first phase is the one in which reallocations still allocate a bigger block of memory, which is identical to the previous version. This corresponds to steps $0$ to $4$ in the toy example above.

The second phase is when reallocations start moving data within the existing array instead of creating a new one. This corresponds to step $4$ onwards in the example. If we calculate the cost of inserting $n$ elements in the second phase we get:

\begin{align} Total &= n + n \times extra \times frequency \\ &= n + n \times (k - 1) \times (\frac{1}{2 \cdot ((s - k) + 1)}) \\ \end{align}

where $s$ is the converged size of the array, which is always at least $k$ times the growth factor $g$ (i.e. $s = k \cdot g + c$ for some $c \geq 0$), so the equation becomes:

\begin{align} &= n + n (\frac{k - 1}{2 \cdot (((k \cdot g + c) - k) + 1)}) \\ \end{align}

in which the term in parentheses is actually just a giant constant, hence the equation reduces to:

\begin{align} &= n + n \cdot C \\ &= O(n) \end{align}

If you're like me, you are worried that $k$ is not always a constant but sometimes a variable as well. Just to be sure, we should examine our giant constant as a function of $k$:

\begin{align} f(k) &= (\frac{k - 1}{2 \cdot (((k \cdot g + c) - k) + 1)}) \\ \end{align}

Rather than proving that this converges, we can be pragmatic and evaluate the function instead to see how it behaves with increasing $k$ values. Picking $c = 0$ in order to maximize the function and $g = 2$ as our usual value we get:

\begin{align} f(1) &= 0.0 \\ f(10) &= 0.409090909091 \\ f(100) &= 0.490099009901 \\ f(1000) &= 0.499000999001 \\ f(10000) &= 0.499900009999 \\ &.. \end{align}

Proof is once again left to the curious reader.
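Still, a quick sanity check is easy: rewriting the denominator as $2 \cdot (k(g - 1) + c + 1)$ gives

\begin{align} \lim_{k \to \infty} f(k) = \lim_{k \to \infty} \frac{k - 1}{2 \cdot (k(g - 1) + c + 1)} = \frac{1}{2(g - 1)} \end{align}

which is $0.5$ for $g = 2$, matching the values above.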

Experiments


Experiments are run on an Intel(R) Core(TM) i5-2467M CPU @ 1.60GHz with 4GB DDR3 memory @ 1333MHz running 64-bit Ubuntu 12.04.2. Speed tests run each test $5$ times and take the average of the remaining $3$ results after excluding the best and the worst. Three different structures are used as the data type in the experiments:

  • Small, a structure of 1 int (~4 byte)
  • Medium, a structure of 10 int (~40 byte)
  • Large, a structure of 100 int (~400 byte)

Memory tests simply use the type Small, as they only compare the number of elements to the number of total available slots, which is independent of the underlying type.

Fill Back Test

Fill back test is simply pushing $n$ elements to the container in sequence using push_back.

For small object sizes, the overhead of reallocations is at a minimum, so StdVector and our implementations can still compete with StdDeque and StdList. The difference among these vector family containers is most likely due to implementation details.

When we increase the object size, vector-like containers start to fall behind StdDeque and even StdList.

Since StdList is able to link new elements without copying, it catches up with StdDeque for large elements. The gap between these two and the rest widens even further as the element size increases.

Fill Back Reserved Test

Fill back reserved test is just like the fill back test, but it calls reserve_back to prevent unnecessary reallocations during pushes. Since there are no such methods in StdDeque and StdList, we just use the values from the regular fill back test for these containers. StdVector's reserve_back is just a wrapper for the reserve method.

For small elements StdList is again the slowest and the others group together, with deques a little slower than queues and StdVector. There is a break at around $n = 8000000$ for the vector families which remains a mystery; some plausible explanations are operating system internals or simply a bug in the bench suite.

When we increase the element size, vector-like containers start to fall behind StdDeque but not StdList, as they did in the regular fill back test. Also, this time the break is at $n = 2000000$.

When we increase the element size even further, all containers start to perform very similarly.

Fill Front Test

Fill front test is simply pushing $n$ elements to the container in sequence using push_front. Note that StdVector and queues are not run in these tests as they have high complexities; this is denoted with the '(NR)' prefix in the legend. These structures should be expected to be much slower than the ones shown in the graph.

For small elements, results are extremely similar to push back results.

When we increase the element size, results are still the same as expected.

For large element sizes, things are still as expected with the exception that there is a small break at around $n = 700000$ for StdDeque and StdList.

Fill Front Reserved Test

For the sake of completeness, these are the results for the fill front test with the reserve_front method called before insertions. StdVector is not run in this test as there is no way to implement reserve_front for vectors without an offset variable.

For small elements, results are similar to the fill back reserved test, but this time StdDeque performs better than the others without exception.

For medium sized elements, all containers approach StdList.

For large elements, vector families fall behind StdDeque and StdList just like they do in the fill back reserved test. Again, there is quite a significant but mysterious break at $n = 400000$.

Queue Test

Queue test is the case described in the memory reclaiming section. In this test, containers are used as a FIFO queue, alternately pushing back and popping front while holding a fixed number of elements $k$, selected to be $1000$ in these runs. StdVector is once again excluded from the test for having a high complexity.

For small elements, StdList performs the worst while StdDeque is among the vector family.

For medium sized elements, the gap between StdList and the rest starts to close.

The trend continues as the element size increases. This time the mysterious break is at $n = 500000$ and only for the naive implementations, which keep on growing aggressively in this test.

Zigzag Test

Zigzag test is a made-up test that makes use of pushing and popping from the back and front in a balanced fashion. Specifically, in this test:

  • $n / 4$ elements are pushed back
  • $n / 4$ elements are pushed front
  • $n / 2$ elements are popped back
  • $n / 4$ elements are pushed front
  • $n / 4$ elements are pushed back
  • $n / 2$ elements are popped front

StdVector and queues are excluded from this test.

For small sized elements, deques are slightly less performant than StdDeque, while StdList is the slowest of them all.

As the element size increases, reallocations start to cause quite an overhead, as the reserve methods are not called beforehand.

As the element size increases even further, deques fall behind StdDeque and StdList, while those two approach each other.

Traverse Test

Traverse test iterates over the container and simply increases the first integer in the underlying structure. It is possible to iterate over StdList, but it requires using iterators or something quite different from the others as it doesn't provide the [] operator, hence it is excluded from this test. Since this test substantially measures the locality of containers, one should expect StdList to be slower than the others anyway.

For small elements, StdVector and queues behave almost identically while deques are just a little bit slower. StdDeque performs the slowest by a large margin.

As the element size increases, containers start to get close to each other.

For large element sizes, the gap between containers diminishes.

Shuffle Test

Shuffle test uses the Fisher-Yates algorithm to shuffle a pre-filled container. StdList is excluded from the test for requiring a different shuffle algorithm than the others.
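As a reminder of the access pattern this test exercises, a minimal Fisher-Yates sketch over any container providing the [] operator looks roughly like this (the bench suite's actual random source and details may differ):

    #include <cstddef>
    #include <cstdlib>
    #include <utility>

    // Minimal Fisher-Yates shuffle sketch; works on any container with
    // operator[], which is why StdList is excluded from this test.
    template <typename Container>
    void fisher_yates(Container& c, std::size_t n) {
        if (n < 2) return;                          // nothing to shuffle
        for (std::size_t i = n - 1; i > 0; --i) {
            std::size_t j = std::rand() % (i + 1);  // pick j in [0, i]
            std::swap(c[i], c[j]);                  // random-access swap
        }
    }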

For small sized elements, StdDeque is the slowest one while the rest are close to each other.

For medium sized elements, the gap between StdDeque and others increases while the rest get even closer to each other.

For large sized elements, the gap decreases again, but StdDeque is still slower than the others.

QSort Test

QSort test involves using in-place quicksort on pre-filled and pre-shuffled containers. Elements are compared only by the value of the first integer in the structure. StdList is once again excluded from the test for requiring a different algorithm than the others.

For small sized elements, StdVector is the fastest, followed by queues, then deques, and lastly StdDeque.

As the size increases, deques get closer to queues.

For large sized elements, the difference between containers starts to diminish.

Fill Back Test (Memory)

Fill back memory test is the same as the fill back test, but this time we measure the average load factor between insertions. StdList and StdDeque are excluded from the test for having a load factor of $1$, or very close to it, at all times.

In this test all containers behave the same and have a load factor of around $0.7$-$0.75$ at all times.

Fill Front Test (Memory)

Just for the sake of completeness, the same test is run for fill front.

Results are identical to fill back case.

Queue Test (Memory)

This test measures the memory usage of the case described in the memory reclaiming section for $k = 1000$.

As we discussed before, except for the conservative version, containers keep on growing forever. Since $k$ is relatively small compared to $n$ in this test, load factor is very close to $0$ for non-conservative queues.

Conservative versions seem to have arguably acceptable values for this test. One of the reasons is that $k = 1000$ is very close to $2^{10} = 1024$ from the left side. When we pick a value slightly larger than a power of the growth factor, say $1025$, we get a load factor of about $0.25$. More specifically, we can say that the average load factor will crudely be a number between $1/g$ and $1/g^2$ for $n \gg k$, with $g$ being the growth factor.
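As a back-of-the-envelope check, assuming the conservative strategy keeps the capacity at powers of $g$ as in these runs, the capacity converges to the smallest power of $g$ that is at least $k \cdot g$, giving:

\begin{align} k = 1000 &: \quad s = 2048, \quad k/s \approx 0.49 \approx 1/g \\ k = 1025 &: \quad s = 4096, \quad k/s \approx 0.25 = 1/g^{2} \end{align}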

Zigzag Test (Memory)

This test shows the average load factor in the zigzag test.

Containers seem to have about $0.5$ average load, while the naive version lags a little behind the rest.

Conclusion

Two different containers (queue and deque) are introduced in this post, each having three different memory management models (naive, reclaiming and conservative). Experiments show that there are still cases where StdDeque is more appropriate than these containers, especially regarding memory usage. Note that StdList is also sometimes useful when insertions happen at random places, but this is not tested in these experiments. It is unfortunate and extremely ironic that one of the vulnerable points of queues is queueing itself. Having said that, the conservative models seem to have somewhat acceptable behavior even in this case, and they may be further improved by having either a smaller growth factor or an alternative condition for deciding whether growth is required.

One of the things I am working on at the moment is to formulate an alternative deque implementation using only one array. This may add some overhead to reallocations, as you need to move both the front and back queues, but it should increase locality even more for better performance. I'm not sure how and why I came up with the two-array version first.

Since we are in a parallel computing era, it is also obligatory to study parallel versions of these structures. Most of the academic literature already focuses on this aspect of queues.

Lastly, it would be nice to fix the incoherences in the bench suite. There are still inexplicable behaviors in the results. Of course, a better alternative would be to implement these in place of the existing structures and run real programs or industrial benchmarks to see how they perform.

In light of these, I hope to see these structures implemented in the languages I use, maybe not as replacements for the existing ones but as alternative choices. People have been comparing vectors, deques and sometimes lists for a long time, and I imagine there would be use cases for these new structures as well, specifically when pushes and pops are balanced between back and front and iterations are more common than insertions.

