Dynamic Arrays in C Language

This post is about implementing dynamically growing arrays in C language. In other languages, this is implemented as std::vector in C++, ArrayList in Java, and list in Python and so on. Dynamic arrays also sometimes refer to dynamically allocated arrays which is not this post is about. Implementations more or less go from the simplest to the most complicated, the idea being you should stop whenever you find a good enough solution for your case.

There is also a complementary post "Generic Programming in C Language" which discusses two common techniques to do generic programming. These techniques can be combined with dynamic array and fat array implementations given here to overcome the single type limitation. Refer to the other post for the generic implementations along with some explanation.

Plain Arrays

Coming from a more advanced language, one might be tempted to look for a dynamic array right away. This may sound silly at first but good old plain arrays are the easiest choice for the job in many cases. This is possible when you can come up with a number for the maximum number of elements in the array.

A common example is reading a line of text from a well formed standard input or file. For most applications, you can assume lines have at most a thousand characters or some other fixed value so you can allocate that much beforehand:

#define MAXLINE 1000
char line[MAXLINE];
while (fgets(line, MAXLINE, stdin)) {
    process(line);
}

In this example, standard fgets function reads until a \n occurs or the buffer is filled. Therefore, it is always safe against buffer overflows, however splitting a line is possible if a line is too long. You can optionally check the existence of a trailing \n in the end of line to see if a line is split or not. Also note that, above example allocates on the stack but you can also allocate on the heap if you need.

Character arrays can be null terminated but this is not always possible for other types of arrays. For example, it is not possible to distinguish null termination and the number zero in integer arrays. For this reason, you might need to store an additional length parameter in the same scope, whether it is a struct, function, or a global scope. Finally, if you need a dynamically growing array, you can store a capacity parameter in addition to length and reallocate space when necessary:

int len = 0;
int cap = 2;
int *buf = malloc(cap * sizeof(int));

while ((num = get()) < limit) {
    if (cap <= len) {
        buf = realloc(buf, (cap *= 2) * sizeof(int));
    }
    buf[len++] = num;
}

for (i = 0; i < len; ++i) {
    process(buf[i]);
}

for (i = len-1; i > 0; --i) {
    buf[i] -= buf[i-1];
}

for (i = len-1; i >= 0; --i) {
    process(buf[--len]);
}

free(buf);

In this example, a new number is read into an integer named num using a get() function until the values of read numbers exceed the limit value. Since it may not be possible to determine the number of elements beforehand, we increase the array capacity dynamically. The rest of the code is just an arbitrary use case to demonstrate various operations on the array.

Plain arrays can be used dynamically relatively easily but it can sometimes be error prone. You need to pass an extra length parameter to functions and check the capacity when adding new elements. Also you need to hold these parameters in the current scope which can make things complicated.

Dynamic Arrays

In order to create an abstraction, one might decide to implement a specific data structure to be used as dynamic arrays. For this purpose, we pack the buffer, length, and capacity parameters within a struct:

#define TYPE int

struct arr {
    TYPE *buf;
    int   len;
    int   cap;
};

We also provide allocation and freeing functions with the new structure. In this example, arrnew creates an array with a default capacity, arrnewcap creates an array with a specified capacity, and arrdelete frees the array:

#define ARR_INIT_CAP 2

inline struct arr *
arrnew(void)
{
    return arrnewcap(ARR_INIT_CAP);
}

inline struct arr *
arrnewcap(int cap)
{
    struct arr *arr;

    if ((arr = malloc(sizeof(struct arr))) == NULL) {
        return NULL;
    }

    if ((arr->buf = malloc(cap * sizeof(TYPE))) == NULL) {
        free(arr);
        return NULL;
    }

    arr->len = 0;
    arr->cap = cap;

    return arr;
}

inline void
arrdelete(struct arr *arr)
{
    free(arr->buf);
    free(arr);
}

Similarly adding and removing elements is handled by arrappend and arrremove respectively. Reserving extra space or shrinking existing one is accomplished using arrresize:

#define ARR_GROW_RAT 2

inline int
arrappend(struct arr *arr, TYPE val)
{
    if (arr->cap <= arr->len && arrresize(arr, arr->cap * ARR_GROW_RAT) == -1) {
        return -1;
    }

    arr->buf[arr->len++] = val;

    return 0;
}

inline int
arrresize(struct arr *arr, int ncap)
{
    TYPE *nbuf;

    if ((nbuf = realloc(arr->buf, ncap * sizeof(TYPE))) == NULL) {
        return -1;
    }

    arr->cap = ncap;
    arr->buf = nbuf;

    return 0;
}

inline TYPE
arrremove(struct arr *arr)
{
    return arr->buf[--arr->len];
}

Array indexing is done by directly accessing the buf field of the structure. Separate indexing functions can be provided but they don't provide much advantage over this. Similarly len and cap fields are considered public and can be accessed directly. The previous scenario can now be accomplished as follows:

struct arr *arr = arrnew();

while ((num = get()) < limit) {
    arrappend(arr, num);
}

for (i = 0; i < arr->len; ++i) {
    process(arr->buf[i]);
}

for (i = arr->len-1; i > 0; --i) {
    arr->buf[i] -= arr->buf[i-1];
}

for (i = arr->len-1; i >= 0; --i) {
    process(arrremove(arr));
}

arrdelete(arr);

This code is pretty straightforward and idiomatic for C programmers, but it has a major problem. Code is not generic and you can only implement this for a single type since there is no function overloading in C. Therefore, this is only acceptable if you only need it for a single type in a small codebase.

Void Pointer Arrays

Void pointers can be used as an alternative when more than one type is needed. To define a void pointer array, we simply change the type in our previous code to a void pointer:

#define TYPE void *

Void pointer is a generic pointer type that can point to any type. We can use this implementation for any type we need. Our previous scenario now looks almost the same as before:

int num;
struct arr *arr;

arr = arrnew();

while ((num = get()) < limit) {
    arrappend(arr, num);
}

for (i = 0; i < arr->len; ++i) {
    process_int(arr->buf[i]);
}

for (i = arr->len-1; i > 0; --i) {
    arr->buf[i] -= arr->buf[i-1];
}

for (i = arr->len-1; i >= 0; --i) {
    process_int(arrremove(arr));
}

arrdelete(arr);

In this code, num is an int but it is implicitly casted to void * when we pass it to arrappend. We could also make this more explicit by putting (void *) before num. Similarly arr->buf[i] is a void * but it is casted to int when we pass it to process_int. We change the previous name process to process_int to make it more explicit what type it accepts, otherwise it is the same code.

Now this code should compile and most likely run as before. However, depending on the warning level you may see a warning as such:

warning: passing argument 2 of ‘arrappend’ makes pointer from integer without a cast [-Wint-conversion]
         arrappend(arr, num);
                        ^
note: expected ‘void *’ but argument is of type ‘int’
 arrappend(struct arr *arr, TYPE val)
 ^

This is the implicit cast we talked about before. When we try to make the cast explicit, this time we get a warning as such:

warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
         arrappend(arr, (void *)num);
                        ^

This warning is more informative than the previous one. It basically states that int and void * have different sizes. We can check the sizes of types with the following code:

printf("sizeof(int)    = %zu\n", sizeof(int));
printf("sizeof(int *)  = %zu\n", sizeof(int *));
printf("sizeof(void *) = %zu\n", sizeof(char *));

Depending on your machine, you may or may not get the following output:

sizeof(int)    = 4
sizeof(int *)  = 8
sizeof(void *) = 8

This means we are actually trying to store 4 byte integers in 8 byte fields. Memory layout of an array having three elements with values 8, 13, and 21 should look like this:

0                   1                   2
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4
|   8   | empty |  13   | empty |  21   | empty |

This is not an efficient way to store integers but it works without a problem. However, this is not true in general. If we were to store a type that is more than 8 bytes in size, then the elements would overwrite each other. This is the case for most structures consisting of multiple members.

Therefore, the right way to use void pointer arrays is by using them with pointer types instead of value types. Void pointers are guaranteed to have a size that can hold pointers to any other data type. Our code looks like this when using integer pointers:

int num, *ip;
struct arr *arr;

arr = arrnew();

while ((num = get()) < limit) {
    ip = malloc(sizeof(int));
    *ip = num;
    arrappend(arr, ip);
}

for (i = 0; i < arr->len; ++i) {
    process_intp(arr->buf[i]);
}

for (i = arr->len-1; i > 0; --i) {
    *(int *)(arr->buf[i]) -= *(int *)(arr->buf[i-1]);
}

for (i = arr->len-1; i >= 0; --i) {
    process_intp(arrremove(arr));
    free(arr->buf[arr->len]);
}

arrdelete(arr);

In the above code, we allocate new integers on the heap. If we use the memory address of the num variable, all added elements would point to the same element on the stack and they would all change when we read a new element.

We have already checked the size of integer pointers before. Memory layout of an array with integer pointers having values 8, 13, and 21 now looks like this:

0                   1                   2
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4
|    pointer    |    pointer    |    pointer    |
        |               |               |
        |               v               |
        v              +--+             |
       +-+             |13|             v
       |8|             +--+            +--+
       +-+                             |21|
                                       +--+

This works without a problem but since values are spread around the memory, performance may suffer. This may not be what you want depending on your needs. Also, since we are allocating on the heap, we also need to free memory when we remove an element.

Note that for some types this implementation is simple to use. Consider the use case with strings as below:

struct arr *arr;

arr = arrnew();

arrappend(arr, "one");
arrappend(arr, "two");
arrappend(arr, "three");

for (i = 0; i < arr->len; ++i) {
    process_charp(arr->buf[i]);
}

arrdelete(arr);

Strings in this code are allocated statically so we don't need to allocate space or free them after the use.

Fat Pointer Arrays

A frequent complaint about C language is that arrays do not store their lenghts. Fat pointers have been proposed to solve this problem by holding more than just the address of elements in a pointer. This was considered as an addition to the language specification. However, it turns out that you can also simply offset pointers to simulate fat pointers without a language change.

We can use fat pointers to simplify our dynamic array implementation. The idea behind is to allocate a little extra space in the beginning of the array when it is created. This extra space is used to hold length and capacity parameters of the array. Beginning address of elements is returned to the users instead of the actual beginning of allocated space. Thus, users are able to use the fat pointer as a regular pointer to access its elements.

Memory layout of an array having three elements with values 8, 13, and 21 with a capacity of 4 should look like this:

0                   1                   2
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4
|len = 3|cap = 4|   8   |  13   |  21   | empty |
|               |
|               v
v               beginning of elements (returned to users)
beginning of allocated memory

In above example, length and capacity parameters are stored as 4 byte integers. Array is also an integer array with 4 byte elements, but these two do not actually need to be the same. We define our fat pointer array as a simple pointer for an integer type using a macro:

#define TYPE int

typedef TYPE *ARR;

We implement allocation and freeing functions to simplify the management of fat pointer arrays. Similar to previous implementations, arrnew and arrnewcap allocates an array with a little extra space and returns the offset pointer as described before. For freeing, we implement arrdelete function which calculates the actual allocation address of the array and frees it properly:

#define ARR_INIT_CAP 2

inline ARR
arrnew(void)
{
    return arrnewcap(ARR_INIT_CAP);
}

inline ARR
arrnewcap(int cap)
{
    int *buf;

    if ((buf = malloc(cap * sizeof(TYPE) + 2 * sizeof(int))) == NULL) {
        return NULL;
    }

    buf[0] = 0;
    buf[1] = cap;

    return (ARR)(buf + 2);
}

inline void
arrdelete(ARR arr)
{
    free((int *)arr - 2);
}

We also need to provide functions for operations that may change the array. Appending an element is implemented as arrappend which returns a possibly new address. Adjusting the capacity of the array is implemented as arrresize which reallocates memory. Lastly, removing an element is implemented as arrremove which reduces the length of the array and returns the removed element:

inline ARR
arrappend(ARR arr, TYPE val)
{
    int *buf, len, cap;

    buf = (int *)arr - 2;
    len = buf[0];
    cap = buf[1];

    if (cap <= len && (arr = arrresize(arr, cap * ARR_GROW_RAT)) == NULL) {
        return NULL;
    }

    buf = (int *)arr - 2;
    arr[buf[0]++] = val;

    return arr;
}

inline ARR
arrresize(ARR arr, int ncap)
{
    int *buf, len;

    buf = (int *)arr - 2;
    len = buf[0];

    if ((buf = realloc(buf, ncap * sizeof(TYPE) + 2 * sizeof(int))) == NULL) {
        return NULL;
    }

    buf[0] = len;
    buf[1] = ncap;

    return (ARR)(buf + 2);
}

inline TYPE
arrremove(ARR arr)
{
    return arr[--((int *)arr - 2)[0]];
}

Unlike previous implementations, we also provide arrlen and arrcap functions to return the length and capacity of the array:

inline int
arrlen(ARR arr)
{
    return ((int *)arr - 2)[0];
}

inline int
arrcap(ARR arr)
{
    return ((int *)arr - 2)[1];
}

We can now use fat pointers in our previous scenario as such:

int *arr = arrnew();

while ((num = get()) < limit) {
    arr = arrappend(arr, num);
}

for (i = 0; i < arrlen(arr); ++i) {
    process(arr[i]);
}

for (i = arrlen(arr)-1; i > 0; --i) {
    arr[i] -= arr[i-1];
}

for (i = arrlen(arr)-1; i >= 0; --i) {
    process(arrremove(arr));
}

arrdelete(arr);

Note that we assign the result of arrappend call to the current pointer. We choose this design to make it more explicit in the code that the value of the pointer may have changed after appending an element. However, it is also possible to abstract this detail by using an extra macro depending on your taste.

Summary

Summary of each approach along with the code is given below: