This post is about implementing dynamically growing arrays in C language.
In other languages, this is implemented as std::vector
in C++, ArrayList
in Java, and list
in Python and so on.
Dynamic arrays also sometimes refer to dynamically allocated arrays which is not this post is about.
Implementations more or less go from the simplest to the most complicated, the idea being you should stop whenever you find a good enough solution for your case.
There is also a complementary post "Generic Programming in C Language" which discusses two common techniques to do generic programming. These techniques can be combined with dynamic array and fat array implementations given here to overcome the single type limitation. Refer to the other post for the generic implementations along with some explanation.
Plain Arrays
Coming from a more advanced language, one might be tempted to look for a dynamic array right away. This may sound silly at first but good old plain arrays are the easiest choice for the job in many cases. This is possible when you can come up with a number for the maximum number of elements in the array.
A common example is reading a line of text from a well formed standard input or file. For most applications, you can assume lines have at most a thousand characters or some other fixed value so you can allocate that much beforehand:
#define MAXLINE 1000
char line[MAXLINE];
while (fgets(line, MAXLINE, stdin)) {
process(line);
}
In this example, standard fgets
function reads until a \n
occurs or the buffer is filled.
Therefore, it is always safe against buffer overflows, however splitting a line is possible if a line is too long.
You can optionally check the existence of a trailing \n
in the end of line
to see if a line is split or not.
Also note that, above example allocates on the stack but you can also allocate on the heap if you need.
Character arrays can be null terminated but this is not always possible for other types of arrays. For example, it is not possible to distinguish null termination and the number zero in integer arrays. For this reason, you might need to store an additional length parameter in the same scope, whether it is a struct, function, or a global scope. Finally, if you need a dynamically growing array, you can store a capacity parameter in addition to length and reallocate space when necessary:
int len = 0;
int cap = 2;
int *buf = malloc(cap * sizeof(int));
while ((num = get()) < limit) {
if (cap <= len) {
buf = realloc(buf, (cap *= 2) * sizeof(int));
}
buf[len++] = num;
}
for (i = 0; i < len; ++i) {
process(buf[i]);
}
for (i = len-1; i > 0; --i) {
buf[i] -= buf[i-1];
}
for (i = len-1; i >= 0; --i) {
process(buf[--len]);
}
free(buf);
In this example, a new number is read into an integer named num
using a get()
function until the values of read numbers exceed the limit
value.
Since it may not be possible to determine the number of elements beforehand, we increase the array capacity dynamically.
The rest of the code is just an arbitrary use case to demonstrate various operations on the array.
Plain arrays can be used dynamically relatively easily but it can sometimes be error prone. You need to pass an extra length parameter to functions and check the capacity when adding new elements. Also you need to hold these parameters in the current scope which can make things complicated.
Dynamic Arrays
In order to create an abstraction, one might decide to implement a specific data structure to be used as dynamic arrays. For this purpose, we pack the buffer, length, and capacity parameters within a struct:
#define TYPE int
struct arr {
TYPE *buf;
int len;
int cap;
};
We also provide allocation and freeing functions with the new structure.
In this example, arrnew
creates an array with a default capacity, arrnewcap
creates an array with a specified capacity, and arrdelete
frees the array:
#define ARR_INIT_CAP 2
inline struct arr *
arrnew(void)
{
return arrnewcap(ARR_INIT_CAP);
}
inline struct arr *
arrnewcap(int cap)
{
struct arr *arr;
if ((arr = malloc(sizeof(struct arr))) == NULL) {
return NULL;
}
if ((arr->buf = malloc(cap * sizeof(TYPE))) == NULL) {
free(arr);
return NULL;
}
arr->len = 0;
arr->cap = cap;
return arr;
}
inline void
arrdelete(struct arr *arr)
{
free(arr->buf);
free(arr);
}
Similarly adding and removing elements is handled by arrappend
and arrremove
respectively.
Reserving extra space or shrinking existing one is accomplished using arrresize
:
#define ARR_GROW_RAT 2
inline int
arrappend(struct arr *arr, TYPE val)
{
if (arr->cap <= arr->len && arrresize(arr, arr->cap * ARR_GROW_RAT) == -1) {
return -1;
}
arr->buf[arr->len++] = val;
return 0;
}
inline int
arrresize(struct arr *arr, int ncap)
{
TYPE *nbuf;
if ((nbuf = realloc(arr->buf, ncap * sizeof(TYPE))) == NULL) {
return -1;
}
arr->cap = ncap;
arr->buf = nbuf;
return 0;
}
inline TYPE
arrremove(struct arr *arr)
{
return arr->buf[--arr->len];
}
Array indexing is done by directly accessing the buf
field of the structure.
Separate indexing functions can be provided but they don't provide much advantage over this.
Similarly len
and cap
fields are considered public and can be accessed directly.
The previous scenario can now be accomplished as follows:
struct arr *arr = arrnew();
while ((num = get()) < limit) {
arrappend(arr, num);
}
for (i = 0; i < arr->len; ++i) {
process(arr->buf[i]);
}
for (i = arr->len-1; i > 0; --i) {
arr->buf[i] -= arr->buf[i-1];
}
for (i = arr->len-1; i >= 0; --i) {
process(arrremove(arr));
}
arrdelete(arr);
This code is pretty straightforward and idiomatic for C programmers, but it has a major problem. Code is not generic and you can only implement this for a single type since there is no function overloading in C. Therefore, this is only acceptable if you only need it for a single type in a small codebase.
Void Pointer Arrays
Void pointers can be used as an alternative when more than one type is needed. To define a void pointer array, we simply change the type in our previous code to a void pointer:
#define TYPE void *
Void pointer is a generic pointer type that can point to any type. We can use this implementation for any type we need. Our previous scenario now looks almost the same as before:
int num;
struct arr *arr;
arr = arrnew();
while ((num = get()) < limit) {
arrappend(arr, num);
}
for (i = 0; i < arr->len; ++i) {
process_int(arr->buf[i]);
}
for (i = arr->len-1; i > 0; --i) {
arr->buf[i] -= arr->buf[i-1];
}
for (i = arr->len-1; i >= 0; --i) {
process_int(arrremove(arr));
}
arrdelete(arr);
In this code, num
is an int
but it is implicitly casted to void *
when we pass it to arrappend
.
We could also make this more explicit by putting (void *)
before num
.
Similarly arr->buf[i]
is a void *
but it is casted to int
when we pass it to process_int
.
We change the previous name process
to process_int
to make it more explicit what type it accepts, otherwise it is the same code.
Now this code should compile and most likely run as before. However, depending on the warning level you may see a warning as such:
warning: passing argument 2 of ‘arrappend’ makes pointer from integer without a cast [-Wint-conversion]
arrappend(arr, num);
^
note: expected ‘void *’ but argument is of type ‘int’
arrappend(struct arr *arr, TYPE val)
^
This is the implicit cast we talked about before. When we try to make the cast explicit, this time we get a warning as such:
warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
arrappend(arr, (void *)num);
^
This warning is more informative than the previous one.
It basically states that int
and void *
have different sizes.
We can check the sizes of types with the following code:
printf("sizeof(int) = %zu\n", sizeof(int));
printf("sizeof(int *) = %zu\n", sizeof(int *));
printf("sizeof(void *) = %zu\n", sizeof(char *));
Depending on your machine, you may or may not get the following output:
sizeof(int) = 4
sizeof(int *) = 8
sizeof(void *) = 8
This means we are actually trying to store 4 byte integers in 8 byte fields. Memory layout of an array having three elements with values 8, 13, and 21 should look like this:
0 1 2
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4
| 8 | empty | 13 | empty | 21 | empty |
This is not an efficient way to store integers but it works without a problem. However, this is not true in general. If we were to store a type that is more than 8 bytes in size, then the elements would overwrite each other. This is the case for most structures consisting of multiple members.
Therefore, the right way to use void pointer arrays is by using them with pointer types instead of value types. Void pointers are guaranteed to have a size that can hold pointers to any other data type. Our code looks like this when using integer pointers:
int num, *ip;
struct arr *arr;
arr = arrnew();
while ((num = get()) < limit) {
ip = malloc(sizeof(int));
*ip = num;
arrappend(arr, ip);
}
for (i = 0; i < arr->len; ++i) {
process_intp(arr->buf[i]);
}
for (i = arr->len-1; i > 0; --i) {
*(int *)(arr->buf[i]) -= *(int *)(arr->buf[i-1]);
}
for (i = arr->len-1; i >= 0; --i) {
process_intp(arrremove(arr));
free(arr->buf[arr->len]);
}
arrdelete(arr);
In the above code, we allocate new integers on the heap.
If we use the memory address of the num
variable, all added elements would point to the same element on the stack and they would all change when we read a new element.
We have already checked the size of integer pointers before. Memory layout of an array with integer pointers having values 8, 13, and 21 now looks like this:
0 1 2
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4
| pointer | pointer | pointer |
| | |
| v |
v +--+ |
+-+ |13| v
|8| +--+ +--+
+-+ |21|
+--+
This works without a problem but since values are spread around the memory, performance may suffer. This may not be what you want depending on your needs. Also, since we are allocating on the heap, we also need to free memory when we remove an element.
Note that for some types this implementation is simple to use. Consider the use case with strings as below:
struct arr *arr;
arr = arrnew();
arrappend(arr, "one");
arrappend(arr, "two");
arrappend(arr, "three");
for (i = 0; i < arr->len; ++i) {
process_charp(arr->buf[i]);
}
arrdelete(arr);
Strings in this code are allocated statically so we don't need to allocate space or free them after the use.
Fat Pointer Arrays
A frequent complaint about C language is that arrays do not store their lenghts. Fat pointers have been proposed to solve this problem by holding more than just the address of elements in a pointer. This was considered as an addition to the language specification. However, it turns out that you can also simply offset pointers to simulate fat pointers without a language change.
We can use fat pointers to simplify our dynamic array implementation. The idea behind is to allocate a little extra space in the beginning of the array when it is created. This extra space is used to hold length and capacity parameters of the array. Beginning address of elements is returned to the users instead of the actual beginning of allocated space. Thus, users are able to use the fat pointer as a regular pointer to access its elements.
Memory layout of an array having three elements with values 8, 13, and 21 with a capacity of 4 should look like this:
0 1 2
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4
|len = 3|cap = 4| 8 | 13 | 21 | empty |
| |
| v
v beginning of elements (returned to users)
beginning of allocated memory
In above example, length and capacity parameters are stored as 4 byte integers. Array is also an integer array with 4 byte elements, but these two do not actually need to be the same. We define our fat pointer array as a simple pointer for an integer type using a macro:
#define TYPE int
typedef TYPE *ARR;
We implement allocation and freeing functions to simplify the management of fat pointer arrays.
Similar to previous implementations, arrnew
and arrnewcap
allocates an array with a little extra space and returns the offset pointer as described before.
For freeing, we implement arrdelete
function which calculates the actual allocation address of the array and frees it properly:
#define ARR_INIT_CAP 2
inline ARR
arrnew(void)
{
return arrnewcap(ARR_INIT_CAP);
}
inline ARR
arrnewcap(int cap)
{
int *buf;
if ((buf = malloc(cap * sizeof(TYPE) + 2 * sizeof(int))) == NULL) {
return NULL;
}
buf[0] = 0;
buf[1] = cap;
return (ARR)(buf + 2);
}
inline void
arrdelete(ARR arr)
{
free((int *)arr - 2);
}
We also need to provide functions for operations that may change the array.
Appending an element is implemented as arrappend
which returns a possibly new address.
Adjusting the capacity of the array is implemented as arrresize
which reallocates memory.
Lastly, removing an element is implemented as arrremove
which reduces the length of the array and returns the removed element:
inline ARR
arrappend(ARR arr, TYPE val)
{
int *buf, len, cap;
buf = (int *)arr - 2;
len = buf[0];
cap = buf[1];
if (cap <= len && (arr = arrresize(arr, cap * ARR_GROW_RAT)) == NULL) {
return NULL;
}
buf = (int *)arr - 2;
arr[buf[0]++] = val;
return arr;
}
inline ARR
arrresize(ARR arr, int ncap)
{
int *buf, len;
buf = (int *)arr - 2;
len = buf[0];
if ((buf = realloc(buf, ncap * sizeof(TYPE) + 2 * sizeof(int))) == NULL) {
return NULL;
}
buf[0] = len;
buf[1] = ncap;
return (ARR)(buf + 2);
}
inline TYPE
arrremove(ARR arr)
{
return arr[--((int *)arr - 2)[0]];
}
Unlike previous implementations, we also provide arrlen
and arrcap
functions to return the length and capacity of the array:
inline int
arrlen(ARR arr)
{
return ((int *)arr - 2)[0];
}
inline int
arrcap(ARR arr)
{
return ((int *)arr - 2)[1];
}
We can now use fat pointers in our previous scenario as such:
int *arr = arrnew();
while ((num = get()) < limit) {
arr = arrappend(arr, num);
}
for (i = 0; i < arrlen(arr); ++i) {
process(arr[i]);
}
for (i = arrlen(arr)-1; i > 0; --i) {
arr[i] -= arr[i-1];
}
for (i = arrlen(arr)-1; i >= 0; --i) {
process(arrremove(arr));
}
arrdelete(arr);
Note that we assign the result of arrappend
call to the current pointer.
We choose this design to make it more explicit in the code that the value of the pointer may have changed after appending an element.
However, it is also possible to abstract this detail by using an extra macro depending on your taste.
Summary
Summary of each approach along with the code is given below:
- line.c · html
- Builtin solution
- Simple and easy to use
- Need to know the maximum element size beforehand
- plain_arr.c · html
- Builtin solution
- Simple and easy to use
- Need to check capacity when adding new elements
- Need to hold length and capacity parameters in the scope
- Need to pass length parameter to functions
- dyn_arr.c · html
- Holds all parameters within a struct
- Can only be implemented for a single type
- void_arr.c · html
- Holds all parameters within a struct
- Can be used for any type
- Can only be safely used with pointer types
- Requires pointer casts when accessing elements
- fat_arr.c · html
- Clear syntax
- Holds all parameters within a pointer
- Can only be implemented for a single type
- Need to remember using functions for operations