How to avoid subtle bugs in malloc usage in C

By Dmitry Kabanov

January 18, 2024

Whenever a program needs to store data whose size is known only at run time, dynamic memory allocation is required. The memory is allocated on the heap, and in C this is commonly done with the library function malloc, whose usage has a few subtleties that I would like to discuss here.

First of all, malloc has the following signature:

void *malloc(size_t size);

To allocate an array of 100 elements of type double, the following code is commonly used:

int length = 100;
double *arr = (double *) malloc(length * sizeof(double));

where the sizeof operator is applied to the type double, so its operand must be parenthesized.

Although the above code works most of the time, there are ways to improve the maintainability and correctness of this code.

First, an explicit cast is often applied to the value returned by malloc (which has type void *):
```
double *arr = (double *) malloc(length * sizeof(double));
```
This cast is redundant in C, because void * is implicitly converted to any other object pointer type on assignment or initialization, here double *. The cast is required only if this line is compiled as C++, which has stricter type conversion rules.

Note that in the above example, we type the type (pun intended) of the array twice. If at some later time the first double in this line is changed to float but the second one is not, then more memory than required will be allocated (typically twice as much). Even worse, if the type is changed to something larger, such as long double, the allocation will be too small, which will most probably lead to out-of-bounds memory accesses that can be difficult to debug.
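To make the mismatch concrete, here is a minimal sketch that compares the number of bytes a stale sizeof(double) requests with the number of bytes a long double array actually needs. The helper names bytes_requested and bytes_needed are hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* Hypothetical helpers, for illustration only. */

/* Bytes requested by the stale call: malloc(length * sizeof(double)). */
size_t bytes_requested(size_t length)
{
    return length * sizeof(double);
}

/* Bytes actually needed once arr has been changed to long double *. */
size_t bytes_needed(size_t length)
{
    return length * sizeof(long double);
}
```

On platforms where long double is wider than double (for example, 16 versus 8 bytes with GCC on x86-64 Linux), bytes_needed(100) exceeds bytes_requested(100), so writing all 100 elements would run past the end of the allocation.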

An alternative way to obtain the size of a single element of arr is to apply the sizeof operator to the dereferenced pointer:
```
double *arr = malloc(length * sizeof *arr);
```
Note that in this case, sizeof is applied to an expression, not a type; therefore, parentheses are optional. The expression *arr is not evaluated, only its type is inspected, so it is safe to use arr here before it has been initialized. Now we avoid duplicating the type information. If the variable name arr is changed later, it is likely that the second reference to this variable in this line will be changed as well.
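The idiom can be wrapped in a small function, sketched below. The name alloc_doubles is hypothetical, and the sketch assumes the caller checks the result for NULL and eventually calls free.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates room for `length` doubles.
 * Returns NULL if malloc fails; the caller owns the memory. */
double *alloc_doubles(size_t length)
{
    /* The element size is taken from *arr, so the type is spelled once. */
    double *arr = malloc(length * sizeof *arr);
    return arr; /* may be NULL; the caller must check before use */
}
```

If the element type later changes to float, only the two occurrences of double in the function signature and declaration need to change; the size computation follows automatically.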

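Alternatively, one can take the element size from the first element, which is equivalent because sizeof does not evaluate its operand here:
```
double *arr = malloc(length * sizeof arr[0]);
```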
The last thing that I would like to mention is the order of the operands in the expression length * sizeof *arr. Imagine that instead we have an expression a * b * sizeof *arr for some large variables a and b of type int. Because multiplication is evaluated left to right, a * b is computed first, in type int, and it is quite possible that this product overflows. Signed integer overflow is undefined behavior in C; in practice, the result often wraps around to a negative value. Multiplying by the sizeof value (which has type size_t) then converts the result of a * b to size_t, which turns a negative value into a very large non-negative one. The value of the expression a * b * sizeof *arr can therefore be completely different from what the programmer intended.
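Because signed overflow is undefined behavior, it cannot be demonstrated portably by actually triggering it. The sketch below instead checks, in a wider type, whether a * b would fit in an int. The helper name product_fits_in_int is hypothetical, and the check assumes that long long is wider than int, which holds on typical platforms with a 32-bit int.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: reports whether a * b is representable as an int.
 * The product is computed in long long (at least 64 bits wide), so the
 * check itself cannot overflow when int is 32 bits wide. */
bool product_fits_in_int(int a, int b)
{
    long long wide = (long long)a * (long long)b;
    return wide >= INT_MIN && wide <= INT_MAX;
}
```

With a 32-bit int, product_fits_in_int(50000, 50000) is false, because 2,500,000,000 exceeds INT_MAX = 2,147,483,647. In that case a * b overflows before sizeof is involved at all.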

The solution to this is to switch the order of the operands of this product:
```
double *arr = malloc(sizeof *arr * a * b);
```
Because sizeof *arr is now the first operand, a and b are each converted to type size_t before they are multiplied, and the probability that the product overflows becomes much smaller. On typical modern 64-bit architectures, variables of type size_t can hold much larger values (the maximum is usually 2^64-1) than variables of type int (the maximum is usually 2^31-1).
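A minimal sketch of the size-first ordering inside a helper function follows. The name alloc_grid is hypothetical. The check for negative dimensions is an addition made here, because a negative int converted to size_t becomes a very large value.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocates an a-by-b block of doubles.
 * Returns NULL if a dimension is negative or if malloc fails. */
double *alloc_grid(int a, int b)
{
    if (a < 0 || b < 0) {
        return NULL; /* a negative int would convert to a huge size_t */
    }
    /* sizeof comes first, so both multiplications are done in size_t. */
    double *arr = malloc(sizeof *arr * a * b);
    return arr; /* may be NULL; the caller must check and later free it */
}
```

Assuming a 64-bit size_t and an 8-byte double, alloc_grid(50000, 50000) computes a request of 50000 × 50000 × 8 = 20,000,000,000 bytes without wraparound, although malloc may still return NULL if that much memory is unavailable. Overflow remains possible for large enough inputs, so this ordering reduces the risk rather than eliminating it.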

This is one of the examples where programming deviates from mathematics. In mathematics, multiplication is commutative and associative, that is, the result does not depend on the order in which the numbers are multiplied. In programming, however, the result can differ depending on the order of the operands.

The information in this post was collected from the following StackOverflow posts: