How to avoid subtle bugs in malloc usage in C
By Dmitry Kabanov
Whenever a program needs to store data whose size is known only at run time, dynamic memory allocation is required.
In C, this memory comes from the heap and is commonly obtained via the library function malloc, whose usage has a few subtleties that I would like to discuss here.
First of all, malloc has the following signature:
void *malloc(size_t size);
To allocate an array of 100 elements of type double, the following code is commonly used:
int length = 100;
double *arr = (double *) malloc(length * sizeof(double));
where the sizeof operator is applied to the type double, so its operand must be parenthesized.
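For reference, here is a minimal sketch of how this common pattern might sit inside a function. The helper name make_zeros, the NULL check, and the zero-filling loop are illustrative assumptions and not part of the snippet above; the allocation line itself is kept exactly as it is commonly written.

```c
#include <stdlib.h>

/* Hypothetical helper (not from the post): allocate `length` doubles
   and set each element to zero. Returns NULL if malloc fails. */
double *make_zeros(int length)
{
    /* The commonly used form: explicit cast and sizeof applied to a type. */
    double *arr = (double *) malloc(length * sizeof(double));
    if (arr == NULL) {
        return NULL;  /* malloc reports failure by returning NULL */
    }
    for (int i = 0; i < length; i++) {
        arr[i] = 0.0;
    }
    return arr;
}
```

A caller would check the returned pointer for NULL before using it and pass it to free once the array is no longer needed.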
Although the above code works most of the time, there are ways to improve the maintainability and correctness of this code.
-
First, an explicit cast is often applied to the value returned by malloc (which returns void *):
double *arr = (double *) malloc(length * sizeof(double));
This cast is completely redundant in C, because void * is implicitly converted to double *, the pointer type on the left-hand side. It is required only if this line is compiled as C++, which has stricter type conversion rules.
-
Note that in the above example, we type the type (pun intended) of the array twice. If at any later time, say, the first double in this line is changed to float and the second one is not, then typically twice as much memory will be allocated as required. Even worse, if the type is changed to something larger, like long double, then too little memory will be allocated, which will most probably lead to memory access issues that can be difficult to debug.
There is an alternative way to compute the size of a single element of the arr array: apply the sizeof operator to the dereferenced pointer arr:
double *arr = malloc(length * sizeof *arr);
Note that in this case, sizeof is applied to an expression, not a type, so the parentheses are optional. The operand of sizeof is not evaluated here, so dereferencing arr before it has a value is harmless. Now we avoid duplicating the type information. If the variable name arr is changed later, it is likely that the second reference to this variable in this line will be changed as well.
Alternatively, one can write:
double *arr = malloc(length * sizeof arr[0]);
-
The last thing that I would like to mention is the order of the operands in length * sizeof *arr. Imagine that instead we have an expression a * b * sizeof *arr for some large variables a and b of type int. The product a * b is computed first, in type int, so it can overflow. Signed integer overflow is undefined behavior in C; in practice the result is often a wrapped, possibly negative, value. Multiplying by the sizeof value (which has type size_t) then converts the result of a * b to size_t, which turns a negative value into a very large non-negative one. The result of the expression a * b * sizeof *arr can therefore be completely different from what the programmer intended.
The solution to this is to switch the order of the operands of this product:
double *arr = malloc(sizeof *arr * a * b);
Then, because sizeof *arr is the first operand, a and b are each converted to type size_t before they are multiplied, and the probability that the product will overflow becomes much smaller. On typical modern computer architectures, variables of type size_t can hold much larger values (the maximum is usually 2^64-1) than variables of type int (usually 2^31-1 is the maximum value).
This is one of the examples where programming deviates from mathematics. In mathematics, the product of several numbers is commutative, that is, the result does not depend on the order in which they are multiplied. However, in programming, the result can be different depending on the order of the operands.
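The points in the list above can be put together in a short sketch. The helper names matrix_bytes and alloc_matrix are hypothetical and exist only for illustration. The range check in alloc_matrix is an extra assumption that goes beyond the discussion above: putting sizeof first makes overflow less likely, but arithmetic in size_t can still wrap around for extreme sizes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: number of bytes for a rows-by-cols array of double.
   sizeof comes first, so rows and cols are converted to size_t and every
   multiplication is carried out in size_t rather than in int. */
size_t matrix_bytes(int rows, int cols)
{
    double *arr = NULL;  /* used only as an unevaluated sizeof operand */
    return sizeof *arr * rows * cols;
}

/* Hypothetical helper: allocate a rows-by-cols array of double.
   No cast on malloc, no repeated type name, sizeof as the first operand.
   Returns NULL for non-positive dimensions, for a byte count that would
   not fit in size_t (an added precaution), or if malloc itself fails. */
double *alloc_matrix(int rows, int cols)
{
    if (rows <= 0 || cols <= 0) {
        return NULL;
    }
    if ((size_t) rows > SIZE_MAX / sizeof(double) / (size_t) cols) {
        return NULL;
    }
    double *arr = malloc(sizeof *arr * rows * cols);
    return arr;
}
```

With rows and cols both equal to 50000, the product rows * cols does not fit in a 32-bit int, while on a platform with a 64-bit size_t the byte count computed by matrix_bytes is still representable.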
The information in this post was collected from the following StackOverflow posts: