How to avoid subtle bugs in malloc usage in C
By Dmitry Kabanov
Whenever a program needs to store data whose size is known only at runtime, dynamic memory allocation is required.
This memory is allocated on the heap, and in C it is commonly obtained via the library function malloc.
Its usage has some subtleties that I would like to discuss here.
First of all, malloc has the following signature:
void *malloc(size_t size);
To allocate an array of 100 elements of type double, the following code
is commonly used:
int length = 100;
double *arr = (double *) malloc(length * sizeof(double));
Here the sizeof operator is applied to the type double, so its operand must be parenthesized.
Although the above code works most of the time, there are ways to improve the maintainability and correctness of this code.
- First, an explicit cast is often applied to the value returned by malloc (which returns void *):
    double *arr = (double *) malloc(length * sizeof(double));
  This cast is redundant in C. On assignment or initialization, a void * is implicitly converted to any object pointer type, here double *.
  The cast is required only if this line is compiled as C++, which has stricter type conversion rules.
- Note that in the above example, we type the type (pun intended) of the array twice.
  Suppose that at some later time the first double in this line is changed to float and the second one is not. Then twice as much memory as required will be allocated (on typical platforms, where double is twice the size of float).
  Even worse, suppose the first one is changed to a larger type, like long double. Then too little memory will be allocated, which will most probably lead to out-of-bounds memory accesses that can be difficult to debug.
  There is an alternative way to evaluate the size of a single element of the arr array: apply the sizeof operator to the dereferenced pointer arr:
    double *arr = malloc(length * sizeof *arr);
  Note that in this case sizeof is applied to an expression, not a type, so the parentheses are optional.
  Now the type information is not duplicated. If the variable name arr is changed later, the second reference to it in this line is likely to be changed as well.
  Alternatively, one can write:
    double *arr = malloc(length * sizeof arr[0]);
- The last thing that I would like to mention is the order of the operands in length * sizeof *arr.
  Imagine that instead we have the expression a * b * sizeof *arr for some large int variables a and b.
  Multiplication groups from left to right, so a * b is computed first, in type int, and it may overflow. Signed integer overflow is undefined behavior in C; in practice, the result often wraps around to a negative value.
  When this result is then multiplied by the sizeof value (which has type size_t), it is converted to size_t. A negative value becomes a very large non-negative one.
  As a result, the expression a * b * sizeof *arr can have a completely different value from the one the programmer intended.
  The solution is to change the order of the operands in this product:
    double *arr = malloc(sizeof *arr * a * b);
  Because sizeof *arr is now the first operand, a and b are converted to type size_t before they are multiplied. This makes an overflow much less likely.
  On typical modern 64-bit architectures, the maximum value of size_t is 2^64 - 1. That is much larger than the maximum value of int, which is usually 2^31 - 1.
  This is one of the examples where programming deviates from mathematics. In mathematics, multiplication is commutative and associative: the result does not depend on the order in which the numbers are multiplied. In programming, however, the result can depend on the order of the operands.
The information in this post was collected from the following StackOverflow posts: