Array and pointers equivalence myth
By Dmitry Kabanov
I’d like to share some knowledge about a subtle thing that I have learned recently in C. It concerns the relationship between arrays and pointers, and how they are often treated as equivalent to each other (for example, this is how passing an array to a function works when one of the expected arguments is a pointer). However, arrays and pointers are not equivalent to each other.
The aforementioned conversion of an array to a pointer at a function call is what makes one think that they are equivalent. I already knew about this implicit conversion, since I programmed a bit in C++ during my undergraduate studies. However, this is not always what you need, as it turned out in this discussion on StackOverflow. I had a problem using libffi, and the user @selbie pointed me to the difference between the different types of pointers related to arrays.
TLDR: There are pointers to the first element of an array and there are pointers to the array itself.
For example, if we have an array:
char s[16] = "Hello World!";
then we can obtain a pointer to the first element of this array:
char *ps0 = s;
or we can obtain a pointer to the array itself:
char (*ps)[] = &s;
In expressions where we simply use the name of the array,
it very often decays to a pointer to the first element,
that is, s is equivalent to &s[0]
(note that here s[0] is the operand of &).
For example, this happens when passing arrays to functions:
inside a function we have a pointer to the first element.
This is called array decay.
However, in some operations, such as the sizeof operator, the name
of the variable represents the array itself.
In detail, the text of the C99 standard (section 6.3.2.1, paragraph 3) says:
Except when it is the operand of the sizeof operator or the unary & operator, or is a string literal used to initialize an array, an expression that has type “array of type” is converted to an expression with type “pointer to type” that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.
Note that if you print all three values, s, ps0, and ps,
they all have the same value, namely the address of the first element
of the array; however, they have different types.
This small C program along with the debugger output demonstrates this.
// Saved in file array_decay.c
#include <stdio.h>

int main(void)
{
    char s[16] = "Hello World!";
    char *p_to_first_element = s;   // s decays to &s[0]
    char (*p_to_array)[] = &s;      // no decay: address of the whole array

    // %p expects a void * argument, hence the casts
    printf("%-20s %p\n", "s", (void *)s);
    printf("%-20s %p\n", "p_to_first_element", (void *)p_to_first_element);
    printf("%-20s %p\n", "p_to_array", (void *)p_to_array);
    return 0;
}
We compile it with:
gcc -g -O0 array_decay.c -o array_decay
and run it under the gdb debugger:
gdb ./array_decay
The output shows that the values are indeed the same:
s                    0x7fffffffc7d0
p_to_first_element   0x7fffffffc7d0
p_to_array           0x7fffffffc7d0
Inside the debugging session we can check the types of the variables:
(gdb) whatis s
type = char [16]
(gdb) whatis p_to_first_element
type = char *
(gdb) whatis p_to_array
type = char (*)[]
Additional literature that I have used: