“Modern C”: Notes on chapter 6 “Derived data types”
By Dmitry Kabanov
These are my notes taken while reading chapter 6 “Derived data types” from the book “Modern C” by Jens Gustedt.
This chapter discusses objects that consist of other objects, such as arrays, pointers, structures, and type aliases.
The table of contents for all notes on this book is available in that post.
Derived data types are defined as entities that include values of other data types as subparts.
The first way to define derived data types is aggregate data types: arrays (they aggregate elements of the same base type) and structures (they aggregate elements of different base data types).
The other two ways to define derived data types are pointers, which refer to the memory location where some object is stored, and unions, which overlay several objects in the same memory location.
One can also introduce new names for existing types via typedef.
6.1 Arrays
Arrays in C are deeply connected to pointers; however, it is important to remember that arrays are not pointers.
Arrays are declared like this:
double a[4];
int B[16][20];
Note that B is a two-dimensional array; more precisely, B is an array of 16 elements, each of which is an array of type int[20].
Arrays behave differently from objects of scalar types.
In particular, arrays always evaluate to true in conditions, they cannot be used with arithmetic or comparison operators, and they cannot be assigned to.
Arrays in C are either fixed-length arrays (FLA) or variable-length arrays (VLA). VLAs are available only since C99; their length can be given by a variable.
The length of an FLA is determined either by an integer constant expression (ICE) or by an initializer:
double A[4];
double B[] = {1.0, 2.0, 4.0, 16.0};
double C[] = { [3] = 5.0, [1] = 13.0 };
All of the arrays above have length 4.
The length of an array arr is (sizeof arr) / (sizeof arr[0]).
Note that here the operator sizeof is applied to an object rather than to a type; therefore, parentheses around the operand are not needed.
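As a small check of the length formula, the following sketch applies it to the three arrays from above, placed at file scope here. The function names length_of_A, length_of_B, and length_of_C are made up for this illustration and do not come from the book.

```c
#include <stddef.h>

// The three fixed-length arrays from the text, here at file scope.
double A[4];
double B[] = {1.0, 2.0, 4.0, 16.0};
double C[] = { [3] = 5.0, [1] = 13.0 };

// Hypothetical helpers: each applies sizeof to the array object itself,
// so no parentheses around the operand are required.
size_t length_of_A(void) { return sizeof A / sizeof A[0]; }
size_t length_of_B(void) { return sizeof B / sizeof B[0]; }
size_t length_of_C(void) { return sizeof C / sizeof C[0]; }
```

For C, the largest designated index is 3, so the array gets 4 elements, and the elements that are not mentioned in the initializer are set to zero.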
When arrays are passed as parameters to functions, the innermost dimension is lost.
Therefore, do not use the sizeof operator to determine the size of an array parameter.
Arrays behave as if they were passed to functions by reference, that is, a function can change the contents of the passed array.
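A minimal sketch of both points, assuming a hypothetical function scale_in_place that is not from the book: the length has to be passed as a separate argument, and the changes made inside the function are visible to the caller.

```c
#include <stddef.h>

// Hypothetical example: multiplies every element of `a` by `factor`.
// Inside the function `a` behaves like a pointer, so sizeof a would give
// the size of a pointer, not of the caller's array. That is why the
// length is passed explicitly as `len`.
void scale_in_place(size_t len, double a[], double factor) {
    for (size_t i = 0; i < len; ++i) {
        a[i] *= factor;  // modifies the caller's array
    }
}
```

A caller that owns a real array can still compute the length with sizeof on its side and pass it along, for example scale_in_place(sizeof v / sizeof v[0], v, 2.0).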
Strings are special arrays: they have base type char and contain a terminating null character ('\0').
Note that during initialization one can by mistake create a char array that is not a proper string:
char s[] = "jay"; // Has 4 elements: 'j', 'a', 'y', '\0'
char not_s[3] = "jay"; // Has 3 elements: 'j', 'a', 'y', so not a proper 0-string.
The header file string.h contains functions that work with char arrays and strings.
Functions whose names start with mem require only char arrays, while functions whose names start with str require strings.
For example, memcpy copies len characters from a source array to a target array, and memchr looks for a character in a given char array.
Examples of functions that work with strings are strlen, which finds a string's length, strcpy, which copies one string to another, and strcmp, which compares two strings.
A common mistake is to use string functions on character arrays that do not contain a null character.
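One possible way to guard against this mistake is to check for the null character with a mem function before calling a str function. The sketch below assumes two hypothetical helpers, has_terminator and safe_length, which are not part of string.h or of the book.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: true if one of the first n characters of `a`
// is the null character. memchr only needs a char array, not a string.
bool has_terminator(size_t n, char const a[]) {
    return memchr(a, '\0', n) != NULL;
}

// Hypothetical helper: calls strlen only when it is safe to do so,
// otherwise reports the full array length n.
size_t safe_length(size_t n, char const a[]) {
    return has_terminator(n, a) ? strlen(a) : n;
}
```

With the arrays from above, has_terminator(sizeof s, s) holds for s, but not for not_s, so strlen must not be applied to not_s.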
The prototype of strlen shows that at least one character (the terminating \0) is required:
size_t strlen(char const s[static 1]);
In contrast, memcpy has a prototype that does not care whether its arguments are char arrays or not:
void *memcpy(void *target, void const *source, size_t len);
Here void * means a pointer to an object of unknown type, and len should be treated as the number of bytes to copy.
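Because len counts bytes, copying an array of a wider type requires multiplying the number of elements by the element size. A minimal sketch, assuming a hypothetical wrapper copy_doubles:

```c
#include <stddef.h>
#include <string.h>

// Hypothetical wrapper: copies `count` doubles from `source` to `target`.
// memcpy itself knows nothing about double, so the byte count is
// computed as count * sizeof(double). The two arrays must not overlap.
void copy_doubles(size_t count, double target[], double const source[]) {
    memcpy(target, source, count * sizeof source[0]);
}
```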
6.2 Pointers as opaque types
Pointers are opaque types in the sense that they refer to data indirectly, while we cannot control, say, the actual value of the pointer, that is, the memory address where the pointed-to data lies.
Pointers can be valid, null, or indeterminate.
Pointers must always be initialized, either to point to actual data or to 0 (which is the same as NULL).
If a pointer is in an indeterminate state, really bad things can happen inside the program.
A very important rule: always initialize pointers.
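The rule can be illustrated with a small search function. This is only a sketch with a hypothetical name, first_negative; the point is that the pointer is null from its declaration on, so the caller can always test the result.

```c
#include <stddef.h>

// Hypothetical example: returns a pointer to the first negative element
// of `a`, or a null pointer if there is none.
double const *first_negative(size_t len, double const a[]) {
    double const *found = NULL;  // initialized at once, never indeterminate
    for (size_t i = 0; i < len; ++i) {
        if (a[i] < 0.0) {
            found = &a[i];  // now valid: points to actual data
            break;
        }
    }
    return found;
}
```

The caller checks the result against a null pointer before using it, which would not be possible if found could be left indeterminate.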
6.3 Structures
We can combine different variables of different types in a structure:
struct birdStruct {
char const* jay;
char const* magpie;
char const* raven;
char const* chough;
};
(mind the ; at the end), and then we can create a variable of this type and initialize it:
struct birdStruct const birds = {
.chough = "Henry",
.raven = "Lissy",
.magpie = "Frau",
.jay = "Joe",
};
Later in the program we can refer to the name of the raven as birds.raven.
Another example, from <time.h>, is a structure that represents time:
struct tm {
int tm_sec; // Seconds after the minute [0; 60], allowing for a leap second
int tm_min; // Minutes after the hour [0; 59]
int tm_hour; // [0; 23]
int tm_mday; // [1; 31]
int tm_mon; // Months since January [0; 11]
int tm_year; // Years since 1900
int tm_wday; // Weekday since Sunday [0; 6]
int tm_yday; // Days since new year [0; 365]
int tm_isdst; // Is Daylight-Saving Time true?
};
Note that when you initialize a struct, omitted members are set to zero automatically.
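A short sketch of this zero-initialization, using designated initializers for only three members of struct tm. The function name leap_day_2024 is hypothetical and only serves the illustration.

```c
#include <time.h>

// Hypothetical example: 29 February 2024.
// Only three members are named; all other members (tm_sec, tm_min,
// tm_hour, tm_wday, tm_yday, tm_isdst) are set to zero.
struct tm leap_day_2024(void) {
    struct tm const day = {
        .tm_year = 2024 - 1900,  // years since 1900
        .tm_mon = 1,             // months since January, so February
        .tm_mday = 29,
    };
    return day;
}
```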
When we initialize a structure like this, it is a bit tedious to set all the values by hand; for example, tm_wday and tm_yday can be determined automatically from tm_mday, tm_mon, and tm_year.
We can write a function that fills in remaining structure members:
struct tm time_set_yday(struct tm t) {
// ...
t.tm_yday = ...;
return t;
}
Two important things here:
- one always needs to write struct tm, which is a bit annoying;
- when a structure is passed to a function, it is passed by value, therefore we need to return it back to the caller.
The caller code is then:
today = time_set_yday(today);
Note that the returned object is different from the function argument, as a struct is passed by value!
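The body of time_set_yday is elided in these notes. For illustration only, here is one possible way to fill in the day of the year; this is my own sketch under a different, hypothetical name (set_yday_sketch), not the implementation from the book. It assumes that tm_mon, tm_mday, and tm_year are already set to valid values.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

// Gregorian leap-year rule for a full calendar year such as 2024.
static bool is_leap_year(int year) {
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

// Hypothetical sketch: returns a copy of `t` with tm_yday filled in.
// The argument is passed by value, so the caller's object is untouched.
struct tm set_yday_sketch(struct tm t) {
    // Days that precede the first day of each month in a non-leap year.
    static int const days_before[12] = {
        0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334,
    };
    assert(0 <= t.tm_mon && t.tm_mon < 12);
    t.tm_yday = days_before[t.tm_mon] + t.tm_mday - 1;
    // From March on, a leap year has one extra day (29 February).
    if (t.tm_mon > 1 && is_leap_year(t.tm_year + 1900)) {
        ++t.tm_yday;
    }
    return t;
}
```

As with time_set_yday, the result has to be assigned back, for example today = set_yday_sketch(today);, because only the returned copy carries the new tm_yday.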
Structures can be used freely with the assignment operator, but not with the comparison operators == and !=.
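Since == is not available for structures, equality has to be written member by member. A minimal sketch, assuming a hypothetical structure date_sketch that exists only for this illustration:

```c
#include <stdbool.h>

// Hypothetical structure, used only for this illustration.
struct date_sketch {
    int year;
    int month;
    int day;
};

// Member-wise comparison; writing `a == b` would not compile.
bool date_sketch_equal(struct date_sketch a, struct date_sketch b) {
    return a.year == b.year && a.month == b.month && a.day == b.day;
}
```

Assignment, in contrast, works directly: after struct date_sketch b = a; both objects hold the same member values.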
The layout of a structure is an important design decision, so it should be chosen carefully.
For example, one can work with nanosecond-precise timestamps using the following structure:
struct timespec {
time_t tv_sec; // Whole seconds >= 0
long tv_nsec; // Nanoseconds [0; 999'999'999]
};
Any data type except a VLA is allowed as a struct member. Therefore, structures can be nested, and the embedded structure can be declared directly inside the enclosing data structure:
struct person {
char name[256];
struct stardate {
struct tm date;
struct timespec precision;
} bdate;
};
Nested structs have the same scope of visibility as the enclosing one.
The above definition is equivalent to:
struct stardate {
struct tm date;
struct timespec precision;
};
struct person {
char name[256];
struct stardate bdate;
};
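The following sketch shows both consequences: members of the nested structure are reached by chaining the . operator, and struct stardate can be used on its own outside of struct person. It assumes a C11 or later compiler, so that struct timespec comes from <time.h>; the helper names make_stardate and birth_year are hypothetical.

```c
#include <time.h>

struct person {
    char name[256];
    struct stardate {
        struct tm date;
        struct timespec precision;
    } bdate;
};

// struct stardate is visible here, although it was declared inside
// struct person. Members that are not named are set to zero.
struct stardate make_stardate(int year) {
    struct stardate s = { .date = { .tm_year = year - 1900 } };
    return s;
}

// Nested members are accessed by chaining the member operator.
int birth_year(struct person p) {
    return p.bdate.date.tm_year + 1900;
}
```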
6.4 New names for types: type aliases
When we need to declare a variable of a newly defined structure type, we need to add struct at the beginning, which makes it a bit clumsy.
To avoid this, one can introduce new names for existing types using the keyword typedef.
Then we can have new aliases for structures:
typedef struct birdStruct birdStructure;
Moreover, one can reuse the structure tag as the alias! The idiomatic use is to forward-declare the alias for a struct:
typedef struct birdStruct birdStruct;
struct birdStruct {
...
};
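Spelled out with the members of birdStruct from section 6.3, the idiom could look as follows. The function rename_raven is a hypothetical example; it only shows that the keyword struct is no longer needed once the alias exists.

```c
// Forward declaration of the alias, followed by the structure itself.
typedef struct birdStruct birdStruct;
struct birdStruct {
    char const* jay;
    char const* magpie;
    char const* raven;
    char const* chough;
};

// Hypothetical example: parameter and return type use the alias only.
// The structure is passed by value, so a modified copy is returned.
birdStruct rename_raven(birdStruct birds, char const* new_name) {
    birds.raven = new_name;
    return birds;
}
```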
Other examples of aliases:
typedef double vector[64];
typedef vector vecvec[16];
typedef double matrix[16][64];
vecvec A;
matrix B;
Here, both A and B have the same data type, double[16][64].
Yes, the syntax for array aliases is a little bit strange, but one can get used to it.
Just for clarity again:
typedef double vector[64];
declares a new type alias vector, which resolves to double[64].
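One way to convince oneself that vecvec and matrix really name the same type is to take the address of a vecvec object and return it as a pointer to matrix without a cast. The helper names below are hypothetical and serve only as a sketch.

```c
#include <stddef.h>

typedef double vector[64];
typedef vector vecvec[16];
typedef double matrix[16][64];

vecvec A;
matrix B;

// Compiles without a cast because &A has type double (*)[16][64],
// which is exactly what matrix * means.
matrix *alias_of_A(void) { return &A; }

// Number of double elements in a vector, and of vectors in a vecvec.
size_t vector_length(void) { return sizeof(vector) / sizeof(double); }
size_t vecvec_rows(void) { return sizeof(vecvec) / sizeof(vector); }
```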
The C standard uses the suffix _t for data types to emphasize that they are aliases.
It is not recommended to use this suffix in your own code, to avoid conflicts with future revisions of C.