“Modern C”: Notes on chapter 6 “Derived data types”

By Dmitry Kabanov

March 1, 2024

These are my notes taken while reading chapter 6 “Derived data types” from the book “Modern C” by Jens Gustedt.

This chapter discusses data types that are built from other types, such as arrays, pointers, structures, and type aliases.

The table of contents for all notes on this book is available in that post.

Derived data types are defined as entities that include values of other data types as subparts.

The first kind of derived data types are aggregate data types: arrays (which aggregate elements of the same base type) and structures (which aggregate elements of possibly different base types).

The other two kinds of derived data types are pointers (which refer to the memory location where some object is stored) and unions (which overlay several objects in the same memory location).

One can also introduce new names for existing types via typedef.

6.1 Arrays

Arrays in C are deeply connected to pointers; however, it is important to remember that arrays are not pointers.

Arrays are declared like this:

  double a[4];
  int B[16][20];

Note that B is a two-dimensional array; more precisely, B is an array of 16 elements, each of which is an array int[20].

Arrays behave differently from objects of scalar types. In particular, an array always evaluates to true in a condition, arrays cannot be compared or used in arithmetic as whole values, and they cannot be assigned to.

Arrays in C are either fixed-length arrays (FLA) or variable-length arrays (VLA). VLAs are available only since C99; they can be declared with a length that is given by a variable.

FLAs have a length determined either by an integer constant expression (ICE) or by an initializer:

  double A[4];
  double B[] = {1.0, 2.0, 4.0, 16.0};
  double C[] = { [3] = 5.0, [1] = 13.0 };

All the above arrays have length 4.

The length of an array arr is (sizeof arr) / (sizeof arr[0]). Note that here the operator sizeof is applied to an object, so parentheses are not needed.

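When an array is passed as a parameter to a function, its innermost dimension (the one right after the parameter name, e.g. 16 in int B[16][20]) is lost: the parameter is adjusted to a pointer to the first element. Therefore, do not use the sizeof operator to determine the size of an array parameter. Arrays behave as if they were passed to functions by reference, that is, a function can modify the elements of the passed array.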

Strings are char arrays that contain a null character '\0', which marks the end of the string.

Note that during initialization one can accidentally create a char array that is not a proper string:

  char s[] = "jay";       // Has 4 elements: 'j', 'a', 'y', '\0'
  char not_s[3] = "jay";  // Has 3 elements: 'j', 'a', 'y', so it is not a proper string

The header file string.h contains functions to work with char arrays and strings. Functions whose names start with mem require only char arrays, while those starting with str require strings. For example, memcpy copies len characters from a source array to a target array, and memchr looks for a character in a given char array.

Examples of functions that work with strings are strlen, which finds a string's length, strcpy, which copies one string to another, and strcmp, which compares two strings.

A common mistake is to use string functions on character arrays that do not contain a null character.

In the prototype of strlen, as written in the book, we can see that at least one character (the terminating \0) is required:

  size_t strlen(char const s[static 1]);

And, for example, memcpy has a prototype that does not actually care if it is a char array or not:

  void *memcpy(void *target, void const *source, size_t len);

Here void* means a pointer to an object of unknown type, and len is the number of bytes to copy.

6.2 Pointers as opaque types

Pointers are opaque types in the sense that they refer to data only indirectly, and we do not control the actual value of a pointer, that is, the memory address where the pointed-to data lie.

A pointer can be valid, null, or indeterminate. Pointers must always be initialized either to point to actual data or to 0 (which, for pointers, is the same as NULL). If a pointer is in an indeterminate state, using it is undefined behavior, and really bad things can happen inside the program.

Very important rule: always initialize pointers.

6.3 Structures

We can combine several members of different types in a structure:

  struct birdStruct {
    char const* jay;
    char const* magpie;
    char const* raven;
    char const* chough;
  };

(mind the ; at the end). Then we can create a variable of this type and initialize it:

  struct birdStruct const birds = {
    .chough = "Henry",
    .raven = "Lissy",
    .magpie = "Frau",
    .jay = "Joe",
  };

Later in the program, we can refer to the name of the raven as birds.raven.

Another example is struct tm from <time.h>, which represents calendar time:

  struct tm {
    int tm_sec;    // Seconds after the minute [0; 60]
    int tm_min;    // Minutes after the hour [0; 59]
    int tm_hour;   // Hours since midnight [0; 23]
    int tm_mday;   // Day of the month [1; 31]
    int tm_mon;    // Months since January [0; 11]
    int tm_year;   // Years since 1900
    int tm_wday;   // Days since Sunday [0; 6]
    int tm_yday;   // Days since January 1 [0; 365]
    int tm_isdst;  // Daylight Saving Time flag (> 0 in effect, 0 not, < 0 unknown)
  };

Note that when you initialize a struct, omitted members are set to zero automatically.

When we initialize a structure like this, it is a bit tedious to set all the values by hand. For example, tm_wday and tm_yday can be computed from tm_mday, tm_mon, and tm_year. We can write a function that fills in the remaining structure members:

  struct tm time_set_yday(struct tm t) {
    // ...
    t.tm_yday = ...;
    return t;
  }

Two important things here:

- one always needs to write struct tm, which is a bit annoying;
- when a structure is passed to a function, it is passed by value, so we need to return the modified copy to the caller. The calling code is then:
```
      today = time_set_yday(today);
```

Note that the returned object is different from the function argument, since the struct is passed by value!

Structures can be used freely with the assignment operator, but not with the comparison operators == and !=.

The layout of a structure is an important design decision, so it should be chosen carefully.

For example, one can work with nanosecond-precision timestamps using the following structure (since C11, it is provided by <time.h>):

  struct timespec {
    time_t tv_sec;    // Whole seconds >= 0
    long tv_nsec;     // Nanoseconds [0; 999'999'999]
  };

Any complete data type except a VLA can be used as a struct member. Therefore, structures can be nested, and the embedded structure can even be declared directly inside the enclosing one:

  struct person {
    char name[256];
    struct stardate {
      struct tm date;
      struct timespec precision;
    } bdate;
  };

Nested structure declarations have the same scope of visibility as the enclosing one, so struct stardate can also be used outside of struct person.

The above definition is equivalent to:

  struct stardate {
    struct tm date;
    struct timespec precision;
  };
  struct person {
    char name[256];
    struct stardate bdate;
  };

6.4 New names for types: type aliases

When we declare a variable of a newly defined structure type, we need to write struct in front of the tag, which is a bit clumsy.

To avoid this, one can introduce new names for existing types using the keyword typedef. Then we can have new aliases for structures:

  typedef struct birdStruct birdStructure;

Moreover, one can reuse the structure tag as the alias! An idiomatic use is to forward-declare the alias for a struct:

  typedef struct birdStruct birdStruct;
  struct birdStruct {
  ...
  };

Other examples of aliases:

  typedef double vector[64];
  typedef vector vecvec[16];
  typedef double matrix[16][64];
  vecvec A;
  matrix B;

Here, both A and B have the same data type of double[16][64]. Yes, the syntax for array aliases is a little bit strange, but one can get used to it.

Just for clarity again:

  typedef double vector[64];

declares a new type alias vector which resolves to double[64].

The C standard uses the suffix _t for type names (such as size_t) to emphasize that they are aliases. It is not recommended to use this suffix for your own types, to avoid conflicts with future revisions of C.