“Modern C”: Notes on chapter 5 “Basic values and data”

By Dmitry Kabanov

September 15, 2023

These are my notes taken while reading chapter 5 “Basic values and data” from the book “Modern C” by Jens Gustedt.

This chapter discusses values of different objects that are used in a C program, and how they are represented.

The table of contents for all notes on this book is available in that post.

C programs manipulate data values that have different representations in a computer. The programmer is abstracted away from the actual representations: a program is described in terms of C’s abstract state machine, whose objects are represented with concrete types on a concrete computer and whose values change over time (hence, the state changes).

5.1 Abstract state machine

A program passes through different states as the values it manipulates change. Values are observable when they are assigned to variables, and not observable when they are results of intermediate expressions. For example, in this statement

x = (x * 1.5) - y;

the value of the subexpression (x * 1.5) is hidden, as we never assign it to a variable, while the value of x may be observable, as x is a variable and hence is stored in addressable memory.

A C compiler is allowed to perform optimizations that remove some variables if it is clear that the end result does not change.

Takeaway 5.2. All values in C programs are numbers or translate to numbers.

A type is an additional property that is associated with a value. In C programs all values have types that are statically determined. Also, results of computations depend on the type: for example, if the type of a subexpression is unsigned, the result cannot be negative.

Also, the types are abstract in the sense that their concrete representation depends on the platform.

5.2 Basic types

Some of the basic types are built-in keywords such as unsigned, int, and double. Some other basic types are defined in header files, such as bool or size_t.

Actually, all basic types in C are numbers or can be treated as numbers. There are two principal classes of numbers: integers and floating-point numbers. Integers can be subclassed as signed and unsigned, while floating-point numbers are subclassed as real and complex.

There are narrow types for integers that are promoted to a wider type during computations. For example, narrow types bool, (un)signed char, (un)signed short are promoted on most of today’s platforms to signed int.

Floating-point numbers are float, double, long double for reals, and float complex, double complex, long double complex for complex numbers.

The ranges and precision of these types are not strictly defined; they depend on the platform. However, the C standard constrains them. For example, the range of char is no larger than that of short, short no larger than int, and int no larger than long. On my Linux x86_64 machine, GCC reports that int has a size of 4 bytes and long a size of 8 bytes, so the range of values for long is much bigger than for int on this particular platform.

Signed and unsigned versions of a type have the same size in bytes, but they represent different maximum values: for example, a typical 32-bit int has a maximum of 2^31 - 1, which is about 2 billion, while unsigned int has a maximum of 2^32 - 1, about 4 billion.

There are special semantic types. For example, stddef.h defines the type size_t to represent sizes in programs and the type ptrdiff_t to represent differences between sizes or pointers (negative differences are allowed). The header file stdint.h adds the types uintmax_t and intmax_t, which denote the widest unsigned and signed integer types on the platform, respectively.

5.3 Specifying values

Values can be specified as normal decimal integers, hexadecimal integers such as 0x25ABB7F, decimal floating-point numbers such as 3.14E0, hexadecimal floating-point numbers such as 0x7.AFP10, characters such as ‘A’, and strings such as “Hello\b and Heaven”, where the escape sequence \b is a backspace: on a terminal it moves the cursor back by one position, so the next character overwrites the previous one.

Integer literals can be given a specific type: for example, 3 is perfectly representable as short; however, we can prescribe it to be of type unsigned long by adding a suffix: 3UL.

Floating-point literals by themselves are of type double but can be specified as float or long double with suffixes F and L.

Complex numbers can be specified with the help of the macro I from complex.h. A value of type double complex is given by expressions like 2.5 + 0.3*I, and a value of type float complex by expressions like 0.5F + 0.3F*I.

5.4 Implicit conversions

The C compiler performs a lot of implicit type conversions, and their results can be surprising. For example, the expression -1U has type unsigned because the minus operator does not change the type, so its value is a large positive number rather than -1. Similarly, an expression whose result exceeds 2^31 - 1 does not fit into a typical 32-bit int.

The recommendations are basically to avoid narrowing conversions, that is, to avoid assigning the result of an expression to a variable of a narrower type. Also, it is not recommended to mix signed and unsigned expressions. Finally, use unsigned types when you can.

5.5 Initializers

In C, practically all variables should be initialized.

Initialization of arrays can explicitly state the index of the element, which is preferable:

double A[] = {7.8};
double B[] = {[0] = 2.5, [3] = 47.23};

The default initialization value is 0: elements that are not listed explicitly in an initializer are set to 0.

5.6 Named constants

Sometimes we have constant values, and instead of using them literally, it is better to name them. For semantic reasons, if the same value is used in the program with different meanings, there should be several different named constants with the same value. C offers two ways to do this: using enum or using macros.

Constants should be distinguished from const-qualified objects. The qualifier const makes a variable read-only after it is initialized.

For example, the type char const* const denotes a read-only pointer to a read-only string.

C allows naming small integers via enumerations:

enum corvid { magpie, raven, jay, n_corvids };

where magpie is initialized to zero, raven to one, and so on. Note that here we use the idiom of adding a last constant that tells us the number of elements in the enumeration.

Enumeration constants are of type signed int and can be initialized in more complex ways, for example,

enum constants { p0 = 1, p1 = 7*p0, p2 = 2*p1 };

as long as the initialization values are integer constant expressions, that is, can be fully determined at compile time.

To declare constants of types other than signed int, the only way is to use macros, which are handled by the C preprocessor. Macros are defined like this:

#define M_PI 3.1415926

When the C preprocessor processes a source file, it replaces all occurrences of the token M_PI with the actual value (which is a double literal in this case).

It is usually a good idea to write macro names in all caps, LIKE_THIS, although in the C library some names do not follow this convention (for example, false).

5.7 Binary representations

This section is super dense and technical, so I only skimmed through it as I do not require the knowledge from it directly for my current project.

The most interesting bits that I have noticed are the following.

Unsigned integers wrap around nicely (they form a ring in the mathematical sense), so arithmetic on them is always well defined.

Signed integers can trap: arithmetic overflow on them is undefined behavior and can lead to errors.

Header file stdint.h provides fixed-width integers, for example, uint32_t for unsigned integers of width 32 bits, or int8_t for signed integers of width 8 bits.

Floating-point numbers such as floats and doubles represent a subset of the real numbers. Only values that can be written as a finite sum of powers of 2 (within the available precision) are represented exactly, for example 0.5; others are not, for example 0.3, whose binary expansion is infinitely repeating.

Also, floating-point operations do not obey the usual arithmetic laws (associative, commutative, distributive), which means that changing the order of operations can give a different result. Moreover, operations on numbers of very different magnitudes can give results that differ from exact mathematics. For example, adding a very small number to a very large number can yield just the large number.

Floating-point numbers must not be compared for equality. The only meaningful comparison is how close they are.

Complex numbers are a pair of real numbers. Header file tgmath.h provides type-generic macros creal and cimag that return real and imaginary parts, respectively.