The increasing use of templates, increasing numbers of translation units and increasing amount of code in headers has tremendously affected the compile-time of C++ code bases. Indeed, it’s hard to justify the compile-time that we typically have in C++ when comparing with code bases of similar size in languages such as C# or Go. This article has for objective to explore the reasons that cause slower compile-time in C++ and with that understanding introduce trade-off that can improve the quality and compile time of a C or C++ code bases. I won’t discuss about techniques such as pre-compiled headers, but they can be added to further improve compile-time.
The C and C++ language is typically compiled by compiling translation units
(producing object files) that are later assembled (linked) by the linker to
create an executable binary. Moreover, the compilation of the translation
units is done independently of each others. For instance, you can
independently compile every translation units (with as many command as
translation unit) and then invoke the linker to produce the executable.
See compiler flag -c
on gcc or clang) The translation unit generally has
a one-to-one relationship with a source file (i.e. .c
, .cc
, .cpp
), but the standard doesn’t require it. For instance, the unity build programming
technique doesn’t has this one-to-one relationship.
This concept of translation units had and has several benefits, but comes
with drawbacks. In my opinion, the main drawback is that it doesn’t work
as most people expect. For this reason, it’s not used efficiently by most
programmers. Indeed, #include
isn’t an inclusion of dependency, but an
inclusion of text. One way to illustrate that is with the following code
which compile correctly.
// header.h
void print()
{
printf("Te"
// main.c
#include <stdio.h>
#include "test.h"
"st\n");
}
int main(void)
{
print();
return 0;
}
This last example also imply that an header generally can’t be pre-compiled independently of the translation units in which it will be included.
Example that illustrate this independence:
// header.h
struct foo {
#ifdef DEBUG
const char *name;
#endif
int number;
};
void print(struct foo *f);
// file.c (clang -c file.c -DDEBUG)
#include <stdio.h>
#include "header.h"
void print(struct foo *f)
{
#ifdef DEBUG
printf("[%s] %d\n", f->name, f->number);
#else
printf("%d\n", f->number);
#endif
}
// main.c (clang -c file.c)
#include "header.h"
int main(void)
{
struct foo f = {10};
print(&f);
return 0;
}
When the following command are used to compile, the program will very likely
segfault, because the memory layout of struct foo
isn’t the same in the
two different translation units.
Command used to compiles the executable:
clang -c main.c
clang -c file.c -DDEBUG
clang -o main main.o file.o
./main
It is not surprising, because the code in the header only make sense in
the context of a translation units, because #include
is an inclusion
of text, not an inclusion of dependency. The header “vector” of Microsoft
libc++ change the layout of std::vector
depending if _SECURE_SCL
is
defined and that can cause similar problems when crossing the boundaries
of two translation units.
Learn C the Hard Way, a reference in
C programming tutorial, introduce their first header, dbg.h, with a series
of logging macros. Those macros uses fprintf
, errno
and strerror
, so
dbg.h includes stdio.h, errno.h and string.h. Additionally, this is explained
with the sentence “Includes for the functions that these macros need”.
The approach of including dependencies of the header in the header is the dominating practice in C and C++. This lead to an obvious problem, recursive and multiple inclusion, but it’s easily circumvented with include guards. Moreover, C and C++ compilers can generally not assume that an including header is [idempotent] (https://en.wikipedia.org/wiki/Idempotence) and the following example illustrate that.
// file1.h
#ifndef FILE1_H
# define FILE1_H
# pragma message("file1.h included")
#else
# undef FILE1_H
#endif
This could be an important factor in slow compilation time, because compilers would need to re-interpret the header in the context of where it’s included, every time it’s included. #pragma once was a tentative to simplify this problem, but today the difference in compile time between guarding with pragma once and guarding with classic include guards is negligible. Indeed, compilers may optimize the include guards if it semantically equivalent to re-interpreting the header in the new context every time. GCC and Clang have shown that those optimizations are possible.
The typical include guard is the scheme 1 in the following snippet. I will also present a different kind of guard, that will be further explained.
Scheme 1: (Classic Include Guards)
// file1.h
#ifndef FILE_1_H
#define FILE_1_H
#include <stdio.h>
#include <string.h>
#define log_error(msg) fprintf(stderr, msg)
#endif
// main.c
#include "file1.h"
// ...
Scheme 2: (Single depth inclusion)
// file1.h
#ifdef FILE_1_H
#error "file1.h included more than once"
#endif
#define FILE_1_H
#define log_error(msg) fprintf(stderr, msg)
// main.c
#include <stdio.h>
#include <string.h>
#include "file1.h"
// ...
Before going over the advantages and drawbacks of the two include guard
schemes, I will formalize how I will use the word “dependency” from now
on. Indeed, in C and C++, headers don’t have dependency in itself. The
translation unit will have dependencies and adding an header may creates
more of them. Moreover, if you are not using all functionality of the
header some dependency may only need a declaration. For instance, the
following code will compile, but if you were to use the function func_bar
you would need the definition of struct bar
.
// file1.h
struct foo;
struct bar;
void func_foo(struct foo *f);
struct bar func_bar(void);
// main.c
#include <stdio.h>
#include "file1.h"
int main(void)
{
printf("Hello World!\n");
return 0;
}
In this case we could say that the declaration of struct foo
and
struct bar
are dependencies when including file1.h, but their
definitions are not. For instance, adding the following code to file1.h,
would not compile, because in the context of the translation unit
sizeof(struct foo)
is unknown.
static void print(void)
{
printf("%d\n", sizeof(struct foo));
}
With that philosophy we can define (informally) a more granular definition of dependency.
T
creates a dependency on the declaration of T
.We can apply this definition of dependency to analyze the header “dbg.h”
from Learn C The Hard Way. If you were to use the debug
macro which use
fprintf
and stderr
, you would create a dependency on the declaration
of fprintf
and the declaration of stderr
.
The method used in the first scheme creates a lot of extra works for the
compiler. The compiler will generally include all levels of dependencies
for a given headers. In returns, a user can include the an header and have
a guarantee that all functionalities are completely accessible. This method
also creates a fairly random include order that can be hard to control,
especially when the header offers “#define” to customize its functionalities.
(e.g. WIN32_LEAN_MEAN
, NOMINMAX
, …)
The second scheme offer the most granular control. That is, an header doesn’t includes any other headers. It forces you to manually include the dependencies in every translation units. This can be quite interesting, because by looking at the source file, it’s very easy to understand the include order and see design flaws. Typically, it will forces you to create a oriented tree of dependency. In practice, you will group your header in dependency groups where “Group 0” is the least dependent and that increment. Header, should never creates dependency on the same level or on higher level groups. This means, that the include order doesn’t mater within a group, but groups should be included in ascending order.
// Group 0
#include <stddef.h> // no dependency
#include <stdint.h> // no dependency
// Group 1
#include <MyProgram/array.h> // needs size_t (stddef.h)
#include <MyProgram/string.h> // needs uint8_t (stdint.h)
// Group 2
#include <MyProgram/path.h> // needs "MyProgram/string.h"
#include <MyProgram/unicode.h> // needs "MyProgram/string.h"
// Group 3
#include <MyProgram/file.h> // needs "MyProgram/path.h" & "MyProgram/unicode.h"
This section is particularly relevant to C++, but can be applied to C.
The rule 2 & 4 of dependency were about creating dependency on declaration.
Declaring used type in the header (e.g. forward declaration) can avoid
includes and this can tremendously improve the compile time. In C, it’s
generally easy, because in most cases declaring a type T
will be writing
struct T;
at the top of the header. In C++ this isn’t necessary the case.
For instance, the following code was taken from a json library.
template <
template<typename U, typename V, typename... Args> class ObjectType = std::map,
template<typename U, typename... Args> class ArrayType = std::vector,
class StringType = std::string,
class BooleanType = bool,
class NumberIntegerType = std::int64_t,
class NumberUnsignedType = std::uint64_t,
class NumberFloatType = double,
template<typename U> class AllocatorType = std::allocator,
template<typename T, typename SFINAE = void> class JSONSerializer = adl_serializer
>
class basic_json
{
// ...
};
using json = basic_json<>;
Now assume that you have a class Foo (header foo.h) with a private
helper method which has a parameter of type json
. This function needs
to be declared in the header with the class definition so, from rule 4
of dependency, foo.h creates a dependency on the declaration of json
.
But, you can’t forward declare a typedef so you will need to forward
declare basic_json
and write the typedef yourself. This is not practical,
because the declaration of basic_json
is very convoluted and may
changes with updates. Moreover, it creates dependency on the declaration
of std::map
, std::vector
, etc. which have the same problem.
Practically, “json.hpp” will be included before every include of “foo.h”.
Similarly, <vector>, <map>, etc. are included and not forward declared.
This header has 14k lines of highly templated code that includes
<algorithm>, <array>, <cassert>, <ciso646>, <clocale>, <cmath>,
<cstddef>, <cstdint>, <cstdlib>, <cstring>, <forward_list>,
<functional>, <initializer_list>, <iostream>, <iterator>, <limits>,
<locale>, <map>, <memory>, <numeric>, <sstream>, <string>,
<type_traits>, <utility> and <vector>.
Compiling a empty program with clang version 6.0.0 (tags/RELEASE_600/final)
take me 0 second. Including this header (without using it) took 0.75 seconds
on the first try and stabilized at 0.525 seconds. Of course, this example is
extreme, but libraries such as libc++ and boost are in this extreme case.
I compared those compile time with stb.h which is a so-called
“single-file library”. This library offer the option to declare or/and
to define the external symbols. “stb.h” always declares the symbols and
defines them if STB_DEFINE
is defined. In addition, it’s a file of 14.5k
lines of code. With the same conditions as “json.h” the first compilation
with STB_DEFINE
took 0.5 seconds and it stabilized at 0.23 seconds.
Without STB_DEFINE
, it compiled in 0.0938 seconds and stabilize at
0.0938 seconds. This header includes <intrin.h>, <stdlib.h>, <stdio.h>,
<string.h>, <time.h> and if STB_DEFINE
is defined it includes
<assert.h>, <stdarg.h>, <stddef.h>, <ctype.h>, <math.h>, <io.h>,
<direct.h>, <stdlib.h>, <string.h>, <malloc.h>, <io.h> and
<process.h> as well.
This problem could be fixed, but it would requires to split the headers in three different files, namely the declarations (.decl), the definition (.def) and the inlined functions (.inl). The “.inl” depends on the “.def” which depends on the “.decl”. In practice, “.def” doesn’t depend on “.decl”, because a definition is also a declaration in C and C++.
Let’s start by looking at the compilation time of SQLite, more precisely the compilation time of the sqlite-amalgamation version 3.27.1. This version concatenate all the code in a single file of 221 850 lines (137 320 lines of code). With clang version 6.0.0, the first compilation took 2.2969 seconds and then stabilized at ~2.1 seconds. I use this example to show that C and C++ compilers can process an important amount of code in a short amount of time. This example, is ideal for compile time, because there is no redundant work by the compiler. The reason for that, is the single translation unit. I emphasize the “single translation unit” over the “single source file”, because one could split the code in several source files that would then be compiled in the same translation unit. Indeed, nothing forbid you the include a source file and there is no reason to believe it’s bad. In fact, I will elaborate over why, when and how you may want to include different source files in the same translation unit.
Let’s remind ourself, that the goal of this article is to explore how we can improve our iterating time by improving the compilation speed of C and C++ projects. Although, the numbers I have computed don’t follow a very scientific approach, they are only their to give a rough estimate of the problem. Indeed, when a project of less than 250k lines takes more than a minute to compile, it’s obvious that the potential iterating time is vastly affected. Extrapolating from sqlite-amalgamation 3.27.1, you could compile 6.3 million lines (linear scaling) and 1.336 million lines (quadratic scaling) in 60 seconds.
Moreover, having different translation units will necessary impact the compile time of a “rebuild”. Indeed, every translation unit will need to go over all the included headers and interpret them in the context of the translation unit. In returns, having different translation unit allow more parallelism and partial recompilation. So, the first question to ask yourself is how mature your project is. This question can be answered by looking at how often a large portion of the project is recompiled. Typically, how often are there modifications to public headers.
Finally, to reduce the friction if you ever need to split a translation unit in different translation units I recommend to include the sources starting with the highest group and going down. e.g.
// header includes
#include <Group0/file1.h>
#include <Group0/file2.h>
#include <Group1/file3.h>
#include <Group1/file4.h>
#include <Group2/file5.h>
// source includes
#include <Group2/file5.c>
#include <Group1/file4.c>
#include <Group1/file3.c>
#include <Group0/file2.c>
#include <Group0/file1.c>
Theoretically speaking, the include order of the sources shouldn’t mater, because the sources should only rely on the api defined in the headers. Practically speaking, a source could rely on internal api defined in an other source file included before. But, this include order is useful, because it forces higher dependency group to only rely on the content of the headers. Moreover, tests can be done by removing source includes from top to down while ensuring that the code still compile at every step.
I will finish this section by adding that you don’t need to restrict yourself to a single amalgamation. Indeed, you can compile different amalgamation and this is particularly relevant when several project works together. Interestingly enough, that was the original purpose of a translation units. It was lost with time, especially when the Java OOP started to impose itself with the one-class-one-file principle.
In conclusion, I think one needs to be able to combined those different methods in a project if compile time is a problem. Forward declaring object is a very easy option for several structs or classes. A project that is very well separated can compile few translation units (e.g. network.cpp, graphic.cpp) that includes the relevant source files themselves. Finally, I really recommend trying to include files by dependency groups to expose the design flaws and potentially improves compile-time.