
Separate compilation of C++ programs

When we write a C/C++ program in one file, there are usually no problems. The problems begin at the moment when the source code needs to be split into several files. In this article, I will try to explain how to do it correctly.

Terms

A few words about the terms. The following are definitions of the terms as they are used in this article. In some cases, these definitions have a narrower meaning than the generally accepted ones. This is done intentionally, so as not to drown in details and unnecessary clarifications.

Source code is a program written in a programming language, in text format. The term also refers to a text file containing that source code.

A compiler is a program that compiles (surprising, isn't it?). At the moment, the most popular C/C++ compilers among beginners are GNU g++ (and its ports to various operating systems) and MS Visual Studio C++ in its various versions. For more information, see the Wikipedia articles: Compilers, C++ Compilers.

Compilation is the conversion of source code into an object module.

An object module is a binary file that contains specially prepared executable code that can be combined with other object files using a link editor (linker) to obtain a ready-made executable module or library.

A linker (link editor) is a program that performs linking: it takes one or more object modules as input and assembles an executable module from them.

An executable module (executable file) is a file that can be run by a processor under the control of an operating system.

A preprocessor is a text-processing program. It can be either a separate program or integrated into the compiler. In either case, the input and output of the preprocessor are in text format. The preprocessor transforms the text according to the preprocessor directives. If the text does not contain preprocessor directives, the text remains unchanged. For more information, see Wikipedia: Preprocessor and C Preprocessor.

An Integrated Development Environment (IDE) is a program (or a set of programs) designed to simplify writing source code, debugging, project management, and setting compiler, linker, and debugger parameters. It is important not to confuse the IDE and the compiler. As a rule, the compiler is self-sufficient. The compiler may not be included with the IDE. On the other hand, some IDEs can be used with different compilers.

A declaration is a description of a certain entity: a function signature, a type definition, a description of an external variable, a template, etc. The declaration notifies the compiler of the entity's existence and properties.

A definition is the implementation of a certain entity: a variable, a function, a class method, etc. When processing a definition, the compiler generates information for the object module: executable code, memory reservation for the variable, etc.
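The difference between the two terms can be sketched in a few lines. The names call_count, square and Accumulator are invented for this illustration and do not come from any real program; everything is shown in one file only so that the fragment compiles on its own.

```cpp
// Declarations: they only tell the compiler that these entities exist.
extern int call_count;          // variable declared, no storage reserved here
int square(int x);              // function prototype
class Accumulator;              // forward declaration (an incomplete type so far)

// Definitions: they make the compiler produce storage or code.
int call_count = 0;             // storage is reserved for the variable

int square(int x) {             // executable code is generated for the body
    ++call_count;
    return x * x;
}

class Accumulator {             // the class is now a complete type
public:
    void add(int v) { total_ += v; }
    int total() const { return total_; }
private:
    int total_ = 0;
};
```

A declaration may be repeated as many times as needed, while each definition must appear exactly once in the whole program; the rules for header files later in the article rest on this difference.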

From source code to executable module

The creation of an executable file has long been done in three stages: (1) processing the source code by the preprocessor, (2) compiling into object code, and (3) linking object modules, including modules from object libraries, into an executable file. This is the classic scheme for compiled languages. (Other schemes are already in use.)

Often the whole process of converting source code into an executable module is called compilation of the program, which is wrong. Note that IDEs call this process building a project.

IDEs typically hide the three separate stages of creating an executable. The stages become visible only when errors are detected at the preprocessing or linking stage.

So, let's say we have a C++ program "Hello, World!":

#include <iostream>

int main() {
    std::cout << "Hello, World!\n";
}

First, the source code is processed by the preprocessor. The preprocessor finds the #include <iostream> directive, looks for the iostream file, and replaces the directive with the text from that file, processing all preprocessor directives in the included text along the way.

The file specified in the #include directive is, in this case, a header file (or simply a "header"). This is a plain text file containing declarations (type declarations, function prototypes, templates, preprocessor directives, etc.). After the header file has been textually included in the text of the program (or module), everything described in that header file can be used in the text of the program.

The result of the preprocessor is then passed to the compiler. The compiler does all the rest of the work: from parsing and checking for errors to creating an object file (naturally, if there are syntax errors, no object file is created). An object file typically has an external reference table: a table that lists, among other things, the names of subroutines that are used in the object module but whose code is missing from it. These subroutines are external to the module.
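Besides #include, the preprocessor handles macro definitions and conditional directives in the same purely textual way. The fragment below is only a sketch: BUFFER_SIZE, size_class and buffer_length are names made up for the illustration.

```cpp
#define BUFFER_SIZE 64              // object-like macro: plain text substitution

#if BUFFER_SIZE > 32                // conditional directive: only one branch survives
const char* size_class() { return "large"; }
#else
const char* size_class() { return "small"; }
#endif

// After preprocessing, the next line reads: int buffer[64];
int buffer[BUFFER_SIZE];

// Number of elements in the array, computed from its size in bytes.
int buffer_length() {
    return static_cast<int>(sizeof(buffer) / sizeof(buffer[0]));
}
```

With g++, the text that the compiler proper receives can be inspected by stopping after the preprocessing stage with the -E option.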

Source code that can be compiled is called a compilation unit. Our program contains one compilation unit.

To get a working executable, the external references must be resolved. That is, the missing subroutines must be added to the executable module and all references to this code adjusted accordingly. That is what the linker does. It analyzes the external reference table of the object module, looks for the missing modules in the object libraries, copies them into the executable module, and adjusts the references. The executable is then ready.

A library (object library) is a set of compiled subroutines assembled into a single file of a certain structure. A library is attached at the stage of linking the executable file from object files (i.e. from the files obtained by compiling the source code of the program).

The necessary object libraries are included with the compiler. Any library package also includes a set of header files that contain the declarations required by the compiler.

If the source code of the program is divided into several files, the compilation and build process is similar. First, all compilation units are compiled individually, and then the linker assembles the resulting object modules (together with the libraries) into an executable file. This process is what is called separate compilation.

Splitting program text into modules

Splitting the source code of the program into several files becomes necessary for many reasons:

  1. A large amount of text is simply uncomfortable to work with.
  2. The program is divided into separate modules that solve specific subtasks.
  3. The program is divided into separate modules so that these modules can be reused in other programs.
  4. The interface is separated from the implementation.

I intentionally used the word "module" because a module can be either a class or a set of functions, depending on the programming technology used.

As soon as we decide to split the source code of the program into several files, two problems arise:

  1. It is necessary to switch from simple compilation of the program to separate compilation. To do this, you need to change either the sequence of actions when you build the application manually, or the command files or makefiles that automate the build process, or the IDE project.
  2. It is necessary to decide how to split the program text into separate files.

The first problem is purely technical. It is solved by reading the compiler and/or linker, make or IDE manuals. In the worst case, you just have to study all these manuals. Therefore, we will not dwell on solving this problem.

The second problem requires a much more creative approach. Still, there are certain recommendations here, and ignoring them leads either to the inability to build the project or to difficulties in its further development.

First, you need to determine which parts of the program to move into separate modules. To make this simple and natural, the program must be properly designed. How do you design a program correctly? Many large and sound books have been written on this topic. Be sure to look for and read books on programming methodology; this is very useful. As a brief recommendation, we can say: the entire program should consist of loosely coupled fragments. Each such fragment can then be naturally turned into a separate module (compilation unit). Note that a "fragment" does not mean an arbitrary piece of code, but a function, or a group of logically related functions, or a class, or several closely interacting classes.

Second, you need to define interfaces for the modules. There are quite clear rules here.

Interface and implementation

When a part of a program is moved into a module (a compilation unit), the rest of the program (or, to be precise, the compiler that will process the rest of the program) needs to be told somehow what is in that module. Header files are used for this purpose.

Thus, a module consists of two files: the header file (interface) and the implementation file.

A header file typically has an .h or .hpp extension, and an implementation file has .cpp for C++ programs and .c for C programs. (Although the STL includes files with no extension at all, they are essentially header files.)

The header file must contain all the declarations that need to be visible from the outside. Declarations that should not be visible from the outside are made in the implementation file.

What can be in the header file

Rule 1. The header file can only contain declarations. The header file must not contain definitions.

That is, when processing the contents of a header file, the compiler should not generate information for the object module.

The most common "exception" to this rule is defining a method inside a class declaration. According to the language standard, a method defined in a class declaration is treated as inline. Therefore, such a definition does not by itself generate executable code: the compiler generates the code only where this method is called.

The situation is similar with declaring class member variables: code will be generated when an instance of this class is created.
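As an illustration of Rule 1 and its "exception", here is a small sketch. The Temperature class and the file names in the comments are hypothetical; both parts are placed in one fragment only so that it compiles on its own.

```cpp
// --- would live in a hypothetical header, temperature.h ---
class Temperature {
public:
    explicit Temperature(double celsius) : celsius_(celsius) {}

    // Defined inside the class: implicitly inline, so the header can be
    // included in several compilation units without a multiple-definition error.
    double celsius() const { return celsius_; }

    // Declared only: the body belongs in the implementation file.
    double fahrenheit() const;

private:
    double celsius_;    // member variable: storage appears only when an object is created
};

// --- would live in temperature.cpp ---
double Temperature::fahrenheit() const {
    return celsius_ * 9.0 / 5.0 + 32.0;
}
```

If the body of fahrenheit() were placed in the header outside the class and without the inline specifier, every compilation unit that includes the header would get its own copy of the definition, and the linker would report a conflict.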

Rule 2. The header file must have a mechanism to protect against re-inclusion.

Protection against re-inclusion is implemented by preprocessor directives:

#ifndef SYMBOL
#define SYMBOL

// declarations

#endif

For the preprocessor, the first inclusion of the header file looks like this: since the condition "the symbol SYMBOL is not defined" (#ifndef SYMBOL) is true, define the symbol SYMBOL (#define SYMBOL) and process all lines up to the #endif directive. A repeated inclusion looks like this: since the condition "the symbol SYMBOL is not defined" (#ifndef SYMBOL) is false (the symbol was defined during the first inclusion), skip everything up to the #endif directive.

As a rule, the name of the header file itself, in uppercase and framed by single or double underscores, is used as SYMBOL. For example, for the header.h file, #define __HEADER_H__ is traditionally used. However, the symbol can be anything, as long as it is unique within the project. (Strictly speaking, the C++ standard reserves names that contain a double underscore or begin with an underscore followed by an uppercase letter, so a plain HEADER_H is the safer choice.)

Alternatively, the #pragma once directive can be used. However, the advantage of the first method is that it works with any compiler.
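The effect of the guard can be seen by pasting the contents of a header twice, which is exactly what two #include directives for the same file would do. In this sketch the header name point.h, the guard symbol POINT_H and the Point structure are all invented for the illustration.

```cpp
// First "inclusion" of the hypothetical point.h:
// POINT_H is not defined yet, so the body is processed.
#ifndef POINT_H
#define POINT_H

struct Point {
    int x;
    int y;
};

#endif // POINT_H

// Second "inclusion" of the same text:
// POINT_H is already defined, so everything up to #endif is skipped
// and the compiler never sees a second definition of Point.
#ifndef POINT_H
#define POINT_H

struct Point {
    int x;
    int y;
};

#endif // POINT_H
```

Without the #ifndef/#define/#endif lines the same text would fail to compile with a redefinition error for Point. A header that starts with #pragma once behaves similarly; that directive is not part of the language standard, although most current compilers accept it.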

The header file itself is not a compilation unit.

What can be in the implementation file

An implementation file can contain both definitions and declarations. Declarations made in an implementation file are lexically local to that file. That is, they are visible only within this compilation unit.

Rule 3. The implementation file must have a directive to include the appropriate header file.

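It is clear that declarations that are visible from outside the module should also be available inside it.

The rule also helps keep the description and the implementation in agreement. If there is a mismatch, for example between the function signatures in the declaration and the definition, the compiler can often report an error (for instance, when a class method is defined with a signature that was never declared in the class, or when a function differs only in its return type). For ordinary functions whose parameter lists differ, the compiler sees an overload instead, and the mismatch shows up later as a linker error.

Rule 4. The implementation file should not contain declarations that duplicate the declarations in the corresponding header file.

If you follow Rule 3, violating Rule 4 can result in compilation errors (for example, when a class is defined a second time). Even where a repeated declaration is allowed, as with a function prototype, the duplicate has to be kept in sync by hand.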

Case Study

Let's say we have the following program:

main.cpp

#include <iostream>

using namespace std;

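const int cint = 10;        // global constant

int global_var = 0;         // global variable used throughout the program

int module_var = 0;         // global variable used only by func1 and func2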

int func1() {
    ++global_var;
    return ++module_var;
}

int func2() {
    ++global_var;
    return --module_var;
}

class CClass {
public:
    CClass() : priv(cint) { ++counter; }
    ~CClass() { --counter; }
    void change(int arg);
    int get_priv() const;
    int get_counter() const;
private:
    int priv;
    static int counter;
};

int CClass::counter = 0;

void CClass::change(int arg) {
    priv += arg;
}

int CClass::get_priv() const {
    return priv;
}

int CClass::get_counter() const {
    return counter;
}

int main()
{
    int balance;
    balance = func1();
    balance = func2();
    cout << "balance: " << balance << " counter: " << global_var << endl;

    CClass c1, c2;
    if (c1.get_priv() == cint)
        cout << "Ok" << endl;
    cout << c2.get_counter() << endl;
    return 0;
}

This program is not a model to follow, because some things in it are wrong in principle; but, first, real situations vary, and second, this program is very well suited for demonstration.

So, what do we have?

  1. a global constant cint that is used in both the class and main;
  2. a global variable global_var that is used in the functions func1, func2, and main;
  3. a global variable module_var that is used only in the functions func1 and func2;
  4. the functions func1 and func2;
  5. the class CClass;
  6. the function main.

Three compilation units seem to emerge: (1) the function main, (2) the class CClass, and (3) the functions func1 and func2, together with the global variable module_var that is used only in them.

It's not entirely clear what to do with the global constant cint and the global variable global_var. The first gravitates toward the class CClass, the second toward the functions func1 and func2. However, suppose that you plan to use both this constant and this variable in some other, not yet written, modules of the program. Therefore, another compilation unit will be added.

Now let's try to divide the program into modules.

First, as the most widely shared entities (used in many places in the program), we move the global constant cint and the global variable global_var into a separate compilation unit.

globals.h

#ifndef __GLOBALS_H__
#define __GLOBALS_H__

const int cint = 10;            // global constant: defined right in the header
extern int global_var;          // declaration only; the definition is in globals.cpp

#endif // __GLOBALS_H__

globals.cpp

#include "globals.h"

int global_var = 0;         // definition of the global variable

Note that the global variable in the header file has the extern specifier. This results in declaring the variable, not defining it. Such a description means that somewhere there is a variable with this name and the specified type. The definition of this variable (with initialization) is placed in the implementation file. The constant is described in the header file.
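The same split between an extern declaration and a single definition, reduced to a minimum, might look like this. The names request_count and next_request_id and the file names in the comments are hypothetical, and both parts are combined into one fragment so that it compiles on its own.

```cpp
// --- would live in a hypothetical header, stats.h ---
extern int request_count;      // declaration only: no storage is reserved
int next_request_id();         // function prototype

// --- would live in stats.cpp ---
int request_count = 0;         // the single definition, with its initializer

int next_request_id() {
    return ++request_count;    // every caller works with the same variable
}
```

Any other compilation unit that includes the header can read and modify request_count, and the linker binds all these uses to the one definition in the implementation file.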

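There is one subtlety with constants in a header file. In C++, a constant declared at namespace scope has internal linkage by default, so each compilation unit that includes the header gets its own copy and the linker sees no conflict. If the constant is of a simple type such as int, it can therefore be defined right in the header file. Otherwise, it is usually better to define it in the implementation file and keep only its declaration in the header file (as with a variable). Exactly which types can be treated as "trivial" depends on the standard (see the description of the standard used to write the program).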
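Both ways of placing a constant can be sketched as follows. The names max_items and app_name and the file names in the comments are assumptions made for the illustration only.

```cpp
#include <string>

// --- would live in a hypothetical header, config.h ---
const int max_items = 10;              // const at namespace scope has internal linkage,
                                       // so every includer gets its own private copy
extern const std::string app_name;     // declaration only for the more complex constant

// --- would live in config.cpp ---
const std::string app_name = "demo";   // the single definition
```

With the extern form, the string object is constructed once for the whole program instead of once per compilation unit that includes the header.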

Also note (1) the protection against re-including the header file and (2) the inclusion of the header file in the implementation file.

Then we put the functions func1 and func2, together with the global variable module_var, in a separate module. We get two more files:

funcs.h

#ifndef __FUNCS_H__
#define __FUNCS_H__

int func1();
int func2();

#endif // __FUNCS_H__

funcs.cpp

#include "funcs.h"
#include "globals.h"

int module_var = 0;         // used only by func1 and func2

int func1() {
    ++global_var;
    return ++module_var;
}

int func2() {
    ++global_var;
    return --module_var;
}

Because the variable module_var is used only by these two functions, there is no declaration for it in the header file. Only two functions are exported from this module.

The functions use a variable from another module, so you must add #include "globals.h".

Finally, we put the class CClass in a separate module:
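One caveat: leaving a variable out of the header hides it from the compiler in other files, but not from the linker, because a global variable defined this way still has external linkage. Another file could reach it with its own extern declaration, or clash with it by defining a variable of the same name. A possible variant of funcs.cpp, given here only as a sketch and not as part of the original example, puts the variable into an unnamed namespace. The first two lines stand in for globals.h and globals.cpp so that the fragment compiles on its own.

```cpp
// --- stands in for globals.h / globals.cpp from the case study ---
extern int global_var;
int global_var = 0;

// --- funcs.cpp, variant with module_var hidden from other modules ---
namespace {
    int module_var = 0;     // internal linkage: not visible to the linker outside this file
}

int func1() {
    ++global_var;
    return ++module_var;
}

int func2() {
    ++global_var;
    return --module_var;
}
```

Declaring the variable as static int module_var = 0; gives the same internal linkage; the unnamed namespace is simply the form more often seen in C++ code.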

CClass.h

#ifndef __CCLASS_H__
#define __CCLASS_H__

class CClass {
public:
    CClass();
    ~CClass();
    void change(int arg);
    int get_priv() const;
    int get_counter() const;
private:
    int priv;
    static int counter;
};

#endif // __CCLASS_H__

CClass.cpp

#include "CClass.h"
#include "globals.h"

int CClass::counter = 0;

CClass::CClass() : priv(cint) {
    ++counter;
}

CClass::~CClass() {
    --counter;
}

void CClass::change(int arg) {
    priv += arg;
}

int CClass::get_priv() const {
    return priv;
}

int CClass::get_counter() const {
    return counter;
}

Note the following:

(1) The definitions of the bodies of functions (methods) have been removed from the class declaration. This is done for reasons of principle: the interface and the implementation must be separated (to be able to change the implementation without changing the interface). If later there is a need to make some methods inline, this can always be done with the help of the inline specifier.

(2) The class has a static member, counter. That is, this variable is shared by all instances of the class. Its initialization is performed not in the constructor, but at the global scope of the module.

(3) The #include "globals.h" directive has been added to the implementation file to access the constant cint.

Classes are almost always placed in separate compilation units.

In the main.cpp file, leave only the main function, and add the necessary directives to include the header files.

main.cpp

#include <iostream>
#include "funcs.h"
#include "CClass.h"
#include "globals.h"

using namespace std;

int main()
{
    int balance;
    balance = func1();
    balance = func2();
    cout << "balance: " << balance << " counter: " << global_var << endl;

    CClass c1, c2;
    if (c1.get_priv() == cint)
        cout << "Ok" << endl;
    cout << c2.get_counter() << endl;
    return 0;
}

The last step: you need to change the "project" of building the program so that it reflects the changed structure of the source code files. The details of this step depend on the program building technology used and the software used. But in any case, four compilation units (four .cpp files) must first be compiled, and then the resulting object files must be processed by the linker to obtain an executable file.

Common mistakes

Error 1. Definition in the header file.

This error may not show itself in some cases, for example when the header file with this error is included only once. But as soon as this header file is included more than once, we get either a compilation error "multiple definition of symbol ...", or a linker error of similar content if the second inclusion was made in another compilation unit.

Error 2. No protection against re-inclusion of the header file.

It also shows itself only under certain circumstances. It may cause a compilation error "multiple definition of symbol ...".

Error 3. A mismatch between the declaration in the header file and the definition in the implementation file.

It usually occurs during the process of editing the source code, when changes are made to the implementation file, and the header file is forgotten.

Error 4. A missing #include directive.

If the required header file is not included, then all entities that are declared in it will remain unknown to the compiler. This causes a compilation error of the form "symbol not defined ...".

Error 5. Absence of the required module in the project of building the program.

Causes a linker error of the form "symbol not defined ...". Note that the symbol name in the linker message is almost always different from the one defined in the program: it is supplemented with other letters, numbers, or characters (this is known as name mangling).

Error 6. Dependency on the order in which header files are included.

Not exactly a mistake, but such situations should be avoided. It usually signals either errors in the design of the program or errors in the division of the source code into modules.
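One common way to avoid this dependency is to make every header self-sufficient: the header itself includes everything that its declarations rely on. A minimal sketch follows; the User structure, the describe function and the file names in the comments are hypothetical.

```cpp
// --- would live in a hypothetical header, user.h ---
#ifndef USER_H
#define USER_H

#include <string>   // the header includes what it uses, so it can be included in any order

struct User {
    std::string name;
    int id;
};

std::string describe(const User& user);

#endif // USER_H

// --- would live in user.cpp (which would start with #include "user.h") ---
std::string describe(const User& user) {
    return user.name + "#" + std::to_string(user.id);
}
```

If user.h relied on some other file having included <string> earlier, it would compile in some files and fail in others, depending only on the order of the #include directives.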

Conclusion

 

Within the scope of a short article, it is impossible to consider all the cases that arise during separate compilation. There are situations when dividing a program or a large module into smaller ones seems impossible. This usually happens when the program is poorly designed (in this case, parts of the code are strongly interdependent). Of course, you can make an extra effort and still divide the code into modules (or leave it as is), but this mental energy is better spent more efficiently: on changing the structure of the program. This will pay much greater dividends in the future than a brute-force solution.