Last Updated:

Code optimization through manual timing

Sometimes you need to compare the execution speed of several algorithms. Code profilers are often used for this purpose. But sometimes it happens that the profiler is not at hand, or working with it is difficult, and you just need to compare the speed of code fragment A with the speed of code fragment B in a small test program.

Program Template

If you find yourself in such a situation, then here is a tip like "hostess note".

#include <iostream>
#include <ctime>


int main() {

    clock_t the_time;
    double elapsed_time;

    the_time = clock();


    elapsed_time = double(clock() - the_time) / CLOCKS_PER_SEC;
    std::cout << "Elapsed time " << elapsed_time << " sec." << std::endl;

    return 0;
}

This "fish" allows you to find out the execution time of the code fragment.

Including a header is necessary to display messages in the standard output (i.e. on the screen).<iostream>

Enabling the header allows you to use the function and related types and constants. Which is what we actually need for timing.<ctime>clock()

The only function we'll use for timing has the following signature:

clock_t clock( void );

The function returns the time elapsed since the process started (that is, actually since the application was launched) in the ticks of the timer. Return type is usually an alias of the type (defined in ). If the function cannot get the system time from the start of the process, the function returns a value of -1, which is converted to . (I've never encountered such an error, so I'll omit the return check to -1.)

clock_tlong<ctime>clock_t

To translate timer ticks into seconds, there is a constant , also defined in .

CLOCKS_PER_SEC<ctime>

The algorithm of work is transparent: before executing the timed code, we detect the time, after the execution of the timed code, we once again detect the time, their difference will give us the desired time for which the code was executed. It remains only to translate it from ticks to seconds.

In the above template of the program I would like to draw attention to the last step. Since it is an integer type and is an integer constant, at least one of the parts of the expression must be converted to a floating-point type to obtain fractional parts of seconds.

clock_tCLOCKS_PER_SEC

A few tips for performing timing

Between the first and second calls should not be:

clock()

1) I/O operations. Unless, of course, you want to test the performance of your hard drive or console output subsystem.

2) Operations requesting data from the user (for example, from the keyboard).

It may turn out that the processor used in your computer is too fast to time a small piece of code. In this case, the difference in values between calls will be 0.

clock()

The solution, as usual, is simple: enclose the timed fragment in a loop and spin it there 1'000 or 1'000'000 times. In this case, an additional restriction is imposed:

3) The timed code should have no side effects. For example, changing the values of variables external to this code snippet, or taking up memory from a heap.

If there is a side effect, then before the next iteration it must be compensated in some way. This, of course, affects the accuracy of the timing, but usually only slightly. And if the two algorithms being compared have the same side effects, this error can be neglected altogether.

Example

The example below compares two algorithms that perform the same task: generating squares of numbers from 0 to 999 and looking for a number among these values. In the first case, a container is used to store values, in the second case, a container. (The example is purely demonstrative. It is clear that in a real program no one will write such nonsense!)

vectormap

Please note that

  • The timed code is too simple for my CPU, so it uses a loop.

  • timed code has a side effect that is compensated by calling a method on the container.

clear()


For the result of calling the second function, a variable is organized, which, after timing, is displayed on the screen. Made specifically so that the smart optimizing compiler, having detected that the result of the function call is not used, did not optimize the call of the function to hell.

#include <iostream>
#include <ctime>


#include <vector>
#include <map>

typedef std::vector<long> vecl;
typedef std::map<int, long> mapil;

const int MAX_ELEM = 1000;       
const long PATTERN = 999 * 999; 
const int MAX_TIMES = 1000;      

void generator_vector(vecl& vec) {
    for (int i = 0; i < MAX_ELEM; i++) {
        vec.push_back(i * i);
    }
}

bool find_vector(const vecl& vec, long num) {
    for (auto it = vec.begin(); it != vec.end(); it++) {
        if (*it == num)
            return true;
    }
    return false;
}

void generator_map(mapil& mp) {
    for (int i = 0; i < MAX_ELEM; i++) {
        mp[i] = i * i;
    }
}

bool find_map(const mapil& mp, long num) {
    for (auto it = mp.begin(); it != mp.end(); it++) {
        if (it->second == num)
            return true;
    }
    return false;
}

int main() {

    vecl v;
    mapil m;

    clock_t the_time;
    bool res;
    double elapsed_time;

    std::cout << "Vector version" << std::endl;

    the_time = clock();
    for (int t = 0; t < MAX_TIMES; t++) {
        v.clear();
        generator_vector(v);
        res = find_vector(v, PATTERN);
    }

    elapsed_time = double(clock() - the_time) / CLOCKS_PER_SEC;
    std::cout << "Elapsed time " << elapsed_time << " sec." << std::endl;
    std::cout << "result =  " << res << "\n" << std::endl;


    std::cout << "Map version" << std::endl;

    the_time = clock();

    for (int t = 0; t < MAX_TIMES; t++) {
        m.clear();
        generator_map(m);
        res = find_map(m, PATTERN);
    }

    elapsed_time = double(clock() - the_time) / CLOCKS_PER_SEC;
    std::cout << "Elapsed time " << elapsed_time << " sec." << std::endl;
    std::cout << "result =  " << res << "\n" << std::endl;


    return 0;
}

Conclusion

The above method of testing the execution speed of a code snippet is suitable only in the simplest cases. For more advanced cases, you must use the profiler.

I hope that someone will find the article useful.