Understanding Branch Prediction in C++: likely and unlikely

When writing performance-critical applications, understanding and optimizing for branch prediction can lead to noticeable improvements. In this post, we'll explore how branch prediction works and demonstrate its impact in C++ using the likely and unlikely macros.


What is Branch Prediction?

Modern CPUs process instructions using pipelines. When a conditional branch (like an if statement) is encountered, the CPU predicts the outcome and speculatively starts executing the instructions on the predicted path before the condition has actually been resolved.

Why Does Branch Prediction Matter?

  • If the CPU predicts correctly, the program continues smoothly.

  • If it mispredicts, the pipeline is flushed, and execution restarts at the correct branch. This "branch misprediction" can cost 10-20 cycles or more, depending on the CPU architecture.

Branch prediction works well when branch outcomes follow a pattern, but it can fail for unpredictable branches, leading to performance penalties.
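
To make the difference between patterned and unpredictable branches concrete, here is a minimal sketch rather than a rigorous benchmark. The names count_at_least, make_data, and sorted_copy are hypothetical helpers invented for this illustration. The same loop runs over the same values in two orders: shuffled, where the branch outcome is effectively random, and sorted, where the outcome changes only once.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Counts elements >= threshold using one explicit branch per element.
std::size_t count_at_least(const std::vector<int>& data, int threshold) {
    std::size_t count = 0;
    for (int value : data) {
        if (value >= threshold) {
            ++count;
        }
    }
    return count;
}

// Builds n pseudo-random values in [0, 255] from a fixed seed
// (unpredictable branch outcomes when scanned in this order).
std::vector<int> make_data(std::size_t n) {
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> dist(0, 255);
    std::vector<int> data(n);
    for (int& value : data) {
        value = dist(rng);
    }
    return data;
}

// Same values in ascending order: the branch is false for a long run,
// then true for the rest, which is easy to predict.
std::vector<int> sorted_copy(std::vector<int> data) {
    std::sort(data.begin(), data.end());
    return data;
}
```

Both orders return the same count; only the branch pattern differs. On many CPUs the sorted pass is faster when the loop is compiled with a real branch, although an optimizing compiler may emit branch-free code and erase the gap.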


Introducing likely and unlikely

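GCC and Clang provide a way to hint the compiler about the expected outcome of a branch through the __builtin_expect builtin, which is usually wrapped in a pair of macros. These macros are a compiler extension rather than part of standard C++; C++20 added the standard [[likely]] and [[unlikely]] attributes for the same purpose. The macros look like this: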

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

How Do They Work?

  • likely(x): Tells the compiler that x is most likely to evaluate to true.

  • unlikely(x): Tells the compiler that x is most likely to evaluate to false.

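These hints allow the compiler to optimize the generated assembly code, typically by laying out the likely case as the straight-line (fall-through) path and moving the unlikely case out of the way. That improves instruction-cache use and favors the common case. On most modern CPUs the hints do not directly control the hardware branch predictor, which still learns from run-time behavior.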
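
For comparison, the same kind of hint can be written with the standard C++20 attributes instead of the macros. The sketch below assumes a C++20 compiler; sum_non_negative is a hypothetical function made up for this example, and the assumption that negative values are rare is part of the example, not a general rule.

```cpp
#include <cstddef>

// Sums the non-negative entries of values[0..n).
// Negative entries are assumed to be rare here and are skipped.
long sum_non_negative(const int* values, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (values[i] < 0) [[unlikely]] {
            continue;  // rare path: skip the negative entry
        }
        total += values[i];
    }
    return total;
}
```

The attribute attaches to the statement that follows the condition, so no macro definitions are needed, and the code stays portable across compilers that implement C++20.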


Example Program: Demonstrating likely and unlikely

Below is a C++ program that compares the performance of code with and without branch prediction hints.

Code: Likely vs. Unlikely

#include <iostream>
#include <chrono>
#include <cstdlib>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

const int NUM_ITERATIONS = 100000000; // Large number for measurable performance.

int branch_without_hints() {
    int count = 0;
    for (int i = 0; i < NUM_ITERATIONS; ++i) {
        if (rand() % 100 < 95) { // 95% chance of true
            count += 1;
        } else {
            count -= 1;
        }
    }
    return count;
}

int branch_with_hints() {
    int count = 0;
    for (int i = 0; i < NUM_ITERATIONS; ++i) {
        if (likely(rand() % 100 < 95)) { // Hint: this branch is likely.
            count += 1;
        } else {
            count -= 1;
        }
    }
    return count;
}

int main() {
    srand(42); // Fixed seed for reproducibility.

    // Measure performance without hints.
    auto start_without = std::chrono::high_resolution_clock::now();
    int result_without = branch_without_hints();
    auto end_without = std::chrono::high_resolution_clock::now();
    double duration_without = std::chrono::duration<double>(end_without - start_without).count();

    // Re-seed so both functions see the same random sequence.
    srand(42);

    // Measure performance with hints.
    auto start_with = std::chrono::high_resolution_clock::now();
    int result_with = branch_with_hints();
    auto end_with = std::chrono::high_resolution_clock::now();
    double duration_with = std::chrono::duration<double>(end_with - start_with).count();

    // Output results.
    std::cout << "Result without hints: " << result_without << "\n";
    std::cout << "Time without hints: " << duration_without << " seconds\n";
    std::cout << "Result with hints: " << result_with << "\n";
    std::cout << "Time with hints: " << duration_with << " seconds\n";

    return 0;
}

How It Works

Branch Without Hints

if (rand() % 100 < 95)

  • This condition checks whether a random value is less than 95, which happens 95% of the time.

  • The compiler has no information about how likely this condition is, so it lays out the code using its own heuristics. At run time, the CPU's branch predictor learns the pattern in either version.

Branch With Hints

if (likely(rand() % 100 < 95))

  • This adds a hint, telling the compiler that the condition is expected to be true most of the time.

  • The compiler may rearrange the generated code so that the common case (true) becomes the fall-through path.


Probability of the Condition

  • rand() % 100 generates a random number in the range [0, 99].

  • The condition rand() % 100 < 95 is true for numbers {0, 1, ..., 94} (95 values).

  • Probability:

    • True: 95%

    • False: 5%
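
These probabilities also determine where the counter should end up, since every true outcome adds 1 and every false outcome subtracts 1. The small sketch below works through that arithmetic; expected_count is a hypothetical helper written for this post, and it gives the statistical expectation, not the exact value a particular run will print.

```cpp
// Expected final count when each true outcome adds 1 and each false
// outcome subtracts 1. true_values is how many of the `range` equally
// likely remainders satisfy the condition.
constexpr long long expected_count(long long iterations, int true_values, int range) {
    const int false_values = range - true_values;
    return iterations * (true_values - false_values) / range;
}

// 100 million iterations, 95 of 100 remainders are "true":
// 95 million increments minus 5 million decrements.
static_assert(expected_count(100000000, 95, 100) == 90000000,
              "expected count for a 95% / 5% split");
```

An actual run will land close to this figure rather than exactly on it, because rand() only approximates the 95% / 5% split.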


Results

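When you compile the program with GCC or Clang (it relies on __builtin_expect) and run it, you'll see output along these lines. Exact numbers depend on your machine, compiler, and optimization level: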

Example Output

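Result without hints: ~90000000
Time without hints: 2.15 seconds
Result with hints: ~90000000
Time with hints: 1.95 seconds

  • Results: Both functions execute the same logic, so both counts land near 90000000 (about 95 million increments minus about 5 million decrements). The exact values depend on the random sequence.

  • Execution Time: In this sample run, the version with likely finished slightly faster. Treat a gap like this with caution: the hint mainly changes how the compiler lays out the code, the cost of calling rand() is included in every iteration, and run-to-run noise can be of a similar size. Repeat the measurement several times before drawing conclusions.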
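
One way to reduce the noise in a measurement like this is to generate the random inputs before starting the clock, so the timed region contains only the branch. The following is a sketch under that assumption, not a replacement for a proper benchmarking library. The names make_inputs, count_plain, count_hinted, and time_seconds are hypothetical, and the macro requires GCC or Clang.

```cpp
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

#define likely(x) __builtin_expect(!!(x), 1)  // GCC/Clang builtin

// Pre-generates n values in [0, 99] so that random-number generation
// stays outside the timed loop.
std::vector<int> make_inputs(std::size_t n, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> dist(0, 99);
    std::vector<int> inputs(n);
    for (int& v : inputs) {
        v = dist(rng);
    }
    return inputs;
}

// Same logic as the post's loop, reading precomputed values.
int count_plain(const std::vector<int>& inputs) {
    int count = 0;
    for (int v : inputs) {
        if (v < 95) { count += 1; } else { count -= 1; }
    }
    return count;
}

// Identical logic with the likely hint on the condition.
int count_hinted(const std::vector<int>& inputs) {
    int count = 0;
    for (int v : inputs) {
        if (likely(v < 95)) { count += 1; } else { count -= 1; }
    }
    return count;
}

// Runs fn(inputs) once, stores its return value in result, and
// returns the elapsed wall-clock time in seconds.
template <typename Fn>
double time_seconds(Fn fn, const std::vector<int>& inputs, int& result) {
    const auto start = std::chrono::steady_clock::now();
    result = fn(inputs);
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}
```

Calling time_seconds(count_plain, inputs, r1) and time_seconds(count_hinted, inputs, r2) on the same inputs should yield identical results in r1 and r2, so any timing gap comes from the generated code rather than from different random draws. With such a simple loop body, an optimizing compiler may produce the same branch-free code for both versions, in which case no difference is expected.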


Why Does It Work?

Branch Prediction in CPUs

  • CPUs guess the outcome of branches based on patterns observed at run time. If the prediction is correct, execution proceeds smoothly. If not, the speculatively executed work is discarded and the pipeline is flushed, causing a delay.

  • The penalty for a misprediction can be significant, especially in performance-critical loops.

Impact of Hints

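  • The likely macro tells the compiler which outcome to optimize for, so it can place the common case on the straight-line path and keep the rare case out of the way. The CPU's branch predictor still makes its own predictions at run time; the hint improves the code it runs through in the common case.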

When Should You Use likely and unlikely?

  • Use Cases:

    • Error Handling: Rare error cases can be marked as unlikely.

    • Hot Paths: Frequently executed code paths can use likely.

    • Performance-Critical Code: Systems like real-time applications, networking, or databases where branch misprediction penalties matter.

  • Avoid Overuse: Misusing these hints in code where branch probabilities are uncertain can degrade performance or make the code harder to maintain.
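
As an illustration of the error-handling use case, here is a small sketch that marks a validation failure as unlikely. The function sum_digits is a hypothetical example, and the assumption that invalid characters are rare is part of the example. The macro falls back to a plain expression on compilers without __builtin_expect.

```cpp
#include <cstddef>
#include <stdexcept>

#if defined(__GNUC__) || defined(__clang__)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define unlikely(x) (x)  // no-op fallback for other compilers
#endif

// Sums the decimal digits in text[0..length).
// Any non-digit character is treated as a rare error and throws.
int sum_digits(const char* text, std::size_t length) {
    int total = 0;
    for (std::size_t i = 0; i < length; ++i) {
        const char c = text[i];
        if (unlikely(c < '0' || c > '9')) {
            throw std::invalid_argument("non-digit character");
        }
        total += c - '0';
    }
    return total;
}
```

For example, sum_digits("12345", 5) returns 15, while an input containing a letter takes the rare path and throws. The hint does not change the result in either case; it only tells the compiler which path to favor.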


Key Takeaways

  1. Branch prediction hints (likely and unlikely) help the compiler optimize code for better performance.

  2. They are most effective in performance-critical applications with predictable branch behavior.

  3. Use them sparingly and only when you're certain about the likelihood of branch conditions.


Try the program yourself and see how likely and unlikely affect performance on your machine!