It was 3:11 AM.

I had two terminals open. Same codebase. Same algorithm. Same input.

Left side: the old version.
Right side: the "optimized" version.

I hit run.

The right side finished first. Not slightly faster. Not "maybe my CPU is warming up" faster.

Noticeably faster.

And here's the part that messed with my head:

I didn't change the algorithm.

Not a single line of logic.

Just the infrastructure around it.

That's when it clicked: most performance problems aren't in your algorithm. They're in your habits.

1. std::ios::sync_with_stdio(false) — Your I/O Is Slower Than You Think

I once had a competitive-style program timing out for no reason.

Turns out… it wasn't the logic.

#include <iostream>

int main() {
    std::ios::sync_with_stdio(false);
    std::cin.tie(nullptr);

    int x;
    while (std::cin >> x) {
        std::cout << x * 2 << "\n";
    }
}

What changed:

  • Disabled sync with C I/O
  • Untied cin from cout

Result: Massive I/O speed boost.

One caveat: once sync is off, don't mix printf/scanf with cin/cout in the same program, because their output ordering is no longer guaranteed.

If your program reads a lot… this is non-negotiable.

2. std::move — Stop Copying Without Realizing It

I used to think copies were cheap.

They're not.

#include <utility>
#include <vector>

int main() {
    std::vector<int> v = {1, 2, 3, 4};
    std::vector<int> w = std::move(v);  // transfers v's buffer, no copy
    // v is now valid but unspecified; don't read from it
}

Why it matters:

  • Transfers ownership instead of copying
  • Eliminates a hidden allocation and element-by-element copy

One trap: don't write return std::move(local); from a function. That's a pessimization, because it blocks the copy elision (NRVO) the compiler would have done anyway.

Pro tip: If you see unnecessary copies, you're leaving performance on the table.

3. tsl::robin_map — Hash Maps That Respect Your Cache

I swapped one line. That's it.

#include <string>
#include <tsl/robin_map.h>

int main() {
    tsl::robin_map<std::string, int> freq;
    freq["cpu"]++;
}

Why it's faster:

  • Cache-friendly layout
  • Fewer pointer jumps

Node-based maps like std::map and std::unordered_map scatter nodes across the heap. This one stays tight.

4. folly::small_vector — Avoid Heap Allocations Entirely

Most vectors I used were small.

Yet they still hit the heap.

#include <folly/container/small_vector.h>

folly::small_vector<int, 4> v = {1,2,3};

What this does:

  • Stores small data on the stack
  • Avoids heap allocation completely

Reality check: Heap = slow. Stack = fast.

5. boost::container::flat_map — Sorted Vector Disguised as a Map

This one surprised me.

#include <boost/container/flat_map.hpp>

int main() {
    boost::container::flat_map<int, int> m;
    m[3] = 30;
}

Why it wins:

  • Contiguous memory
  • Better cache locality

For small and medium datasets, lookups easily beat tree-based maps. The trade-off: inserting in the middle is O(n), so build mostly up front.

6. std::bitset — Faster Than You Think

I replaced boolean arrays with bitsets.

Didn't expect much.

Got a speedup.

#include <bitset>

int main() {
    std::bitset<1000> flags;
    flags.set(10);  // mark flag 10
}

Why:

  • Compact memory (8 flags per byte)
  • Bit-level operations
  • Cache efficient

Sometimes performance is about packing data tighter.

7. __builtin_expect — Help the CPU Guess Better

Branch prediction matters more than you think.

int handle(int x) {
    if (__builtin_expect(x == 0, 0)) {
        // rare case (GCC/Clang builtin)
        return -1;
    }
    return x * 2;  // hot path
}

What this does:

  • Tells the compiler which branch is likely
  • Reduces the misprediction penalty on the hot path

This is a micro-optimization, but in tight loops it adds up.

8. tbb::parallel_for — Real Parallelism Without Headaches

I avoided multithreading for too long.

Then I tried this:

#include <tbb/parallel_for.h>
#include <vector>

int main() {
    std::vector<int> v(1000);

    tbb::parallel_for(0, 1000, [&](int i) {
        v[i] = i * i;
    });
}

Why it's powerful:

  • Automatic thread scaling
  • No manual thread management

Bold take: If you're not using parallelism in 2026, you're underusing your CPU.

9. boost::pool — Memory Allocation Without the Pain

Frequent allocations were killing performance.

Solution:

#include <boost/pool/object_pool.hpp>

int main() {
    boost::object_pool<int> pool;

    int* x = pool.construct(10);  // allocated from the pool, not the heap
    pool.destroy(x);              // block goes back to the pool for reuse
}

Why it works:

  • Reuses memory blocks
  • Avoids fragmentation

Allocators are invisible until they dominate runtime.

10. std::atomic — Lock-Free Where It Matters

Locks are slow. Contention is worse.

#include <atomic>

int main() {
    std::atomic<int> counter{0};

    counter.fetch_add(1, std::memory_order_relaxed);
}

Why this matters:

  • No mutex overhead
  • Faster concurrency

Use it wisely. Not everywhere. Just where it counts.

11. Precompiled Headers — Compile Time Affects Runtime More Than You Think

This one's controversial.

But hear me out.

// pch.h
#include <vector>
#include <string>
#include <map>

Then:

g++ -x c++-header pch.h   # emits pch.h.gch, picked up automatically

Why it matters:

  • Faster builds = faster iteration
  • Better optimization cycles

Quote: Speed isn't just execution. It's the feedback loop.

Final Thought

I used to think performance was about being "smart."

It's not.

It's about being aware.

Aware of:

  • what your code allocates
  • how your data moves
  • what your CPU guesses

Once you see it…

You start writing code that doesn't just work.

It flows.

And that's when your programs stop feeling slow even before you measure them.