It was 3:11 AM.
I had two terminals open. Same codebase. Same algorithm. Same input.
Left side: the old version.
Right side: the "optimized" version.
I hit run.
The right side finished first. Not slightly faster. Not "maybe my CPU is warming up" faster.
Noticeably faster.
And here's the part that messed with my head:
I didn't change the algorithm.
Not a single line of logic.
Just the infrastructure around it.
That's when it clicked: most performance problems aren't in your algorithm; they're in your habits.
1. std::ios::sync_with_stdio(false) — Your I/O Is Slower Than You Think
I once had a competitive-style program timing out for no reason.
Turns out… it wasn't the logic.
#include <iostream>
int main() {
std::ios::sync_with_stdio(false);
std::cin.tie(nullptr);
int x;
while (std::cin >> x) {
std::cout << x * 2 << "\n";
}
}
What changed:
- Disabled sync with C I/O
- Untied cin from cout
Result: Massive I/O speed boost.
If your program reads a lot… this is non-negotiable. (One caveat: once sync is off, don't mix printf/scanf with cin/cout on the same streams; the synchronization you just turned off is what made that safe.)
2. std::move — Stop Copying Without Realizing It
I used to think copies were cheap.
They're not.
#include <vector>

void demo() {
    std::vector<int> v = {1, 2, 3, 4};
    std::vector<int> copied = v;            // copies every element, allocates again
    std::vector<int> moved = std::move(v);  // steals v's buffer; no element copies
}
Why it matters:
- Transfers ownership instead of copying
- Eliminates hidden allocations
Pro tip: if you spot unnecessary copies, you're leaving performance on the table. One caveat: don't write return std::move(local); returning a local by value already moves (or elides the copy entirely), and the extra std::move can block that elision.
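Where the move really earns its keep (my example, not the author's) is handing a named object to something that stores it:
#include <string>
#include <vector>

std::vector<std::string> lines;

void store(std::string line) {
    // Taking the parameter by value and then moving it means the
    // character buffer is handed along instead of being duplicated.
    lines.push_back(std::move(line));
}
Callers that pass a temporary pay for one move; callers that still need their string can copy explicitly. Either way, nothing gets duplicated behind your back.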
3. tsl::robin_map — Hash Maps That Respect Your Cache
I swapped one line. That's it.
#include <tsl/robin_map.h>
#include <string>
tsl::robin_map<std::string, int> freq;
void count_token() {
    freq["cpu"]++;
}
Why it's faster:
- Cache-friendly layout
- Fewer pointer jumps
Regular maps scatter memory. This one stays tight.
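A slightly fuller sketch (my toy example, not the benchmark above); the container is designed as a drop-in for std::unordered_map, so the code reads the same:
#include <tsl/robin_map.h>
#include <iostream>
#include <string>

int main() {
    tsl::robin_map<std::string, int> freq;
    std::string word;
    while (std::cin >> word) {
        ++freq[word];                     // open addressing: entries sit in one flat array
    }
    for (const auto& entry : freq) {
        std::cout << entry.first << ' ' << entry.second << '\n';
    }
}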
4. folly::small_vector — Avoid Heap Allocations Entirely
Most vectors I used were small.
Yet they still hit the heap.
#include <folly/container/small_vector.h>
folly::small_vector<int, 4> v = {1, 2, 3};
What this does:
- Stores small data on the stack
- Avoids heap allocation completely
Reality check: every heap allocation is an allocator call now and a likely cache miss later. Inline stack storage costs neither.
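Here's the kind of place it shines (a sketch of mine, reusing the include path from the snippet above, which may differ between Folly versions): a helper that usually returns only a handful of results.
#include <folly/container/small_vector.h>

// Hypothetical helper, not from the article: collect indices of matches.
// With inline capacity 8, the common case never touches the heap.
folly::small_vector<int, 8> find_hits(const int* data, int n, int needle) {
    folly::small_vector<int, 8> hits;
    for (int i = 0; i < n; ++i) {
        if (data[i] == needle) {
            hits.push_back(i);  // spills to the heap only past 8 elements
        }
    }
    return hits;
}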
5. boost::container::flat_map — Sorted Vector Disguised as a Map
This one surprised me.
#include <boost/container/flat_map.hpp>
boost::container::flat_map<int, int> m;
void fill() {
    m[3] = 30;
}
Why it wins:
- Contiguous memory
- Better cache locality
For small and medium lookup-heavy datasets, this beats tree-based maps easily. The trade-off: inserting or erasing in the middle shifts elements, so build it once and query it often.
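A small end-to-end sketch (mine) of the pattern that suits it: build once, reserve up front, then iterate and look up a lot.
#include <boost/container/flat_map.hpp>
#include <cstdio>

int main() {
    boost::container::flat_map<int, int> m;
    m.reserve(3);        // it's a sorted vector underneath, so you can reserve
    m[3] = 30;
    m[1] = 10;
    m[2] = 20;
    for (const auto& kv : m) {
        // iterates in key order over one contiguous block of memory
        std::printf("%d -> %d\n", kv.first, kv.second);
    }
}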
6. std::bitset — Faster Than You Think
I replaced boolean arrays with bitsets.
Didn't expect much.
Got a speedup.
#include <bitset>
std::bitset<1000> flags;
void mark() {
    flags.set(10);
}
Why:
- Compact memory
- Bit-level operations
- Cache efficient
Sometimes performance is about packing data tighter.
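For a concrete feel (my example, not the original one), here's the classic place a bitset replaces a bool array: a small prime sieve where every flag is one bit instead of one byte.
#include <bitset>
#include <cstddef>

std::bitset<1000> composite;  // one bit per number instead of one bool per number

void sieve() {
    for (std::size_t p = 2; p < composite.size(); ++p) {
        if (composite.test(p)) continue;           // p is prime
        for (std::size_t m = p * 2; m < composite.size(); m += p) {
            composite.set(m);                      // mark multiples of p as composite
        }
    }
}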
7. __builtin_expect — Help the CPU Guess Better
Branch prediction matters more than you think.
if (__builtin_expect(x == 0, 0)) {
// rare case
}
What this does:
- Tells compiler which branch is likely
- Reduces misprediction penalty
This is a micro-optimization, but in tight loops it adds up.
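Here's the shape it usually takes in my code (GCC/Clang only; the macro name is mine): wrap it once, then annotate the rare branch in a hot loop.
#include <cstdio>

#define UNLIKELY(x) __builtin_expect(!!(x), 0)

long sum_valid(const int* data, long n) {
    long total = 0;
    for (long i = 0; i < n; ++i) {
        if (UNLIKELY(data[i] < 0)) {
            // rare path: keep it out of the hot instruction stream
            std::printf("bad value at %ld\n", i);
            continue;
        }
        total += data[i];
    }
    return total;
}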
8. tbb::parallel_for — Real Parallelism Without Headaches
I avoided multithreading for too long.
Then I tried this:
#include <tbb/parallel_for.h>
#include <vector>

int main() {
    std::vector<int> v(1000);
    tbb::parallel_for(0, 1000, [&](int i) {
        v[i] = i * i;
    });
}
Why it's powerful:
- Automatic thread scaling
- No manual thread management
Bold take: If you're not using parallelism in 2026, you're underusing your CPU.
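When the per-element work is tiny, I reach for the range-based overload instead; a sketch (mine, not the article's) that lets TBB hand each thread a contiguous chunk rather than a single index:
#include <cstddef>
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
#include <vector>

int main() {
    std::vector<std::size_t> v(1'000'000);
    tbb::parallel_for(tbb::blocked_range<std::size_t>(0, v.size()),
                      [&](const tbb::blocked_range<std::size_t>& r) {
                          // each task gets a contiguous chunk: cheap scheduling, good locality
                          for (std::size_t i = r.begin(); i != r.end(); ++i) {
                              v[i] = i * i;
                          }
                      });
}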
9. boost::pool — Memory Allocation Without the Pain
Frequent allocations were killing performance.
Solution:
#include <boost/pool/object_pool.hpp>
boost::object_pool<int> pool;
int* x = pool.construct(10);
Why it works:
- Reuses memory blocks
- Avoids fragmentation
Allocators are invisible until they dominate runtime.
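A slightly larger sketch (my example) with a type that gets allocated and freed constantly, which is exactly where a pool pays off:
#include <boost/pool/object_pool.hpp>

struct Node {                        // hypothetical payload, not from the article
    int value;
    Node* next;
    explicit Node(int v) : value(v), next(nullptr) {}
};

void build_and_tear_down() {
    boost::object_pool<Node> pool;   // hands out Nodes from large reused blocks
    Node* a = pool.construct(1);     // constructs in place inside the pool
    Node* b = pool.construct(2);
    a->next = b;
    pool.destroy(b);                 // slot goes back to the pool for reuse
    pool.destroy(a);
}                                    // the pool releases its blocks when it goes out of scope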
10. std::atomic — Lock-Free Where It Matters
Locks are slow. Contention is worse.
#include <atomic>
std::atomic<int> counter{0};
void hit() {
    counter.fetch_add(1, std::memory_order_relaxed);
}
Why this matters:
- No mutex overhead
- Faster concurrency
Use it wisely. Not everywhere. Just where it counts.
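A minimal sketch (mine) of that counter shared across threads; relaxed ordering is enough here because nothing else depends on the count mid-flight:
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

std::atomic<int> counter{0};

int main() {
    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t) {
        workers.emplace_back([] {
            for (int i = 0; i < 100000; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);  // no mutex, no data race
        });
    }
    for (auto& w : workers) w.join();
    std::printf("%d\n", counter.load());  // always prints 400000
}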
11. Precompiled Headers — Compile Time Affects Runtime More Than You Think
This one's controversial.
But hear me out.
// pch.h
#include <vector>
#include <string>
#include <map>
Then:
g++ -x c++-header pch.h
That produces pch.h.gch, which GCC picks up automatically whenever a source file includes "pch.h".
Why it matters:
- Faster builds = faster iteration
- Better optimization cycles
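For completeness, the consuming side looks like this (a sketch; the file name is mine):
// main.cpp
#include "pch.h"   // GCC substitutes pch.h.gch if it sits next to pch.h

int main() {
    std::vector<std::string> words = {"cache", "branch", "heap"};
    std::map<std::string, int> lengths;
    for (const auto& w : words)
        lengths[w] = static_cast<int>(w.size());
}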
Quote: Speed isn't just execution speed. It's the speed of your feedback loop.
Final Thought
I used to think performance was about being "smart."
It's not.
It's about being aware.
Aware of:
- what your code allocates
- how your data moves
- what your CPU guesses
Once you see it…
You start writing code that doesn't just work.
It flows.
And that's when your programs stop feeling slow, even before you measure them.