Understanding Memory Latency


The Memory Wall

Processor speeds have increased exponentially over the last few decades, but memory speeds have not kept pace. This growing gap is often referred to as the "Memory Wall".

When a CPU needs data, it first checks its caches (L1, L2, L3). If the data isn’t there, it has to fetch it from main memory (RAM). This fetch is orders of magnitude slower than a register access.

L1 vs L2 vs RAM

  1. L1 Cache: ~0.5 ns
  2. L2 Cache: ~7 ns
  3. Main Memory: ~100 ns

As you can see, missing the cache is expensive.

Data Locality

To mitigate this, we need to write “cache-friendly” code. This means organizing data so that it is accessed sequentially (spatial locality) and reusing data once it’s loaded (temporal locality).

// Assumes: int data[10000];

// Bad: stride access. With 4-byte ints and 64-byte cache lines,
// a stride of 16 touches a new cache line on every iteration,
// so nearly every access misses.
for (int i = 0; i < 10000; i += 16) {
    data[i] = i;
}

// Good: sequential access. Every element of each fetched cache
// line is used, so most accesses hit.
for (int i = 0; i < 10000; i++) {
    data[i] = i;
}
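The loops above illustrate spatial locality. Temporal locality can be sketched the same way: if you need to make several passes over a large array, fusing them means each element is reused while it is still hot in cache, instead of being evicted and reloaded. This is a minimal illustrative sketch; the function names and the `+1`/`*2` operations are made up for the example.

```c
#include <stddef.h>

#define N 10000

/* Two separate passes: each element is loaded from memory twice.
   For arrays larger than the cache, the second pass misses again. */
void two_passes(int *data) {
    for (size_t i = 0; i < N; i++) data[i] += 1;
    for (size_t i = 0; i < N; i++) data[i] *= 2;
}

/* Fused pass: each element is loaded once and both operations are
   applied while it is still in a register or L1. */
void fused_pass(int *data) {
    for (size_t i = 0; i < N; i++) data[i] = (data[i] + 1) * 2;
}
```

Both functions compute the same result; the fused version simply halves the number of trips through memory.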

By understanding the hardware, we can write software that runs significantly faster.