# Lecture 15: Memory hierarchy

Philipp Koehn

October 9, 2023

601.229 Computer Systems Fundamentals



- We want: lots of memory that we can access fast
- We really have: different speed/size tradeoffs
- Need methods to give the illusion of large and fast memory

- What helps us is locality
- Temporal locality
  - same memory location often referenced repeatedly
  - example: instructions in loops
- Spatial locality
  - after an item is referenced, nearby items are likely to be referenced soon
  - example: processing of sequential data

# Example: Violation of Locality

- Consider this C code

```
#define size 32768

/* size x size ints, statically allocated */
int matrix[size][size];

int main(void) {
  /* the inner loop walks along one row: matrix[i][0], matrix[i][1], ... */
  for (int i = 0; i < size; i++) {
    for (int j = 0; j < size; j++) {
      matrix[i][j] = 47;
    }
  }
  return 0;
}
```

- How fast does it run?

```
$ gcc -Og cache1.c -o cache1
$ time ./cache1
real 0m1.710s
user 0m0.871s
sys  0m0.839s
```


- Minor change

```
#define size 32768

/* size x size ints, statically allocated */
int matrix[size][size];

int main(void) {
  /* indices swapped: the inner loop walks down one column */
  for (int i = 0; i < size; i++) {
    for (int j = 0; j < size; j++) {
      matrix[j][i] = 47;
    }
  }
  return 0;
}
```

- How fast does it run?

```
$ gcc -Og cache2.c -o cache2
$ time ./cache2
real 0m24.601s
user 0m23.756s
sys  0m0.844s
```

| Technology          | Speed   | Capacity | Cost    |
|---------------------|---------|----------|---------|
| SRAM on CPU         | fastest | smallest | highest |
| DRAM on motherboard | ...     | ...      | ...     |
| Flash memory        | ...     | ...      | ...     |
| Magnetic disk       | slowest | biggest  | lowest  |





Smaller memory mirrors some of the large memory content




- Memory request from CPU
- Data found in cache
- Send data to CPU




- Memory request from CPU
- Data **not** found in cache
- Memory request from cache to main memory
- Send data from memory to cache
- Store data in cache
- Send data to CPU

- Data has to be transferred from the large memory before it can be used
- Cache: small memory connected to processor
- Block: unit of memory transferred
- Hit rate: fraction of memory lookups served by data already in cache
- Miss rate: fraction of memory lookups that require memory transfers
- Hit time: time to process a cache hit
- Miss penalty: time to process a cache miss

# Memory Hierarchy

- More than 2 levels of memory
- Transfer between memory in level *i* and *i*+1 follows same principle, regardless of *i*
- Hierarchy: if item in level i, then it is also in level i+1
- Hence, we restrict our discussion to 2 levels



# Memory technologies


| Technology          | Access Time            | Price per GB |
|---------------------|------------------------|--------------|
| SRAM semiconductor  | 0.5-2.5ns              | \$300        |
| DRAM semiconductor  | 50-70ns                | \$6          |
| Flash semiconductor | 5,000-50,000ns         | \$0.40       |
| Magnetic disk       | 5,000,000-20,000,000ns | \$0.02       |
| Magnetic tape       | -                      | \$0.008      |

(prices from 2018)

#### SRAM

- Integrated in CPU, runs at similar clock speeds
- Implemented using flip flops
- Uses more transistors than DRAM




#### DRAM

- Separate chips on the motherboard
- In PCs and servers, multiple chips on a module (DIMM)
- Implemented using capacitors, which lose charge → need to be refreshed frequently
- Lose charge when power is turned off



#### Flash Memory

- A type of EEPROM (electrically erasable programmable read-only memory)
  - allows read of individual bytes
  - writes require erase of a block, rewrite of bytes
- Writes can wear out the memory
- Has become standard storage memory for laptops, PCs



#### Hard Drives

- Magnetic charge on spinning disk
- Read/write requires read head at the right place
- Sequential data reads are relatively fast
- Random access slow → not practical as process memory
- Useful for bulk data storage (especially when using RAID for redundancy)



# Cache basics


- All data is in large main memory
- Data for processing has to be moved to cache
- Caching strategies
  - mapping between cache and main memory
  - which data to read / keep / write

- Idea: keep mapping from cache to main memory simple
- ⇒ Use part of the address as index to cache
- Address broken up into 3 parts
  - memory position in block (offset)
  - index
  - tag to identify position in main memory
- If blocks with same index are used, older one is overwritten

Main memory address (32 bit)

0010 0011 1101 1100 0001 0011 1010 1111

Block size: 1KB (10 offset bits)

Cache size: 1MB (20 bits: 10 index bits + 10 offset bits)

| 0010 0011 1101 | 1100 0001 00 | 11 1010 1111 |
|----------------|--------------|--------------|
| Tag            | Index        | Offset       |

| Cache content | Index | Valid | Tag | Mapped memory |
|---------------|-------|-------|-----|------------------|
|               | 000   | no    |     |                  |
|               | 001   | no    |     |                  |
|               | 010   | no    |     |                  |
|               | 011   | no    |     |                  |
|               | 100   | no    |     |                  |
|               | 101   | no    |     |                  |
|               | 110   | no    |     |                  |
|               | 111   | no    |     |                  |

| Cache content | Index | Valid | Tag | Mapped memory |
|---------------|-------|-------|-----|---------------|
|               | 000   | no    |     |               |
|               | 001   | no    |     |               |
|               | 010   | no    |     |               |
|               | 011   | no    |     |               |
|               | 100   | no    |     |               |
|               | 101   | yes   | 10  | 10101         |
|               | 110   | no    |     |               |
|               | 111   | no    |     |               |

- Operation: read 10101
  - cache miss
  - retrieve value from main memory

| Cache content | Index | Valid | Tag | Mapped memory |
|---------------|-------|-------|-----|---------------|
|               | 000   | no    |     |               |
|               | 001   | no    |     |               |
|               | 010   | yes   | 11  | 11010         |
|               | 011   | no    |     |               |
|               | 100   | no    |     |               |
|               | 101   | yes   | 10  | 10101         |
|               | 110   | no    |     |               |
|               | 111   | no    |     |               |

- Operation: read 11010
  - cache miss
  - retrieve value from main memory

| Cache content | Index | Valid | Tag | Mapped memory |
|---------------|-------|-------|-----|---------------|
|               | 000   | no    |     |               |
|               | 001   | no    |     |               |
|               | 010   | yes   | 11  | 11010         |
|               | 011   | no    |     |               |
|               | 100   | no    |     |               |
|               | 101   | yes   | 10  | 10101         |
|               | 110   | no    |     |               |
|               | 111   | no    |     |               |

- Operation: read 10101
  - cache hit

| Cache content | Index | Valid | Tag | Mapped memory |
|---------------|-------|-------|-----|---------------|
|               | 000   | no    |     |               |
|               | 001   | no    |     |               |
|               | 010   | yes   | 11  | 11010         |
|               | 011   | no    |     |               |
|               | 100   | no    |     |               |
|               | 101   | yes   | 10  | 10101         |
|               | 110   | no    |     |               |
|               | 111   | no    |     |               |

- Operation: read 11010
  - cache hit

| Cache content | Index | Valid | Tag | Mapped memory |
|---------------|-------|-------|-----|---------------|
|               | 000   | yes   | 10  | 10000         |
|               | 001   | no    |     |               |
|               | 010   | yes   | 11  | 11010         |
|               | 011   | no    |     |               |
|               | 100   | no    |     |               |
|               | 101   | yes   | 10  | 10101         |
|               | 110   | no    |     |               |
|               | 111   | no    |     |               |

- Operation: read 10000
  - cache miss
  - retrieve value from main memory

| Cache content | Index | Valid | Tag | Mapped memory |
|---------------|-------|-------|-----|---------------|
|               | 000   | yes   | 10  | 10000         |
|               | 001   | no    |     |               |
|               | 010   | yes   | 11  | 11010         |
|               | 011   | yes   | 00  | 00011         |
|               | 100   | no    |     |               |
|               | 101   | yes   | 10  | 10101         |
|               | 110   | no    |     |               |
|               | 111   | no    |     |               |

- Operation: read 00011
  - cache miss
  - retrieve value from main memory

| Cache content | Index | Valid | Tag | Mapped memory |
|---------------|-------|-------|-----|---------------|
|               | 000   | yes   | 10  | 10000         |
|               | 001   | no    |     |               |
|               | 010   | yes   | 11  | 11010         |
|               | 011   | yes   | 00  | 00011         |
|               | 100   | no    |     |               |
|               | 101   | yes   | 10  | 10101         |
|               | 110   | no    |     |               |
|               | 111   | no    |     |               |

- Operation: read 10000
  - cache hit

| Cache content | Index | Valid | Tag | Mapped memory |
|---------------|-------|-------|-----|---------------|
|               | 000   | yes   | 10  | 10000         |
|               | 001   | no    |     |               |
|               | 010   | yes   | 10  | 10010         |
|               | 011   | yes   | 00  | 00011         |
|               | 100   | no    |     |               |
|               | 101   | yes   | 10  | 10101         |
|               | 110   | no    |     |               |
|               | 111   | no    |     |               |

- Operation: read 10010
  - cache miss
  - retrieve value from main memory
  - overwrite existing cache value

Clicker quiz omitted from public slides


#### Larger block size

- fewer cache misses due to spatial locality
- longer transfer times per block
- fewer blocks in cache → more competition for cache

#### In practice

- optimal value somewhere in the middle
- depends on running process