Cache Me if You Can
When many of us started working on computers, memory was expensive, and its cost limited both system performance and capacity. As costs came down, hardware vendors introduced new ways to manage memory effectively (including virtual memory). The most recent idea is to offer multiple levels of memory cache: memory located closer to the CPU, designed to hold frequently accessed items. Today, cache appears on everything from the most limited PCs all the way up to mainframes.
If you’re like me, you remember the challenges of effectively exploiting cache on disks. You had to know what you were doing to get the best results. We can’t do a lot about the CPU cache, but it is helpful to understand how it works.
How Does Memory Cache Work?
Just like disk cache, the design rests on two assumptions: data accessed once is likely to be accessed again soon (temporal locality), and memory near an accessed location is likely to be accessed soon as well (spatial locality). A major difference is that there are two kinds: instruction cache (to speed application execution) and data cache (the type most resembling disk cache).
Cache is divided into levels:
- L1 (level 1): Small cache very close to the CPU, very fast
- L2 (level 2): Larger cache that is faster than main memory but slower than L1
- L3 and L4: Many systems, such as the IBM z13, now add these larger, shared levels
On the z13, L1 and L2 sit on the CPU chip itself, providing very fast access to data. Levels 3 and 4 are larger and shared among multiple processors. Main memory tops out at 10 TB on a fully loaded z13 with four drawers at 2.5 TB of memory each.
Breaking Down the Cache Levels
L3 and L4 can be shared by multiple processors, a plus when your application may run across CPUs or LPARs. The best use is for frequently read data; but if a program modifies the data, the change has to propagate back through the shared caches to main memory, and possibly also to disk. This means the copies in L3/L4 have to match the copies in each core's L1/L2. The more LPARs, the more copies you have to keep in sync.
The system manages the situation where multiple caches may contain the same data; this is called cache coherence. However, it isn't free. Just as with ENQ/DEQ contention on data, having many LPARs all trying to get at the same data can slow things down a lot.
What makes this really interesting to performance people is that we’ve been used to spreading data across many volumes to improve performance. Many experts have suggested spreading work across multiple LPARs to achieve the same benefit. Some great papers have been written suggesting that fewer LPARs can actually be better and managing the various caches is just one of many reasons why this is so.
We’ve also spread applications across LPARs for a variety of reasons, not least because there is a limit to how fast CPUs will get. We have to go wider (more horizontal growth) rather than faster to manage increasing workloads. But this brings the same problems (and more) of having too many LPARs.
The Bottom Line
The bottom line is that the job of managing performance is only getting more complicated, requiring greater skill. We’re sharing applications, data and memory across systems. This is beginning to look a lot like the challenge our distributed friends have been experiencing with the growth of server farms.
Short-term, you have to manage smarter; but you also have to exploit automation technology like Compuware ThruPut Manager wherever you can in order to reduce the workload.
By Denise Kalm