Advanced Nutanix: Extent Cache
In this post I’ll explain the Nutanix Extent Cache and how I use and tune it in my environment.
What is the Nutanix Extent Cache?
The Extent Cache is a key piece of NDFS and acts as an in-memory read cache for frequently accessed data. The size of the extent cache depends directly on the amount of memory assigned to the CVM (the default is 12 GB) and adjusts dynamically based on memory resources and availability. Increasing the CVM memory allows you to specify a larger extent cache and serve more reads from memory (“cache hits”).
The figure below shows how the Extent Cache relates to the IO path for NDFS:
Extent Cache Gflags
Gflags are advanced configuration parameters that allow us to modify configuration values on the Nutanix platform. DISCLAIMER: Any modification of Gflags should only be performed by a Nutanix SE or a Nutanix Systems Reliability Engineer, and not on any production systems!
- Explanation: This Gflag (stargate_extent_cache_max_MB) specifies the size of the in-memory extent cache, in MB
- Default: -1 (dynamic)
What Cache Size is Right for me?
The key thing to look for when sizing the extent cache is the ‘Extent cache hit’ rate, i.e. the percentage of read operations that hit the cache rather than going to the Extent Store. In my environment I like to keep this value above 95% for random workloads, meaning that at least 95% of all read requests are served from memory (obviously, the higher the better).
If this value is consistently low (50% is bad for random workloads) and you’re experiencing read latency or a reduction in read IOPS, increasing the cache size is a good first step toward better performance. You should see read IO performance grow as the percentage of Extent Cache hits grows.
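If you only have raw counters rather than a ready-made percentage, the check described above can be sketched in a few lines of shell. This is a minimal sketch: the function names are hypothetical, the two inputs (cache hits and total read lookups) are assumed to be numbers you have collected yourself, and the thresholds are simply the 95% target and the 50% "bad" mark mentioned above.

```shell
# Hypothetical helper: integer hit percentage from two counters.
# $1 = number of reads served from the cache, $2 = total read lookups.
extent_cache_hit_pct() {
  if [ "$2" -eq 0 ]; then
    # No reads yet: report 0 rather than dividing by zero.
    echo 0
    return
  fi
  echo $(( $1 * 100 / $2 ))
}

# Hypothetical helper: classify a percentage against the thresholds
# from the text (95% or more is the target, 50% or less is bad).
extent_cache_verdict() {
  if [ "$1" -ge 95 ]; then
    echo "ok"
  elif [ "$1" -le 50 ]; then
    echo "low"
  else
    echo "watch"
  fi
}
```

For example, `extent_cache_verdict "$(extent_cache_hit_pct 950 1000)"` classifies a workload where 950 of 1000 reads were cache hits.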
The metrics below, taken from an example Stargate 2009 page, show the hit rate and size:
The above shows that 12 GB of the 16 GB is being used and 100% of read requests are being served from the cache.
How to configure the Extent Cache
The following steps update the CVM memory and modify the Extent Cache size (NOTE: I use a cache size of 16 GB in my examples). NOTE: Don’t attempt this in a production environment, as it requires a restart of the Stargate services and potentially a reboot of the CVM!
- Add memory to each Nutanix CVM (NOTE: if Memory Hot Plug isn’t enabled the CVM will need to be powered off)
- Connect to the Aegis Portal (http://<Zookeeper Leader>:7777)
- Select ‘GFlags editor’
- Select ‘stargate’ as the component
- Click ‘Create’
- Expand the ‘…/extent_cache.cc’ tree
- Click on the ‘stargate_extent_cache_max_MB’ parameter
- Set the value to 25% of the total CVM memory (e.g. I use 64 GB on my CVMs, so this would be 16 GB, or 16384 MB)
- Check the ‘Apply flag changes to all future versions of this binary’ box
- Click Commit
- Restart Stargate on each CVM
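The 25% sizing rule from the steps above is simple arithmetic, sketched here so the value can be recomputed for other CVM sizes. The function name is hypothetical and the rule is just the one I use in this post, not an official recommendation.

```shell
# Hypothetical helper: 25% of the CVM memory, converted to MB.
# $1 = CVM memory in GB.
extent_cache_mb() {
  echo $(( $1 * 1024 / 4 ))
}

# A 64 GB CVM gives the 16384 MB used throughout this post.
extent_cache_mb 64
```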
You can also change this by visiting a URL on each CVM's Stargate page: (http://<CVM IP>:2009/h/gflags?stargate_extent_cache_max_MB=16384)
Here’s a way to do this all programmatically from the CLI on a CVM:
# Set the Extent Cache Gflag on every CVM (Stargate listens on port 2009)
for i in $(svmips); do wget -O - "http://$i:2009/h/gflags?stargate_extent_cache_max_MB=16384"; done
# Restart Stargate (needs to be run on each CVM)
genesis stop StargateService
You can validate that the change has taken place by viewing the Stargate Gflags page (http://<CVM IP>:2009/h/gflags) and looking at the value for stargate_extent_cache_max_MB. With a 16 GB size, the Gflag should show the following: --stargate_extent_cache_max_MB=16384 (default -1)
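To avoid eyeballing the page on every CVM, the validation can be sketched as a small filter. This is only a sketch: the function name is hypothetical, and it assumes the page text contains the flag in the `--name=value` form shown above. If the page wraps the value in other markup, the pattern would need adjusting.

```shell
# Hypothetical helper: read gflags page text on stdin and print the
# current value of the flag named in $1 (first match only).
gflag_value() {
  sed -n "s/^.*--$1=\([-0-9][0-9]*\).*$/\1/p" | head -n 1
}

# Possible usage on a CVM (not run here; replace <CVM IP> first):
#   wget -q -O - "http://<CVM IP>:2009/h/gflags" | gflag_value stargate_extent_cache_max_MB
```

Fed the line shown above, the filter prints the configured value, which can then be compared against the size you intended to set.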
Stay tuned: I’ll have another post shortly on how we’re evolving this into something code-named the Content Cache!