Advanced Nutanix: Extent Cache


In this post I’ll explain the Nutanix Extent Cache and how I use and tweak it in my environment.

What is the Nutanix Extent Cache?

The Extent Cache is a key piece of NDFS and is responsible for acting as an in-memory read cache for frequently accessed data.  The size of the extent cache depends directly on the amount of memory assigned to the CVM (default is 12GB) and will dynamically adjust based upon memory resources and availability.  Increasing the CVM memory allows you to specify a larger extent cache and serve more reads from memory (“cache hits”).

The figure below shows how the Extent Cache relates to the IO path for NDFS:

[Figure: Extent Cache]

Extent Cache Gflags

Gflags are advanced configuration parameters which allow us to modify configuration values on the Nutanix platform.  DISCLAIMER: Any modification of Gflags should only be performed by a Nutanix SE or a Nutanix Systems Reliability Engineer and not on any production systems!

stargate_extent_cache_max_MB
  • Explanation: This Gflag specifies the size of the in-memory extent cache in MB
  • Default: -1 (dynamic)

What Cache Size is Right for me?

The key thing to look for when sizing the extent cache is the ‘Extent cache hit’ rate: the percentage of read operations which hit the cache instead of going to the Extent Store.  In my environment I like to keep this value above 95% for random workloads, meaning that 95% of all read requests are being served from memory (obviously the higher the better).

If this value is consistently low (50% is bad for random workloads) and you’re experiencing read latency or a reduction in read IOPS, increasing the cache size is a good first step to improve performance.  You should see read IO performance grow as the % of Extent Cache hits grows.
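
As a rough illustration of the metric, the hit rate is simply cache hits divided by total reads. The Python sketch below is only an example: the function names and counter values are hypothetical, and the 95% and 50% cut-offs are the rules of thumb mentioned above, not limits defined by the platform.

```python
def extent_cache_hit_rate(cache_hits: int, total_reads: int) -> float:
    """Return the percentage of read operations served from the cache."""
    if total_reads <= 0:
        # No reads observed yet, so there is nothing to report.
        return 0.0
    return 100.0 * cache_hits / total_reads


def assess_hit_rate(hit_rate_pct: float) -> str:
    """Classify a hit rate using the rule-of-thumb thresholds from the text."""
    if hit_rate_pct >= 95.0:
        return "good"   # at or above the 95% target for random workloads
    if hit_rate_pct <= 50.0:
        return "low"    # a larger cache is worth considering
    return "watch"      # in between: keep an eye on read latency and IOPS


# Hypothetical counters, e.g. sampled over some interval.
rate = extent_cache_hit_rate(cache_hits=950, total_reads=1000)
print(f"{rate:.1f}% -> {assess_hit_rate(rate)}")
```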

Here are the metrics from an example Stargate 2009 page, showing the hit rate and size:

[Figure: Extent Cache Hit]

The above shows that 12 GB of the 16 GB is being used and 100% of read requests are being served from the cache.

How to configure the Extent Cache

The following steps will allow you to update the CVM memory and modify the Extent Cache size (NOTE: I use a cache size of 16 GB in my examples).  NOTE: Don’t attempt this in a production environment, as it requires a restart of the Stargate services and potentially a reboot of the CVM!

  1. Add memory to each Nutanix CVM (NOTE: if Memory Hot Plug isn’t enabled the CVM will need to be powered off)
  2. Connect to the Aegis Portal (http://<Zookeeper Leader>:7777)
  3. Select ‘GFlags editor’
  4. Select ‘stargate’ as the component
  5. Click ‘Create’
  6. Expand the ‘…/extent_cache.cc’ tree
  7. Click on the ‘stargate_extent_cache_max_MB’ parameter
  8. Set the value to 25% of the total CVM memory (e.g. I use 64 GB on my CVMs, so this would be 16 GB, or 16384 MB)
  9. Check the ‘Apply flag changes to all future versions of this binary’ box
  10. Click Commit
  11. Restart Stargate on each CVM
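
The arithmetic in step 8 can be sketched in a few lines of Python. This is just a worked example of the 25% rule of thumb used above; the function name and the fraction parameter are my own illustration, not part of any Nutanix tooling.

```python
def extent_cache_size_mb(cvm_memory_gb: float, fraction: float = 0.25) -> int:
    """Return a cache size in MB as a fraction of the CVM memory (given in GB)."""
    if cvm_memory_gb <= 0:
        raise ValueError("cvm_memory_gb must be positive")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    # 1 GB = 1024 MB; truncate to a whole number of MB for the Gflag value.
    return int(cvm_memory_gb * 1024 * fraction)


# A 64 GB CVM gives 16 GB of cache, expressed in MB for the Gflag.
print(extent_cache_size_mb(64))  # 16384
```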

You can also change this by visiting a URL on each CVM: (http://<CVM IP>:2010/h/gflags?stargate_extent_cache_max_MB=16384)

Here’s a way to do this all programmatically from the CLI on a CVM:
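
One possible approach, sketched in Python rather than as a definitive procedure, is to loop over the CVMs and request the gflags URL shown above for each one. The CVM IPs below are placeholders, the helper names are hypothetical, and the port is copied from the URL in the text, so confirm it for your environment. The sketch defaults to a dry run that only prints the URLs.

```python
import urllib.request
from urllib.parse import urlencode

FLAG = "stargate_extent_cache_max_MB"


def gflag_url(cvm_ip: str, port: int, flag: str, value: int) -> str:
    """Build the per-CVM gflags URL in the same shape as the one in the text."""
    return f"http://{cvm_ip}:{port}/h/gflags?{urlencode({flag: value})}"


def set_gflag_on_cvms(cvm_ips, port, flag, value, dry_run=True):
    """Request the gflags URL on every CVM; with dry_run=True only print it."""
    urls = [gflag_url(ip, port, flag, value) for ip in cvm_ips]
    for url in urls:
        if dry_run:
            print("would request:", url)
        else:
            # A plain HTTP GET, equivalent to visiting the URL in a browser.
            with urllib.request.urlopen(url, timeout=10) as resp:
                print(url, "->", resp.status)
    return urls


if __name__ == "__main__":
    # Placeholder IPs; replace with the CVM IPs of your (non-production) cluster.
    set_gflag_on_cvms(["10.0.0.11", "10.0.0.12"], 2010, FLAG, 16384)
```

Keeping the dry run as the default makes it easy to review the exact URLs before anything is sent to a CVM.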

You can validate that the change has taken place by viewing the Stargate Gflags page (http://<CVM IP>:2009/h/gflags) and looking at the value for stargate_extent_cache_max_MB.  With a 16 GB size the Gflag should show the following: --stargate_extent_cache_max_MB=16384 (default -1)
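
If you want to check this without eyeballing the page, a small parser can pull the value out of a line in that format. This is a minimal sketch that assumes the Gflags page lists each flag as "name=value (default X)", as in the example above; the function names are hypothetical and the real page layout may differ.

```python
import re

# Matches lines such as: --stargate_extent_cache_max_MB=16384 (default -1)
# Leading hyphens (or a typographic dash) are optional and ignored.
GFLAG_LINE = re.compile(
    r"^\s*[-\u2013]*(?P<name>\w+)=(?P<value>\S+)\s+\(default\s+(?P<default>[^)]+)\)"
)


def parse_gflag_line(line: str):
    """Return (name, value, default) for a gflag line, or None if it doesn't match."""
    match = GFLAG_LINE.match(line)
    if match is None:
        return None
    return match.group("name"), match.group("value"), match.group("default").strip()


def flag_has_value(page_text: str, name: str, expected) -> bool:
    """Scan the page text line by line and compare the named flag to expected."""
    for line in page_text.splitlines():
        parsed = parse_gflag_line(line)
        if parsed and parsed[0] == name:
            return parsed[1] == str(expected)
    return False


sample = "--stargate_extent_cache_max_MB=16384 (default -1)"
print(parse_gflag_line(sample))
print(flag_has_value(sample, "stargate_extent_cache_max_MB", 16384))
```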

Stay tuned: I’ll have another post shortly on how we’re evolving this into something code-named our Content Cache!

Enjoy!

  • Doug Youd

    Very cool. How does one become a Nutanix Certified Professional?

  • James Knapp

    you use 64GB for CVM? that sounds like an awful lot – i.e. a quarter or half of host memory! sounds expensive in terms of resources.

    • stevenpoitras

      Just for benchmarking to test the scalability of the cache sizes, not for normal use :) I’ll normally fluctuate the amount of memory given to the CVM based upon the use-case/workload.

      For example, with VDI I’ll put the CVM memory to ~16-24 GB depending on the image size since the number of VMs requires more memory for density. However for Splunk I’ll increase it given the fact that Splunk VMs aren’t too wide.

      That’s the nice part about having the CVM run as a VM, its resources can all be dynamic.

      • James Knapp

        another powerful feature of the SDS capability I had not yet considered – excellent. thanks

Legal Mumbo Jumbo

Copyright © Steven Poitras, The Nutanix Bible and StevenPoitras.com, 2014. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Steven Poitras and StevenPoitras.com with appropriate and specific direction to the original content.