3 Real World vSphere Situations to Avoid

Published by Mike Solinap
on January 8, 2014

Almost a year ago, I wrote about 10 Pitfalls that Can Impact VMware Performance. I thought I’d revisit this topic, provide some specific situations I’ve encountered over the past year, and explain what I’ve learned from them. You may have already taken the advice of my previous article, but as we’re all aware, the real world will typically bring us unexpected issues.

Here are 3 real world vSphere situations that you will want to avoid:

1. Poor NFS Performance

You’ve got powerful hosts, a fast network, and a decent number of spindles on your storage array, yet your IO performance is horrible. Are your datastores mounted via NFS? If so, that may well be the culprit. Don’t get me wrong: after a decade of managing Unix systems as part of our engineering services, I’ve learned that NFS is something we can’t live without. In terms of vSphere, NFS provides several key benefits over the alternative, block storage. Compared to Fibre Channel, no storage area network investment is needed, LUNs no longer need to be carved, you have better access to individual VMDKs, and troubleshooting is easier (port mirror with Wireshark!). The list goes on.

However, there is one big limitation with respect to NFS and vSphere: the NFS implementation in vSphere seems to only support synchronous writes. This is the case regardless of how your storage array exports the NFS volume.

Why is this problematic? It may or may not be, depending on what your storage array is. Take a NetApp (Network Appliance) filer, for example. Synchronous writes are NOT a problem there because NetApp uses NVRAM. Writes are considered committed as soon as they are written to NVRAM, rather than waiting for your spindles to write the data. A Linux or BSD based machine with ZFS and an SSD backed intent log is in a similar situation: the write is acknowledged before the data is committed to the physical disks.

Without the ability to quickly acknowledge the write requests from the vSphere hosts, your overall performance will suffer.
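
To get a rough feel for what synchronous writes cost on a given storage backend, a small timing test can help. The following is only a sketch: it uses `os.fsync()` after each write as a stand-in for the synchronous behavior described above, it is not how vSphere itself issues NFS writes, and the function names, block size, and write count are illustrative assumptions.

```python
import os
import tempfile
import time


def time_writes(path, count=200, size=4096, sync=False):
    """Return the seconds taken to write `count` blocks of `size` bytes.

    With sync=True every write is followed by os.fsync(), so each iteration
    only continues once the OS reports the data as flushed to storage.
    """
    block = b"\0" * size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, block)
            if sync:
                os.fsync(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)


def compare(directory=None):
    """Time buffered and synchronous writes in `directory` (default: temp dir)."""
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        target = os.path.join(tmp, "sync_test.bin")
        buffered = time_writes(target, sync=False)
        synced = time_writes(target, sync=True)
    return buffered, synced


if __name__ == "__main__":
    buffered, synced = compare()
    print(f"buffered: {buffered:.4f}s  synchronous: {synced:.4f}s")
```

Passing a directory on the NFS export to `compare()` from a test machine gives a comparison for that array. A large gap between the two numbers suggests the array cannot acknowledge synchronous writes quickly.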

2. Snapshots Are Not Free

The beauty of virtualizing a machine is that we can take snapshots of the running state, and revert back to them if needed. This comes in extremely handy for testing software, configuration changes, or as an easy fallback when migrating to production as part of your build and release management plan.

Unfortunately, this benefit isn’t “free”. When you initiate a snapshot of a virtual machine, vSphere stops writing changes to the original VMDK file. A new file is created and subsequent changes are appended to this file throughout the life of the snapshot. Chances are, we forget that we created the snapshot in the first place, and this can create some obvious, but also not so obvious repercussions.

Obviously, as the snapshot ages, the delta file will grow. And grow. And grow. Even though you may think your virtual machine is mostly “idle”, things constantly get written to system logs, periodic tasks run, OS updates get applied, etc. If your datastores are accessible via NFS (hint hint!), run a find for all delta files and see how much space they’ve accumulated. You may be surprised.
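
As a sketch of that search, the script below walks a datastore that is mounted on an admin machine and totals the snapshot delta files. It assumes the delta disks follow the `-delta.vmdk` naming pattern, and the function name `delta_usage` and the mount point `/mnt/datastore1` are hypothetical. The sizes are apparent file sizes as reported by the NFS client.

```python
import os


def delta_usage(datastore_root):
    """Return (total_bytes, [(path, bytes), ...]) for snapshot delta files.

    Assumes snapshot delta disks are named "*-delta.vmdk".
    The list is sorted with the largest file first.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(datastore_root):
        for name in filenames:
            if name.endswith("-delta.vmdk"):
                path = os.path.join(dirpath, name)
                found.append((path, os.path.getsize(path)))
    found.sort(key=lambda item: item[1], reverse=True)
    return sum(size for _, size in found), found


if __name__ == "__main__":
    # Hypothetical mount point of the NFS datastore; adjust to your setup.
    root = "/mnt/datastore1"
    if os.path.isdir(root):
        total, files = delta_usage(root)
        for path, size in files:
            print(f"{size / 2**30:8.2f} GiB  {path}")
        print(f"{total / 2**30:8.2f} GiB  total in {len(files)} delta files")
```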

Growing snapshots present another issue. At some point, you will want to commit (delete) them. If your snapshots have grown significantly, committing them will generate a HUGE amount of IO. Delete them and cross your fingers. I’ve encountered a situation where I was deleting a large snapshot. It generated enough IO to cause vSphere to time out the operation. The result – a corrupted VM, a saturated network, and a bogged down storage array.

3. Underutilized Memory

If you have an abundance of hardware available, and deep pockets, feel free to skip over this section. Otherwise, you’re likely in the same situation as the rest of us — with limited budgets and hardware sorely needing upgrades.

You may be tempted to splurge on some additional RAM since it’s typically the lowest hanging fruit. Or is it? What if your host has no free RAM slots left? With some simple analysis, it is worth investigating whether or not additional RAM is needed at all.

Although your host says its memory utilization is at 80%, it may not necessarily need any more. vSphere employs many techniques to get the most utilization from the host:

  • Memory deduplication: vSphere will look for identical memory pages, and only keep one copy. Since hosts run multiple, similar machines, running similar applications, there’s good potential for memory savings.
  • Ballooning: This concept is similar to the “swappiness” behavior of a Linux machine. The idea is that if pages in memory aren’t being accessed often or at all, they can be paged to disk so that memory is freed up. However, vSphere goes one step further and attempts to control or force this paging by employing a balloon driver. Installed with VMware Tools, the balloon driver starts consuming memory within the guest. This forces the guest to decide for itself what is best paged to disk.
  • Memory Compression: vSphere has the ability to check the compressibility of a memory page. If a page can be compressed by more than 50%, vSphere compresses it instead of swapping it out. Accessing a compressed page still outperforms having to swap to disk.
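
The ideas behind deduplication and compression can be illustrated with a toy model. This is a minimal sketch, not vSphere’s implementation: the 4 KB page size, the SHA-256 hashing, the zlib compressor, and the function names are all assumptions made for the example. A real implementation would also compare page contents byte for byte before sharing them.

```python
import hashlib
import os
import zlib

PAGE_SIZE = 4096  # bytes; assumed page size for this illustration


def shared_page_savings(pages):
    """Return how many pages could be dropped by keeping a single copy of
    each distinct page content (identified here by a hash of its bytes)."""
    unique = {hashlib.sha256(page).digest() for page in pages}
    return len(pages) - len(unique)


def is_worth_compressing(page, threshold=0.5):
    """True if the page shrinks to `threshold` of its original size or less.

    zlib stands in for whatever compressor the hypervisor actually uses.
    """
    return len(zlib.compress(page)) <= len(page) * threshold


if __name__ == "__main__":
    zero_page = bytes(PAGE_SIZE)          # all zeros, highly compressible
    random_page = os.urandom(PAGE_SIZE)   # random data, not compressible
    pages = [zero_page, zero_page, zero_page, random_page]
    print("pages saved by sharing:", shared_page_savings(pages))
    print("zero page compressible:", is_worth_compressing(zero_page))
    print("random page compressible:", is_worth_compressing(random_page))
```

Three identical zero-filled pages collapse into one copy, while the random page can be neither shared nor usefully compressed. This is why hosts running many similar guests tend to see the biggest savings.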

If you have looked at the performance metrics for each of these techniques and the guest is still swapping to disk, then it’s likely that you do in fact need more RAM. However, if new RAM is out of the question for whatever reason, vSphere can take advantage of an SSD and use it as your swap space. With SSDs dropping in price month after month, this may be your biggest bang for your buck. Additionally, if you are licensed for Enterprise Plus, the SSD can also be used as a read cache, reducing IO on your storage array even further.

Hopefully you’re fortunate enough to avoid situations like these. Do you have any unique situations you’d like to share, or feedback on the ones I’ve mentioned? I’d like to hear them!

Michael Solinap
Sr. Systems Integrator
SPK & Associates
