virtualization sprawl (VM sprawl)

Robert Sheldon

What is VM sprawl?

Virtualization sprawl is a phenomenon that occurs when the number of virtual machines (VMs) on a network reaches a point where administrators can no longer manage them effectively. Virtualization sprawl is also referred to as virtual machine sprawl, VM sprawl or virtual server sprawl.

VM sprawl has become a common challenge for many organizations, and the more they rely on virtualization, the more likely they are to encounter this problem. Because sprawl can occur gradually, IT teams might not be aware of it at first. By the time they do realize it, the problem is often quite serious, offsetting many of the benefits that come with virtualization. Even when VM admins are aware of the issue, they can still have a difficult time identifying and removing the unwanted VMs.

Virtualization sprawl can leave many unused VMs spread across the network, often ignored or forgotten. These VMs might keep running in the background, consuming resources while serving no function. Even if they've been shut down, they still take up valuable disk space and pose a potential security risk. Several factors can contribute to virtualization sprawl:

Creating VMs is much easier and faster than standing up physical servers, and it can often be done without having to go through any approval or justification processes.
Because VMs can be created with such ease, VM owners frequently lose track of them and forget that they're out there.
VM owners often hang onto their VMs in case they're needed for future projects, even if they haven't been used for a long time.
Because VMs are created with software rather than hardware, many users think of them as free, overlooking operating system (OS) licensing fees and resource usage.
Many organizations don't have governance policies or standardized processes in place to control VM creation and other operations.

Because of these factors, VMs are often created faster than they're removed, leading to virtualization sprawl and its serious consequences.

Why is VM sprawl an issue?

Virtualization sprawl can undermine many of the benefits that come with virtualization, such as increased security, better resource utilization, easier management and lower costs. In fact, VM sprawl raises several serious concerns:

Security and compliance. A virtual server can run for years, even if it was used for only a few days or weeks. Unused VMs might not get patched or receive proper maintenance, and they are easily forgotten. Some VMs might also contain sensitive corporate data or private information. Even if a VM has been shut down, its files still exist. A forgotten VM in any form can increase security and compliance risks. If one of these VMs is compromised, the organization might not discover the breach until long after the damage is done.
Management. Virtual server sprawl can add significant management overhead. Even if the VMs aren't running, admins must still manage the storage they use, and if the VMs are running, admins must balance their allocated physical resources with active VMs. In some cases, VM admins continue to update and patch unused VMs as part of their routine maintenance plans, incurring extra management overhead. Sprawl can also affect data protection efforts, such as complicating disaster recovery strategies or increasing the number of backups that must be maintained. In addition, VM sprawl can make it more difficult to forecast resource usage because of the uncertainties that come with all the unused VMs.
Performance. A running VM, even if it sits idle, has resources allocated to it. If a server hosts many of these VMs, the operational VMs might experience resource availability issues, slowing them down and affecting application performance. Even if a VM is turned off, it still uses disk space, potentially affecting the performance of the operational VMs. If unused VMs are spread across multiple hosts and still running, they might continue to use network bandwidth for routine maintenance tasks, which can affect both virtualized and bare-metal applications.
Costs. Unused VMs take up disk space whether they're running or not, and together they can consume enough storage to force the purchase of additional capacity. The effect on compute and network resources can also translate to increased costs if IT must beef up infrastructure to support the operational VMs. Even an idle VM can consume central processing unit (CPU) time and network bandwidth. In addition, VM sprawl increases management overhead, which adds further costs. An organization might also be paying a substantial amount in licensing fees for its unused VMs.
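To see how these costs accumulate, the following Python sketch totals a monthly estimate for a list of unused VMs. The per-GB storage rate, the per-VM license fee and the VM records are hypothetical values chosen only for illustration; real figures depend on the organization's storage platform and licensing agreements. Compute and management overhead are left out for simplicity.

```python
# Hypothetical rates: illustrative assumptions only, not real pricing.
STORAGE_COST_PER_GB_MONTH = 0.10   # assumed storage cost per GB per month
LICENSE_COST_PER_VM_MONTH = 15.00  # assumed OS license cost per VM per month

def monthly_waste(unused_vms):
    """Estimate the monthly cost of unused VMs from disk size and licensing."""
    storage = sum(vm["disk_gb"] for vm in unused_vms) * STORAGE_COST_PER_GB_MONTH
    licensing = sum(LICENSE_COST_PER_VM_MONTH for vm in unused_vms if vm["licensed"])
    return round(storage + licensing, 2)

# Hypothetical unused VMs found during an audit.
unused = [
    {"name": "test-01", "disk_gb": 200, "licensed": True},
    {"name": "demo-02", "disk_gb": 80, "licensed": False},
    {"name": "old-db", "disk_gb": 500, "licensed": True},
]
print(monthly_waste(unused))  # 108.0
```

Even with these modest assumed rates, three forgotten VMs add up to a recurring charge that grows with every VM left behind.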

Organizations that rely on virtualization must take sprawl seriously or risk these consequences. Each unused VM wastes resources and introduces risks. To avoid virtualization sprawl, IT teams must take specific steps to address the unused VMs that already exist and to prevent more from being created.

How can you prevent VM sprawl?

To get VM sprawl under control, IT teams must stop the careless behavior that leads to sprawl and take a more proactive approach to VM lifecycle management. A good place to start is by implementing a comprehensive set of documented VM policies for controlling virtualization usage. The policies should help standardize the processes used to create, maintain, archive and destroy VMs so the unused ones are kept to a minimum. Users should be able to create VMs only when they're needed, and only the necessary physical resources should be allocated to the virtual servers to avoid overprovisioning.
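As a minimal sketch of how such a policy might be enforced, the Python function below checks a VM creation request before provisioning. The `VMRequest` fields, the resource limits and the maximum lifetime are hypothetical examples of policy rules, not settings from any particular virtualization platform.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical policy limits; each organization would set its own values.
MAX_VCPUS = 8
MAX_MEMORY_GB = 32
MAX_LIFETIME_DAYS = 180

@dataclass
class VMRequest:
    name: str
    owner: str
    justification: str
    vcpus: int
    memory_gb: int
    expires: date

def policy_violations(req: VMRequest, today: date) -> list[str]:
    """Return policy violations; an empty list means the request can proceed."""
    problems = []
    if not req.owner:
        problems.append("missing owner")
    if not req.justification.strip():
        problems.append("missing business justification")
    # Guard against overprovisioning by capping requested resources.
    if req.vcpus > MAX_VCPUS or req.memory_gb > MAX_MEMORY_GB:
        problems.append("requested resources exceed policy limits")
    lifetime = (req.expires - today).days
    if lifetime <= 0 or lifetime > MAX_LIFETIME_DAYS:
        problems.append(f"expiration must be 1-{MAX_LIFETIME_DAYS} days out")
    return problems

req = VMRequest("build-agent-3", "alice", "CI capacity for Q3 release", 4, 16, date(2021, 6, 1))
print(policy_violations(req, today=date(2021, 3, 1)))  # []
```

Requiring an owner and an expiration date at creation time gives later audits something concrete to check against, which is much harder to reconstruct after the fact.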

In addition to defining policies, IT should audit the existing VMs to determine which ones are actively operational and under the control of a virtualization platform and which ones aren't being used and can potentially be deleted or archived. The goal is to identify every VM on the network and document its usage, whether it's fully operational, running in an idle state or completely shut down. Admins should also evaluate the operational VMs to determine whether they conform to the newly defined policies and then take the steps necessary to bring them into compliance.
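An audit of this kind can be sketched as a classification pass over an exported VM inventory. The Python example below assumes each record carries a power state, an average CPU utilization and a last-login date; those field names and both thresholds are hypothetical, and a real audit would draw the values from the platform's own monitoring data.

```python
from datetime import date

IDLE_CPU_PERCENT = 5.0  # hypothetical: below this average, a running VM may be idle
IDLE_LOGIN_DAYS = 90    # hypothetical: no logins for this long also suggests idleness

def classify(vm: dict, today: date) -> str:
    """Classify one inventory record as 'shut down', 'idle' or 'operational'."""
    if vm["power_state"] != "on":
        return "shut down"
    days_since_login = (today - vm["last_login"]).days
    if vm["avg_cpu_percent"] < IDLE_CPU_PERCENT and days_since_login > IDLE_LOGIN_DAYS:
        return "idle"
    return "operational"

def audit(inventory: list, today: date) -> dict:
    """Group VM names by status so every VM in the inventory is documented."""
    report = {"operational": [], "idle": [], "shut down": []}
    for vm in inventory:
        report[classify(vm, today)].append(vm["name"])
    return report

inventory = [
    {"name": "web-01", "power_state": "on", "avg_cpu_percent": 42.0, "last_login": date(2021, 2, 27)},
    {"name": "test-07", "power_state": "on", "avg_cpu_percent": 0.8, "last_login": date(2020, 6, 1)},
    {"name": "poc-03", "power_state": "off", "avg_cpu_percent": 0.0, "last_login": date(2019, 11, 5)},
]
print(audit(inventory, today=date(2021, 3, 1)))
# {'operational': ['web-01'], 'idle': ['test-07'], 'shut down': ['poc-03']}
```

Requiring both low CPU usage and a long gap since the last login reduces false positives, since a VM that sits mostly idle but is still logged into regularly might be in use.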

Admins should carefully assess each unused VM to confirm it's no longer needed, and they shouldn't delete any VM until its status has been verified. That said, determining whether a VM is still needed isn't always straightforward; sometimes the only way to find out is to shut it down or disconnect it temporarily and see whether anyone objects. Proceed with caution. Some VMs might appear to be out of service but still serve an important function, if only part of the time.
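This cautious approach can be expressed as a staged process in which a VM is powered off first and only archived after a waiting period passes with no objections. The Python sketch below assumes a hypothetical 30-day grace period and hypothetical record fields (`powered_off_on`, `objection_raised`); it only decides the next step and doesn't act on any real VM.

```python
from datetime import date, timedelta

GRACE_PERIOD = timedelta(days=30)  # hypothetical waiting period after shutdown

def next_action(vm: dict, today: date) -> str:
    """Decide the next safe step for a VM flagged as unused.

    The VM is never deleted directly: it is powered off first, and only after
    the grace period passes with no objection is it archived.
    """
    if vm.get("objection_raised"):
        return "restore and mark as in use"
    if vm.get("powered_off_on") is None:
        return "power off and notify owner"
    if today - vm["powered_off_on"] < GRACE_PERIOD:
        return "wait"
    return "archive, then schedule deletion"

candidate = {"name": "test-07", "powered_off_on": date(2021, 1, 15)}
print(next_action(candidate, date(2021, 2, 1)))  # wait (17 days elapsed)
print(next_action(candidate, date(2021, 3, 1)))  # archive, then schedule deletion
```

Archiving before deletion leaves a recovery path for VMs that turn out to serve an occasional function, such as a quarterly reporting job.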

Once it's determined that certain VMs are no longer in use, they can be archived or destroyed. When destroying VMs, admins should ensure no sensitive data can be compromised. They should also look for any VM file fragments that got left behind, as well as secondary files such as temporary or configuration files. In addition, they should search for orphaned snapshots or backups and delete those in a secure manner once they've verified that they're no longer needed.
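Finding leftover files is essentially a set difference between what exists on storage and what registered VMs still reference. The Python sketch below assumes both inputs have already been collected, for example from a storage listing and the platform's VM inventory; the paths and file extensions are made up for illustration, and anything the function reports should still be verified before it's securely deleted.

```python
def find_orphans(datastore_files, registered_vms):
    """Return datastore files not referenced by any registered VM.

    datastore_files: iterable of file paths found on the storage volume.
    registered_vms: mapping of VM name -> set of file paths that VM uses
    (virtual disks, configuration files, snapshots).
    """
    referenced = set()
    for files in registered_vms.values():
        referenced |= set(files)
    return sorted(set(datastore_files) - referenced)

files = [
    "/vol1/web-01/web-01.disk",
    "/vol1/web-01/web-01.cfg",
    "/vol1/old-db/old-db.disk",
    "/vol1/old-db/old-db-snap1.disk",
    "/vol1/tmp/clone.tmp",
]
vms = {"web-01": {"/vol1/web-01/web-01.disk", "/vol1/web-01/web-01.cfg"}}
print(find_orphans(files, vms))
# ['/vol1/old-db/old-db-snap1.disk', '/vol1/old-db/old-db.disk', '/vol1/tmp/clone.tmp']
```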

Concurrent with cleaning up the virtual environment, IT teams should take several other steps as part of their VM lifecycle management plan. They might implement practices such as the following:

Monitor systems for signs of VM sprawl, such as unexplained performance slowdowns or server logs that show no login activity.
Use the advanced VM management features available through their virtualization platforms, such as automatically decommissioning VMs based on specified expiration dates.
Implement VM tagging to more easily track and inventory VMs after they've been deployed.
Establish a VM baseline and then perform regular audits at assigned intervals.
Assign costs to internal customers for VM usage, using cost models such as chargeback or showback so business groups are more aware of VM-related expenses.
Implement strict access controls that limit the number of users who can create VMs or reallocate resources to existing VMs.
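Tagging and expiration dates can work together. In the minimal Python sketch below, each VM in an exported inventory carries tags, and the function flags VMs that are missing any of a hypothetical set of required tags or whose `expires` tag (assumed to be an ISO date string) has passed. It doesn't call any platform API, and flagged VMs would still need to be verified before they're decommissioned.

```python
from datetime import date

REQUIRED_TAGS = {"owner", "project", "expires"}  # hypothetical tagging standard

def review_tags(vms: list, today: date) -> dict:
    """Flag VMs missing required tags or whose 'expires' tag has passed."""
    untagged, expired = [], []
    for vm in vms:
        tags = vm.get("tags", {})
        if not REQUIRED_TAGS.issubset(tags):
            untagged.append(vm["name"])
        elif date.fromisoformat(tags["expires"]) < today:
            expired.append(vm["name"])
    return {"missing_tags": untagged, "expired": expired}

fleet = [
    {"name": "web-01", "tags": {"owner": "alice", "project": "store", "expires": "2021-12-31"}},
    {"name": "test-07", "tags": {"owner": "bob", "project": "qa", "expires": "2021-01-31"}},
    {"name": "poc-03", "tags": {"owner": "carol"}},
]
print(review_tags(fleet, date(2021, 3, 1)))
# {'missing_tags': ['poc-03'], 'expired': ['test-07']}
```

Running a check like this at each scheduled audit turns the VM baseline into something that can be compared over time.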

To maintain control over their VMs, IT teams also need the proper tools to manage and monitor VM operations across their networks. The right tools can offer insights into the entire VM ecosystem, providing information such as how many VMs are running, who owns the VMs, which computers are hosting the VMs or where VM data is stored. Many tools can also track details about VM software and OS licenses, and some tools also offer advanced automation and orchestration capabilities to help streamline management operations and reduce VM sprawl.
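Even without a dedicated tool, a basic version of this visibility can be sketched from an exported inventory. The Python function below counts total and running VMs and groups them by owner and host; the record fields and host names are hypothetical, and VMs with no recorded owner are reported as `unknown`.

```python
from collections import Counter, defaultdict

def summarize(inventory: list) -> dict:
    """Summarize an exported VM inventory: totals, VMs per owner and per host."""
    running = sum(1 for vm in inventory if vm["power_state"] == "on")
    per_owner = Counter(vm.get("owner", "unknown") for vm in inventory)
    per_host = defaultdict(list)
    for vm in inventory:
        per_host[vm["host"]].append(vm["name"])
    return {
        "total": len(inventory),
        "running": running,
        "per_owner": dict(per_owner),
        "per_host": dict(per_host),
    }

inventory = [
    {"name": "web-01", "owner": "alice", "host": "host-a", "power_state": "on"},
    {"name": "test-07", "owner": "bob", "host": "host-a", "power_state": "on"},
    {"name": "poc-03", "host": "host-b", "power_state": "off"},
]
summary = summarize(inventory)
print(summary["running"], summary["per_owner"])
# 2 {'alice': 1, 'bob': 1, 'unknown': 1}
```

VMs that show up under `unknown` are good candidates for the next audit, since no one is accountable for them.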

With the right management tools, along with well-defined policies, IT teams can overcome their VM sprawl challenges, but they must first recognize the seriousness of the problem and then be willing to take the steps necessary to properly address it.

This was last updated in March 2021