Disparate infrastructure management and automation tools traditionally collect their data in silos: security/compliance, DevOps, cloud orchestration, workload automation and infrastructure management/provisioning. Pretty soon, you realise you’re a victim of tool sprawl and data creep, and a prisoner of your own device! In effect, you’ve checked into the Hotel California, but your data can never leave!
If you’re adding monitoring tools one project or problem at a time, then your dev teams are most likely having to write scripts to automate standard IT operations tasks and monitoring, which reinforces the silo mentality that bleeds into tool adoption. Cloud and virtualisation initiatives across converged or shared environments add multiple layers of complexity to the management and automation of the underlying physical IT infrastructure. As a result, automated provisioning and ongoing management of workloads – applications and IT services – are also becoming overly complex and expensive to maintain.
The key to successful infrastructure management and automation is to implement ‘service-aware’ tools, which understand the performance context of applications or IT services. Some tools that are designed specifically to solve one silo problem, such as network automation, lack the broader cross-domain context that would make them practical for automating virtualised services, which typically include server, storage and networking components.
Without that service-aware, cross-silo intelligence in virtualisation and hybrid-cloud environments, admins can easily lose visibility into degrading conditions. Performance storms are created by unintended toxic interactions among cross-silo shared resources in the converged data centre. A performance storm entangles multiple objects – such as VMs, hosts and applications – even if they are unrelated, and this entanglement often has a dramatically adverse effect on overall infrastructure performance and end-user quality of experience. Some of the most common performance storms include:
Storage storms: typically occur when applications unknowingly and excessively share a datastore, which causes storage performance to deteriorate
Memory storms: can occur when multiple VMs are trying to share an insufficient amount of memory or, in other cases, when one VM is ‘hogging’ memory and not leaving enough for the others, even with ballooning in place
CPU storms: caused when there aren’t enough CPU cycles or virtual CPUs to go around in the sharing of processing resources, leaving some VMs with more and others with less
Network storms: usually occur when too many VMs are attempting to communicate at the same time on a specific interface or when a few VMs are ‘hogging’ a specific interface with traffic – limiting the ability of other VMs to send or receive data.
To see what is causing a performance storm, you need visibility not only into how objects are consuming cloud resources but also into how they are interacting with other objects within the infrastructure. Consumptive silo-specific alerts, using a combination of system-learned heuristics and sophisticated algorithms, point to the effects of performance storms – an impacted application or VM, for example – while interactional cross-silo alerts give the details needed to accurately identify and resolve the source of the problem.
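The difference between the two alert types can be illustrated with a minimal Python sketch of a storage storm. Everything in it is an assumption made for illustration: the `Sample` fields, the VM and datastore names, and the fixed thresholds are hypothetical, and simple threshold checks stand in for the system-learned heuristics that a real tool would apply. The consumptive check looks at each VM in isolation, while the interactional check groups VMs by the datastore they share and names the one generating most of the load.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical measurement for a VM on a shared datastore."""
    vm: str
    datastore: str
    iops: float        # I/O operations per second issued by the VM
    latency_ms: float  # storage latency observed by the VM


def consumptive_alerts(samples, latency_limit_ms=20.0):
    """Silo-specific view: list the VMs whose own latency exceeds the limit."""
    return [s.vm for s in samples if s.latency_ms > latency_limit_ms]


def interactional_alerts(samples, latency_limit_ms=20.0, share_limit=0.5):
    """Cross-silo view: for each datastore with impacted VMs, name the VMs
    that issue more than share_limit of that datastore's total IOPS."""
    by_store = defaultdict(list)
    for s in samples:
        by_store[s.datastore].append(s)

    alerts = {}
    for store, group in by_store.items():
        impacted = [s.vm for s in group if s.latency_ms > latency_limit_ms]
        total_iops = sum(s.iops for s in group)
        if not impacted or total_iops == 0:
            continue  # nothing is suffering on this datastore
        sources = [s.vm for s in group if s.iops / total_iops > share_limit]
        if sources:
            alerts[store] = {"sources": sources, "impacted": impacted}
    return alerts


# Assumed sample data: vm-a dominates ds-1, while ds-2 is healthy.
samples = [
    Sample("vm-a", "ds-1", iops=9000, latency_ms=25.0),
    Sample("vm-b", "ds-1", iops=500, latency_ms=40.0),
    Sample("vm-c", "ds-1", iops=500, latency_ms=35.0),
    Sample("vm-d", "ds-2", iops=800, latency_ms=5.0),
]

print(consumptive_alerts(samples))
print(interactional_alerts(samples))
```

With this sample data, the consumptive check only reports that three VMs on ds-1 are slow, whereas the interactional check also points to vm-a as the likely source of contention on the shared datastore.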
To deliver these interactional alerts, and reveal the usage-overload interactions that may be occurring between different objects, you must incorporate a cross-silo analysis of the entire end-to-end infrastructure, cutting across network, server and storage tiers, as well as applications, end-points and end-users, to provide a service-aware context.
Furthermore, this cross-silo analysis needs to mesh and scale so that you can easily view the distant and proximate areas of impact for a given storm, as well as the source of contention and the resources affected. Only by visualising and analysing the cross-silo interactions can you accurately identify the trends and patterns of interactions that are causing the storm.
For sys admins, this ‘could be Heaven or this could be Hell’. But there is another way with live, granular, cross-silo data.
About the author
Atchison Frazer (@AtchisonFrazer) is VP of Marketing at Xangati. He was previously CMO of KEMP Technologies and CMO of Gnodal (now part of Cray). He has also held senior marketing and communications leadership roles with Cisco, ServGate and HP.