Outages and glitches seem to be becoming more and more frequent, with a wide-reaching ripple effect ensuring that the impact of such outages is felt more widely than ever. This is dangerous for business: reputation is so fragile in this fickle economy
that businesses just can’t afford to allow their IT to let them down. So why does it keep happening? Amazon’s cloud going down, NatWest’s two-week outage, the BATS IPO failure: these outages come in many different forms, and affect many different types of
businesses. The recent
United Airlines outage makes United the first widely reported airline to have fallen foul of this, and it suffered the consequences in the cost of reimbursing flights and in Twitter outrage. It’s beginning to look like outages are becoming something that we have to get used to.
I understand that the nature of IT these days means that outages are almost certainly unavoidable, but the point I want to make is that, in order to maintain business continuity, enterprises need to
take responsibility for planning their outage contingencies.
The problem is that business processes, applications and computing infrastructure are too intertwined and dependent on each other. If the infrastructure isn’t configured just right or is unavailable, the business process stops. The industry has made great
strides in abstracting the physical computing infrastructure from the applications it supports. Amazon and VMware have created tremendous value and built businesses by abstracting (or insulating) applications and users from hardware diversity and failures.
However, the industry has only started to abstract the business process from the applications and infrastructure that support it. To work around an outage on the scale of Amazon EC2, organizations really need to utilize more than one provider to avoid
a single point of failure. Yet for the business to succeed at this, it needs the ability to re-route and re-run the process in its own data center or at an alternative service provider. This is where higher-level process automation comes in.
The recent outages at RBS, BATS Global Markets and others demonstrate the inability not only to abstract the process from the infrastructure but also to see the inter-dependencies and the failures that plague complex IT systems. In those particular outages,
it took minutes to fix the problem but days to find it.
Process automation that keeps track of the complex inter-dependencies between applications, infrastructure and business workflows can help identify, or even predict, problems. Then, in the case of an unavoidable outage, the business workflows would be re-routed
to an available data center.
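
As a rough illustration of the idea, the sketch below tracks dependencies between workflows, applications and infrastructure, finds which business workflows a failed component touches, and re-routes them to a data center that is still up. Every name in it (the dependency map, the workflow names, the data center labels, and the functions affected_workflows and reroute) is hypothetical and invented for this example; it is a minimal sketch under those assumptions, not a description of any particular automation product.

```python
from collections import deque

# Hypothetical dependency map: each item lists what it directly depends on.
DEPENDS_ON = {
    "order-processing": ["billing-app", "inventory-app"],
    "payroll": ["hr-app"],
    "billing-app": ["db-cluster-east"],
    "inventory-app": ["db-cluster-east", "cache-east"],
    "hr-app": ["db-cluster-west"],
}
# Hypothetical set of items that are business workflows rather than IT components.
WORKFLOWS = {"order-processing", "payroll"}


def affected_workflows(failed, depends_on=DEPENDS_ON, workflows=WORKFLOWS):
    """Return the workflows that depend, directly or indirectly, on `failed`."""
    # Invert the map so we can walk from a component up to its dependents.
    dependents = {}
    for item, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(item)

    # Breadth-first walk upwards from the failed component.
    seen, queue = {failed}, deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(seen & workflows)


def reroute(workflow_names, data_centers):
    """Assign each workflow to the first data center marked as available."""
    available = [name for name, is_up in data_centers.items() if is_up]
    if not available:
        raise RuntimeError("no data center available")
    return {workflow: available[0] for workflow in workflow_names}
```

With this map, a failure of "db-cluster-east" reaches "order-processing" through both of its applications, so that workflow is the one to re-route; "payroll" is untouched. A real system would need to discover and maintain the map automatically, which is the hard part the paragraph above alludes to.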
Most process automation done today covers low-level IT administrative tasks: provisioning servers, handling backup or startup routines, and generally doing infrastructure tasks that require little decision-making that could affect the line of business. This
is necessary and important, but not sufficient to preserve the user experience or business process integrity in the face of increasingly complex IT environments where, statistically, something is always failing.
Enterprises must step up their IT process automation to the point that they can manage business workflows, not just servers or IT tasks.
If the businesses dependent on Amazon had had these capabilities, they would have drastically reduced the outages they experienced. Orchestrating business workflows and associated data across applications and infrastructure is easier said than done. However, it can be done,
and is being done, by many enterprises to assure service levels.
Rolling back failed system updates to previous working versions, spotting process failures before they create an unrecoverable backlog, and running a workflow on newly provisioned environments are the types of higher-level process automation
that abstract inevitable outages from the user or business experience.
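
Two of these capabilities can be sketched in a few lines. The example below is a minimal illustration only: the function names apply_update and backlog_growing, the health_check callback, and the three-sample window are all assumptions made up for this sketch, and a production system would do far more (for instance, restoring data as well as code).

```python
def apply_update(versions, new_version, health_check):
    """Deploy new_version; roll back to the previous version if the check fails.

    `versions` is a list of deployed versions, newest last, and is assumed to
    already contain at least one known-good version. Returns the version that
    is live afterwards.
    """
    versions.append(new_version)
    if not health_check(new_version):
        versions.pop()  # roll back: the previous working version is live again
    return versions[-1]


def backlog_growing(queue_depths, window=3):
    """Return True if the queue depth rose on each of the last `window` steps.

    A steadily rising queue is treated here as an early warning that a process
    is failing, before the backlog becomes unrecoverable.
    """
    recent = queue_depths[-(window + 1):]
    if len(recent) < window + 1:
        return False  # not enough samples to judge a trend
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

The rollback keeps the previous version in the list precisely so that a failed update never leaves the workflow without something runnable, and the backlog check looks at the trend rather than a fixed threshold so that it can fire while recovery is still possible.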
As enterprises get more serious about higher-level process automation, they will spend less time bemoaning outages and more time abstracting their processes from specific infrastructures and application environments.
Ready or not, the business is doing whatever it can to gain a competitive edge in today’s market by becoming more agile and responding to a quickly changing market and customer base. As business and IT people work together to create new internal capabilities
and customer-facing features to outmaneuver the competition, this means developing exponentially more software applications at a faster pace – and being able to launch them quickly on highly virtualized infrastructure. Speed often translates into a lack of
organization and infrastructure sprawl, while agile IT practices result in more fluidity as to where applications actually run. All of this causes IT complexity and application-to-infrastructure dependencies to skyrocket.
These inter-dependencies, which represent potential breakage points, are beyond human ability alone to manage. IT organizations are now forced to deal with these new realities while Cloud, Big Data, DevOps and ITaaS pressures get added to the mix in the
name of providing more business agility. With all these moving parts, something needs to be stable and act as the IT backbone. It’s increasingly obvious that this backbone is the process and process control.
The days of designing the process to accommodate the shortcomings of infrastructure are over. Enterprises must abstract, insulate and protect their business processes from the applications and infrastructures that support them. The need for improved IT
process automation is rising as the service and brand impact of online outages grows.