The SEC's proposed Regulation SCI (Systems Compliance and Integrity) may require further strengthening, but principles such as the mandatory testing of disaster recovery procedures would ensure that financial organisations recognise some of the key weaknesses in their system infrastructures.
One example of system failure cited by the SEC was the software glitch that forced BATS Global Markets to cancel its IPO on its own exchange. BATS, like many other operators in the securities market, is no stranger to unexpected failures. At the end of 2011, trading at BATS Chi-X was halted for an entire day after a hardware failure. Whilst the exact cause of the failure was not confirmed, similar incidents have been attributed to network devices.
In 2012, overconfidence in the reliability of the Tokyo Stock Exchange's systems led to a halt in trading for several hours when a switchover procedure, triggered by a hardware failure, was unsuccessful.
Looking at the recent outages at RBS and NatWest, it's easy to see that whilst networks should be designed with failure in mind, outages still affect even the most resilient of them. The problem for many organisations is that they have traditionally concentrated on protecting their data and implementing automated failsafes, but have ignored the nuts and bolts (the network) that hold everything together.
As companies deliver more real-time services over the network, the outage stakes have risen and network configurations need to be better managed. With misconfiguration a contributing factor in over 65% of network outages, business leaders need to understand the risks of even a small failure and have a strategy to recover.
Even when a hardware outage occurs, the trouble often begins just when the IT team thinks the panic is nearly over. With the failed hardware replaced, all that's left is to restore the settings. It should be a simple matter of a few clicks, but this is usually the moment an organisation discovers that its backup of the last working configuration is out of date.
A typical infrastructure might have hundreds of network devices from dozens of different vendors, each requiring manual intervention by skilled engineers to back up the configuration settings that drive the network. Because it is a time-consuming task, network configuration backups are often set aside in favour of other business-as-usual activities. As network and security devices such as firewalls change fairly frequently, unforeseen risks and compliance failings build up as the gap between backup cycles grows.
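That accumulating gap can be made visible in practice: a simple diff between the last stored backup and the device's running configuration shows exactly how much drift has built up since the backup was taken. A minimal sketch of the idea, assuming text-based device configurations (the hostname and firewall rules below are invented purely for illustration):

```python
import difflib

def config_drift(backup: str, running: str) -> list:
    """Return unified-diff lines between the last backup and the running
    configuration; an empty list means the backup is still current."""
    return list(difflib.unified_diff(
        backup.splitlines(), running.splitlines(),
        fromfile="last_backup", tofile="running", lineterm=""))

# Hypothetical firewall configs: the backup predates one rule change.
backup = "hostname fw1\npermit tcp any any eq 443\n"
running = "hostname fw1\npermit tcp any any eq 443\npermit tcp any any eq 8080\n"

drift = config_drift(backup, running)
```

Here a non-empty `drift` would flag the device for a fresh backup before the stale copy is ever needed in anger.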
When an outage strikes, whether through hardware failure or human error, the affected organisation's network engineers are against the clock to resume normal operations. Without a current backup, they are often faced with making live changes to the network to rebuild configurations to their last known state.
Even where engineers have created scripts to automate the configuration backup process, recovery operations are seldom tested. The recovery process is also usually manual, requiring skilled engineers to be available. Without centralised automation of both the backup and the recovery of network device configurations, delays in restoring systems and costly downtime are inevitable.
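The shape of that centralised automation is simple in principle: a single store that keeps a timestamped history of snapshots per device and can hand back the last known good configuration on demand. A toy sketch, with an in-memory dictionary standing in for a real configuration management system (the device name and rules are hypothetical, and pushing the restored text back to hardware is out of scope):

```python
from datetime import datetime, timezone

class ConfigStore:
    """Toy centralised store: timestamped configuration snapshots per
    device, with recovery to the most recent snapshot."""

    def __init__(self):
        self._history = {}  # device name -> list of (timestamp, config)

    def backup(self, device: str, config: str) -> None:
        """Record a new snapshot for the device."""
        self._history.setdefault(device, []).append(
            (datetime.now(timezone.utc), config))

    def restore(self, device: str) -> str:
        """Return the most recent snapshot; in a real system this text
        would then be pushed to the replacement device."""
        if not self._history.get(device):
            raise LookupError("no backup recorded for " + device)
        return self._history[device][-1][1]

store = ConfigStore()
store.backup("edge-fw1", "hostname edge-fw1\npermit tcp any any eq 443\n")
store.backup("edge-fw1",
             "hostname edge-fw1\npermit tcp any any eq 443\n"
             "permit udp any any eq 53\n")
restored = store.restore("edge-fw1")
```

Because both backup and restore run through one automated path, the recovery procedure itself can be exercised routinely, which is precisely what mandated disaster recovery testing would demand.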
Whilst organisations are generally well prepared with regard to their server infrastructure, network devices are often overlooked, and inadequacies in business continuity plans only come to light when a device needs to be restored. The events that trigger a disaster recovery situation are rarely predictable. By mandating the testing of disaster recovery procedures, Reg SCI will ensure that whilst the cause of the disaster can't be predicted, the recovery path can.