IBM employee fingered as culprit in massive DBS outage

An IBM employee has been fingered as the culprit behind a seven-hour system-wide outage that knocked out all consumer and business banking services and ATM and POS transactions at Singapore's DBS Bank on Monday.

In a letter posted on the bank's website, DBS CEO Piyush Gupta says the outage was triggered during a routine repair job on a component within the disk storage subsystem connected to the bank's mainframe.

"So far, we understand from IBM that an outdated procedure was used to carry out the repair," says Gupta. "In short, a procedural error in what was to have been a routine maintenance operation subsequently caused a complete system outage."

IBM and DBS entered into a S$1.2bn agreement in 2002 in which the bank outsourced IT services and infrastructure in Singapore and Hong Kong to IBM.

Gupta says that all payments and transactions that were scheduled to be made on 5 July were completed. "Nothing was held over and full data integrity was maintained at all times," he says.

He continues: "I am treating this matter with utmost priority and the full scale investigation that we initiated last week is still underway. This investigation is being done with the support of IBM's labs in the US and their engineering teams in Asia."

In a statement, IBM says it has taken steps "to enhance training of our personnel related to current procedures and brought in experts from our global team to provide further assistance."

In addition, IBM and DBS are taking "additional actions to increase the resiliency and redundancy of this part of DBS' infrastructure."

Comments: (4)

A Finextra member, 15 July, 2010, 07:01

A seven-hour outage because of an error in a routine repair job sounds outrageous, but that's the way banking systems and the related IT infrastructure are. A seemingly small incident or bug can paralyse a bank's entire network of ATMs, branches and online banking.

A Finextra member, 15 July, 2010, 10:49

Not all banking systems are created equal ...

Many banks worldwide trust their ATM and POS operations to fault-tolerant systems that are designed to keep running despite the failure of any hardware or software component - but conventional mainframes are not built that way.

As a further consequence, such fault-tolerant systems are also built for very simple online repair, without requiring elaborate procedures to be followed. This greatly reduces the risk of human error - which evidently hit DBS Bank badly in this case.

Ketharaman Swaminathan - GTM360 Marketing Solutions - Pune, 15 July, 2010, 13:11

Using fault-tolerant hardware is probably the best option, but its higher costs would be justifiable only if banks are able to ascribe a $ value to the cost of such outages, including reputation loss.

A quicker and more pragmatic solution could be the adoption of a "4-eye" process, in which the two eyes of the 'maker' (the IBM employee in this case) are supplemented by two more eyes of a 'checker', who could be a bank employee.

Although it is by no means 100% foolproof, experience at another Tier-1 bank suggests that such a process drastically reduces the chances of a DBS-type event occurring.


A Finextra member, 15 July, 2010, 13:43

According to a study performed by Standish Group in 2008, the average cost of downtime for ATM operations is US$3,600 per minute, and for POS operations it is US$4,700 per minute. Multiplying the combined US$8,300 per minute by 420 minutes of downtime results in a financial loss of roughly 3.5 million dollars, which is more than an adequate fault-tolerant system would cost. As DBS is certainly a bit larger than the "average bank", their losses may have been even higher.

By the way, reputation loss is not considered in the above figures.
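
The estimate above can be reproduced with a short sketch. It assumes, as the comment does, that both ATM and POS channels were down for the full seven hours and that the quoted average per-minute costs apply to DBS; the function and constant names are illustrative only.

```python
# Per-minute downtime costs as quoted in the comment
# (attributed there to a 2008 Standish Group study).
ATM_COST_PER_MINUTE_USD = 3_600
POS_COST_PER_MINUTE_USD = 4_700

# Seven-hour outage, expressed in minutes.
OUTAGE_MINUTES = 7 * 60


def downtime_cost(per_minute_costs, minutes):
    """Total cost, assuming every channel is down for the whole duration."""
    return sum(per_minute_costs) * minutes


total = downtime_cost(
    [ATM_COST_PER_MINUTE_USD, POS_COST_PER_MINUTE_USD],
    OUTAGE_MINUTES,
)
print(f"Estimated loss: US${total:,}")  # Estimated loss: US$3,486,000
```

The result, about US$3.5 million, covers only the two channels listed and leaves out online and branch banking as well as reputation loss.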

The suggested "four eyes" procedure may prove impractical in real life, as it requires additional bank employees trained in hardware maintenance. They would have little practical experience compared with the vendor's technicians, who deal with such incidents much more often because they serve a number of different customers. So when in doubt, the opinion of the vendor's technician is likely to prevail anyway ...