The Reserve Bank of Australia has lifted the lid on a serious outage in October that delayed the settlement of some real-time payments by up to five days.
A post-mortem on the incident reveals that at 19:00 on 12 October an "operational error" occurred during a planned bank-wide change using the software that provisions the RBA’s virtual servers.
The error triggered a process that disrupted a significant number of servers in a random pattern over a period of approximately 25 minutes.
"The scale of servers affected was caused by a failure to comply with the RBA’s Technology Change Management policy and control gaps associated with the virtual server solution design contributed to the rapid propagation of the error," states the central bank. "The incident affected multiple systems across the RBA.
"While the strong redundancy features of RITS and FSS enabled parts of the system to continue operating normally, some services became unavailable and the resilience of the system was severely degraded. The scale and haphazard pattern of disruption significantly complicated the incident response."
As a result, around 500,000 unique New Payments Platform (NPP) payments sent by the public (17 per cent of the daily average volume for a Wednesday) were delayed by at least four hours, with some delayed by more than five days.
The report pinpoints a number of serious weaknesses in the NPP, covering governance, monitoring of payment flows, recovery procedures and communications.
In a statement, the red-faced central bank says: "The RBA acknowledges the seriousness of this incident and sincerely apologises to industry participants and their customers for the widespread repercussions it caused."