Nasdaq OMX blames software bug for outage

Nasdaq OMX says that last week's three-hour outage was caused by a software flaw triggered by a flood of messages from rival Nyse Arca.

In its preliminary report on the meltdown, Nasdaq OMX says that it was forced to halt trading because of a "confluence of unprecedented events" that overwhelmed its Securities Information Processor (SIP).

The problems began when the SIP received more than 20 connect and disconnect sequences from Nyse Arca, deluging the system with far more messages than its tested capacity. In addition, Nasdaq OMX says that available capacity was further reduced because the SIP received a stream of quotes for inaccurate symbols from Nyse Arca and generated quote rejects in response.

During this period, Nyse Arca sent multiple bursts with each connect and disconnect, topping 26,000 quote updates per port, per second as it attempted to reconnect. By comparison, a typical August day for Nyse Arca would peak at fewer than 1,000 messages per port, per second.

The traffic caused the SIP's failure and exposed a "latent flaw" in the system's software code which prevented its built-in redundancy capabilities from failing over cleanly, and delayed the return of system messages to users.

Nasdaq OMX decided that the SIP system's ability to process quotes was so degraded that it was in the "broader public interest" to shut down. Data feeds were back up and running within 30 minutes, but the resumption of trading was delayed while market participants were consulted and tests carried out.

Despite pointing the finger at Nyse Arca, the report admits that "a number of these issues were clearly within the control of Nasdaq OMX" and says that "our performance is unacceptable to our members". The exchange operator is working to improve the SIP's resiliency and looking into how it communicates with the market during future glitches.

However, warns the report: "Other issues contributing to the halt are more endemic to technology issues across today's complex markets and will require a broader industry-wide effort to resolve."

Comments: (1)

A Finextra member, 02 September, 2013, 09:07

It almost seems as if, for the sake of high frequency trading, the technical architectures of exchanges have been made less reliable than they used to be. Where there used to be big iron keeping trading functions under central control, we now find pretty complex server farms and a lot of dispersed functionality optimized for sheer speed, with the risk that part X of the system does not always know what part Y of the system is doing, although it really should. Strong protocols and proper flow control, including between exchanges, could prevent overload situations. Those trading systems might be somewhat slower, but they would be much more reliable.
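
As a rough illustration of the kind of flow control the comment has in mind, a receiver can cap how many messages it accepts per port with a token bucket. This is only a sketch under stated assumptions, not a description of how the SIP or any exchange actually works: the `TokenBucket` class, the `simulate` helper, and the limit of 1,000 messages per second (borrowed from the typical peak quoted in the article) are all hypothetical.

```python
class TokenBucket:
    """Hypothetical per-port limiter: refills `rate` tokens per second, up to `burst`."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate      # tokens added per second
        self.burst = burst    # maximum tokens that can be stored
        self.tokens = burst   # start with a full bucket
        self.last = 0.0       # timestamp of the previous call, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a message arriving at time `now` may be accepted."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def simulate(offered_per_second: int, rate: float = 1000.0, burst: float = 1000.0) -> int:
    """Offer evenly spaced messages over one second and count how many are accepted."""
    bucket = TokenBucket(rate, burst)
    accepted = 0
    for i in range(offered_per_second):
        if bucket.allow(now=i / offered_per_second):
            accepted += 1
    return accepted


if __name__ == "__main__":
    print("typical load accepted:", simulate(500))
    print("burst load accepted:  ", simulate(26_000))
```

With these assumed settings, a load below the limit passes untouched, while most of a 26,000-message burst is refused at the port instead of being queued inside the system. The price is the one the comment names: a sender that exceeds its allowance has to slow down or retry.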