Systemic risk was at the heart of the financial crisis of 2008, and is again on everyone's mind as the current sovereign debt crisis unfolds. Regulatory and industry efforts are, therefore, focusing on getting a more accurate view of risk exposures across
asset classes, lines of business and firms in order to better predict and manage systemic interplays.
Managing large amounts of data (positions, reference data, market data, and so on) is a key aspect of these efforts, and is one of the reasons data management has recently ascended to top-of-mind status after being relegated to the back burner for many years.
As an industry, we are finally at a point where data is considered a key input into business processes, with the full awareness of the top executive ranks. Efficient allocation of capital is now seen as a major competitive advantage, and risk-adjusted performance
carries more weight than ever before. And so, from the front office all the way to the boardroom, everyone is keen on getting holistic views of exposures and positions, which require fast, on-demand, aggregated access to disparate data.
Most Big Data discussions have been focusing on Internet companies such as Google and Facebook and the data they generate. There's been a lot of attention given to harnessing that data for commercial goals, and certainly the banking industry is examining
these usage scenarios as it considers its future direction.
I would argue, however, that a more urgent task associated with Big Data is the one mentioned earlier: managing the large amounts of financial data that have been sitting behind most firms' firewalls without being used to address critical business
concerns. The rest of this post will attempt to build a case for this argument, with following posts focusing on specific technology implications.
Being a recent darling of IT analysts, Big Data has had many definitions, but its key aspects are typically categorized along the "four Vs": volume, velocity, value, and variety.
Let's examine each in more detail:
The web is not the only place seeing exponential growth in data volumes. Our industry has witnessed exponential growth in trade data, beginning with the early days of electronic markets and skyrocketing with the widespread use of algorithmic, program,
and high-frequency trading. These generate orders of magnitude more execution orders and cancels than the "quaint days of open outcry." Additionally, complex strategies, including cross-asset trading and instruments such as structured products, generate
far more data per trade than simple ones.
Higher trade volumes mean higher market data volumes, of course, but also much larger amounts of historical tick data and positions data that need to be retained. New regulations require ever more extensive data retention, and sophisticated strategy development
requires ever-growing amounts of historical tick data for backtesting.
Many systems are struggling to keep up with these vast amounts of data while still performing their tasks - whether it's risk management, regulatory reporting, trade processing or analytics.
Higher volumes are not the only issue firms face today; the data is also coming at them at higher and higher speeds as a result of low-latency and high-frequency trading. At the same time, data needs to be culled from source systems at ever-increasing speeds.
It is the latter aspect that's been getting a lot of attention lately, as new regulations become much more stringent about timely delivery of data, essentially mandating on-demand risk exposure and positions reporting.
Most current systems are ill-prepared to meet these requirements, making the notion of on-demand exposure reporting seem all but impossible. Many use long ETL data integration and batch calculation cycles to generate reports overnight, and are completely
incapable of supporting an ad-hoc analytics model.
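To make the contrast with overnight batch cycles concrete, here is a minimal sketch of the alternative: keeping running exposure totals up to date as each position change arrives, so that an on-demand query becomes a simple lookup. Everything in it is an assumption made for illustration, including the `PositionUpdate` and `ExposureBook` names, the sample counterparties, and the use of signed notional in a single currency as a stand-in for exposure, which real risk systems would replace with proper risk measures.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionUpdate:
    """A hypothetical position change arriving from a source system."""
    counterparty: str
    asset_class: str
    notional_change: float  # signed change, assumed to be in one base currency


class ExposureBook:
    """Keeps running totals so queries need no batch recalculation."""

    def __init__(self) -> None:
        self._by_counterparty = defaultdict(float)
        self._by_asset_class = defaultdict(float)

    def apply(self, update: PositionUpdate) -> None:
        # Update every aggregate as the event arrives, not overnight.
        self._by_counterparty[update.counterparty] += update.notional_change
        self._by_asset_class[update.asset_class] += update.notional_change

    def exposure_to(self, counterparty: str) -> float:
        # .get avoids creating empty entries for unknown counterparties.
        return self._by_counterparty.get(counterparty, 0.0)

    def exposure_in(self, asset_class: str) -> float:
        return self._by_asset_class.get(asset_class, 0.0)


book = ExposureBook()
for update in [
    PositionUpdate("Bank A", "IRS", 10_000_000.0),
    PositionUpdate("Bank A", "CDS", 2_500_000.0),
    PositionUpdate("Bank B", "IRS", -4_000_000.0),
]:
    book.apply(update)

print(book.exposure_to("Bank A"))
print(book.exposure_in("IRS"))
```

The trade-off in this sketch is that each aggregate must be chosen up front; supporting truly ad-hoc questions would still require fast access to the underlying position data.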
Low value of the overall data set, or low information density, is another key aspect of Big Data. Just as Twitter feeds contain a lot of "noise" when you're interested in analyzing a specific public sentiment, for instance, financial data can have a very low
"signal-to-noise ratio" when you're looking to analyze a specific market exposure, find correlations between unrelated variables, and so on.
Low information density puts an even greater burden on current analytics systems, as more and more data needs to be sifted through to get at the relevant information. In many cases, this can make traditional approaches to analytics fall apart.
Lastly, information variety has to do with loosely structured data. And while this is quite clear when it comes to images and videos on the Web, within the financial services industry we've had a data variety challenge for quite some time, which has actually
been getting a lot of attention lately. I'm referring, of course, to OTC derivatives: essentially contracts that have little in the way of structured data, and which were at the center of the financial crisis of 2008.
A lot of regulatory effort has been focusing on these instruments, attempting to make them more structured by establishing formulas for their trading, clearing, and settlement (e.g. central counterparties). While this certainly goes a long way toward reducing
systemic risk, it will not fundamentally change the fact that certain instruments will always remain nothing more than bilateral contracts.
As long as OTCs exist, we need to find a mechanism to extract structured data out of these contracts in order to properly value them and manage their risk exposures.
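As a rough illustration of what extracting structured data from a contract can mean, the sketch below pulls a handful of economic terms out of free-text confirmation wording using regular expressions. The sample text, the field names, and the patterns are all hypothetical; real OTC documentation varies far more than this, and pattern matching alone would not be sufficient in practice.

```python
import re

# Hypothetical confirmation wording, invented for this example.
CONFIRMATION = """
Party A agrees to pay Party B a fixed rate of 3.25% on a notional amount
of USD 50,000,000, effective 2012-03-15 and terminating 2017-03-15.
"""

# One pattern per field; each captures the value in group 1.
PATTERNS = {
    "fixed_rate_pct": r"fixed rate of\s+([\d.]+)%",
    "currency": r"notional amount\s+of\s+([A-Z]{3})",
    "notional": r"notional amount\s+of\s+[A-Z]{3}\s+(\d{1,3}(?:,\d{3})*)",
    "effective_date": r"effective\s+(\d{4}-\d{2}-\d{2})",
    "termination_date": r"terminating\s+(\d{4}-\d{2}-\d{2})",
}


def extract_terms(text: str) -> dict:
    """Return each field as a string, or None when the pattern is absent."""
    terms = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        terms[field] = match.group(1) if match else None
    return terms


terms = extract_terms(CONFIRMATION)
for field, value in terms.items():
    print(f"{field}: {value}")
```

Returning `None` for a missing field, rather than guessing, keeps gaps visible so that a contract with incomplete terms can be routed for manual review before it feeds valuation or risk calculations.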
I think you'd agree that all these factors make a case for Big Data management being a real challenge that we, as an industry, need to address right away. In the next few posts I'll cover some technologies that can help us get a grip on Big Data, focusing
on the aspects above in greater detail.