24 July 2014

The Big Data Blog

Amir Halfon - MarkLogic


Avoiding Systemic Crises: Getting a Grip on Big Data

07 November 2011

Systemic risk was at the heart of the financial crisis of 2008, and is again on everyone's mind as the current sovereign debt crisis unfolds. Regulatory and industry efforts are, therefore, focusing on getting a more accurate view of risk exposures across asset classes, lines of business and firms in order to better predict and manage systemic interplays.

Managing large amounts of data (including positions, reference, market data, etc.) is a key aspect of these efforts, and is one of the reasons data management has recently ascended to top-of-mind status, after being relegated to the back burner for many years.

As an industry, we are finally at a point where data is considered a key input into business processes, with full awareness at the top executive ranks. Efficient allocation of capital is now seen as a major competitive advantage, and risk-adjusted performance carries more weight than ever before. And so, from the front office all the way to the board room, everyone is keen on getting holistic views of exposures and positions, which require fast, on-demand, aggregated access to disparate data.

Most Big Data discussions have been focusing on Internet companies such as Google and Facebook and the data they generate. There's been a lot of attention given to harnessing that data for commercial goals, and certainly the banking industry is examining these usage scenarios as it considers its future direction.

I would argue, however, that a more urgent task associated with Big Data is the one mentioned earlier: managing the large amounts of financial data that have been sitting behind most firms' firewalls without being used to address critical business concerns. The rest of this post will attempt to build a case for this argument, with following posts focusing on specific technology implications.

Being a recent darling of IT analysts, Big Data has had many definitions, but key aspects seem to be typically categorized along the "four Vs":

  • Volume
  • Velocity
  • Variety
  • Value

Let's examine each in more detail:

Volume

The web is not the only place seeing exponential growth in data volumes - our industry has witnessed exponential growth in trade data, beginning with the early days of electronic markets, and skyrocketing with the widespread use of algorithmic, program, and high-frequency trading. These generate orders of magnitude more orders, executions and cancels than the "quaint days of open outcry." Additionally, complex strategies, including cross-asset trading and instruments such as structured products, generate far more data per trade than simple ones.

Higher trade volumes mean higher market data volumes, of course, but also much larger amounts of historical tick data and positions data that need to be retained. New regulations require ever more extensive data retention, and sophisticated strategy development requires ever-growing amounts of historical tick data for backtesting.

Many systems are struggling to keep up with these vast amounts of data while still performing their tasks - whether it's risk management, regulatory reporting, trade processing or analytics.

Velocity

Higher volumes are not the only issue firms face today; the data is also coming at them at higher and higher speeds, as a result of low-latency and high-frequency trading. At the same time, data needs to be culled from source systems at ever-increasing speeds.

It is the latter aspect that's been getting a lot of attention lately, as new regulations become much more stringent about timely delivery of data, essentially mandating on-demand risk exposure and positions reporting.

Most current systems are ill-prepared to meet these requirements, making the notion of on-demand exposure reporting seem all but impossible. Many use long ETL data integration and batch calculation cycles to generate reports overnight, and are completely incapable of supporting an ad-hoc analytics model.
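To make the contrast with overnight batch reporting concrete, here is a minimal sketch of what an ad-hoc exposure query could look like once positions from different desks are accessible in one place. The record layout, field names, and figures are hypothetical and purely illustrative; a real system would query a database or data grid rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical position records gathered from several source systems.
# Field names and values are illustrative assumptions, not real data.
positions = [
    {"desk": "rates", "asset_class": "IRS", "counterparty": "CP-A", "exposure": 1_200_000.0},
    {"desk": "credit", "asset_class": "CDS", "counterparty": "CP-A", "exposure": 800_000.0},
    {"desk": "fx", "asset_class": "FX", "counterparty": "CP-B", "exposure": -300_000.0},
    {"desk": "rates", "asset_class": "IRS", "counterparty": "CP-B", "exposure": 500_000.0},
]


def aggregate_exposure(records, *keys):
    """Sum the 'exposure' field over records, grouped by the given field names."""
    totals = defaultdict(float)
    for rec in records:
        group = tuple(rec[k] for k in keys)
        totals[group] += rec["exposure"]
    return dict(totals)


# The grouping is chosen at query time, not fixed by a nightly batch job.
by_counterparty = aggregate_exposure(positions, "counterparty")
by_counterparty_and_class = aggregate_exposure(positions, "counterparty", "asset_class")

for group, total in sorted(by_counterparty.items()):
    print(group, total)
```

The point of the sketch is only that the grouping keys are supplied by the person asking the question, which is what an on-demand model implies; it says nothing about how to make such queries fast at real data volumes.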

Value

Low value of the overall data set, or low information density, is another key aspect of Big Data. Just as Twitter feeds contain a lot of "noise" when you're interested in analyzing a specific public sentiment, for instance, financial data can have a very low "signal-to-noise ratio" when you're looking to analyze a specific market exposure, find correlations between seemingly unrelated variables, and so on.

Low information density puts even more onus on current analytics systems, as more and more data needs to be sifted through to get at the relevant information. In many cases, this can make traditional approaches to analytics fall apart.

Variety

Lastly, information variety has to do with loosely structured data. While this is quite clear when it comes to images and videos on the Web, the financial services industry has had a data variety challenge of its own for quite some time, and it has been getting a lot of attention lately. I'm referring, of course, to OTC derivatives: essentially contracts that have little in the way of structured data, and which were at the center of the financial crisis of 2008.

A lot of regulatory effort has been focusing on these instruments, attempting to make them more structured by establishing formulas for their trading, clearing, and settlement (e.g. central counterparties). While this certainly goes a long way toward reducing systemic risk, it will not fundamentally change the fact that certain instruments will always remain nothing more than bilateral contracts.

As long as OTCs exist, we need to find a mechanism to extract structured data out of these contracts in order to properly value them and manage their risk exposures.
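As a rough illustration of what extracting structured data from a contract means, the sketch below pulls a few terms out of a made-up contract clause using regular expressions. The contract wording, the field names, and the patterns are all assumptions for the sake of the example; real OTC documentation is far more varied, and pattern matching of this kind would only be a starting point.

```python
import re
from datetime import date

# Made-up contract clause, for illustration only.
CONTRACT_TEXT = """
Party A agrees to pay Party B a fixed rate of 2.75% per annum
on a notional amount of USD 10,000,000, with a termination date of 2016-11-07.
"""


def extract_terms(text):
    """Return whichever of the expected terms can be found in the text."""
    terms = {}

    # Currency code followed by a comma-grouped integer amount.
    m = re.search(r"notional amount of\s+([A-Z]{3})\s+(\d{1,3}(?:,\d{3})*)", text)
    if m:
        terms["currency"] = m.group(1)
        terms["notional"] = int(m.group(2).replace(",", ""))

    # Percentage rate, stored as a decimal fraction.
    m = re.search(r"fixed rate of\s+(\d+(?:\.\d+)?)%", text)
    if m:
        terms["fixed_rate"] = float(m.group(1)) / 100

    # ISO-style date (YYYY-MM-DD).
    m = re.search(r"termination date of\s+(\d{4})-(\d{2})-(\d{2})", text)
    if m:
        terms["termination_date"] = date(*map(int, m.groups()))

    return terms


terms = extract_terms(CONTRACT_TEXT)
print(terms)
```

Terms that cannot be found are simply left out of the result, so downstream valuation or risk code can tell a missing field from a zero.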

I think you'd agree that all these factors make a case for Big Data management being a real challenge that we, as an industry, need to address right away. In the next few posts I'll cover some technologies that can help us get a grip on Big Data, focusing on the aspects above in greater detail.

Tags: Risk & regulation

