24 April 2014

The Big Data Blog

Amir Halfon - MarkLogic


Innovation in Financial Services

A discussion of trends in innovation management within financial institutions, and the key processes, technology and cultural shifts driving innovation.

Enterprise Big Data: It's Not About Size

16 April 2013

Data is at the center of most challenges facing our industry today, with business drivers such as new regulations, aggregated risk management, and deep customer insight all having critical data management implications. The term Big Data has become a common way to describe this, and while some of these challenges are associated with large volumes, it isn't really the size of the data that's at issue. I'd argue that at this point we know how to handle large volumes: use shared-nothing architectures that scale horizontally on commodity hardware.  The trickier problem has to do with a different "V" of Big Data - variety - and it is that aspect that I'd like to focus on. 

There are countless examples of business value locked up in data that does not fit neatly into rows and columns. The most frequently cited is social media, with its ability to offer deep customer insight and sentiment analysis. There are many others inside the company's firewall as well: gleaning information from on-boarding documents for FATCA and AML compliance, getting a better handle on credit risk by analyzing ISDA agreements, and lowering cost per trade by consolidating the processing of diverse asset classes with varied and complex structures.

How can we effectively handle all this information, which is either hidden in free-form text, or scattered across incompatible schemas? Hierarchical structures such as XML and JSON certainly come to mind, as they can accommodate various degrees of structure, organized in a way that mirrors intuitive human perception. Indeed, many organizations have been using XML to handle these business challenges and have reaped some benefits, but found themselves constrained by the underlying RDBMS platforms that actually managed the data.

The problem with the typical approach to handling hierarchical information is that data is "shredded" into tables: a customer, derivative trade, or legal document, with all its hierarchical attributes, is shoehorned into an ER model that satisfies referential integrity. Don't get me wrong - I love relational modeling and I have spent years doing it, but 3rd Normal Form has its limitations when it comes to diverse data. Just consider the typical first step when analyzing normalized data: de-normalize it!
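To make the shredding point concrete, here is a minimal Python sketch. The trade document, the three table names, and the `shred` / `reassemble` helpers are all hypothetical and invented for illustration; they are not taken from any real system. The sketch splits one hierarchical trade into flat rows and then joins the rows back together, which is the de-normalization step an analyst would usually have to perform first.

```python
# Hypothetical derivative trade, invented purely for illustration.
trade = {
    "trade_id": "T-1",
    "counterparty": {"name": "Acme Bank", "lei": "LEI-EXAMPLE"},
    "legs": [
        {"type": "fixed", "rate": 0.025, "currency": "USD"},
        {"type": "floating", "index": "LIBOR-3M", "currency": "USD"},
    ],
}


def shred(doc):
    """Split one hierarchical trade into flat rows for three assumed tables."""
    lei = doc["counterparty"]["lei"]
    return {
        "trade": [(doc["trade_id"], lei)],
        "counterparty": [(lei, doc["counterparty"]["name"])],
        # Every leg needs every possible column, so unused ones become None.
        "leg": [
            (doc["trade_id"], pos, leg["type"], leg.get("rate"),
             leg.get("index"), leg["currency"])
            for pos, leg in enumerate(doc["legs"])
        ],
    }


def reassemble(tables):
    """De-normalize: join the rows back into the original tree."""
    trade_id, lei = tables["trade"][0]
    names = dict(tables["counterparty"])
    legs = []
    for t_id, _pos, kind, rate, index, ccy in sorted(
            tables["leg"], key=lambda row: row[1]):
        if t_id != trade_id:
            continue
        leg = {"type": kind}
        if rate is not None:
            leg["rate"] = rate
        if index is not None:
            leg["index"] = index
        leg["currency"] = ccy
        legs.append(leg)
    return {
        "trade_id": trade_id,
        "counterparty": {"name": names[lei], "lei": lei},
        "legs": legs,
    }


tables = shred(trade)
roundtrip = reassemble(tables)
```

Even in this toy case a single document becomes rows in three tables, the leg table carries nullable columns to cover both leg types, and any new leg attribute would require a schema change before it could be stored.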

There is an alternative to shredding though, in the form of NoSQL - a wide set of technologies that transcend the boundaries of relational schemas. The name is somewhat unfortunate since SQL is actually one of the best features associated with an RDBMS (some call it the most successful Domain Specific Language). The problem with RDBMSs is not SQL but the prerequisite of a schema definition for data ingestion and analysis, which hinders business agility. We've all seen cases where business needs have been delayed while data models, transformations and analytical schemas were being developed. NoSQL databases free us from the reins of the schema to enable real business agility.

However, one factor has prevented wide adoption of NoSQL technologies within the enterprise: the BASE architectural principle underlying most of them. It stands for Basically Available, Soft state, Eventual consistency - a play on ACID transactions (Atomicity, Consistency, Isolation, Durability), which are associated with relational databases. BASE has several advantages when it comes to non-transactional systems, as it relaxes consistency to allow the system to process requests even in an inconsistent state. Social media sites are a perfect example - no one would mind if their Facebook status or latest tweet were inconsistent within their social network for a short period of time; it's much more important to get an immediate response than to have a consistent state of users' information.

Financial and other enterprise systems are a different matter though. Imagine, for instance, a merger corporate action occurring at the same time a firm is trading the affected instrument: the post-trade processing systems would certainly have to be consistent with the Reference Data system, or costly exceptions would ensue.
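The merger scenario can be sketched with a small Python example using the standard library's sqlite3 module as a stand-in for any ACID database. The `instrument` and `trade` tables, the identifiers "AAA" and "BBB", and the `fail_midway` flag are assumptions made up for this illustration. The point is only that the reference data and the open trades change together or not at all.

```python
import sqlite3


def make_db():
    """Build an in-memory database with one instrument and two open trades."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE instrument (id TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE trade (trade_id TEXT, instrument_id TEXT)")
    with conn:  # commit the seed data
        conn.execute("INSERT INTO instrument VALUES ('AAA')")
        conn.executemany(
            "INSERT INTO trade VALUES (?, ?)",
            [("T-1", "AAA"), ("T-2", "AAA")],
        )
    return conn


def apply_merger(conn, old_id, new_id, fail_midway=False):
    """Rename an instrument in reference data and in open trades atomically."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE instrument SET id = ? WHERE id = ?", (new_id, old_id))
            if fail_midway:
                # Simulated failure between the two updates (illustrative only).
                raise RuntimeError("simulated crash")
            conn.execute(
                "UPDATE trade SET instrument_id = ? WHERE instrument_id = ?",
                (new_id, old_id))
        return True
    except RuntimeError:
        return False


def snapshot(conn):
    """Return (instrument ids, instrument ids referenced by trades)."""
    instruments = [r[0] for r in conn.execute(
        "SELECT id FROM instrument ORDER BY id")]
    referenced = [r[0] for r in conn.execute(
        "SELECT instrument_id FROM trade ORDER BY trade_id")]
    return instruments, referenced


conn = make_db()
failed = apply_merger(conn, "AAA", "BBB", fail_midway=True)
after_failure = snapshot(conn)
succeeded = apply_merger(conn, "AAA", "BBB")
after_success = snapshot(conn)
```

After the simulated failure the first update is rolled back, so trades never point at an instrument the reference data does not know about. Under an eventually consistent model that intermediate state could be visible to post-trade processing for some period of time.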

So how do we avoid schema woes without giving up ACID transactions, as well as other enterprise qualities such as fine-grained entitlements, point-in-time recovery, and high availability, all of which we've come to expect for mission-critical systems?

The answer lies within a different category of technology called Enterprise NoSQL, which has been designed and built with transactions and enterprise features from the ground up, just like relational databases. But unlike those, an Enterprise NoSQL database models the data as hierarchical trees rather than rows and columns. These trees are aggressively indexed in-memory as soon as the data is ingested, and then used for both element retrieval and full text search, unifying two concepts that have traditionally been separate - the database and the search engine.
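As a rough illustration of what indexing a tree for both element retrieval and full text search might look like, here is a toy Python sketch. It is not how any particular product implements its indexes; the `TreeIndex` class, its method names, and the sample documents are all hypothetical. Each document is walked once at ingestion, and every leaf feeds two in-memory indexes: one keyed by element path and value, one keyed by word.

```python
import re
from collections import defaultdict


class TreeIndex:
    """Toy in-memory index over hierarchical documents (illustrative only)."""

    def __init__(self):
        self.docs = {}
        self.by_path = defaultdict(set)  # (path, value) -> document ids
        self.by_word = defaultdict(set)  # lowercase word -> document ids

    def ingest(self, doc_id, doc):
        """Store the document and index every leaf, with no schema declared."""
        self.docs[doc_id] = doc
        for path, value in self._leaves(doc):
            self.by_path[(path, value)].add(doc_id)
            if isinstance(value, str):
                for word in re.findall(r"\w+", value.lower()):
                    self.by_word[word].add(doc_id)

    def _leaves(self, node, prefix=""):
        """Yield (path, value) for each leaf; list items share the parent path."""
        if isinstance(node, dict):
            for key, child in node.items():
                yield from self._leaves(child, f"{prefix}/{key}")
        elif isinstance(node, list):
            for child in node:
                yield from self._leaves(child, prefix)
        else:
            yield prefix, node

    def find_by_element(self, path, value):
        """Database-style lookup: documents where the element equals value."""
        return set(self.by_path.get((path, value), set()))

    def search_text(self, *words):
        """Search-style lookup: documents containing all of the given words."""
        sets = [self.by_word.get(w.lower(), set()) for w in words]
        return set.intersection(*sets) if sets else set()


idx = TreeIndex()
idx.ingest("isda-1", {
    "type": "ISDA agreement",
    "party": {"name": "Acme Bank"},
    "clauses": [{"text": "Early termination upon credit downgrade"}],
})
idx.ingest("kyc-7", {
    "type": "onboarding",
    "party": {"name": "Acme Bank"},
    "notes": "Passport verified; no downgrade risk flagged",
})

same_party = idx.find_by_element("/party/name", "Acme Bank")
credit_hits = idx.search_text("credit", "downgrade")
```

The two documents have different shapes, yet both are queryable by element and by free text as soon as they are ingested, which is the agility argument made above. A real system would add persistence, transactions, and security on top of this idea.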

An Enterprise NoSQL database also offers full SQL access, thus combining the benefits of both worlds - the business agility associated with NoSQL and search, and the data integrity and sophisticated querying associated with a traditional RDBMS.

In the next installment of this blog I will explore the mechanisms by which this is achieved.

Tags: Post-trade & ops, Innovation

