In an earlier post I discussed the topic of NoSQL - what it is, what it isn't, and some of the misconceptions surrounding it.
I'd like to now turn to the topic of what NoSQL is good for - the actual use cases, especially those involving Enterprise NoSQL.
1. Operational Trade Store
Once a trade is made, it needs to be processed by the back office and reported to the regulators. Trade data is typically read off the message bus connecting the trading systems and persisted into a relational database, which becomes the system of record
for post-trade processing and compliance. The original data formats are either XML (FpML, FIXML) or text-based (FIX), and have to be transformed into a normalized relational representation. This may sound easy enough, but with a high rate of innovation in the
front office, introducing very complex instruments quite often, the task of stuffing them into a relational store becomes harder and harder. And as a result, the back office takes longer to respond to the needs of the business. This is compounded by the
need to create and maintain a fully normalized, "canonical" schema before any new data can be ingested, which can become quite onerous, leading to a proliferation of multiple schemas and databases. Or worse yet, workarounds are put in place that allow shoving
data into existing schemas (such as flags that indicate a record is of a different type than expected, or an empty shell into which any variable can be fitted).
These workarounds can create costly trade exceptions downstream, which need to be resolved manually, and the expense is compounded by the high maintenance burden of complex RDBMS deployments, leading to a high cost per trade.
All of these ills can be addressed using NoSQL, by persisting trade messages as-is, without the need for transforming them into a normalized relational schema. Trade messages contain their own structure, and there's no need for an over-arching canonical
data model in order to process them or report on them. Furthermore, this structure can be modified at the time of querying the data based on the actual usage, rather than trying to create a schema that will handle any foreseeable usage. This is an example
of the notion of schema-on-read mentioned in earlier posts.
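To make the schema-on-read idea a bit more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the in-memory list stands in for a NoSQL store, the two payloads are made-up and heavily simplified stand-ins for FIX and FpML messages (the XML element names are invented, and '|' replaces the real FIX delimiter), and read_trade and notional_by_symbol are hypothetical helper names. The point is that the messages are kept as received, and structure is imposed only by the query that needs it.

```python
import xml.etree.ElementTree as ET

# Raw messages kept exactly as they arrived; nothing is normalized on write.
# Both payloads are made-up, simplified stand-ins for FIX and FpML messages.
store = [
    {"format": "fix", "raw": "35=8|55=IBM|38=100|44=125.50"},
    {"format": "xml", "raw": "<trade><symbol>IBM</symbol>"
                             "<quantity>200</quantity><price>126.00</price></trade>"},
]


def read_trade(doc):
    """Project a stored message onto the fields this particular query needs."""
    if doc["format"] == "fix":
        # FIX is a list of tag=value pairs; '|' stands in for the SOH delimiter.
        fields = dict(pair.split("=", 1) for pair in doc["raw"].split("|"))
        # Tags used here: 55 = Symbol, 38 = OrderQty, 44 = Price.
        return {
            "symbol": fields["55"],
            "quantity": int(fields["38"]),
            "price": float(fields["44"]),
        }
    if doc["format"] == "xml":
        root = ET.fromstring(doc["raw"])
        return {
            "symbol": root.findtext("symbol"),
            "quantity": int(root.findtext("quantity")),
            "price": float(root.findtext("price")),
        }
    raise ValueError("unknown message format: " + doc["format"])


def notional_by_symbol(docs):
    """Aggregate quantity * price per symbol, interpreting each message at read time."""
    totals = {}
    for doc in docs:
        trade = read_trade(doc)
        notional = trade["quantity"] * trade["price"]
        totals[trade["symbol"]] = totals.get(trade["symbol"], 0.0) + notional
    return totals


print(notional_by_symbol(store))
```

A different report could apply a different projection to the same stored messages without any change to how they were persisted.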
2. Reference Data
Another area where the current state of data management can seem abysmal is enterprise reference data - data about traded instruments and the legal entities related to them. Most banks have been through several rounds of M&A and other organizational changes
that resulted in multiple reference data management systems across the firm. This introduces data inconsistencies (which lead to trade exceptions), complexity and costs. Many firms have been trying to rationalize their reference data systems to create a single
enterprise data management platform. This has usually been a Herculean task, however, for reasons similar to those mentioned above: namely, the level of effort involved in creating a single, unified data model to handle all the different incoming data-vendor
feeds and address all the different concerns of the downstream data consumers. This is typical of Enterprise Data Management efforts of this scale that rely on a relational database as their core platform.
In this case as well, NoSQL provides an attractive alternative, allowing for the persistence of data vendor feeds in their original format, without the need to transform them into a canonical data model. The data can then be fed to the consumers in the appropriate
formats with transformation occurring at the time it's needed, rather than ahead of time based on assumptions - again, schema on read vs. schema on write.
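As a rough illustration of transforming reference data only when a consumer asks for it, here is a short Python sketch. Everything in it is an assumption made for the example: the vendor names, the field names, the made-up identifier, and the FIELD_MAP and view helpers. Each vendor record is stored as received, and a small per-vendor lookup is applied at read time to produce whatever shape a given consumer wants.

```python
# Vendor records stored as received; field names differ per (hypothetical) vendor.
# The identifier below is made up for illustration.
feeds = [
    {"vendor": "vendor_a",
     "record": {"ISIN": "XX0000000001", "IssuerName": "Example Corp", "Ccy": "USD"}},
    {"vendor": "vendor_b",
     "record": {"isin_code": "XX0000000001", "issuer": "EXAMPLE CORP",
                "currency": "USD", "country": "US"}},
]

# Per-vendor field lookups, consulted only when data is read.
FIELD_MAP = {
    "vendor_a": {"isin": "ISIN", "issuer": "IssuerName", "currency": "Ccy"},
    "vendor_b": {"isin": "isin_code", "issuer": "issuer", "currency": "currency"},
}


def view(entry, wanted):
    """Return only the fields a consumer asked for, under the consumer's names."""
    mapping = FIELD_MAP[entry["vendor"]]
    return {name: entry["record"].get(mapping[name]) for name in wanted}


# Two consumers, two shapes, one set of untouched source records.
risk_view = [view(entry, ["isin", "currency"]) for entry in feeds]
legal_view = [view(entry, ["isin", "issuer"]) for entry in feeds]

print(risk_view)
print(legal_view)
```

Adding a new vendor feed in this sketch means adding one entry to the lookup table, not reworking a shared model.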
3. Customer Insight
Customer data is often dispersed across the organization, with diverse systems all having different notions and data models encapsulating a customer. Obtaining the elusive 360 Customer View, whether for revenue purposes, fraud prevention and risk mitigation,
or as a result of regulations, has been close to impossible as a result. And the need to go beyond the firm's firewall and incorporate web and social media data is making this even tougher.
Again, the main culprit is applying typical Enterprise Data Warehousing methodologies, which are relational in nature, and are dependent on canonical models which are hard to change and evolve as more data becomes available and the needs of the business change.
In this case there's also an added wrinkle, as some of the data is much less structured than in the previous two use cases. And this is true for more than external web data: Customer on-boarding documents, call center notes, web server logs, etc. are all
internal to the firm, yet represent just as much of a challenge as social media data in incorporating them into a coherent customer view. This last particular aspect is generating a lot of interest in NoSQL, as it makes much more sense to use a non-relational
database for these types of data, but the advantages of NoSQL also become apparent when it comes to highly structured data, as it alleviates the need to harmonize and normalize the data before it can be aggregated. With NoSQL it's perfectly fine to have different
representations of a customer, which can be unified based on certain attributes without needing to create a single, over-arching data model. Thus new data can be easily incorporated from disparate systems, and then linked and enhanced with non-relational text
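Here is a small Python sketch of what unifying different customer representations on a shared attribute might look like. It is an illustration only: the source systems, field names, and sample records are invented, email address is just one possible linking attribute, and email_of and customer_view are hypothetical helpers. Each record keeps its original shape, and the combined view is built by grouping on the attribute the records have in common.

```python
from collections import defaultdict

# Records from three hypothetical systems, each with its own shape.
records = [
    {"source": "crm", "name": "Jane Doe", "email": "Jane.Doe@example.com"},
    {"source": "call_center", "contact": {"email": "jane.doe@example.com"},
     "notes": "Asked about wire limits"},
    {"source": "onboarding", "full_name": "John Roe",
     "email_address": "john.roe@example.com"},
]


def email_of(rec):
    """Find an email wherever this source happens to keep it, normalized to lowercase."""
    value = (rec.get("email")
             or rec.get("email_address")
             or rec.get("contact", {}).get("email"))
    return value.lower() if value else None


def customer_view(recs):
    """Group records by the shared attribute without reshaping them."""
    grouped = defaultdict(list)
    for rec in recs:
        key = email_of(rec)
        if key:
            grouped[key].append(rec)
    return dict(grouped)


views = customer_view(records)
for email, linked in views.items():
    print(email, [rec["source"] for rec in linked])
```

Real matching would usually need more than one attribute and some tolerance for dirty data, but the shape of the approach is the same.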
4. Regulatory Compliance and Investigations
Whether it's Dodd-Frank, EMIR, FATCA, KYC or Basel III - most of today's regulations involve data. And the data needs to be obtained from disparate sources (including non-relational ones), and presented quickly, sometimes in an ad-hoc manner. Consider for
instance Dodd-Frank Title VII, which requires reporting on all the phases of a swap transaction, eventually also including the pre-trade correspondence preceding it - obviously this would be quite hard to do in a relational database, which was not designed
with text analytics in mind. Similarly, FATCA requires reporting on foreign accounts held by US taxpayers, and the data identifying those account holders may be found in non-relational sources (such as the on-boarding documents mentioned above). Legal investigations
also represent a similar challenge, as they require combing through reams of documents and email messages in search of the ones relevant to the case at hand.
Furthermore, the need to constantly update internal procedures based on regulatory changes, including the mapping between the actual regulations and internally available data, can become quite onerous.
In all of these cases, a NoSQL database, particularly a document-oriented one, represents a superior solution to traditional relational technology.
5. Pre-Trade Decision Support
One of the earlier use cases for looking beyond relational databases was sentiment analysis - mining the web for indications of public sentiment to inform trading decisions and risk calculations. This use case is related to, but also expands on, text mining
based on news analysis. But pre-trade decision support involves the aggregation of many other sources of information - from highly structured market data to highly unstructured analyst research, to geospatial data (in the case of commodities), etc.
All this data needs to be visible on a trader's desktop, and there's a growing realization that rather than just presenting diverse data side by side, there's value in aggregating it into a more holistic view of a given instrument. Here again, a document-oriented
NoSQL store is a natural fit, especially if it supports geospatial information, and has event-processing features that can be used to alert traders about significant changes.
These are some of the use cases where I've seen successful NoSQL implementations within the industry. I’m interested in your thoughts - are these representative of the ones you're targeting? Are there other prominent ones I've missed?