Small Data - it's not Big...but it is clever

Natural language document processing of counterparty contracts sounds like another Big Data problem, but at its heart this is really about very ‘Small Data’. Banks are interested in a solution to assist with initiatives ranging from regulatory reform through to more competitive pricing, but most continue to grapple with this superficially simple challenge; here I delve a little deeper to discuss why this is so.

When it comes to unstructured, natural-language documents, banks need Small Data: specifically, the individual letters and numbers buried in the midst of pages and pages of contracts and emails. The terms are wide and varied, and can include key data such as ratings, interest spreads, termination triggers, initial margins and haircuts, but they all boil down to just a few letters or numbers. Data mining and e-discovery are all very well, but in an environment such as banking, where every decimal point matters, what you want is the right data, in the right format, at the right time and in the right system. That is almost certainly not a fancy NoSQL database, but more likely a plain old relational database that is feeding the Trading, Risk, CVA, Collateral and Treasury systems.

What is more interesting is that, despite vendors playing ‘buzz-word bingo’ around Big Data and regulation, nobody has actually ‘joined the dots’ and given banks the solution that they really want. ‘Legal process outsource’ firms are busy selling people to manually grab these letters and numbers, and banks’ technology teams are busy writing systems to house the data, but who is really looking at how the data is needed by the business? This is a more holistic problem that spans a multitude of contracts and relies on offering a single counterparty view that correctly accounts for the convoluted web of legal hierarchies.

But nobody is doing it very well

There are lots of solutions available. Unfortunately, each solves only one aspect of the problem: some locate a set of documents of interest, while others help you codify data from documents into a particular data model. Some involve people following processing scripts to capture data, while others offer search based on optical character recognition (OCR) of text. Most stop there. Few provide a view of the data that can be consumed, manipulated and mined by the dimension of interest without significant custom development, and none ‘solves’ the whole problem with a joined-up offering. They all side-step data complexity in some way or another, and it is very complex.

It’s not simple

So why is this so hard? To understand this, we must explore why counterparty relationships are so complex. Consider that most financial institutions are divided into business units aligned around different asset types. For example, Equities, Fixed Income and FX will all sit in their own silos, while Prime Services will often be its own group but with links into multiple silos. You then have functions providing services across the business silos, such as Compliance, Risk, Legal and Treasury. Next you have the additional complexity of separate legal entities, each with its own contracts, usually within a hierarchy of some description.

Now think about the counterparty relationship between two such organisations: you have relationships and trades between multiple legal entities on both sides, with different business area silos within each organisation in a criss-crossing mess. Things get harder still when you have a ‘layering’ of contracts, where one agreement takes precedence and overrides some but not all of the terms in another, as you might see in a trade-level confirmation that alters the terms in an ISDA agreement.
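To make the ‘layering’ idea concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that the terms of each document have already been captured as simple key-value pairs; the term names, values and the `effective_terms` helper are all hypothetical and not taken from any real agreement or system.

```python
from collections import ChainMap

# Hypothetical terms captured from an ISDA master agreement (illustrative values only).
master_terms = {
    "governing_law": "English",
    "initial_margin_pct": 5.0,
    "rating_trigger": "BBB-",
}

# Hypothetical trade-level confirmation that overrides just one of those terms.
confirmation_terms = {"initial_margin_pct": 7.5}


def effective_terms(*layers):
    """Overlay term dictionaries; earlier layers take precedence over later ones."""
    return dict(ChainMap(*layers))


# The confirmation is listed first, so it wins wherever both documents define a term.
terms = effective_terms(confirmation_terms, master_terms)
print(terms["initial_margin_pct"])  # value from the confirmation
print(terms["governing_law"])       # value inherited from the master agreement
```

The master agreement is left untouched: the overlay is computed on demand, so a later amendment can be slotted in as another layer without rewriting what was captured before. Real precedence rules are rarely as tidy as a simple ordering, so treat this as an illustration of the shape of the problem rather than a solution to it.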

Economics can affect an individual silo, but there may also be cross-cutting measures with netting spanning the entire bank. Now throw into the mix 20 years of change, mergers and acquisitions, countless employees having long since departed and numerous amendments, and it becomes like untangling a huge and very knotty ball of string!

Federal Reserve Bank of New York economists identified this complexity as a key contributing factor to the delay of several years experienced by institutional clients in the settlement of claims relating to OTC Derivatives contracts after the Lehman bankruptcy. Events such as these prompted the Financial Stability Board (FSB) to create “living wills”, forcing large global banks to simplify their structures and get a handle on how they would gracefully stabilise or shut down in a crisis.

Without solving this challenge, anyone trying to manage group-wide risk, optimise their collateral or convince the FSB they are in control may be waiting a very long time!

The solution?

While the solution is complex, it need not be as elusive as it first seems. To get it right, you need a lot of background knowledge of the business domain, together with an in-depth understanding of what is contained in the various contract types and of the interplay between the documents that lurk in the depths of the bank’s document management archive. Getting the right ‘Small Data’ means having data models that address the business needs and can evolve over time, plus a method of tying together and overlaying multiple contracts that are related by counterparty. This is a labour-intensive process that can and should be augmented by the right technology, to drive up speed and accuracy whilst reducing cost.
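As a rough illustration of what ‘tying together contracts that are related by counterparty’ might look like, the sketch below rolls each contract up to the ultimate parent of the legal entity that signed it. Everything in it is an assumption made for the example: the entity names, the contract identifiers, and the idea that the hierarchy is available as a simple child-to-parent lookup.

```python
from collections import defaultdict

# Hypothetical legal-entity hierarchy: child entity -> immediate parent.
PARENT_OF = {
    "Alpha Securities Ltd": "Alpha Holdings plc",
    "Alpha Bank AG": "Alpha Holdings plc",
    "Beta Capital LLC": "Beta Group Inc",
}


def ultimate_parent(entity, parent_of):
    """Walk up the hierarchy until an entity with no recorded parent is reached."""
    seen = set()
    while entity in parent_of:
        if entity in seen:
            raise ValueError(f"cycle in hierarchy at {entity}")
        seen.add(entity)
        entity = parent_of[entity]
    return entity


def counterparty_view(contracts, parent_of):
    """Group contract ids under the ultimate parent of the signing entity."""
    grouped = defaultdict(list)
    for contract in contracts:
        top = ultimate_parent(contract["entity"], parent_of)
        grouped[top].append(contract["id"])
    return dict(grouped)


# Hypothetical contracts, each signed by a different legal entity.
contracts = [
    {"id": "ISDA-001", "entity": "Alpha Securities Ltd"},
    {"id": "GMRA-014", "entity": "Alpha Bank AG"},
    {"id": "ISDA-207", "entity": "Beta Capital LLC"},
]

view = counterparty_view(contracts, PARENT_OF)
for parent, ids in view.items():
    print(parent, ids)
```

In practice the hierarchy itself changes with mergers and acquisitions, so the lookup would need effective dates and an audit trail; the sketch ignores that to keep the core idea visible.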

Doing this right the first time will protect your investment and help ensure that the constant revisiting and analysis of legal clauses becomes a thing of the past.


