Most of my posts so far have been focused on Big Data technology in Financial Services. In this installment, I’d like to switch gears a bit and focus on the industry itself, paying more attention to actual use cases within financial services institutions.
Let's start with the most widely discussed use case: sentiment analysis. Whether looking for broad economic indicators, specific market indicators, or sentiment concerning a specific company or its stock, there is obviously a trove of data to be harvested
here, available from traditional as well as new media (including social media) sources. While news keyword analysis and entity extraction have been in play for a while, and are readily offered by many vendors, the availability of social media intelligence
is relatively new and has certainly captured the attention of those looking to gauge public opinion. (In a previous post, I discussed the applicability of Semantic technology and Entity Extraction for this purpose, but as promised, I'm sticking to the usage
topic this time).
Sentiment analysis is considered straightforward, as the data resides outside the institution and is therefore not confined by organizational boundaries. In fact, sentiment analysis is becoming so popular that some hedge funds are basing their entire strategies
on trading signals generated by Twitter analytics. While this is an extreme example, most financial institutions at this point are using some sort of sentiment analysis to gauge public opinion about their company, market, or the economy as a whole.
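To make the idea concrete, here is a deliberately minimal sketch of lexicon-based sentiment scoring over a stream of messages. The word lists and function names are invented for illustration; real trading-signal systems use trained models and far richer lexicons.

```python
import re

# Toy lexicons, invented for this sketch -- production systems use
# trained classifiers and domain-specific dictionaries.
POSITIVE = {"beat", "upgrade", "bullish", "strong", "growth", "rally"}
NEGATIVE = {"miss", "downgrade", "bearish", "weak", "lawsuit", "selloff"}

def sentiment_score(text):
    """Score one message in [-1, 1]: +1 if all matched words are
    positive, -1 if all are negative, 0 if nothing matches."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

def market_mood(messages):
    """Average per-message scores into one crude aggregate signal."""
    scores = [sentiment_score(m) for m in messages]
    return sum(scores) / len(scores) if scores else 0.0
```

At scale, the interesting part is not the scoring function but running it continuously over millions of messages and joining the aggregate signal with market data.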
Another fairly common use case is predictive analytics. Encompassing correlations, back-testing of strategies, and probability calculations via Monte Carlo simulation, these analytics are the bread and butter of capital markets firms, and are relevant both
for strategy development and risk management. The large amounts of historical market data and the speed at which new data sometimes needs to be evaluated (e.g., complex derivatives valuations) certainly make this a Big Data problem. And while traditionally
these types of analytics have been processed by large compute grids, today more and more institutions are looking at technologies that bring compute workloads closer to the data in order to speed things up. In the past, these analytics were executed primarily
using proprietary tools; today they are starting to move toward open source frameworks such as R and Hadoop (detailed in previous posts).
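As a small, self-contained example of the Monte Carlo workloads mentioned above, the sketch below prices a European call option by simulating terminal prices under geometric Brownian motion. The function name and parameters are my own for illustration; grid deployments run millions of such simulations per instrument.

```python
import math
import random

def mc_call_price(s0, k, r, sigma, t, n_paths=200_000, seed=7):
    """Monte Carlo price of a European call: simulate terminal prices
    under geometric Brownian motion, average the payoffs, and discount.
    s0: spot, k: strike, r: risk-free rate, sigma: volatility, t: years."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    total = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)        # standard normal shock
        s_t = s0 * math.exp(drift + vol * z)
        total += max(s_t - k, 0.0)     # call payoff at expiry
    return math.exp(-r * t) * total / n_paths
```

For an at-the-money call (s0 = k = 100, r = 5%, sigma = 20%, t = 1) this converges toward the closed-form Black-Scholes value of roughly 10.45; the Big Data angle is doing this across entire books, continuously, against fresh market data.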
As we move closer to continuous risk management, broader calculations such as the aggregation of counterparty exposure or VaR also fall within the realm of Big Data, if only due to the mounting pressure to rapidly analyze risk scenarios well beyond the
capacity of current systems, while dealing with ever-growing volumes of data. New computing paradigms that parallelize data access as well as computation are gaining a lot of traction in this space. A somewhat related topic is the integration of risk and finance,
as risk-adjusted returns and P&L require that growing amounts of data be integrated from multiple, standalone departments across the firm, and accessed and analyzed on the fly.
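The two calculations named above can be sketched in a few lines; the function names and data shapes here are assumptions for illustration, and real implementations distribute this work across the parallel data-access frameworks discussed earlier.

```python
def historical_var(daily_pnl, confidence=0.99):
    """Historical-simulation VaR: the loss level that historical daily
    P&L exceeded only (1 - confidence) of the time, as a positive number."""
    losses = sorted(-p for p in daily_pnl)                  # ascending losses
    idx = min(int(confidence * len(losses)), len(losses) - 1)
    return losses[idx]

def net_exposure_by_counterparty(trades):
    """Aggregate mark-to-market values per counterparty.
    trades: iterable of (counterparty, mtm_value) pairs."""
    totals = {}
    for cpty, mtm in trades:
        totals[cpty] = totals.get(cpty, 0.0) + mtm
    return totals
```

The hard part is not the arithmetic but assembling the `daily_pnl` and `trades` inputs from standalone systems across the firm fast enough to be useful.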
Speaking of finance and accounting, a less common use case - but one that is increasingly discussed as incidents continue to surface - is rogue trading. Deep analytics that correlate accounting data with position tracking and order management systems
can provide valuable insights that are not available using traditional data management tools. Here too, a lot of data needs to be crunched from multiple, inconsistent sources in a very dynamic way, requiring some of the technologies and patterns discussed
in earlier posts.
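At its core, the correlation described above is a reconciliation between independent systems of record. The sketch below, with hypothetical names and a simplified `{instrument: net_position}` input shape, flags instruments whose booked position disagrees with the trail rebuilt from order management; the real challenge is doing this across inconsistent sources at scale.

```python
def position_breaks(accounting, order_mgmt, tolerance=1e-6):
    """Compare net positions booked in accounting against positions
    rebuilt from the order-management system; return discrepancies.
    Inputs: {instrument: net_position} dicts from the two systems."""
    breaks = {}
    for inst in set(accounting) | set(order_mgmt):
        diff = accounting.get(inst, 0.0) - order_mgmt.get(inst, 0.0)
        if abs(diff) > tolerance:
            breaks[inst] = diff      # positive: accounting shows more
    return breaks
```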
Turning our attention to the detection of more sinister fraud, a similar point can be made. Correlating data from multiple, unrelated sources has the potential to catch fraudulent activities earlier than current methods. Consider for instance the potential
of correlating Point of Sale data (available to a credit card issuer) with web behavior analysis (either on the bank's site or externally), and cross-examining it with other financial institutions or service providers such as First Data or SWIFT. This would
not only improve fraud detection but could also decrease the number of false positives (which are part and parcel of many travelers' experience today).
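A toy version of that cross-examination might look like the following: join card transactions with the customer's recent web sessions and flag any transaction whose country matches none of the countries seen for that customer in the preceding window. The function name, rule, and tuple shapes are assumptions made for this sketch, not a real issuer's logic.

```python
from datetime import datetime, timedelta

def flag_suspicious(transactions, web_sessions, window_hours=24):
    """Flag card transactions whose country matches none of the
    customer's web-session countries in the preceding window.
    Both inputs: lists of (customer_id, timestamp, country_code)."""
    flagged = []
    for cust, tx_time, tx_country in transactions:
        recent = {country for cid, s_time, country in web_sessions
                  if cid == cust
                  and timedelta(0) <= tx_time - s_time <= timedelta(hours=window_hours)}
        # Only flag when recent context exists -- customers with no web
        # trail should not generate false positives under this rule.
        if recent and tx_country not in recent:
            flagged.append((cust, tx_time, tx_country))
    return flagged
```

Even this crude rule illustrates the false-positive point: a traveler who checked their balance from abroad the day before would not be flagged.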
Most banks are paying much closer attention to their customers these days than in the past, as many look for ways to offer new, targeted services in order to reduce customer turnover and increase customer loyalty (and, in turn, the banks' revenue). In some
ways this is no different than retailers’ targeted offering and discounting strategies. The attention that mobile wallets alone have been getting recently attests to the importance that all parties involved – from retailers to telcos to financial institutions
– are putting on these types of analytics, rendered even more powerful when geo-location information is added to the mix.
Banks, however, have additional concerns, as their products all revolve around risk, and the ability to accurately assess the risk profile of an individual or a loan is paramount to offering (or denying) services to a customer. Though the need to protect
consumer privacy will always prevail, banks now have more access to web data about their customers – putting at their fingertips the information needed to target service offerings with
greater sophistication and certainty. Additionally, web data can signal customer life events such as a marriage, childbirth, or a home purchase, helping banks time more relevant offers. And again, with location
information (available from almost every cell phone) banks can achieve extremely granular customer targeting.
So, I started and ended this use case discussion with web data, as it still seems to be dominating Big Data discussions, and adds minimal friction when executing POCs and projects. I would still argue that financial institutions shouldn't lose sight of internal
data, which remains central to their business. It is the combination of internal and external, and structured and unstructured data from unrelated sources that has the potential to truly revolutionize the industry.
I'd like to hear more of your thoughts. Are these use cases relevant? Have I missed anything? Are you currently pursuing any of these use cases using "Big Data technologies"? I welcome your input on the growing importance of this subject.