
Big Data in the Financial Services Industry - From data to insights

1. Introduction

Like "Cloud", "IoT" (Internet of Things), "Open Banking" and "Machine Learning", "Big Data" is one of the most frequently used buzzwords in the financial services industry today, but just like those other terms it is not easy to define clearly.
This is especially true as "Big Data" is often used as a synonym for customer analytics, real-time analytics or predictive analytics.

The general consensus is that "Big Data" is the collective term for the contemporary methodologies and technologies used to collect, organize, process and analyse large, diverse (structured and unstructured) and complex data sets, while "customer / real-time / predictive analytics" mainly refers to specific types of analyses performed on these data sets to find patterns and create business value. However, since the ultimate business goal of Big Data is not the data itself but the business insights derived from it, the analytics part of the chain is the most visible and important for a business user, which explains why the terms are often used interchangeably.

According to a 2015 IBM study, we create an estimated 2.5 quintillion (10¹⁸) bytes of data every day, and 90% of the data in the world today has been created in the last 2 years.
These figures show that the scale of Big Data has increased exponentially in recent years and will continue to rise in the coming years, especially due to the further adoption of mobile technologies and IoT.

2. Impact on the Financial Services sector

As the financial services sector is probably the most data-intensive sector in the global economy, the impact of Big Data on the sector is hard to overestimate.

Banks hold enormous amounts of customer data (i.e. deposits/withdrawals at ATMs, purchases at points of sale, online payments, customer profile data collected for KYC…​), but due to their siloed, product-oriented organisations, they are not very good at utilizing these rich data sets.
This despite the fact that the financial services industry has been investing heavily for more than a decade in data collection and processing technologies (such as data warehouses and Business Intelligence) and is one of the forerunners in Big Data investments.

Due to increasing and changing customer expectations and increased competition from Fintech players, the financial services sector simply cannot afford to leave those huge amounts of data unexploited. Instead, banks and insurers should leverage existing (and new) data sets to maximize customer understanding and gain a competitive advantage.

Several players in the market are already using Big Data techniques to deliver compelling use cases, but many organisations are still lagging behind.

3. Drivers

The recent rise of Big Data is driven by several factors, which reinforce each other, resulting in an exponential increase in data and in the need to derive value from it:

  • Change in customer behaviour and expectations:

    • Customers interact with their bank or insurer more and more through digital channels, meaning that personal interaction is reduced, but at the same time much more data about the customer can be collected in an automated way (e.g. browsing history, geo-location data from the mobile phone, exact timing of interactions…​) than when the customer visits a branch. This data should be leveraged to compensate for the reduced customer engagement caused by the loss of personal interaction.

    • Customers use social media more and more: where these media used to be limited to closed private circles of friends, customers now use them in their day-to-day lives, e.g. to interact with companies. This means banks and insurers should interact more through these channels to offer services and to gain insights about their customers.

    • Customers increasingly expect a high-quality, low-friction, around-the-clock, customer-centric experience across multiple channels. Delivering such a personalised service requires in-depth, holistic knowledge of the customer, which can only be achieved by leveraging all available customer data through Big Data techniques.

  • Technological evolutions leading to larger amounts of input data:

    • The rise of IoT (Internet of Things) will further increase the amount of customer data dramatically, as it will result in new, continuous streams of data (even when the customer is not interacting with the bank or insurer).

    • New advanced authentication techniques, such as biometric authentication and continuous authentication (e.g. mouse movements and keyboard rhythm, or accelerometer and gyroscope readings on a mobile phone), will also considerably increase the amount of data to be processed in near real-time.

    • The rise of Open Architectures (Open APIs) allows banks and insurers to collect valuable data about their customers from data stored at competitors.

  • Competition from Fintech players already using Big Data techniques for new financial services. E.g. the recent success of Fintech robo-advisors, which offer automated digital investment advice based on gathered customer profile information, shows that Fintechs are already able to convert Big Data into compelling new customer services. Unless banks can quickly deliver similar services, they are likely to lose considerable business to these Fintech companies.

  • Regulatory pressure: the recent tsunami of new regulations (Basel III, FRTB, MiFID II, AML/KYC, FATCA…​) forces banks to disclose more diverse and more granular data to central banks and regulators. Furthermore, the fines for non-compliance with these regulations are climbing. This forces banks to collect more and more data in a controlled way, so that the necessary regulatory reporting can be generated automatically and all data is available for ad-hoc inquiries from regulators.

  • Increased cyber-security: with fraud and financial crime increasing, banks need to protect their most valuable asset, namely the "trust" that customers place in their bank. This increases the pressure to further secure the interaction channels and the customer data through different security techniques. One of the most promising is risk-based authentication, in which a fraud-detection engine calculates a risk profile for each channel request, determining the required level of security (authentication). This fraud-detection engine uses customer analytics to identify irregularities in the user’s behaviour.

  • Pressure to reduce operational costs: due to increased competition and low interest rates, profit margins in the financial services industry are dropping. Banks and insurers are therefore forced to reduce operational costs by improving business efficiency. Many of these efficiency gains can be driven by the insights gained from Big Data.

  • Technological evolutions to support the processing of huge amounts of complex and diverse data in real-time: with data sets growing so large and complex, traditional tools are no longer able to process the data at sufficiently low cost and in reasonable time. Luckily a set of new technologies provides an answer to this issue, making it possible to process these data sets in near real-time and at lower cost, i.e.

    • Event streaming: streaming of large volumes of events in real-time.

    • NoSQL databases: databases which allow data to be stored and retrieved in a much more scalable and flexible way than traditional relational databases (RDBMS).

    • In-memory data stores: data structures which reside entirely in RAM and are distributed among multiple servers.

    • Distributed processing (= distributed computing): using a network of computers to split up a task into smaller tasks, which are executed in parallel on the different computers (often commodity hardware), after which the results are aggregated.

    • Machine learning: giving computers the ability to learn without being explicitly programmed.

    • Advanced data visualization: tools to assist users in visualizing in a user-friendly way the large amounts of data and the insights derived from it (e.g. through bubble charts, word clouds, geospatial heat maps…​).

    • Cloud solutions: cloud solutions offer a cheap and flexible (i.e. elastic scalability) infrastructure (but also higher-level services) to support these Big Data technologies.
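The risk-based authentication mechanism described in the cyber-security driver above can be sketched as follows. This is a minimal, illustrative sketch: the indicator names, weights and thresholds are hypothetical, not taken from any real fraud-detection engine.

```python
# Hypothetical risk indicators and weights; a real engine would derive these
# from customer analytics rather than a fixed table.
RISK_WEIGHTS = {
    "new_device": 30,
    "unusual_location": 40,
    "high_amount": 20,
    "odd_hour": 10,
}

def risk_score(request: dict) -> int:
    """Sum the weights of all risk indicators present in the channel request."""
    return sum(w for k, w in RISK_WEIGHTS.items() if request.get(k))

def required_authentication(request: dict) -> str:
    """Map the risk score to a required authentication level (illustrative thresholds)."""
    score = risk_score(request)
    if score >= 60:
        return "block"      # too risky: refuse the request and alert the fraud team
    if score >= 30:
        return "strong"     # step-up authentication, e.g. a one-time password
    return "basic"          # password/PIN is sufficient

# Example: a payment from a new device at an unusual location
print(required_authentication({"new_device": True, "unusual_location": True}))  # → block
```

The point of the sketch is the shape of the decision: the fraud-detection engine turns behavioural irregularities into a score, and the score determines how much authentication friction the customer experiences.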
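The distributed-processing pattern listed above (split a task into smaller tasks, process them in parallel, aggregate the results) can be illustrated with a classic word count. A minimal sketch, run sequentially here; on a real cluster each chunk would be processed on a different machine:

```python
from collections import Counter

def map_chunk(chunk: list) -> Counter:
    """The 'map' step: count words in one chunk of the data."""
    return Counter(word for line in chunk for word in line.split())

def reduce_counts(partials: list) -> Counter:
    """The 'reduce' step: aggregate the partial results into one total."""
    total = Counter()
    for p in partials:
        total += p
    return total

lines = ["big data big insights", "data driven banking", "big banking data"]
chunks = [lines[0:1], lines[1:2], lines[2:3]]     # split the task
partials = [map_chunk(c) for c in chunks]         # conceptually executed in parallel
totals = reduce_counts(partials)                  # aggregate
print(totals["big"], totals["data"])              # → 3 3
```

Frameworks such as Hadoop-MapReduce apply exactly this split/map/reduce shape, but add the distribution, scheduling and fault tolerance needed to run it over commodity hardware.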

4. Characteristics of Big Data

Big data is characterised by the 3 V’s, i.e.

  • Volume: a vast quantity of data (i.e. terabytes or petabytes) to be handled. These huge amounts of data make it impossible for traditional data processing tools to process them within a reasonable time.

  • Velocity: Big Data technologies should be able to process both batch and real-time data. For real-time data, quick analysis for (near) real-time insight generation can be a necessity for the business.

  • Variety: multiple types of data should be supported, i.e. from highly structured data to unstructured info like text, video, audio, blogs, tweets, Facebook status updates…​

5. Use Cases

The financial services industry, being a data-driven industry, offers a multitude of use cases where Big Data and Customer Analytics can bring added value.
In this chapter, a few of these use cases are presented.

5.1. Sales and Marketing

McKinsey research estimated that sales and marketing consume about 15 percent of the costs of financial service companies, meaning that improving the efficiency of these processes can lead to significant cost savings.

When looking at the customer lifecycle, we identify 3 stages, i.e.

  • Acquisition: the art of attracting new customers, by targeting prospects through campaigns.

  • Activation: this stage includes re-engaging old or dormant customers to buy new products/services, but also the art of maximizing the first purchase of new customers.

  • Relationship management (Cultivation): cultivating the customer’s relationship with the company.

These stages can be supported by Big Data, as demonstrated by the use cases below:

  • Improve the efficiency of acquisition through Big Data: mass marketing campaigns are costly and often ineffective, especially nowadays when customers are overloaded by marketing campaigns through different channels. Effective marketing campaigns should therefore target the right set of customers, with the right (personalized) message and through the right channel (direct mailing, email, channel advertising, social media, TV, radio…​). Big Data can provide an answer to this by:

    • Segmenting prospects based on publicly available information about the prospects and insights gained from the existing customer base. This allows selecting the right set of prospects for targeting a new product/service (hyper-targeted marketing).

    • Determining the best channel (mix) for the marketing campaign, based on the insights gained in the selected prospect segment.

    • Personalizing marketing messages (contact optimization).

    • Monitoring what customers say, i.e. monitoring social media and other information sources (e.g. call centre records) to get direct feedback on marketing campaigns, allowing ongoing campaigns to be adjusted and lessons to be learned for future campaigns (and product design/development).

    • Identifying influential customers, i.e. identifying and engaging with influential customers (customers who have a high impact on company brands or products) to boost word-of-mouth marketing.

  • Improve the efficiency of activation through Big Data: once a prospect has responded to a campaign, it is important to maximize the first sales opportunity. At the same time, sales to existing customers should also be boosted. Big Data can support those processes through:

    • Segmentation of customers, based on the available data (e.g. customer profiling, transaction patterns, past and recent customer behaviour…​), to get real-time customer insights. This makes it possible to predict the products or services a customer is most likely to be interested in for the next purchase (i.e. predictive analysis) and what the most likely next action will be, thus allowing next-best-offers (next-product-to-buy) to be determined. These products can be specifically marketed to the customer and proactive offers can be generated.

    • Bundle products based on the gained insights to boost cross-selling.

    • Optimize pricing, i.e. apply dynamic pricing based on an estimate of how much a customer is willing to pay for the product or service.

  • Generate cross- and up-selling opportunities based on customer insights and current customer behaviour. These opportunities can result in notifications, call-backs or specific pop-ups in the front-end channels, e.g.

    • Customer has received a large inflow of cash: investment cross-selling opportunity

    • Customer has less money in his current account than he requires based on the budget he has created in the bank’s Personal Finance Management module: credit line (overdraft) cross-selling opportunity

    • Customer has been simulating car loans on the internet and steps into a branch: car loan cross-selling opportunity

    • Customer has abandoned an application in the middle of the flow and steps into a branch: cross-selling opportunity for completing the application

    • Customer has an expiring term deposit: term deposit reinvestment selling opportunity

    • Customer arrives in a foreign country (identified based on geo-location information): opportunity to authorize the credit card for that country (if not already authorized) and to temporarily increase the credit limit (e.g. to pay a hotel bill)

    • Customer is performing home renovations (i.e. identified based on transaction information): renovation loan cross-selling opportunity

    • Customer is buying a bond on the stock market: upselling opportunity for similar structured note (if customer insights show that customer would be open to this and has enough knowledge of the product).

    • Customer is requesting a consumer loan, but the payment history shows a lot of home renovation expenses: upselling opportunity for a home renovation loan

    • Customer does not have a home yet and is currently located at a house for sale (based on geo-location information and public information of houses for sale): selling opportunity for mortgage

    • Customer modifies certain customer information (e.g. change of address due to move/relocation, change of civil status e.g. following a wedding): selling opportunities for loans (e.g. mortgage, car loan…​) or insurances (home insurance, car insurance…​).
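The customer segmentation described in this section can be sketched with a minimal k-means clustering. This is an illustrative toy: the two features and the initial centroids are hypothetical, and a production system would use a proper library (e.g. scikit-learn) and far richer behavioural features.

```python
def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then recompute the centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical feature vectors: (average monthly spend in EUR, transactions per month)
customers = [(200, 5), (250, 6), (220, 4), (2100, 40), (1900, 35), (2300, 42)]
centroids, clusters = kmeans(customers, centroids=[(0, 0), (1000, 20)])
print(len(clusters[0]), len(clusters[1]))  # → 3 3 (low-activity vs high-activity segment)
```

With well-separated behaviour patterns the clusters stabilise after a few iterations, splitting the mass-market from the affluent segment; each segment can then be targeted with its own next-best-offer.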

5.2. Customer Cultivation

This section describes use cases to optimize the management of an existing customer relationship, i.e. so-called customer cultivation.
Big Data can also support this life-cycle stage by:

  • Transforming the business into a customer-centric business: empower employees with a Single Customer View (i.e. a 360° holistic view of the customer), providing a broad, centralized view of customer information, a full history of inquiries and transactions (regardless of the interaction channel) and insights into the customer’s family, business and bank employee relationships. These insights enable more focused and in-depth interactions with the customer.

  • Identifying high-value and most profitable customers: identifying these customers makes it possible to provide them a premium service, i.e. offer more attractive products and services, provide attractive pricing, and gain insights into how they behave, how they are best reached and what motivates them to buy more.

  • Enhancing the loyalty of existing customers through:

    • Targeted next-best-offers

    • Loyalty programs, e.g. based on card usage habits

    • Partnerships with retailers to send discount offers to cardholders who use their card near the retailer’s stores

  • Retention management: detect customers at high risk of leaving (indicators can include e.g. cancellation of automatic payments, customer complaints in call centre calls or on social media…​) and provide retention offers to these customers.

  • Adaptive channel interactions: based on the data of current and past customer behaviours, it is possible to predict future customer trends and what their most likely next action will be. Front-ends could use this information to show dynamic buttons/menu items (i.e. dynamic screen adaptation), which put these next actions forward.

  • Improve efficiency of products, services and channel interactions: monitor the customer journey, i.e. interactions of customers, to gain insights to improve existing channels, processes and products. Such improvements will also result in an improvement of customer service.
    Feedback of customers on social media can also be used for this.
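The retention-management idea above (detect customers with a high risk of leaving based on indicators such as cancelled automatic payments or complaints) can be sketched as a simple weighted indicator score. All indicator names, weights and the threshold are hypothetical; a real system would learn them from historical churn data.

```python
# Hypothetical churn indicators with illustrative weights summing to 1.0.
CHURN_INDICATORS = {
    "cancelled_automatic_payments": 0.4,
    "complaint_last_90_days": 0.3,
    "declining_account_balance": 0.2,
    "reduced_channel_logins": 0.1,
}

def churn_risk(customer: dict) -> float:
    """Weighted sum of the churn indicators present for this customer."""
    return sum(w for k, w in CHURN_INDICATORS.items() if customer.get(k))

def needs_retention_offer(customer: dict, threshold: float = 0.5) -> bool:
    """Flag the customer for a retention offer when the churn risk exceeds the threshold."""
    return churn_risk(customer) >= threshold

at_risk = {"cancelled_automatic_payments": True, "complaint_last_90_days": True}
print(needs_retention_offer(at_risk))  # → True
```

In practice the weighted sum would be replaced by a trained model, but the workflow is the same: score every customer continuously and trigger a retention offer once the risk crosses the threshold.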

5.3. Risk Management

Big Data can help banks and insurers significantly improve risk management, through improved and (more) real-time insights into customer behaviour.

This section provides some examples per type of risk:

  • Cyber (identity) fraud detection and prevention: use Big Data to feed fraud-detection engines, allowing the risk of identity fraud to be assessed continuously and determining in near real-time whether additional security measures (e.g. additional authentication techniques or restriction of access) are required.

  • Liquidity risk management: get better insights into incoming and outgoing cash flows to optimize liquidity management. This technique can be useful both for physical money at branches and for the overall liquidity management of the bank/insurer.

  • Credit risk management: based on customer insights, improve the credit models for private and corporate customers, thus improving credit scoring. These insights can be derived from transaction history, public information (e.g. annual reports of companies), IoT data (e.g. inventory sensors, home sensors, car sensors…​)…​ This data can also be used to better manage the collateral of credits, further reducing credit risk for the bank.

  • Card fraud detection: analyse card transaction patterns (location, timing, amount, type of merchant…​) to identify fraudulent transactions, so that they can be blocked.

  • Insurance fraud detection: improve detection and prevention of fraud at the opening of a new insurance policy (i.e. policy data not matching reality) and fraud when introducing claims. Several indicators can be used for this identification, e.g. IoT data (sensor data can directly provide information about a car accident or home damage), a customer performing several simulation attempts with different configurations before requesting an insurance policy, a high number of view requests for the insured amount of life or fire/theft insurances preceding a claim, unpaid premiums…​

  • Legal claim management: Big Data can also help to avoid, prepare for and react to legal matters involving large amounts of data, e.g. collecting all information related to a legal matter, assessing cases to determine the probability that a case will lead to a legal claim, tracking regulations to avoid fines and sanctions…​
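The card-fraud detection use case above can be illustrated with the simplest possible anomaly check: flag a transaction whose amount deviates strongly (here more than 3 standard deviations) from the customer's transaction history. A real fraud engine would combine many more features (location, timing, merchant type) and far more sophisticated models; the history values below are illustrative.

```python
from statistics import mean, stdev

def is_suspicious(history: list, amount: float, threshold: float = 3.0) -> bool:
    """Flag an amount whose z-score against the customer's history exceeds the threshold."""
    mu, sigma = mean(history), stdev(history)
    return abs(amount - mu) / sigma > threshold

# Illustrative history of a customer's card payments in EUR
history = [25, 30, 22, 27, 31, 26, 24, 29, 28]

print(is_suspicious(history, 2500))  # → True  (far outside the usual pattern)
print(is_suspicious(history, 35))    # → False (a bit high, but plausible)
```

The near real-time requirement comes from the fact that such a check must run between authorization request and approval, which is exactly where the event-streaming and in-memory technologies from the drivers section come into play.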

5.4. New Data Driven Products and Services

Big Data also makes it possible to deliver new, innovative products and services to customers, using the insights derived from the data streams.

Some examples:

  • Home insurance in combination with IoT (utilities smart meters, smoke and carbon monoxide detectors, fire suppression systems, advanced alarm systems) allowing to improve protection, dynamically adapt pricing and provide value-added services (e.g. statistics on utilities consumption).

  • Car insurance in combination with IoT (black-box in car) allowing to improve protection (e.g. car recovery in case of theft), dynamically adapt pricing (based on driving style) and provide value-added services (e.g. fleet management services to SMEs).

  • Trade Finance contract in combination with IoT (supply chain sensors) allowing the automatic execution of the contractual conditions defined in the contract.

  • Personalized Wealth Management Advice: use Big Data to identify customer goals, family situation, risk aversion, financial situation, financial goals…​ and automatically propose investment advice, tax advice and financial planning based on these insights.

  • Personal Financial Management: use Big Data to automatically classify financial transactions into categories best suited to the type of customer, propose budget plans in line with the customer’s goals and compare budget plans and actuals with "people-like-me".

  • Algorithmic trading: analyse massive amounts of market data in fractions of a second to identify investment opportunities.
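The automatic transaction categorization mentioned in the Personal Financial Management example can be sketched with a naive keyword matcher. Category names and keywords below are illustrative; real systems combine merchant codes, amounts and machine-learning models tuned per customer type.

```python
# Hypothetical category-to-keyword mapping; order matters (first match wins).
CATEGORY_KEYWORDS = {
    "groceries": ["supermarket", "grocery"],
    "transport": ["fuel", "parking", "railway"],
    "housing":   ["rent", "mortgage", "electricity"],
}

def categorize(description: str) -> str:
    """Return the first category whose keywords appear in the transaction description."""
    text = description.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "other"

print(categorize("ACME Supermarket Brussels"))  # → groceries
print(categorize("Monthly rent January"))       # → housing
```

The obvious weakness of pure substring matching (ambiguous descriptions, foreign-language statements) is precisely why production PFM tools train classifiers on historical, customer-corrected categorizations instead.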

5.5. Internal Management Support

Where all use cases up to now focus on getting better insights about customers, and consequently on better servicing these customers and generating more profit, this section focuses on using Big Data to support internal management decisions.

Some examples:

  • Examine customer feedback: collect and analyse customer feedback from different sources (e.g. call centre comments, social media…​) to identify improvements to products and services. Using these techniques allows much faster reactivity to this feedback than traditional surveys or focus groups, which tend to be slow, costly and inaccurate (due to the limited size of the sample group).

  • Determine branch location/relocation strategy: use Big Data to understand where customers live, where they shop and how much they spend to determine optimal locations of branches.

  • Big Data can also be used to better comply with regulations and improve regulatory reporting. Some examples of regulations which are very data driven are KYC/AML, FATCA, MiFID2 and Basel III. Getting better insights into the customer will allow banks to reduce the risk of sanctions and fines, due to regulation breaches.

6. Benefits

As shown above, Big Data can support multiple use cases which bring significant benefits for banks and insurers:

  • Drive growth in the business

  • Drive better risk management (better identification, assessment, prevention and mitigation of risks)

  • Reduce and better control costs


  • More personalized and targeted marketing (maximizing lead generation potential)

  • Improved measurement of marketing effectiveness across all channels

  • Optimized funnel conversion

  • More personalized customer servicing (relevant content per channel, dynamic pricing, next-best-offer…​)

  • Faster reactivity (seconds rather than hours) to customer related transaction issues

  • Holistic and forward-looking view (predictive analysis) of customers

  • Increased customer loyalty (better servicing, loyalty programs, knowing which customers are going to churn and when)

  • Enhanced usability (by dynamic adaptive front-ends and better monitoring and feedback cycles on usability)

  • Identify hidden connections between seemingly unrelated data

  • Identify customer trends and changing customer preferences and expectations faster than competitors

  • Guiding customers to low cost channels

  • Reducing time lost searching for information

  • Supporting optimal management decisions

  • Improved modelling of credit scoring and fraud detection

7. Challenges

Delivering these Big Data enabled use cases and reaping the benefits from them poses significant challenges for banks and insurers:

  • Data very often sits in many silos (i.e. large, monolithic legacy systems), making it difficult to centralize. Many financial service companies have already invested in extracting specific data from these silos into data warehouses (often in batch), but Big Data ideally requires more data and faster (near real-time) delivery. Significant data integration efforts are therefore to be expected.

  • Current technologies are not able to handle high-volume, high-velocity, high-variety data sets and analyse them in a timely manner. Banks and insurers will therefore have to invest in new Big Data technologies to support the new use cases. These investments will require application selection (several Big Data solutions are already available on the market) and a good understanding of the new technologies and the specific complexities associated with distributed systems.

  • Big Data consists of a lot of unstructured content, which makes it difficult to interpret. Furthermore, given that Big Data can identify hidden connections between seemingly unrelated data, it is difficult to know in advance which data will bring the biggest business value.

  • Introducing Big Data requires a cultural shift. Instead of considering data an IT asset, ownership of the data should move to the business users, making data a key asset for decision making. This means closer cooperation between business and IT is required and the organisational product silos should be torn down.

  • These new Big Data technologies and methodologies result in new profiles such as data scientists and Big Data IT specialists. As all companies start investing heavily in these processes, a war for talent is expected.

  • Banks and insurers should consider the restrictions of regulatory requirements and privacy concerns. For example, the European General Data Protection Regulation (GDPR) requires companies to store personal data only when there is a direct use for it. Furthermore, a customer should be in full control of his data, meaning he should be able to request at any moment to see or modify his personal data, and the company should be able to explain, for all stored personal data, why it is stored. This makes many Big Data use cases a grey zone with regard to this regulation.
    Regulations also enforce more and more transparency on the algorithms used e.g. to calculate credit acceptance and pricing, insurance acceptance and pricing or algorithmic trading (cf. the MiFID II regulation).

  • Data quality: the principle "garbage in, garbage out" already applies today when dealing with data quality. This effect is, however, amplified by the sheer volume of the data, the real-time nature of the processing (leaving little room for cleansing actions) and the fact that Big Data often leverages third-party, publicly available sources (whose reliability the bank or insurer cannot manage).

  • Security: because customer data is centralized out of the well-protected silos, data security risks are amplified. A reliable data security policy (encryption, access restrictions…​) is therefore also a real challenge.

8. Data Sources

Big Data will leverage a multitude of internal and external data sources.
Some examples are:

  • Structured data present in the company’s databases, like

    • CRM information (e.g. KYC/AML information)

    • Product silo information (e.g. credits, accounts, payments, securities, insurances…​)

    • Security information (e.g. authentication, authorization and signature data)

    • Structured data gathered by new compelling services, like PFM (Personal Financial Management), Digital Investment Advice, IoT…​ This includes information like budget plans, categorization of expenses, saving goals, investment profiling questionnaires, financial plans, IoT sensor data (like information from car telematics, wearables, home sensors, geolocation data…​)…​

  • Structured and unstructured data transiting through the organisation, but often not stored, like

    • Unstructured communication like emails, chat sessions, voice and video recordings (e.g. between branch or contact center employees and customers). This also includes unstructured summaries of communication, like call logs or comments written by call centre employees.

    • Customer behaviour on websites, i.e. browsing patterns.

    • Meta-data on channel interactions, like browser name and version, IP address, customer operating system, cookie data, URL redirects, geo-location data…​

    • Customer surveys

  • Information gathered from external partners, like

    • Social media data, i.e. feedback, likes, (re-)tweets, shares, opinions, feelings and attitudes about the company brand, a topic or a keyword…​

    • Blogs

    • Data gathered from API ecosystems, e.g. data collected by partners, which deliver services on top of APIs exposed by the financial services company or data collected from APIs publicly exposed by competitors (e.g. PSD2 account information stored by competing banks)

    • Financial Market news, analyst reports, securities data…​

  • Publicly available information, like

    • Demographic data

    • Financial data about companies (like annual reports)

    • Media feeds (news agencies, newspapers…​)

    • …​

9. Technological Evolution

As indicated in the first chapter of this article, Customer Analytics has been around for more than a decade. Big Data and Real-Time Analytics should therefore not be considered as a revolution, but rather as an evolution of technologies and processes which have been around for several years.

In this section, we give a short overview of this evolution. Since financial service companies have already deployed several of these "older" technologies, it is essential that the analytical solutions from different generations work effectively together, rather than having to perform a big-bang replacement of the existing data architecture.

Several financial service companies have been aggregating data from different channels and silos by creating operational data stores (ODS) and data warehouses (DWH), which are based on relational databases.
These companies then created OLAP (On-Line Analytical Processing) cubes (i.e. multi-dimensional data tables) on top of these data warehouses, or used Business Intelligence (BI) tools like Business Objects, to slice and dice the data, calculate KPIs and better understand customer behaviour. More recently, these BI tools have also evolved towards more statistical and mathematical analysis, e.g. with tools like SAS.

With data volumes increasing and data streams becoming more and more real-time and complex (structured and unstructured), these data warehouse and business intelligence architectures no longer meet the business needs of today. Big Data technologies provide an answer to these challenges, by using new distributed technology stacks, like Hadoop-MapReduce, and technologies to analyse unstructured data, like text analytics, natural-language processing…​

These Big Data technologies also require new ways of storing the data, which gave rise to Operational Data Lakes (ODL). These data lakes store all data (i.e. data in use today, but also data that may be used someday), both structured and unstructured, in its raw, unprocessed form (allowing easy adaptation to change). This in contrast to data warehouses, which only store the data required by the business and structure it specifically for flexible querying and intuitive viewing and analysis (i.e. the data is denormalized with a lot of redundancy). Operational Data Lakes form a modern replacement for Operational Data Stores (ODS), supporting both structured and unstructured data and handling much larger volumes. Companies often consider replacing an ODS with an ODL a good first step towards a Big Data architecture.

The combination of these new technologies with the current technology stack is called a Big Data Warehouse (BDW), a hybrid data warehouse architecture. Structured data still uses the conventional DWH setup, while unstructured data follows the more modern ODL approach. The Big Data technologies can be set up in such a way that they can aggregate and analyse data from both sources.

Finally, there is also a trend from batch-oriented replication of data on which analytics are executed, towards online real-time streaming of data, on which (near) real-time analytics are performed.

10. Big Data Processes

From collecting data all the way to getting business added-value results (i.e. discovering useful information, suggesting conclusions and supporting decision-making) is a complex process, requiring several steps to be executed. At a high level these steps can be split in 2 blocks, i.e. Data Management and Data Analytics.

  • Data Management: this includes all processes required to prepare the data for analysis:

    • Data collection (i.e. the data source layer): gather the raw data from different internal and external sources (data scraping).

    • Data storage (i.e. the data storage layer): storage and/or staging (i.e. temporary storage) of the data for further processing.

    • Data manipulation (i.e. the data processing layer): prepare the data to perform analysis more efficiently:

      • Data filtering: filtering out of irrelevant data

      • Data cleansing: improving quality of the dataset by filtering out or correcting incorrect data samples

      • Data transformation: transforming the data to a different format, applying simple business rules to the data…​

      • Data aggregation: generate aggregated (summary) data at different levels of granularity (i.e. grouping at different levels)

      • Data join or merge: joining / merging different data together

      • Data sorting: sorting the data in a pre-defined order (e.g. alphabetically)

      • Data comparison: compare data with previous (old) datasets to identify the differences

      • Data calculations: execute calculations on the data, like averages, sums…​

  • Data Analytics: this includes all processes, where new insights are gained from the data. This includes 2 sub-steps, i.e.

    • Modelling and analysis (i.e. the data analytics layer): there is a huge variety of techniques to model and analyse data and often techniques are combined to get the best result. In the different analytic techniques, different groups can be identified:

      • Unstructured data to structured data conversion: these techniques transform unstructured data to structured data, on which other Big Data techniques can be applied. This includes "Text Analytics", "Picture Analytics", "Audio Analytics" and "Video Analytics".

      • (Descriptive) Data Mining: this refers to techniques using algorithms to discover hidden patterns, relationships, dependencies and unusual records or dependencies.

      • Predictive analytics: a variety of techniques to make predictions (determine likelihood of future events, i.e. future trends or likely behavior) from historical and current data patterns. Often based on time-series analysis, this type of analysis is typically used for determining the "next-best-offer" and implementing adaptive user interfaces.

      • Machine learning: this group of techniques consists of applying one of the above techniques but adding the element of automated learning to it. This means the analytic technique learns over time to provide better insights into the data, i.e. the model compares the expected outcome with the real outcome and adapts itself accordingly to improve future predictions.

      • Social network analysis: this group of techniques will typically use a combination of the above techniques, but due to its widespread usage, it is often considered a separate group. It allows one to represent, analyse and extract patterns and trends from social media data. A typical example is sentiment analysis, which aims to derive conclusions from the subjective information of the customer sentiment.

    • Data visualization (i.e. the data output layer): visualize the data and gained insights with different visualization methods like charts, graphs, decision trees, traffic light indicators, heatmaps…​ aggregated in dashboards and reports. It also includes the representation of the analytical model itself and reports needed for model monitoring, benchmarking and back testing.
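The data-management steps above (filtering, cleansing, aggregation, sorting) can be illustrated with a small Python sketch. All field names and records here are hypothetical, purely to show how the steps chain together on raw transaction data.

```python
# Hypothetical raw transaction records, as a data-collection step might deliver them
raw = [
    {"customer": "A", "amount": "120.5", "channel": "ATM"},
    {"customer": "B", "amount": "n/a",   "channel": "POS"},   # corrupt sample
    {"customer": "A", "amount": "79.5",  "channel": "POS"},
    {"customer": "C", "amount": "15.0",  "channel": "TEST"},  # irrelevant record
]

# Data filtering: drop irrelevant records
filtered = [r for r in raw if r["channel"] != "TEST"]

# Data cleansing: remove samples whose amount cannot be parsed
def parse_amount(record):
    try:
        return float(record["amount"])
    except ValueError:
        return None

cleansed = [(r["customer"], parse_amount(r)) for r in filtered]
cleansed = [(c, a) for c, a in cleansed if a is not None]

# Data aggregation + calculation: total spend per customer
totals = {}
for customer, amount in cleansed:
    totals[customer] = totals.get(customer, 0.0) + amount

# Data sorting: order customers by total spend, highest first
ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)  # → [('A', 200.0)]
```

In a real Big Data setup each of these steps would of course run distributed over a cluster, but the logical sequence of the data processing layer is the same.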

When looking in more detail at "Modelling and Analysis", we can identify various techniques and algorithms to achieve different types of modelling and analyses:

  • Text Analytics (also referred to as text data mining): this refers to several algorithms to derive conclusions from unstructured text. It includes parsing the input text, adding derived linguistic features to it and then deriving patterns from this structured data.
    Example usage: measure customer opinions, product reviews, feedback

  • Anomaly or Outlier Detection: search for data items (i.e. outliers) in a dataset that do not match a projected pattern or expected behaviour (i.e. significantly deviate from the general average).
    Example usage: fraud detection engines

  • Association Rule Learning: search for interesting relations (interdependencies) between different variables in large databases.
    Example usage: identification of cross-selling opportunities

  • Clustering Analysis: discover data sets that are alike to understand the differences as well as the similarities within the data (i.e. finding meaningful patterns within a data set).
    Example usage: targeted marketing, identifying "people-like-me"

  • Classification Analysis: task of applying a known structure (classification) to new data. Often implemented using decision trees and decision rules.
    Example usage: credit risk engine

  • Regression Analysis: tries to define the dependency between variables or even a function which models the data with the least error.
    Example usage: algorithmic trading (i.e. correlations between stock prices) or churn prediction

  • A/B testing: technique in which a control group is compared with one or more test groups to determine which changes improve a given objective variable, e.g. marketing response rate.
    This technique is often used when deploying new software versions: the user population is split in two parts, where one group receives the new software version, while the other group remains on the current version. This way the impact of the version change can be measured and analysed.
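To make one of these techniques concrete, here is a minimal anomaly-detection sketch in Python, using a simple z-score rule. A production fraud engine would be far more sophisticated; this only illustrates the idea of flagging values that deviate strongly from the general average.

```python
import statistics

def detect_outliers(amounts, threshold=3.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(amounts)
    stdev = statistics.pstdev(amounts)  # population standard deviation
    if stdev == 0:
        return []  # no spread, so nothing can deviate
    return [a for a in amounts if abs(a - mean) / stdev > threshold]

# A typical card-spending pattern with one suspicious payment (illustrative data)
history = [20, 25, 22, 30, 18, 24, 27, 21, 23, 26, 950]
print(detect_outliers(history))  # → [950]
```

The same z-score idea generalises to multivariate data and rolling windows; the point is simply that "significantly deviating from the average" can be made precise and automated.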

11. Big Data Technologies

As mentioned in previous paragraphs, Big Data requires several new technologies to deal with the volume, velocity and variety of the Big Data sets.
These technologies are typically based on the Hadoop open source framework/platform. This technology stack makes it possible to distribute computing problems across a (large) number of (commodity hardware) servers, thus enabling massively parallel processing. This permits very fast and scalable processing and eliminates the dependency on expensive hardware (making the process much more economical).

The Apache Hadoop-oriented technology stack consists of multiple elements:

  • HDFS: Hadoop Distributed File System, enables the storing of large files by distributing the data among a pool of data nodes.

  • MapReduce: engine for the simultaneous processing of data across multiple nodes

  • YARN: Yet Another Resource Negotiator, i.e. the cluster-coordinating component of the Hadoop stack

  • Spark: Big Data framework, like Hadoop, which is more modern and significantly faster than Hadoop-MapReduce. Where Hadoop-MapReduce is mainly batch-oriented, Spark provides both batch processing and online stream processing. As Spark does not have a file system, it is typically installed on top of Hadoop, where Spark replaces MapReduce, but makes use of HDFS as file system. Spark can however also be deployed standalone, when paired with a storage layer (e.g. an in-memory data grid like Apache Ignite)

  • Pig: high level programming language that simplifies the common tasks of working with Hadoop, i.e. loading data, data transformation and storing the results

  • Hive: enables Hadoop to operate as a data warehouse, i.e. provides an SQL like interface and relational model
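The map/shuffle/reduce model behind Hadoop-MapReduce can be sketched in plain, non-distributed Python. The classic word-count example below is illustrative only; in a real Hadoop job the map and reduce phases would run in parallel across the cluster nodes, with the framework handling the shuffle.

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    """Map: emit (key, value) pairs, here one (word, 1) pair per word."""
    return [(word, 1) for word in record.split()]

def shuffle(pairs):
    """Shuffle: group all values by key (the Hadoop framework does this for you)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key, here a simple sum."""
    return key, sum(values)

records = ["big data in banking", "data driven banking"]
pairs = chain.from_iterable(map_phase(r) for r in records)
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts["data"])  # → 2
```

Because map works per record and reduce works per key, both phases parallelise naturally, which is exactly what lets Hadoop scale the same logic to very large datasets.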

Other options for Big Data processing (often used on top or in combination with Hadoop) are:

  • Storm: Big Data framework, specifically oriented to stream-only workloads.

  • Samza: Big Data framework, like Storm, specifically oriented to stream-only workloads.

  • Flink: Big Data framework, like Spark, which allows both batch-oriented and online streaming processing.

  • HBase: distributed, versioned, non-relational database modeled after Google’s Bigtable (a NoSQL database running on top of the Hadoop cluster)

  • …​

All these open source applications are available for free, but to simplify usage and provide support, enterprise distributions like Cloudera, MapR and Hortonworks exist. Even then, this Hadoop stack is not an out-of-the-box solution and requires extensive configuration and programming to perform the analytics desired by the financial services company.

Apart from these open source frameworks, which are generic toolkits allowing the definition of any analytics for any industry, some companies have also started to create solutions (using the Hadoop technology stack under the hood) that implement the specific customer analytic techniques of the financial services industry. E.g. NG Data delivers Lily Enterprise, a solution that collects data about the customer (i.e. Customer DNA) from different sources and generates insights from analytics. This solution is built on top of the Hadoop platform.

The major cloud providers also offer several services for Big Data analysis:

  • Amazon (AWS): Amazon EMR, Amazon Redshift, Amazon Kinesis…​

  • Google (GCP): Cloud Pub/Sub, Cloud Data Transfer, Cloud Dataflow, Cloud Dataproc, BigQuery…​

  • Microsoft (Azure): HDInsight, APS (Analytics Platform System) …​

12. Conclusion

Banks have access to enormous amounts of data about their customers, but due to multiple constraints this data is not yet sufficiently converted into useful insights.
With competition in the financial services sector getting fiercer, banks need to adopt a data-driven approach if they want to stay competitive. As opportunities for incumbent banks and insurers from these insights are almost unlimited, Big Data will be a strong differentiator in the future competitiveness of financial institutions.



Comments: (2)

Vishwanath Thanalapatti
Vishwanath Thanalapatti - Temenos - Canada 10 September, 2019, 00:36

Thanks. A very interesting perspective.  Each human exudes data. The IOT eco system includes the human race, nay includes all life and the connected machines. Digitisation has accelerated creation of data that can be mined by emerging technologies. Yes. The ability to exploit the big data will be the differentiator.  

Pavlo Sidelov
Pavlo Sidelov - - Vilnius 25 January, 2020, 12:33

Great article with strong technical background. Thanks!
