The growth of Data Science as a discipline is attributable to the availability of new sources of data and to the increased focus on incorporating analytical outputs into everyday decision-making within enterprises. Data Science brings together the hitherto
disparate worlds of programming, statistical learning and data management. In my team, we have long recognized that financial institutions need a single analytics platform that brings together data management, statistical modeling and business application
development capabilities, not because we foresaw the emergence of data science, but because our approach was developed in response to real-world challenges faced by financial institutions. A few enterprise-level, modeling-related challenges we observed at financial
institutions are listed below.
- Separate Modeling & IT Worlds: Although banks rely heavily on statistical models and application development for gaining analytical insight, the modeling and IT worlds are pretty much separate within the bank. Even though the models
that statisticians develop have to be deployed and managed on bank IT systems, modelers and IT don't share a common toolset or even seem to speak the same language. In large banks it's not uncommon for a model to take several weeks to be deployed
after it's developed. IT departments have developed mature principles and tools for systems lifecycle management, and much of that methodology could be adapted for model lifecycle management. However, because of the disconnect between the two worlds, the processes
and methodologies remain separate.
- Modeling Data Governance: From a data management perspective, modeling platforms often work on copies of enterprise data. So while a bank may have put in place sophisticated data governance policies around data in the enterprise warehouse,
data used for models are often outside the purview of these governance systems. The oft-used phrase that the analytics problem is a data problem underscores the fact that analytics and data management have to be closely tied. Yet while banks have poured
resources into enterprise-level data management and governance programs, enterprise-level model management does not seem to have attracted quite the same level of attention. Regulatory demands have shaped financial institutions' data management approaches,
and there is no reason to assume that regulators will be any less demanding of model management.
- Modeling Output & Business Applications: Model outputs (such as scores) are integral to business decision making, but model outputs in themselves are not readily usable in business decisions. They need to be interpreted and delivered via business
applications. For example, a stochastic economic capital model for credit risk might compute the required capital value, but the business may want to see that value allocated to individual exposures via a set of deterministic business rules (i.e., an application).
In effect, models should not be executed in isolation; they should be exposed as services that can be stitched together with other business process logic to form a complete analytical application.
- Model Execution & Data In-Warehousing: Finally, as data volumes keep growing and analytical outputs are needed in shorter timeframes, a big challenge for financial institutions is the timeliness of analytical outputs. Moving model execution
to where the data resides has become a necessity. This in-warehouse analytics approach, of course, necessitates a tight integration between the database and the modeling platform.
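As a rough illustration of the capital allocation example in the third challenge, here is a minimal Python sketch in which a model output (a portfolio-level capital figure) is handed to a deterministic business rule that spreads it across individual exposures. The pro-rata rule, the function name and the figures are all assumptions made for illustration; a real allocation rule would be whatever the business defines.

```python
from typing import Dict


def allocate_capital(total_capital: float, exposures: Dict[str, float]) -> Dict[str, float]:
    """Allocate a portfolio-level capital figure pro rata to exposure amounts.

    Pro-rata allocation is only one possible deterministic rule; it is used
    here as an illustrative assumption.
    """
    total_exposure = sum(exposures.values())
    if total_exposure <= 0:
        raise ValueError("total exposure must be positive")
    return {
        name: total_capital * amount / total_exposure
        for name, amount in exposures.items()
    }


# Hypothetical model output and exposure amounts.
required_capital = 100.0
exposures = {"loan_a": 500.0, "loan_b": 300.0, "loan_c": 200.0}

allocation = allocate_capital(required_capital, exposures)
```

The point of the sketch is the separation of concerns: the model produces `required_capital`, and the business rule that consumes it is an ordinary, auditable function that can be changed without touching the model.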
What is needed?
A fresh take on enterprise modeling must address all of these challenges.
Open source R is an enormously popular and functionally rich modeling platform. At the enterprise level, however, an analytics platform should be more than a statistical package. The R platform by itself does not provide the full data management and governance
capability desired by banks and required by regulators – data lineage, auditability and security are not what the R platform was designed for. The R platform also does not provide model management and model deployment capabilities, nor does it enable model
outputs to be integrated into applications easily. The solution must bring together the IT, app developer, data architecture and modeling worlds using a unified, metadata-driven toolset.
What do I mean by a unified, metadata-driven toolset?
- It is a unified platform in the sense that complete analytical applications – from data management, to model development and deployment, to incorporating model outputs into business usable applications – may be built, deployed and managed using the platform.
The platform should help bridge the IT-statistical modeler divide and integrate with IT security and deployment policies.
- The platform needs to be metadata-driven for some of the same reasons why enterprise data management platforms are often metadata-driven systems:
  - The platform manages models like other application objects. For example, the same IT governance, security and auditability procedures that govern data and business rules objects apply to models. Uniformity of policy enforcement requires that we represent data,
  application and model objects in a unified metadata management system.
  - Models, like business processes, are discoverable and callable service-oriented objects registered in the object repository. Building a complete analytical application then becomes nothing more than stitching together different metadata objects, such
  as a few data movement objects and data quality checks, a few business logic rules and perhaps a model execution task, into a single runnable object.
  - Models access data, but modelers should not be expected to be SQL or NoSQL technology experts. Metadata-driven modeling means models work on variables that are also metadata objects. Behind the scenes, the system takes care of mapping these data objects
  to their data source, thus freeing the statistician from having to deal with the complexities of data management technologies.
- The core modeling capability of the platform should adhere to accepted standards. While the R modeling platform has been in existence for over twenty years, its popularity has recently eclipsed that of proprietary modeling platforms. Financial institutions
would rightly balk at adopting an analytics platform that uses a new proprietary modeling language.
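To give a feel for what "stitching together metadata objects" could look like, here is a small Python sketch. It is not any real platform's API: `Variable`, `Repository`, `FAKE_WAREHOUSE`, the task names and the toy scoring coefficient are all hypothetical, and an in-memory dictionary stands in for the warehouse. The model task refers only to the variable name `balance`; the repository resolves where that variable physically lives.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Task = Callable[[Context], Context]

# Hypothetical stand-in for the warehouse: physical source -> value.
FAKE_WAREHOUSE = {"warehouse.loans.balance": 300.0}


@dataclass(frozen=True)
class Variable:
    """Metadata object: a model-facing name plus its physical source."""
    name: str
    source: str


@dataclass
class Repository:
    """Registry in which variables and callable tasks are looked up by name."""
    variables: Dict[str, Variable] = field(default_factory=dict)
    tasks: Dict[str, Task] = field(default_factory=dict)

    def register_variable(self, variable: Variable) -> None:
        self.variables[variable.name] = variable

    def register_task(self, name: str, task: Task) -> None:
        self.tasks[name] = task

    def load_variables(self, context: Context) -> Context:
        # Map each registered variable to its source so tasks never see it.
        resolved = {v.name: FAKE_WAREHOUSE[v.source] for v in self.variables.values()}
        return {**context, **resolved}

    def build(self, task_names: List[str]) -> Task:
        """Stitch registered tasks into one runnable object, run in order."""
        steps = [self.tasks[name] for name in task_names]

        def run(context: Context) -> Context:
            for step in steps:
                context = step(context)
            return context

        return run


def check_balance(context: Context) -> Context:
    # Data quality check.
    if context["balance"] < 0:
        raise ValueError("balance must be non-negative")
    return context


def score_model(context: Context) -> Context:
    # Stand-in for a statistical model; 0.5 is a toy coefficient.
    return {**context, "score": 0.5 * context["balance"]}


def apply_rule(context: Context) -> Context:
    # Deterministic business rule applied to the model output.
    return {**context, "flagged": context["score"] > 100.0}


repo = Repository()
repo.register_variable(Variable("balance", "warehouse.loans.balance"))
repo.register_task("load", repo.load_variables)
repo.register_task("dq_check", check_balance)
repo.register_task("score", score_model)
repo.register_task("flag_rule", apply_rule)

app = repo.build(["load", "dq_check", "score", "flag_rule"])
result = app({})
```

Because every step is a named object in one registry, the same governance hooks (access control, audit logging, versioning) could in principle be applied uniformly in `register_task` and `build`, which is the argument for a metadata-driven design made above.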
As today’s financial institutions seek to become more analytics driven, modeling cannot survive as an island within the institution. While a plethora of point solutions have been introduced into the market to meet specific analytical needs – from big data
processing to visualization – financial institutions need a more strategic approach to modeling: one that is built, developed and deployed on a unified platform. Models have been, and will continue to be, core assets that are essential to managing financial
performance, risk & capital, and customer relationships in the financial services sector. Firms can gain control of their modeling environment and integrate modeling with the fabric of enterprise-wide business intelligence with the right enterprise analytics
As always, please share your thoughts with me.