I work in financial services, typically quantitative technology applications. A recent employer of mine was an imagery company, providing satellite and drone-sourced data into finance and insurance. In this heady mix of finance and space, I worked with people
from defence, aerospace, geospatial, surveying and satellite communications backgrounds who were intrigued and often surprised to hear about the relevance of open source and community programming in financial services.
Understandably, many see open source through the lens of Facebook, Google and Amazon-sponsored projects such as TensorFlow, PyTorch, Caffe, Keras, etc., which intersect with academic research programmes, meetups, and makers. To outsiders, open source can seem
contradictory to "proprietary" moneyballed finance. “We’ll write them a proprietary algo to identify and exploit the alpha in our data and they’ll buy it for six or seven figures” was a comment I heard from one venerable vendor company founder two years ago.
When I responded that graduates can create, test and trade an algo in hours for free using similar data sources to his, his disbelief was evident.
Facebook, Google, Amazon, Apple and Microsoft are certainly driving open source, often with ulterior motives, consciously supporting the sale of proprietary tools and services. Unconsciously, they can be accused of driving liberal west coast values and weeding
out smaller commercial competition - I will be fascinated to see the consequences of Google releasing its Quantitative Finance TensorFlow. I also acknowledge the
historical and continuing proprietary tendencies of financial services. A finance technology VP once told me around the year 2000 that “open source would never take off in quantitative finance”. While factually wrong even then, his assumptions were reasonable –
management reputation, internal risk management and regulators beyond wouldn’t want untraceable, dangerous code running key algorithms. Key algorithms and the packages and languages in which they were embedded were also differentiators, hence proprietary.
At the time, institutions did what they could to make the most, hire the best and beat the rest, and proprietary languages and code were the norm.
However, things were changing even then and I want to argue that since the 1990s, financial services and associated disciplines have more than played their part in driving open source and community programming initiatives, including many key foundations
of modern data science, and have not received enough credit for that.
First, let’s consider the role of quantitative modelling in finance.
When forecasting price or risk, algorithms indicate but rarely predict except in basic mechanical single-factor instances, for example bonds. Most financial asset valuations are multi-factor and so “human” that linear – or nonlinear - algorithms connecting
price to explanatory variables can correlate and guide but rarely causally determine. Some argue that price offers a perfect proxy, “ceteris paribus” to use the old economics phrase, so if you can identify the value of the thing you are measuring, your price
proxy can take care of the rest. However, that assumes a perfectly rational market, which the market rarely is.
Take an example. My house was put up for sale in April, with three estate agents' valuations differing across a 25% price spread. We have since sold our house at a price just below the highest valuation. The lower valuers argued “objective facts” around two
explanatory variables: i) our house was semi-detached and ii) worse still, a 1970s house. They ignored our countryside views, our quiet, safe dead-end road, community village pub, well-stocked shop and nearby school. They also ignored our house’s Arts and Crafts character, which was
applauded by two conservationist organizations.
The factors driving the value of my house and any asset of note are multiple, inter-related, and complex, often intangibly subjective depending on the purchaser. Price is an approximation. Some models will perform better than others over periods of time,
at least against some reputable ground truths, but not all models work all the time. This is in part why arbitrage exists and how financial firms profit. They tell us the value of our things, we nod our heads deferentially, then they buy, sell and re-sell
taking margin on the way. Okay, I'm exaggerating, but you get the idea.
In short, price prediction models are complex, multi-factor and indicative. In the smart beta world, multi-factor models are the assumed standard approach. Alpha identification is still prone to alleged single silver bullet factor influence, not least in the
crazy wild west of Alternative Data. However, be careful. Key correlative and back-test indicators such as R² and Sharpe Ratio can look impressive but can mislead, with error (in the former) or misinformation (selective trades, no transaction costs) in the latter.
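To make the transaction cost point concrete, here is a minimal sketch in plain Python. The daily returns and the 5 basis point cost per trade are hypothetical numbers chosen purely for illustration, the risk-free rate is assumed to be zero, and the strategy is assumed to trade once every day.

```python
import math
import statistics


def annualised_sharpe(returns, periods_per_year=252):
    """Annualised Sharpe ratio of per-period returns (risk-free rate assumed zero)."""
    mean = statistics.mean(returns)
    stdev = statistics.stdev(returns)  # sample standard deviation
    return math.sqrt(periods_per_year) * mean / stdev


# Hypothetical daily strategy returns, gross of costs (illustrative only).
gross = [0.004, -0.002, 0.003, 0.001, -0.001, 0.002, 0.003, -0.003, 0.002, 0.001]

# Assumption: one trade per day, each costing 5 basis points.
cost_per_trade = 0.0005
net = [r - cost_per_trade for r in gross]

sharpe_gross = annualised_sharpe(gross)
sharpe_net = annualised_sharpe(net)
print(f"gross Sharpe: {sharpe_gross:.2f}, net Sharpe: {sharpe_net:.2f}")
```

A flat cost shifts the mean return down but leaves the volatility unchanged, so the Sharpe Ratio falls in direct proportion to the lost return. A back-test quoted without costs can therefore flatter a high-turnover strategy considerably.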
Once we understand that financial models are simply models and rarely truth, open source begins to make sense, particularly as quantitative model frameworks are now readily available. In 1997, building, evaluating, testing and running models took time and
expertise in difficult verbose languages. Try running a Monte Carlo simulation in Excel with 1997 technology. In 2019, anyone with a propensity towards Python, MATLAB, R, Julia or a math library-equipped coding language of note and an API can run and integrate
a performant Monte Carlo model into Excel in minutes. From VWAP to ARMA to VaR and VAR, capabilities are rudimentary, off-the-shelf and freely available online. For those of a disposition towards R, see Ralph Sueppel's excellent Power of R for Trading Parts
1 and 2 for a very straightforward introduction.
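As an illustration of how little code a Monte Carlo model now needs, the sketch below estimates a value-at-risk figure using only the Python standard library. It assumes returns follow geometric Brownian motion, and the portfolio value, drift, volatility and horizon are hypothetical inputs; the function name monte_carlo_var is my own, not taken from any package mentioned here.

```python
import math
import random


def monte_carlo_var(value, mu, sigma, horizon_days, confidence=0.99,
                    n_paths=20_000, seed=42):
    """Estimate value-at-risk by simulating end-of-horizon portfolio values.

    Assumes geometric Brownian motion with annual drift `mu` and annual
    volatility `sigma`, and 252 trading days per year.
    """
    rng = random.Random(seed)  # seeded for reproducible results
    t = horizon_days / 252
    drift = (mu - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    # Simulated profit and loss at the horizon, one value per path, sorted ascending.
    pnl = sorted(
        value * (math.exp(drift + vol * rng.gauss(0.0, 1.0)) - 1.0)
        for _ in range(n_paths)
    )
    # VaR is the loss at the (1 - confidence) quantile of the P&L distribution.
    index = int((1.0 - confidence) * n_paths)
    return -pnl[index]


# Hypothetical inputs: a 1,000,000 portfolio, 5% drift, 20% volatility, 10 days.
var_99 = monte_carlo_var(value=1_000_000, mu=0.05, sigma=0.20, horizon_days=10)
print(f"10-day 99% VaR: {var_99:,.0f}")
```

The same structure extends to fatter-tailed distributions or multiple correlated assets by swapping out the line that draws the simulated return, which is the "white box" adaptability discussed in this piece.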
Differentiation comes down increasingly to the uniqueness, reliability and parameterization of the data inputs, but even here open source is available. Quandl or CVX anyone?
All four named platforms above made their name providing “white box” or "grey box" frameworks and reference models, where users configure building blocks for their application. Compared to third party “black box” models, vendor risk is reduced, products
and maintenance are free or cheap, models are adaptable, extendable, deployable onto multiple platforms for different uses, and can be sped up. Quantitative model “precision” limitations in finance became features not bugs, as they can be assessed, reproduced,
tested and expanded by Quants, not swept under a black box of excuses for model failure or methodology obfuscation.
At MathWorks, where I formerly worked, we were proud of File Exchange, a fabulous hosted file code sharing forum from 2001. Rivals slow off the mark fell by the wayside, including the vendor of S-PLUS, a commercial predecessor or descendant of R depending
on your view, superseded in part by a comprehensive, reviewed R-CRAN applied finance code repository maintained by Dirk Eddelbuettel. These excellent early community repositories have become industrialized by Python-dominated GitHub and JupyterHub contributions working
alongside core languages like C++ and Java, which themselves are predicated on and feature many open source contributions, not least the increasing importance of the OpenJDK ethos given Oracle's recent craziness. Numerical newcomers like Julia Computing have
gained community mind-space through supplying open source platforms with alleged benefits, selling services and key capabilities targeting allegedly free-spending financial services communities. While Julia quite understandably excites many in quantitative
finance and technical computing, recently winning the prestigious Wilkinson Award, its revolutionary foundations are built on earlier legacies.
Black box libraries are increasingly specialist. Even managers in finance are as likely to be STEM-trained quants or programmers as not, with those that aren’t often heavily steered and/or supported by those that are. Refined enterprise-utilised model
frameworks matter. Black-box garbage in and garbage out has had its day, thankfully.
So what are some of the best, most influential open source financial models you’ve heard or not heard of, predating TensorFlow, PyTorch and Keras? Note my biases. I worked at MathWorks (MATLAB), a fundamental part of the quantification of
finance and economics. MATLAB is a proprietary commercial language, but it features openly viewable code and many examples I reference make use of it. Many open source “styles” have followed MATLAB norms, for example R's help and RStudio editor. In Python,
Matplotlib, as its name suggests, was massively MATLAB-influenced. Julia features many notable MATLAB fans among its early evangelists and contributors.
Let's start in the world of economics. Dynare – www.dynare.org – has underpinned much macro-economic research in and beyond central banks since the late 1990s, a MATLAB-implemented
equation parser, though calling specialized FORTRAN subroutines. Dynare’s DSGE [Dynamic Stochastic General Equilibrium] compatriot, the object-oriented IRIS Toolbox – https://iris.igpmn.org/ -
has targeted macroeconomic policy scenario modelling, with the initial support of the IMF and now the Global Projection Model Network. The influence of both packages on macroeconomic policy cannot be over-stated. The DSGE community has picked up on new language
Julia, with the Federal Reserve Bank of New York, home of the Liberty Street Economics blog, reimplementing its MATLAB DSGE model in Julia (https://github.com/FRBNY-DSGE/DSGE.jl/blob/master/docs/DSGE_Model_Documentation_1002.pdf).
It claims to run faster than its earlier MATLAB equivalent, though some commentators, I paraphrase, suggest the comparison is like unfairly comparing apples to pears.
Another popular library with economists was and is Kevin Murphy’s Bayesian-oriented software repository at https://www.cs.ubc.ca/~murphyk/Software/. Featuring
DAG [Directed Acyclic Graph] networks and HMM [Hidden Markov Models], his routines feature among many hidden stepchildren of data science, well utilised and powerful when you look beyond the hype. Kevin Murphy is now a senior staff scientist at Google. He
and I proudly share a university connection, though while I was a young, annoying grad student indulging in hockey and the joys of west coast living at one university, he was driving early computational data science over in the computer science department.
Finally, continuing our econometrics focus, a shout out to James LeSage’s Spatial Econometrics library (http://www.spatial-econometrics.com/), with similar methods
now available in PySAL (https://pysal.readthedocs.io/en/latest/). LeSage also drew on Kevin Sheppard’s now legacy UCSD GARCH (https://www.kevinsheppard.com/UCSD_GARCH)
MATLAB libraries, updated as part of the MFE Toolbox (https://bitbucket.org/kevinsheppard/mfe_toolbox/src/default/). Kevin Sheppard these days
works more in Python; see, for example, the code accompanying his Python for Econometrics book at https://www.kevinsheppard.com/Python_for_Econometrics#Code.
In short, economists have worked with openly viewable and open source sharable code since the 1990s and data scientists – and quant finance – have much to thank them for. A significant open source data science legacy belongs to them.
Moving to quant finance, QuantLib https://www.quantlib.org/ features an industry standard set of derivative pricing and calibration routines, written in C++ and callable from
most standard platforms via its API.
For algorithmic trading, I highlight the influence of Ernest Chan’s code, mostly associated with his books and tutorials on algo trading, running since the early 2000s. See https://www.epchan.com/books/ for
examples. His community work - derived from an impressive resume of major proprietary finance institutions - was originally MATLAB-based, but his more recent routines have been Python-oriented, likewise the work of Jev Kuznetsov whose Trading With MATLAB blog
(featuring code and YouTube tutorials) was supplanted by Trading With Python (https://github.com/sjev). Quantopian is now a household name providing an open source backtesting trading
platform https://www.quantopian.com/ helping users get started quickly with community-hosted trading model implementations. With rivals Numerai and WorldQuant (a Millennium Management
spin-off) pursuing similar approaches, algorithmic trading has been radically democratized, with money made by their hosts in trading the most successful strategies.
For portfolio management and risk, there are few code-sets to rival Attilio Meucci’s code https://uk.mathworks.com/matlabcentral/fileexchange/25010-exercises-in-advanced-risk-and-portfolio-management.
Originally run in MATLAB (I am proud to have been a Meucci champion within MathWorks), Meucci’s ARPM code is increasingly Python and Jupyter-oriented, underpinning ARPM’s (www.arpm.co)
live and online Bootcamps.
Python is now the incumbent in Quantitative Finance, and a major reason behind that rapid adoption is the staggeringly successful Pandas (https://pandas.pydata.org/), a now
standard Python Data Analysis library and a foundation of Pythonic data science. This fundamentally useful package grew out of leading
hedge fund AQR Capital Management in the late 2000s, where Pandas lead Wes McKinney worked. Wes McKinney was and remains an inspiration behind the proliferation of OSS in finance, including recently at Two Sigma Investments where the Apache Arrow project (https://arrow.apache.org/)
was in part gestated. The Python ascendancy continues as the financial services industry comes to terms with applying the ecosystem's deep and machine learning libraries.
Finally, a nod to brilliant UK-based Saeed Amen at Cuemacro (https://www.cuemacro.com/ and https://github.com/cuemacro)
who has provided respected financial helper functionality in Python, partly in support of his specialized transaction cost and best execution libraries. Man Investments too are active on the London OSS circuit, releasing their Arctic time-series database/structures.
For those many excellent libraries I have missed, my sincere apologies for my biases.
However, whether you are a Python, R, MATLAB, Julia or other language user working in finance, you will likely appreciate it is a misconception that in 2019 finance algo and model-building is proprietary and expensive. It isn’t. Much community open source has underpinned,
democratized and commoditized the modelling revolution. Proprietary attention in financial services has switched from model paradigms and packages to new sources of data, particularly so-called alternative data including the image-derived data of one of my
employers, Geospatial Insight. Alternative data, though, is rarely useful without a framework model destination, applying optimal parameter identification and estimation, and even here, open data, open source and code sharing continue to transform and excite.