It is rare to see quants talk about trading systems and strategies so publicly. Recently Avi Palley, Lead Quantitative Strategist in eFX Trading at Wells Fargo, and Carlos Zendejas, also a one-time quant and now CEO and co-founder of Digital Q, did just that, in the Deep Q Cast series about building AI-ready trading systems. Their conversation is compelling and passionate: two quants bringing to life the changing role of the quant amidst the excitement of automation, AI and machine learning. Avi is an industry veteran and renowned programmer with considerable experience in system development, architecture and maintenance, courtesy of long careers at Galaxy Digital and Merrill Lynch Bank of America.
Avi outlined the evolution of quantitative analytics in finance, tracing it from the early days of structured finance, into areas like derivatives pricing, and the later effect of regulation and collateralization. Today the discipline is far less about, say, options pricing, which is often just a function call away, and far more about its use and business impact, as with any asset.
The role of quants, too, has changed. He describes three quant types: those who love mathematics, enjoy modeling and go deep into machine learning; quant developers and coders, less concerned with the underlying quantitative theory but knowledgeable of it and great at implementing it; and business-focused types, communicators who bridge the other two and present commercial perspectives.
As someone who has worked with quants since the discipline’s explosion in the nineties, I recognize all three types Avi describes. When their personalities shine through, they’re amazing people. A “business” type quant fund manager I once worked with – albeit one with strong maths and code – happened to be a synth player for a brilliant goth synth band that supported Depeche Mode in the nineties. I bought his band’s tapes and CDs, and was starstruck when I met him! A developer-focused quant now runs the IT stack at a major university’s library, revelling in his geekdom. When we meet up, we share stories of books and code. And the French quants, whatever their fit, more often than not offer brilliant discussions about philosophy, art and cinema.
But I digress. As Avi explains, data is critical for all three types, and I completely agree (I’d add that the model matters too). He and Carlos discussed the importance of data cleanliness, integrity, tooling and consistency, and the workflow managing the data lifecycle. For AI systems in particular – data-driven models rather than traditional quant closed-form models – data matters even more. As Avi stated succinctly, approach data from first principles:
“Build data into your trading systems, or at least think about proper data from day one. It’s not something that you decide [on] six months down the road”
When organizations fail to think ahead, they unwittingly develop Rube Goldberg/Heath Robinson-type arrangements that incur significant technical debt. They also fail to address how the data will actually be processed and consumed over time to deliver business value. To mitigate such risks, Avi and Carlos discussed the challenges they have faced in developing next-generation systems and offered practical approaches to address them, which I’ve aggregated into five steps:
Step 1: Accommodate Change but Retain Control
Change never changes. Accommodate it from the outset. In architectural terms, start generic and abstract, and minimize early-stage hard coding in areas like schemas. Instead, set a framework that allows schemas to adapt without having to rebuild entire datasets, and avoid tight coupling between, say, the data and its visualisation, to ensure future flexibility. In practical terms, present well-defined APIs with parameterization that support controlled bespoke development but are resilient to underlying system changes. Well-engineered APIs, through encapsulation, shield users from data imperfections that may require, for example, very specific aggregations, or catering for known quirks in the data set. As someone who once constructed a time-stamped but highly inconsistent “alternative” data set, I recognize that challenge.
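As a sketch of the kind of encapsulation Avi describes – the column names and “imperfections” below are hypothetical, not from the webinar – a thin accessor function can hand consumers a clean series while hiding known quirks of the raw feed:

```python
import pandas as pd

# Hypothetical accessor: consumers ask for a clean, time-indexed price
# series; the API hides known quirks such as a renamed column and
# duplicated timestamps, so callers are shielded from the raw feed.
def get_prices(raw: pd.DataFrame, sym: str, field: str = "price") -> pd.Series:
    df = raw.copy()
    # Known quirk: an older feed called the column "px", not "price".
    if field not in df.columns and "px" in df.columns:
        df = df.rename(columns={"px": field})
    df = df[df["sym"] == sym]
    # Known quirk: duplicated timestamps; keep the last tick for each.
    df = df.drop_duplicates(subset="time", keep="last")
    return df.set_index("time")[field].sort_index()

raw = pd.DataFrame({
    "time": pd.to_datetime(
        ["2024-01-02 09:30", "2024-01-02 09:30", "2024-01-02 09:31"]
    ),
    "sym": ["EURUSD", "EURUSD", "EURUSD"],
    "px": [1.10, 1.11, 1.12],
})
prices = get_prices(raw, "EURUSD")  # clean series; quirks are hidden
```

If the feed is later fixed upstream, only the accessor changes; every consumer keeps the same call signature.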
Step 2: Deliver Early and Deliver Often
This step speaks for itself, but good design should never come at the cost of delivery. Don’t be over-prescriptive, as different problems require different approaches. Be sufficiently pragmatic to understand that issues arise that may confound the best of plans, but always have – and evolve – a process and a framework. As Carlos appositely quotes Mike Tyson, “Everyone’s got a plan until they get punched in the mouth”. View things strategically and tactically (also read: commercially), with the strategic vision having an end-point, but take iterative steps that deliver value to the business along the way. Yes, that’s important beyond the data realm too, but from a data software standpoint, it’s a great way to motivate teams and build positive feedback loops with DevOps and, if applicable, the wider business.
Step 3: Use the Right Tools
We are awash with exciting new technologies. Many promise the world but may not always deliver – how many natively support time-series data, for example? There remains the attraction (or is that distraction?) of new approaches and open-source initiatives that advertise well on developer community sites but may lack the technical documentation or support services needed to solve problems. Avi wryly notes:
“When you think about something as fundamental as data, you don’t want to be building your trading system on a house of cards.”
He cites, in contrast, the proven nature of time-series databases and analytics engines as “example[s] of technology used by every major bank on Wall Street and virtually all buy-side institutions”, where operations like asof joins (for aligning across disparate data sets) and window joins (for grouping over time) can be performed natively and instantly, without complex parallelizing – speeding up both development time and runtime.
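To make the asof join concrete – here using pandas’ `merge_asof` as a stand-in for a native time-series engine, with illustrative trade and quote tables – each trade picks up the most recent quote at or before its timestamp:

```python
import pandas as pd

# Illustrative quote and trade tables (column names are made up).
quotes = pd.DataFrame({
    "time": pd.to_datetime(
        ["2024-01-02 09:30:00.000", "2024-01-02 09:30:00.050"]
    ),
    "sym": ["AAPL", "AAPL"],
    "bid": [99.0, 100.0],
})
trades = pd.DataFrame({
    "time": pd.to_datetime(["2024-01-02 09:30:00.023"]),
    "sym": ["AAPL"],
    "price": [99.5],
})

# asof join: for each trade, attach the most recent quote at or
# before the trade time, matched per symbol. Both inputs must be
# sorted on the join key.
merged = pd.merge_asof(
    trades.sort_values("time"),
    quotes.sort_values("time"),
    on="time",
    by="sym",
)
```

The trade at 09:30:00.023 aligns with the 09:30:00.000 quote, not the later one – exactly the “align across disparate data sets” behaviour described above, expressed as a single call rather than hand-rolled interval logic.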
Step 4: … But Use the Right Tools Optimally
Using familiar tools is important, but using them optimally matters too. One-size-fits-all may be an attractive panacea for reducing both cost and risk in transferring data and models across and within systems, but is it the right tool for the right job? Not many doctors I know use Swiss Army knives for surgery.
Counterbalance the risks and opportunities of deploying the known against the new. Avi notes it would be a mistake to say, for example, “I’m doing 80% of my work in Python, let me also do my data queries in Python” when it is clear that the aggregations mentioned above can be executed much faster and more easily in the time-series engine. Continuing, Avi and Carlos have a fascinating exchange about programming language strengths and weaknesses, for example:
“Python is pretty much gold standard for data analysis, machine learning AI, and the amount of productivity that you get by using Python compared to other languages for something like that is great, especially if you’re using JupyterLab… Your productivity is massive compared to… say Java or C++… So q… really, really good at data aggregation… asof joins, being able to do SELECT statements and doing them quick… Striking that balance where you use things to their strengths, and then bridge them together, is what I think is very important.”
A theme I explore in my blogs elsewhere contrasts simplicity and complexity in architectures. There’s no right answer. It’s an architect’s choice. I can squish my stack, or I can build an infrastructure that takes the best of many things to deliver a whole greater than the sum of the parts. On the flip side, the former might produce a jack-of-all-trades, master-of-none application, the latter a software spaghetti full of technical debt. However, you can get the best of both worlds: Python can drive collaborative model development and the analytics pipeline, with the time-series engine handling the blazingly fast data and model queries.
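As a toy illustration of using each tool to its strength – made-up FX data, with pandas standing in for the engine side – here is the same per-symbol aggregation written as a row-by-row Python loop and then pushed down to the vectorized layer:

```python
import numpy as np
import pandas as pd

# Made-up FX tick data, purely for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "sym": rng.choice(["EURUSD", "GBPUSD"], size=10_000),
    "px": rng.normal(1.1, 0.01, size=10_000),
})

# Row-by-row in Python: simple to write, but slow as the data grows.
sums, counts = {}, {}
for sym, px in zip(df["sym"], df["px"]):
    sums[sym] = sums.get(sym, 0.0) + px
    counts[sym] = counts.get(sym, 0) + 1
loop_means = {k: sums[k] / counts[k] for k in sums}

# Pushed down to the engine: one vectorized call does the same work.
engine_means = df.groupby("sym")["px"].mean().to_dict()
```

Both produce the same per-symbol means, but the pushed-down version stays a one-liner and scales with the engine, which is the balance Avi describes: orchestrate in Python, aggregate where aggregation is fast.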
Step 5: Respect Your Data. Don’t Torture It
“If you torture data long enough it will confess to anything”, in its various forms, gets attributed to economist and Nobel prize winner Ronald Coase. Whatever its origin, it has become a rallying cry for those inclined to the “lies, damned lies and statistics” critique, making the case, for example, that the average person has one-point-something legs. It’s true, but not really true. The webinar finished on a similarly philosophical note:
“When you’re actually in practice, the way you make money is by handling esoteric problems and treating them with integrity from a data and modelling perspective. You can make data say whatever you want it to say, but when it comes to making money, unless
your data is actually telling the truth, it’s not going to help you.“
To torture the data (and yourself), follow Heath Robinson. To find the truth in your data, take the counsel of practitioners like Avi and advisors like Digital Q, who can help architect the systems, tools and methodology that unlock the real truth in your data.
Click here to watch the interview in full.