
5 Steps to Building AI-Ready Trading Systems

07 March 2023

Steve Wilcockson

Technical Product Marketing

Quantexa

It is rare to see quants talk about trading systems and strategies so publicly. Recently, Avi Palley, Lead Quantitative Strategist in eFX Trading at Wells Fargo, and Carlos Zendejas, also a one-time quant and now CEO and co-founder of Digital Q, did just that in the Deep Q Cast series about building AI-ready trading systems. Their conversation is compelling and passionate: two quants bringing to life the changing role of the quant amid the excitement of automation, AI and machine learning. Avi is an industry veteran and renowned programmer with considerable experience in system development, architecture and maintenance, gained over long careers at Galaxy Digital and Merrill Lynch Bank of America.

Avi outlined the evolution of quantitative analytics in finance, tracing it from the early days of structured finance into areas like derivatives pricing, and on to the later effects of regulation and collateralization. Today the discipline is far less about, say, options pricing, which is often just a function call away, and much more about how options are used and, as with any asset, their business impact.

The role of the quant has changed too. Avi describes three quant types: those who love mathematics, enjoy modeling and go deep into machine learning; quant developers and coders, less concerned with the underlying theory but knowledgeable about it and great at implementing it; and business-focused types, communicators who bridge the other two and present commercial perspectives.

As someone who has worked with quants since the discipline’s explosion in the nineties, I recognize all three types Avi describes. When their personalities shine through, they’re amazing people. A “business” type quant fund manager I once worked with (albeit one with strong maths and code) happened to be a synth player in a brilliant goth synth band that supported Depeche Mode in the nineties. I bought his band’s tapes and CDs, and was starstruck when I met him! A developer-focussed quant now runs the IT stack at a major university’s library, revelling in his geekdom; when we meet up, we share stories of books and code. And the French quants, whatever their fit, more often than not offer brilliant discussions about philosophy, art and cinema.

But I digress. As Avi explains, all three types depend critically on data, and I completely agree (I’d add that the model matters too). He and Carlos discussed the importance of data cleanliness, integrity, tooling and consistency, and of the workflow that manages the data lifecycle. For AI systems in particular, which are data-driven models rather than traditional closed-form quant models, data matters even more. As Avi succinctly put it, approach data from first principles:

“Build data into your trading systems, or at least think about proper data from day one. It’s not something that you decide [on] six months down the road”
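To make “think about proper data from day one” concrete, here is a minimal sketch of the kind of integrity check that can run on every batch as it enters a system, rather than being bolted on months later. It assumes a hypothetical `Tick` record with `time`, `sym` and `price` fields and applies only three illustrative rules (symbol present, price positive and not NaN, timestamps never going backwards for a given symbol); a real pipeline would have its own schema and many more rules.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class Tick:
    # Hypothetical record layout, for illustration only.
    time: datetime
    sym: str
    price: float


def validate_ticks(ticks: List[Tick]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the batch passed."""
    problems: List[str] = []
    high_water: Dict[str, datetime] = {}  # latest timestamp seen per symbol
    for i, t in enumerate(ticks):
        if not t.sym:
            problems.append(f"row {i}: missing symbol")
        if not (t.price > 0):  # also catches NaN, since NaN > 0 is False
            problems.append(f"row {i}: non-positive or NaN price {t.price}")
        prev = high_water.get(t.sym)
        if prev is not None and t.time < prev:
            problems.append(f"row {i}: {t.sym} timestamp goes backwards")
        # Keep the high-water mark so one bad row does not mask later ones.
        high_water[t.sym] = t.time if prev is None else max(prev, t.time)
    return problems
```

Returning every problem rather than raising on the first one lets the pipeline decide whether to quarantine, repair or reject a batch.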

When organizations fail to think ahead, they unwittingly build Rube Goldberg (or Heath Robinson) contraptions that incur significant technical debt. They also fail to address how the data will actually be processed and consumed over time to deliver business value. Avi and Carlos discussed the challenges of developing next-generation systems and offered practical approaches to mitigate these risks, which I’ve aggregated into five steps:

Step 1: Accommodate Change but Retain Control

Change never changes, so accommodate it from the outset. In architectural terms, start generic and abstract, and minimize early-stage hard coding in areas like schemas. Instead, set up a framework that allows schemas to adapt without having to rebuild entire datasets, and avoid tightly coupling the data to, say, its visualisation, to preserve future flexibility. In practical terms, present well-defined APIs with parameterization that support controlled bespoke development but are resilient to underlying system changes. Through encapsulation, well-engineered APIs shield users from data imperfections that may require, for example, very specific aggregations or workarounds for known flaws in the dataset. As someone who once constructed a time-stamped but highly inconsistent “alternative” dataset, I recognize that challenge.
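As an illustration of an API that shields callers from schema changes, the sketch below exposes a stable `vwap(rows, symbol, schema)` function, while a small mapping translates stable field names to whatever the storage layer currently calls them. The field names, the two schema versions and the function itself are hypothetical; skipping zero-size trades stands in for catering to a “known imperfection” inside the API rather than in every caller.

```python
from typing import Any, Iterable, Mapping, Optional

# Hypothetical mappings from stable API field names to the column names the
# storage layer currently uses. Only these change when the schema evolves.
SCHEMA_V1 = {"symbol": "sym", "price": "px", "size": "qty"}
SCHEMA_V2 = {"symbol": "ticker", "price": "trade_price", "size": "trade_size"}


def vwap(
    rows: Iterable[Mapping[str, Any]],
    symbol: str,
    schema: Mapping[str, str],
) -> Optional[float]:
    """Volume-weighted average price for one symbol, or None if no usable trades.

    Rows with a missing price or size, or a non-positive size, are skipped.
    """
    notional = 0.0
    volume = 0.0
    for row in rows:
        if row.get(schema["symbol"]) != symbol:
            continue
        price = row.get(schema["price"])
        size = row.get(schema["size"])
        if price is None or size is None or float(size) <= 0:
            continue
        notional += float(price) * float(size)
        volume += float(size)
    return notional / volume if volume > 0 else None


if __name__ == "__main__":
    v1_rows = [
        {"sym": "EURUSD", "px": 1.0, "qty": 100},
        {"sym": "EURUSD", "px": 2.0, "qty": 300},
        {"sym": "EURUSD", "px": 5.0, "qty": 0},  # bad print, ignored
        {"sym": "GBPUSD", "px": 9.9, "qty": 10},
    ]
    print(vwap(v1_rows, "EURUSD", SCHEMA_V1))  # 1.75
```

When storage moves from `SCHEMA_V1` to `SCHEMA_V2`, callers keep calling `vwap` unchanged; only the mapping is updated.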

Step 2: Deliver Early and Deliver Often

The step speaks for itself: good design should never come at the cost of delivery. Don’t be over-prescriptive, as different problems require different approaches. Be pragmatic enough to accept that issues will arise that confound the best-laid plans, but always have (and evolve) a process and a framework. As Carlos appositely quotes Mike Tyson, “Everyone’s got a plan until they get punched in the mouth.” View things strategically and tactically (read: commercially): the strategic vision should have an end-point, but take iterative steps that deliver value to the business along the way. Yes, that’s important beyond the data realm too, but from a data software standpoint it is a great way to motivate teams and build positive feedback loops with DevOps and, where applicable, MLOps processes.

Step 3: Use the Right Tools

We are awash with exciting new technologies. Many promise the world but may not always deliver: how many natively support time-series data, for example? There remains the attraction (or is that distraction?) of new approaches and open-source initiatives that advertise well on developer community sites but may lack technical documentation or the support services needed to solve problems. Avi wryly notes:

“When you think about something as fundamental as data, you don’t want to be building your trading system on a house of cards.”

In contrast, he cites the proven nature of time-series databases and analytics engines as “example[s] of technology used by every major bank on Wall Street and virtually all buy-side institutions”, with native support for operations like asof joins (for aligning disparate datasets in time) and window joins (for aggregating over time intervals), which can be performed very quickly without complex parallelization, speeding up both development time and runtime.

Step 4: … But Use the Right Tools Optimally

Using familiar tools is important, but using them optimally matters too. A one-size-fits-all approach may look like an attractive way to reduce both the cost and the risk of transferring data and models across and within systems, but is it the right tool for the job? Not many doctors I know use Swiss Army knives for surgery.

Balance the risks and opportunities of deploying the known alongside the new. Avi notes it would be a mistake to say, for example, “I’m doing 80% of my work in Python, let me also do my data queries in Python” when the aggregations mentioned above can clearly be executed much faster and more easily in the time-series engine. Avi and Carlos go on to have a fascinating discussion about the strengths and weaknesses of different programming languages, for example:

“Python is pretty much gold standard for data analysis, machine learning AI, and the amount of productivity that you get by using Python compared to other languages for something like that is great, especially if you’re using JupyterLab … Your productivity is massive compared to … say Java or C++ … So q … really, really good at data aggregation … asof joins, being able to do SELECT statements and doing them quick … Striking that balance where you use things to their strengths, and then bridge them together, is what I think is very important.”

A theme I explore elsewhere in my blogs is the contrast between simplicity and complexity in architectures. There’s no right answer; it’s an architect’s choice. I can squish my stack, or I can build an infrastructure that takes the best of many things to deliver a whole greater than the sum of its parts. On the flip side, the former risks a jack-of-all-trades, master-of-none application, and the latter a software spaghetti full of technical debt. However, you can get the best of both worlds: Python can drive collaborative model development and analytics pipelines, with the time-series engine providing blazingly fast data and model queries.

Step 5: Respect Your Data. Don’t Torture It

“If you torture data long enough, it will confess to anything” and its variations are often attributed to the economist and Nobel laureate Ronald Coase. Whatever its origin, it has become a rallying cry for those inclined to the “lies, damned lies and statistics” critique, such as the observation that the average person has one-point-something legs. It’s true, but not really true. The webinar finished on a similarly philosophical note:

“When you’re actually in practice, the way you make money is by handling esoteric problems and treating them with integrity from a data and modelling perspective. You can make data say whatever you want it to say, but when it comes to making money, unless your data is actually telling the truth, it’s not going to help you.”
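The “one-point-something legs” quip is simple arithmetic, and a few lines make it concrete. The population below is invented purely for illustration: because a handful of people have fewer than two legs and nobody has more, the mean falls just below two while the median stays at two, so almost everyone has “more legs than average”.

```python
from statistics import mean, median

# Illustrative numbers only: 1,000 people, two of whom have fewer than two legs.
legs = [2] * 998 + [1, 0]

avg = mean(legs)
above_average = sum(1 for n in legs if n > avg)

print(avg)            # 1.997
print(median(legs))   # 2.0
print(above_average)  # 998
```

Both statistics are correct; it is reporting the mean alone, out of context, that makes the claim misleading.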

To torture the data (and yourself), follow Heath Robinson. To find the truth in your data, take the counsel of practitioners like Avi and advisors like Digital Q to help architect the systems, tools and methodology that unlock it.

Click here to watch the interview in full.


This content is provided by an external author without editing by Finextra. It expresses the views and opinions of the author.
