Blog article

Avoiding the Herd in Overcrowded Alt Data

Fundamental investment managers are mining alternative data, but some worry that the same data sets have been sold to too many people and the strategies may be overcrowded.

Experts at a recent panel discussion on alternative data brought some clarity to how alternative data is categorized and to the big challenges that fundamental asset managers, quants and hedge funds face in finding tradeable signals and building models.

But some of the strategies are no longer as profitable as they were initially.

“There is no amount of alpha that exists forever and ever. Eventually it gets exploited,” said Erez Katz, CEO and co-founder of Lucena Research, an Atlanta-based predictive analytics firm. Lucena provides a platform that extracts actionable signals from data using machine learning technology.

However, it is possible to prolong the lifespan of buy or sell signals gleaned from alternative data, said Katz.

“The idea is to find data in the context of hard-to-find combinations of multiple factors from multiple data sets that can provide information that is not available to the naked eye. This is what we see are the needs of consumers on the buy side,” said Katz.

With the rise of low-cost index funds putting pressure on active managers, there is demand for data sets that can offer a competitive edge, the Financial Times reported in May 2018. Asset managers spent a total of $400 million in 2017 on alternative data sets and on hiring employees, according to a survey by an industry trade body, which estimates that total spending was $656 million in 2018 and will jump past $1 billion this year.

Yet those entering the alt data space in search of profitable returns have faced some challenges to adoption.

“There is a need to define the source of information, to understand its value and its structure in the investment and research process around a thesis,” said Jeff Ferro, a long-time investor who was previously head of alternative data strategies at BattleFin.

Ferro said his firm had to run portfolios for outside managers side by side for years at a time just to prove the information coming in was of value. “I think people don’t trust the machine learning process behind it and why it’s going to help them,” said Ferro.

Still, with so much media buzz around alternative data sets and quants hiring data scientists to parse mammoth feeds, investment professionals might wonder whether it’s too late to enter the alt data space if they are not already consuming satellite images of parking lots and other types of esoteric data.

Ferro said the use of alternative data is in “the first inning.” “There’s so much information out there, and even if it’s been out a long time, it’s your ability to put a creative spin on it,” he said.

“I don’t think it’s too late,” said Olga Kokareva, head of data sourcing and strategy at Quanstellation, a multi-asset quantitative investment firm, who spoke on the panel.  “People think it’s too late because it’s overcrowded, and often it doesn’t work because everyone’s using it. It will be too late if you don’t use it, and your portfolio manager is not going to be able to compete with everyone else who is using it,” said Kokareva.

Categorizing Alt Data

While there is no strict definition of alternative data, most people describe it as anything that falls outside traditional data, such as fundamental, technical or macroeconomic data.

There are three categories of alternative data, said Kokareva:

Data that hasn’t been available before, such as email receipts; data that existed before and has been used by hedge funds forever, such as counts of foot traffic in retail stores; and market structure data, where people use order book data to generate alpha.

Some popular types of alt data include social sentiment analysis, credit-card transactions, web-scraping, satellite images and mobile geolocation data.

“The biggest growing group of alternative data is from companies that collect data as part of their business, like UPS, a shipping company, and NCR from point-of-sale credit-card transactions,” said Katz. “There are companies that never planned on selling their data, but there are obviously huge opportunities to expand their channels of distribution and revenue stream by selling their data,” said Katz.

The last category is unstructured data that investment professionals have not used before, such as social media sentiment and transcriptions of earnings calls, along with crowd providers and other companies that are trying to digitize unstructured data through natural language processing to create a tradeable signal.

Transforming Raw Data into Information

Barry Star, founder and CEO of Wall Street Horizon, a provider of corporate pre-earnings and event data, said there is a distinction between raw data, information and knowledge. “We spend a lot of time cleaning the raw data, manipulating it and turning it into information that is more valuable than raw data,” said Star. “You take that information, you manipulate it and you add value to it and that becomes knowledge,” said Star. “In our world calendars are data; a change to a calendar [earnings release date] is information,” he added.
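Star’s distinction between a calendar (data) and a change to that calendar (information) can be illustrated with a small sketch. It assumes two hypothetical snapshots of an earnings calendar keyed by ticker; the function name, field names and sample tickers are illustrative only and do not describe Wall Street Horizon’s actual feed or methods.

```python
from datetime import date
from typing import Dict, List

def calendar_changes(previous: Dict[str, date], current: Dict[str, date]) -> List[dict]:
    """Compare two hypothetical calendar snapshots and emit one record per changed date."""
    changes = []
    for ticker, new_date in current.items():
        old_date = previous.get(ticker)
        if old_date is None or old_date == new_date:
            continue  # newly added or unchanged: the raw calendar entry carries no change event
        shift = (new_date - old_date).days  # negative means the release was moved earlier
        changes.append({
            "ticker": ticker,
            "old_date": old_date,
            "new_date": new_date,
            "shift_days": shift,
            "direction": "earlier" if shift < 0 else "later",
        })
    return changes

# Illustrative snapshots (hypothetical tickers and dates).
yesterday = {"AAA": date(2019, 4, 25), "BBB": date(2019, 4, 30), "CCC": date(2019, 5, 2)}
today = {"AAA": date(2019, 4, 22), "BBB": date(2019, 4, 30), "CCC": date(2019, 5, 9)}

for event in calendar_changes(yesterday, today):
    print(event["ticker"], event["direction"], event["shift_days"])
```

Here the two snapshots are the data; the short list of date shifts is the information that a later model could build on.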

In February, Lucena joined forces with Wall Street Horizon to produce combined offerings that extract actionable signals for investment.

Lucena has validated signals from Wall Street Horizon’s corporate events data, which is now available for assessment and consumption through validation reports, backtest simulations, model portfolios and smart data feeds, according to the announcement.

On the panel, Katz said Lucena has found that when companies move their earnings date forward by a certain number of days ahead of the scheduled date, while the analyst consensus recommendation is below normal and the pre-cash flow is below normal, those companies underperform over the next quarter.
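The rule Katz describes combines three conditions that must all hold at once. The sketch below expresses that as a simple conjunctive screen. The thresholds, the z-score definition of “below normal” and all names are hypothetical assumptions, since the panel did not disclose Lucena’s actual parameters, and the cash-flow field simply stands in for the “pre-cash flow” measure mentioned above.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds: not Lucena's actual values.
MIN_DAYS_MOVED_FORWARD = 3  # how many days earlier the earnings date must move
BELOW_NORMAL_Z = -1.0       # "below normal" modeled as a z-score vs. the company's own history

@dataclass
class CompanySnapshot:
    ticker: str
    days_moved_forward: int      # scheduled date minus new date, in days
    consensus_rec_zscore: float  # analyst consensus recommendation vs. its history
    cash_flow_zscore: float      # stand-in for the "pre-cash flow" measure vs. its history

def flag_underperformers(universe: List[CompanySnapshot]) -> List[str]:
    """Return tickers that meet all three conditions simultaneously."""
    return [
        c.ticker
        for c in universe
        if c.days_moved_forward >= MIN_DAYS_MOVED_FORWARD
        and c.consensus_rec_zscore <= BELOW_NORMAL_Z
        and c.cash_flow_zscore <= BELOW_NORMAL_Z
    ]

universe = [
    CompanySnapshot("AAA", 5, -1.4, -1.2),  # all three conditions hold
    CompanySnapshot("BBB", 5, 0.3, -1.5),   # consensus recommendation is normal
    CompanySnapshot("CCC", 0, -2.0, -2.0),  # earnings date was not moved forward
]
print(flag_underperformers(universe))
```

The point of the sketch is the combination: no single condition flags a company, which echoes Katz’s earlier remark about hard-to-find combinations of multiple factors from multiple data sets.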

Star said he struck the partnership with Lucena because some buy-side clients are interested only in the signal.

Since its founding in 2003, Wall Street Horizon has sold its corporate events data to clients who did all the work of testing the data themselves. Other clients are interested in changes to earnings release dates because they want different layers of information. “People have different needs up and down the spectrum depending on their bench and their customization,” said Star.

As an investor, Ferro used the Lucena platform with the Wall Street Horizon data to run a long strategy for three years. He was “using the featured data through the system to understand it and further combine it,” he said.

While it’s possible to identify tradeable signals, there was some debate over whether that job belongs to the data provider or the portfolio manager. At one extreme are data sets that come with no formatting and no cleaning yet cost $1 million, because quant funds will pay that figure, said Kokareva. At the other end, data vendors are trying to provide a signal to buy or sell.

Kokareva, who is involved in building strategies, said she is skeptical of ready-to-use signals. “The problem is that data providers extract features, they build portfolios, they measure it, they do backtesting, and they come up with performance based on volatility. Then they extract the next feature and go through the same process. Then they market it. A portfolio manager buys it and finds the strategy is overcrowded and doesn’t work,” she said. While the data vendor knows the data and can point out the features the data set covers, it should not tell the client how to use it, she insisted. “It’s up to the fundamental discretionary hedge fund that knows the companies they cover so well to put the data into context,” she said.
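The vendor workflow Kokareva describes (extract a feature, build a portfolio, backtest it, score it on volatility-adjusted performance, then repeat for the next feature) can be sketched as a loop. This is a minimal illustration under stated assumptions: an equal-weight long/short portfolio on the sign of each feature, a Sharpe ratio with a zero risk-free rate as the volatility-based score, and toy inputs. The function names and sample data are hypothetical, not any vendor’s method.

```python
import math
import statistics
from typing import Dict, List, Tuple

def sharpe_ratio(daily_returns: List[float], periods_per_year: int = 252) -> float:
    """Annualized mean return over annualized volatility (risk-free rate assumed to be 0)."""
    vol = statistics.stdev(daily_returns)
    if vol == 0:
        return 0.0
    return statistics.mean(daily_returns) / vol * math.sqrt(periods_per_year)

def feature_portfolio_returns(feature: List[List[float]], asset_returns: List[List[float]]) -> List[float]:
    """Each day, go long assets with a positive feature value and short those with a negative one.
    Feature values are assumed to be lagged already, so they are known before the returns they pair with."""
    daily = []
    for signals, rets in zip(feature, asset_returns):
        positions = [1.0 if s > 0 else -1.0 if s < 0 else 0.0 for s in signals]
        gross = sum(abs(p) for p in positions)  # equal weight across all open positions
        daily.append(sum(p * r for p, r in zip(positions, rets)) / gross if gross else 0.0)
    return daily

def rank_features(features: Dict[str, List[List[float]]],
                  asset_returns: List[List[float]]) -> List[Tuple[str, float]]:
    """Backtest one feature after another and sort them by Sharpe ratio, best first."""
    scores = [(name, sharpe_ratio(feature_portfolio_returns(f, asset_returns)))
              for name, f in features.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy inputs: 4 days x 2 assets (hypothetical).
asset_returns = [[0.01, -0.01], [0.02, -0.005], [-0.01, 0.01], [0.005, 0.0]]
features = {
    "feature_a": [[1, -1], [1, -1], [-1, 1], [1, -1]],
    "feature_b": [[-1, 1], [1, -1], [1, -1], [-1, 1]],
}
for name, score in rank_features(features, asset_returns):
    print(name, round(score, 2))
```

Running such a loop over many features and marketing only the top scorers is the pattern Kokareva questions: a high backtested score says nothing about how many other buyers are trading the same feature.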

Katz said that discretionary portfolio managers can apply their own judgment either to override what the machine recommends or to improve their decision making with the machine’s help. “It’s not going to replace you. It can augment, extend, verify, and validate,” he said. In part, this depends on how much automation the portfolio manager applies to the final decision to trade.

On the high-frequency trading side, there is no discretion, so the use of the data is very limited because trades need to be very fast, he said. For multi-day strategies, whether short-term or long-term, “we’re just talking about a signal that has empirical evidence that the data is predictive. We’re not in the business of telling them how to trade,” he said.

Ivy Schmerken

Editorial Director

FlexTrade Systems

Member since

20 Jul 2015


Great Neck

This post is from a series of posts in the group:

Hedge Fund Technology

Community for people who work in and service hedge fund Technology, covering everything front office to operations and investor relations