This post is adapted from a March 2019 blog at www.bcmstrategy2.com.
Artificial Intelligence (AI). Everyone either wants to use it or frets about robots running the world. Whether it’s the “fourth industrial revolution”
under discussion at the World Economic Forum in Davos this year or the “second machine age” recently popularized by MIT professors, the net result is the same: the automation race is on when it comes
to analytics. Potential users and colleagues of our own new technology inevitably ask when and how our start-up will start using AI in our new platform.
The answer is: it’s complicated. Why? Because expectations vary widely with respect to what “counts” as AI. Also, responsible deployment of various AI tools requires careful consideration. This is particularly the case when, as at BCMstrategy, Inc.,
new data sets are in play. As discussed below, validating AI outcomes at the outer reach of the AI spectrum is hard enough even when one is using conventional structured data. Validating outcomes when using new data presents unique challenges.
Artificial Intelligence? Or Just Automated Analytical Processes?
The automation spectrum regarding AI is broad, as the
Merriam-Webster definition indicates:
artificial intelligence, noun
1 : a branch of computer science dealing with the simulation of intelligent behavior in computers
2 : the capability of a machine to imitate intelligent human behavior
Note that the definition does not require autonomy or originality; it only requires mimicking and simulating human intelligence.
Viewed from this perspective, the AI spectrum is much broader than popular media implies. It involves applying human logic at a scale and speed that surpasses human capabilities. Therefore, we have all been using rudimentary AI to “outsource” logic for
years, whenever we run a macro within an Excel spreadsheet or rely on autocorrect in word processing.
Insight platforms accelerate analytical automation by sifting through and spotting patterns within much larger data sets and then
generating visualizations in response to specific queries (which themselves can be automated). In addition to enhancing operational efficiencies and permitting people to conduct information triage, dynamic data visualizations deliver “enhanced cognition.”
We connect the dots faster when insight platforms perform the first few steps of the analytical process of collecting, categorizing, and counting correlations within data sets.
From here we enter the frontier. Increasingly, automated processes generate new data and/or make it possible to collect new kinds of data for use in analytics.
Most new “alternative data” emanates from the internet of things, where a range of devices (mobile phones, Fitbits, mousepads) collect and communicate structured numerical data (e.g., geolocation, steps taken, website clicks) to other machines for automated
analysis. Natural Language Processing (NLP) and its offshoots in sentiment analysis translate unstructured (verbal) data into structured (numerical) data, which can then be analyzed using automated analytical processes.
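The idea of turning unstructured language into structured numbers can be made concrete with a toy sketch. The word lists and weights below are invented for illustration only; real NLP systems learn these associations from trained models rather than hand-written lexicons.

```python
# Toy illustration: converting unstructured text into a structured,
# numeric record. The lexicon below is invented for demonstration only;
# real NLP pipelines use trained models, not hand-written word lists.

POSITIVE = {"agree", "support", "progress", "approve"}
NEGATIVE = {"oppose", "reject", "delay", "veto"}

def sentiment_score(text: str) -> dict:
    """Turn a sentence into structured numeric features."""
    words = text.lower().split()
    pos = sum(w.strip(".,") in POSITIVE for w in words)
    neg = sum(w.strip(".,") in NEGATIVE for w in words)
    return {
        "n_words": len(words),
        "positive_hits": pos,
        "negative_hits": neg,
        "net_sentiment": pos - neg,
    }

record = sentiment_score("Ministers agree to support the proposal but oppose the delay.")
print(record)
```

The output is a row of numbers that downstream automated analytics can count, compare, and chart, which is the whole point of converting verbal data into structured data.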
Most new economy companies (BCMstrategy, Inc. included) fall within the insight platform category. We use these processes to deliver superior information discovery and analytics. In our case, we use automated processes to generate comprehensive quantification
of the policy process, delivering powerful (and previously unavailable) insights regarding momentum and direction in the policy process. These are evolutionary tools: they extend the scale and speed of the analytical process but do not materially change its nature.
Even most alternative data could be considered evolutionary. Consider geolocation and Fitbit data.
FinTech companies and banks can use information about a potential borrower’s habits, inferred from the location of the person’s cellphone, to identify potential new elements to include in the credit scoring process.
Insurance companies are experimenting with ways to deliver preferential pricing for health insurance that rewards healthy habits (and potentially penalizes unhealthy habits).
Automated processes can generate new signals from this new data that may be useful for financial firms, making those processes “intelligent.” But the data and its potential utility are not per se novel.
Prior (controversial) uses of geolocation data include the zip code. The potential for bias and abuse in geolocation data from zip codes is so strong that such “redlining” is prohibited in the United States. Increasingly, reliance on geolocation
and other alternative data is raising serious questions regarding algorithmic bias, data privacy, and even free speech for bots. Similarly, health and life insurance companies require insureds to submit lengthy questionnaires and biological test results during the underwriting process in order to assess the risk presented by an individual and, thus, the appropriate premium.
“Deep learning” and “machine learning” mechanisms that deliver automated analysis are similarly evolutionary because they automate existing reasoning processes using neural networks that mimic processes in the human brain.
This is accomplished through a range of trial-and-error feedback loops that teach a machine how to identify semantic meaning from context or how to identify an object in a picture. Human supervision (or pre-programmed “correct answers”) guides
the system to appreciate nuance and differences. Pop culture examples involve pictures of cats and leopard-print sofas. Every day, human verification on websites using reCAPTCHA accelerates the training process for Google’s
computers by requiring humans to identify specific images (e.g., cars, storefronts) in order to gain access to specific website content.
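The trial-and-error feedback loop described above can be sketched in miniature. The features and labels below are invented stand-ins for the “cat versus leopard-print sofa” example; real systems learn from raw pixels with deep neural networks, but the feedback mechanism is the same in spirit.

```python
# Minimal sketch of supervised learning as a feedback loop.
# Features and labels are invented stand-ins for image data.
# Label 1 = "cat", label 0 = "leopard-print sofa".
examples = [
    ((1.0, 0.2), 1),  # e.g. feature 1 = "has whiskers", feature 2 = "is rectangular"
    ((0.9, 0.1), 1),
    ((0.1, 0.9), 0),
    ((0.2, 1.0), 0),
]

w = [0.0, 0.0]  # weights start out knowing nothing
bias = 0.0

def predict(x):
    """Current best guess given the weights learned so far."""
    return 1 if w[0] * x[0] + w[1] * x[1] + bias > 0 else 0

# Feedback loop: each wrong guess nudges the weights toward the
# human-supplied "correct answer" (the label).
for _ in range(20):
    for x, label in examples:
        error = label - predict(x)   # 0 if correct, +1 or -1 if wrong
        w[0] += 0.1 * error * x[0]
        w[1] += 0.1 * error * x[1]
        bias += 0.1 * error

print([predict(x) for x, _ in examples])  # after training, matches the labels
```

Nothing here is autonomous: the machine only converges because humans supplied the correct answers, which is exactly the role reCAPTCHA-style labeling plays at scale.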
The far reach of the AI frontier creates "adaptive algorithms." These systems use an exit loop in the neural network processing sequence to deliver automatic adaptation and new insights.
Because no audit trail is created, it is impossible to replicate or observe the process by which the pattern identification algorithms decided to adapt automatically.
This “black box” element conjures images of autonomous, independent machines making their own decisions without human oversight. The policy
and philosophy debates regarding this part of AI are only just getting started.
It’s All About the Training Data (and Caution)
It is crucial to choose the training data well.
Rushing too fast into the AI space using generic models generates a “garbage in/garbage out” risk. The outputs may look pretty, but they can prove to be meaningless or misleading if the AI system has used the wrong data.
Technical challenges increase substantially when the training data involved is new or alternative data. Relevant questions include:
Is it possible to deploy off-the-shelf or open-source AI structures designed in one context (e.g., identifying pictures of cats or cars, predicting the next word in a customer relations chatbot) to deliver AI-powered insights in a different context that
uses entirely new data (credit risk or, in our case, policy risk data)?
If system robustness depends on the scale and scope of the training data from the real world, is it even conceptually possible to see the resulting automated analysis as “artificial” even when it delivers intelligent insights?
How much data is enough?
Is your data objective? Bias and normative assumptions embedded into the training data will have an exponential impact, skewing model outputs inappropriately.
How relevant is the historical data? As fellow group member Ron Coburn noted earlier this week on this platform, "More importantly, though, predictions like those made at the Seattle World’s Fair were so often inaccurate because people were able to make
predictions only based on historical data." For public policy predictive analytics at BCMstrategy, Inc., we prefer to pursue a nowcasting approach, as described in our recent blog post for Interactive Brokers here: https://www.tradersinsight.news/traders-insight/securities/macro/how-to-trade-the-news-rule-8-the-importance-of-nowcasting/.
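The objectivity question above can be made concrete with a toy example. The scores and threshold rule below are invented for illustration; the point is only that the same decision logic produces different outcomes when the training sample over-represents one group.

```python
# Toy illustration of training-data bias. All numbers are invented.
# The same decision rule flips its answer depending on which
# historical sample it was "trained" on.

def approval_rate(history):
    """Stand-in 'model': the threshold is just the historical mean score."""
    return sum(history) / len(history)

balanced = [600, 650, 700, 750]  # representative sample of past applicants
skewed   = [700, 720, 740, 750]  # sample drawn from one group only

applicant_score = 680
print(applicant_score > approval_rate(balanced))  # True  -> approved
print(applicant_score > approval_rate(skewed))    # False -> rejected
```

The model logic never changed; only the data did. Skew in the inputs propagates straight through to the outputs, and at the scale and speed of automated analysis the effect compounds.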
At BCMstrategy, Inc., we think intensively about training data challenges because we are creating new data. Our patented process generates structured data from the unstructured language used in the policy process. This data has never existed before. We
believe that because the language of policy is highly specialized, it will actually be easier for a machine to identify nuances than to overcome the technical challenges associated with distinguishing between a picture of a cat and a leopard-print sofa. But
it will still take time to assemble a sufficient amount of training data to plug into an AI utility for testing.
So we are sitting out the AI arms race for the moment. We are committed to getting the inputs right as a crucial first step before taking various AI engines for a test drive. Our early adopters gain significant operational efficiencies and strategic market-positioning advantages from the first sets of policy risk data regarding policy momentum.
Our start-up will take it step by step, delivering a solid insight platform with robust alternative data that will provide a foundation for experimentation with machine learning rather than promising alchemy.
It is exciting to be at the leading edge of this exploration in the policy risk field. We don’t have all the answers yet, but it is going to be a fascinating journey.