LLMs display cognitive shortfalls - BIS

When presented with a logic puzzle that demands reasoning about what others know and about counterfactuals, large language models (LLMs) display a "distinctive and revealing pattern of failure," according to a bulletin from the Bank for International Settlements.

05 January 2024

Editorial

This content has been selected, created and edited by the Finextra editorial team based upon its relevance and interest to our community.

With ChatGPT capturing the public imagination and central banks around the world exploring the potential applications of LLMs, BIS has been testing their cognitive limits.

To do this, it quizzed GPT-4 with the well-known Cheryl’s birthday logic puzzle, finding that the LLM solved the puzzle flawlessly when presented with the original wording.

As the authors note, GPT-4 will have encountered the puzzle and its solution during its training. However, the model consistently failed when small incidental details - such as the names of the characters or the specific dates - were changed.

This, says the BIS bulletin, suggests a lack of true understanding of the underlying logic.
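To see why incidental details should not matter, the puzzle's logic can be written as three elimination steps over a list of candidate dates. The following is a minimal sketch, not code from the BIS bulletin: the function name `solve_cheryl` is hypothetical, the ten dates are those of the widely circulated version of the puzzle rather than anything quoted in this article, and the renamed variant is made up for illustration and is not the rewording BIS used.

```python
from collections import Counter


def solve_cheryl(dates):
    """Return the dates consistent with the puzzle's three statements.

    dates: list of (month, day) pairs.
    Albert is told only the month, Bernard only the day.
    """
    day_counts = Counter(day for _, day in dates)
    month_counts = Counter(month for month, _ in dates)

    # Statement 1 (Albert): "I don't know, and I know Bernard doesn't either."
    # His month must hold several dates, none with a day that is unique overall.
    months_with_unique_day = {m for m, d in dates if day_counts[d] == 1}
    after_1 = [(m, d) for m, d in dates
               if month_counts[m] > 1 and m not in months_with_unique_day]

    # Statement 2 (Bernard): "Now I know." His day is unique among what is left.
    days_left = Counter(d for _, d in after_1)
    after_2 = [(m, d) for m, d in after_1 if days_left[d] == 1]

    # Statement 3 (Albert): "Then I know too." His month is unique among what is left.
    months_left = Counter(m for m, _ in after_2)
    return [(m, d) for m, d in after_2 if months_left[m] == 1]


# Dates from the widely circulated version of the puzzle (assumption: not from this article).
ORIGINAL = [("May", 15), ("May", 16), ("May", 19),
            ("June", 17), ("June", 18),
            ("July", 14), ("July", 16),
            ("August", 14), ("August", 15), ("August", 17)]

# Hypothetical variant: months renamed and every day shifted by three.
RENAME = {"May": "October", "June": "January", "July": "April", "August": "December"}
VARIANT = [(RENAME[m], d + 3) for m, d in ORIGINAL]

print(solve_cheryl(ORIGINAL))  # [('July', 16)]
print(solve_cheryl(VARIANT))   # [('April', 19)]
```

The variant's answer is simply the original answer under the same renaming and shift, because the elimination steps depend only on which months and days are shared, not on what they are called. That is the sense in which the changes described above are incidental.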

BIS says that the findings do not detract from the progress in central bank applications of machine learning to data management, macro analysis and regulation.

"Nevertheless, our findings do suggest that caution should be exercised in deploying large language models in contexts that necessitate careful and rigorous economic reasoning.

"The evidence so far is that the current generation of LLMs falls short of the rigour and clarity in reasoning required for the high-stakes analyses needed for central banking applications."


Comments: (2)

Vladimir Dimitroff Chairman at Senior Executives Forum

05 January 2024

Still more 'A' than 'I' - but working on it ;)


Ketharaman Swaminathan Founder and CEO at GTM360 Marketing Solutions

08 January 2024

TBH how many human bankers do any "careful and rigorous economic reasoning" these days anyway? I wonder if any bank is planning to use Gen AI / LLM for such activities in the first place. Banks moved them to regulatory mandate and CBS, FD&P, HFT, and other software systems years ago. Whenever I ask my (human so far) RM to explain e.g. a certain TDS entry on my bank statement, his / her stock answer is, "It's according to RBI mandate" or "That's what FLEXCUBE says".

(Disclosure: I'm ex-employee of FLEXCUBE company.)

