Apple’s engineers recently published a really interesting research paper looking at how well LLMs (Large Language Models), the heart of AI systems like ChatGPT, handle mathematical reasoning. They conclude that LLMs are not ready for prime time right now.
To illustrate the issue, TechCrunch gave this example:
Let’s say I asked you to solve a simple math problem like this one:
Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday. How many kiwis does Oliver have?
Obviously, the answer is 44 + 58 + (44 * 2) = 190. Though large language models (LLMs) are actually spotty on arithmetic, they can pretty reliably solve something like this. But what if I threw in a little random extra info, like this:
Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?
It’s the same math problem, right? And of course even a grade schooler would know that even a small kiwi is still a kiwi. But as it turns out, this extra data point confuses even state-of-the-art LLMs. Here’s GPT-o1-mini’s take:
[O]n Sunday, 5 of these kiwis were smaller than average. We need to subtract them from the Sunday total: 88 (Sunday’s kiwis) – 5 (smaller kiwis) = 83 kiwis
This is just a simple example out of hundreds of questions that the researchers lightly modified, but nearly all of which led to enormous drops in success rates for the models attempting them.
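To make the arithmetic concrete, here is a minimal Python sketch of the kiwi problem. The variable and function names are my own, purely for illustration. It works out the correct total and then follows the subtraction step shown in the model's quoted answer. The final mistaken total is my own arithmetic carrying that step through; it does not appear in the excerpt above.

```python
# Kiwi counts taken from the example problem.
FRIDAY = 44
SATURDAY = 58
SUNDAY = 2 * FRIDAY  # "double the number of kiwis he did on Friday" = 88


def total_kiwis(friday, saturday, sunday):
    """Add up the kiwis picked over the three days."""
    return friday + saturday + sunday


# Correct reading: a small kiwi is still a kiwi, so size is irrelevant.
correct_total = total_kiwis(FRIDAY, SATURDAY, SUNDAY)

# Mistaken reading: subtract the five smaller kiwis from Sunday's count,
# as in the quoted model output (88 - 5 = 83).
smaller_than_average = 5
mistaken_total = total_kiwis(FRIDAY, SATURDAY, SUNDAY - smaller_than_average)

print(correct_total)   # 190
print(mistaken_total)  # 185 (illustrative: the result of following the wrong step)
```

The only difference between the two results is a detail that has no bearing on the question, which is exactly the kind of distraction the researchers added.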
The conclusion of the Apple researchers’ report is that:
"Current LLMs are not capable of genuine logical reasoning; instead, they attempt to replicate the reasoning steps observed in their training data. When we add a single clause that appears relevant to the question, we observe significant performance drops across all models. Importantly, we demonstrate that LLMs struggle even when provided with multiple examples of the same question or examples containing similar irrelevant information. This suggests deeper issues in their reasoning processes that cannot be easily mitigated through few-shot learning or fine-tuning."
Interesting … and yet, as Ron Shevlin notes on Forbes, an awful lot of consumers are relying on GPT and LLMs for financial advice. He points to a study by Motley Fool, which found that 54% of Americans are using ChatGPT for finance recommendations. That’s a bit worrying if the AI/ML tools can’t even work out that a small kiwi is still a kiwi.
Ron concludes that banks and credit unions shouldn’t rely on AI to provide financial advice and guidance right now … and maybe not for another five to ten years.
I would agree, but it’s all about moderation. I would use AI/ML to research decisions, but would need to verify those decisions with other advisors before jumping in. For example, ask ChatGPT the question "should I invest in cryptocurrency?" and here’s what comes back:
Investing in cryptocurrency can be enticing due to its potential for high returns, but it also comes with significant risks and volatility. Here are a few things to consider:
- Research: Understand the different cryptocurrencies and their use cases. Not all are created equal.
- Volatility: Prices can fluctuate wildly in a short time, which can lead to substantial gains or losses.
- Diversification: Don't put all your money into one investment. Consider spreading it across different assets.
- Regulation: The regulatory landscape is constantly changing, which can impact the market.
- Long-term vs. Short-term: Determine your investment strategy. Are you looking for a quick profit or a long-term hold?
- Risk Tolerance: Assess how much risk you’re willing to take and invest accordingly.
- Security: Ensure you understand how to store your assets safely to protect against theft or loss.
If you're new to investing, it might be worth consulting a financial advisor.
Seems fair enough to me.
Chris M Skinner
Chris Skinner is best known as an independent commentator on the financial markets through his blog, TheFinanser.com, as author of the bestselling book Digital Bank, and Chair of the European networking forum the Financial Services Club. He has been voted one of the most influential people in banking by The Financial Brand (as well as one of the best blogs), a FinTech Titan (Next Bank), one of the Fintech Leaders you need to follow (City AM, Deluxe and Jax Finance), as well as one of the Top 40 most influential people in financial technology by the Wall Street Journal's Financial News.