A recent research paper [*] offers an interesting perspective on the challenges of achieving AGI, given the limitations of current LLMs.
One of the key barriers to AGI is the difficulty of imbuing machines with deductive reasoning skills. Humans can use new information to draw logical conclusions, a capability that seems far out of reach for current LLMs. The study suggests that, despite their impressive capabilities, LLMs lack true understanding and deductive reasoning abilities, and that achieving AGI might require a paradigm shift beyond current LLM approaches.
Below is a simple example from the paper illustrating how premise order affects LLM performance on logical reasoning problems. I encourage you to explore the other examples it presents.
1. If A then B.
2. If B then C.
3. A is True.
The conclusion that can be derived is: C is True.
This conclusion can be reached regardless of the order in which the premises (1-3) are stated. For humans, reordering the premises does not drastically affect our ability to do this straightforward “if-then” reasoning and derive a conclusion.
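The order independence of this deduction can be checked mechanically. Below is a minimal sketch of forward chaining (repeatedly applying if-then rules until no new fact appears). The function name `forward_chain` and the tuple encoding of the premises are my own assumptions for illustration; they do not come from the paper.

```python
from itertools import permutations


def forward_chain(premises):
    """Derive all facts from premises given in any order.

    Hypothetical encoding: ("fact", X) means "X is True",
    ("rule", X, Y) means "If X then Y".
    """
    facts = {p[1] for p in premises if p[0] == "fact"}
    rules = [(p[1], p[2]) for p in premises if p[0] == "rule"]
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for antecedent, consequent in rules:
            if antecedent in facts and consequent not in facts:
                facts.add(consequent)
                changed = True
    return facts


premises = [("rule", "A", "B"), ("rule", "B", "C"), ("fact", "A")]

# Run the same deduction on all 3! = 6 orderings and collect the distinct outcomes.
results = {frozenset(forward_chain(list(order))) for order in permutations(premises)}
print(results)
```

Because the loop keeps applying rules until nothing changes, the set of derived facts does not depend on where each premise appears in the input, so `results` contains a single outcome that includes C.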
However, the researchers found that for large language models like GPT-4, the order of the premises significantly impacts their performance. Specifically:
- Presenting the premises in the order “If A then B”, “If B then C”, “A is True” (which matches the steps needed to derive the conclusion) yields high accuracy.
- But when the premises are shuffled into a different order, such as “A is True”, “If B then C”, “If A then B”, the same LLM’s accuracy drops by over 30% in some cases.
This illustrates that despite being highly capable at many reasoning tasks, current LLMs like GPT-4 exhibit brittleness when the premises are not presented in an order that matches how the reasoning steps should be performed. The researchers argue this “premise order effect” exposes a fundamental shortcoming of today’s language models as general reasoning engines.
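If you want to probe this effect on a model of your choice, one starting point is to generate every ordering of the premises and turn each into a prompt. The sketch below is hypothetical: `build_prompt`, `QUESTION`, and the prompt wording are my own assumptions rather than the paper's benchmark setup, and the call to an actual model is deliberately left out.

```python
from itertools import permutations

PREMISES = ["If A then B.", "If B then C.", "A is True."]
QUESTION = "What conclusion can be derived about C?"  # illustrative wording


def build_prompt(ordered_premises, question=QUESTION):
    """Format one ordering of the premises as a numbered prompt."""
    lines = [f"{i}. {p}" for i, p in enumerate(ordered_premises, start=1)]
    return "\n".join(lines + [question])


# One prompt per ordering: 3 premises give 3! = 6 prompts.
prompts = [build_prompt(order) for order in permutations(PREMISES)]

for prompt in prompts:
    print(prompt)
    print("---")
```

Sending each prompt to the same model and comparing the answers across orderings would give a rough, small-scale picture of how sensitive it is to premise order.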
What comes naturally to humans often presents a challenge for artificial intelligence, although there are also many cases where the opposite holds true.