LLMs and Complex Reasoning Problems

Formal Reasoning Meets LLMs: Toward AI for Mathematics and Verification

A marriage of formal methods and LLMs seeks to harness the strengths of both.

Achieving >97% on GSM8K: Deeply understanding the problems makes LLMs better solvers for math word problems

Chain-of-Thought (CoT) prompting has enhanced the performance of Large Language Models (LLMs) across various reasoning tasks.

ZDNet

Think AI can solve all your business problems? Apple's new study shows otherwise

Back in engineering school, I had a professor who used to glory in the misleading assignment. He would ask questions containing elements of dubious relevance to the topic at hand in the hopes that it ...

VentureBeat

Researchers find you don’t need a ton of data to train LLMs for reasoning tasks

Large language models (LLMs) can learn complex reasoning tasks without relying on large datasets, according to a new study by researchers at Shanghai Jiao Tong University. Their findings show that ...

Hosted on MSN

Enabling small language models to solve complex reasoning tasks

As language models (LMs) improve at tasks like image generation, trivia questions, and simple math, you might think that human-like reasoning is around the corner. In reality, they still trail us by a ...

EurekAlert!

Study could lead to LLMs that are better at complex reasoning

CAMBRIDGE, MA – For all their impressive capabilities, large language models (LLMs) often fall short when given challenging new tasks that require complex reasoning skills. While an accounting firm’s ...

on MSN

Scientists Found AI’s Fatal Flaw—The Most Advanced Models Are Failing Basic Logic Tests

Identifying vulnerabilities is good for public safety, industry, and the scientists making these models.

ZDNet

Will AI think like humans? We're not even close - and we're asking the wrong question

Artificial intelligence may have impressive inferencing powers, but don't count on it to have anything close to human reasoning powers anytime soon. The march to so-called artificial general ...

VentureBeat

Self-invoking code benchmarks help you decide which LLMs to use for your programming tasks

As large language models (LLMs) continue to improve at coding, the benchmarks used to evaluate their performance are steadily becoming less useful. That's because though many LLMs have similar high ...

Communications of the ACM

How LLMs Make Sense of Time

Although chatbots such as ChatGPT, which are powered by large language models (LLMs), have some sense of time, it is conceptualized in a completely different way. As we increasingly interact with them ...
