While Baidu did not release full benchmark details or raw scores publicly, its performance positioning suggests a deliberate ...
Elliot Williams and Al Williams got together to share their favorite hacks of the week with you. If you listen in, you’ll hear exciting news about the upcoming SuperCon and the rare occurrence of Al ...
We did an informal poll around the Hackaday bunker and decided that, for most of us, our favorite programming language is solder. However, [Stephen Cass] over at IEEE Spectrum released their annual ...
According to Greg Brockman (@gdb), OpenAI's latest reasoning system has achieved a perfect score on the 2025 ICPC programming competition, as confirmed by Mostafa ...
The Federal Aviation Administration (FAA) and MITRE are introducing a new benchmark to enable the evaluation of large language models (LLMs) on aerospace tasks. Given the ...
A team from Stanford University and UC Santa Cruz has introduced AHELM, a new benchmark designed to evaluate audio-language models (ALMs) across a wide range of capabilities. ALMs are multimodal ...
A Survey of Benchmarks for Code Large Language Models and Agents, from Xi’an Jiaotong University, organizes the field from a software development life cycle perspective, covering suites such as HumanEval (introduced in “Evaluating Large Language Models Trained on Code”) ...
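HumanEval, mentioned above, scores models on function completion checked against unit tests. A minimal sketch of one such item, assuming the prompt/canonical-solution/test layout the benchmark uses (the `task` dict and `check` helper here are illustrative, not the official harness API):

```python
# One HumanEval-style task: a function stub to complete, a reference
# solution, and hidden unit tests that decide pass/fail.
task = {
    "prompt": (
        "def add(a: int, b: int) -> int:\n"
        '    """Return the sum of a and b."""\n'
    ),
    "canonical_solution": "    return a + b\n",
    "test": "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n",
}

def check(completion: str) -> bool:
    """Run prompt + model completion, then the unit tests; True iff all pass."""
    namespace: dict = {}
    try:
        exec(task["prompt"] + completion, namespace)
        exec(task["test"], namespace)
        return True
    except Exception:
        return False

print(check(task["canonical_solution"]))  # True: the reference solution passes
print(check("    return a - b\n"))        # False: a buggy completion fails
```

A full harness would sandbox the `exec` calls and aggregate pass@k over many samples; this sketch only shows the per-task scoring shape.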
This document contains performance benchmarks for a compute-heavy task across multiple programming languages. The benchmark performs the same mathematical computation (matrix operations, factorial ...
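A cross-language benchmark of this kind typically times the same kernel with a wall-clock timer in each language. A minimal sketch in Python, assuming the kernels are naive matrix multiplication and a large factorial (the `benchmark` helper and the problem sizes are illustrative, not taken from the document):

```python
import math
import time

def matmul(a, b):
    """Naive O(n^3) matrix multiplication, a typical compute-heavy kernel."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]

def benchmark(fn, *args, repeats=3):
    """Return the best wall-clock time over several runs of fn(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

size = 50
a = [[float(i + j) for j in range(size)] for i in range(size)]
t_mat = benchmark(matmul, a, a)
t_fact = benchmark(math.factorial, 2000)
print(f"matmul {size}x{size}: {t_mat:.4f}s  factorial(2000): {t_fact:.6f}s")
```

Taking the best of several repeats reduces noise from warm-up and scheduling; ports to other languages would keep the kernel and timing protocol identical so only the language runtime varies.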
A new multilingual AI benchmarking initiative backed by the German Government aims to advance equitable access to language technologies by highlighting where today’s large language models (LLMs) ...
For years, code-editing tools like Cursor, Windsurf, and GitHub’s Copilot have been the standard for AI-powered software development. But as agentic AI grows more powerful and vibe coding takes off, a ...