Benchmark Human Time Entry

With AI models clobbering every benchmark, it's time for human evaluation

Artificial intelligence has traditionally advanced through automatic accuracy tests in tasks meant to approximate human knowledge. Carefully crafted benchmark tests such as The General Language ...


