Oxford study finds AI benchmarks often exaggerate model performance
A new study finds that the methods used to evaluate AI systems often overstate performance and lack scientific rigor, calling many benchmark results into question.
Researchers at the Oxford Intern...