An Israeli startup named Baz has ranked first in a new benchmark that evaluates artificial intelligence systems designed for code review, outperforming tools from leading AI developers such as OpenAI, Anthropic, Google and Cursor. Baz also placed second in the overall composite score, which measures both precision and recall.
The benchmark, called Code Review Bench, is the first evaluation focused specifically on systems that review code rather than generate it. Earlier coding benchmarks drew criticism because models were trained to optimize directly for them. The new assessment instead combines controlled testing with real-world behavioral data to better reflect practical value for software developers.
