Devstral 2 from Mistral packs 123B parameters and scores 72.2% on Swaybench, helping teams fix bugs faster and automate ...
Anthropic's new Claude Opus 4.5 model achieved 80.9% on SWE-bench and scored higher than human candidates on a performance ...
The race for best vibe-coding AI model is neck and neck, according to Vals AI. OpenAI is the new king of vibe coding, according to a newly-released benchmark from AI evaluation startup Vals AI. In a ...
Mistral AI has launched Devstral 2 and Vibe CLI, offering a cost-efficient, open-weight alternative for autonomous software ...
Overview: Parallel coding agents dramatically speed up software development, working on simultaneous tasks.
OpenAI has launched ChatGPT-5.2, which is a major breakthrough in the progress of artificial intelligence. The latest version has introduced massive advancements in different tasks, including ...
Anthropic has released Claude Sonnet 4.5, its most advanced coding model to date, featuring major improvements in agentic tasks, long-horizon task performance, and computer use capabilities. The ...
Alibaba Cloud has released Qwen2.5-Coder, a new AI coding assistant that has already become the second most popular demo on Hugging Face Spaces. Early tests suggest its performance rivals GPT-4o, and ...
In a new benchmark named Vibe Code Bench, OpenAI’s GPT-5.1 achieved the highest level of accuracy in completing a series of software engineering tasks, narrowly beating rival Anthropic’s Claude 4.5 ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results