AI Coding: New Research Shows Even the Best Models Struggle With Real-World Software Engineering

February 25, 2025

New OpenAI research finds that frontier AI models such as Claude 3.5 Sonnet and GPT-4o solve fewer than half of the tasks in a benchmark of real-world freelance software engineering jobs worth $1 million in total payouts.