Gemini Beats Claude, GPT in Google’s First Android AI Coding Benchmark

Google just published its first Android Bench leaderboard, ranking the AI models that perform best at coding Android apps. Nine models made the list, drawn from Google's Gemini, Anthropic's Claude, and OpenAI's lineups. Not surprisingly, Gemini 3.1 Pro Preview led the benchmark with a 72.4% score, followed by Claude Opus 4.6 and GPT-5.2 Codex. Google created the benchmark to measure how well AI systems solve real Android development problems, using tasks drawn from several GitHu