Google LLC

06/04/2026 | Press release | Distributed by Public on 06/04/2026 10:39

Kaggle is making AI benchmark creation effortless

As AI models evolve from simple chatbots into reasoning agents that write code, use tools and solve complex problems, traditional benchmarks are no longer enough. The community needs dynamic, rigorous evaluations built by the people who use these models in the real world.

That's why we launched Kaggle Benchmarks. Since then, the global AI community has created more than 10,000 evaluation tasks, producing trustworthy, transparent public leaderboards that help labs measure and accelerate AI progress.

Today, we are taking the next step by launching local development for Kaggle Benchmarks.

Use Kaggle Benchmarks from your local development environment

Until now, creating evaluation tasks meant working exclusively in Kaggle's web-based notebook editor rather than in the stack developers prefer to build with.

Our new update enables developers to create, validate, push, run and download tasks directly from local development environments such as Antigravity, VS Code and Cursor, and from coding agents. This update is designed to meet developers where they work, making the journey from idea to evaluation faster and more intuitive.

Build evaluation tasks in natural language with AI coding agents

Local development also unlocks a powerful new workflow: using AI coding agents to write benchmark tasks through the write-kaggle-benchmarks skill. This skill is a set of structured instructions that teach a coding agent how to build tasks using the kaggle-benchmarks SDK and the Kaggle CLI.

To add this skill to your agent, simply ask your agent to:

"Install the write-kaggle-benchmarks skill: https://github.com/Kaggle/kaggle-skills"

Once installed, you can describe an evaluation in plain language and get a working task on Kaggle. For example, you can tell your agent:

Using the write-kaggle-benchmarks skill, build a task that asks the model if "300+140=460 is correct?"
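To make the example concrete, here is a minimal sketch of the checking logic such a task might need. It is plain Python and deliberately does not use the kaggle-benchmarks SDK, whose API is not shown in this article; the names `PROMPT`, `expected_answer` and `grade` are hypothetical and chosen only for illustration. Since 300 + 140 is 440, the claim is false and a passing model response should say "no".

```python
import re

# Hypothetical prompt and grader for illustration only.
# This is NOT the kaggle-benchmarks SDK API.
PROMPT = 'Is "300+140=460" correct? Answer yes or no.'


def expected_answer(claim: str) -> str:
    """Return 'yes' if an 'a+b=c' claim holds, otherwise 'no'."""
    match = re.fullmatch(r"\s*(\d+)\s*\+\s*(\d+)\s*=\s*(\d+)\s*", claim)
    if match is None:
        raise ValueError(f"unsupported claim: {claim!r}")
    a, b, c = (int(group) for group in match.groups())
    return "yes" if a + b == c else "no"


def grade(response: str, claim: str = "300+140=460") -> bool:
    """Pass when the first yes/no word in the response matches the expected answer."""
    found = re.search(r"\b(yes|no)\b", response.lower())
    return found is not None and found.group(1) == expected_answer(claim)
```

Calling `grade("No, 300+140 is 440.")` passes, while a response that agrees with the claim or gives no yes/no answer fails. A task generated by a coding agent would wrap comparable logic in whatever structure the SDK actually requires.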

These capabilities are powered by the new Benchmarks commands that we have built into the Kaggle CLI.

Understand why community-driven evaluations matter

We built Kaggle Benchmarks to democratize trustworthy AI evaluations. We believe that if a capability can be measured, labs will race to improve it. By providing these clear, objective signals, we hope to empower AI labs to drive model improvements in the areas that matter most.

For AI to truly benefit humanity, evaluations must reflect the full diversity of real-world challenges. We believe this launch is a significant step toward enabling anyone, anywhere, to build the evaluations that will shape the future of AI.

Ready to build? Try Kaggle Benchmarks today.

POSTED IN:

Developer tools

Google LLC published this content on June 04, 2026, and is solely responsible for the information contained herein. Distributed via Public Technologies (PUBT), unedited and unaltered, on June 04, 2026 at 16:40 UTC. If you believe the information included in the content is inaccurate or outdated and requires editing or removal, please contact us at [email protected]
