Paper & Citation

Status: The benchmark paper is in preparation. This page will be updated with the arXiv preprint link and full citation once the paper is published.

Benchmark Overview


Title	ASI-Bench: A Project-level Benchmark for Evaluating LLM Agents on AI for Science
Authors	(In preparation)
Venue	arXiv preprint (forthcoming)
Repository	github.com/zjw49246/Agent-AI4Sci-Bench
Website	zjw49246.github.io/Agent-AI4Sci-Bench

Provisional Citation

If you use this benchmark before the paper is published, please cite the repository:

@misc{agentai4scibench2026,
  title        = {ASI-Bench: A Project-level Benchmark for
                  Evaluating LLM Agents on AI for Science},
  author       = {ASI-Bench Contributors},
  year         = {2026},
  howpublished = {\url{https://github.com/zjw49246/Agent-AI4Sci-Bench}},
  note         = {Paper in preparation. Check the repository for updates.}
}

Citation will be updated

Once the arXiv preprint is published, this block will be replaced with the full BibTeX entry including author list, eprint ID, and venue information.

Key Contributions

The paper presents:

Project-level AI4Sci tasks — compact but realistic computational science workflows requiring multi-step reasoning, code generation, and artifact production
B1-B4 autonomy ladder — a systematic way to measure how much scaffolding an agent needs to solve the same scientific problem
Auditable evaluation — structured artifacts and provenance enabling reproducible, reviewed leaderboard entries
Multi-domain coverage — 8 scientific domains with 42+ public tasks spanning math, physics, chemistry, astronomy, biology, materials, earth science, and engineering
