Submit Results

Submit your benchmark results for inclusion on the official leaderboard. This section explains the submission workflow, required artifacts, and review process.

Submission Workflow

1. Run the benchmark    →  ai4sci-bench run / batch-run
2. Package results      →  Collect output directory
3. Submit for review    →  Open a GitHub issue with result metadata
4. Maintainer review    →  Provenance and reproducibility check
5. Leaderboard entry    →  Results appear on the official leaderboard
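The "Package results" step amounts to collecting the output directory. The following is a minimal sketch of that step, assuming the directory contains the `run_metadata.json` file and `batch_records/` directory named in the submission checklist. The function name `package_results` and the choice of a `.tar.gz` archive are hypothetical conveniences, not part of the official tooling. The project may equally accept a plain link to the directory.

```python
import tarfile
from pathlib import Path

# Hypothetical check list: artifact names taken from the submission checklist,
# not from an official schema.
REQUIRED_ARTIFACTS = ("run_metadata.json", "batch_records")


def package_results(output_dir: str, archive_path: str) -> list[str]:
    """Bundle a benchmark output directory into a .tar.gz archive.

    Raises FileNotFoundError if an expected artifact is missing, so an
    incomplete directory is never packaged. Returns the archive member names.
    """
    root = Path(output_dir)
    missing = [name for name in REQUIRED_ARTIFACTS if not (root / name).exists()]
    if missing:
        raise FileNotFoundError(f"missing artifacts in {root}: {missing}")

    with tarfile.open(archive_path, "w:gz") as tar:
        # Store paths relative to the directory's own name (e.g. "run/...").
        tar.add(root, arcname=root.name)

    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

For example, `package_results("results/my_run", "my_run.tar.gz")` (hypothetical paths) fails fast if `run_metadata.json` or `batch_records/` is absent. Write the archive outside the output directory so it does not include itself.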

Requirements Summary

Requirement	Details
Benchmark version	Must target a published task set release
Sandbox mode	`--sandbox task` or `--sandbox os` required
Seed	Fixed seed (default: 42) for reproducibility
Prompt levels	At minimum B1, B2, B3, and B4, on every submitted task
Artifacts	Complete output directory including `run_metadata.json`
Provenance	Agent version, model ID, configuration, and CLI version must be logged
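The requirements above can be checked locally before opening an issue. This is a minimal sketch of such a pre-submission check. It assumes `run_metadata.json` is a flat JSON object, and every key name used here (`sandbox`, `seed`, `prompt_levels`, `task_set_version`, `agent_version`, `model_id`, `config`, `cli_version`) is hypothetical. The authoritative schema is described on the Result Format page and may differ.

```python
import json
from pathlib import Path

# Hypothetical key names: the real run_metadata.json schema may differ.
ALLOWED_SANDBOXES = {"task", "os"}
REQUIRED_PROMPT_LEVELS = {"B1", "B2", "B3", "B4"}
REQUIRED_KEYS = ("task_set_version", "agent_version", "model_id", "config", "cli_version")


def check_submission(metadata: dict) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in metadata:
            problems.append(f"missing provenance field {key!r}")
    sandbox = metadata.get("sandbox")
    if sandbox not in ALLOWED_SANDBOXES:
        problems.append(f"sandbox must be 'task' or 'os', got {sandbox!r}")
    if not isinstance(metadata.get("seed"), int):
        problems.append("seed must be a fixed integer (default: 42)")
    levels_by_task = metadata.get("prompt_levels", {})
    if not levels_by_task:
        problems.append("no per-task prompt levels recorded")
    # Each task needs at least B1-B4; additional levels are allowed.
    for task, levels in levels_by_task.items():
        missing = REQUIRED_PROMPT_LEVELS - set(levels)
        if missing:
            problems.append(f"task {task!r} is missing prompt levels {sorted(missing)}")
    return problems


def check_metadata_file(path: str) -> list[str]:
    """Load a run_metadata.json file and run check_submission on it."""
    return check_submission(json.loads(Path(path).read_text()))
```

A check like this only catches obvious gaps early. It does not replace the maintainer's provenance and reproducibility review.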

Current Status

Early Access

The submission process is currently manual via GitHub issues. Automated submission infrastructure (Hugging Face Space) is planned for a future phase.

How to Submit

1. Complete your benchmark runs with `--sandbox task` or `--sandbox os`.
2. Open a GitHub issue that includes:
   - Agent name and version
   - Model name and configuration
   - Sandbox mode used
   - A link to, or an upload of, `run_metadata.json` and `batch_records/`
3. A maintainer reviews the provenance and may request additional logs.
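To keep issues consistent, the checklist above can be rendered into an issue body from a script. The following is a minimal sketch. The `SubmissionInfo` fields, the `render_issue_body` function, and the Markdown layout are all hypothetical, and the project may provide its own issue template that should take precedence.

```python
from dataclasses import dataclass


@dataclass
class SubmissionInfo:
    # Illustrative fields mirroring the issue checklist; names are hypothetical.
    agent_name: str
    agent_version: str
    model_name: str
    model_config: str
    sandbox: str
    artifacts_url: str


def render_issue_body(info: SubmissionInfo) -> str:
    """Format submission details as a Markdown issue body."""
    # Only sandboxed runs are eligible for the leaderboard.
    if info.sandbox not in ("task", "os"):
        raise ValueError("leaderboard submissions require --sandbox task or --sandbox os")
    lines = [
        "## Leaderboard submission",
        f"- Agent: {info.agent_name} {info.agent_version}",
        f"- Model: {info.model_name} ({info.model_config})",
        f"- Sandbox mode: --sandbox {info.sandbox}",
        f"- Artifacts (run_metadata.json, batch_records/): {info.artifacts_url}",
    ]
    return "\n".join(lines)
```

The resulting text can be pasted into the issue form, or saved to a file and passed to the GitHub CLI with `gh issue create --body-file`.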

Next Steps

Result Format — detailed artifact specification
Official Review Flow — how submissions are evaluated