Submit Results
Submit your benchmark results for inclusion on the official leaderboard. This section explains the submission workflow, required artifacts, and review process.
Submission Workflow
1. Run the benchmark → `ai4sci-bench run` / `batch-run`
2. Package results → Collect output directory
3. Submit for review → Open a GitHub issue with result metadata
4. Maintainer review → Provenance and reproducibility check
5. Leaderboard entry → Results appear on the official leaderboard
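The packaging step (step 2) can be sketched in Python. This is a minimal, unofficial helper, not part of `ai4sci-bench`: it assumes only that the output directory contains `run_metadata.json` and a `batch_records/` directory (the two artifacts named on this page), checks that both exist, and zips the complete directory. The function names `missing_artifacts` and `package_results` are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

# Artifacts named in this guide. Any other files in the output directory are
# included in the archive as-is; their layout is not assumed here.
REQUIRED_FILES = ["run_metadata.json"]
REQUIRED_DIRS = ["batch_records"]


def missing_artifacts(output_dir: Path) -> list[str]:
    """Return the required artifacts that are absent from output_dir."""
    missing = [n for n in REQUIRED_FILES if not (output_dir / n).is_file()]
    missing += [n + "/" for n in REQUIRED_DIRS if not (output_dir / n).is_dir()]
    return missing


def package_results(output_dir: Path, archive_base: Path) -> Path:
    """Zip the complete output directory once the required artifacts exist."""
    missing = missing_artifacts(output_dir)
    if missing:
        raise FileNotFoundError("missing artifacts: " + ", ".join(missing))
    # make_archive appends ".zip" to archive_base and returns the archive path.
    return Path(shutil.make_archive(str(archive_base), "zip", root_dir=output_dir))


if __name__ == "__main__":
    # Demo on a throwaway directory that mimics a finished run.
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "results"
        (out / "batch_records").mkdir(parents=True)
        (out / "run_metadata.json").write_text("{}", encoding="utf-8")
        print(package_results(out, Path(tmp) / "submission"))
```

Checking for the artifacts before archiving means an incomplete run fails locally instead of during maintainer review.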
Requirements Summary
| Requirement | Details |
|---|---|
| Benchmark version | Must target a published task set release |
| Sandbox mode | `--sandbox task` or `--sandbox os` (required) |
| Seed | Fixed seed (default: 42) for reproducibility |
| Prompt levels | B1, B2, B3, and B4 at minimum, on every submitted task |
| Artifacts | Complete output directory, including `run_metadata.json` |
| Provenance | Agent version, model ID, configuration, and CLI version logged |
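Before opening an issue, the table above can be turned into a local self-check. The sketch below is illustrative only: every key name (`task_set_version`, `sandbox`, `seed`, `prompt_levels`, `agent_version`, `model_id`, `config`, `cli_version`) is a hypothetical placeholder, since the real `run_metadata.json` schema is defined on the Result Format page. It can confirm that fields are present and well-formed, but not that the task set is a published release; that remains part of maintainer review.

```python
import json

# HYPOTHETICAL key names for illustration only; the real run_metadata.json
# schema is specified on the Result Format page and may differ.
REQUIRED_KEYS = ["task_set_version", "agent_version", "model_id", "config", "cli_version"]
ALLOWED_SANDBOXES = {"task", "os"}
REQUIRED_LEVELS = {"B1", "B2", "B3", "B4"}


def check_requirements(metadata: dict) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for key in REQUIRED_KEYS:
        if not metadata.get(key):
            problems.append(f"missing field: {key}")
    if metadata.get("sandbox") not in ALLOWED_SANDBOXES:
        problems.append("sandbox must be 'task' or 'os'")
    seed = metadata.get("seed")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(seed, int) or isinstance(seed, bool):
        problems.append("a fixed integer seed must be recorded")
    # Assumed shape: {"task_id": ["B1", "B2", ...], ...}
    levels_by_task = metadata.get("prompt_levels") or {}
    if not levels_by_task:
        problems.append("no per-task prompt levels recorded")
    for task, levels in levels_by_task.items():
        missing = REQUIRED_LEVELS - set(levels)
        if missing:
            problems.append(f"{task}: missing prompt levels {sorted(missing)}")
    return problems


def check_file(path: str) -> list[str]:
    """Load a run_metadata.json file and run check_requirements on it."""
    with open(path, encoding="utf-8") as f:
        return check_requirements(json.load(f))
```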
Current Status
Early Access
The submission process is currently manual via GitHub issues. Automated submission infrastructure (Hugging Face Space) is planned for a future phase.
How to Submit
- Complete your benchmark runs with `--sandbox task` or `--sandbox os`.
- Open a GitHub issue with:
  - Agent name and version
  - Model name and configuration
  - Sandbox mode used
  - A link to, or an upload of, `run_metadata.json` and `batch_records/`
- A maintainer will review provenance and may request additional logs.
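Drafting the issue body can be scripted so that no requested field is forgotten. The sketch below is a hypothetical helper, not an official template: the metadata keys (`agent_name`, `agent_version`, `model_id`, `config`, `sandbox`) are placeholders, and the layout is one reasonable choice among many.

```python
import json


def format_issue_body(metadata: dict, artifacts_url: str) -> str:
    """Build a Markdown issue body with the fields this page asks for.

    The metadata keys used here are HYPOTHETICAL placeholders.
    """
    # Serialize the configuration deterministically so reviewers can diff it.
    config = json.dumps(metadata.get("config", {}), sort_keys=True)
    lines = [
        f"- **Agent:** {metadata['agent_name']} {metadata['agent_version']}",
        f"- **Model:** {metadata['model_id']}",
        f"- **Model configuration:** `{config}`",
        f"- **Sandbox mode:** --sandbox {metadata['sandbox']}",
        f"- **Artifacts (run_metadata.json, batch_records/):** {artifacts_url}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    example = {
        "agent_name": "example-agent",  # placeholder values, not real results
        "agent_version": "0.1.0",
        "model_id": "example-model",
        "config": {"temperature": 0.0},
        "sandbox": "task",
    }
    print(format_issue_body(example, "https://example.com/my-run"))
```

The output can be pasted into the GitHub issue form, or saved to a file and passed to the GitHub CLI with `gh issue create --body-file <file>`.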
Next Steps
- Result Format — detailed artifact specification
- Official Review Flow — how submissions are evaluated