News
What This Page Is For
- benchmark launch announcements
- new task or domain additions
- official leaderboard refreshes
- public release notes for scoring, visibility, or website changes
Current Update Log
2026-04-22 — Public Website Scaffold Established
The benchmark now has a buildable MkDocs site scaffold in the web/ directory, with:
- homepage
- overview
- task catalog
- leaderboard section
- getting-started pages
- FAQ
- paper and news pages
This moved the website out of planning documents and into a public site structure that actually builds.
2026-04-22 — Public Task Catalog Limited to test Tasks
The generated public task catalog was tightened so the website no longer mirrors the full internal task tree. It now shows only the test subset.
Current public task and domain counts are generated from the latest repo metadata rather than maintained by hand.
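As a rough illustration of deriving counts from metadata rather than editing them by hand, here is a minimal Python sketch. It assumes a hypothetical TaskMeta record with task_id, domain, and split fields. It treats the public subset as the tasks whose split is "test". The real exporter and its metadata schema may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskMeta:
    """Hypothetical per-task metadata record; field names are assumptions."""
    task_id: str
    domain: str
    split: str  # e.g. "test", or some internal split name


def public_catalog(tasks: list[TaskMeta]) -> dict:
    """Keep only test-split tasks and compute public counts from that subset."""
    public = [t for t in tasks if t.split == "test"]
    return {
        "task_count": len(public),
        # Count distinct domains among public tasks only, so internal-only
        # domains never leak into the published numbers.
        "domain_count": len({t.domain for t in public}),
        "tasks": [t.task_id for t in public],
    }
```

The counts come from the filtered list, so adding or removing a test task changes the published numbers without any manual page edit.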
2026-04-22 — Public Task Summaries Sanitized
The task catalog exporter now derives safer public summaries automatically and avoids leaking template-heavy prompt text such as unresolved {{ ... }} placeholders.
This makes the public task cards and catalog entries read more like benchmark summaries and less like raw prompt exports.
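The following is a minimal sketch of this kind of sanitization. It assumes that summaries are derived from raw prompt text and that unresolved placeholders use the {{ ... }} syntax. The function name public_summary and the 160-character limit are illustrative assumptions, not the exporter's actual behavior.

```python
import re

# Matches unresolved template placeholders such as "{{ document }}",
# including ones that span multiple lines.
PLACEHOLDER = re.compile(r"\{\{.*?\}\}", re.DOTALL)
WHITESPACE = re.compile(r"\s+")
SPACE_BEFORE_PUNCT = re.compile(r"\s+([.,;:!?])")


def public_summary(prompt_text: str, max_chars: int = 160) -> str:
    """Hypothetical helper: turn raw prompt text into a short public summary."""
    text = PLACEHOLDER.sub("", prompt_text)
    text = WHITESPACE.sub(" ", text).strip()
    # Removing a placeholder can leave a stray space before punctuation.
    text = SPACE_BEFORE_PUNCT.sub(r"\1", text)
    if len(text) <= max_chars:
        return text
    # Truncate on a word boundary and mark the cut with an ellipsis.
    cut = text[: max_chars - 3].rsplit(" ", 1)[0]
    return cut + "..."
```

For example, "Summarize {{ document }} in {{ n }} bullet points." becomes "Summarize in bullet points.", which reads as a summary rather than a raw template.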
2026-04-22 — Homepage, Catalog, and Leaderboard Became Data-Driven
The site now renders generated JSON directly for:
- homepage stats
- featured public tasks
- task catalog filtering/search
- leaderboard rendering and verified review snapshot handling
This means the public site is now tied to generated benchmark metadata instead of relying entirely on hardcoded page text.
What Should Appear Here Next
Likely future entries:
- paper release
- first official baseline leaderboard snapshot
- public benchmark release notes
- new domain additions
- changes to public visibility or submission policy