Official Implementation for AQA-Bench: An Interactive Benchmark for Evaluating LLMs’ Sequential Reasoning Ability in Algorithmic Environments
This work is partially supported by a gift from Open Philanthropy. We thank the Center for AI Safety, the Microsoft Accelerate Foundation Models Research Program, the OpenAI Researcher Access Program, and the Google Cloud Research Credits Program for supporting our computing needs.