This repository has been archived by the owner on Jun 9, 2024. It is now read-only.
Releases: Significant-Gravitas/Auto-GPT-Benchmarks
v0.0.9
What's Changed
- Remove skill tree sync by @merwanehamadi in #308
- Enhanced Test Report Directory Naming and Handling by @Swiftyos in #312
- Fixing paths that were preventing artifacts from being copied to workspace by @lc0rp in #311
- Add endpoints to power dev tool by @merwanehamadi in #310
- Remove submodule by @merwanehamadi in #314
- Fix linters and chrome selenium integration by @merwanehamadi in #313
- Remove colons in timestamp by @merwanehamadi in #315
- Remove build a nuke challenge by @merwanehamadi in #316
- Only push to gdrive correct timestamps by @merwanehamadi in #318
- Fix linter 2 by @merwanehamadi in #319
- Update pyproject.toml by @merwanehamadi in #320
Full Changelog: v0.0.8...v0.0.9
v0.0.8
What's Changed
- Fix all tests skipped by @merwanehamadi in #296
- Increase timeout by @merwanehamadi in #297
- Update .env.example by @westonwillingham in #298
- 0.0.8 by @merwanehamadi in #299
- Add safety challenge by @merwanehamadi in #300
- Fix agent protocol test by @merwanehamadi in #301
- Fix linter by @merwanehamadi in #302
- chore: polygpt update to include gpt4 by @rihp in #303
- Fix eval by @merwanehamadi in #304
- fix eval by @merwanehamadi in #305
- new frontend connections by @SilenNaihin in #306
- init backend, fix frontend module by @SilenNaihin in #307
New Contributors
- @westonwillingham made their first contribution in #298
Full Changelog: v0.0.7...v0.0.8
v0.0.7
What's Changed
- Update beebot by @erik-megarad in #288
- Sync skill tree to a versioned website by @merwanehamadi in #289
- If regression tests empty continue by @merwanehamadi in #290
- Remember goal loss by @merwanehamadi in #291
- No need to push skill tree twice by @merwanehamadi in #292
- Use index.html instead of dependencies.html by @merwanehamadi in #293
- Fix all tests skipped by @merwanehamadi in #294
- Release 0.0.7 by @merwanehamadi in #295
Full Changelog: v0.0.6...v0.0.7
v0.0.6
What's Changed
- Removed accidentally added reports by @nerfZael in #283
- Implement the 'explore' mode by @merwanehamadi in #284
- Add more fields to gdrive by @merwanehamadi in #285
- Cleanup skill tree by @merwanehamadi in #287
- Use agent protocol by @jakubno in #278
New Contributors
Full Changelog: v0.0.5...v0.0.6
v0.0.5
What's Changed
- PolyGPT Benchmarks and Submodule Update by @rihp in #273
- Update beebot by @erik-megarad in #281
- Remove baserun because api key issue by @merwanehamadi in #282
Full Changelog: v0.0.4...v0.0.5
v0.0.4
What's Changed
- Fix "attempted" metric being incorrect by @merwanehamadi in #251
- Fix more attempted metrics not working by @merwanehamadi in #252
- Add more coding challenge by @merwanehamadi in #254
- Add polygpt by @merwanehamadi in #255
- Add polygpt to ci by @merwanehamadi in #256
- Add agent protocol by @merwanehamadi in #258
- Add agent protocol interface test by @merwanehamadi in #259
- Add all agent protocol tests by @merwanehamadi in #260
- Remove space challenges by @merwanehamadi in #262
- Helicone Lock Manager fix by @merwanehamadi in #263
- Remove graphql logs by @merwanehamadi in #264
- remove pytest-depends, rerouting functions by @SilenNaihin in #250
- Fix test write file by @merwanehamadi in #266
- Add product advisor tests by @merwanehamadi in #267
- Kill all subprocesses by @erik-megarad in #265
- Feat: --cutoff and "keep_workspace_files" options by @lc0rp in #261
- Update pr template by @merwanehamadi in #268
- AUTO-25: Add the ability to run multiple categories and to skip categories by @Swiftyos in #270
- Add web app creation challenge by @merwanehamadi in #272
- Integrate with baserun by @merwanehamadi in #274
- Integrate baserun by @merwanehamadi in #275
- Put back mini agi to original state by @merwanehamadi in #276
- Fix send to gdrive by @merwanehamadi in #277
- See the task when clicking in the skill tree by @merwanehamadi in #279
- Release 0.0.4 by @merwanehamadi in #280
New Contributors
Full Changelog: v0.0.3...v0.0.4
v0.0.3
What's Changed
- safety challenges, adaptability challenges, suite same_task by @SilenNaihin in #177
- Beat more challenges in Auto-GPT by @merwanehamadi in #187
- Uninstall agbenchmark then reinstall by @merwanehamadi in #188
- Fix helicone MITM by @merwanehamadi in #189
- Add api keys by @merwanehamadi in #190
- hotfix reports by @SilenNaihin in #191
- Update Scores Benchmark by @merwanehamadi in #192
- fix suite dependencies by @SilenNaihin in #194
- Add safety suite by @merwanehamadi in #196
- report # bug, adding submodule challenges by @SilenNaihin in #193
- Add llm eval by @merwanehamadi in #197
- ci update by @SilenNaihin in #198
- Add helicone dynamic headers by @merwanehamadi in #199
- Add dynamic headers using environment variables by @merwanehamadi in #200
- added new script to fix dynamic headers by @chitalian in #202
- Delete reports by @merwanehamadi in #201
- Use beebot autopackai by @merwanehamadi in #203
- Benchmark all test by @merwanehamadi in #204
- Fix tests not being run by @merwanehamadi in #207
- Retry push until successful by @merwanehamadi in #208
- Advanced LLM Evaluation Implementation by @SilenNaihin in #205
- returning scores by @SilenNaihin in #210
- Update submodules by @merwanehamadi in #212
- Use Auto-GPT master by @merwanehamadi in #213
- Fix export to gdrive by @merwanehamadi in #214
- Add timeout to agbenchmark by @merwanehamadi in #215
- Add timeout that allows teardown by @merwanehamadi in #216
- Delete incorrect report by @merwanehamadi in #217
- Feature: Visualize Test Results by @SilenNaihin in #211
- Fix timeout not working by @merwanehamadi in #218
- Update submodule by @merwanehamadi in #219
- Get helicone costs by @merwanehamadi in #220
- working bar and radar charts by @SilenNaihin in #221
- Fix f-string get_data_from_helicone.py by @chitalian in #223
- Fix BeeBot link by @MrBrain295 in #224
- Fix send to gdrive and tracking the wrong challenge name by @merwanehamadi in #225
- Refactoring for TDD by @SilenNaihin in #222
- Fix costs helicone by @merwanehamadi in #226
- Fix reports by @merwanehamadi in #227
- Return none as fallback Helicone by @merwanehamadi in #228
- Only run mini-agi on push and PR by @merwanehamadi in #230
- Reverse skip based on agent by @merwanehamadi in #231
- Only run mini-agi on tests by @merwanehamadi in #232
- Fix reports and add commit sha by @merwanehamadi in #233
- Send commit sha and cost to gdrive by @merwanehamadi in #234
- Remove high costs by @merwanehamadi in #235
- Remove mock reports by @merwanehamadi in #236
- Remove mock reports by @merwanehamadi in #237
- Update beebot and Auto-GPT by @merwanehamadi in #238
- Update autogpt back to where it was by @merwanehamadi in #239
- Update python-dotenv by @erik-megarad in #240
- Update Auto-GPT and allow 1 specific agent to be run by @merwanehamadi in #241
- Add attempted metrics by @merwanehamadi in #244
- Correct agent and benchmark commit sha by @merwanehamadi in #245
- fix-linter by @merwanehamadi in #246
- Fix typing by @merwanehamadi in #247
- Add Test Suite to gdrive by @merwanehamadi in #248
- Release 0.0.3 by @merwanehamadi in #249
New Contributors
- @chitalian made their first contribution in #202
- @MrBrain295 made their first contribution in #224
Full Changelog: v0.0.2...v0.0.3
v0.0.2
What's Changed
Full Changelog: v0.0.1...v0.0.2
v0.0.1
What's Changed
- First commit for AutoGPT Benchmarks by @dschonholtz in #1
- Typo in README.md by @ambujpawar in #2
- Remove the submodule, reference OpenAI directly rather than running it on the command line, fix logging by @dschonholtz in #16
- Update README.md by @dschonholtz in #17
- Graphs for evals by @rihp in #20
- windows docs make workspace if not there by @dschonholtz in #25
- EvalNames with dates for the eval run filename and compatibility with 0.3.0 by @dschonholtz in #26
- init first challenge template by @ScarletPan in #34
- start fixtures, types, challenge creation, mock run (stable by @SilenNaihin in #37
- Add automatic regression markers by @SilenNaihin in #38
- MockManager, mock_func in data.json by @SilenNaihin in #39
- addition of basic challenges, easier challenge creation, --mock flag, adding mini-agi by @SilenNaihin in #40
- Update README.md by @SilenNaihin in #41
- adding hook to integrate agnostically by @SilenNaihin in #42
- Integrate one challenge to auto gpt by @merwanehamadi in #44
- Add static linters ci by @merwanehamadi in #45
- Run regression tests on push to master and stable by @merwanehamadi in #46
- Integrate with gpt engineer by @merwanehamadi in #47
- Integrate smol developer with agbenchmark by @merwanehamadi in #48
- Explain how to benchmark new agents by @merwanehamadi in #49
- local runs, home_path config, submodule miniagi by @SilenNaihin in #50
- Add retrieval challenge test + run tests on CI pipeline by @merwanehamadi in #51
- Add pr template by @merwanehamadi in #52
- Add information retrieval 3 by @merwanehamadi in #54
- Change test dependencies by @merwanehamadi in #55
- dynamic workspace path by @SilenNaihin in #56
- Add basic memory challenge by @merwanehamadi in #57
- Rename '--reg' flag to '--maintain' by @merwanehamadi in #58
- Add 'Remember multiple ids' memory challenge by @merwanehamadi in #59
- added caching based on file key by @SilenNaihin in #62
- Add 'remember ids with noise' challenge by @merwanehamadi in #61
- Add 'remember phrases with noise' challenge by @merwanehamadi in #63
- fix home_path, local mini-agi run works by @SilenNaihin in #64
- Add 'Debug simple typo with guidance' challenge by @merwanehamadi in #65
- Add "Debug code without guidance" challenge by @merwanehamadi in #66
- Get rid of get file path by using the data.json convention to store the challenge information by @merwanehamadi in #67
- Print out all of stdout on each process poll. by @erik-megarad in #69
- Add .txt to memory challenges by @merwanehamadi in #70
- Fix memory challenge 2 by @merwanehamadi in #71
- Use artifacts out instead of python code by @merwanehamadi in #72
- i/o workspace, adding superagi by @SilenNaihin in #60
- fixing the incorrect addition of superagi by @SilenNaihin in #73
- quality of life improvements & fixes by @SilenNaihin in #75
- Fix debug code challenge by @merwanehamadi in #76
- Add gpt engineer to ci by @merwanehamadi in #78
- just json, no test files by @SilenNaihin in #77
- Combine all agents into one ci.yml by @merwanehamadi in #79
- adding search interface challenge and cleaning repo by @SilenNaihin in #80
- Add Helicone by @merwanehamadi in #81
- Add "Simple web server" challenge by @merwanehamadi in #74
- added --test, consolidate files, reports working by @SilenNaihin in #83
- Fix tests ci by @merwanehamadi in #82
- All Agents log to helicone automatically by @merwanehamadi in #85
- Fix Auto-GPT integration by adding python module as entrypoint by @merwanehamadi in #86
- Fix Auto-GPT looping forever by @merwanehamadi in #87
- Add custom properties to Helicone by @merwanehamadi in #91
- Enable cache again by @merwanehamadi in #92
- fixing backslashes, adding basic metrics by @SilenNaihin in #89
- Fix Smol developer and gpt engineer by @merwanehamadi in #93
- Remove dependencies cache by @merwanehamadi in #94
- Remove dependencies if a specific test is asked by the user by @merwanehamadi in #95
- Update submodules and upload artifacts by @merwanehamadi in #97
- Add basic code generation challenge by @merwanehamadi in #98
- Replace hidden files with custom python by @merwanehamadi in #99
- Start showing benchmark results by @merwanehamadi in #100
- Show Auto-GPT results by @merwanehamadi in #102
- Display smol-developer-results by @merwanehamadi in #103
- Display results per category by @merwanehamadi in #104
- Update auto gpt to current version of master by @merwanehamadi in #105
- Update Auto-GPT score by @merwanehamadi in #106
- Clean up workspace between each test by @erik-megarad in #109
- Add three sum challenge by @merwanehamadi in #108
- Fix ci by @merwanehamadi in #110
- Remove cache true on pr by @merwanehamadi in #111
- Dynamic cutoff and other quality of life by @SilenNaihin in #101
- Allow change location of reports by @merwanehamadi in #115
- Fix cutoff errors by @merwanehamadi in #116
- Fix pipes issue by @merwanehamadi in #117
- Update reports when pushing to master by @merwanehamadi in https://github.com/Significant-Gravita...