GREAT IDEA!!! We have this in the works right now. Keep your eyes peeled!!
openai_evals, mistral_evals, etc., and the datasets on Hugging Face all seem to be focused on much simpler use cases than the ones here.

A different application, just to illustrate another testing approach, is benchmarking; e.g. on https://aider.chat/blog/ we see a lot of analysis across different benchmarks.

I can imagine that with Skyvern this would be challenging, but it might be interesting to create a template repository with a few example tests and benchmarks (and/or datasets on Hugging Face; judging by mistral_evals, their GitHub repo seems to fetch Hugging Face datasets).
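Just to make the idea concrete, here is a rough sketch of what an eval runner in such a template repo could look like. Everything Skyvern-specific is a placeholder: the dataset name `skyvern/browser-tasks` and the `run_task()` helper are hypothetical, and only `datasets.load_dataset()` is the real Hugging Face API.

```python
# Minimal sketch of an eval runner for a hypothetical template repo.
# Assumptions (not real artifacts): the dataset name "skyvern/browser-tasks",
# the run_task() helper, and the dataset field names are all placeholders;
# only datasets.load_dataset() is a real library call.
from datasets import load_dataset


def run_task(url: str, goal: str) -> dict:
    """Placeholder for invoking a Skyvern run (e.g. via its HTTP API)."""
    raise NotImplementedError


def evaluate() -> float:
    tasks = load_dataset("skyvern/browser-tasks", split="test")  # hypothetical dataset
    passed = 0
    for task in tasks:
        result = run_task(task["url"], task["navigation_goal"])
        # Compare the run's extracted output against the expected value
        # stored in the dataset row (field names are assumptions).
        if result.get("extracted_information") == task["expected_output"]:
            passed += 1
    return passed / len(tasks)


if __name__ == "__main__":
    print(f"pass rate: {evaluate():.1%}")
```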
Assuming those could get big, it might make sense for different Skyvern users to fork the repo and add tests on their forks, and the Skyvern team could later run all of them as a benchmark?

Just an idea, I know it's a lot of work, but collecting evals and datasets for Skyvern sounds very interesting!
Maybe some UI changes too, e.g. allowing users to share a trace with the Skyvern team, either from an app.skyvern.com run or a local run, as a tar file?
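For the tar-file idea, a minimal sketch of what exporting a local run could look like; the `./artifacts/<run_id>` layout is an assumption about where a local run keeps its trace, and only `tarfile` itself is the Python standard library.

```python
# Rough sketch: bundle a local run's artifacts into a shareable tarball.
# The ./artifacts/<run_id> layout is an assumption about where a local
# Skyvern run stores its trace; tarfile is the Python stdlib.
import tarfile
from pathlib import Path


def export_trace(run_id: str, artifacts_root: Path = Path("./artifacts")) -> Path:
    run_dir = artifacts_root / run_id
    out_path = Path(f"{run_id}_trace.tar.gz")
    with tarfile.open(out_path, "w:gz") as tar:
        # Store files relative to the run id so the archive unpacks cleanly.
        tar.add(run_dir, arcname=run_id)
    return out_path


if __name__ == "__main__":
    print(export_trace("tsk_example_run"))  # hypothetical run id
```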
Or maybe UI buttons for labeling steps/actions in a fast, convenient way (maybe even navigating and annotating with keyboard shortcuts), so that users could get a copy of the annotated data for themselves and, of course, share it with the team if they wanted?
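And purely to illustrate what the annotated data coming out of such a labeling UI might look like, a hypothetical record shape (none of the field names come from Skyvern's actual schema):

```python
# Hypothetical shape of one annotated step; none of these field names
# come from Skyvern's actual schema, they just illustrate the idea.
from dataclasses import dataclass, asdict
import json


@dataclass
class StepAnnotation:
    run_id: str
    step_index: int
    action_type: str   # e.g. "click", "input_text"
    label: str         # e.g. "correct", "wrong_element", "hallucinated"
    note: str = ""     # free-form comment from the annotator


if __name__ == "__main__":
    ann = StepAnnotation("tsk_example_run", 3, "click", "wrong_element", "clicked ad banner")
    print(json.dumps(asdict(ann), indent=2))
```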