evaluation

langfuse / langfuse

🪢 Open source LLM engineering platform: LLM observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, LangChain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

Updated Nov 9, 2024
TypeScript

zzw922cn / Automatic_Speech_Recognition

End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow

audio deep-learning tensorflow paper end-to-end evaluation cnn lstm speech-recognition rnn automatic-speech-recognition feature-vector data-preprocessing phonemes timit-dataset layer-normalization rnn-encoder-decoder chinese-speech-recognition

Updated Mar 24, 2023
Python

Knetic / govaluate

Arbitrary expression evaluation for golang

go parsing evaluation expression

Updated May 31, 2024
Go

open-compass / opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

benchmark evaluation openai llm chatgpt large-language-model llama2 llama3

Updated Nov 8, 2024
Python

xinshuoweng / AB3DMOT

(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"

tracking machine-learning real-time computer-vision robotics evaluation evaluation-metrics multi-object-tracking kitti 3d-tracking 3d-multi-object-tracking 2d-mot-evaluation 3d-mot 3d-multi kitti-3d

Updated Apr 3, 2024
Python

tecnickcom / tcexam

TCExam is a CBA (Computer-Based Assessment) system (e-exam, CBT - Computer-Based Testing) for universities, schools, and companies that enables educators and trainers to author, schedule, deliver, and report on surveys, quizzes, tests, and exams.

testing school university evaluation exam cba essay computer-based-assessment cbt multiple-choice mcsa computer-based-testing e-exam tcexam mcma

Updated Apr 20, 2024
PHP

promptfoo / promptfoo

Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.

testing ci evaluation ci-cd pentesting cicd vulnerability-scanners prompts evaluation-framework red-teaming rag llm prompt-engineering llmops prompt-testing llm-eval llm-evaluation llm-evaluation-framework

Updated Nov 9, 2024
TypeScript

ContinualAI / avalanche

Avalanche: an End-to-End Library for Continual Learning based on PyTorch.

training library framework deep-learning metrics evaluation pytorch benchmarks strategies lifelong-learning continual-learning continualai

Updated Oct 29, 2024
Python

google / fuzzbench

FuzzBench - Fuzzer benchmarking as a service.

security benchmarking evaluation fuzzing benchmark-framework

Updated Oct 21, 2024
Python

huggingface / evaluate

🤗 Evaluate: A library for easily evaluating machine learning models and datasets.

machine-learning evaluation

Updated Sep 17, 2024
Python

sdiehl / write-you-a-haskell

Building a modern functional compiler from first principles. (http://dev.stephendiehl.com/fun/)

compiler functional-programming book lambda-calculus evaluation type-theory type pdf-book type-checking haskel type-system functional-language hindley-milner type-inference intermediate-representation

Updated Jan 11, 2021
Haskell

tatsu-lab / alpaca_eval

An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.

nlp deep-learning leaderboard evaluation instruction-following foundation-models large-language-models rlhf

Updated Oct 23, 2024
Jupyter Notebook

microsoft / genaiops-promptflow-template

GenAIOps with Prompt Flow is a "GenAIOps template and guidance" to help you build LLM-infused apps using Prompt Flow. Its features include centralized code hosting, lifecycle management, variant and hyperparameter experimentation, A/B deployment, and reporting for all runs and experiments.

Updated Sep 4, 2024
Python

Maluuba / nlg-eval

Evaluation code for various unsupervised automated metrics for Natural Language Generation.

nlp natural-language-processing meteor machine-translation dialogue evaluation dialog rouge natural-language-generation nlg cider rouge-l skip-thoughts skip-thought-vectors bleu-score bleu task-oriented-dialogue

Updated Aug 20, 2024
Python

Helicone / helicone

🧊 Open source LLM-Observability Platform for Developers. One-line integration for monitoring, metrics, evals, agent tracing, prompt management, playground, etc. Supports OpenAI SDK, Vercel AI SDK, Anthropic SDK, LiteLLM, LlamaIndex, LangChain, and more. 🍓 YC W23

open-source playground monitoring analytics evaluation ycombinator openai gpt large-language-models llm prompt-engineering langchain llmops llama-index prompt-management llm-evaluation llm-observability agent-monitoring llm-cost

Updated Nov 9, 2024
TypeScript

uptrain-ai / uptrain

UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, and embedding use cases), perform root cause analysis on failure cases, and give insights on how to resolve them.

machine-learning monitoring evaluation experimentation jailbreak-detection autoevaluation root-cause-analysis prompt-engineering llmops openai-evals llm-prompting llm-eval llm-test hallucination-detection

Updated Aug 18, 2024
Python

Here are 1,233 public repositories matching this topic...

mrgloom / awesome-semantic-segmentation

Cloud-CV / EvalAI

MichaelGrupp / evo

explodinggradients / ragas

langfuse / langfuse

zzw922cn / Automatic_Speech_Recognition

Knetic / govaluate

open-compass / opencompass

xinshuoweng / AB3DMOT

tecnickcom / tcexam

promptfoo / promptfoo

ContinualAI / avalanche

google / fuzzbench

huggingface / evaluate

sdiehl / write-you-a-haskell

tatsu-lab / alpaca_eval

microsoft / genaiops-promptflow-template

Maluuba / nlg-eval

Helicone / helicone

uptrain-ai / uptrain
