pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation
💱 A curated list of data valuation (DV) resources to design your next data marketplace
OpenDataVal: a Unified Benchmark for Data Valuation in Python (NeurIPS 2023)
This is an official repository for "LAVA: Data Valuation without Pre-Specified Learning Algorithms" (ICLR 2023).
Papers about training data quality management for ML models.
PyTorch reimplementation of computing Shapley values via Truncated Monte Carlo sampling from "What is your data worth? Equitable Valuation of Data" by Amirata Ghorbani and James Zou [ICML 2019]
Intriguing Properties of Data Attribution on Diffusion Models (ICLR 2024)
Code for the paper "The Journey, Not the Destination: How Data Guides Diffusion Models"
Time series data contribution via influence functions
Code for the submission to the ML Reproducibility Challenge 2022, reproducing "If you like Shapley then you'll love the core"
This is an official repository for "2D-Shapley: A Framework for Fragmented Data Valuation" (ICML 2023).
The pyDVL slides for PyData Berlin 2024
Simulation environment for data collection dynamics.
Code reproducing the Class-wise Shapley paper by Schoch, Xu, and Ji [2022].
The Medium of Exchange of Ecosystem
Code for our paper 'Interpretable Triplet Importance for Personalized Ranking' accepted by CIKM 2024.
Federated learning implementation for data valuation and differential privacy, supporting blockchain DP FL.
Supplementary programmes for DeRDaVa: Deletion-Robust Data Valuation for Machine Learning.
Algorithms for data valuation and benchmarks