Crowdsourced Clustering via Active Querying: Practical Algorithm with Theoretical Guarantees

This repository provides code for the simulations and experiments in our paper, Crowdsourced Clustering via Active Querying: Practical Algorithm with Theoretical Guarantees, presented at the Eleventh AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2023).

Structure

The root of the codebase contains 5 directories.

backend contains code for the backend (removed in revised version due to space limits).
csv contains data in csv format, generated from the experiments run using the backend.
mat contains data in mat format, generated from the experiments run using the backend and simulation.
notebook contains files used to visualize the data obtained from the experiments.
simulation contains code to generate simulated results.
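As a quick way to see what the csv directory holds, the following sketch lists each CSV file with its first row and row count. It uses only the Python standard library. The function name summarize_csv_dir is hypothetical, and treating the first row as a header is an assumption, since the column layout of the files is not described here. The files in mat can usually be read with scipy.io.loadmat, assuming they are standard MATLAB files.

```python
import csv
from pathlib import Path


def summarize_csv_dir(directory):
    """Return {file name: (first row, number of remaining rows)} per CSV file.

    Assumption: the first row of each file is a header. Adjust if it is not.
    """
    summary = {}
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="") as handle:
            rows = list(csv.reader(handle))
        header = rows[0] if rows else []
        summary[path.name] = (header, max(len(rows) - 1, 0))
    return summary


if __name__ == "__main__":
    # "csv" is the directory named in the list above; run from the repository root.
    for name, (header, n_rows) in summarize_csv_dir("csv").items():
        print(f"{name}: {n_rows} rows, columns = {header}")
```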

Environment

Simulation

The required packages and their corresponding versions are listed in environment.yml. First make sure the required packages are installed. One easy way to do so is to use Anaconda to create an environment directly:

conda env create -f environment.yml

This will create an environment named active-querying. To activate this environment, run

conda activate active-querying

Usage

Simulation

The files related to the simulation are located in the simulation directory. There are two Python files whose names start with executor. These two files are the entry points for the simulation.

The file executor_all_sports.py runs simulation using the results obtained from the all sports experiment.
The file executor_yun_14.py runs simulation on the simulated dataset that was created for the yun 14 simulation.
The file executor.py runs the simulation in general.
Outputs of the simulation are stored in the outputs directory.
The yun14-related directory contains files for simulations of the yun14 algorithm.
- clusering.py contains the implementation of yun14.
- passive_simulation[].ipynb generate the adjacency matrix, frequency matrix, or observation matrix for the simulation.
- yun14_passive_all[].ipynb runs the yun14 passive simulation.
- adpative_simulation.ipynb runs the adaptive yun14 simulation on the simulated dataset.
- adpative_allsports.ipynb runs the adaptive yun14 simulation on the allsports dataset.
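To illustrate the kind of matrices mentioned above, here is a minimal sketch that builds a ground-truth adjacency matrix from cluster labels and then simulates noisy passive pairwise queries. It is not the code from the notebooks. The function names, the parameters query_prob and flip_prob, and the noise model (each queried pair is flipped independently with a fixed probability) are assumptions made for illustration, and the matrix definitions used in the repository may differ.

```python
import random


def adjacency_from_labels(labels):
    """Ground truth: entry (i, j) is 1 if items i and j share a cluster."""
    n = len(labels)
    return [[1 if labels[i] == labels[j] else 0 for j in range(n)]
            for i in range(n)]


def noisy_observations(adjacency, query_prob, flip_prob, rng):
    """Query each pair with probability query_prob; flip answers with flip_prob.

    Returns (observed, answers). observed[i][j] is 1 if pair (i, j) was
    queried. answers[i][j] holds the noisy answer for queried pairs and 0
    otherwise. Both matrices are symmetric with a zero diagonal.
    """
    n = len(adjacency)
    observed = [[0] * n for _ in range(n)]
    answers = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < query_prob:
                truth = adjacency[i][j]
                answer = 1 - truth if rng.random() < flip_prob else truth
                observed[i][j] = observed[j][i] = 1
                answers[i][j] = answers[j][i] = answer
    return observed, answers


# Hypothetical toy instance: five items in three clusters.
labels = [0, 0, 1, 1, 2]
A = adjacency_from_labels(labels)
observed, answers = noisy_observations(
    A, query_prob=0.5, flip_prob=0.1, rng=random.Random(0)
)
```

Passing a seeded random.Random instance keeps a run reproducible, which helps when comparing algorithms on the same simulated queries.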

The simulation parameters can be set in the main() function of the two entry-point files.

Citation

If you find our repository useful for your research, please consider citing our paper:

@inproceedings{chen2023crowdsourced,
  title={Crowdsourced Clustering via Active Querying: Practical Algorithm with Theoretical Guarantees},
  author={Chen, Yi and Vinayak, Ramya Korlakai and Hassibi, Babak},
  booktitle={Proceedings of the AAAI Conference on Human Computation and Crowdsourcing},
  volume={11},
  number={1},
  pages={27--37},
  year={2023}
}
