[ENH] Implementing DBSCAN #2477

Closed
aryanpola opened this issue Dec 31, 2024 · 6 comments
Assignees
Labels
clustering Clustering package enhancement New feature, improvement request or other non-bug code enhancement

Comments

@aryanpola
Contributor

Describe the feature or idea you want to propose

We currently do not have a sub-module for density-based clustering. This issue is for implementing Density-Based Spatial Clustering of Applications with Noise (DBSCAN).

Describe your proposed solution

Scikit-learn for reference: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/cluster/_dbscan.py

Describe alternatives you've considered, if relevant

No response

Additional context

No response

@aryanpola aryanpola added the enhancement New feature, improvement request or other non-bug code enhancement label Dec 31, 2024
@aryanpola
Contributor Author

@aeon-actions-bot assign @aryanpola

@SebastianSchmidl
Member

DBSCAN is already implemented in sklearn, and aeon depends on sklearn. Why do we need to re-implement it directly in aeon?

@aryanpola
Contributor Author

aryanpola commented Jan 2, 2025

The clustering examples had a graph containing all the clustering sub-modules, which includes density-based clustering, so I figured why not implement it in aeon.
Please let me know if this is unnecessary, I'll close this.
Thanks for checking!

@MatthewMiddlehurst MatthewMiddlehurst added the clustering Clustering package label Jan 10, 2025
@MatthewMiddlehurst
Member

Agreed that unless there is some reason the sklearn version won't work (or is insufficient in any other way), there's no real need to have it here. Any thoughts @chrisholder ?

@TonyBagnall
Contributor

The problem with DBSCAN is that it does not have a predict method, i.e. it can only cluster the training data. I'm not sure of the best way to deal with this. I'll have a go with the scikit-learn version.

@aryanpola
Contributor Author

> The problem with DBSCAN is that it does not have a predict method, i.e. it can only cluster the training data. I'm not sure of the best way to deal with this. I'll have a go with the scikit-learn version.

Yep, not having a predict function makes it much harder to assign new points to the existing clusters (the model has to be refit).
I'll close the issue then.
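
For reference, one common workaround for the missing predict discussed above is to label a new point with the cluster of its nearest core sample, and to treat it as noise if no core sample lies within eps. Below is a minimal sketch of that idea, not anything that exists in aeon or scikit-learn: `assign_to_clusters` is a hypothetical helper, the toy lists stand in for what a fitted scikit-learn `DBSCAN` exposes as `components_` and `labels_[core_sample_indices_]`, and plain Euclidean distance is assumed (a time series distance would need to be swapped in for real series).

```python
import math

NOISE = -1  # label scikit-learn's DBSCAN uses for noise points


def assign_to_clusters(core_points, core_labels, new_points, eps):
    """Label each new point with the cluster of its nearest core point.

    Hypothetical helper (not part of scikit-learn or aeon). A new point
    farther than eps from every core point is labelled as noise.
    """
    labels = []
    for point in new_points:
        best_label, best_dist = NOISE, math.inf
        for core, label in zip(core_points, core_labels):
            dist = math.dist(point, core)  # Euclidean distance (assumption)
            if dist <= eps and dist < best_dist:
                best_label, best_dist = label, dist
        labels.append(best_label)
    return labels


# Toy data standing in for a fitted model. With scikit-learn these would be
# model.components_ and model.labels_[model.core_sample_indices_].
core_points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
core_labels = [0, 0, 1, 1]
new_points = [(0.05, 0.1), (5.05, 4.9), (2.5, 2.5)]

predicted = assign_to_clusters(core_points, core_labels, new_points, eps=0.5)
print(predicted)  # [0, 1, -1]
```

Note that this only assigns points to clusters that already exist. New points never become core samples, so clusters cannot grow or merge without refitting.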

4 participants