gagolews committed Nov 20, 2024
1 parent f756559 commit e59ddd3
Showing 127 changed files with 4,688 additions and 167 deletions.
8 changes: 4 additions & 4 deletions .devel/pytest/test_approx.py
Original file line number Diff line number Diff line change
@@ -19,11 +19,11 @@
#rpy2.robjects.numpy2ri.activate()
stats = importr("stats")

-r_base = importr("base")
-lib_loc = r_base.Sys_getenv("R_LIBS_USER")[0]
-print(lib_loc)
+# r_base = importr("base")
+# lib_loc = r_base.Sys_getenv("R_LIBS_USER")[0]
+# print(lib_loc)

-genie = importr("genie", lib_loc=lib_loc)
+genie = importr("genie") #, lib_loc=lib_loc)
except:
rpy2 = None
stats = None
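
Note on the hunk above: the bare `except:` swallows every failure inside the `try` body, not only a missing package. A minimal sketch of a narrower guard follows. `optional_import` is a hypothetical helper that is not part of this commit, and it only covers Python-side import failures; how `importr` reports a missing R package is not shown in the diff and is not handled here.

```python
import importlib


def optional_import(name):
    """Return the named module, or None when it cannot be imported."""
    try:
        return importlib.import_module(name)
    except ImportError:
        # Only a failed import is swallowed; other errors still surface.
        return None


json_mod = optional_import("json")               # stdlib module, always present
missing = optional_import("no_such_module_xyz")  # hypothetical name, assumed absent
```

With a guard like this, a test module could skip its R-dependent checks when the returned value is `None` instead of hiding unrelated errors.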
3 changes: 2 additions & 1 deletion .devel/sphinx/index.md
@@ -57,7 +57,7 @@ pip3 install genieclust
```

from the command line or through your favourite package manager.
-Note a familiar *scikit-learn*-like {cite}`sklearn_api` look-and-feel:
+Note the *scikit-learn*-like {cite}`sklearn_api` API:

```python
import genieclust
@@ -73,6 +73,7 @@ labels = g.fit_predict(X)
::::



## R Version

The **R version** of *genieclust* can be downloaded from
Binary file modified .devel/sphinx/weave/basics-figures/basics-dendrogram-1-13.pdf
Binary file not shown.
Binary file modified .devel/sphinx/weave/basics-figures/basics-dendrogram-1-13.png
Binary file modified .devel/sphinx/weave/basics-figures/basics-dendrogram-2-15.pdf
Binary file not shown.
Binary file modified .devel/sphinx/weave/basics-figures/basics-dendrogram-2-15.png
Binary file modified .devel/sphinx/weave/basics-figures/basics-plot-genie2-11.pdf
Binary file not shown.
Binary file modified .devel/sphinx/weave/basics-figures/basics-plot-genie2-11.png
Binary file modified .devel/sphinx/weave/basics-figures/basics-plot-hdbscan-7.pdf
Binary file not shown.
Binary file modified .devel/sphinx/weave/basics-figures/basics-plot-hdbscan-7.png
Binary file modified .devel/sphinx/weave/basics-figures/basics-plot-hdbscan2-9.pdf
Binary file not shown.
Binary file modified .devel/sphinx/weave/basics-figures/basics-plot-hdbscan2-9.png
Binary file modified .devel/sphinx/weave/basics-figures/basics-plot-km-5.pdf
Binary file not shown.
Binary file modified .devel/sphinx/weave/basics-figures/basics-plot-km-5.png
Binary file modified .devel/sphinx/weave/basics-figures/basics-plot-pred-3.pdf
Binary file not shown.
Binary file modified .devel/sphinx/weave/basics-figures/basics-plot-pred-3.png
Binary file modified .devel/sphinx/weave/basics-figures/basics-scatter-1.pdf
Binary file not shown.
Binary file modified .devel/sphinx/weave/basics-figures/basics-scatter-1.png
14 changes: 4 additions & 10 deletions .devel/sphinx/weave/basics.Rmd
@@ -1,6 +1,5 @@
# Basics


*Genie* {cite}`genieins` is an agglomerative hierarchical clustering
algorithm that links clusters minding that
the Gini index (a measure of inequality used in, amongst others,
@@ -19,7 +18,6 @@ via a call to `pip3 install genieclust` from the command line.




```{python imports,results="hide"}
import numpy as np
import pandas as pd
@@ -72,9 +70,8 @@ plt.show()
```

Let us apply the Genie algorithm (with the default/recommended
-`gini_threshold` parameter value). The `genieclust` package's interface
-is compatible with the one from the popular
-[scikit-learn](https://scikit-learn.org/) library {cite}`sklearn`.
+`gini_threshold` parameter value). The `genieclust` package's programming
+interface is [scikit-learn](https://scikit-learn.org/)-compatible {cite}`sklearn`.
In particular, an object of class `Genie` is equipped with the
`fit` and `fit_predict` methods {cite}`sklearn_api`.

@@ -159,8 +156,6 @@ genieclust.compare_partitions.adjusted_rand_score(labels_true, labels_kmeans)
The adjusted Rand score of $\sim 0.3$ indicates a far-from-perfect fit.
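
Note on the score quoted above: for readers unfamiliar with the adjusted Rand index, here is a minimal pure-Python sketch of the standard pair-counting formula. It is an illustration only, not the implementation behind `genieclust.compare_partitions.adjusted_rand_score`; the function name `adjusted_rand` and the toy label vectors are assumptions made for this note.

```python
from collections import Counter
from math import comb


def adjusted_rand(labels_a, labels_b):
    """Adjusted Rand index of two labelings of the same points."""
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("both labelings must cover the same points")
    if n < 2:
        return 1.0  # no pairs to disagree on
    # Pairs of points placed together in both, in the first, in the second.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)  # agreement expected by chance
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings form one cluster
    return (together_both - expected) / (maximum - expected)


same_up_to_renaming = adjusted_rand([0, 0, 1, 1], [1, 1, 0, 0])
crossed = adjusted_rand([0, 0, 1, 1], [0, 1, 0, 1])
```

A value of 1 means the two partitions agree up to a relabeling, and values near 0 mean agreement no better than chance, which is why a score around 0.3 reads as a poor fit.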




## A Comparison with HDBSCAN\*

Let's also make a comparison against a version of the DBSCAN
@@ -170,7 +165,6 @@ the [hdbscan](https://github.com/scikit-learn-contrib/hdbscan) package
{cite}`hdbscanpkg` implements its robustified variant
{cite}`hdbscan`, which makes the algorithm much more user-friendly.


Here are the clustering results with the `min_cluster_size` parameter
of 3, 5, 10, and 15:

@@ -247,7 +241,7 @@ plt.show()

Dendrogram plotting is possible with `scipy.cluster.hierarchy`:

-```{python basics-dendrogram-1,fig.cap="Example dendrogram."}
+```{python basics-dendrogram-1,fig.cap="Example dendrogram.",results="hide"}
import scipy.cluster.hierarchy
g = genieclust.Genie(compute_full_tree=True)
g.fit(X)
@@ -259,7 +253,7 @@ plt.show()

Another example:

-```{python basics-dendrogram-2,fig.cap="Another example dendrogram."}
+```{python basics-dendrogram-2,fig.cap="Another example dendrogram.",results="hide"}
scipy.cluster.hierarchy.dendrogram(linkage_matrix,
truncate_mode="lastp", p=15, orientation="left")
plt.show()
89 changes: 48 additions & 41 deletions .devel/sphinx/weave/basics.md

Large diffs are not rendered by default.

Binary file modified .devel/sphinx/weave/benchmarks_ar-figures/plot_large-3.pdf
Binary file not shown.
Binary file modified .devel/sphinx/weave/benchmarks_ar-figures/plot_large-3.png
Binary file modified .devel/sphinx/weave/benchmarks_ar-figures/plot_small-1.pdf
Binary file not shown.
Binary file modified .devel/sphinx/weave/benchmarks_ar-figures/plot_small-1.png
5 changes: 0 additions & 5 deletions .devel/sphinx/weave/benchmarks_ar.Rmd
@@ -30,8 +30,6 @@ For more detailed results based on other partition similarity scores,
see the [Appendix](benchmarks_details).




```{python imports,results="hide",echo=FALSE}
import numpy as np
import pandas as pd
@@ -185,7 +183,6 @@ res_max = res2.groupby(["dataset", "method"]).max().\




```{python plot_small,echo=FALSE,results="hide",warn=FALSE,fig.cap="Distribution of the AR index for each algorithm (small datasets); best=1.0.",fig.height=5.9375}
do_plot(res_max)
```
@@ -199,8 +196,6 @@ compare {cite}`cvimst`)
tend to output good quality outcomes as well.




Descriptive statistics for the ranks (for each dataset,
each algorithm that gets the highest AR index rounded to 2 decimal digits,
gets a rank of 1); lower ranks are better:
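
Note on the ranking rule in this hunk: a minimal pure-Python sketch of how such ranks could be derived for a single dataset. The source only states that the highest AR index, rounded to 2 decimal digits, gets rank 1; giving tied methods the same (smallest) rank is an assumption of this sketch, and the method names and scores below are made up for illustration.

```python
def rank_methods(ar_scores):
    """Map each method to its rank on one dataset (1 = best AR index).

    Scores are rounded to 2 decimal digits first, so near-ties share a rank.
    Tied methods all get the smallest rank of their group (an assumption).
    """
    rounded = {method: round(score, 2) for method, score in ar_scores.items()}
    return {
        method: 1 + sum(1 for other in rounded.values() if other > score)
        for method, score in rounded.items()
    }


# Hypothetical scores, not taken from the benchmark results.
ranks = rank_methods({"A": 0.904, "B": 0.899, "C": 0.5})
```

Here methods `A` and `B` both round to 0.9 and share rank 1, while `C` gets rank 3 because two methods scored strictly higher.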
9 changes: 2 additions & 7 deletions .devel/sphinx/weave/benchmarks_ar.md
@@ -37,8 +37,6 @@ see the [Appendix](benchmarks_details).





## Small Datasets

As some of the algorithms tested here have failed to generate a solution
@@ -62,7 +60,6 @@ Moreover, Gaussian mixtures used `n_init=100`.




(fig:plot_small)=
```{figure} benchmarks_ar-figures/plot_small-1.*
Distribution of the AR index for each algorithm (small datasets); best=1.0.
@@ -77,8 +74,6 @@ compare {cite}`cvimst`)
tend to output good quality outcomes as well.




Descriptive statistics for the ranks (for each dataset,
each algorithm that gets the highest AR index rounded to 2 decimal digits,
gets a rank of 1); lower ranks are better:
@@ -96,7 +91,7 @@ gets a rank of 1); lower ranks are better:
| K-means | 72 | 5.6 | 3.8 | 1 | 1 | 6 | 9 | 12 |
| Single linkage | 72 | 7.4 | 5.1 | 1 | 1 | 11 | 12 | 12 |
| Spectral_RBF_5 | 72 | 5.2 | 3.5 | 1 | 1 | 6 | 8 | 11 |
-| Ward linkage | 72 | 6 | 3 | 1 | 4 | 6 | 8 | 12 |
+| Ward linkage | 72 | 6 | 3 | 1 | 4 | 6 | 8 | 12 |


## Large Datasets
@@ -127,7 +122,7 @@ Descriptive statistics for the ranks (AR index):
| ITM | 6 | 3.3 | 2.3 | 1 | 1.5 | 3 | 5.2 | 6 |
| K-means | 6 | 3.3 | 1.6 | 1 | 2.2 | 3.5 | 4.8 | 5 |
| Single linkage | 6 | 6.8 | 0.4 | 6 | 7 | 7 | 7 | 7 |
-| Ward linkage | 6 | 3.2 | 1.5 | 1 | 2.2 | 3.5 | 4 | 5 |
+| Ward linkage | 6 | 3.2 | 1.5 | 1 | 2.2 | 3.5 | 4 | 5 |



Binary file not shown.
Binary file not shown.
Binary file modified .devel/sphinx/weave/noise-figures/noise-Genie1-3.pdf
Binary file not shown.
Binary file modified .devel/sphinx/weave/noise-figures/noise-Genie1-3.png
Binary file modified .devel/sphinx/weave/noise-figures/noise-Genie2-5.pdf
Binary file not shown.
Binary file modified .devel/sphinx/weave/noise-figures/noise-Genie2-5.png
Binary file modified .devel/sphinx/weave/noise-figures/noise-Genie3-7.pdf
Binary file not shown.
Binary file modified .devel/sphinx/weave/noise-figures/noise-Genie3-7.png
Binary file modified .devel/sphinx/weave/noise-figures/noise-HDBSCAN1-9.pdf
Binary file not shown.
Binary file modified .devel/sphinx/weave/noise-figures/noise-HDBSCAN1-9.png
Binary file modified .devel/sphinx/weave/noise-figures/noise-HDBSCAN2-11.pdf
Binary file not shown.
Binary file modified .devel/sphinx/weave/noise-figures/noise-HDBSCAN2-11.png
Binary file modified .devel/sphinx/weave/noise-figures/noise-scatter-1.pdf
Binary file not shown.
Binary file modified .devel/sphinx/weave/noise-figures/noise-scatter-1.png
46 changes: 30 additions & 16 deletions .devel/sphinx/weave/noise.md
@@ -5,8 +5,7 @@
# Clustering with Noise Points Detection



```python
``` python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
@@ -22,8 +21,7 @@ the at [hdbscan](https://github.com/scikit-learn-contrib/hdbscan)
{cite}`hdbscanpkg` package's project site:



```python
``` python
dataset = "hdbscan"
X = np.loadtxt("%s.data.gz" % dataset, ndmin=2)
labels_true = np.loadtxt("%s.labels0.gz" % dataset, dtype=np.intp) - 1
@@ -38,11 +36,13 @@ find useful for whatever their goal is).
The `-1` labels denote noise points (light grey markers).



```python
``` python
genieclust.plots.plot_scatter(X, labels=labels_true, alpha=0.5)
plt.title("(n=%d, true n_clusters=%d)" % (X.shape[0], n_clusters))
plt.axis("equal")
```

``` python
plt.show()
```

@@ -66,8 +66,7 @@ Here are the effects of playing with the `M` parameter
(we keep the default `gini_threshold`):



```python
``` python
Ms = [2, 5, 10, 25]
for i in range(len(Ms)):
g = genieclust.Genie(n_clusters=n_clusters, M=Ms[i])
@@ -76,6 +75,9 @@ for i in range(len(Ms)):
genieclust.plots.plot_scatter(X, labels=labels_genie, alpha=0.5)
plt.title("(gini_threshold=%g, M=%d)"%(g.gini_threshold, g.M))
plt.axis("equal")
```

``` python
plt.show()
```

@@ -91,8 +93,7 @@ and then apply the clustering procedure once again
but now with respect to the original distance (here: Euclidean):



```python
``` python
# Step 1: Noise point identification
g1 = genieclust.Genie(n_clusters=n_clusters, M=50)
labels_noise = g1.fit_predict(X)
@@ -106,6 +107,9 @@ labels_noise[non_noise] = labels_genie
genieclust.plots.plot_scatter(X, labels=labels_noise, alpha=0.5)
plt.title("(gini_threshold=%g, noise points removed first; M=%d)"%(g2.gini_threshold, g1.M))
plt.axis("equal")
```

``` python
plt.show()
```
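
Note on the two-step hunk above: the line `labels_noise[non_noise] = labels_genie` writes the second-pass labels back into the positions that the first pass did not flag as noise. A minimal pure-Python sketch of that bookkeeping follows, using plain lists instead of NumPy arrays; `merge_second_pass` and the toy label vectors are hypothetical and only illustrate the idea.

```python
def merge_second_pass(labels_first, labels_second):
    """Write second-pass labels into the non-noise slots of the first pass."""
    kept = sum(1 for lab in labels_first if lab >= 0)
    if kept != len(labels_second):
        raise ValueError("second pass must label exactly the non-noise points")
    second = iter(labels_second)
    # Noise points (-1) stay as they are; every other slot takes the next new label.
    return [lab if lab < 0 else next(second) for lab in labels_first]


# First pass marks points 1 and 3 as noise; second pass relabels the other three.
merged = merge_second_pass([0, -1, 1, -1, 0], [2, 0, 1])
```

The result keeps `-1` at the noise positions and takes the remaining labels from the second clustering, in order.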

@@ -126,8 +130,7 @@ of finer or coarser granularity.




```python
``` python
ncs = [5, 6, 7, 8, 10, 15]
for i in range(len(ncs)):
g = genieclust.Genie(n_clusters=ncs[i])
@@ -137,6 +140,9 @@ for i in range(len(ncs)):
genieclust.plots.plot_scatter(X, labels=labels_noise, alpha=0.5)
plt.title("(n_clusters=%d)"%(g.n_clusters))
plt.axis("equal")
```

``` python
plt.show()
```

@@ -153,15 +159,21 @@ Labels predicted by Genie when noise points were removed from the dataset – di
Here are the results returned by `hdbscan` with default parameters:



```python
``` python
import hdbscan
```


``` python
h = hdbscan.HDBSCAN()
labels_hdbscan = h.fit_predict(X)
genieclust.plots.plot_scatter(X, labels=labels_hdbscan, alpha=0.5)
plt.title("(min_cluster_size=%d, min_samples=%d)" % (
h.min_cluster_size, h.min_samples or h.min_cluster_size))
plt.axis("equal")
```

``` python
plt.show()
```

@@ -177,8 +189,7 @@ we can obtain a partition that is even closer to the reference one:




```python
``` python
mcss = [5, 10, 25]
mss = [5, 10]
for i in range(len(mcss)):
@@ -190,6 +201,9 @@ for i in range(len(mcss)):
plt.title("(min_cluster_size=%d, min_samples=%d)" % (
h.min_cluster_size, h.min_samples or h.min_cluster_size))
plt.axis("equal")
```

``` python
plt.show()
```

2 changes: 0 additions & 2 deletions .devel/sphinx/weave/r.Rmd
@@ -10,8 +10,6 @@ install.packages("genieclust")


Below are a few basic examples of how to interact with the package.
-<!-- (partially based on Marek's [forthcoming book](https://lmlcr.gagolewski.com)). -->
-

```{r load}
library("genieclust")
Binary file modified .devel/sphinx/weave/sklearn_toy_example-figures/clustering-1.pdf
Binary file not shown.
Binary file modified .devel/sphinx/weave/sklearn_toy_example-figures/clustering-1.png
9 changes: 3 additions & 6 deletions .devel/sphinx/weave/sklearn_toy_example.md
@@ -16,8 +16,7 @@ on a larger data sample and with the Genie algorithm in the game.




```python
``` python
import time
import warnings

@@ -41,8 +40,7 @@ First, we generate the datasets. Note that in the
`n_samples` was set to 1500.



```python
``` python
n_samples = 10000
noisy_circles = datasets.make_circles(n_samples=n_samples, factor=.5,
noise=.05)
@@ -66,8 +64,7 @@ varied = datasets.make_blobs(n_samples=n_samples,
Then, we run the clustering procedures and plot the results.



```python
``` python
# Set up cluster parameters
plt.figure(figsize=(9 * 1.3 + 2, 14.5))
plt.subplots_adjust(left=.02, right=.98, bottom=.001, top=.96, wspace=.05,
Binary file modified .devel/sphinx/weave/timings-figures/digits-3.pdf
Binary file not shown.
Binary file modified .devel/sphinx/weave/timings-figures/digits-3.png
Binary file modified .devel/sphinx/weave/timings-figures/g2mg-plot-1.pdf
Binary file not shown.
Binary file modified .devel/sphinx/weave/timings-figures/g2mg-plot-1.png