Skip to content

Python implementation of k-medoid algorithm making use of different set and sequence similarity measures.

Notifications You must be signed in to change notification settings

gulraizchoudhary/k-medoid

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 
 
 

Repository files navigation

k-medoid

Cluster patients’ diagnosis using different set and sequence similarity measures, identify correct number of clusters, optimal clustering solution, and analyze the clustering results. Many distance matrices are used and are listed as follows.

Set similarity measures

Jaccard Similarity(num_intersect / (len_x + len_y - num_intersect))
Overlap Similarity (num_intersect / min(len_x, len_y))
Bram Similarity num_intersect/max(len_x, len_y)
Dice Similarity (2*num_intersect)/(len_x+len_y)

Sequence similarity

LCSS 
bram num_intersect/max(len_x, len_y)
MetricLCSS lcss.lcss(x,y)/max(len_x, len_y)

Hybrid: Set+Sequence similarity

OverlappLcss (p*lcss)+(q*overlap)
S3m (p*lcss)+(q*jaccard)
smc (num_intersect/(len_x+len_y))
tss ((p*overlap) + (q*overlapLcss))
Monge-Elkan similarity measure: The Monge-Elkan similarity measure is a type of hybrid similarity measure that combines the benefits of sequence-based and set-based methods. 

Distance Score

Bram distance socre (1 - bram similarity)
dice distance socre (1 - dice similarity)
TfIdf (1-(Monge-Elkan similarity measure/min(x,y)))
Overlap distance score (1-overlap similarity)
Jaccard distance score (1-jaccard similarity)

Example: Histograms of most frequent diseases in a cluster (mLCSS sequence similarity measure)

"Cluster 0" "Cluster 0" "Cluster 0" "Cluster 0" "Cluster 0"

Example: Histograms of most frequent diseases in a cluster (Overlap similarity measure)

"Cluster 0" "Cluster 0" "Cluster 0" "Cluster 0" "Cluster 0"

Example: Diseae trajectories with cluster representatives (medoid) using mLCSS sequence similarity measure

"Cluster 0" "Cluster 0" "Cluster 0" "Cluster 0" "Cluster 0"

Example: Diseae trajectories with cluster representatives (medoid) using Overlap similarity measure

"Cluster 0" "Cluster 0" "Cluster 0" "Cluster 0" "Cluster 0"

License

MIT

About

Python implementation of k-medoid algorithm making use of different set and sequence similarity measures.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages