forked from TatyanaV/Genome_Sequencing_Bioinformatics_II
1V.FarthestFirstTraversal.py
'''
CODE CHALLENGE: Implement the FarthestFirstTraversal clustering heuristic.
Input: Integers k and m followed by a set of points Data in m-dimensional space.
Output: A set Centers consisting of k points (centers) resulting from applying
FarthestFirstTraversal(Data, k), where the first point from Data is chosen as the
first center to initialize the algorithm.
FarthestFirstTraversal(Data, k)
    Centers ← the set consisting of a single randomly chosen point from Data
    while |Centers| < k
        DataPoint ← the point in Data maximizing d(DataPoint, Centers)
        add DataPoint to Centers
    return Centers
Sample Input:
3 2
0.0 0.0
5.0 5.0
0.0 5.0
1.0 1.0
2.0 2.0
3.0 3.0
1.0 2.0
Sample Output:
0.0 0.0
5.0 5.0
0.0 5.0
https://github.com/johnmerm/bioinfo/blob/e6a255a461ca83c58e56c83d4717bc13f8880c5d/src/main/java/bioinfo/yeast/kmeans.py
https://github.com/ilap/Bioinformatics_Part2/blob/8c4fcec1df79e45b80818a281683578876060a6b/CH_3_How_Did_Yeast_Become_a_Wine-Maker/7_3_IntroToClustering.py
'''
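# The input format described above (a "k m" header line followed by one point
# per line) could be parsed directly rather than hard-coding k and the data.
# The sketch below is one possible way to do that. parse_input is a
# hypothetical helper name, not part of the original solution, and it assumes
# whitespace-separated values.

```python
def parse_input(text):
    """Parse a 'k m' header followed by one m-dimensional point per line."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    k, m = int(rows[0][0]), int(rows[0][1])
    points = [tuple(float(v) for v in row) for row in rows[1:]]
    # Reject rows whose length disagrees with the declared dimension m.
    for p in points:
        if len(p) != m:
            raise ValueError("expected %d coordinates, got %d" % (m, len(p)))
    return k, m, points
```

# Example use (assumed, not from the original):
#     k, m, points = parse_input(open('input.txt').read())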
k = 3  # matches the sample input in the docstring above
data = """
0.0 0.0
5.0 5.0
0.0 5.0
1.0 1.0
2.0 2.0
3.0 3.0
1.0 2.0
""".split('\n')
# Parse each non-empty line into a tuple of floats.
Data = [tuple(float(x) for x in line.split()) for line in data if line.strip()]


def pointDist(a, b):
    # Squared Euclidean distance; the square root is not needed for comparisons.
    return sum((a[i] - b[i]) ** 2 for i in range(len(a)))


def dist(a, centers):
    # The distance from a point to a set of centers is the distance to the nearest one.
    return min(pointDist(a, b) for b in centers)


def farthestFirstTraversal(Data, k):
    # Start from the first point, as the challenge specifies. A list (rather
    # than a set) keeps the centers in the order they were chosen.
    centers = [Data[0]]
    while len(centers) < k:
        dataPoint = max(Data, key=lambda x: dist(x, centers))
        centers.append(dataPoint)
    return centers


centers = farthestFirstTraversal(Data, k)
for arr in centers:
    print(' '.join(str(val) for val in arr))
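
# The traversal above recomputes every point's distance to all centers on each
# iteration. A possible alternative, sketched here under the same conventions
# (first point as the initial center, squared Euclidean distance), caches each
# point's distance to its nearest center and only compares against the newest
# center. The names sq_dist and farthest_first_traversal_incremental are
# illustrative and do not appear in the original solution.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_traversal_incremental(points, k):
    """Pick up to k centers, caching each point's distance to its nearest center."""
    centers = [points[0]]
    # nearest[i] is the squared distance from points[i] to its closest center.
    nearest = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        # The next center is the point farthest from all current centers.
        idx = max(range(len(points)), key=lambda i: nearest[i])
        centers.append(points[idx])
        # Only the newly added center can shrink a cached distance.
        for i, p in enumerate(points):
            d = sq_dist(p, points[idx])
            if d < nearest[i]:
                nearest[i] = d
    return centers
```

# On the sample data with k = 3 this selects (0.0, 0.0), (5.0, 5.0) and
# (0.0, 5.0), in that order, which agrees with the sample output.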