```
Sample input
```
```
+------+------+------+
| user | item | pref |
+------+------+------+
| u0   | A    | 2    |
| u0   | C    | 1    |
| u1   | A    | 7    |
| u1   | B    | 1    |
| u2   | A    | 2    |
| u2   | C    | 1    |
| u0   | B    | 2    |
+------+------+------+
Format is (user, item, pref).
MapReduce1:
(user, item, pref) => (user, [(item, pref)])
(u0, [(A, 2), (B, 2), (C, 1)])
(u1, [(A, 7), (B, 1)])
(u2, [(A, 2), (C, 1)])
MapReduce2:
Map:
(user, [(item, pref)]) => {(item1, item2): (pref1, pref2)}
{(A, B): (2, 2)}
{(A, C): (2, 1)}
{(B, C): (2, 1)}
----------------
{(A, B): (7, 1)}
----------------
{(A, C): (2, 1)}
Group: on the (item1, item2) key
Reduce:
{(item1, item2): [(pref1, pref2)]} =>
{(item1, sim): (item1, item2, sim)}
{(item2, sim): (item2, item1, sim)}
Input:
{(A, B): [(2, 2), (7, 1)]}
----------------
{(A, C): [(2, 1), (2, 1)]}
----------------
{(B, C): [(2, 1)]}
Output:
The sim values are faked. For sample size 2, the Pearson coefficient is always 1 or -1
(or undefined when one item's prefs are all equal).
More data is needed to illustrate similarities.
{(A, 0.5): (A, B, 0.5)}
{(B, 0.5): (B, A, 0.5)}
{(A, 0.4): (A, C, 0.4)}
{(C, 0.4): (C, A, 0.4)}
{(B, 0.3): (B, C, 0.3)}
{(C, 0.3): (C, B, 0.3)}
MapReduce3:
group on item, sort on (item, sim)
```
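
The three stages above can be sketched in plain in-memory Python, with no MapReduce framework involved. This is only an illustration under stated assumptions: the function names (`group_by_user`, `pair_prefs`, `pearson`, `rank_neighbours`) are hypothetical, each user's items are sorted so pair keys come out as (A, B) rather than (B, A), and the final sort is descending by sim, which the notes do not specify.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Sample input from the notes: (user, item, pref), deliberately unsorted.
RATINGS = [
    ("u0", "A", 2), ("u0", "C", 1),
    ("u1", "A", 7), ("u1", "B", 1),
    ("u2", "A", 2), ("u2", "C", 1),
    ("u0", "B", 2),
]


def group_by_user(ratings):
    """MapReduce1: (user, item, pref) => (user, [(item, pref)])."""
    by_user = defaultdict(list)
    for user, item, pref in ratings:
        by_user[user].append((item, pref))
    # Sorting by item is an assumption; it keeps pair keys canonical.
    return {user: sorted(items) for user, items in by_user.items()}


def pair_prefs(by_user):
    """MapReduce2 map + group: {(item1, item2): [(pref1, pref2)]}."""
    pairs = defaultdict(list)
    for items in by_user.values():
        for (item1, pref1), (item2, pref2) in combinations(items, 2):
            pairs[(item1, item2)].append((pref1, pref2))
    return dict(pairs)


def pearson(samples):
    """Pearson coefficient of paired prefs, or None when it is undefined."""
    n = len(samples)
    if n < 2:
        return None
    xs, ys = zip(*samples)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # zero variance on either side
    return cov / sqrt(var_x * var_y)


def rank_neighbours(sims):
    """MapReduce2 reduce output + MapReduce3: group on item, sort on sim."""
    by_item = defaultdict(list)
    for (item1, item2), sim in sims.items():
        by_item[item1].append((item2, sim))  # emitted under item1
        by_item[item2].append((item1, sim))  # emitted under item2
    # Descending order is an assumption (most similar item first).
    return {
        item: sorted(neighbours, key=lambda pair: pair[1], reverse=True)
        for item, neighbours in sorted(by_item.items())
    }


if __name__ == "__main__":
    pairs = pair_prefs(group_by_user(RATINGS))
    for key, samples in pairs.items():
        print(key, samples, pearson(samples))
    # Same faked sims as in the notes, since the real ones are degenerate.
    faked = {("A", "B"): 0.5, ("A", "C"): 0.4, ("B", "C"): 0.3}
    print(rank_neighbours(faked))
```

On the sample data, `pearson` gives -1 for (A, B) and is undefined for (A, C) and (B, C), which is why the notes fall back to faked sim values.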