-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathprisoners_in_R.Rmd
447 lines (350 loc) · 22 KB
/
prisoners_in_R.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
---
title: "Agent-Based Tournament Coding Challenge"
output: html_document
---
# Agent-Based Tournament Coding Challenge (TU Delft - EPA1315 Data Analysis and Visualization)
## Introduction
In the agent-based tournament coding challenged each agent has to face an interated prisoner's dilemma. The prisoners dilemma is extensively studied in multiple scientific areas, such as economics, machine learning and evolutionary biology [1]. The core concept of the game is that each of the two competitors can choose 1 out of 2 options for each round. These options are to either defect or cooperate. Both player simultaneously and independently select an option and depending on both decisions a payoff for each player is determined.
Payoff | Cooperate | Defect
--- | --- | ---
Cooperate| 4/4 | 0/5
Defect | 5/0 | 2/2
The full version of this work is [hosted on GitHub](https://github.com/jasonrwang/prisoners_in_R).
### Authors
* Mees Hoff – 4963059
* Siemon Keij - 4285654
* Philip Seijger - 4272803
* Jason R Wang - 4788281
## Common Strategies
#### 1. Always Cooperate
Cooperates on every move, as a strategie this can be considered very social but not most benefial for own interest
#### 2. Always Defect
Defects on every move, this is considered a 'hard' strategy'.
#### 3. Tit for Tat
Cooperates on the first move, then simply copies the opponent’s last move. In experimental tournaments this strategy turned out surprisingly successful.
#### 4. Suspicious Tit for Tat
Same as Tit for Tat, except that it defects on the first move.
#### 5. Pavlov
Cooperates on the first move, and defects only if both the players did not agree on the previous move.
#### 6. Spiteful
Cooperates, until the opponent defects, and thereafter always defects. Can work out positively in terms of own interest, but may also be risky if first opponent defect occurs early.
#### 7. Random Player
Makes a random move each turn again.
#### 8. Periodic player CD:
Plays C, D periodically.
#### 9. Periodic player DDC
Plays D, D, C periodically.
#### 10. Periodic player CCD
Plays C, C, D periodically.
#### 11. Tit for Two Tats
Cooperates on the first move, and defects only when the opponent defects two times. Compared to the regular tit for tat, this strategy is more socially oriented.
#### 12. Soft Majority
Begins by cooperating, and cooperates as long as the number of times the opponent has cooperated is greater than or equal to the number of times it has defected. When this is no longer the case it defects.
#### 13, Hard Majority
Defects on the first move, and defects if the number of defections of the opponent is greater than or equal to the number of times it has cooperated, else cooperates.
#### 14. Hard Tit for Tat
Cooperates on the first move, and defects if the opponent has defects on any of the previous three moves, else cooperates.
#### 15. NaiveProber
Like TitforTat,but occasionally, randomly defects.
#### 16. Remorseful Prober :
Like Naive Prober, but it tries to break the series of mutual defections after defecting.
[1]
## Selected Strategy
### Axelrod Single-Objective Evolutionary Algorithm
The concept of our agent is based on the fact that for each move in the game there are four possibilities. Both players can cooperate (CC or R for reward), the first player cooperats and the other player defects (CD or S for sucker), the first player can defect and the second cooperates (DC or T for temptation), or both players defect (DD or P for penalty). For our strategie we consider the last three moves as a three letter code. For instance, the situation is which the last three rounds both agents cooperated, has a three letter sequence RRR.
This combination of letters is used to generate a number 0:63. This is done by setting R=0, T=1, S=2 and P=3 and interpret it as a number on base 4. This will result in the next formula: `(points round 1) * 4^2 + (points round 2) * 4^1 + (points round 3) * 4^0`. For example, 'SSP' yields 43. Each of the numbersfrom 0:63 either stands for defect or cooperate. Therefore, based on the last 3 moves, an option is selected. Since this decision is based upon the last the 3 moves, the first three moves cannot be defined by this strategy and are run by a modified tit for tat approach.
An Exception is made for the so called "lemon agents", these agents are introducing themselves by saying "Lemon!." and will always cheat. For this reason, in a situation of an opponent that introduces itself by "Lemon!." the defect option is always selected.
The 'main.R' code is used to call two different scripts; 'raw_parser.R' and 'extract_statistics.R'. These scripts rework the data from last years tournament into a 64-bit string which can be used in the agent.
Main.R looks as follows
```{R}
### Main File to run things in!
#setwd(Set working directory if not yet specified)
source('scripts/raw_parser.R', chdir = TRUE)
source('scripts/extract_statistics.R', chdir = TRUE)
## Run parser to create string
data_frame = axelrod_encode(raw_parser('Example_Tournament/bid_register.csv'))
## Run parsed array through statistics
axelrod_string = extract_statistics(data_frame)
# The string above can be put into the agent manually now
```
The script in 'raw_parser.R' will look at the information from a file in which the bids were registered. For every possible pair of agents, the code will save the different meetings. For every pair the script will look at three meetings and save which combination this is. For every gained combination, the script will look into if the agent won or lost in the fourth meeting. It also knows what both agents bid, this can be used to state which agent won.
'raw_parser.R' looks as follows:
```{R}
raw_parser = function(file_name) {
tourn_raw = read.csv(file_name,sep=",",stringsAsFactors = FALSE)
num_agents = max(unique(tourn_raw$id1)) # Only works if IDs are sequential!
outcome_sequences = array(data=NA,dim=c(num_agents,num_agents,40)) # 40 is wasteful. See improvement suggestion below.
# typeof(outcome_sequences) ## just for testing
## To get data about the number of games played between each pair
# seq_length_list = c()
for (i_id1 in seq(num_agents-1)) { # Iterate through all the agents
id1_index = which(tourn_raw$id1 %in% i_id1 | tourn_raw$id2 %in% i_id1) # Find where agent plays
for (i_id2 in seq(i_id1+1,num_agents)) { # Iterate through remaining combinations
id2_index = which(tourn_raw$id1 %in% i_id2 | tourn_raw$id2 %in% i_id2) # Find where agent #i_id2 plays
## The below can be an improvement, but doesn't quite work yet.
## The "list" cannot replace the single element.
# The number of times i_id1 attacks i_id2 to help pre-allocate list
match_id = which(id1_index %in% id2_index)
seq_list_len = length(match_id)
#outcome_sequences[i_id1,i_id2] = vector("list",seq_list_len) # pre-allocate
# To get data about the number of games played between each pair
#seq_length_list = c(seq_length_list,seq_list_len)
for (i in seq(seq_list_len)){
outcome_sequences[i_id1,i_id2,i] = axelrod_code(
tourn_raw$bid1[ id1_index [ match_id[i] ] ],
tourn_raw$bid2[ id1_index [ match_id[i] ] ])
}
}
}
return(outcome_sequences)
}
axelrod_code = function(bid1, bid2) {
# Determines the letter code associated with an interaction
if (is.na(bid1)) {
next
}
else if (bid1 == 'cooperate') {
if (bid1 == bid2) { # CC -> R -> 3
return(3)
} else { # CD -> S -> 2
return(2)
}
} else if (bid1 == 'defect') {
if (bid1 == bid2) { # DD -> P -> 0
return(0)
} else { # DC -> T -> 1
return(1)
}
} else {
print("Error in cooperate-defect matching!")
break
}
}
axelrod_encode = function(a_code) {
# Takes a vector of behaviour and turns it into a useful data frame for data analysis
num_agents = nrow(a_code) # Determine number of agents, n, in 'n x n' matrix input
code_id = decision = win = c()
for (i in seq(num_agents)) { # The "attacking" agent
for (j in seq(num_agents)) { # The "defending" agent
# Agents cannot play themselves, so skip in this case.
if (i == j) { # This could be improved by stripping i_id1 from i_id2's for loop declaration
next
}
for (k in seq(length(a_code[i,j,])-3)) {
if (is.na(a_code[i,j,k+3])) { ## Due to dumb coding above, NA exists in list
next
}
encoded_seq = (a_code[i,j,k] * 4*4 + a_code[i,j,k+1] * 4 + a_code[i,j,k+2] * 1)
code_id = c(code_id,encoded_seq)
if (a_code[i,j,k+3] > 1) { # 0 and 1 are defect; 2 and 3
decision = c(decision,'C')
} else {
decision = c(decision,'D')
}
if (a_code[i,j,k+3] %% 2) { # Consider R and T (1 and 3) as win
win = c(win,TRUE)
} else {
win = c(win,FALSE)
}
}
}
}
return(data.frame(code_id,decision,win,stringsAsFactors = FALSE))
}
```
The next script, 'extract_statistics.R' works further were 'raw_parser.R' stopped. It will look into the percentage of winnings after every combination for all the different pairs. This can be used to make a decision on what to do after every combination when encoutering an agent. To finalize a string is produced which states what to do after every combination.
```{R}
get_per <- function(choice_evaluate,data_frame,number) {
d = data_frame
# Count the numbers of times the pair made the same choice we are evaluating
num_chosen = 0
for (i in d$decision[d$code_id == number]) {
if (i == choice_evaluate) {
num_chosen = num_chosen + 1
}
}
# Count the number of times the agent wins by making that decision
num_wins = 0
for (j in d[ (d$code_id==number & d$decision==choice_evaluate) , c("win")]) {
if (j == TRUE) {
num_wins = num_wins + 1
}
}
if (num_chosen == 0) {
p <- 0
} else {
p <- num_wins/num_chosen
}
return(p)
}
extract_statistics <- function(data_frame) {
d = data_frame
str <- NULL
# Run statistics-gatherer for all 64 possible combinations.
for (i in 0:63) {
# Gather statistics about choices
per_c <- get_per("C",data_frame,i)
per_d <- get_per("D",data_frame,i)
# Make decision for what agent should do based on choices
if (per_c == per_d) {
str <- paste(str, "C",sep="") # Defaults to Defect
} else if (per_c > per_d) {
str <- paste(str, "C",sep="")
} else if (per_c < per_d) {
str <- paste(str, "D",sep="")
} else {
str <- paste(str, " ",sep="") # For debugging
return(data.frame(code_id,decision,win, stringsAsFactors = FALSE))
}
## For testing
# raw_parser("validate.csv")
```
The output of the scripts was the following string: CCCDCCCDCCCDCCCDCDCDCDDDCDDDDDDDCCCDCDDDCDDDDDDDDDDDCDCDCDDDDDDD. This script was used during testing, but failed to beat tit for tat and other agents consistantly. Therefore, it was decided to use a string that we got from the paper by Mittal and Deb, namely: DDDDDDCCCDDCDDDDDDCDDDDCDCCDDCDDDDCDCCCDDCDCDCDCDDCCDDDDCDDDDDCCCDCCDD. After the code, the creating and the workings of the agent will be discussed.
```{R}
library(R6)
Agent <- R6Class("Agent",
#this part was already given
public = list(
bid = NULL,
book = NULL,
greeting = NULL,
id = NULL,
opponent_id = NULL,
opponent_greeting = NULL,
round = NULL,
response = NULL,
set_book = function(book=NA) {
self$book <- book
},
set_id = function(id=NA) {
self$id = id
},
set_opponent_id = function(opponent_id=NA) {
self$opponent_id = opponent_id
},
set_response = function(response=NA) {
self$response <- response
},
set_round = function(round=NA) {
self$round <- round
},
set_greeting = function() {
greeting <- "Back to PRESTO..."
},
receive_greeting = function(greeting = NA) {
self$opponent_greeting = greeting
},
#this code was made to get a certain bid for the agent
get_bid = function() {
bid_vector <- c("cooperate","defect")
#creating a few variables which will be used later on
last_move_self <- NA
last_move_opponent <- NA
#create a database with the last rounds
last_round <- tail(subset(self$book, (id1 == self$id | id2 == self$id) & (id1 == self$opponent_id | id2 == self$opponent_id)), 3)
last_round_single <- tail(subset(self$book, (id1 == self$id | id2 == self$id) & (id1 == self$opponent_id | id2 == self$opponent_id)), 1)
#the string which will decide what the agent should do according to the paper by Mittal and Deb
axelrod_seq <- c("D","D","D","D","D","D","C","C","C","D","D","C","D","D","D","D","D","D","C","D","D","D","D","C","D","C","C","D","D","C","D","D","D","D","C","D","C","C","C","D","D","C","D","C","D","C","D","C","D","D","C","C","D","D","D","D","C","D","D","D","D","D","C","C","C","D","C","C","D","D")
#this is the index for the bid, it starts at 1 because R start at 1 with counting
number <- 1
#first check what the bid was of the last round we played this agent
if (nrow(last_round_single) == 0) {
} else if (last_round_single[1,1] == self$id) {
last_move_self <- last_round_single[1,4]
last_move_opponent <- last_round_single[1,5]
} else if (last_round_single[1,1] == self$opponent_id) {
last_move_self <- last_round_single[1,5]
last_move_opponent <- last_round_single[1,4]
}
#this is a simple tit for that function for the first couple of rounds
if (nrow(last_round) == 3) {
} else if (nrow(last_round_single)==0) {
self$bid <- bid_vector[1]
} else if ((last_move_self == bid_vector[1]) & (last_move_opponent == bid_vector[1])) {
self$bid <- sample(bid_vector, 1, prob=c(4/5,1/5))
} else if ((last_move_self == bid_vector[1]) & (last_move_opponent == bid_vector[2])) {
self$bid <- sample(bid_vector, 1, prob=c(1/2,1/2))
} else if ((last_move_self == bid_vector[2]) & (last_move_opponent == bid_vector[1])) {
self$bid <- sample(bid_vector, 1, prob=c(2/5,3/5))
} else if ((last_move_self == bid_vector[2]) & (last_move_opponent == bid_vector[2])) {
self$bid <- bid_vector[2]
}
#this is the function that looks at the last three rounds and adds it up to a number which is used to find the bid in the string
if (nrow(last_round) < 3) {
} else if ((last_round[3,1] == self$id) & (last_round[3,2] == self$opponent_id) & (last_round[3,4] == bid_vector[1]) & (last_round[3,5] == bid_vector[1])) {
number <- number + (3 * 4^2)
} else if ((last_round[3,1] == self$id) & (last_round[3,2] == self$opponent_id) & (last_round[3,4] == bid_vector[2]) & (last_round[3,5] == bid_vector[1])) {
number <- number + (1 * 4^2)
} else if ((last_round[3,1] == self$id) & (last_round[3,2] == self$opponent_id) & (last_round[3,4] == bid_vector[1]) & (last_round[3,5] == bid_vector[2])) {
number <- number + (2 * 4^2)
} else if ((last_round[3,1] == self$id) & (last_round[3,2] == self$opponent_id) & (last_round[3,4] == bid_vector[2]) & (last_round[3,5] == bid_vector[2])) {
number <- number + (0 * 4^2)
} else if ((last_round[3,1] == self$opponent_id) & (last_round[3,2] == self$id) & (last_round[3,4] == bid_vector[1]) & (last_round[3,5] == bid_vector[1])) {
number <- number + (3 * 4^2)
} else if ((last_round[3,1] == self$opponent_id) & (last_round[3,2] == self$id) & (last_round[3,4] == bid_vector[2]) & (last_round[3,5] == bid_vector[1])) {
number <- number + (2 * 4^2)
} else if ((last_round[3,1] == self$opponent_id) & (last_round[3,2] == self$id) & (last_round[3,4] == bid_vector[1]) & (last_round[3,5] == bid_vector[2])) {
number <- number + (1 * 4^2)
} else if ((last_round[3,1] == self$opponent_id) & (last_round[3,2] == self$id) & (last_round[3,4] == bid_vector[2]) & (last_round[3,5] == bid_vector[2])) {
number <- number + (0 * 4^2)
}
#this is the function that looks at the last three rounds and adds it up to a number which is used to find the bid in the string
if (nrow(last_round) < 3) {
} else if ((last_round[2,1] == self$id) & (last_round[2,2] == self$opponent_id) & (last_round[2,4] == bid_vector[1]) & (last_round[2,5] == bid_vector[1])) {
number <- number + (3 * 4^1)
} else if ((last_round[2,1] == self$id) & (last_round[2,2] == self$opponent_id) & (last_round[2,4] == bid_vector[2]) & (last_round[2,5] == bid_vector[1])) {
number <- number + (1 * 4^1)
} else if ((last_round[2,1] == self$id) & (last_round[2,2] == self$opponent_id) & (last_round[2,4] == bid_vector[1]) & (last_round[2,5] == bid_vector[2])) {
number <- number + (2 * 4^1)
} else if ((last_round[2,1] == self$id) & (last_round[2,2] == self$opponent_id) & (last_round[2,4] == bid_vector[2]) & (last_round[2,5] == bid_vector[2])) {
number <- number + (0 * 4^1)
} else if ((last_round[2,1] == self$opponent_id) & (last_round[2,2] == self$id) & (last_round[2,4] == bid_vector[1]) & (last_round[2,5] == bid_vector[1])) {
number <- number + (3 * 4^1)
} else if ((last_round[2,1] == self$opponent_id) & (last_round[2,2] == self$id) & (last_round[2,4] == bid_vector[2]) & (last_round[2,5] == bid_vector[1])) {
number <- number + (2 * 4^1)
} else if ((last_round[2,1] == self$opponent_id) & (last_round[2,2] == self$id) & (last_round[2,4] == bid_vector[1]) & (last_round[2,5] == bid_vector[2])) {
number <- number + (1 * 4^1)
} else if ((last_round[2,1] == self$opponent_id) & (last_round[2,2] == self$id) & (last_round[2,4] == bid_vector[2]) & (last_round[2,5] == bid_vector[2])) {
number <- number + (0 * 4^1)
}
#this is the function that looks at the last three rounds and adds it up to a number which is used to find the bid in the string
if (nrow(last_round) < 3) {
} else if ((last_round[1,1] == self$id) & (last_round[1,2] == self$opponent_id) & (last_round[2,4] == bid_vector[1]) & (last_round[2,5] == bid_vector[1])) {
number <- number + (3 * 4^0)
} else if ((last_round[1,1] == self$id) & (last_round[1,2] == self$opponent_id) & (last_round[1,4] == bid_vector[2]) & (last_round[1,5] == bid_vector[1])) {
number <- number + (1 * 4^0)
} else if ((last_round[1,1] == self$id) & (last_round[1,2] == self$opponent_id) & (last_round[1,4] == bid_vector[1]) & (last_round[1,5] == bid_vector[2])) {
number <- number + (2 * 4^0)
} else if ((last_round[1,1] == self$id) & (last_round[1,2] == self$opponent_id) & (last_round[1,4] == bid_vector[2]) & (last_round[1,5] == bid_vector[2])) {
number <- number + (0 * 4^0)
} else if ((last_round[1,1] == self$opponent_id) & (last_round[1,2] == self$id) & (last_round[1,4] == bid_vector[1]) & (last_round[1,5] == bid_vector[1])) {
number <- number + (3 * 4^0)
} else if ((last_round[1,1] == self$opponent_id) & (last_round[1,2] == self$id) & (last_round[1,4] == bid_vector[2]) & (last_round[1,5] == bid_vector[1])) {
number <- number + (2 * 4^0)
} else if ((last_round[1,1] == self$opponent_id) & (last_round[1,2] == self$id) & (last_round[1,4] == bid_vector[1]) & (last_round[1,5] == bid_vector[2])) {
number <- number + (1 * 4^0)
} else if ((last_round[1,1] == self$opponent_id) & (last_round[1,2] == self$id) & (last_round[1,4] == bid_vector[2]) & (last_round[1,5] == bid_vector[2])) {
number <- number + (0 * 4^0)
}
#this function uses the number to see what letter is at that index in the provided string
if (nrow(last_round) < 3) {
} else if (axelrod_seq[number] == "D") {
self$bid <- bid_vector[2]
} else if (axelrod_seq[number] == "C") {
self$bid <- bid_vector[1]
}
},
formulate_bid = function() {
self$get_bid()
}
)
)
```
This agent should beat tit for tat according to the paper by Mittal and Ded, however this hasn't been proven yet. It's optimised for some agents of last year, so it might be inefficient for the agent of this year.
### Comments on process and code
We started off on our process by reading the paper by Mittal and Deb. From this paper we got the idea to create our own 64 bit string. To make this we ran into some problems. We used the data from last year to begin with, but this didn't seem to work that well with some agents we ran together with our own agent. After tweaking our string we eventually went with the string from the paper. Our agent also has a part which is tit for tat. This is created for in the beginning, but we eventually went for defecting the first three times we meet a new agent. After these three meetings we can use the string.
Furthermore, we also have a part of the code which checks what the greeting is of the agent. This is because of the greeting for the all defect agents, they say "Lemon!.". Unfortunately this part of the code wasn't working to during our testing fase. therefore we were forced to exclude this from our delivered code.
Our first agent had different functions which were called by the get_bid function, but the provided tournament code did not work with this agent. Therefore we were forced to put all the functions in one large function. With further testing we discovered that other parts of the code that worked separately, didn't work with the provided tournament code. This let us to change our code even more.
## References
[1] S. Mittal and K. Deb, “Optimal Strategies of the Iterated Prisoner’s Dilemma Problem for Multiple Conflicting Objectives,” IEEE Trans. Evol. Comput., vol. 13, no. 3, pp. 554–565, Jun. 2009.
[2] R. Axelrod, “The evolution of strategies in the iterated prisoner’s dilemma,” in Genetic Algorithms and Simulated Annealing, L. Davis, Ed. Los Altos, CA: Morgan Kaufmann, 1987.