Skip to content

Latest commit

 

History

History
1266 lines (967 loc) · 47.6 KB

README.md

File metadata and controls

1266 lines (967 loc) · 47.6 KB

Computational Intelligence - Exam Report

Luigi Federico, s295740



1 - Introduction

The agent is build using an heuristic strategy based on pareto fronts idea and using MinMax with alpha-beta pruning and both horizontal and deep pruning.

The method section illustrates in detail how the agent works and all the strategies that I developed. Next, the Experiment section contains a tuning experiment and the results against the Random Agent. The code is at the end of this report, followed by all the laboratory reports and reviews.

2 - Method

The agent uses two different approaches depending on which mode it is in. There could be two different modes:

  • Safe Mode
  • Threat Mode

The agent is in Threat Mode if there is at least one threat piece. A threat piece is a piece that, if placed in a line with 3 pieces, will result in a quarto. If there are no threat pieces, the agent is in Safe Mode.

The agent will act differently according to the mode it is in:

  • When it is in safe mode it will use an euristic strategy that tries to map the board state to a score board, based on the frontiers idea (see 2.1 - Frontiers and Score Board). This stands for both choosing and placing a piece.
  • When it is in threat mode it will run a MinMax algorithm that uses alpha-beta pruning and both deep and horizontal pruning with a greedy sorting based on the Score Board to speed up the prunings (see 2.2 - MinMax).

2.1 - Frontiers and Score Board

The idea of piece frontiers comes after reading the reference [1] "Quarto, Part 1 (Theory)" by Steven Morse in which he tries to analyes the math behind quarto. Here is the extract that inspired me:

[...]
<< We’d like to relate the space to some object, and then use a well known dihedral group to represent the symmetries. (Instead of using a clunky flat matrix.) A good analogy is to use a 4-cube, or tesseract, and think of the binary numbers as the coordinates of the vertices of the cube in 4-space. Then if you choose a vertex v, A_v are the adjacent vertices, B_v are the vertices 2 steps away, C_v are the vertices 3 steps away, and D_v is the vertex on the opposite side of the 4-cube. Thus, H seems to be the dihedral group of a 4-cube.

Quarto piece hypercube

Like all things, in hindsight, this is quite obvious. We have a 16-element set, and we are trying to identify symmetries. Of course this should correspond to the 16-element set called the coordinates of a 4-cube in Euclidean space! Of course! >>

The idea of the frontiers comes with this hypercube configuration. We can distinguish 4 frontiers, ordered by the number of attributes shared with the source piece. The higher the number of shared attributes, the higher the advantage to place the frontier piece in the same line of the source. Here is a better rapresentation of the frontiers:

Frontiers

If we place the piece with 0 common attributes in the same line of the source piece, we "disable" the line becouse no matter what piece the agent places in that line, it will never give a quarto. If we want to place a piece safely we could choose a disabled line.

For each frontier it doesn't matter which piece you choose, because it will share exactly that number of attributes with the source. This configuration is a sort of Pareto Front system.

We can compute the frontiers of more than one piece as source. The sources can be at most three, becouse after that a line is disabled and it doesn't make sense to think about frontiers since you can't place any other piece. There are exacly 696 possible sources (16 with one piece, 120 with two pieces and 560 with three pieces) so they can be easly computed and mantained by the agent during the games.

The code I used to compute statically the frontiers is the following:

@staticmethod
def get_frontiers():
    """
        Generates all the frontiers for each possible combination of 1, 2 and 3 elements.

        Returns a dictionary frontiers with:
        - key = tuple containing the source pieces of the respetive frontiers. Could have len = 1, 2 or 3.
        - value: the frontier corresponding to the source pieces as a dictionary with:
            - key: number of attributes in common with the pieces inside the level of the frontier. key = 0, 1, 2 or 3.
            - value: pieces that share key elements in common with the source pieces.
    """
    
    def get_frontier(p: int):
        """ Frontiers with 1 piece """
        assert p >= 0 and p <= 15

        x = dec_to_bin[p]
        frontier = {
            0: [],
            1: [],
            2: [],
            3: []
        }

        for num, binary in dec_to_bin.items():
            if num == p:
                continue
            cnt = 0
            for i, j in zip(x, binary):
                if i == j:
                    cnt += 1
            frontier[cnt].append(num)
        
        return frontier

    def get_frontier_2d(a: int, b:int):
        """ Frontiers with 2 pieces """
        assert a >= 0 and a <= 15
        assert b >= 0 and b <= 15
        assert a != b

        x = dec_to_bin[a]
        y = dec_to_bin[b]
        frontier = {
            0: [],
            1: [],
            2: [],
            3: []
        }

        for num, binary in dec_to_bin.items():
            if num == a or num == b:
                continue
            cnt = 0
            for x_i, y_i, j in zip(x, y, binary):
                if x_i == j and y_i == j:
                    cnt += 1
            frontier[cnt].append(num)
        
        return frontier

    def get_frontier_3d(a: int, b:int, c: int):
        """ Frontiers with 3 pieces """
        assert a >= 0 and a <= 15
        assert b >= 0 and b <= 15
        assert c >= 0 and c <= 15
        assert a != b and a != c and b != c

        x = dec_to_bin[a]
        y = dec_to_bin[b]
        z = dec_to_bin[c]
        frontier = {
            0: [],
            1: [],
            2: [],
            3: []
        }

        for num, binary in dec_to_bin.items():
            if num == a or num == b or num == c:
                continue
            cnt = 0
            for x_i, y_i, z_i, j in zip(x, y, z, binary):
                if x_i == j and y_i == j and z_i == j:
                    cnt += 1
            frontier[cnt].append(num)
        
        return frontier

    frontier = {}

    # one piece -> 16
    for p in range(16):
        frontier[(p, )] = get_frontier(p)
    
    # two pieces -> 120 combinations
    for i in range(16):
        for j in range(i+1, 16):
            frontier[(i, j)] = get_frontier_2d(i, j)

    # three pieces -> 560 combinations
    for i in range(16):
        for j in range(i+1, 16):
            for k in range(j+1, 16):
                frontier[(i, j, k)] = get_frontier_3d(i, j, k)
                
    return frontier

Using frontiers it's easy and fast to distinguish the threat pieces from the safe ones. A threat piece would share at least one attribute with a source of three pieces. If a line has three pieces and there is at least one threat piece, than the agent enters in threat mode.

The agent uses the following method to distinguish the pieces:

def distinguish_pieces(self, board):
    """ Returns a list of safe pieces and a list of threat pieces """

    threat = set()

    # Rows
    for row in range(board.shape[0]):
        k = tuple(sorted([p for p in board[row, :] if p != -1])) # frontier source (key)
        if len(k) == 3: # Threat line
            k_threat = set(self.frontiers[k][1]) | set(self.frontiers[k][2])  # pieces with attributes in common with all of 3
            threat |= k_threat

    # Cols
    for col in range(board.shape[1]):
        k = tuple(sorted([p for p in board[:, col] if p != -1]))
        if len(k) == 3: # Threat line
            k_threat = set(self.frontiers[k][1]) | set(self.frontiers[k][2])  # pieces with attributes in common with all of 3
            threat |= k_threat
    
    # Diag 1
    k = tuple(sorted([p for p in board.diagonal() if p != -1]))    
    if len(k) == 3: # Threat line
        k_threat = set(self.frontiers[k][1]) | set(self.frontiers[k][2])  # pieces with attributes in common with all of 3
        threat |= k_threat

    # Diag 2   
    k = tuple(sorted([p for p in np.fliplr(board).diagonal() if p != -1]))  
    if len(k) == 3: # Threat line
        k_threat = set(self.frontiers[k][1]) | set(self.frontiers[k][2])  # pieces with attributes in common with all of 3
        threat |= k_threat   
    

    # Clean up
    safe = set(range(16))
    placed = set([int(p) for p in np.nditer(board) if p != -1])
    threat -= placed
    safe -= placed
    safe -= threat
    
    return list(safe), list(threat)

Using the frontiers it is possible to map how much "dangerous" is to place a piece on a specific position of the board. When we have to place a piece, we can compute for each line how many attributes would be shared with the placed pieces on that line, i.e. on which frontier the piece to place is, using the placed pieces as source, and then we multiply that number for a scaling factor. This scaling factor is equal to 100 if the source is compoused of 3 pieces, 10 if the source has 2 pieces and 1 if the source is just one piece. This scaling factor highlights the board positions where is more likely to find a quarto after the placing. If we have a number in the hundreds, we have a winning condition: place the piece there and you will have a quarto! If the number is in the tens, than the higher it is the more probable it is to create a quarto condition on multiple lines. At the end, the scores computer for each line are summed togheter on the board. Here is a quick illustration to visualize the procedure:

Score board procedure

The method used by the agent to compute the score board is the following:

def __board_intersection_scores(self, piece: int):
    """
        Computes the mapping of the board with scores in it using the frontiers
    """

    def extract_source(board_slice) -> tuple:
        """ Extracts the a key for self.frontiers """
        k = tuple(sorted([p for p in board_slice if p != -1]))
        return k

    def compute_score(frontier) -> int:
        rank = -1                           # Frontier rank = number of common attributes with piece
        for i in range(4):
            if piece in frontier[i]:
                rank = i
                break

        rescaling = 10**(len(k) - 1)
        score = rank * rescaling
        return score
        

    # Setup
    board = self.get_game().get_board_status()
    board_scores = np.zeros(shape=(Quarto.BOARD_SIDE, Quarto.BOARD_SIDE), dtype=int)   # Initialized mapped board

    # Rows
    for row in range(board.shape[0]):
        k = extract_source(board[row, :])   # Extract the frontier's source
        if k == () or len(k) == 4:
            continue 
        score = compute_score(self.frontiers[k]) 
        board_scores[row, :] += score
    
    # Cols
    for col in range(board.shape[1]):
        k = extract_source(board[:, col])       
        if k == () or len(k) == 4:
            continue 
        score = compute_score(self.frontiers[k]) 
        board_scores[:, col] += score      

    # Diag 1
    k = extract_source(board.diagonal())    
    if k != () and len(k) != 4:
        score = compute_score(self.frontiers[k]) 
        tmp = np.zeros((4, 4), dtype=int)   # Update the diagonal
        np.fill_diagonal(tmp, score)
        board_scores += tmp

    # Diag 2   
    k = extract_source(np.fliplr(board).diagonal())       
    if k != () and len(k) != 4:
        score = compute_score(self.frontiers[k]) 
        tmp = np.zeros((4, 4), dtype=int)      
        np.fill_diagonal(np.fliplr(tmp), score)
        board_scores += tmp

    # Clean up board_scores
    mask = np.array(board == -1, dtype=int)     # 0 if there is a piece in the i,j board coordinate, 1 otherwise
    board_scores = board_scores * mask

    return board_scores

I used this score board strategy as base line for the heuristic strategy that the agent uses at safe mode:

  • When the agent has to place a piece it computes all the possible actions and sorts them by ascending score and takes the action with the lowest score. This should delay as much as possible the threat mode strategy, in order to have a board configuration that is not too heavy for the MinMax algorithm to explore. It is also possible to sort the actions in descending mode, trying to create as soon as possible a threat line. In the experiment section (3 - Experiments) we can see that it is better to delay the MinMax computation in order to have faster agent moves. The method used for the "safe mode place piece" is the following:
def __safe_mode_place_piece(self, piece: int, *, delay_minmax = True) -> tuple[int, int]:
    
    # Setup
    board = self.get_game().get_board_status()    
    possible_actions = self.__list_ply_score(board, piece)

    # Sort the action 
    if delay_minmax:
        possible_actions = sorted(possible_actions, key=lambda a: a[1])    # ascending score (MinMax delayed amap)
    else:
        possible_actions = sorted(possible_actions, key=lambda a: -a[1])   # descending score (MinMax asap) 

    # Check if the action is safe (if there are enough pieces to choose)
    idx = 0
    ply, score = possible_actions[idx]

    while (self.__check_place_ply(ply, piece) == False):
        idx += 1
        if idx < len(possible_actions): 
            ply, score = possible_actions[idx]
        else:
            break

    return ply


def __list_ply_score(self, board, piece):
    """
        List all the possible positions where to place the piece with score as tuples ((x,y), score)
    """
    possible_actions = []
    score_board = self.__board_intersection_scores(piece)

    for row in range(board.shape[0]):
        for col in range(board.shape[1]):
            if board[row, col] == -1:       # Possible action
                possible_actions.append( ((row, col), score_board[row, col]) )
    
    return possible_actions

def __check_place_ply(self, ply: tuple[int, int], piece: int, *, safe_threshold = 1):
    """ 
        Returns True if there are enough safe pieces to choose after placing 
        the given piece on board using the ply coordinates.
    """
    board = self.get_game().get_board_status() 

    assert board[ply[0], ply[1]] == -1  # The ply must be legal
    
    board[ply[0], ply[1]] = piece
    safe, _ = self.distinguish_pieces(board)

    # I want enough safe pieces to choose after having placed the given piece on the board
    if len(safe) >= safe_threshold:
        return True
    else:
        return False
  • When the agent has to choose a piece in safe mode, it computes for each piece the total score and chooses the one with the minimum. This should result on giving the piece that gives the less advantage possible to the opponent and that delays the threat mode configuration as much as possible. The method that computes this is the following:
def __safe_mode_choose_piece(self, safe_pool) -> int:
    """ 
        Retrieves the piece with the lowest total score computed using the frontiers.
        This should be the piece that gives to the opponent the lowest advantage.
    """
    best_piece = -1
    best_score = 500
    
    for p in iter(safe_pool):   # Min search
        board_score = self.__board_intersection_scores(p)
        score = board_score.sum()
        if score < best_score:
            best_piece = p

    return best_piece

2.2 - MinMax

When the agent is in Threat Mode it runs the MinMax algorithm to find the best ply possible. In order to have reasonable timings, I adopted the alpha-beta pruning and both deep and horizontal pruning.

Deep pruning follows a static approach: once it reaches MINMAX_MAX_DEPTH, it will stop expanding the game tree. After some tuning I found that MINMAX_MAX_DEPTH = 2 was the best compromise between exploration and time efficiency. Note that the reached depth is actually 3, becouse the algorithm starts the search using a counter that starts form 0.

Horizontal pruning uses a greedy heuristics hoping to help the alpha-beta pruning:

  • When it has to place a piece, it sorts all the possible actions on descending order, exploring first the plys that could lead faster to a quarto. The horizontal pruning consists on exploring only the plys that has score greater than 0, becouse the cutted plys could need too much exploration and it could not be worth to explore them.
  • When it has to choose a piece, it computes the total score for each safe piece and sort them in descending order. It explores first the pieces that could lead faster to a quarto. This is not an horizontal pruning but an heuristic that tries to help the alpha-beta pruning by finding as soon as possible a winning condition.

The used methods are the following:

def min_max_place_piece(self, board_now, piece, player, cnt=0):
    """
        MinMax algorithm with alpha-beta pruning, deep pruning, horizontal pruning and greedy sorting for placing a piece.

        The greedy sorting tryes to speed up MinMax by sorting the possible actions by 
        the corresponding score computed with self.__list_ply_score(•).
        IDEA: Expand first the actions that could lead faster to a winning/drow condition,
        hoping to favor the alpha-beta pruning. 

        Args:
        - board -> current state of the board (numpy.ndarray)
        - piece -> piece to place (int)
        - player -> 0: my agent, 1: opponent

        Returns:
        - ply = coordinates as tuple es. (x, y)
        - piece = best piece to give associated with the best ply
        - score -> WIN = 5, LOSE = -5, DROW = 1
    """

    possible_actions = self.__list_ply_score(board_now, piece)          # Descending score:
    possible_actions = sorted(possible_actions, key=lambda a: a[1])     # Greedy sorting to speed up alpha-beta pruning

    if cnt > self.MINMAX_MAX_DEPTH:
        return possible_actions[0], -1, 0

    possible_actions = [a for a in possible_actions if a[1] > 0]        # Horizontal pruning
    ply_drow = (-1, -1)
    piece_drow = -1
    best_score = 0

    for ply, _ in iter(possible_actions):
        if board_now[ply[0], ply[1]] != -1:     # Check that the ply is legal
            continue
        board = deepcopy(board_now)             # Ply     
        board[ply[0], ply[1]] = piece
        
        if self.check_finished(board):               # Termination
            if player == 0:
                return ply, -1, WIN
            else:
                return ply, -1, LOSE
        
        piece, score = self.min_max_choose_piece(board, player, cnt)  # Recursion
        
        
        if score == WIN and player == 0:        # Alpha-Beta Pruning
            return ply, piece, WIN
        if score == LOSE and player == 1:
            return ply, piece, LOSE
        # If drow or else, keep looking for a better ply
        ply_drow = ply
        piece_drow = piece
        if score == DROW:
            best_score = DROW

    # No winning condition -> drow 
    return ply_drow, piece_drow, best_score

    
def check_finished(self, board):
    """ Returns True if the board is a winning condition, False otherwise """

    def __check(k):
        for attribute in range(4):
            sum = 0
            for idx in range(4):                      
                sum += dec_to_bin[ k[idx] ][attribute]
            if sum == 4 or sum == 0:    # Winning condition
                return True

    # Rows
    for row in range(board.shape[0]):
        k = tuple(sorted([p for p in board[row, :] if p != -1]))
        if len(k) == 4 and __check(k):
            return True

    # Cols
    for col in range(board.shape[1]):
        k = tuple(sorted([p for p in board[:, col] if p != -1]))
        if len(k) == 4 and __check(k):
            return True

    # Diag 1
    k = tuple(sorted([p for p in board.diagonal() if p != -1]))
    if len(k) == 4 and __check(k):
        return True

    # Diag 2
    k = tuple(sorted([p for p in np.fliplr(board).diagonal() if p != -1]))
    if len(k) == 4 and __check(k):
        return True    
                    
    return False
def min_max_choose_piece(self, board, player, cnt=0):
    """
        MinMax algorithm for choosing a piece with alpha-beta pruning and greedy sorting.

        The greedy sorting tryes to speed up MinMax by sorting the safe pieces (the only one expanded)
        by decreasing score. The score is computed as sum of the scores inside the board_score computed 
        by self.__board_intersection_scores(p) for each piece p in the safe pool.
        IDEA: look for pieces that offer more aggressive plays hoping to find faster a winning/drow condition
        in oreder to assist the alpha-beta pruning.

        Args:
        - board -> current state of the board, after having placed the previous given piece (np.ndarray)
        - player -> 0: my agent, 1: opponent

        Returns:
        - piece = choosen piece
        - score -> WIN = 5, LOSE = -5, DROW = 1
    """

    safe, threat = self.distinguish_pieces(board)   # Threat and safe pieces pools

    # Terminations
    if len(safe) + len(threat) == 0:
        return -1, DROW

    if len(safe) == 0:
        if player == 0:
            return threat[0], LOSE
        else:
            return threat[0], WIN
    
    if cnt > self.MINMAX_MAX_DEPTH:
        return safe[0], 0

    # Greedy sorting to speed up MinMax pruning
    piece_score = []
    for p in iter(safe):
        board_score = self.__board_intersection_scores(p)
        piece_score.append((p, board_score.sum()))
    piece_score = sorted(piece_score, key=lambda x: -x[1])   # Descending order

    # Recursion
    best_piece = -1
    best_score = []
    for piece, _ in iter(piece_score):
        _, _, score = self.min_max_place_piece(board, piece, (player + 1) % 2, cnt+1)  
        
        if player == 0 and score == WIN:        # Alpha-Beta Pruning
            return piece, WIN
        if player == 1 and score == LOSE:
            return piece, LOSE
        # If drow or else, keep looking
        best_piece = piece
        if score == DROW:
            best_score = DROW

    # No winning condition -> drow 
    return best_piece, best_score

2.3 - Place piece

As already said, if the agent is in threat mode, if it has to place a threat piece, he place it on the spot that creates a quarto, otherwise it runs the MinMax algorithm (min_max_place_piece(•)). The MinMax can retrieve also the best piece to give after the placing ply so it saves time avoiding to run again the MinMax aglorithm to choose the piece to give. If the agent is in safe mode, it runs the heuristic strategy.

def place_piece(self) -> tuple[int, int]:

    # Setup
    board = self.get_game().get_board_status()
    piece_to_place = self.get_game().get_selected_piece()
    safe, threat = self.distinguish_pieces(board)
    
    # Threat mode -> MinMax
    if len(threat) > 0: 
        # If I have to place a threat piece, I win
        if piece_to_place in threat:   
            score_board = self.__board_intersection_scores(piece_to_place)      
            ply_ = np.unravel_index(np.argmax(score_board), score_board.shape)   # Max score on the winning position
            ply = (ply_[1], ply_[0])
            return ply

        # MinMax
        ply_, piece, _ = self.min_max_place_piece(board, piece_to_place, player = 0)
        ply = (ply_[1], ply_[0])
        if piece != -1:
            self.piece_to_give = piece  # No need to run MinMax again to choose the piece

    # Safe mode -> Heuristic strategy
    else:   
        ply_ = self.__safe_mode_place_piece(piece_to_place, delay_minmax = True)
        ply = (ply_[1], ply_[0])
    return ply

2.4 - Choose piece

When the agent has to choose a piece, if it computed the best piece with the placing ply using MinMax, it gives that piece. Otherwise, if it is in threat mode it runs MinMax, if it is in safe mode it uses the heuristic strategy.

def choose_piece(self) -> int:
    # Best piece to give computed inside self.place_piece() with MinMax
    if self.piece_to_give != -1:
        p = self.piece_to_give
        self.piece_to_give = -1
        return p
    
    # Setup
    board = self.get_game().get_board_status()
    safe, threat = self.distinguish_pieces(board)

    # Threat mode -> MinMax
    if len(threat) > 0:
        piece, _ = self.min_max_choose_piece(board, 0)

    # Safe mode -> Heuristic strategy
    else:
        piece = self.__safe_mode_choose_piece(safe)

    return piece

3 - Experiments

I runned some tournaments in order to evaluate how well my agent performs against the RandomPlayer. The code of the tournament is the following:

from main import RandomPlayer
from quarto.players import MyPlayer
from quarto.objects import Quarto
from IPython.display import clear_output

def main(reverse = False):
    game = Quarto()
    if reverse:
        game.set_players((RandomPlayer(game), MyPlayer(game))) 
    else:
        game.set_players((MyPlayer(game), RandomPlayer(game)))
    
    winner = game.run()
    clear_output(wait=False)
    print(f"main: Winner: player {winner}")
    return winner

N_MATCHES = 200

cnt = 0
for i in range(N_MATCHES):
    if i < N_MATCHES//2:  
        w = main()
        if w == 0:
            cnt += 1
    else:
        w = main(True)
        if w == 1:
            cnt += 1

print(f'WINRATE = {cnt / N_MATCHES}') 

I wanted to see if there was a difference between sorting all the possible actions (inside the function __safe_mode_place_piece(•)) in ascending or descending mode when the agent is in safe mode. I repropose the function:

 def __safe_mode_place_piece(self, piece: int, *, delay_minmax = True) -> tuple[int, int]:
        
        # Setup
        board = self.get_game().get_board_status()    
        possible_actions = self.__list_ply_score(board, piece)

        # Sort the action 
        if delay_minmax:
            possible_actions = sorted(possible_actions, key=lambda a: a[1])    # ascending score (MinMax delayed amap)
        else:
            possible_actions = sorted(possible_actions, key=lambda a: -a[1])   # descending score (MinMax asap) 

        # Check if the action is safe (if there are enough pieces to choose)
        idx = 0
        ply, score = possible_actions[idx]

        while (self.__check_place_ply(ply, piece) == False):
            idx += 1
            if idx < len(possible_actions): 
                ply, score = possible_actions[idx]
            else:
                break

        return ply

Looking at the results (table below) it doesn't seem to have a relevant impact on the performance of the agent, maybe becouse the opponent is a Random Agent and, obviously, the game statistics (WinRate and ExecutionTime/#matches) are not really stable.

# matches delay_minmax = True WinRate Execution time ExecutionTime / # Matches
20 True 0.95 6m 45.8s 20.25s
20 True 0.9 5m 14.2s 15.7s
20 True 1.00 1m 39.8s 4.95s
20 False 0.95 5m 52.0s 17.6s
20 False 1.00 6m 59.4s 20.95s
20 False 0.9 11m 10.8s 33.5s
100 True 0.92 6m 21.2s 3.81s
100 True 0.91 3m 24.5s 2.04s
100 False 0.91 3m 24.5s 2.04s
100 False 0.96 12m 14.0s 7.34s
200 False 0.975 37m 37.5s 11.29s
200 Ture 0.965 13m 1.4s 3.91s

We can observe that the timings don't follow a pattern but they are quite random, as just said.
We can take the averages:

delay_minmax Avg WinRate Avg ExecutionTime/#matches
True 0.924 8.44s
False 0.949 15.45s

It seems to have better time performances when we use the greedy sorting to deplay the MinMax algorithm but to observe better statistics it would be better to play against a less random agent.

Overall, the winrate varies between 0.9 and 1.00. The performances against the Random Agent are promising but it would be better to test the agent against a stronger and different agents in order to have a more robust feedback.

We can conclude that the performances against the Random Agent are good, despite the execution times being relatively slow for a machine. This is due to the MinMax algorithm that, dispite the prunings and the euristics to help it to be faster, it can still be slow sometimes.


4 - References

  1. Steven Morse, "Quarto, Part 1 (Theory)", 01/03/2017, [ link ]
  2. WayBackMachine, "Quarto", 12/10/2004, [ link ]


5 - Code (players.py)

from copy import deepcopy
import numpy as np
from .objects import Player, Quarto

WIN = 5
LOSE = -5
DROW = 1

dec_to_bin = {
    0: [0, 0, 0, 0],
    1: [0, 0, 0, 1],
    2: [0, 0, 1, 0],
    3: [0, 0, 1, 1],
    4: [0, 1, 0, 0],
    5: [0, 1, 0, 1],
    6: [0, 1, 1, 0],
    7: [0, 1, 1, 1],
    8: [1, 0, 0, 0],
    9: [1, 0, 0, 1],
    10: [1, 0, 1, 0],
    11: [1, 0, 1, 1],
    12: [1, 1, 0, 0],
    13: [1, 1, 0, 1],
    14: [1, 1, 1, 0],
    15: [1, 1, 1, 1]
}

class MyPlayer(Player):

    MINMAX_MAX_DEPTH = 2

    def __init__(self, quarto: Quarto) -> None:
        super().__init__(quarto)
        self.frontiers = self.get_frontiers()
        self.piece_to_give = -1    

    
    def choose_piece(self) -> int:
        # Best piece to give computed inside self.place_piece() with MinMax
        if self.piece_to_give != -1:
            p = self.piece_to_give
            self.piece_to_give = -1
            return p
        
        # Setup
        board = self.get_game().get_board_status()
        safe, threat = self.distinguish_pieces(board)

        # Threat mode -> MinMax
        if len(threat) > 0:
            piece, _ = self.min_max_choose_piece(board, 0)

        # Safe mode -> Heuristic strategy
        else:
            piece = self.__safe_mode_choose_piece(safe)

        return piece


    def place_piece(self) -> tuple[int, int]:
    
        # Setup
        board = self.get_game().get_board_status()
        piece_to_place = self.get_game().get_selected_piece()
        safe, threat = self.distinguish_pieces(board)
        
        # Threat mode -> MinMax
        if len(threat) > 0: 
            # If I have to place a threat piece, I win
            if piece_to_place in threat:   
                score_board = self.__board_intersection_scores(piece_to_place)      
                ply_ = np.unravel_index(np.argmax(score_board), score_board.shape)   # Max score on the winning position
                ply = (ply_[1], ply_[0])
                return ply

            # MinMax
            ply_, piece, _ = self.min_max_place_piece(board, piece_to_place, player = 0)
            ply = (ply_[1], ply_[0])
            if piece != -1:
                self.piece_to_give = piece  # No need to run MinMax again to choose the piece

        # Safe mode -> Heuristic strategy
        else:   
            ply_ = self.__safe_mode_place_piece(piece_to_place, delay_minmax = True)
            ply = (ply_[1], ply_[0])
        return ply

    @staticmethod
    def get_frontiers():
        """
            Generates all the frontiers for each possible combination of 1, 2 and 3 elements.

            Returns a dictionary frontiers with:
            - key = tuple containing the source pieces of the respetive frontiers. Could have len = 1, 2 or 3.
            - value: the frontier corresponding to the source pieces as a dictionary with:
                - key: number of attributes in common with the pieces inside the level of the frontier. key = 0, 1, 2 or 3.
                - value: pieces that share key elements in common with the source pieces.
        """
        
        def get_frontier(p: int):
            """ Frontiers with 1 piece """
            assert p >= 0 and p <= 15

            x = dec_to_bin[p]
            frontier = {
                0: [],
                1: [],
                2: [],
                3: []
            }

            for num, binary in dec_to_bin.items():
                if num == p:
                    continue
                cnt = 0
                for i, j in zip(x, binary):
                    if i == j:
                        cnt += 1
                frontier[cnt].append(num)
            
            return frontier

        def get_frontier_2d(a: int, b:int):
            """ Frontiers with 2 pieces """
            assert a >= 0 and a <= 15
            assert b >= 0 and b <= 15
            assert a != b

            x = dec_to_bin[a]
            y = dec_to_bin[b]
            frontier = {
                0: [],
                1: [],
                2: [],
                3: []
            }

            for num, binary in dec_to_bin.items():
                if num == a or num == b:
                    continue
                cnt = 0
                for x_i, y_i, j in zip(x, y, binary):
                    if x_i == j and y_i == j:
                        cnt += 1
                frontier[cnt].append(num)
            
            return frontier

        def get_frontier_3d(a: int, b:int, c: int):
            """ Frontiers with 3 pieces """
            assert a >= 0 and a <= 15
            assert b >= 0 and b <= 15
            assert c >= 0 and c <= 15
            assert a != b and a != c and b != c

            x = dec_to_bin[a]
            y = dec_to_bin[b]
            z = dec_to_bin[c]
            frontier = {
                0: [],
                1: [],
                2: [],
                3: []
            }

            for num, binary in dec_to_bin.items():
                if num == a or num == b or num == c:
                    continue
                cnt = 0
                for x_i, y_i, z_i, j in zip(x, y, z, binary):
                    if x_i == j and y_i == j and z_i == j:
                        cnt += 1
                frontier[cnt].append(num)
            
            return frontier

        frontier = {}

        # one piece -> 16
        for p in range(16):
            frontier[(p, )] = get_frontier(p)
        
        # two pieces -> 120 combinations
        for i in range(16):
            for j in range(i+1, 16):
                frontier[(i, j)] = get_frontier_2d(i, j)

        # three pieces -> 560 combinations
        for i in range(16):
            for j in range(i+1, 16):
                for k in range(j+1, 16):
                    frontier[(i, j, k)] = get_frontier_3d(i, j, k)
                    
        return frontier
  

    def distinguish_pieces(self, board):
        """ Returns a list of safe pieces and a list of threat pieces """

        threat = set()

        # Rows
        for row in range(board.shape[0]):
            k = tuple(sorted([p for p in board[row, :] if p != -1])) # frontier source (key)
            if len(k) == 3: # Threat line
                k_threat = set(self.frontiers[k][1]) | set(self.frontiers[k][2])  # pieces with attributes in common with all of 3
                threat |= k_threat

        # Cols
        for col in range(board.shape[1]):
            k = tuple(sorted([p for p in board[:, col] if p != -1]))
            if len(k) == 3: # Threat line
                k_threat = set(self.frontiers[k][1]) | set(self.frontiers[k][2])  # pieces with attributes in common with all of 3
                threat |= k_threat
        
        # Diag 1
        k = tuple(sorted([p for p in board.diagonal() if p != -1]))    
        if len(k) == 3: # Threat line
            k_threat = set(self.frontiers[k][1]) | set(self.frontiers[k][2])  # pieces with attributes in common with all of 3
            threat |= k_threat

        # Diag 2   
        k = tuple(sorted([p for p in np.fliplr(board).diagonal() if p != -1]))  
        if len(k) == 3: # Threat line
            k_threat = set(self.frontiers[k][1]) | set(self.frontiers[k][2])  # pieces with attributes in common with all of 3
            threat |= k_threat   
        

        # Clean up
        safe = set(range(16))
        placed = set([int(p) for p in np.nditer(board) if p != -1])
        threat -= placed
        safe -= placed
        safe -= threat
        
        return list(safe), list(threat)


    def __board_intersection_scores(self, piece: int):
        """
            Computes the mapping of the board with scores in it using the frontiers
        """

        def extract_source(board_slice) -> tuple:
            """ Extracts the a key for self.frontiers """
            k = tuple(sorted([p for p in board_slice if p != -1]))
            return k

        def compute_score(frontier) -> int:
            rank = -1                           # Frontier rank = number of common attributes with piece
            for i in range(4):
                if piece in frontier[i]:
                    rank = i
                    break

            rescaling = 10**(len(k) - 1)
            score = rank * rescaling
            return score
            

        # Setup
        board = self.get_game().get_board_status()
        board_scores = np.zeros(shape=(Quarto.BOARD_SIDE, Quarto.BOARD_SIDE), dtype=int)   # Initialized mapped board

        # Rows
        for row in range(board.shape[0]):
            k = extract_source(board[row, :])   # Extract the frontier's source
            if k == () or len(k) == 4:
                continue 
            score = compute_score(self.frontiers[k]) 
            board_scores[row, :] += score
        
        # Cols
        for col in range(board.shape[1]):
            k = extract_source(board[:, col])       
            if k == () or len(k) == 4:
                continue 
            score = compute_score(self.frontiers[k]) 
            board_scores[:, col] += score      

        # Diag 1
        k = extract_source(board.diagonal())    
        if k != () and len(k) != 4:
            score = compute_score(self.frontiers[k]) 
            tmp = np.zeros((4, 4), dtype=int)   # Update the diagonal
            np.fill_diagonal(tmp, score)
            board_scores += tmp

        # Diag 2   
        k = extract_source(np.fliplr(board).diagonal())       
        if k != () and len(k) != 4:
            score = compute_score(self.frontiers[k]) 
            tmp = np.zeros((4, 4), dtype=int)      
            np.fill_diagonal(np.fliplr(tmp), score)
            board_scores += tmp

        # Clean up board_scores
        mask = np.array(board == -1, dtype=int)     # 0 if there is a piece in the i,j board coordinate, 1 otherwise
        board_scores = board_scores * mask

        return board_scores

 
    def __safe_mode_choose_piece(self, safe_pool) -> int:
        """ 
            Retrieves the piece with the lowest total score computed using the frontiers.
            This should be the piece that gives to the opponent the lowest advantage.
        """
        best_piece = -1
        best_score = 500
        
        for p in iter(safe_pool):   # Min search
            board_score = self.__board_intersection_scores(p)
            score = board_score.sum()
            if score < best_score:
                best_piece = p

        return best_piece


    def __safe_mode_place_piece(self, piece: int, *, delay_minmax = True) -> tuple[int, int]:
        
        # Setup
        board = self.get_game().get_board_status()    
        possible_actions = self.__list_ply_score(board, piece)

        # Sort the action 
        if delay_minmax:
            possible_actions = sorted(possible_actions, key=lambda a: a[1])    # ascending score (MinMax delayed amap)
        else:
            possible_actions = sorted(possible_actions, key=lambda a: -a[1])   # descending score (MinMax asap) 

        # Check if the action is safe (if there are enough pieces to choose)
        idx = 0
        ply, score = possible_actions[idx]

        while (self.__check_place_ply(ply, piece) == False):
            idx += 1
            if idx < len(possible_actions): 
                ply, score = possible_actions[idx]
            else:
                break

        return ply


    def __list_ply_score(self, board, piece):
        """
            List all the possible positions where to place the piece with score as tuples ((x,y), score)
        """
        possible_actions = []
        score_board = self.__board_intersection_scores(piece)

        for row in range(board.shape[0]):
            for col in range(board.shape[1]):
                if board[row, col] == -1:       # Possible action
                    possible_actions.append( ((row, col), score_board[row, col]) )
        
        return possible_actions


    def __check_place_ply(self, ply: tuple[int, int], piece: int, *, safe_threshold = 1):
        """ 
            Returns True if there are enough safe pieces to choose after placing 
            the given piece on board using the ply coordinates.
        """
        board = self.get_game().get_board_status() 

        assert board[ply[0], ply[1]] == -1  # The ply must be legal
        
        board[ply[0], ply[1]] = piece
        safe, _ = self.distinguish_pieces(board)

        # I want enough safe pieces to choose after having placed the given piece on the board
        if len(safe) >= safe_threshold:
            return True
        else:
            return False


    def min_max_place_piece(self, board_now, piece, player, cnt=0):
        """
            MinMax algorithm with alpha-beta pruning, deep pruning, horizontal pruning and greedy sorting for placing a piece.

            The greedy sorting tryes to speed up MinMax by sorting the possible actions by 
            the corresponding score computed with self.__list_ply_score(•).
            IDEA: Expand first the actions that could lead faster to a winning/drow condition,
            hoping to favor the alpha-beta pruning. 

            Args:
            - board -> current state of the board (numpy.ndarray)
            - piece -> piece to place (int)
            - player -> 0: my agent, 1: opponent

            Returns:
            - ply = coordinates as tuple es. (x, y)
            - piece = best piece to give associated with the best ply
            - score -> WIN = 5, LOSE = -5, DROW = 1
        """

        possible_actions = self.__list_ply_score(board_now, piece)          # Descending score:
        possible_actions = sorted(possible_actions, key=lambda a: a[1])     # Greedy sorting to speed up alpha-beta pruning

        if cnt > self.MINMAX_MAX_DEPTH:
            return possible_actions[0], -1, 0

        possible_actions = [a for a in possible_actions if a[1] > 0]        # Horizontal pruning
        ply_drow = (-1, -1)
        piece_drow = -1
        best_score = 0

        for ply, _ in iter(possible_actions):
            if board_now[ply[0], ply[1]] != -1:     # Check that the ply is legal
                continue
            board = deepcopy(board_now)             # Ply     
            board[ply[0], ply[1]] = piece
            
            if self.check_finished(board):               # Termination
                if player == 0:
                    return ply, -1, WIN
                else:
                    return ply, -1, LOSE
            
            piece, score = self.min_max_choose_piece(board, player, cnt)  # Recursion
            
            
            if score == WIN and player == 0:        # Alpha-Beta Pruning
                return ply, piece, WIN
            if score == LOSE and player == 1:
                return ply, piece, LOSE
            # If drow or else, keep looking for a better ply
            ply_drow = ply
            piece_drow = piece
            if score == DROW:
                best_score = DROW

        # No winning condition -> drow 
        return ply_drow, piece_drow, best_score


    def min_max_choose_piece(self, board, player, cnt=0):
        """
            MinMax algorithm for choosing a piece with alpha-beta pruning and greedy sorting.

            The greedy sorting tryes to speed up MinMax by sorting the safe pieces (the only one expanded)
            by decreasing score. The score is computed as sum of the scores inside the board_score computed 
            by self.__board_intersection_scores(p) for each piece p in the safe pool.
            IDEA: look for pieces that offer more aggressive plays hoping to find faster a winning/drow condition
            in oreder to assist the alpha-beta pruning.

            Args:
            - board -> current state of the board, after having placed the previous given piece (np.ndarray)
            - player -> 0: my agent, 1: opponent

            Returns:
            - piece = choosen piece
            - score -> WIN = 5, LOSE = -5, DROW = 1
        """

        safe, threat = self.distinguish_pieces(board)   # Threat and safe pieces pools

        # Terminations
        if len(safe) + len(threat) == 0:
            return -1, DROW

        if len(safe) == 0:
            if player == 0:
                return threat[0], LOSE
            else:
                return threat[0], WIN
        
        if cnt > self.MINMAX_MAX_DEPTH:
            return safe[0], 0

        # Greedy sorting to speed up MinMax pruning
        piece_score = []
        for p in iter(safe):
            board_score = self.__board_intersection_scores(p)
            piece_score.append((p, board_score.sum()))
        piece_score = sorted(piece_score, key=lambda x: -x[1])   # Descending order

        # Recursion
        best_piece = -1
        best_score = []
        for piece, _ in iter(piece_score):
            _, _, score = self.min_max_place_piece(board, piece, (player + 1) % 2, cnt+1)  
            
            if player == 0 and score == WIN:        # Alpha-Beta Pruning
                return piece, WIN
            if player == 1 and score == LOSE:
                return piece, LOSE
            # If drow or else, keep looking
            best_piece = piece
            if score == DROW:
                best_score = DROW

        # No winning condition -> drow 
        return best_piece, best_score


    def check_finished(self, board):
        """ Returns True if the board is a winning condition, False otherwise """

        def __check(k):
            for attribute in range(4):
                sum = 0
                for idx in range(4):                      
                    sum += dec_to_bin[ k[idx] ][attribute]
                if sum == 4 or sum == 0:    # Winning condition
                    return True

        # Rows
        for row in range(board.shape[0]):
            k = tuple(sorted([p for p in board[row, :] if p != -1]))
            if len(k) == 4 and __check(k):
                return True

        # Cols
        for col in range(board.shape[1]):
            k = tuple(sorted([p for p in board[:, col] if p != -1]))
            if len(k) == 4 and __check(k):
                return True

        # Diag 1
        k = tuple(sorted([p for p in board.diagonal() if p != -1]))
        if len(k) == 4 and __check(k):
            return True

        # Diag 2
        k = tuple(sorted([p for p in np.fliplr(board).diagonal() if p != -1]))
        if len(k) == 4 and __check(k):
            return True    
                        
        return False