This repository presents a hybrid approach to the Graph Coloring Problem (GCP): Reinforcement Learning (RL) combined with Tabucol, a tabu search heuristic designed specifically for the GCP, and enhanced by Graph Neural Networks (GNNs).
Let $G = (V, E)$ be an undirected graph with vertex set $V$ and edge set $E$. A proper $k$-coloring of $G$ assigns one of $k$ colors to each vertex such that no two adjacent vertices share the same color.
The chromatic number of $G$, denoted $\chi(G)$, is the smallest $k$ for which a proper $k$-coloring exists; solving the GCP means finding such a coloring.
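To make the objective concrete, here is a minimal sketch of counting coloring conflicts, the quantity the rewards below are built on; the use of networkx and the function name `count_conflicts` are illustrative, not taken from this repository.

```python
import networkx as nx

def count_conflicts(G: nx.Graph, coloring: dict) -> int:
    """Number of edges whose endpoints share a color (0 means a proper coloring)."""
    return sum(1 for u, v in G.edges() if coloring[u] == coloring[v])

# Example: a triangle needs 3 colors; with only 2 colors at least one conflict remains.
G = nx.cycle_graph(3)
print(count_conflicts(G, {0: 0, 1: 1, 2: 0}))  # 1 conflict, on edge (2, 0)
print(count_conflicts(G, {0: 0, 1: 1, 2: 2}))  # 0 conflicts: proper 3-coloring
```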
# Action = (node, color): pick one of N vertices and assign it one of k colors
self.action_space = spaces.Tuple((spaces.Discrete(self.N), spaces.Discrete(self.k)))
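# Illustrative usage (the `env` variable below is a placeholder, not repository code):
# action = env.action_space.sample()   # -> (node_index, color_index)
# node, color = action                 # the step recolors `node` with `color`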
The default per-step reward is -1e-4. The total reward consists of two parts: an RL reward and a heuristic (Tabucol) reward.
The RL reward considers two main factors: the number of conflicts and the number of colors used.
The reward function combining these two factors is as follows.
# Two factors: the number of conflicts and the number of colors used
del_conflicts = conflicts - new_conflicts              # f(s) - f(s')
del_colors_used = self.n_colors_used - n_colors_used   # C(s) - C(s')

# reward = -lambda * (f(s) - f(s')) + mu * (C(s) - C(s'))
# The color factor is currently disabled (mu = 0).
_lambda = 0.01
_mu = 0
self.cur_reward = -_lambda * del_conflicts + _mu * del_colors_used
The heuristic (Tabucol) reward is calculated from the difference between the number of conflicts before and after running Tabucol on the current coloring.
The final reward combines these two rewards as follows.
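As a rough sketch only, the two components and the step penalty could be summed as below; the additive form, the `tabucol_weight` factor, and all variable names are assumptions, not the repository's exact formula.

```python
def combined_reward(step_penalty: float,
                    rl_reward: float,
                    conflicts_before: int,
                    conflicts_after: int,
                    tabucol_weight: float = 1.0) -> float:
    """Illustrative combination: step penalty + RL reward + weighted Tabucol reward."""
    tabucol_reward = conflicts_before - conflicts_after  # positive if Tabucol helped
    return step_penalty + rl_reward + tabucol_weight * tabucol_reward

# Example: Tabucol removed 3 conflicts, and the RL reward was -0.02.
print(combined_reward(-1e-4, -0.02, conflicts_before=5, conflicts_after=2))
```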
The node and color features are concatenated to form comprehensive (node, color) pair representations. These combined features are passed through multiple GNN layers, specifically Graph Convolutional Network (GCN) layers, within our Q-Network. In the GCN layers, message passing aggregates and updates feature representations along the graph's connectivity, capturing both local and global structural information and producing an updated feature graph that integrates the interactions between nodes and colors.
The aggregated features then pass through fully connected layers, and the final output layer produces a Q-value for each (node, color) pair, representing the expected future reward of selecting that action in the current state.
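As an illustration only, here is a minimal PyTorch Geometric sketch of such a Q-Network; the layer sizes, the two-layer `GCNConv` stack, and the way node and color embeddings are paired are assumptions, not the repository's exact architecture.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv

class QNetwork(nn.Module):
    """Sketch: GCN message passing over node features, then one Q-value per (node, color)."""
    def __init__(self, in_dim: int, hidden_dim: int, k_colors: int):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)
        self.color_emb = nn.Embedding(k_colors, hidden_dim)
        self.head = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )
        self.k = k_colors

    def forward(self, x, edge_index):
        # Message passing: aggregate neighbor features along graph edges.
        h = torch.relu(self.conv1(x, edge_index))
        h = torch.relu(self.conv2(h, edge_index))          # (N, hidden)
        # Pair every node embedding with every color embedding.
        n = h.size(0)
        node_feats = h.unsqueeze(1).expand(n, self.k, -1)  # (N, k, hidden)
        color_feats = self.color_emb.weight.unsqueeze(0).expand(n, self.k, -1)
        pair = torch.cat([node_feats, color_feats], dim=-1)
        return self.head(pair).squeeze(-1)                 # (N, k) Q-values
```

A greedy agent would then take the argmax over the flattened (N, k) Q-value grid to select its next (node, color) action.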
(Update soon..)
Our project is primarily developed in Python, with the Tabucol heuristic implemented in C++.
The reinforcement learning framework is built on CleanRL and makes extensive use of PyTorch.
Seok-Jin Kwon / IyLias