
hyperopt

Using Genetic algorithms and Apache Spark for hyperparameter optimization of Keras/TensorFlow models

When you build a model with Keras, one question you need to answer, unless it is for a well-known problem and dataset, is: "What values should I use for the hyperparameters?" By hyperparameters I mean, of course, things like the learning rate, dropout ratio, regularization coefficient, etc.

In this tutorial we are going to use evolutionary, a.k.a. genetic algorithms (GA) to answer this question. Relax, you don't need a PhD to use these algorithms (although it would probably help). There are several GA packages in the Python world, in various states of adoption and support, but I personally found DEAP excellent for my purposes. You can install it with pip/pip3, but the official repository has an important PR (#76) yet to be merged, so you can install it from Jonathan Brant's fork with:

pip install git+git://github.com/jbrant/deap.git

First, let's see how genetic algorithms actually work. Basically, the steps are as follows (a short sketch of this loop appears after the list):

  1. Generate a population of "individuals"

  2. Evaluate each individual, i.e. get its "fitness"

  3. Select a bunch of "individuals" to generate the next population. The selection process can vary, but in general we aim for the individuals with the best "fitness" to become the "parents" of the next generation.

  4. Perform some mutation and crossover operations on the selected individuals. The results of these operations are new individuals that will form the new generation.

  5. Repeat steps 2-4 until you get satisfactory results.
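
To make the loop concrete, here is a minimal, library-free sketch of these five steps in plain Python. The helpers random_individual(), fitness(), crossover() and mutate() are hypothetical placeholders, not part of DEAP or of this repository:

import random

def run_ga(pop_size=10, generations=5):
    population = [random_individual() for _ in range(pop_size)]          # step 1
    for _ in range(generations):                                         # step 5: repeat 2-4
        # steps 2 + 3: evaluate everyone and keep the fittest half as parents
        parents = sorted(population, key=fitness, reverse=True)[:pop_size // 2]
        # step 4: build the next generation via crossover and mutation
        population = [mutate(crossover(*random.sample(parents, 2)))
                      for _ in range(pop_size)]
    return max(population, key=fitness)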

Unlike other packages, which provide a number of ready-to-use optimization problems, DEAP uses Python's meta-programming features to let you define your optimization problem yourself. The DEAP documentation is good and I highly recommend it, but here is what you need to do in general:

  1. Define your "individuals". An "individual" is an instance of the "thing" you are trying to optimize. In our case, that would be an instance of a Keras model:
    creator.create("FitnessMax", base.Fitness, weights=(1.0,))
    creator.create("Individual", IndividualFloat, fitness=creator.FitnessMax)

We create a class "Individual" with a property "fitness". In the standalone version we could inherit from the Keras model class (NnModel, see nn_model.py), but the individual needs to be serializable in order to use it with Spark, and Keras is notorious for not playing well with serialization. So instead we define a class "IndividualFloat" which keeps just the learning rate. The weights attribute of the fitness is a tuple with only one element, because we are optimizing only one hyperparameter. The fitness value, returned by the evaluate() function, will be the model accuracy, and since we want to maximize it, the weight is positive (1.0). The model itself will be a feed-forward MLP neural network taken straight from the Keras examples.
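
Purely as an illustration, IndividualFloat could be as small as a picklable wrapper around a single float; the actual class lives in this repository, so treat this as a sketch only:

class IndividualFloat:
    def __init__(self, value):
        # the learning rate carried by this individual
        self.value = value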

  1. Define the "Population". The population represents a number of individuals, evaluated in the current "generation".

We create a function that will initialize each individual:

def rnd_lr(min, max):
    # Pick a random integer exponent in [log10(min), log10(max)) and scale it
    # by a uniform random factor, then return 10**r as the learning rate
    a = int(math.log10(min))
    b = int(math.log10(max))
    r = np.random.randint(a, b) * np.random.rand()
    return math.pow(10, r)


def init_individual(_class, lr):
    # Build a new individual of the given class, initialized with a learning
    # rate produced by the lr() generator
    ind = _class(lr())
    return ind

toolbox.register("attr_lr", rnd_lr, LR_MIN, LR_MAX)
toolbox.register("init_individual", init_individual, creator.Individual, lr=toolbox.attr_lr)
toolbox.register("population", tools.initRepeat, list, toolbox.init_individual)

Starting with the last line: we define the population as a list of "model" elements. For each "model", the function init_individual() is called upon creation, and its lr parameter is produced by another registered function, attr_lr. attr_lr, in turn, calls rnd_lr() with the parameters LR_MIN and LR_MAX. init_individual() returns a new IndividualFloat instance, initialized with a random learning rate value.
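
As a quick, illustrative sanity check (the printed values will differ on every run, since the learning rates are drawn at random):

pop = toolbox.population(n=3)
for ind in pop:
    print(ind.value)   # each individual carries its own random learning rate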

  1. Define the "evaluation" function. This is a function which returns the "fitness" of the individual. In our case that could be the value of the model's "accuracy" or "loss"
def eval_individual(ind):
    # Build a Keras model with this individual's learning rate, evaluate it and
    # return the accuracy as a one-element tuple (DEAP expects a tuple of fitness values)
    model = NnModel(ind.value)
    score = model.evaluate()
    return (score, )

toolbox.register("evaluate", eval_individual)

The evaluate() function we registered above will be used by the GA algorithm internally. It will actually call eval_individual(), which creates a new Keras model with the given learning rate, evaluates the model and returns the prediction accuracy.
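
The real NnModel lives in nn_model.py. Purely as an illustration, a comparable wrapper around a small MNIST MLP (here called SimpleNnModel, a name invented for this sketch) could look like this:

from tensorflow import keras

class SimpleNnModel:
    def __init__(self, learning_rate):
        # A small feed-forward MLP, compiled with the individual's learning rate
        self.model = keras.Sequential([
            keras.layers.Flatten(input_shape=(28, 28)),
            keras.layers.Dense(128, activation="relu"),
            keras.layers.Dropout(0.2),
            keras.layers.Dense(10, activation="softmax"),
        ])
        self.model.compile(
            optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"],
        )

    def evaluate(self):
        # Train briefly and return the test accuracy to be used as the fitness
        (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
        x_train, x_test = x_train / 255.0, x_test / 255.0
        self.model.fit(x_train, y_train, epochs=1, verbose=0)
        _, accuracy = self.model.evaluate(x_test, y_test, verbose=0)
        return accuracy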

  1. Define the "mutation" function. This is a function which, given an individual, returns a new individual where the value of the parameter we want to optimize is changed. In our case - a new model with a new value for the Learning Rate
def mutate_individual(_ind):
    # Ignore the current value and simply draw a fresh random learning rate
    new_lr = toolbox.attr_lr()
    return creator.Individual(new_lr),

toolbox.register("mutate", mutate_individual)

We just return a new individual initialized with a new learning rate value.
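
Just to see the mutation in action (illustrative, the values are random):

ind = toolbox.init_individual()
mutant, = toolbox.mutate(ind)          # mutate() returns a one-element tuple
print(ind.value, "->", mutant.value)   # the mutant has a freshly drawn learning rate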

  1. Define the "crossover" function. "Crossover" is combining the "genes" of two individuals in order to get new individual(s). In our case, for example, we can take the learning rate of old_individual_1 and the dropout ratio of old_individual_2, and combine them in new_individual_1. Saimilarly, for the new_individual_2 - we take the dropout ratio from old_individual_1 and the learning rate from old_individual_2. To keep the exmple even more simple, I'll just do lr1-lr2 for the first and lr1+lr2 for the second new individual.

The crossover function in the toolbox is called "mate":

def crossover_individuals(ind1, ind2):
    # The children's learning rates are |lr1 - lr2| and lr1 + lr2
    return (creator.Individual(abs(ind1.value - ind2.value)),
            creator.Individual(ind1.value + ind2.value))

toolbox.register("mate", crossover_individuals)
  1. Define the "select" function. This function encapsulates the selection strategy for the individuals used as parents of the next generation:
toolbox.register("select", tools.selBest)

We are using the built-in function selBest, but you can replace it with your own, if you like.
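
For example, DEAP also ships with tournament selection, which could be registered instead:

toolbox.register("select", tools.selTournament, tournsize=3)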

There are only two things left - the HallOfFame, which will keep our best result and the Statistics object, which will print some useful stats:

hof = tools.HallOfFame(1)  # keeps only one (the best) result

stats = tools.Statistics(lambda ind: ind.fitness.values)
stats.register("avg", np.mean, axis=0)
stats.register("std", np.std, axis=0)
stats.register("min", np.min, axis=0)
stats.register("max", np.max, axis=0)

With all this done, we can create the first population and call one of the built-in GA algorithms:

pop = toolbox.population(n=10)  # 10 individuals per population
algorithms.eaSimple(pop, toolbox, CXPB, MUTPB, NGEN, stats=stats, halloffame=hof, verbose=__debug__)

Above, CXPB is the probability that a pair of individuals will be crossed, MUTPB is the probability that an individual will be mutated, and NGEN is the number of generations we will run the algorithm for. Of course, you can define your own algorithm, but I'll leave that to the documentation.
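
For reference, the constants could be set to something like this (these are just example values, tune them to taste):

CXPB = 0.5   # probability that a pair of individuals is crossed (mated)
MUTPB = 0.2  # probability that an individual is mutated
NGEN = 10    # number of generations to run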

Now, we are evaluating neural networks here, which is a computationally demanding process, so it would be a good idea to make it distributed. To achieve this, we need to redefine the toolbox's "map" function. toolbox.map defines how the evaluation is applied to the population's individuals. Within eaSimple, the map does the following:

fitnesses = toolbox.map(toolbox.evaluate, invalid_ind)

# and it is registered as:
self.register("map", map)

So, in the standalone version it just calls evaluate() for each individual in the population. In the distributed version, we are going to parallelize the population as a Spark RDD and apply the evaluate() function to its elements in parallel:

def sparkMap(eval_func, population):
    # Distribute the individuals as an RDD, evaluate them in parallel on the
    # Spark executors and collect the fitness values back to the driver
    return sc.parallelize(population).map(eval_func).collect()

toolbox.register("map", sparkMap)

In the case of eaSimple, the eval_func parameter will be toolbox.evaluate.
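
The sc used in sparkMap() is a regular SparkContext; one minimal way to obtain it, assuming a local Spark installation, is:

from pyspark.sql import SparkSession

# Run locally on all cores; point master at your cluster for a real run
spark = SparkSession.builder.master("local[*]").appName("hyperopt").getOrCreate()
sc = spark.sparkContext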

And, that's it! You now have a distributed genetic algorithm for hyperparameter optimization.

You can find the above code at https://github.com/vascokk/hyperopt, and if you run hyperopt_dist.py, you should see something like this in the logs:

======================= Hall Of Fame =======================

Best LR: 0.00394039675753813

Best score: 0.9641

============================================================

Happy hacking! :)