-
Notifications
You must be signed in to change notification settings - Fork 3
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Making things more compatible with tf.keras.Model, increasing flexibility for rapid prototyping of optimizers.
…lity with rapid-prototyping of optimizers (#12) * updating on tom_dev * string issue * logger name issue * adding cifar imagenet transfer learning * damping parameter * damping issue * updating transfer learning driver * updating cifar10 driver * adding seed to logger name * updating * wrong size * updating * architecture is incorrect * another error in the weights * unsetting visible devices * cuda devices * starting to add keras Model wrapper stuff to hessianlearn * updating the preconditioner due to eager issues * inferring dtype when its not passed in * updating adam * updating * updating incg * updating problem * checkpointing work on multi input output keras Model compatibility * weighted sum of losses has been implemented now * checkpointing work on kerasModelWrapper that streamlines the nn training without old hessianlearn baggage * updating with a working prototype of the kerasModelWrapper * updating getting close to merging the PR * updating * updating * getting close to merging * modifying tf version for unit tests * modifying tf version for unit tests * updating the unit tests to suppress all of tensorflows nonsense * updating the unit test Co-authored-by: Tom OR <tom.olearyroseberry@utexas.edu>
- Loading branch information
Showing
16 changed files
with
1,022 additions
and
48 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
180 changes: 180 additions & 0 deletions
180
applications/transfer_learning/imagenet_cifar10_classification_evaluate_test.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,180 @@ | ||
# This file is part of the hessianlearn package | ||
# | ||
# hessianlearn is free software: you can redistribute it and/or modify | ||
# it under the terms of the GNU Lesser General Public License as published by | ||
# the Free Software Foundation, either version 3 of the License, or any later version. | ||
# | ||
# hessianlearn is distributed in the hope that it will be useful, | ||
# but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | ||
# GNU Lesser General Public License for more details. | ||
# | ||
# You should have received a copy of the GNU Lesser General Public License | ||
# If not, see <http://www.gnu.org/licenses/>. | ||
# | ||
# Author: Tom O'Leary-Roseberry | ||
# Contact: tom.olearyroseberry@utexas.edu | ||
|
||
|
||
# Imports and process-environment configuration. These environment variables
# must be set BEFORE TensorFlow is imported to take effect.
import numpy as np
import os
# Silence TensorFlow's C++ logging (3 = errors only).
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
# Work around the "duplicate OpenMP runtime" abort seen with some MKL builds.
os.environ['KMP_DUPLICATE_LIB_OK']='True'
os.environ["KMP_WARNINGS"] = "FALSE"
# os.environ['CUDA_VISIBLE_DEVICES'] = '1'
import pickle
import tensorflow as tf
import time, datetime
# if int(tf.__version__[0]) > 1:
# 	import tensorflow.compat.v1 as tf
# 	tf.disable_v2_behavior()

# Memory issue with GPUs: request on-demand GPU memory growth instead of
# letting TF grab all device memory at startup.
gpu_devices = tf.config.experimental.list_physical_devices('GPU')
for device in gpu_devices:
    tf.config.experimental.set_memory_growth(device, True)
# Load hessianlearn library; HESSIANLEARN_PATH overrides the relative default.
import sys
sys.path.append( os.environ.get('HESSIANLEARN_PATH', "../../"))
from hessianlearn import *
|
||
# Parse run specifications from the command line. Every flag is optional and
# falls back to the default listed in the table below.
from argparse import ArgumentParser

parser = ArgumentParser(add_help=True)

# (flag, destination, default, help text, type) for every scalar option.
_cli_options = [
    ("-optimizer", "optimizer", "lrsfn", "optimizer type", str),
    ("-fixed_step", "fixed_step", 1, "boolean for fixed step vs globalization", int),
    ("-alpha", "alpha", 1e-4, "learning rate alpha", float),
    ("-hessian_low_rank", "hessian_low_rank", 40, "low rank for sfn", int),
    ("-record_spectrum", "record_spectrum", 0, "boolean for recording spectrum", int),
    ("-batch_size", "batch_size", 32, "batch size", int),
    ("-hess_batch_size", "hess_batch_size", 8, "hess batch size", int),
    ("-keras_epochs", "keras_epochs", 50, "keras_epochs", int),
    ("-keras_opt", "keras_opt", "adam", "optimizer type for keras", str),
    ("-keras_alpha", "keras_alpha", 1e-3, "keras learning rate", float),
    ("-max_sweeps", "max_sweeps", 1, "max sweeps", float),
    ("-weights_file", "weights_file", "None", "weight file pickle", str),
]
for flag, dest, default, help_text, arg_type in _cli_options:
    parser.add_argument(flag, dest=dest, required=False, default=default,
                        help=help_text, type=arg_type)

args = parser.parse_args()
|
||
# Seed TF's global RNG for reproducibility. TF1 exposed tf.set_random_seed;
# TF2 renamed it tf.random.set_seed. Probe for the old name and fall back.
# Catch only AttributeError: the original bare `except:` would also have
# silently swallowed a genuine failure inside set_random_seed itself.
try:
    tf.set_random_seed(0)
except AttributeError:
    tf.random.set_seed(0)
|
||
# GPU environment details: report whether a GPU is visible and whether this
# TensorFlow build was compiled with CUDA support.
gpu_availabe = tf.test.is_gpu_available()
built_with_cuda = tf.test.is_built_with_cuda()
banner = 80 * '#'
print(banner)
print(f'IS GPU AVAILABLE: {gpu_availabe}'.center(80))
print(f'IS BUILT WITH CUDA: {built_with_cuda}'.center(80))
print(banner)
|
||
# Run specification: minibatch sizes for the gradient and Hessian evaluations.
settings = {}
settings['batch_size'] = args.batch_size
settings['hess_batch_size'] = args.hess_batch_size


################################################################################
# Instantiate data: load CIFAR-10 and split the held-out set into a
# 2000-sample validation half and the remaining test half.
(x_train, y_train), (_x_test, _y_test) = tf.keras.datasets.cifar10.load_data()

# Apply ResNet50's ImageNet input preprocessing to the raw images.
x_train = tf.keras.applications.resnet50.preprocess_input(x_train)
x_test_full = tf.keras.applications.resnet50.preprocess_input(_x_test)
x_val, x_test = x_test_full[:2000], x_test_full[2000:]

# One-hot encode the integer class labels.
y_train = tf.keras.utils.to_categorical(y_train)
y_test_full = tf.keras.utils.to_categorical(_y_test)
y_val, y_test = y_test_full[:2000], y_test_full[2000:]
|
||
################################################################################
# Create the neural network in keras: a partially-frozen ImageNet-pretrained
# ResNet50 trunk followed by a small trainable classification head.

# tf.keras.backend.set_floatx('float64')

resnet_input_shape = (200, 200, 3)
input_tensor = tf.keras.Input(shape=resnet_input_shape)

# Pretrained ResNet50 without its ImageNet classification top.
pretrained_resnet50 = tf.keras.applications.resnet50.ResNet50(
    weights='imagenet', include_top=False, input_tensor=input_tensor)

# Freeze the first 143 layers; only the later layers remain trainable.
for layer in pretrained_resnet50.layers[:143]:
    layer.trainable = False

# Head: upsample the 32x32 CIFAR images to the ResNet input size, run the
# trunk, then classify into the 10 CIFAR classes.
classifier = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),
    tf.keras.layers.Lambda(lambda image: tf.image.resize(image, resnet_input_shape[:2])),
    pretrained_resnet50,
    tf.keras.layers.Flatten(),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(10, activation='softmax'),
])
|
||
|
||
# Select the Keras optimizer used for the warm-start training phase.
if args.keras_opt == 'adam':
    optimizer = tf.keras.optimizers.Adam(learning_rate=args.keras_alpha, epsilon=1e-8)
elif args.keras_opt == 'sgd':
    optimizer = tf.keras.optimizers.SGD(learning_rate=args.keras_alpha)
else:
    # The original bare `raise` had no active exception and would itself fail
    # with "RuntimeError: No active exception to re-raise"; raise an
    # informative error instead.
    raise ValueError(
        'Unsupported -keras_opt {!r}; expected "adam" or "sgd"'.format(args.keras_opt))

# The network ends in a softmax, so its outputs are probabilities, not logits.
# from_logits must therefore be False — the original from_logits=True applied
# a second softmax inside the loss, distorting the reported loss values.
classifier.compile(optimizer=optimizer,
                   loss=tf.keras.losses.CategoricalCrossentropy(from_logits=False),
                   metrics=['accuracy'])
|
||
# Baseline evaluation with the freshly initialized classification head.
loss_test_0, acc_test_0 = classifier.evaluate(x_test, y_test, verbose=2)
print('acc_test = ', acc_test_0)
loss_val_0, acc_val_0 = classifier.evaluate(x_val, y_val, verbose=2)
print('acc_val = ', acc_val_0)


# Optionally restore per-layer best weights from a pickled logger file.
# The original used `is not 'None'`, an identity comparison with a string
# literal that only works by accident of CPython interning (and emits a
# SyntaxWarning); value comparison is the correct test.
if args.weights_file != 'None':
    try:
        # Context manager guarantees the pickle file handle is closed
        # (the original opened it and never closed it).
        with open(args.weights_file, 'rb') as logger:
            best_weights = pickle.load(logger)['best_weights']
        for layer_name, weight in best_weights.items():
            classifier.get_layer(layer_name).set_weights(weight)
    except Exception as exc:
        # Best-effort restore: continue with current weights, but report why
        # the load failed instead of silently discarding the reason.
        print('Issue loading best weights:', exc)

# Evaluation after the (attempted) weight restore.
loss_test_final, acc_test_final = classifier.evaluate(x_test, y_test, verbose=2)
print('acc_test final = ', acc_test_final)
loss_val_final, acc_val_final = classifier.evaluate(x_val, y_val, verbose=2)
print('acc_val final = ', acc_val_final)
|
||
################################################################################
# Evaluate again on all the data: reload the full CIFAR-10 test split from
# scratch and score the classifier on the complete 10k-image set.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Same ResNet50 preprocessing as in the training section above.
x_train = tf.keras.applications.resnet50.preprocess_input(x_train)
x_test = tf.keras.applications.resnet50.preprocess_input(x_test)

# One-hot encode the labels.
y_train = tf.keras.utils.to_categorical(y_train)
y_test = tf.keras.utils.to_categorical(y_test)

loss_test_total, acc_test_total = classifier.evaluate(x_test, y_test, verbose=2)
separator = 80 * '#'
print(separator)
print('After hessianlearn training'.center(80))
print('acc_test_total = ', acc_test_total)
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -118,6 +118,5 @@ def minimize(self,feed_dict = None,hessian_feed_dict = None): | |
|
||
|
||
|
||
|
||
|
||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.