Having problems with OOM #25
Setting the batch size to 4 or even 2 should solve the issue. How large is your GPU memory?
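For reference, here is a minimal, self-contained sketch of where the batch size enters a Keras fit_generator() call. The generator and model below are dummy placeholders rather than this repository's data pipeline; in DSOD_train.py the batch_size variable the original poster mentions (batch_size = 26) presumably plays the same role wherever the training generator is created.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Flatten

# Hypothetical stand-in for the project's data generator; the real one
# lives in the repository's data pipeline.
def make_batches(images, targets, batch_size):
    """Yield (x, y) batches forever, as fit_generator expects."""
    n = len(images)
    while True:
        idx = np.random.permutation(n)
        for start in range(0, n - batch_size + 1, batch_size):
            sel = idx[start:start + batch_size]
            yield images[sel], targets[sel]

# Dummy data and model just to make the sketch runnable.
x_train = np.random.rand(64, 32, 32, 3).astype('float32')
y_train = np.random.rand(64, 10).astype('float32')
model = Sequential([Flatten(input_shape=(32, 32, 3)), Dense(10)])
model.compile(optimizer='adam', loss='mse')

batch_size = 2  # reduce this first when training runs out of GPU memory
model.fit_generator(make_batches(x_train, y_train, batch_size),
                    steps_per_epoch=len(x_train) // batch_size,
                    epochs=1, workers=1)
```

A smaller batch size shrinks every intermediate activation tensor (the OOM below is on a tensor of shape [6, 1376, 32, 32], where the first dimension is the batch), at the cost of noisier gradients and longer epochs.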
Thank you so much for your kind help! I'm using two Titan Xp GPUs, and the memory spec is as follows: … Currently the execution is using only one GPU, and I don't know why. I have one more question: in your data_coco.py there is a convert_to_voc function.
I've always used 1 GPU for training a model, but it should work with multiple GPUs as well. The documentation of …
The weights you mentioned are not trainable parameters... See #14 for more details.
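For reference, the usual Keras 2 multi-GPU setup uses keras.utils.multi_gpu_model, as sketched below. The model here is a dummy placeholder, not the DSOD network from this repository, and the snippet assumes at least two GPUs are visible; it is only an illustration of the mechanism, not the project's training script.

```python
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.utils import multi_gpu_model

# Dummy single-GPU model standing in for the actual detector.
model = Sequential([Flatten(input_shape=(32, 32, 3)), Dense(10)])

# Replicate the model across 2 GPUs. Each replica processes a slice of
# every batch, so the per-GPU memory footprint is roughly batch_size / 2.
parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.compile(optimizer='adam', loss='mse')
```

Note that splitting a batch across two cards halves the per-GPU activation memory, so it can also be a way around OOM errors, but only if the model's weights themselves still fit on each card.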
Thank you for the reply! I played with some parameters in fit_generator() (I also tried using …). If anyone has solved this issue, please share your knowledge.
Search for …
Hello, Mr. Volk,
Thank you very much for your nice code!
I have a question for you.
I'm new to deep learning, have only a basic understanding of Keras code, and am currently trying to run your DSOD_train.py.
The problem is that I keep getting OOM errors while executing the "Train" section of the code (error message below).
I tried using only one of my two GPUs, and also the 'allow_growth' option in TensorFlow (see the configuration sketch below), but neither worked.
I believe I need to reduce the minibatch size (I guess your code uses a batch size of 128, am I right?), but I have no idea where in the code to make this change (just changing batch_size = 26 to a lower number didn't solve the problem, and searching your .py files left me with no clue).
I'd really appreciate your help with this problem.
By the way, I'm using Ubuntu 16.04 and the latest TensorFlow/Keras.
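For reference, the usual way to pin standalone Keras on the TensorFlow 1.x backend to a single GPU and enable allow_growth is sketched below. Note that allow_growth only stops TensorFlow from pre-allocating the whole card; it does not help when the model plus the batch genuinely exceed the GPU's memory, which is why reducing the batch size is the first thing to try.

```python
import os
# Restrict TensorFlow to the first GPU only (set before TensorFlow is imported).
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import tensorflow as tf
import keras.backend as K

# Allocate GPU memory on demand instead of reserving it all up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))
```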
------------------------------------------------ error message ------------------------------------------------
ResourceExhaustedError Traceback (most recent call last)
<ipython-input-...> in <module>
49 workers=1,
50 #use_multiprocessing=False,
---> 51 initial_epoch=initial_epoch)
/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
89 warnings.warn('Update your `' + object_name + '` call to the ' +
90 'Keras 2 API: ' + signature, stacklevel=2)
---> 91 return func(*args, **kwargs)
92 wrapper._original_function = func
93 return wrapper
/usr/local/lib/python3.6/dist-packages/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
1416 use_multiprocessing=use_multiprocessing,
1417 shuffle=shuffle,
-> 1418 initial_epoch=initial_epoch)
1419
1420 @interfaces.legacy_generator_methods_support
/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
215 outs = model.train_on_batch(x, y,
216 sample_weight=sample_weight,
--> 217 class_weight=class_weight)
218
219 outs = to_list(outs)
/usr/local/lib/python3.6/dist-packages/keras/engine/training.py in train_on_batch(self, x, y, sample_weight, class_weight)
1215 ins = x + y + sample_weights
1216 self._make_train_function()
-> 1217 outputs = self.train_function(ins)
1218 return unpack_singleton(outputs)
1219
/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py in __call__(self, inputs)
2713 return self._legacy_call(inputs)
2714
-> 2715 return self._call(inputs)
2716 else:
2717 if py_any(is_tensor(x) for x in inputs):
/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py in _call(self, inputs)
2673 fetched = self._callable_fn(*array_vals, run_metadata=self.run_metadata)
2674 else:
-> 2675 fetched = self._callable_fn(*array_vals)
2676 return fetched[:len(self.outputs)]
2677
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in __call__(self, *args, **kwargs)
1456 ret = tf_session.TF_SessionRunCallable(self._session._session,
1457 self._handle, args,
-> 1458 run_metadata_ptr)
1459 if run_metadata:
1460 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[6,1376,32,32] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node batch_normalization_302/FusedBatchNorm}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
(1) Resource exhausted: OOM when allocating tensor with shape[6,1376,32,32] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node batch_normalization_302/FusedBatchNorm}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
0 successful operations.
0 derived errors ignored.
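As the hint in the traceback suggests, tf.RunOptions(report_tensor_allocations_upon_oom=True) makes TensorFlow list the tensors that were still allocated when the OOM happened. The sketch below only shows where the options argument goes in a raw TensorFlow 1.x session; the graph is a trivial placeholder.

```python
import tensorflow as tf

# Ask TensorFlow to report live tensor allocations if an OOM occurs.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

# Trivial graph just to show where the options argument is passed.
x = tf.constant([1.0, 2.0])
with tf.Session() as sess:
    print(sess.run(x * 2.0, options=run_options))
```

With standalone Keras it should be possible to forward the same options through model.compile(..., options=run_options, run_metadata=tf.RunMetadata()), since extra compile keyword arguments are handed on to the backend training function, but treat that as an untested assumption about the Keras 2 TensorFlow backend rather than a documented feature of this repository.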