FileNotFoundError looking for ckpt files #1009

codeananda · 2022-11-30T17:50:42Z

When training multiple NeuralProphet models on multiple time series in parallel, I often get the error:

FileNotFoundError: [Errno 2] No such file or directory: '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/.lr_find_10be610b-19f1-4e43-97f1-029b9385b9a0.ckpt'

It's looking for a ckpt file that isn't there, even though I have not asked NP to store checkpoints.

I can provide more info but need to rush off now and thought someone may know the solution just from this. What else would you like?

Note that this does not happen every time. Sometimes parallel execution works for all series, sometimes not. The series on which it fails also change from run to run.

NP v0.5 installed from source
Full traceback below (using betterexceptions)

@ourownstory @karl-richter

2022-11-30 18:33:07.871 | ERROR    | ForwardPredictor:_predict_parallel:583 - An error has been caught in function '_predict_parallel', process 'LokyProcess-9' (1827), thread 'MainThread' (140700283275072):
Traceback (most recent call last):

  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
           │         │     └ {'__name__': '__main__', '__doc__': None, '__package__': 'joblib.externals.loky.backend', '__loader__': <_frozen_importlib_ex...
           │         └ <code object <module> at 0x7ff7519b2ea0, file "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/jobl...
           └ <function _run_code at 0x7ff755e82940>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
         │     └ {'__name__': '__main__', '__doc__': None, '__package__': 'joblib.externals.loky.backend', '__loader__': <_frozen_importlib_ex...
         └ <code object <module> at 0x7ff7519b2ea0, file "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/jobl...
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/externals/loky/backend/popen_loky_posix.py", line 170, in <module>
    exitcode = process_obj._bootstrap()
               │           └ <function BaseProcess._bootstrap at 0x7ff7557778b0>
               └ <LokyProcess name='LokyProcess-9' parent=1772 started>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
    │    └ <function BaseProcess.run at 0x7ff7557e2ee0>
    └ <LokyProcess name='LokyProcess-9' parent=1772 started>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
    │    │        │    │        │    └ {}
    │    │        │    │        └ <LokyProcess name='LokyProcess-9' parent=1772 started>
    │    │        │    └ (<joblib.externals.loky.process_executor._SafeQueue object at 0x7ff7519cd0a0>, <joblib.externals.loky.backend.queues.SimpleQu...
    │    │        └ <LokyProcess name='LokyProcess-9' parent=1772 started>
    │    └ <function _process_worker at 0x7ff7555efdc0>
    └ <LokyProcess name='LokyProcess-9' parent=1772 started>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/externals/loky/process_executor.py", line 428, in _process_worker
    r = call_item()
        └ CallItem(3, <joblib._parallel_backends.SafeFunction object at 0x7ff7519dd8b0>, (), {})
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/externals/loky/process_executor.py", line 275, in __call__
    return self.fn(*self.args, **self.kwargs)
           │    │   │    │       │    └ {}
           │    │   │    │       └ CallItem(3, <joblib._parallel_backends.SafeFunction object at 0x7ff7519dd8b0>, (), {})
           │    │   │    └ ()
           │    │   └ CallItem(3, <joblib._parallel_backends.SafeFunction object at 0x7ff7519dd8b0>, (), {})
           │    └ <joblib._parallel_backends.SafeFunction object at 0x7ff7519dd8b0>
           └ CallItem(3, <joblib._parallel_backends.SafeFunction object at 0x7ff7519dd8b0>, (), {})
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 620, in __call__
    return self.func(*args, **kwargs)
           │    │     │       └ {}
           │    │     └ ()
           │    └ <joblib.parallel.BatchedCalls object at 0x7ff6c0beaa90>
           └ <joblib._parallel_backends.SafeFunction object at 0x7ff7519dd8b0>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/parallel.py", line 288, in __call__
    return [func(*args, **kwargs)
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/parallel.py", line 288, in <listcomp>
    return [func(*args, **kwargs)
            │     │       └ {}
            │     └ (date
            │       2013-11-24         NaN
            │       2013-11-25         NaN
            │       2013-11-26         NaN
            │       2013-11-27         NaN
            │       2013-11-28         NaN
            │           ...
            └ <bound method ForwardPredictor._predict_parallel of <ForwardPredictor.ForwardPredictor object at 0x7ff75196f040>>

> File "/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/ForwardPredictor.py", line 583, in _predict_parallel
    forecast = self.predict(series)
               │    │       └ date
               │    │         2013-11-24         NaN
               │    │         2013-11-25         NaN
               │    │         2013-11-26         NaN
               │    │         2013-11-27         NaN
               │    │         2013-11-28         NaN
               │    │              ...
               │    └ <function ForwardPredictor.predict at 0x7ff6c0be98b0>
               └ <ForwardPredictor.ForwardPredictor object at 0x7ff75196f040>

  File "/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/ForwardPredictor.py", line 328, in predict
    model.fit(train_df)
    │     │   └              ds         y
    │     │     0    2017-07-16  4.857143
    │     │     1    2017-07-17  4.857143
    │     │     2    2017-07-18  4.857143
    │     │     3    2017-07-19  4.85...
    │     └ <function NeuralProphet.fit at 0x7ff6c4db5f70>
    └ <neuralprophet.forecaster.NeuralProphet object at 0x7ff6c0bf9d00>

  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/neuralprophet/forecaster.py", line 730, in fit
    metrics_df = self._train(df, minimal=minimal, continue_training=continue_training)
                 │    │      │           │                          └ False
                 │    │      │           └ False
                 │    │      └              ds         y      ID
                 │    │        0    2017-07-16  4.857143  __df__
                 │    │        1    2017-07-17  4.857143  __df__
                 │    │        2    2017-07-18  4.8571...
                 │    └ <function NeuralProphet._train at 0x7ff6c4dbd0d0>
                 └ <neuralprophet.forecaster.NeuralProphet object at 0x7ff6c0bf9d00>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/neuralprophet/forecaster.py", line 2558, in _train
    lr_finder = self.trainer.tuner.lr_find(
                │    │       │     └ <function Tuner.lr_find at 0x7ff6c51aa040>
                │    │       └ <pytorch_lightning.tuner.tuning.Tuner object at 0x7ff6c03223a0>
                │    └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7ff6c036ed90>
                └ <neuralprophet.forecaster.NeuralProphet object at 0x7ff6c0bf9d00>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/tuner/tuning.py", line 199, in lr_find
    result = self.trainer.tune(
             │    │       └ <function Trainer.tune at 0x7ff6c51454c0>
             │    └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7ff6c036ed90>
             └ <pytorch_lightning.tuner.tuning.Tuner object at 0x7ff6c03223a0>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1052, in tune
    result = self.tuner._tune(
             │    │     └ <function Tuner._tune at 0x7ff6c5191040>
             │    └ <pytorch_lightning.tuner.tuning.Tuner object at 0x7ff6c03223a0>
             └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7ff6c036ed90>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/tuner/tuning.py", line 70, in _tune
    result["lr_find"] = lr_find(self.trainer, model, **lr_find_kwargs)
    │                   │       │    │        │        └ {'min_lr': 1e-06, 'max_lr': 10, 'num_training': 232, 'mode': 'exponential', 'early_stop_threshold': None, 'update_attr': False}
    │                   │       │    │        └ TimeNet(
    │                   │       │    │            (metrics_train): MetricCollection(
    │                   │       │    │              (MAE): MeanAbsoluteError()
    │                   │       │    │              (RMSE): MeanSquaredError()
    │                   │       │    │            )
    │                   │       │    │            (metrics_va...
    │                   │       │    └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7ff6c036ed90>
    │                   │       └ <pytorch_lightning.tuner.tuning.Tuner object at 0x7ff6c03223a0>
    │                   └ <function lr_find at 0x7ff6c51910d0>
    └ {}
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/tuner/lr_finder.py", line 269, in lr_find
    trainer._checkpoint_connector.restore(ckpt_path)
    │       │                     │       └ '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/.lr_find_ef45368b-e624-4f23-95e3-e1394fdbada7.ckpt'
    │       │                     └ <function CheckpointConnector.restore at 0x7ff6c5233d30>
    │       └ <pytorch_lightning.trainer.connectors.checkpoint_connector.CheckpointConnector object at 0x7ff6c0322460>
    └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7ff6c036ed90>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 136, in restore
    self.resume_start(checkpoint_path)
    │    │            └ '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/.lr_find_ef45368b-e624-4f23-95e3-e1394fdbada7.ckpt'
    │    └ <function CheckpointConnector.resume_start at 0x7ff6c5233b80>
    └ <pytorch_lightning.trainer.connectors.checkpoint_connector.CheckpointConnector object at 0x7ff6c0322460>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 78, in resume_start
    self.resume_checkpoint_path = self._hpc_resume_path or checkpoint_path
    │    │                        │    │                   └ '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/.lr_find_ef45368b-e624-4f23-95e3-e1394fdbada7.ckpt'
    │    │                        │    └ <property object at 0x7ff6c5239f90>
    │    │                        └ <pytorch_lightning.trainer.connectors.checkpoint_connector.CheckpointConnector object at 0x7ff6c0322460>
    │    └ None
    └ <pytorch_lightning.trainer.connectors.checkpoint_connector.CheckpointConnector object at 0x7ff6c0322460>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 66, in _hpc_resume_path
    max_version = self.__max_ckpt_version_in_folder(dir_path_hpc, "hpc_ckpt_")
                  │                                 └ '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor'
                  └ <pytorch_lightning.trainer.connectors.checkpoint_connector.CheckpointConnector object at 0x7ff6c0322460>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 506, in __max_ckpt_version_in_folder
    files = [os.path.basename(f["name"]) for f in fs.listdir(dir_path)]
             │  │    │                            │  │       └ '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor'
             │  │    │                            │  └ <function AbstractFileSystem.listdir at 0x7ff6c62b64c0>
             │  │    │                            └ <fsspec.implementations.local.LocalFileSystem object at 0x7ff6c036e1c0>
             │  │    └ <function basename at 0x7ff755f64160>
             │  └ <module 'posixpath' from '/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/posixpath.py'>
             └ <module 'os' from '/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/os.py'>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/fsspec/spec.py", line 1313, in listdir
    return self.ls(path, detail=detail, **kwargs)
           │    │  │            │         └ {}
           │    │  │            └ True
           │    │  └ '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor'
           │    └ <function LocalFileSystem.ls at 0x7ff6c6251c10>
           └ <fsspec.implementations.local.LocalFileSystem object at 0x7ff6c036e1c0>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/fsspec/implementations/local.py", line 60, in ls
    return [self.info(f) for f in it]
            │    │                └ <posix.ScandirIterator object at 0x7ff6c035dc00>
            │    └ <function LocalFileSystem.info at 0x7ff6c6251d30>
            └ <fsspec.implementations.local.LocalFileSystem object at 0x7ff6c036e1c0>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/fsspec/implementations/local.py", line 60, in <listcomp>
    return [self.info(f) for f in it]
            │    │    │      └ <DirEntry '.lr_find_10be610b-19f1-4e43-97f1-029b9385b9a0.ckpt'>
            │    │    └ <DirEntry '.lr_find_10be610b-19f1-4e43-97f1-029b9385b9a0.ckpt'>
            │    └ <function LocalFileSystem.info at 0x7ff6c6251d30>
            └ <fsspec.implementations.local.LocalFileSystem object at 0x7ff6c036e1c0>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/fsspec/implementations/local.py", line 71, in info
    out = path.stat(follow_symlinks=False)
          │    └ <method 'stat' of 'posix.DirEntry' objects>
          └ <DirEntry '.lr_find_10be610b-19f1-4e43-97f1-029b9385b9a0.ckpt'>

FileNotFoundError: [Errno 2] No such file or directory: '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/.lr_find_10be610b-19f1-4e43-97f1-029b9385b9a0.ckpt'

karl-richter · 2022-11-30T18:18:50Z

Collaborator

Hi @codeananda, thanks for raising this issue and providing a detailed traceback!

The checkpoints are created by the PyTorch Lightning learning rate finder, which is activated automatically if you don't provide a learning rate. Lightning checkpoints the model before trying out different learning rates, then reloads the model from that checkpoint to restore the weights to their initial values.

This would be the line from your traceback where that happens:

File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/tuner/lr_finder.py", line 269, in lr_find
    trainer._checkpoint_connector.restore(ckpt_path)
    │       │                     │       └ '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/.lr_find_ef45368b-e624-4f23-95e3-e1394fdbada7.ckpt'
    │       │                     └ <function CheckpointConnector.restore at 0x7ff6c5233d30>
    │       └ <pytorch_lightning.trainer.connectors.checkpoint_connector.CheckpointConnector object at 0x7ff6c0322460>
    └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7ff6c036ed90>

So a quick workaround could be to manually provide a learning rate (which I understand is not feasible in many cases).

I assume you are working on Colab, is that correct? Could you manually inspect whether the referenced file exists at the provided location? I have a slight suspicion that the models trained in parallel might overwrite each other's checkpoints, or that the mount point has brief outages which cause the learning rate finder not to find the file.
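For illustration, here is a minimal sketch of how a shared directory can produce this kind of error. It assumes a Unix-like system and uses a hypothetical file name. It imitates the pattern at the bottom of the traceback, where a directory listing calls `stat` on each entry: if another process removes a file between the scan and the `stat` call, the listing fails for a file the current worker never asked about. It is not a reproduction of what Lightning or fsspec actually do internally.

```python
import os
import tempfile


def list_then_stat(directory, remove_between=None):
    """Scan `directory`, optionally delete one file, then stat every entry.

    The deletion stands in for another worker cleaning up its own
    temporary checkpoint while this worker is listing the folder.
    """
    with os.scandir(directory) as it:
        entries = list(it)  # snapshot of the names present at scan time
    if remove_between is not None:
        os.remove(remove_between)
    # On Unix, DirEntry.stat() needs a system call, so a vanished file raises.
    return [entry.stat(follow_symlinks=False) for entry in entries]


with tempfile.TemporaryDirectory() as tmp:
    ckpt = os.path.join(tmp, ".lr_find_example.ckpt")  # hypothetical name
    open(ckpt, "w").close()
    try:
        list_then_stat(tmp, remove_between=ckpt)
        race_error = None
    except FileNotFoundError as err:
        race_error = err

print(type(race_error).__name__)
```

This would also be consistent with the two different `.lr_find_*.ckpt` names in the traceback above: the path being restored and the path in the final error are not the same file.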

codeananda · 2022-12-01T09:49:52Z

Author

Hi @karl-richter, thanks for your super speedy response!

I assume you are working on Colab, is that correct?

I'm working in VS Code but storing/accessing files on Google Drive. Similar to Colab but not identical.

Could you manually inspect whether the referenced file exists at the provided location?

The file did exist. However, see my comment below: I don't think it existed at the moment the program tried to access it.

I ran it once with a specified learning rate and it worked. Then re-ran and got this error. In general, we've adopted the policy of manually deleting the lightning_logs folder before we run our code as it often causes errors. But obviously it would be great if we didn't have to do that!

Display error

2022-12-01 10:38:45.192 | ERROR    | ForwardPredictor:_predict_parallel:583 - An error has been caught in function '_predict_parallel', process 'LokyProcess-2' (4123), thread 'MainThread' (140568232052544):
Traceback (most recent call last):

  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
           │         │     └ {'__name__': '__main__', '__doc__': None, '__package__': 'joblib.externals.loky.backend', '__loader__': <_frozen_importlib_ex...
           │         └ <code object <module> at 0x7fd892bd6ea0, file "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/jobl...
           └ <function _run_code at 0x7fd8970a6940>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
         │     └ {'__name__': '__main__', '__doc__': None, '__package__': 'joblib.externals.loky.backend', '__loader__': <_frozen_importlib_ex...
         └ <code object <module> at 0x7fd892bd6ea0, file "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/jobl...
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/externals/loky/backend/popen_loky_posix.py", line 170, in <module>
    exitcode = process_obj._bootstrap()
               │           └ <function BaseProcess._bootstrap at 0x7fd89699b8b0>
               └ <LokyProcess name='LokyProcess-2' parent=4087 started>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
    │    └ <function BaseProcess.run at 0x7fd896a06ee0>
    └ <LokyProcess name='LokyProcess-2' parent=4087 started>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
    │    │        │    │        │    └ {}
    │    │        │    │        └ <LokyProcess name='LokyProcess-2' parent=4087 started>
    │    │        │    └ (<joblib.externals.loky.process_executor._SafeQueue object at 0x7fd892bf1130>, <joblib.externals.loky.backend.queues.SimpleQu...
    │    │        └ <LokyProcess name='LokyProcess-2' parent=4087 started>
    │    └ <function _process_worker at 0x7fd896813dc0>
    └ <LokyProcess name='LokyProcess-2' parent=4087 started>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/externals/loky/process_executor.py", line 428, in _process_worker
    r = call_item()
        └ CallItem(2, <joblib._parallel_backends.SafeFunction object at 0x7fd892c01940>, (), {})
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/externals/loky/process_executor.py", line 275, in __call__
    return self.fn(*self.args, **self.kwargs)
           │    │   │    │       │    └ {}
           │    │   │    │       └ CallItem(2, <joblib._parallel_backends.SafeFunction object at 0x7fd892c01940>, (), {})
           │    │   │    └ ()
           │    │   └ CallItem(2, <joblib._parallel_backends.SafeFunction object at 0x7fd892c01940>, (), {})
           │    └ <joblib._parallel_backends.SafeFunction object at 0x7fd892c01940>
           └ CallItem(2, <joblib._parallel_backends.SafeFunction object at 0x7fd892c01940>, (), {})
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 620, in __call__
    return self.func(*args, **kwargs)
           │    │     │       └ {}
           │    │     └ ()
           │    └ <joblib.parallel.BatchedCalls object at 0x7fd801e04cd0>
           └ <joblib._parallel_backends.SafeFunction object at 0x7fd892c01940>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/parallel.py", line 288, in __call__
    return [func(*args, **kwargs)
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/parallel.py", line 288, in <listcomp>
    return [func(*args, **kwargs)
            │     │       └ {}
            │     └ (date
            │       2013-11-24          NaN
            │       2013-11-25          NaN
            │       2013-11-26          NaN
            │       2013-11-27          NaN
            │       2013-11-28          NaN...
            └ <bound method ForwardPredictor._predict_parallel of <ForwardPredictor.ForwardPredictor object at 0x7fd892b94040>>

> File "/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/ForwardPredictor.py", line 583, in _predict_parallel
    forecast = self.predict(series)
               │    │       └ date
               │    │         2013-11-24          NaN
               │    │         2013-11-25          NaN
               │    │         2013-11-26          NaN
               │    │         2013-11-27          NaN
               │    │         2013-11-28          NaN
               │    │         ...
               │    └ <function ForwardPredictor.predict at 0x7fd801e068b0>
               └ <ForwardPredictor.ForwardPredictor object at 0x7fd892b94040>

  File "/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/ForwardPredictor.py", line 328, in predict
    model.fit(train_df)
    │     │   └              ds          y
    │     │     0    2017-06-11  13.428571
    │     │     1    2017-06-12  13.428571
    │     │     2    2017-06-13  13.428571
    │     │     3    2017-06-14  ...
    │     └ <function NeuralProphet.fit at 0x7fd805fb7f70>
    └ <neuralprophet.forecaster.NeuralProphet object at 0x7fd801e11f40>

  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/neuralprophet/forecaster.py", line 730, in fit
    metrics_df = self._train(df, minimal=minimal, continue_training=continue_training)
                 │    │      │           │                          └ False
                 │    │      │           └ False
                 │    │      └              ds          y      ID
                 │    │        0    2017-06-11  13.428571  __df__
                 │    │        1    2017-06-12  13.428571  __df__
                 │    │        2    2017-06-13  13....
                 │    └ <function NeuralProphet._train at 0x7fd805fbd0d0>
                 └ <neuralprophet.forecaster.NeuralProphet object at 0x7fd801e11f40>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/neuralprophet/forecaster.py", line 2567, in _train
    self.trainer.fit(
    │    │       └ <function Trainer.fit at 0x7fd806346040>
    │    └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7fd8015918e0>
    └ <neuralprophet.forecaster.NeuralProphet object at 0x7fd801e11f40>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 696, in fit
    self._call_and_handle_interrupt(
    │    └ <function Trainer._call_and_handle_interrupt at 0x7fd8063aaf70>
    └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7fd8015918e0>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
           │           │       └ {}
           │           └ (TimeNet(
           │               (metrics_train): MetricCollection(
           │                 (MAE): MeanAbsoluteError()
           │                 (RMSE): MeanSquaredError()
           │               )
           │               (metrics_v...
           └ <bound method Trainer._fit_impl of <pytorch_lightning.trainer.trainer.Trainer object at 0x7fd8015918e0>>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 735, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
              │    │    │                │    └ <property object at 0x7fd8963571d0>
              │    │    │                └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7fd8015918e0>
              │    │    └ TimeNet(
              │    │        (metrics_train): MetricCollection(
              │    │          (MAE): MeanAbsoluteError()
              │    │          (RMSE): MeanSquaredError()
              │    │        )
              │    │        (metrics_va...
              │    └ <function Trainer._run at 0x7fd8063465e0>
              └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7fd8015918e0>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1166, in _run
    results = self._run_stage()
              │    └ <function Trainer._run_stage at 0x7fd806346820>
              └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7fd8015918e0>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1252, in _run_stage
    return self._run_train()
           │    └ <function Trainer._run_train at 0x7fd806346940>
           └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7fd8015918e0>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1283, in _run_train
    self.fit_loop.run()
    │    └ <property object at 0x7fd8963ac950>
    └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7fd8015918e0>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 201, in run
    self.on_advance_end()
    │    └ <function FitLoop.on_advance_end at 0x7fd807469790>
    └ <pytorch_lightning.loops.fit_loop.FitLoop object at 0x7fd8015414c0>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 299, in on_advance_end
    self.trainer._call_callback_hooks("on_train_epoch_end")
    │    └ <property object at 0x7fd8064d9ef0>
    └ <pytorch_lightning.loops.fit_loop.FitLoop object at 0x7fd8015414c0>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1597, in _call_callback_hooks
    fn(self, self.lightning_module, *args, **kwargs)
    │  │     │    │                  │       └ {}
    │  │     │    │                  └ ()
    │  │     │    └ <property object at 0x7fd896357b80>
    │  │     └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7fd8015918e0>
    │  └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7fd8015918e0>
    └ <bound method ModelCheckpoint.on_train_epoch_end of <pytorch_lightning.callbacks.model_checkpoint.ModelCheckpoint object at 0...
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 311, in on_train_epoch_end
    self._save_topk_checkpoint(trainer, monitor_candidates)
    │    │                     │        └ {'MAE': tensor(5.9332), 'RMSE': tensor(7.2154), 'Loss': tensor([0.4365]), 'RegLoss': tensor([0.]), 'epoch': tensor(1), 'step'...
    │    │                     └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7fd8015918e0>
    │    └ <function ModelCheckpoint._save_topk_checkpoint at 0x7fd80740d160>
    └ <pytorch_lightning.callbacks.model_checkpoint.ModelCheckpoint object at 0x7fd801321070>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 384, in _save_topk_checkpoint
    self._save_none_monitor_checkpoint(trainer, monitor_candidates)
    │    │                             │        └ {'MAE': tensor(5.9332), 'RMSE': tensor(7.2154), 'Loss': tensor([0.4365]), 'RegLoss': tensor([0.]), 'epoch': tensor(1), 'step'...
    │    │                             └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7fd8015918e0>
    │    └ <function ModelCheckpoint._save_none_monitor_checkpoint at 0x7fd80740daf0>
    └ <pytorch_lightning.callbacks.model_checkpoint.ModelCheckpoint object at 0x7fd801321070>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 674, in _save_none_monitor_checkpoint
    trainer.strategy.remove_checkpoint(previous)
    │       │                          └ '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/lightning_logs/version_146/checkpoints/epoch=0-step=62.ckpt'
    │       └ <property object at 0x7fd89634bf90>
    └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7fd8015918e0>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 455, in remove_checkpoint
    self.checkpoint_io.remove_checkpoint(filepath)
    │    │                               └ '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/lightning_logs/version_146/checkpoints/epoch=0-step=62.ckpt'
    │    └ <property object at 0x7fd80660fcc0>
    └ <pytorch_lightning.strategies.single_device.SingleDeviceStrategy object at 0x7fd8015418b0>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/plugins/io/torch_plugin.py", line 95, in remove_checkpoint
    fs.rm(path, recursive=True)
    │  │  └ '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/lightning_logs/version_146/checkpoints/epoch=0-step=62.ckpt'
    │  └ <function LocalFileSystem.rm at 0x7fd80747d310>
    └ <fsspec.implementations.local.LocalFileSystem object at 0x7fd801591400>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/fsspec/implementations/local.py", line 169, in rm
    os.remove(p)
    │  │      └ '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/lightning_logs/version_146/checkpoints/epoch=0-step=62.ckpt'
    │  └ <built-in function remove>
    └ <module 'os' from '/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/os.py'>

FileNotFoundError: [Errno 2] No such file or directory: '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/lightning_logs/version_146/checkpoints/epoch=0-step=62.ckpt'

These are the files in the checkpoints dir (I'm just training for 3 epochs for speed). So, this file doesn't exist.

In general though, it seems to work if I a) delete the lightning_logs folder and b) run with a specified learning rate. However, I'm only doing this on 8 series for 3 epochs each. My colleague says that when he runs it on 1k+ series for 300+ epochs, he sometimes still gets the ckpt error even when manually specifying a learning rate.
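Step a) can be scripted instead of done by hand. The following is a minimal sketch, not part of NeuralProphet or Lightning: `clear_lightning_logs` is a hypothetical helper name, and it assumes the logs live in a `lightning_logs` folder directly under the project directory, as in the paths above.

```python
import shutil
import tempfile
from pathlib import Path


def clear_lightning_logs(project_dir):
    """Remove project_dir/lightning_logs if present; return True if it existed."""
    logs_dir = Path(project_dir) / "lightning_logs"
    existed = logs_dir.exists()
    # ignore_errors=True makes this a no-op when the folder is already gone.
    shutil.rmtree(logs_dir, ignore_errors=True)
    return existed


# Demonstration on a throwaway directory rather than a real project.
with tempfile.TemporaryDirectory() as project:
    (Path(project) / "lightning_logs" / "version_0" / "checkpoints").mkdir(parents=True)
    removed_first = clear_lightning_logs(project)
    removed_again = clear_lightning_logs(project)

print(removed_first, removed_again)
```

A helper like this would need to run once in the parent process before the parallel workers start. Calling it from inside each worker would itself delete files that other workers are still using.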

codeananda · 2022-12-01T10:11:34Z

UPDATE

Just re-ran without specifying a learning rate and kept a close eye on the ckpt files it created.

I'm running 8 series in parallel (using 20 cores), but only 6 ckpt files were created. I get the same ckpt FileNotFoundError, and the file it names really does not exist. I get the error three times, and all three are looking for the same ckpt file: .lr_find_61823080-15ab-4d18-8c69-66f4635a69ad.ckpt
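
Since several workers fail on the same `.lr_find_*.ckpt` file in the shared project folder, one thing to try is giving each worker its own working directory. This is only a sketch: it assumes the temporary ckpt files are written to the process's current working directory, which the paths in the tracebacks suggest but which I have not confirmed. `isolated_cwd` and `fit_in_isolation` are hypothetical helper names, not part of NeuralProphet or PyTorch Lightning.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def isolated_cwd(prefix="np_worker_"):
    """Switch to a fresh temporary directory for the duration of the block."""
    previous = os.getcwd()
    with tempfile.TemporaryDirectory(prefix=prefix) as tmp:
        os.chdir(tmp)
        try:
            yield tmp
        finally:
            # Leave the temp dir before it is deleted.
            os.chdir(previous)


def fit_in_isolation(fit_fn, *args, **kwargs):
    """Call fit_fn(*args, **kwargs) inside a private working directory.

    Hypothetical wrapper: any files the callee writes relative to the
    working directory land in a folder no other worker touches.
    """
    with isolated_cwd():
        return fit_fn(*args, **kwargs)
```

In `_predict_parallel` this would mean calling `fit_in_isolation(model.fit, train_df)` instead of `model.fit(train_df)`. Whether that removes the error here is untested, and any relative paths used inside the wrapped call would need to be made absolute first.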

Error 1

2022-12-01 10:50:54.023 | ERROR    | ForwardPredictor:_predict_parallel:583 - An error has been caught in function '_predict_parallel', process 'LokyProcess-1' (5416), thread 'MainThread' (139888175351616):
Traceback (most recent call last):

  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
           │         │     └ {'__name__': '__main__', '__doc__': None, '__package__': 'joblib.externals.loky.backend', '__loader__': <_frozen_importlib_ex...
           │         └ <code object <module> at 0x7f3a3c33aea0, file "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/jobl...
           └ <function _run_code at 0x7f3a4080a940>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
         │     └ {'__name__': '__main__', '__doc__': None, '__package__': 'joblib.externals.loky.backend', '__loader__': <_frozen_importlib_ex...
         └ <code object <module> at 0x7f3a3c33aea0, file "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/jobl...
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/externals/loky/backend/popen_loky_posix.py", line 170, in <module>
    exitcode = process_obj._bootstrap()
               │           └ <function BaseProcess._bootstrap at 0x7f3a400ff8b0>
               └ <LokyProcess name='LokyProcess-1' parent=5381 started>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
    │    └ <function BaseProcess.run at 0x7f3a4016aee0>
    └ <LokyProcess name='LokyProcess-1' parent=5381 started>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
    │    │        │    │        │    └ {}
    │    │        │    │        └ <LokyProcess name='LokyProcess-1' parent=5381 started>
    │    │        │    └ (<joblib.externals.loky.process_executor._SafeQueue object at 0x7f3a3c354130>, <joblib.externals.loky.backend.queues.SimpleQu...
    │    │        └ <LokyProcess name='LokyProcess-1' parent=5381 started>
    │    └ <function _process_worker at 0x7f3a3ff78dc0>
    └ <LokyProcess name='LokyProcess-1' parent=5381 started>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/externals/loky/process_executor.py", line 428, in _process_worker
    r = call_item()
        └ CallItem(1, <joblib._parallel_backends.SafeFunction object at 0x7f3a3c365940>, (), {})
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/externals/loky/process_executor.py", line 275, in __call__
    return self.fn(*self.args, **self.kwargs)
           │    │   │    │       │    └ {}
           │    │   │    │       └ CallItem(1, <joblib._parallel_backends.SafeFunction object at 0x7f3a3c365940>, (), {})
           │    │   │    └ ()
           │    │   └ CallItem(1, <joblib._parallel_backends.SafeFunction object at 0x7f3a3c365940>, (), {})
           │    └ <joblib._parallel_backends.SafeFunction object at 0x7f3a3c365940>
           └ CallItem(1, <joblib._parallel_backends.SafeFunction object at 0x7f3a3c365940>, (), {})
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 620, in __call__
    return self.func(*args, **kwargs)
           │    │     │       └ {}
           │    │     └ ()
           │    └ <joblib.parallel.BatchedCalls object at 0x7f39ab570b50>
           └ <joblib._parallel_backends.SafeFunction object at 0x7f3a3c365940>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/parallel.py", line 288, in __call__
    return [func(*args, **kwargs)
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/parallel.py", line 288, in <listcomp>
    return [func(*args, **kwargs)
            │     │       └ {}
            │     └ (date
            │       2013-11-24          NaN
            │       2013-11-25          NaN
            │       2013-11-26          NaN
            │       2013-11-27          NaN
            │       2013-11-28          NaN...
            └ <bound method ForwardPredictor._predict_parallel of <ForwardPredictor.ForwardPredictor object at 0x7f3a3c2f7040>>

> File "/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/ForwardPredictor.py", line 583, in _predict_parallel
    forecast = self.predict(series)
               │    │       └ date
               │    │         2013-11-24          NaN
               │    │         2013-11-25          NaN
               │    │         2013-11-26          NaN
               │    │         2013-11-27          NaN
               │    │         2013-11-28          NaN
               │    │         ...
               │    └ <function ForwardPredictor.predict at 0x7f39ab5718b0>
               └ <ForwardPredictor.ForwardPredictor object at 0x7f3a3c2f7040>

  File "/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/ForwardPredictor.py", line 328, in predict
    model.fit(train_df)
    │     │   └              ds          y
    │     │     0    2017-01-08   4.285714
    │     │     1    2017-01-09   4.285714
    │     │     2    2017-01-10   4.285714
    │     │     3    2017-01-11  ...
    │     └ <function NeuralProphet.fit at 0x7f39af736f70>
    └ <neuralprophet.forecaster.NeuralProphet object at 0x7f39ab57edc0>

  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/neuralprophet/forecaster.py", line 730, in fit
    metrics_df = self._train(df, minimal=minimal, continue_training=continue_training)
                 │    │      │           │                          └ False
                 │    │      │           └ False
                 │    │      └              ds          y      ID
                 │    │        0    2017-01-08   4.285714  __df__
                 │    │        1    2017-01-09   4.285714  __df__
                 │    │        2    2017-01-10   4....
                 │    └ <function NeuralProphet._train at 0x7f39af73d0d0>
                 └ <neuralprophet.forecaster.NeuralProphet object at 0x7f39ab57edc0>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/neuralprophet/forecaster.py", line 2558, in _train
    lr_finder = self.trainer.tuner.lr_find(
                │    │       │     └ <function Tuner.lr_find at 0x7f39afb2a040>
                │    │       └ <pytorch_lightning.tuner.tuning.Tuner object at 0x7f39aacac760>
                │    └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7f39aacf7f70>
                └ <neuralprophet.forecaster.NeuralProphet object at 0x7f39ab57edc0>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/tuner/tuning.py", line 199, in lr_find
    result = self.trainer.tune(
             │    │       └ <function Trainer.tune at 0x7f39afac54c0>
             │    └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7f39aacf7f70>
             └ <pytorch_lightning.tuner.tuning.Tuner object at 0x7f39aacac760>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1052, in tune
    result = self.tuner._tune(
             │    │     └ <function Tuner._tune at 0x7f39afb12040>
             │    └ <pytorch_lightning.tuner.tuning.Tuner object at 0x7f39aacac760>
             └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7f39aacf7f70>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/tuner/tuning.py", line 70, in _tune
    result["lr_find"] = lr_find(self.trainer, model, **lr_find_kwargs)
    │                   │       │    │        │        └ {'min_lr': 1e-06, 'max_lr': 10, 'num_training': 233, 'mode': 'exponential', 'early_stop_threshold': None, 'update_attr': False}
    │                   │       │    │        └ TimeNet(
    │                   │       │    │            (metrics_train): MetricCollection(
    │                   │       │    │              (MAE): MeanAbsoluteError()
    │                   │       │    │              (RMSE): MeanSquaredError()
    │                   │       │    │            )
    │                   │       │    │            (metrics_va...
    │                   │       │    └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7f39aacf7f70>
    │                   │       └ <pytorch_lightning.tuner.tuning.Tuner object at 0x7f39aacac760>
    │                   └ <function lr_find at 0x7f39afb120d0>
    └ {}
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/tuner/lr_finder.py", line 269, in lr_find
    trainer._checkpoint_connector.restore(ckpt_path)
    │       │                     │       └ '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/.lr_find_44c78caa-2937-49f9-b629-b0bc0e2a5c45.ckpt'
    │       │                     └ <function CheckpointConnector.restore at 0x7f39afbb4d30>
    │       └ <pytorch_lightning.trainer.connectors.checkpoint_connector.CheckpointConnector object at 0x7f39aacac700>
    └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7f39aacf7f70>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 136, in restore
    self.resume_start(checkpoint_path)
    │    │            └ '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/.lr_find_44c78caa-2937-49f9-b629-b0bc0e2a5c45.ckpt'
    │    └ <function CheckpointConnector.resume_start at 0x7f39afbb4b80>
    └ <pytorch_lightning.trainer.connectors.checkpoint_connector.CheckpointConnector object at 0x7f39aacac700>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 78, in resume_start
    self.resume_checkpoint_path = self._hpc_resume_path or checkpoint_path
    │    │                        │    │                   └ '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/.lr_find_44c78caa-2937-49f9-b629-b0bc0e2a5c45.ckpt'
    │    │                        │    └ <property object at 0x7f39afb41130>
    │    │                        └ <pytorch_lightning.trainer.connectors.checkpoint_connector.CheckpointConnector object at 0x7f39aacac700>
    │    └ None
    └ <pytorch_lightning.trainer.connectors.checkpoint_connector.CheckpointConnector object at 0x7f39aacac700>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 66, in _hpc_resume_path
    max_version = self.__max_ckpt_version_in_folder(dir_path_hpc, "hpc_ckpt_")
                  │                                 └ '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor'
                  └ <pytorch_lightning.trainer.connectors.checkpoint_connector.CheckpointConnector object at 0x7f39aacac700>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 506, in __max_ckpt_version_in_folder
    files = [os.path.basename(f["name"]) for f in fs.listdir(dir_path)]
             │  │    │                            │  │       └ '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor'
             │  │    │                            │  └ <function AbstractFileSystem.listdir at 0x7f39b0c484c0>
             │  │    │                            └ <fsspec.implementations.local.LocalFileSystem object at 0x7f39aacf7280>
             │  │    └ <function basename at 0x7f3a408ec160>
             │  └ <module 'posixpath' from '/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/posixpath.py'>
             └ <module 'os' from '/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/os.py'>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/fsspec/spec.py", line 1313, in listdir
    return self.ls(path, detail=detail, **kwargs)
           │    │  │            │         └ {}
           │    │  │            └ True
           │    │  └ '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor'
           │    └ <function LocalFileSystem.ls at 0x7f39b0be3c10>
           └ <fsspec.implementations.local.LocalFileSystem object at 0x7f39aacf7280>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/fsspec/implementations/local.py", line 60, in ls
    return [self.info(f) for f in it]
            │    │                └ <posix.ScandirIterator object at 0x7f39aacc7810>
            │    └ <function LocalFileSystem.info at 0x7f39b0be3d30>
            └ <fsspec.implementations.local.LocalFileSystem object at 0x7f39aacf7280>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/fsspec/implementations/local.py", line 60, in <listcomp>
    return [self.info(f) for f in it]
            │    │    │      └ <DirEntry '.lr_find_61823080-15ab-4d18-8c69-66f4635a69ad.ckpt'>
            │    │    └ <DirEntry '.lr_find_61823080-15ab-4d18-8c69-66f4635a69ad.ckpt'>
            │    └ <function LocalFileSystem.info at 0x7f39b0be3d30>
            └ <fsspec.implementations.local.LocalFileSystem object at 0x7f39aacf7280>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/fsspec/implementations/local.py", line 71, in info
    out = path.stat(follow_symlinks=False)
          │    └ <method 'stat' of 'posix.DirEntry' objects>
          └ <DirEntry '.lr_find_61823080-15ab-4d18-8c69-66f4635a69ad.ckpt'>

FileNotFoundError: [Errno 2] No such file or directory: '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/.lr_find_61823080-15ab-4d18-8c69-66f4635a69ad.ckpt'
2022-12-01 10:50:54.050 | ERROR    | ForwardPredictor:_predict_parallel:583 - An error has been caught in function '_predict_parallel', process 'LokyProcess-7' (5422), thread 'MainThread' (140164371756864):
Traceback (most recent call last):

  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
           │         │     └ {'__name__': '__main__', '__doc__': None, '__package__': 'joblib.externals.loky.backend', '__loader__': <_frozen_importlib_ex...
           │         └ <code object <module> at 0x7f7a8aca5ea0, file "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/jobl...
           └ <function _run_code at 0x7f7a8f175940>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
         │     └ {'__name__': '__main__', '__doc__': None, '__package__': 'joblib.externals.loky.backend', '__loader__': <_frozen_importlib_ex...
         └ <code object <module> at 0x7f7a8aca5ea0, file "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/jobl...
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/externals/loky/backend/popen_loky_posix.py", line 170, in <module>
    exitcode = process_obj._bootstrap()
               │           └ <function BaseProcess._bootstrap at 0x7f7a8ea6a8b0>
               └ <LokyProcess name='LokyProcess-7' parent=5381 started>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
    │    └ <function BaseProcess.run at 0x7f7a8ead5ee0>
    └ <LokyProcess name='LokyProcess-7' parent=5381 started>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
    │    │        │    │        │    └ {}
    │    │        │    │        └ <LokyProcess name='LokyProcess-7' parent=5381 started>
    │    │        │    └ (<joblib.externals.loky.process_executor._SafeQueue object at 0x7f7a8acbf130>, <joblib.externals.loky.backend.queues.SimpleQu...
    │    │        └ <LokyProcess name='LokyProcess-7' parent=5381 started>
    │    └ <function _process_worker at 0x7f7a8e8e2dc0>
    └ <LokyProcess name='LokyProcess-7' parent=5381 started>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/externals/loky/process_executor.py", line 428, in _process_worker
    r = call_item()
        └ CallItem(6, <joblib._parallel_backends.SafeFunction object at 0x7f7a8acd0940>, (), {})
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/externals/loky/process_executor.py", line 275, in __call__
    return self.fn(*self.args, **self.kwargs)
           │    │   │    │       │    └ {}
           │    │   │    │       └ CallItem(6, <joblib._parallel_backends.SafeFunction object at 0x7f7a8acd0940>, (), {})
           │    │   │    └ ()
           │    │   └ CallItem(6, <joblib._parallel_backends.SafeFunction object at 0x7f7a8acd0940>, (), {})
           │    └ <joblib._parallel_backends.SafeFunction object at 0x7f7a8acd0940>
           └ CallItem(6, <joblib._parallel_backends.SafeFunction object at 0x7f7a8acd0940>, (), {})
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 620, in __call__
    return self.func(*args, **kwargs)
           │    │     │       └ {}
           │    │     └ ()
           │    └ <joblib.parallel.BatchedCalls object at 0x7f79f9eda9d0>
           └ <joblib._parallel_backends.SafeFunction object at 0x7f7a8acd0940>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/parallel.py", line 288, in __call__
    return [func(*args, **kwargs)
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/parallel.py", line 288, in <listcomp>
    return [func(*args, **kwargs)
            │     │       └ {}
            │     └ (date
            │       2013-11-24         NaN
            │       2013-11-25         NaN
            │       2013-11-26         NaN
            │       2013-11-27         NaN
            │       2013-11-28         NaN
            │           ...
            └ <bound method ForwardPredictor._predict_parallel of <ForwardPredictor.ForwardPredictor object at 0x7f7a8ac63040>>

> File "/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/ForwardPredictor.py", line 583, in _predict_parallel
    forecast = self.predict(series)
               │    │       └ date
               │    │         2013-11-24         NaN
               │    │         2013-11-25         NaN
               │    │         2013-11-26         NaN
               │    │         2013-11-27         NaN
               │    │         2013-11-28         NaN
               │    │              ...
               │    └ <function ForwardPredictor.predict at 0x7f79f9ed88b0>
               └ <ForwardPredictor.ForwardPredictor object at 0x7f7a8ac63040>

  File "/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/ForwardPredictor.py", line 328, in predict
    model.fit(train_df)
    │     │   └              ds         y
    │     │     0    2017-06-18  6.571429
    │     │     1    2017-06-19  6.571429
    │     │     2    2017-06-20  6.571429
    │     │     3    2017-06-21  6.57...
    │     └ <function NeuralProphet.fit at 0x7f79fe0b6f70>
    └ <neuralprophet.forecaster.NeuralProphet object at 0x7f79f9ee8c40>

  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/neuralprophet/forecaster.py", line 730, in fit
    metrics_df = self._train(df, minimal=minimal, continue_training=continue_training)
                 │    │      │           │                          └ False
                 │    │      │           └ False
                 │    │      └              ds         y      ID
                 │    │        0    2017-06-18  6.571429  __df__
                 │    │        1    2017-06-19  6.571429  __df__
                 │    │        2    2017-06-20  6.5714...
                 │    └ <function NeuralProphet._train at 0x7f79fe0bd0d0>
                 └ <neuralprophet.forecaster.NeuralProphet object at 0x7f79f9ee8c40>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/neuralprophet/forecaster.py", line 2558, in _train
    lr_finder = self.trainer.tuner.lr_find(
                │    │       │     └ <function Tuner.lr_find at 0x7f79fe4aa040>
                │    │       └ <pytorch_lightning.tuner.tuning.Tuner object at 0x7f79f96114f0>
                │    └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7f79f9658130>
                └ <neuralprophet.forecaster.NeuralProphet object at 0x7f79f9ee8c40>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/tuner/tuning.py", line 199, in lr_find
    result = self.trainer.tune(
             │    │       └ <function Trainer.tune at 0x7f79fe4454c0>
             │    └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7f79f9658130>
             └ <pytorch_lightning.tuner.tuning.Tuner object at 0x7f79f96114f0>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1052, in tune
    result = self.tuner._tune(
             │    │     └ <function Tuner._tune at 0x7f79fe491040>
             │    └ <pytorch_lightning.tuner.tuning.Tuner object at 0x7f79f96114f0>
             └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7f79f9658130>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/tuner/tuning.py", line 70, in _tune
    result["lr_find"] = lr_find(self.trainer, model, **lr_find_kwargs)
    │                   │       │    │        │        └ {'min_lr': 1e-06, 'max_lr': 10, 'num_training': 232, 'mode': 'exponential', 'early_stop_threshold': None, 'update_attr': False}
    │                   │       │    │        └ TimeNet(
    │                   │       │    │            (metrics_train): MetricCollection(
    │                   │       │    │              (MAE): MeanAbsoluteError()
    │                   │       │    │              (RMSE): MeanSquaredError()
    │                   │       │    │            )
    │                   │       │    │            (metrics_va...
    │                   │       │    └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7f79f9658130>
    │                   │       └ <pytorch_lightning.tuner.tuning.Tuner object at 0x7f79f96114f0>
    │                   └ <function lr_find at 0x7f79fe4910d0>
    └ {}
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/tuner/lr_finder.py", line 269, in lr_find
    trainer._checkpoint_connector.restore(ckpt_path)
    │       │                     │       └ '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/.lr_find_b15ebc4c-7d54-49b9-b11a-a252ac64fdbb.ckpt'
    │       │                     └ <function CheckpointConnector.restore at 0x7f79fe534d30>
    │       └ <pytorch_lightning.trainer.connectors.checkpoint_connector.CheckpointConnector object at 0x7f79f9611460>
    └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7f79f9658130>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 140, in restore
    self.restore_model()
    │    └ <function CheckpointConnector.restore_model at 0x7f79fe534e50>
    └ <pytorch_lightning.trainer.connectors.checkpoint_connector.CheckpointConnector object at 0x7f79f9611460>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 178, in restore_model
    if self._hpc_resume_path is not None:
       │    └ <property object at 0x7f79fe53bf40>
       └ <pytorch_lightning.trainer.connectors.checkpoint_connector.CheckpointConnector object at 0x7f79f9611460>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 66, in _hpc_resume_path
    max_version = self.__max_ckpt_version_in_folder(dir_path_hpc, "hpc_ckpt_")
                  │                                 └ '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor'
                  └ <pytorch_lightning.trainer.connectors.checkpoint_connector.CheckpointConnector object at 0x7f79f9611460>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 506, in __max_ckpt_version_in_folder
    files = [os.path.basename(f["name"]) for f in fs.listdir(dir_path)]
             │  │    │                            │  │       └ '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor'
             │  │    │                            │  └ <function AbstractFileSystem.listdir at 0x7f79ff5c84c0>
             │  │    │                            └ <fsspec.implementations.local.LocalFileSystem object at 0x7f79f9658040>
             │  │    └ <function basename at 0x7f7a8f257160>
             │  └ <module 'posixpath' from '/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/posixpath.py'>
             └ <module 'os' from '/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/os.py'>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/fsspec/spec.py", line 1313, in listdir
    return self.ls(path, detail=detail, **kwargs)
           │    │  │            │         └ {}
           │    │  │            └ True
           │    │  └ '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor'
           │    └ <function LocalFileSystem.ls at 0x7f79ff565c10>
           └ <fsspec.implementations.local.LocalFileSystem object at 0x7f79f9658040>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/fsspec/implementations/local.py", line 60, in ls
    return [self.info(f) for f in it]
            │    │                └ <posix.ScandirIterator object at 0x7f79f962eea0>
            │    └ <function LocalFileSystem.info at 0x7f79ff565d30>
            └ <fsspec.implementations.local.LocalFileSystem object at 0x7f79f9658040>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/fsspec/implementations/local.py", line 60, in <listcomp>
    return [self.info(f) for f in it]
            │    │    │      └ <DirEntry '.lr_find_61823080-15ab-4d18-8c69-66f4635a69ad.ckpt'>
            │    │    └ <DirEntry '.lr_find_61823080-15ab-4d18-8c69-66f4635a69ad.ckpt'>
            │    └ <function LocalFileSystem.info at 0x7f79ff565d30>
            └ <fsspec.implementations.local.LocalFileSystem object at 0x7f79f9658040>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/fsspec/implementations/local.py", line 71, in info
    out = path.stat(follow_symlinks=False)
          │    └ <method 'stat' of 'posix.DirEntry' objects>
          └ <DirEntry '.lr_find_61823080-15ab-4d18-8c69-66f4635a69ad.ckpt'>

FileNotFoundError: [Errno 2] No such file or directory: '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/.lr_find_61823080-15ab-4d18-8c69-66f4635a69ad.ckpt'
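
Both tracebacks above end in the same place: `fs.listdir` yields a `DirEntry` for `.lr_find_61823080-15ab-4d18-8c69-66f4635a69ad.ckpt`, which is not the `ckpt_path` either process was restoring, and the `stat` call on it fails. One reading (an assumption on my part, not confirmed) is that another worker deleted its temporary file between the directory scan and the `stat`. The sketch below reproduces that list-then-stat pattern with the standard library only and shows a tolerant variant. `stat_surviving` is a hypothetical helper, not something fsspec or PyTorch Lightning provides.

```python
import os
import tempfile


def stat_surviving(entries):
    """Map name -> size for entries that still exist; skip ones that vanished."""
    sizes = {}
    for entry in entries:
        try:
            sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
        except FileNotFoundError:
            # The file was removed after the directory was listed.
            continue
    return sizes


with tempfile.TemporaryDirectory() as folder:
    for name in ("kept.ckpt", "gone.ckpt"):
        with open(os.path.join(folder, name), "w") as handle:
            handle.write("x")

    # Step 1: list the directory (what fs.listdir does first).
    with os.scandir(folder) as it:
        entries = list(it)

    # Step 2: stand-in for another worker cleaning up its temp file.
    os.remove(os.path.join(folder, "gone.ckpt"))

    # Step 3: stat each listed entry; a plain entry.stat() on the removed
    # file is where the FileNotFoundError in the tracebacks would come from.
    surviving = stat_surviving(entries)
```

On Linux the removed file is skipped rather than raising. On platforms where `DirEntry.stat` returns cached data the behaviour can differ, so treat this as an illustration of the suspected race, not as proof of it.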

Error 2

2022-12-01 10:50:54.128 | ERROR    | ForwardPredictor:_predict_parallel:583 - An error has been caught in function '_predict_parallel', process 'LokyProcess-4' (5419), thread 'MainThread' (140631394588480):
Traceback (most recent call last):

  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
           │         │     └ {'__name__': '__main__', '__doc__': None, '__package__': 'joblib.externals.loky.backend', '__loader__': <_frozen_importlib_ex...
           │         └ <code object <module> at 0x7fe747853ea0, file "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/jobl...
           └ <function _run_code at 0x7fe74bd23940>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
         │     └ {'__name__': '__main__', '__doc__': None, '__package__': 'joblib.externals.loky.backend', '__loader__': <_frozen_importlib_ex...
         └ <code object <module> at 0x7fe747853ea0, file "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/jobl...
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/externals/loky/backend/popen_loky_posix.py", line 170, in <module>
    exitcode = process_obj._bootstrap()
               │           └ <function BaseProcess._bootstrap at 0x7fe74b6198b0>
               └ <LokyProcess name='LokyProcess-4' parent=5381 started>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
    │    └ <function BaseProcess.run at 0x7fe74b682ee0>
    └ <LokyProcess name='LokyProcess-4' parent=5381 started>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
    │    │        │    │        │    └ {}
    │    │        │    │        └ <LokyProcess name='LokyProcess-4' parent=5381 started>
    │    │        │    └ (<joblib.externals.loky.process_executor._SafeQueue object at 0x7fe74786d130>, <joblib.externals.loky.backend.queues.SimpleQu...
    │    │        └ <LokyProcess name='LokyProcess-4' parent=5381 started>
    │    └ <function _process_worker at 0x7fe74b491dc0>
    └ <LokyProcess name='LokyProcess-4' parent=5381 started>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/externals/loky/process_executor.py", line 428, in _process_worker
    r = call_item()
        └ CallItem(0, <joblib._parallel_backends.SafeFunction object at 0x7fe74787e940>, (), {})
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/externals/loky/process_executor.py", line 275, in __call__
    return self.fn(*self.args, **self.kwargs)
           │    │   │    │       │    └ {}
           │    │   │    │       └ CallItem(0, <joblib._parallel_backends.SafeFunction object at 0x7fe74787e940>, (), {})
           │    │   │    └ ()
           │    │   └ CallItem(0, <joblib._parallel_backends.SafeFunction object at 0x7fe74787e940>, (), {})
           │    └ <joblib._parallel_backends.SafeFunction object at 0x7fe74787e940>
           └ CallItem(0, <joblib._parallel_backends.SafeFunction object at 0x7fe74787e940>, (), {})
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 620, in __call__
    return self.func(*args, **kwargs)
           │    │     │       └ {}
           │    │     └ ()
           │    └ <joblib.parallel.BatchedCalls object at 0x7fe6b6a86a90>
           └ <joblib._parallel_backends.SafeFunction object at 0x7fe74787e940>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/parallel.py", line 288, in __call__
    return [func(*args, **kwargs)
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/joblib/parallel.py", line 288, in <listcomp>
    return [func(*args, **kwargs)
            │     │       └ {}
            │     └ (date
            │       2013-11-24         NaN
            │       2013-11-25         NaN
            │       2013-11-26         NaN
            │       2013-11-27         NaN
            │       2013-11-28         NaN
            │           ...
            └ <bound method ForwardPredictor._predict_parallel of <ForwardPredictor.ForwardPredictor object at 0x7fe747810040>>

> File "/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/ForwardPredictor.py", line 583, in _predict_parallel
    forecast = self.predict(series)
               │    │       └ date
               │    │         2013-11-24         NaN
               │    │         2013-11-25         NaN
               │    │         2013-11-26         NaN
               │    │         2013-11-27         NaN
               │    │         2013-11-28         NaN
               │    │              ...
               │    └ <function ForwardPredictor.predict at 0x7fe6b6a858b0>
               └ <ForwardPredictor.ForwardPredictor object at 0x7fe747810040>

  File "/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/ForwardPredictor.py", line 328, in predict
    model.fit(train_df)
    │     │   └              ds         y
    │     │     0    2016-10-23  5.857143
    │     │     1    2016-10-24  5.857143
    │     │     2    2016-10-25  5.857143
    │     │     3    2016-10-26  5.85...
    │     └ <function NeuralProphet.fit at 0x7fe6bac36f70>
    └ <neuralprophet.forecaster.NeuralProphet object at 0x7fe6b6a93d00>

  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/neuralprophet/forecaster.py", line 730, in fit
    metrics_df = self._train(df, minimal=minimal, continue_training=continue_training)
                 │    │      │           │                          └ False
                 │    │      │           └ False
                 │    │      └              ds         y      ID
                 │    │        0    2016-10-23  5.857143  __df__
                 │    │        1    2016-10-24  5.857143  __df__
                 │    │        2    2016-10-25  5.8571...
                 │    └ <function NeuralProphet._train at 0x7fe6bac3d0d0>
                 └ <neuralprophet.forecaster.NeuralProphet object at 0x7fe6b6a93d00>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/neuralprophet/forecaster.py", line 2558, in _train
    lr_finder = self.trainer.tuner.lr_find(
                │    │       │     └ <function Tuner.lr_find at 0x7fe6bb02a040>
                │    │       └ <pytorch_lightning.tuner.tuning.Tuner object at 0x7fe6b61bf3a0>
                │    └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7fe6b620bd90>
                └ <neuralprophet.forecaster.NeuralProphet object at 0x7fe6b6a93d00>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/tuner/tuning.py", line 199, in lr_find
    result = self.trainer.tune(
             │    │       └ <function Trainer.tune at 0x7fe6bafc54c0>
             │    └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7fe6b620bd90>
             └ <pytorch_lightning.tuner.tuning.Tuner object at 0x7fe6b61bf3a0>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1052, in tune
    result = self.tuner._tune(
             │    │     └ <function Tuner._tune at 0x7fe6bb06b160>
             │    └ <pytorch_lightning.tuner.tuning.Tuner object at 0x7fe6b61bf3a0>
             └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7fe6b620bd90>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/tuner/tuning.py", line 70, in _tune
    result["lr_find"] = lr_find(self.trainer, model, **lr_find_kwargs)
    │                   │       │    │        │        └ {'min_lr': 1e-06, 'max_lr': 10, 'num_training': 234, 'mode': 'exponential', 'early_stop_threshold': None, 'update_attr': False}
    │                   │       │    │        └ TimeNet(
    │                   │       │    │            (metrics_train): MetricCollection(
    │                   │       │    │              (MAE): MeanAbsoluteError()
    │                   │       │    │              (RMSE): MeanSquaredError()
    │                   │       │    │            )
    │                   │       │    │            (metrics_va...
    │                   │       │    └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7fe6b620bd90>
    │                   │       └ <pytorch_lightning.tuner.tuning.Tuner object at 0x7fe6b61bf3a0>
    │                   └ <function lr_find at 0x7fe6bb06b040>
    └ {}
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/tuner/lr_finder.py", line 269, in lr_find
    trainer._checkpoint_connector.restore(ckpt_path)
    │       │                     │       └ '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/.lr_find_f77e9123-123f-4777-bc2b-ac3d9f5fd6a6.ckpt'
    │       │                     └ <function CheckpointConnector.restore at 0x7fe6bb0b4d30>
    │       └ <pytorch_lightning.trainer.connectors.checkpoint_connector.CheckpointConnector object at 0x7fe6b61bf460>
    └ <pytorch_lightning.trainer.trainer.Trainer object at 0x7fe6b620bd90>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 136, in restore
    self.resume_start(checkpoint_path)
    │    │            └ '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/.lr_find_f77e9123-123f-4777-bc2b-ac3d9f5fd6a6.ckpt'
    │    └ <function CheckpointConnector.resume_start at 0x7fe6bb0b4b80>
    └ <pytorch_lightning.trainer.connectors.checkpoint_connector.CheckpointConnector object at 0x7fe6b61bf460>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 78, in resume_start
    self.resume_checkpoint_path = self._hpc_resume_path or checkpoint_path
    │    │                        │    │                   └ '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/.lr_find_f77e9123-123f-4777-bc2b-ac3d9f5fd6a6.ckpt'
    │    │                        │    └ <property object at 0x7fe6bb0baf90>
    │    │                        └ <pytorch_lightning.trainer.connectors.checkpoint_connector.CheckpointConnector object at 0x7fe6b61bf460>
    │    └ None
    └ <pytorch_lightning.trainer.connectors.checkpoint_connector.CheckpointConnector object at 0x7fe6b61bf460>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 66, in _hpc_resume_path
    max_version = self.__max_ckpt_version_in_folder(dir_path_hpc, "hpc_ckpt_")
                  │                                 └ '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor'
                  └ <pytorch_lightning.trainer.connectors.checkpoint_connector.CheckpointConnector object at 0x7fe6b61bf460>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 506, in __max_ckpt_version_in_folder
    files = [os.path.basename(f["name"]) for f in fs.listdir(dir_path)]
             │  │    │                            │  │       └ '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor'
             │  │    │                            │  └ <function AbstractFileSystem.listdir at 0x7fe6bc13a4c0>
             │  │    │                            └ <fsspec.implementations.local.LocalFileSystem object at 0x7fe6b620b1c0>
             │  │    └ <function basename at 0x7fe74be05160>
             │  └ <module 'posixpath' from '/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/posixpath.py'>
             └ <module 'os' from '/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/os.py'>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/fsspec/spec.py", line 1313, in listdir
    return self.ls(path, detail=detail, **kwargs)
           │    │  │            │         └ {}
           │    │  │            └ True
           │    │  └ '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor'
           │    └ <function LocalFileSystem.ls at 0x7fe6bc0d6c10>
           └ <fsspec.implementations.local.LocalFileSystem object at 0x7fe6b620b1c0>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/fsspec/implementations/local.py", line 60, in ls
    return [self.info(f) for f in it]
            │    │                └ <posix.ScandirIterator object at 0x7fe6b61dd960>
            │    └ <function LocalFileSystem.info at 0x7fe6bc0d6d30>
            └ <fsspec.implementations.local.LocalFileSystem object at 0x7fe6b620b1c0>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/fsspec/implementations/local.py", line 60, in <listcomp>
    return [self.info(f) for f in it]
            │    │    │      └ <DirEntry '.lr_find_61823080-15ab-4d18-8c69-66f4635a69ad.ckpt'>
            │    │    └ <DirEntry '.lr_find_61823080-15ab-4d18-8c69-66f4635a69ad.ckpt'>
            │    └ <function LocalFileSystem.info at 0x7fe6bc0d6d30>
            └ <fsspec.implementations.local.LocalFileSystem object at 0x7fe6b620b1c0>
  File "/home/codeananda/anaconda3/envs/neuralprophet/lib/python3.9/site-packages/fsspec/implementations/local.py", line 71, in info
    out = path.stat(follow_symlinks=False)
          │    └ <method 'stat' of 'posix.DirEntry' objects>
          └ <DirEntry '.lr_find_61823080-15ab-4d18-8c69-66f4635a69ad.ckpt'>

FileNotFoundError: [Errno 2] No such file or directory: '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/ForwardPredictor/.lr_find_61823080-15ab-4d18-8c69-66f4635a69ad.ckpt'

I can post the 3rd error if you'd like but I imagine 2 is enough to be getting on with.

The 6 ckpt files created simultaneously (ignore the green one, that is from my first comment)

Code just hung and never completed running. Should take 1 min, I stopped it after 15 mins.

0 replies

codeananda · 2022-12-09T14:33:27Z

codeananda
Dec 9, 2022
Author

Adding minimal reproducible example.

@ourownstory @karl-richter
Any update?

import pandas as pd
import numpy as np
from joblib import Parallel, delayed
from neuralprophet import NeuralProphet

data_location = "https://raw.githubusercontent.com/ourownstory/neuralprophet-data/main/datasets/"

df = pd.read_csv(data_location + "wp_log_peyton_manning.csv")
dfs = [df.copy() for _ in range(8)]

model = NeuralProphet(epochs=10)

results = Parallel(n_jobs=-1)(delayed(model.fit)(df) for df in dfs)

Display output

INFO - (NP.df_utils._infer_frequency) - Major frequency D corresponds to 99.966% of the data.
INFO - (NP.df_utils._infer_frequency) - Dataframe freq automatically defined as D
INFO - (NP.config.init_data_params) - Setting normalization to global as only one dataframe provided for training.
INFO - (NP.utils.set_auto_seasonalities) - Disabling daily seasonality. Run NeuralProphet with daily_seasonality=True to override this.
INFO - (NP.df_utils._infer_frequency) - Major frequency D corresponds to 99.966% of the data.
INFO - (NP.df_utils._infer_frequency) - Dataframe freq automatically defined as D
INFO - (NP.df_utils._infer_frequency) - Major frequency D corresponds to 99.966% of the data.
INFO - (NP.df_utils._infer_frequency) - Dataframe freq automatically defined as D
INFO - (NP.config.init_data_params) - Setting normalization to global as only one dataframe provided for training.
INFO - (NP.config.init_data_params) - Setting normalization to global as only one dataframe provided for training.
INFO - (NP.utils.set_auto_seasonalities) - Disabling daily seasonality. Run NeuralProphet with daily_seasonality=True to override this.
INFO - (NP.utils.set_auto_seasonalities) - Disabling daily seasonality. Run NeuralProphet with daily_seasonality=True to override this.
INFO - (NP.df_utils._infer_frequency) - Major frequency D corresponds to 99.966% of the data.
INFO - (NP.df_utils._infer_frequency) - Dataframe freq automatically defined as D
INFO - (NP.df_utils._infer_frequency) - Major frequency D corresponds to 99.966% of the data.
INFO - (NP.df_utils._infer_frequency) - Dataframe freq automatically defined as D
INFO - (NP.config.init_data_params) - Setting normalization to global as only one dataframe provided for training.
INFO - (NP.config.init_data_params) - Setting normalization to global as only one dataframe provided for training.
INFO - (NP.utils.set_auto_seasonalities) - Disabling daily seasonality. Run NeuralProphet with daily_seasonality=True to override this.
INFO - (NP.utils.set_auto_seasonalities) - Disabling daily seasonality. Run NeuralProphet with daily_seasonality=True to override this.
INFO - (NP.df_utils._infer_frequency) - Major frequency D corresponds to 99.966% of the data.
INFO - (NP.df_utils._infer_frequency) - Dataframe freq automatically defined as D
INFO - (NP.config.init_data_params) - Setting normalization to global as only one dataframe provided for training.
INFO - (NP.utils.set_auto_seasonalities) - Disabling daily seasonality. Run NeuralProphet with daily_seasonality=True to override this.
INFO - (NP.df_utils._infer_frequency) - Major frequency D corresponds to 99.966% of the data.
INFO - (NP.df_utils._infer_frequency) - Dataframe freq automatically defined as D
INFO - (NP.config.init_data_params) - Setting normalization to global as only one dataframe provided for training.
INFO - (NP.df_utils._infer_frequency) - Major frequency D corresponds to 99.966% of the data.
INFO - (NP.df_utils._infer_frequency) - Dataframe freq automatically defined as D
INFO - (NP.config.set_auto_batch_epoch) - Auto-set batch_size to 32
INFO - (NP.config.init_data_params) - Setting normalization to global as only one dataframe provided for training.
INFO - (NP.utils.set_auto_seasonalities) - Disabling daily seasonality. Run NeuralProphet with daily_seasonality=True to override this.
INFO - (NP.utils.set_auto_seasonalities) - Disabling daily seasonality. Run NeuralProphet with daily_seasonality=True to override this.
INFO - (NP.config.set_auto_batch_epoch) - Auto-set batch_size to 32
INFO - (NP.config.set_auto_batch_epoch) - Auto-set batch_size to 32
INFO - (NP.config.set_auto_batch_epoch) - Auto-set batch_size to 32
INFO - (NP.config.set_auto_batch_epoch) - Auto-set batch_size to 32
INFO - (NP.config.set_auto_batch_epoch) - Auto-set batch_size to 32
INFO - (NP.config.set_auto_batch_epoch) - Auto-set batch_size to 32
INFO - (NP.config.set_auto_batch_epoch) - Auto-set batch_size to 32
WARNING - (NP.config.set_lr_finder_args) - Learning rate finder: The number of batches (93) is too small than the required number for the learning rate finder (237). The results might not be optimal.
WARNING - (NP.config.set_lr_finder_args) - Learning rate finder: The number of batches (93) is too small than the required number for the learning rate finder (237). The results might not be optimal.
WARNING - (NP.config.set_lr_finder_args) - Learning rate finder: The number of batches (93) is too small than the required number for the learning rate finder (237). The results might not be optimal.
WARNING - (NP.config.set_lr_finder_args) - Learning rate finder: The number of batches (93) is too small than the required number for the learning rate finder (237). The results might not be optimal.
WARNING - (NP.config.set_lr_finder_args) - Learning rate finder: The number of batches (93) is too small than the required number for the learning rate finder (237). The results might not be optimal.
WARNING - (NP.config.set_lr_finder_args) - Learning rate finder: The number of batches (93) is too small than the required number for the learning rate finder (237). The results might not be optimal.
WARNING - (NP.config.set_lr_finder_args) - Learning rate finder: The number of batches (93) is too small than the required number for the learning rate finder (237). The results might not be optimal.
WARNING - (NP.config.set_lr_finder_args) - Learning rate finder: The number of batches (93) is too small than the required number for the learning rate finder (237). The results might not be optimal.
Finding best initial lr: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 237/237 [00:02<00:00, 107.18it/s]
Finding best initial lr: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 237/237 [00:02<00:00, 97.37it/s]
Finding best initial lr: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 237/237 [00:02<00:00, 92.92it/s]
Finding best initial lr: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 237/237 [00:02<00:00, 97.13it/s]
Finding best initial lr: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 237/237 [00:02<00:00, 95.01it/s]
Finding best initial lr: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 237/237 [00:02<00:00, 91.83it/s]
Finding best initial lr: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 237/237 [00:02<00:00, 92.11it/s]
Finding best initial lr: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 237/237 [00:02<00:00, 92.44it/s]
Missing logger folder: /mnt/g/My Drive/1 Projects/1 AltDG - Adam/BreakoutDetector/lightning_logs
joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/codeananda/anaconda3/envs/local/lib/python3.9/site-packages/joblib/externals/loky/process_executor.py", line 428, in _process_worker
    r = call_item()
  File "/home/codeananda/anaconda3/envs/local/lib/python3.9/site-packages/joblib/externals/loky/process_executor.py", line 275, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/home/codeananda/anaconda3/envs/local/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 620, in __call__
    return self.func(*args, **kwargs)
  File "/home/codeananda/anaconda3/envs/local/lib/python3.9/site-packages/joblib/parallel.py", line 288, in __call__
    return [func(*args, **kwargs)
  File "/home/codeananda/anaconda3/envs/local/lib/python3.9/site-packages/joblib/parallel.py", line 288, in <listcomp>
    return [func(*args, **kwargs)
  File "/home/codeananda/anaconda3/envs/local/lib/python3.9/site-packages/neuralprophet/forecaster.py", line 795, in fit
    metrics_df = self._train(
  File "/home/codeananda/anaconda3/envs/local/lib/python3.9/site-packages/neuralprophet/forecaster.py", line 2648, in _train
    lr_finder = self.trainer.tuner.lr_find(
  File "/home/codeananda/anaconda3/envs/local/lib/python3.9/site-packages/pytorch_lightning/tuner/tuning.py", line 199, in lr_find
    result = self.trainer.tune(
  File "/home/codeananda/anaconda3/envs/local/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1052, in tune
    result = self.tuner._tune(
  File "/home/codeananda/anaconda3/envs/local/lib/python3.9/site-packages/pytorch_lightning/tuner/tuning.py", line 70, in _tune
    result["lr_find"] = lr_find(self.trainer, model, **lr_find_kwargs)
  File "/home/codeananda/anaconda3/envs/local/lib/python3.9/site-packages/pytorch_lightning/tuner/lr_finder.py", line 269, in lr_find
    trainer._checkpoint_connector.restore(ckpt_path)
  File "/home/codeananda/anaconda3/envs/local/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 140, in restore
    self.restore_model()
  File "/home/codeananda/anaconda3/envs/local/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 178, in restore_model
    if self._hpc_resume_path is not None:
  File "/home/codeananda/anaconda3/envs/local/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 66, in _hpc_resume_path
    max_version = self.__max_ckpt_version_in_folder(dir_path_hpc, "hpc_ckpt_")
  File "/home/codeananda/anaconda3/envs/local/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 506, in __max_ckpt_version_in_folder
    files = [os.path.basename(f["name"]) for f in fs.listdir(dir_path)]
  File "/home/codeananda/anaconda3/envs/local/lib/python3.9/site-packages/fsspec/spec.py", line 1313, in listdir
    return self.ls(path, detail=detail, **kwargs)
  File "/home/codeananda/anaconda3/envs/local/lib/python3.9/site-packages/fsspec/implementations/local.py", line 60, in ls
    return [self.info(f) for f in it]
  File "/home/codeananda/anaconda3/envs/local/lib/python3.9/site-packages/fsspec/implementations/local.py", line 60, in <listcomp>
    return [self.info(f) for f in it]
  File "/home/codeananda/anaconda3/envs/local/lib/python3.9/site-packages/fsspec/implementations/local.py", line 71, in info
    out = path.stat(follow_symlinks=False)
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/BreakoutDetector/.lr_find_2d942da8-285c-4fdd-b0bd-fb810083bb2a.ckpt'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/g/My Drive/1 Projects/1 AltDG - Adam/BreakoutDetector/BreakoutDetector.py", line 1471, in <module>
    results = Parallel(n_jobs=-1)(delayed(model.fit)(df) for df in dfs)
  File "/home/codeananda/anaconda3/envs/local/lib/python3.9/site-packages/joblib/parallel.py", line 1098, in __call__
    self.retrieve()
  File "/home/codeananda/anaconda3/envs/local/lib/python3.9/site-packages/joblib/parallel.py", line 975, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/home/codeananda/anaconda3/envs/local/lib/python3.9/site-packages/joblib/_parallel_backends.py", line 567, in wrap_future_result
    return future.result(timeout=timeout)
  File "/home/codeananda/anaconda3/envs/local/lib/python3.9/concurrent/futures/_base.py", line 446, in result
    return self.__get_result()
  File "/home/codeananda/anaconda3/envs/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/g/My Drive/1 Projects/1 AltDG - Adam/BreakoutDetector/.lr_find_2d942da8-285c-4fdd-b0bd-fb810083bb2a.ckpt'
(local) codeananda@King:/mnt/g/My Drive/1 Projects/1 AltDG - Adam/BreakoutDetector$ /home/codeananda/anaconda3/envs/local/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 8 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Using:

neuralprophet 0.5.0rc2 (downloaded and installed from GitHub)
python 3.9.15

3 replies

karl-richter Dec 9, 2022
Collaborator

Thanks for adding the example! I'm looking into this; I had to get some things done for our 0.5.0 release over the last few days. In general, I think this is related to PyTorch Lightning, since we do not handle checkpoints manually on our side. I also see some weird behavior with the learning rate finder on my local machine: it should technically delete the checkpoint after it has found a learning rate, but it never does.
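
For what it's worth, the tracebacks above look consistent with a race on the shared directory: `fsspec`'s `ls` first enumerates the directory entries and then calls `stat` on each one, so if another worker removes its own temporary `.lr_find_*.ckpt` in between, that `stat` fails. Below is a minimal single-process sketch of that sequence. It assumes a POSIX system (where `DirEntry.stat` needs a system call on every first use), and the file name is made up for illustration.

```python
import os
import tempfile


def stat_after_delete() -> str:
    """Enumerate a directory, delete an entry, then stat it (as fsspec's ls does)."""
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for another worker's temporary lr-finder checkpoint (hypothetical name).
        ckpt = os.path.join(tmp, ".lr_find_example.ckpt")
        open(ckpt, "w").close()

        # Step 1: enumerate the directory; entries are collected but not yet stat-ed.
        with os.scandir(tmp) as it:
            entries = list(it)

        # Step 2: the "other worker" finishes and removes its checkpoint.
        os.remove(ckpt)

        # Step 3: stat the stale entry, mirroring `path.stat(follow_symlinks=False)`.
        try:
            entries[0].stat(follow_symlinks=False)
        except FileNotFoundError as err:
            return type(err).__name__
        return "no error"


print(stat_after_delete())
```

On Linux this should report a `FileNotFoundError`, the same exception type as in the tracebacks. In the parallel case the timing between steps 1 and 3 is up to the scheduler, which would explain why the failure only shows up on some runs and on different series each time.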

karl-richter Dec 9, 2022
Collaborator

We have now released version 0.5.0 on PyPI, but I assume there are no major changes in the area where you are experiencing problems.

codeananda Dec 12, 2022
Author

Great that 0.5.0 has been released! Unfortunately, I still get the same error. Hmm ok, I might post on PyTorch Lightning then to see if they can shed any light on it.
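
One possible workaround to try in the meantime, assuming the temporary checkpoints are written to the current working directory (the paths in both tracebacks match the directory the script was run from): give each job its own scratch directory, so that no worker lists a directory another worker is writing to. This is an untested sketch against NeuralProphet itself; `run_in_private_cwd` is a hypothetical helper, not part of NeuralProphet, PyTorch Lightning, or joblib.

```python
import os
import tempfile
from typing import Any, Callable


def run_in_private_cwd(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call func(*args, **kwargs) with the process cwd set to a fresh temp directory."""
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as scratch:
        os.chdir(scratch)
        try:
            return func(*args, **kwargs)
        finally:
            # Restore the cwd before the scratch directory is removed.
            os.chdir(original_cwd)
```

With the example above, the call would become `Parallel(n_jobs=-1)(delayed(run_in_private_cwd)(model.fit, df) for df in dfs)`. This only makes sense with a process-based backend, since the working directory is per process; with threads, the `os.chdir` calls would interfere with each other. Anything else that `fit` writes relative to the working directory (for example `lightning_logs`) is discarded along with the scratch directory.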
