Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[minor] Support data without datestamps #1188

Closed
wants to merge 30 commits into from

Conversation

SimonWittner
Copy link
Collaborator

@SimonWittner SimonWittner commented Mar 2, 2023

🔬 Background

🔮 Key changes

  • Created a helper function to add equidistant datestamps
  • Function can be manually called and adjusted to the required dates and frequency
  • Function is automatically called if df consists only of column "y"

📋 Review Checklist

  • I have performed a self-review of my own code.
  • I have commented my code, added docstrings and data types to function definitions.
  • I have added pytests to check whether my feature / fix works.

@karl-richter
Copy link
Collaborator

karl-richter commented Mar 8, 2023

Hi @SimonWittner, I know you are just getting started with this pr - but maybe have a look at this. This will make it easier to have meaningful commit messages, pr titles and so on :)

Also, the model/metrics pytest does currently not work from forks yet. You can ignore it if that pytest fails for now.

@noxan noxan added the status: in development Pull requests which are in development label Mar 8, 2023
@SimonWittner SimonWittner changed the title Start [minor] Support data without datestamps Mar 8, 2023
@ourownstory
Copy link
Owner

Welcome @SimonWittner !
You picked a not so trivial issue to get started with. I look forward to hear how you plan to tackle it!
I suggest you have a chat with @judussoari as he is working on sample and datestamp processing things right now.

@SimonWittner
Copy link
Collaborator Author

Thank you for your messages! @karl-richter please let me know, if I can further improve the PR.
@ourownstory I'll get back to you!

@codecov-commenter
Copy link

codecov-commenter commented Mar 18, 2023

Codecov Report

Merging #1188 (509c618) into main (2b303f8) will decrease coverage by 0.15%.
The diff coverage is 81.15%.

❗ Current head 509c618 differs from pull request most recent head 82ba128. Consider uploading reports for the commit 82ba128 to get more accurate results

@@            Coverage Diff             @@
##             main    #1188      +/-   ##
==========================================
- Coverage   89.88%   89.74%   -0.15%     
==========================================
  Files          38       38              
  Lines        5103     5139      +36     
==========================================
+ Hits         4587     4612      +25     
- Misses        516      527      +11     
Impacted Files Coverage Δ
neuralprophet/data/process.py 91.96% <50.00%> (-0.60%) ⬇️
neuralprophet/forecaster.py 86.67% <79.59%> (-0.73%) ⬇️
neuralprophet/data/split.py 90.47% <80.00%> (-0.35%) ⬇️
neuralprophet/df_utils.py 95.31% <100.00%> (+0.05%) ⬆️

@SimonWittner SimonWittner marked this pull request as ready for review March 21, 2023 20:41
Copy link
Owner

@ourownstory ourownstory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @SimonWittner for your contribution!

While I was thinking of adding true support for data without datestamps, which would be entail large codebase changes, your approach is a rather simple alternative that may get the job done. I think this is a good addition, so I'd like to proceed to get your contribution merge-ready.
To do so, I suggest to still do the following:

  • rename mentions of 'random' to 'dummy' or 'placeholder'
  • use seconds as the frequency (to avoid issues with very long time-series)
  • make sure to disable auto-seasonality when your code is triggered. (as the daily/weekly/yearly seasonalities will have no meaning)
  • May there be a better location (less nested and less frequently called) than prep_or_copy_df for your code to be called?
  • Add to the user warning that the frequency was set to seconds
  • soft suggestion: have the function only return a series of datestamps, (no passing of df), and use a more efficient implementation of adding the column to the df than concat (outside the function).

Please let me know if you disagree with my suggestions or have further ideas

@SimonWittner
Copy link
Collaborator Author

Thank you @ourownstory for your feedback! I implemented all your suggestions. Please let me know if I can further improve the PR!

@leoniewgnr leoniewgnr added status: needs review PR needs to be reviewed by Reviewer(s) and removed status: in development Pull requests which are in development labels Apr 26, 2023
@leoniewgnr leoniewgnr added this to the Release 1.0.0 RC1 milestone Apr 26, 2023
neuralprophet/df_utils.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@leoniewgnr leoniewgnr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just change back to IDs and lets check why one test fails

@leoniewgnr leoniewgnr added status: ready PR is ready to be merged and removed status: needs review PR needs to be reviewed by Reviewer(s) labels Apr 27, 2023
Copy link
Collaborator

@leoniewgnr leoniewgnr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@leoniewgnr leoniewgnr requested a review from ourownstory April 27, 2023 04:07
Copy link
Owner

@ourownstory ourownstory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good changes!

I noticed that the model is passed to the datestamp creation function.
Let us separate the disabling (editing of model config) from the timestamps creation.
So, for datestamp creation function, let us not pass the model, but only the precise args that are needed.

Further, the editing of model parameters (in a separate function), should not happen in make_future_dataframe. If seasonalities are enabled there and no datestamps present, it should create an error.

@@ -513,6 +511,42 @@ def check_single_dataframe(df, check_y, covariates, regressors, events, seasonal
return df


def create_dummy_datestamps(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall!

Let us separate the disabling (editing of model config) from the timestamps creation.

So, for this function, let us not pass the model, but only the precise args that are needed.

@leoniewgnr leoniewgnr added status: needs update PR has outstanding comment(s) or PR test(s) that need to be resolved and removed status: ready PR is ready to be merged labels Apr 27, 2023
@ourownstory ourownstory removed this from the Release 1.0.0 RC1 milestone Apr 28, 2023
@leoniewgnr leoniewgnr added status: needs review PR needs to be reviewed by Reviewer(s) and removed status: needs update PR has outstanding comment(s) or PR test(s) that need to be resolved labels Apr 29, 2023
@SimonWittner
Copy link
Collaborator Author

SimonWittner commented May 12, 2023

@ourownstory in the latest commit, I removed the function call for create_dummy_datestamps from check_dataframe.
However in this commit, the create_dummy_datestamps is sometimes called in a forecaster function (for example in crossvalidation_split_df).

I think that it would be prettier to leave the function call in check_dataframe (commit 467fee9 "improved model handling_v3").

Please let me know which commit you prefer!

@SimonWittner
Copy link
Collaborator Author

Handling of data without datestamp fixed with #1432

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: needs review PR needs to be reviewed by Reviewer(s)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Non-date data: Support equidistant data without datestamps
7 participants