AR-Net Future Method Produces Autocorrelation Issues, Historic Data Errors Linearly Increase #1546
Replies: 4 comments
-
@ourownstory - I think you're probably the person who would need to answer this. I think this is mainly your method. I can see where you do it in your code, but I don't understand your rationale.
-
@mmangione Thank you for the nice explanation of what you are observing.
First, the accuracy of prediction on your holdout set tends to become progressively worse as you get farther from the end of the training data, especially if there are any trend or other paradigm changes. However, this seems to be a smaller contributing factor in your case.
Second, the prediction accuracy for each forecast step is lower as you predict farther into the future, i.e. further away from your prediction origin (the last state observed by the model). This is why yhat30 performs far worse than yhat1 - just as the weather forecast for tomorrow is far more accurate than one made for 30 days from today.
Third, your dataset is very small for the size of your model. If you are predicting 30 days ahead and are using 60 days as lagged observations, with no hidden layers, your model size for the AR component alone is (30 x 60 =) 1800 parameters, which will most likely overfit on (1.5 x 365 - 30 - 60 + 1 ≈) 458 data samples.
As you are using statistics commonly used with ARIMA and are referring to a 'shift-method [...] to iteratively produce future predictions', I presume you may be expecting that the model is fitted in a traditional ARIMA style for a single forecast step and then unrolled for the desired number of steps. This is however not the case - the model fits a matrix (with optional latent layers) regressing each lag onto each forecast step, akin to fitting a separate AR model for each forecast step.
I suggest reducing your forecast horizon, and if a larger horizon is needed, fitting a second model on lower-frequency data, e.g. weekly data. I hope this helps despite my late answer, and please let me know if I misunderstood anything.
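To make the sizing point concrete, here is a minimal sketch (my own illustration, not NeuralProphet's internal code) of why a 60-lag, 30-step AR component with no hidden layers amounts to a single 60x30 weight matrix, i.e. 1800 parameters fitted jointly rather than a 1-step model unrolled recursively:

```python
import torch.nn as nn

# Hedged sketch: with no hidden layers, the AR component is effectively one
# linear map from the 60 lagged observations to all 30 forecast steps,
# fitted jointly (direct multi-step forecasting).
n_lags, n_forecasts = 60, 30
ar_block = nn.Linear(n_lags, n_forecasts, bias=False)

n_params = sum(p.numel() for p in ar_block.parameters())
print(n_params)  # 60 * 30 = 1800 weights to estimate from roughly 458 samples
```

Reducing the forecast horizon (or resampling to weekly data and fitting a second model) shrinks this matrix proportionally.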
-
@ourownstory Thank you for taking the time to answer my questions! I appreciate the detail in your answers. They help me understand quite a bit about what's happening here and how I might improve the model. I have a couple of follow-up questions:
I have since switched from Ljung-Box to a more general Lagrange Multiplier method. It says the same thing: the emergence of a significant value indicates a potential "missing variable". In this case, supporting your point, an improper time horizon would appear as a missing variable, since the predictive power of the data alone is not enough. (Off topic, but it is curious that in the overfit state the autocorrelation measures all fell within tolerance and suggested no autocorrelation issues. That makes me wary of these methods for this application.)
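For reference, these diagnostics can be run on a single residual series roughly like this (my own sketch; `resid` is assumed to be the NaN-free array of y minus one yhat column):

```python
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox, acorr_lm
from statsmodels.stats.stattools import durbin_watson

def residual_diagnostics(resid: np.ndarray, lags: int = 10) -> dict:
    # Ljung-Box Q test; recent statsmodels versions return a DataFrame here.
    lb = acorr_ljungbox(resid, lags=[lags])
    # Lagrange Multiplier test for autocorrelation in the residuals.
    lm_stat, lm_pval, _, _ = acorr_lm(resid, nlags=lags)
    return {
        "ljung_box_p": float(lb["lb_pvalue"].iloc[0]),
        "lm_p": float(lm_pval),
        "durbin_watson": float(durbin_watson(resid)),  # ~2 means little lag-1 autocorrelation
    }
```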
I suspect that my model was simply transitioning from an overfit state to a better-fitting state through the regression process, although I am curious about your assessment. Thank you for taking the time to help me understand this. P.S. I have already improved my models significantly since your last post. Based on your feedback, I started to include economic data as regressor variables. That seems to have helped significantly, as these are unit shipments from a warehouse.
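For anyone following along, wiring such regressors into NeuralProphet might look roughly like this (a hedged sketch; `econ_index` is a hypothetical column name, and whether a lagged or a future regressor is appropriate depends on whether future values are known in advance):

```python
from neuralprophet import NeuralProphet

# df is assumed to have columns: ds, y, econ_index (hypothetical economic series)
m = NeuralProphet(n_lags=60, n_forecasts=30)

# Past-observed covariate, used like extra autoregressive inputs:
m.add_lagged_regressor("econ_index")
# Alternatively, if future values of the series are known ahead of time:
# m.add_future_regressor("econ_index")

metrics = m.fit(df, freq="D")
forecast = m.predict(df)
```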
-
Hi @mmangione, you are welcome. I would suggest using (seasonal) naive predictions as a baseline to troubleshoot and benchmark against. It is a solid, simple-to-interpret, yet not-so-simple-to-beat baseline in many applications. Next, look at how your accuracy (on your train set) changes from yhat1 to yhatN. To help properly, I would need to better understand your forecasting task and available data.
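As a rough illustration of that baseline (my own sketch, assuming daily data with weekly seasonality and the usual ds/y column layout): the seasonal naive forecast for day t is just the observed value one season earlier.

```python
import pandas as pd

def seasonal_naive_mae(df: pd.DataFrame, season_length: int = 7) -> float:
    """In-sample MAE of a seasonal naive forecast: predict y[t] with y[t - season_length]."""
    y = df["y"].to_numpy()
    pred = y[:-season_length]      # value one season earlier
    actual = y[season_length:]
    return float(abs(actual - pred).mean())
```

Any model worth keeping should beat this number comfortably on the same evaluation window.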
-
Discussed in #1519
Originally posted by mmangione January 23, 2024
We use NP for a number of prediction tasks, and I've been focusing on improving the accuracy of our forecasts. One problem I've been running into is that the shift-method the NP team uses to iteratively produce future predictions is undocumented, undiscussed, and unexamined. Can someone from the NP team explain it to me?
For context, here is the result of the training fit. This is a 60-day holdout test where we fit the training data and test it on the holdout set for prediction. It worked great:
Here is the result of the 30-day forecast fit:
As I progress forward in my future predictions, performance on the historic data suffers. The historic fits become less and less accurate, and every metric degrades. I first noticed this in the uncertainty measurements: I have been using CQR, and my miscoverage rate increases linearly with the yhat step.
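For concreteness, the miscoverage check can be written roughly as follows (a sketch under the assumption that the forecast dataframe exposes per-step quantile columns named like "yhat3 5.0%" / "yhat3 95.0%"; the exact naming may differ by NeuralProphet version):

```python
import pandas as pd

def miscoverage_by_step(forecast: pd.DataFrame, n_forecasts: int = 30,
                        lo: str = "5.0%", hi: str = "95.0%") -> pd.Series:
    """Fraction of observed y values falling outside the [lo, hi] interval, per forecast step."""
    rates = {}
    for i in range(1, n_forecasts + 1):
        lo_col, hi_col = f"yhat{i} {lo}", f"yhat{i} {hi}"
        sub = forecast[["y", lo_col, hi_col]].dropna()
        outside = (sub["y"] < sub[lo_col]) | (sub["y"] > sub[hi_col])
        rates[i] = float(outside.mean())
    return pd.Series(rates, name="miscoverage")
```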
So I began to examine my results and found that it is no fluke: each successive yhat has poorer performance on its own historic data. I then started looking at the residuals, and that's when I saw an indicator of what is happening... whatever operation is being performed on the data is introducing autocorrelation issues, and very significant ones too.
In one particular dataset, with daily data, 1.5 years of history, and a 30-day forward prediction window, yhat1 had:
However, when I looked at yhat30, it had:
For yhat1, the autocorrelation falls within acceptable parameters. For yhat30, the autocorrelation is very pronounced: DW says we've introduced positive autocorrelation errors, and LB simply says it's displaying significant issues - on its own historic prediction data.
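A rough sketch of the per-step check behind those numbers (my own illustration, assuming a NeuralProphet forecast dataframe with columns y, yhat1, ..., yhat30):

```python
import pandas as pd
from statsmodels.stats.stattools import durbin_watson

def dw_by_step(forecast: pd.DataFrame, n_forecasts: int = 30) -> pd.Series:
    """Durbin-Watson statistic of the residuals for each yhat column."""
    stats = {}
    for i in range(1, n_forecasts + 1):
        sub = forecast[["y", f"yhat{i}"]].dropna()
        resid = sub["y"] - sub[f"yhat{i}"]
        stats[i] = float(durbin_watson(resid))  # well below 2 indicates positive autocorrelation
    return pd.Series(stats, name="durbin_watson")
```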
TL;DR: Whatever is being done to produce these future predictions introduces autocorrelation issues, does not maintain consistency across ar-lag numbers, and seems to result in linearly increasing errors over the historical data period. These are pretty significant results.