Hi everyone,
I have some questions regarding the right way of accessing values from the posterior, depending on what I want to compare.
Based on PyMC-Marketing I implemented an MMM (business requirements and legacy-code constraints made a custom implementation necessary) with the following structure:
Part of the control variables are trend, holiday, and seasonality components created with Prophet (so I did not use the Fourier contributions provided by PyMC-Marketing).
For the control coefficients I used a truncated normal distribution, allowing negative values for all of them except the ones related to the Prophet components.
To make this more structured I will number my questions:
1. To get control and channel contributions, I currently take the median (I also tried the mean) of all parameters, then apply the transformations and multiply by the coefficients, the same way it is done e.g. in the budget-allocation code. When I instead access the posterior median of the channel or control contributions directly (as is also done in some places in the PyMC-Marketing code), I get different values. What could cause this, and what is the right way to access these values?
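Regarding the first question: one likely cause is that summary statistics do not commute with the model's operations. The median of a sum of posterior draws is generally not the sum of the per-component medians (and similarly, nonlinear transforms of summarised parameters differ from summaries of transformed draws). A minimal NumPy sketch with made-up toy draws (not actual model output):

```python
import numpy as np

rng = np.random.default_rng(42)
n_draws = 4000

# Toy posterior draws of two channel contributions and the intercept
# for a single time period (right-skewed, as contributions from
# positivity-constrained coefficients with saturation typically are).
contrib_ch1 = rng.lognormal(mean=0.0, sigma=0.5, size=n_draws)
contrib_ch2 = rng.lognormal(mean=0.3, sigma=0.7, size=n_draws)
intercept = rng.normal(loc=1.0, scale=0.2, size=n_draws)

# Consistent order: build the total per posterior draw, then summarise.
median_of_sum = np.median(contrib_ch1 + contrib_ch2 + intercept)

# Tempting but different: summarise each component, then combine.
sum_of_medians = (
    np.median(contrib_ch1) + np.median(contrib_ch2) + np.median(intercept)
)

print(median_of_sum, sum_of_medians)  # the two values disagree
```

Working per draw and summarising only at the very end keeps all quantities on the same footing; summarising parameters first and then transforming is a different (and generally inconsistent) estimator.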
2. What is the difference between extracting the posterior median of mu and the posterior median of y? What confuses me is this: when interpreting the influence of variables, we basically only look at their influence on mu and not on the likelihood y, right? What implications does this have, for example when sampling from the posterior predictive distribution?
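On mu versus y: mu is the model's expected value for each observation (intercept plus contributions), while posterior predictive draws of y additionally include the likelihood's observation noise. With a symmetric likelihood the central summaries are close, but the predictive intervals for y are wider. A toy sketch with made-up posterior draws:

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws = 4000

# Toy posterior draws for a single observation: mu is the model's
# expected value (intercept + contributions), sigma the likelihood's
# observation-noise scale.
mu = rng.normal(loc=10.0, scale=0.5, size=n_draws)
sigma = np.abs(rng.normal(loc=1.0, scale=0.1, size=n_draws))

# Posterior predictive draws of y put likelihood noise on top of mu.
y_pred = rng.normal(loc=mu, scale=sigma)

# Central summaries are close here (symmetric noise) ...
print(np.median(mu), np.median(y_pred))

# ... but the predictive interval for y is wider than the one for mu,
# because y carries observation noise in addition to parameter
# uncertainty.
mu_lo, mu_hi = np.percentile(mu, [2.5, 97.5])
y_lo, y_hi = np.percentile(y_pred, [2.5, 97.5])
print(mu_hi - mu_lo, y_hi - y_lo)
```

This is why variable contributions are interpreted through mu, while posterior predictive checks compare observed data against y.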
3. This question may or may not be connected to the previous ones. Below you see:
a) the posterior predictive plot (which looks quite good)
b) a residual dependence plot (comparing the observed values with the sum of channel contributions, control contributions, and intercept); the dashed line is at 0
c) a goodness of fit plot (comparing the observed values (red) with the sum of median channel contributions, control contributions and intercept (blue))
Unfortunately I had to remove the axis labels.
What is directly noticeable, however, is a kind of constant bias, especially visible in the residual dependence plot. It almost looks like the model is predicting my observed values plus a constant. In the posterior predictive plot this is not as apparent.
Do you have any idea what could lead to this bias? I have tried different priors, which reduced the bias to the level you see in the plots but did not remove it completely. Am I accessing the wrong quantity by using the sum of the median channel contributions, median control contributions, and median intercept? If so, why would this be (vastly) different from the median of y and lead to this constant bias?
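For what it's worth, a reconstruction built from per-component medians can produce exactly this symptom: when the component posteriors are skewed (as positivity-constrained channel contributions typically are), the sum of the per-component medians differs from the median of the per-draw totals by a roughly constant amount across time points, which then shows up as a constant offset in the residuals. A toy sketch with made-up, hypothetical posterior shapes:

```python
import numpy as np

rng = np.random.default_rng(7)
n_draws, n_obs = 2000, 50

# Toy per-draw component contributions for each time point
# (made-up, right-skewed shapes standing in for MMM posteriors).
channel = rng.lognormal(mean=0.0, sigma=0.6, size=(n_draws, n_obs))
control = rng.lognormal(mean=-0.5, sigma=0.4, size=(n_draws, n_obs))
intercept = rng.normal(loc=2.0, scale=0.1, size=(n_draws, 1))

total = channel + control + intercept  # per-draw fitted values

# Pretend the observed series sits at the median of the per-draw
# totals, plus a little noise.
observed = np.median(total, axis=0) + rng.normal(0, 0.05, size=n_obs)

# Residuals against the median of the per-draw totals: centred at zero.
resid_ok = observed - np.median(total, axis=0)

# Residuals against the sum of per-component medians: a roughly
# constant offset appears across all time points.
reconstruction = (
    np.median(channel, axis=0)
    + np.median(control, axis=0)
    + np.median(intercept)
)
resid_biased = observed - reconstruction

print(resid_ok.mean(), resid_biased.mean())
```

If this is what is happening, forming the total per posterior draw and summarising last should make the offset shrink or disappear.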
I hope I managed to convey what my problem is. If you need more information to answer any of these questions, please let me know!
I would highly appreciate an answer to any of these questions!