Indexable model #167

Status: Open (wants to merge 2 commits into master)


pdeffebach

This is a very short PR that serves as a test for a feature that I think would be nice in StatsModels. It allows you to index a model by a Term.

It is motivated by Stata functionality like _b[`var'], which lets you get the beta coefficient for the column represented by var. This is really useful when making tables and graphs programmatically.

My approach is to take in a model and an AbstractTerm, then check whether the AbstractTerm roughly matches something in the model. If it does match, it returns a NamedTuple with the coefficient name, the coefficient, and its standard error.

julia> using StatsModels, GLM  # getparams is added by this PR
julia> t = (y = rand(100), x = rand(100), b = rand(Bool, 100));
julia> m = lm(@formula(y ~ x + x & b), t);
julia> getparams(m, Term(:x))

Note that in the last line Term(:x) is not a ContinuousTerm or CategoricalTerm; I just match on the term's sym field.
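A minimal sketch of the lookup idea in plain Julia, assuming nothing beyond Base. FittedSketch and getparams_sketch are hypothetical names, and the coefficient names and values are made up for illustration. To avoid depending on StatsModels, the sketch takes the symbol directly (the PR instead takes a Term and matches its sym field), and it matches names exactly, whereas the PR only describes its matching as rough.

```julia
# Hypothetical stand-in for a fitted model, keeping only what the lookup needs.
# A real StatsModels/GLM model would supply these via coefnames, coef and stderror.
struct FittedSketch
    names::Vector{String}     # coefficient names, e.g. "x" or "x & b"
    coefs::Vector{Float64}    # point estimates
    stderrs::Vector{Float64}  # standard errors
end

# Hypothetical lookup in the spirit of getparams: find the coefficient whose
# name equals the symbol and return that row as a NamedTuple.
function getparams_sketch(m::FittedSketch, sym::Symbol)
    i = findfirst(==(String(sym)), m.names)
    i === nothing && throw(ArgumentError("no coefficient named $(sym)"))
    return (name = m.names[i], coef = m.coefs[i], stderror = m.stderrs[i])
end

# Illustrative values only; they do not come from a real fit.
m = FittedSketch(["(Intercept)", "x", "x & b"], [0.5, 0.2, -0.1], [0.05, 0.08, 0.12])
getparams_sketch(m, :x)  # returns (name = "x", coef = 0.2, stderror = 0.08)
```

Because the match is exact, :x does not pick up the "x & b" interaction coefficient; how loosely a term should match is a design choice this kind of lookup has to settle.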

@codecov-io

codecov-io commented Jan 4, 2020

Codecov Report

Merging #167 into master will decrease coverage by 2.5%.
The diff coverage is 0%.


@@            Coverage Diff             @@
##           master     #167      +/-   ##
==========================================
- Coverage   84.78%   82.28%   -2.51%     
==========================================
  Files           9        9              
  Lines         493      508      +15     
==========================================
  Hits          418      418              
- Misses         75       90      +15
Impacted Files       Coverage Δ
src/statsmodel.jl    68.91% <0%> (-17.53%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0e59b8e...1364d83.

@Tokazama

Tokazama commented Jan 6, 2020

I think this might begin to address a larger problem we have. Although we can go from a formula to a model, we have no way to do the reverse. I know this is difficult to ensure, because it's up to packages like GLM.jl and MixedModels.jl to support this sort of behavior, but if we want to be able to plot the results of lm(@formula(y ~ x + x & b), t), then we need to be able to parse out the relevant terms as intercepts, slopes, etc.

@kleinschmidt
Member

I like this idea generally, and it's related to other discussions about overhauling the modeling API (e.g., requiring that StatisticalModels keep the formula themselves, instead of relying on the wrapper type like we currently do). I think #32 is the relevant issue.

@pdeffebach
Author

Presumably this would be a fallback, returning a NamedTuple with just the statistics that models are required to have (if any), right? Most packages would then have to write their own getparams method.
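A minimal sketch of how such a fallback could be layered with Julia's multiple dispatch. It assumes a hypothetical abstract type SketchModel with three hypothetical accessor functions (sketch_coefnames, sketch_coef, sketch_stderror) standing in for the real StatsBase interface: the generic method returns only the shared statistics, and a package adds a more specific method that returns extra fields. None of these names are part of StatsModels.

```julia
# Hypothetical abstract type standing in for a shared model interface.
abstract type SketchModel end

# Accessors every SketchModel is assumed to implement (declared without methods).
function sketch_coefnames end
function sketch_coef end
function sketch_stderror end

# Generic fallback: relies only on the accessors, so it works for any model
# that implements them, and returns only the shared statistics.
function sketch_getparams(m::SketchModel, name::String)
    i = findfirst(==(name), sketch_coefnames(m))
    i === nothing && throw(KeyError(name))
    return (name = name, coef = sketch_coef(m)[i], stderror = sketch_stderror(m)[i])
end

# A simple model implements the accessors and inherits the fallback.
struct SketchLM <: SketchModel
    names::Vector{String}
    b::Vector{Float64}
    se::Vector{Float64}
end
sketch_coefnames(m::SketchLM) = m.names
sketch_coef(m::SketchLM) = m.b
sketch_stderror(m::SketchLM) = m.se

# A package with richer output defines a more specific method that extends
# the fallback's NamedTuple with its own field (dof is illustrative only).
struct SketchRichModel <: SketchModel
    inner::SketchLM
    dof::Int
end
sketch_getparams(m::SketchRichModel, name::String) =
    merge(sketch_getparams(m.inner, name), (dof = m.dof,))

simple = SketchLM(["(Intercept)", "x"], [1.0, 2.0], [0.1, 0.2])
sketch_getparams(simple, "x")                       # returns (name = "x", coef = 2.0, stderror = 0.2)
sketch_getparams(SketchRichModel(simple, 98), "x")  # same fields plus dof = 98
```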

@Tokazama
Copy link

Tokazama commented Jan 8, 2020

After looking at #32, it seems like we want to move towards the formula interface and rely less on TableModels. Would it make sense to have a getparams for formulas so that it could be model-agnostic? This wouldn't necessarily preclude the current PR, but it would make it easier to extend to new types of terms.

@Tokazama

Related issue is #111, which would be extremely useful.

@pdeffebach
Author

I'm reviving this.

David, do you have anything in mind for how exactly this should work? The interlinking of PRs and issues here suggests this problem is stuck in a chicken-and-egg situation.

Is there concrete groundwork that has to be done before we can develop this feature further?

@palday
Member

palday commented Mar 15, 2020

What would this look like for FormulaTerms? Random effects in MixedModels.jl are initially parsed as FormulaTerms.

@pdeffebach
Author

> What would this look like for FormulaTerms? Random effects in MixedModels.jl are initially parsed as FormulaTerms.

I don't know! Probably something like

julia> getparams(m, Term(RandomEffect, :x))

But my knowledge of all of this is pretty weak.

@Tokazama I agree that #111 would be super useful here. It would make it a lot easier to index into a model.

@pdeffebach
Author

Just realized that R's outputs are indexable by name, via heavy use of named arrays:

r$> confint(m_ols)
                 2.5 %    97.5 %
(Intercept)  3.8836484 5.5022774
exprop       0.3584959 0.6164465
latitude    -0.2919431 2.3197287

r$> confint(m_ols)["exprop", "2.5 %"]
[1] 0.3584959
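For comparison, a minimal Julia sketch of the same label-based lookup, assuming only Base. LabeledMatrix and labelindex are hypothetical names, not an existing package API; the numbers are the confidence intervals printed by R above.

```julia
# Hypothetical 2-D table whose rows and columns are addressable by label,
# mimicking R's dimnames-based indexing.
struct LabeledMatrix
    rownames::Vector{String}
    colnames::Vector{String}
    data::Matrix{Float64}
end

# Position of a label, throwing a KeyError if it is missing.
function labelindex(labels::Vector{String}, label::String)
    i = findfirst(==(label), labels)
    i === nothing && throw(KeyError(label))
    return i
end

# Index by (row label, column label), as R does with confint(m)["exprop", "2.5 %"].
Base.getindex(t::LabeledMatrix, r::String, c::String) =
    t.data[labelindex(t.rownames, r), labelindex(t.colnames, c)]

ci = LabeledMatrix(
    ["(Intercept)", "exprop", "latitude"],
    ["2.5 %", "97.5 %"],
    [3.8836484 5.5022774;
     0.3584959 0.6164465;
     -0.2919431 2.3197287],
)

ci["exprop", "2.5 %"]  # returns 0.3584959
```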

I regret having let this languish. Hopefully I can pick it up soon.

Labels: None yet
Projects: None yet

5 participants