Model vs. Distribution solution #25

Open
ariddell opened this issue Aug 7, 2015 · 5 comments

Comments

@ariddell
Contributor

ariddell commented Aug 7, 2015

The distinction between Model and Distribution is confusing for very simple models (e.g., sometimes a Gaussian is the model, and you want to .add_data to it). What about a class decorator or metaclass wrapper that creates a DistNameModel for each DistName without code duplication?
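
A hypothetical sketch of that wrapper (make_model is a made-up name, untested):

def make_model(dist_class):
    """Derive a *Model class from a Distribution class, adding data
    bookkeeping without duplicating any of the distribution code."""
    class Model(dist_class):
        def __init__(self, *args, **kwargs):
            self.data_list = []
            super(Model, self).__init__(*args, **kwargs)

        def add_data(self, data):
            self.data_list.append(data)

    Model.__name__ = dist_class.__name__ + 'Model'
    return Model

# e.g. GaussianModel = make_model(Gaussian)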

Or perhaps you're considering other API changes?

@mattjj
Owner

mattjj commented Aug 7, 2015

In this code a Model means something with extensive latent variables (in addition to intensive latent variables). That is, a Model has a latent variable for each data point, hence it needs an add_data to 'wrap' data sequences into objects that glue on those latent variables.

That's how it is now, but maybe something else would be better. I think you're suggesting that Distributions should also have an add_data so they can implicitly remember data, so that e.g. calling resample() with no arguments would implicitly be resampling based on that added data. Is that right?

If that's the case, we could probably accomplish it with a mixin like this one (an untested sketch):

class _AddDataMixin(object):
    def __init__(self, *args, **kwargs):
        self.data_list = []
        super(_AddDataMixin, self).__init__(*args, **kwargs)

    def add_data(self, data):
        self.data_list.append(data)

    def resample(self, data=None):
        # combine any explicitly passed data with the remembered data, so
        # that resample() with no arguments uses just the added data
        data = [] if data is None else [data]
        super(_AddDataMixin, self).resample(data + self.data_list)
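
Usage would then be ordinary mixin composition, e.g. (hypothetical sketch; assumes Gaussian's constructor accepts mu and sigma keyword arguments):

import numpy as np

class GaussianModel(_AddDataMixin, Gaussian):
    pass

m = GaussianModel(mu=np.zeros(2), sigma=np.eye(2))
m.add_data(np.random.randn(10, 2))
m.resample()  # resample parameters conditioned on the added data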

@mattjj
Owner

mattjj commented Aug 7, 2015

Can you spell out the use case you have in mind? Maybe gluing some data to a distribution for when we're working in a semi-supervised setting?

@ariddell
Contributor Author

ariddell commented Aug 7, 2015

Yeah, that was more or less what I was thinking of. Perhaps there's some additional checking the mixin should do to make sure that the thing it is being mixed in with already has a resample method?

Also, what's the convention you're following for adding _ before some mixins and not others?
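
One hypothetical way to write that check is to look past the mixin in the MRO (untested sketch, a variant of the __init__ above):

class _AddDataMixin(object):
    def __init__(self, *args, **kwargs):
        # hypothetical guard: whatever follows this mixin in the MRO must
        # already provide resample for the mixin's override to delegate to
        if not hasattr(super(_AddDataMixin, self), 'resample'):
            raise TypeError('_AddDataMixin must be mixed in before a '
                            'class with a resample method')
        self.data_list = []
        super(_AddDataMixin, self).__init__(*args, **kwargs)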


@mattjj
Owner

mattjj commented Aug 7, 2015

I think it's a Python convention to put an underscore in front of "internal" things that aren't part of the user API. Or maybe I just made it up.

@ariddell
Contributor Author

ariddell commented Aug 7, 2015

Got it. Just thinking that in this case one really wants to encourage reuse of the traits by users -- i.e., folks adding new distributions and (derived) models.

I might get around to doing this. It seems worthwhile just for teaching, e.g., showing how to get samples from a GaussianModel (derived from Gaussian), which, in theory, someone might want to do.
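
For instance, a teaching example might look like this (hypothetical; assumes the make_model wrapper sketched above and that Gaussian exposes an rvs sampling method):

GaussianModel = make_model(Gaussian)
model = GaussianModel(mu=np.zeros(2), sigma=np.eye(2))
samples = model.rvs(10)  # sampling is inherited unchanged from Gaussian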
