Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor!: view should return a consistent object #459

Closed
henryiii opened this issue Sep 26, 2020 · 5 comments
Closed

refactor!: view should return a consistent object #459

henryiii opened this issue Sep 26, 2020 · 5 comments

Comments

@henryiii
Copy link
Member

henryiii commented Sep 26, 2020

Currently, .view() returns two very different dtypes; "plain" dtypes / simple arrays in the case of simple storages, and structured types / View arrays in the case of accumulator storages. This makes it impossible to ducktype on the view. How about if it always returns a structured dtype? That is, a new View (SimpleView, for example) would be added:

@fields("value")
class SimpleView(View):
    ...

This way, if you want the histogram value, h.view().value works on simple storages too now, and therefore all storages that have the concept of value provide a .value. A single-field structured dtype is identical in memory to a simple dtype.

If you wanted to provide a simple output array of values, a .values() or .value()` method could be added to histogram (see the various cross-library protocol discussions). Or, you could require this as part of the Protocol (not as ideal for other libraries, but possible).

TLDR, values = h.view() is replaced by values = h.view().value, so now that works across all storage types.

@HDembinski, thoughts?

@HDembinski
Copy link
Member

I reject this. boost-histogram is supposed to a thin minimal wrapper around the C++ histogram and this is moving it away from that. For simple histograms with basic counters, view should return a simple array, not a structured array. I also do not want a .value() or .values() method on the histogram for reasons that I already explained. This is breaking the orthogonality of Boost.Histograms components.

@henryiii
Copy link
Member Author

henryiii commented Sep 28, 2020

Then how do we provide a consistent API? Please write a function that takes an arbitrary storage histogram and accesses the values. Assume this is a recipe you have to give to many packages (mplhep, fitting tools., etc), so excessive use of isinstance/dtype checks are discouraged; and other histogram packages (at least @HDembinski's Uproot4) have to be able to produce identical behaviors without depending on boost-histogram. Better yet, assume that some day we add a storage (maybe #378) that has no "values" at all. The latter should correctly break the function that you write, since there are no values to access. We need this in Scikit-HEP to plot, run fitting tools, and more. Having a structured array from .view() is something that would be hard to convince everyone of, but probably not impossible, if you really didn't want to have a .value()/.values(). Also, details of the dtype should be considered mostly private, so we have to depend on attributes.

Can I also remind you that you also don't want the storage type to be queryable from Python (the method is hidden)? But you want to only have an API that changes based on the storage type? You also insisted on only having one Histogram class, rather than one for each storage, but you then want the API to depend the hidden, interior type. Even from a pure Boost-Histogram perspective, shouldn't you be able to take an existing analysis, change the storage type from Double to Weight, and then start using the new information iteratively where you need the variances, instead of having the whole analysis break because a simple array is no longer returned?

Note that Boost.Histogram doesn't live in a larger ecosystem, and C++ doesn't have a strong history of duck typing (slightly more common with templated code now, but still you still always have the underlying type defined and accessible).

Also, C++ does not return a NumPy array. The NumPy array is already providing a view of the raw array that does not exist in C++. dtype=[('value', 'f8')]) is just as valid a NumPy view of the data as is dtype="f8".

I'm not stuck on any particular solution, but I want a solution. You seem to be saying you will not accept any change that solves this key issue?

@henryiii
Copy link
Member Author

A nice consequence: h.view().value = ... is valid Python syntax, while h.view() = ... is not.

@HDembinski
Copy link
Member

I don't want a consistent API for boost-histogram. You can stop suggesting anything, because I will reject all suggestions. Any attempt to offer a uniform API is limiting the flexibility and freedom of Boost.Histogram to offer arbitrary accumulators.

You can check the type of the view and react to that in wrappers.

@HDembinski
Copy link
Member

To be more specific, it is a terrible idea to try to uniform the view, which is the most direct technical access to the Accumulator. I may support adding a crude lossy generic API to the Histogram class itself. I added some thoughts on generic APIs to the issue on the generic protocol.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants