Replies: 4 comments 3 replies
-
Hi Corwin! This is very much something we'd be interested in. I think that TorchTyping would probably be the best option, unless there will be some "future" mode that backports PEP 646 to earlier versions of Python 3. In general, I think we want to avoid runtime checks because they add a lot of overhead. I realize our typing is a bit of a mess right now, so it would probably also be good to get some linting infrastructure in place that requires all function arguments to have types. @Balandat thoughts?
-
As an update, I've been exploring this concept using jaxtyping: https://github.com/google/jaxtyping
Please note that I think these run-time checks are also a helpful debugging tool for tracking down where dimensions (or storage types such as dense, sparse, diagonal, ...) change in an unexpected way. To illustrate some of the challenges, and what this would look like with linear_operator, here are a few key annotations for the base LinearOperator class. I think typing is helpful here, but I would welcome feedback:
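The original code snippet didn't survive here, so purely as an illustration, here is a minimal framework-free sketch of the kind of run-time shape checking I have in mind. `Shaped`, `shape_checked`, and `FakeTensor` are hypothetical stand-ins, not jaxtyping API; the point is that symbolic dimension names (like `"N"`) get bound consistently across arguments and the return value:

```python
import functools
import inspect
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a torch.Tensor; only the shape matters here."""
    shape: tuple


class Shaped:
    """Annotation carrying a symbolic shape, e.g. Shaped("N", "P")."""
    def __init__(self, *dims):
        self.dims = dims


def shape_checked(fn):
    """Verify annotated argument/return shapes at call time, binding
    symbolic dimension names consistently across all arguments."""
    sig = inspect.signature(fn)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        env = {}  # symbolic dim name -> concrete size seen so far

        def check(ann, shape, where):
            if not isinstance(ann, Shaped):
                return
            if len(ann.dims) != len(shape):
                raise TypeError(f"{where}: expected rank {len(ann.dims)}, got {len(shape)}")
            for name, size in zip(ann.dims, shape):
                if env.setdefault(name, size) != size:
                    raise TypeError(f"{where}: dim {name} was {env[name]}, got {size}")

        for pname, value in bound.arguments.items():
            check(sig.parameters[pname].annotation, value.shape, pname)
        result = fn(*args, **kwargs)
        check(sig.return_annotation, result.shape, "return")
        return result

    return wrapper


@shape_checked
def matmul(a: Shaped("M", "N"), b: Shaped("N", "P")) -> Shaped("M", "P"):
    # Stand-in for LinearOperator.matmul: only shape bookkeeping is shown.
    return FakeTensor(shape=(a.shape[0], b.shape[1]))
```

With this in place, `matmul(FakeTensor((2, 3)), FakeTensor((5, 4)))` fails at call time because `N` binds to 3 in the first argument and 5 in the second, which is exactly the kind of dimension mismatch I'd like to surface automatically.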
Here is a simple example where I think a layout field could help:
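The example itself was lost from this post, so here is an illustrative sketch of what I mean (these names are hypothetical, not jaxtyping API): an annotation-style check that records the expected storage layout alongside the shape, so an operation that silently densifies a diagonal operator gets flagged.

```python
from dataclasses import dataclass


@dataclass
class DiagTensor:
    """Hypothetical tensor stand-in carrying an explicit storage layout."""
    shape: tuple
    layout: str  # "dense", "sparse", or "diagonal"


def require_layout(value, expected, where):
    """Raise if the actual storage layout differs from the annotated one."""
    if value.layout != expected:
        raise TypeError(f"{where}: expected {expected} storage, got {value.layout}")


def add_jitter(op):
    # Adding jitter to the diagonal should preserve diagonal storage;
    # the layout check catches an accidental densification.
    require_layout(op, "diagonal", "op")
    out = DiagTensor(shape=op.shape, layout="diagonal")
    require_layout(out, "diagonal", "return")
    return out
```

A check like this would have caught the kind of bug in cornellius-gp/gpytorch#2140, where adding jitter unexpectedly evaluated (densified) the kernel.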
I think we can improve on this by adding a field to the jaxtyping annotation that specifies the internal storage layout (similar to what is done in TorchTyping).
A big potential application of this could be to track when a linear operator changes from "sparse" to "dense". Finally, there are some operations that can dynamically add/remove dimensions. I'm not sure how to define a better signature for these:
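To make the difficulty concrete (this is just a sketch, not linear_operator code): an operation like squeeze removes a dimension chosen at runtime, so the output rank depends on the input rank and the argument, and no single static `Shaped(...)`-style annotation can describe it.

```python
def squeeze_shape(shape, dim):
    """Shape after removing a size-1 dimension at `dim`.

    The output rank depends on the input rank, so there is no single
    static signature that covers every call; the best a checker can do
    is validate the result dynamically."""
    if shape[dim] != 1:
        raise ValueError(f"cannot squeeze dim {dim} of size {shape[dim]}")
    return shape[:dim] + shape[dim + 1:]
```

For these dynamic cases, the signature can probably only constrain what is invariant (e.g. element type, storage layout) and leave the rank to a run-time check.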
A work-in-progress branch where I am exploring these ideas can be found at https://github.com/corwinjoy/linear_operator/tree/jaxtyping.
-
Awesome, that all sounds great. I'm happy to take a look at the PR when it's ready. DaCe sounds very interesting, I'll check out the video!

On 23 Nov 2022 03:04, Corwin Joy wrote:
I'm still working on this. I got run-time type checking turned on for the unit tests, which is great for properly testing these signatures. The downside is that around 400 of the existing signatures were imprecise or inaccurate, so I have to work through a number of changes to get accurate starting signatures. This will therefore be a bigger PR than I would like. In the meantime, I was at SC22 last week and came across a very interesting project by Alexandros Ziogas: a framework called DaCe. It can take improved type information and use it to generate better parallel implementations of numerical algorithms. This might help us further accelerate some of our operations once these signatures are in place. Here is a link; the introductory video is probably the best place to start:
DaCe Framework
-
@dannyfriar @gpleiss I have gone ahead and opened an initial PR with these improved signatures. There is more to do along these lines, but I believe this is a step in the right direction. I hope it can spark a productive discussion around these ideas. Thanks!
-
@dannyfriar and I have been discussing ideas on how to improve the library. One good suggestion that he made is that stronger type annotations could be a big help.
In fact, I've noticed that a number of recent issues have involved type difficulties, which stronger annotations might be able to catch. For example:
Avoid evaluating kernel when adding jitter cornellius-gp/gpytorch#2140
The "predict" method in the Deep GP tutorial does not work correctly. cornellius-gp/gpytorch#1892
Broadly speaking, both of these are type errors: one involves broadcasting, the other the matrix storage type.
I think that better checking of [dimensions, element type, matrix storage type] might make the code clearer and help catch some of these bugs.
I am happy to add these annotations, but I am not sure what the best approach would be.
Some options include:
The article "Ideas for array shape typing in Python" gives an overview of how shape types can be helpful.
My plan would be to start with the linear_operator library and see if I can make the functions more explicit there. One simple idea would be to do a minor extension of TorchTyping like:
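The snippet that originally followed here was lost, so as a purely illustrative sketch (not TorchTyping's actual API; all names below are hypothetical), the extension could pair a symbolic shape with a dtype and a storage field, checked against concrete tensor metadata:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorSpec:
    """Hypothetical TensorType-like spec: shape + dtype + storage layout."""
    shape: tuple   # mix of symbolic names and concrete ints, e.g. ("N", "N")
    dtype: str     # e.g. "float32"
    storage: str   # e.g. "dense", "sparse", "diagonal"


def conforms(meta, spec):
    """Check concrete tensor metadata (a dict with "shape"/"dtype"/"storage")
    against a spec; symbolic dims must bind consistently, concrete dims
    must match exactly."""
    if len(meta["shape"]) != len(spec.shape):
        return False
    bindings = {}  # symbolic dim name -> concrete size
    for got, want in zip(meta["shape"], spec.shape):
        if isinstance(want, int):
            if got != want:
                return False
        elif bindings.setdefault(want, got) != got:
            return False
    return meta["dtype"] == spec.dtype and meta["storage"] == spec.storage
```

For example, a square dense covariance matrix could be described as `TensorSpec(("N", "N"), "float32", "dense")`, and a non-square or sparse tensor would fail the check.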
This would give checking of dimensions + data type but would need to be extended to capture the storage type (dense, interleaved, etc.).
Some of the dynamic functions like slicing and permuting would not be helped much by this approach since it is hard to say too much statically.
Anyway, I wanted to see if there was interest in this idea and/or suggestions on how best to proceed.
Thanks!
Corwin