-
I really like the design of JAX: at its core, it is a program transformation system. So I was wondering how easy it would be to implement some custom program transformations of my own. For example, I wanted to implement the following transformation:
In particular, what the transformation does is identify the subexpression
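For context on what such a transformation would operate on: `jax.make_jaxpr` exposes the jaxpr intermediate representation, and identifying a subexpression amounts to matching over its equations. A minimal sketch (`f` here is just an arbitrary illustration, not the elided transformation above):

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sin(x) * jnp.cos(x)

# Trace f into its jaxpr intermediate representation.
closed_jaxpr = jax.make_jaxpr(f)(1.0)
print(closed_jaxpr)

# Each equation records one primitive application; a custom
# transformation would pattern-match over these equations to
# find the target subexpression.
names = [eqn.primitive.name for eqn in closed_jaxpr.jaxpr.eqns]
print(names)
```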
-
This seems like it would do better as a GitHub discussion (https://github.com/google/jax/discussions) rather than an issue. There's no action for us to take here. Closing; feel free to reopen in the discussions section. To answer your concrete question, have you seen: ?
-
I also saw the callback transformation: https://github.com/google/jax/blob/master/jax/experimental/callback.py#L32. This does seem relevant, though see the PR that created it: #2665.
I just hope there are some simple working examples (callback.py counts, modulo the warning above) of using Trace and Tracer, so I can dig into writing custom transformations myself.
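Absent such an example, here is a minimal sketch of the related "custom interpreter" pattern, which sidesteps Trace/Tracer subclassing entirely: trace a function to a jaxpr with `jax.make_jaxpr`, then re-evaluate its equations with one primitive rewritten. The `taylorify` name and the `exp(x) → 1 + x` rule (a first-order Taylor expansion around 0) are my own illustration, not anything from the JAX codebase:

```python
import jax
import jax.numpy as jnp

def eval_jaxpr(jaxpr, consts, *args):
    """Interpret a jaxpr, rewriting exp(x) into its first-order
    Taylor approximation 1 + x (expansion around 0)."""
    env = {}

    def read(var):
        # Literals carry their value directly; variables live in env.
        return var.val if hasattr(var, "val") else env[var]

    for var, val in zip(jaxpr.constvars, consts):
        env[var] = val
    for var, val in zip(jaxpr.invars, args):
        env[var] = val

    for eqn in jaxpr.eqns:
        invals = [read(v) for v in eqn.invars]
        if eqn.primitive.name == "exp":
            outvals = [1.0 + invals[0]]  # the rewrite rule
        else:
            # Default: apply the primitive unchanged.
            out = eqn.primitive.bind(*invals, **eqn.params)
            outvals = out if eqn.primitive.multiple_results else [out]
        for var, val in zip(eqn.outvars, outvals):
            env[var] = val

    return [read(v) for v in jaxpr.outvars]

def taylorify(fun):
    """Hypothetical transformation: run fun with exp approximated."""
    def wrapped(*args):
        closed = jax.make_jaxpr(fun)(*args)
        return eval_jaxpr(closed.jaxpr, closed.consts, *args)
    return wrapped

# exp(1) + 1 ≈ 3.718, but under the approximation: (1 + 1) + 1 = 3.0
f = lambda x: jnp.exp(x) + x
print(taylorify(f)(1.0))
```

This mirrors the structure of JAX's own `core.eval_jaxpr`; a real transformation would likely want to stage out a new jaxpr rather than interpret eagerly, but the traversal is the same.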
-
I think I'm going to lay out my plan here, just to see whether the JAX team has any advice on, or interest in, all or part of it.
I'm interested in using custom transformations for a scenario of "distributed approximate optimization" (name not determined yet). The idea is that the user first writes a non-distributed model, and my tool will:
1) Generate an approximation to that model, with the "Taylor approximation transformation" I mentioned above
2) Perform optimization on the approximated model, still non-distributed (standard JAX stuff, so no big deal here)
3) Split the non-distributed function into a distributed one, potentially with a JAX transformation. This sort of works like pmap, but in…
Right now, I do sort of see how I can achieve 1) and 2). But I'm a bit afraid of 3), in that a jaxpr may be too high level to perform this analysis (though in theory it's all possible). I do see that 3) is potentially of interest to others and the JAX team, as it's basically a
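For step 3), the standard pmap pattern the plan is comparing against looks roughly like this (a sketch only; the real tool would generate the split automatically, and this assumes the data's leading axis matches the local device count):

```python
import jax
import jax.numpy as jnp

n_dev = jax.local_device_count()

# The non-distributed model the user writes.
def model(params, x):
    return jnp.dot(x, params)

# pmap replicates the computation across devices; params are
# broadcast (in_axes=None) and each device gets one shard along
# the leading axis of x (in_axes=0).
distributed = jax.pmap(model, in_axes=(None, 0))

params = jnp.ones((3,))
x = jnp.ones((n_dev, 4, 3))   # one shard of shape (4, 3) per device
out = distributed(params, x)  # shape (n_dev, 4)
```

The difference in the plan, as I read it, is that pmap requires the user to reshape and shard the inputs themselves, whereas the proposed transformation would derive the split from the non-distributed program.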