Make a partial function into a pytree #21475
-
Hello, I've written a library which takes any function and converts a partial application of it into a pytree. The idea is that a model has two parts, the parameters and the input: we want to vary the input and differentiate with respect to the parameters. It's like a differentiable version of functools.partial. I think it would be cool if jax provided such a feature by default. It's kinda like equinox, if you only implement … You write:

```python
@funtree
def Model(input, parameter):
    return input + parameter

model = Model(parameter=jnp.array([1.0, 2.0, 3.0]))
# model is a pytree, can get gradients wrt parameter
grad = jax.grad(lambda f, x: f(x).sum())(model, input)
```
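For context on how little machinery this needs: the sketch below is my own rough reconstruction of the mechanism (not the funtree implementation linked further down), assuming only the public `jax.tree_util.register_pytree_node_class` API. The bound keyword arguments become pytree leaves and the wrapped function rides along as static aux data, which is what lets `jax.grad` differentiate "through" the model object.

```python
import functools

import jax
import jax.numpy as jnp
from jax import tree_util


@tree_util.register_pytree_node_class
class FunTree:
    """A partial application whose keyword arguments are pytree leaves."""

    def __init__(self, fn, **params):
        self.fn = fn
        self.params = params

    def __call__(self, *args, **kwargs):
        return self.fn(*args, **self.params, **kwargs)

    def tree_flatten(self):
        names = tuple(sorted(self.params))
        leaves = [self.params[name] for name in names]
        return leaves, (self.fn, names)  # fn and field names are static

    @classmethod
    def tree_unflatten(cls, aux, leaves):
        fn, names = aux
        return cls(fn, **dict(zip(names, leaves)))


def funtree(fn):
    # Calling the decorated function with keyword arguments returns a
    # FunTree that closes over them and is itself a pytree.
    return functools.partial(FunTree, fn)


@funtree
def Model(input, parameter):
    return input + parameter


model = Model(parameter=jnp.array([1.0, 2.0, 3.0]))
grads = jax.grad(lambda f, x: f(x).sum())(model, jnp.ones(3))
print(grads.params["parameter"])  # [1. 1. 1.]
```

The linked funtree.py presumably handles more details than this, but the pytree registration above is the essential trick.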
This lets you write terse code such as the attention below, which makes reading the math easy. The complicated part is all moved into an external function that actually initializes a new model. I think a lot of the complexity of frameworks is magic that runs to work out shape sizes, which should all be handled by the initializer function, which should know every parameter anyway (see the sketch after the code below). I have no idea how to do state, because I haven't tried batch norm yet.

```python
@funtree.makefun
def Mlp(x, key, up, down, dropout_p: float):
    x_norm = rms_norm(x)
    expanded = jax.nn.gelu(einsum(x_norm, up, 'L E, E U -> L U'))
    lowered = einsum(expanded, down, 'L U, U E -> L E')
    return dropout(lowered, key, dropout_p)

@funtree.makefun
def Attention(x, key, qkv, out, heads: int, dropout_p: float):
    x_norm = rms_norm(x)
    parts = einsum(x_norm, qkv, 'L E, E HsplitD -> L HsplitD')
    k, q, v = rearrange(parts, 'L (H split D) -> split H L D', split=3, H=heads)
    q, k = norm(q), norm(k)
    H, L, D = k.shape
    mask = jnp.tril(jnp.ones([L, L]))
    similarity = einsum(k, q, 'H L D, H L2 D -> H L L2') * (D ** -0.5)
    masked_similarity = jnp.where(mask, similarity, -jnp.inf)
    attention = jax.nn.softmax(masked_similarity, axis=-1)
    attention = dropout(attention, key, dropout_p)
    gather = einsum(attention, v, 'H L L2, H L2 V -> H L V')
    gather = rearrange(gather, 'H L V -> L (H V)')
    output = einsum(gather, out, 'L Z, Z E -> L E')
    return output

@funtree.makefun
def GPT(x, key, embedding, positional, layers, unembed):
    L = x.shape[0]
    hidden = embedding[x] + positional[:L, :]
    for layer, k in utils.zipkey(layers, key):
        hidden = hidden + layer(hidden, key=k)
    logits = einsum(unembed, hidden, 'E O, L E -> L O')
    return logits

def init_gpt_model(vocab, embedding, heads, layer_count, expansion, max_length, use_swiglu):
    return GPT(embedding=..., )
```

I have an implementation: https://github.com/randomekek/sequence/blob/main/funtree.py#L17-L52
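To make the point about initializers concrete, here is what an initializer for the Mlp block above might look like. The name `init_mlp` and the scaling choices are my own illustration, not part of the linked library; the pattern is simply "compute every shape here, then partially apply the layer function".

```python
import jax
import jax.numpy as jnp


def init_mlp(key, embed: int, expansion: int, dropout_p: float):
    """All shape bookkeeping lives here, so Mlp itself stays pure math."""
    up_key, down_key = jax.random.split(key)
    hidden = embed * expansion
    return Mlp(
        up=jax.random.normal(up_key, [embed, hidden]) * embed ** -0.5,
        down=jax.random.normal(down_key, [hidden, embed]) * hidden ** -0.5,
        dropout_p=dropout_p,
    )


mlp = init_mlp(jax.random.PRNGKey(0), embed=256, expansion=4, dropout_p=0.1)
```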
-
How does this compare to `jax.tree_util.Partial`?
-
You're right, I think a combination of … The main difference is that I defer the transformation, so you need to call … 3 times, so funtree is a transformation like jit or grad. The benefit is that …
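For readers making the comparison: `jax.tree_util.Partial` already gives the "partial application as a pytree" part out of the box, since the bound arguments are treated as leaves. A minimal example (mine, not from the thread):

```python
import jax
import jax.numpy as jnp
from jax.tree_util import Partial


def model(parameter, input):
    return input + parameter


# Bind the parameter; the result is callable and is also a pytree whose
# leaf is the bound parameter array.
m = Partial(model, jnp.array([1.0, 2.0, 3.0]))

# Differentiate with respect to the partially-applied function itself.
grads = jax.grad(lambda f, x: f(x).sum())(m, jnp.ones(3))
print(grads.args[0])  # [1. 1. 1.]
```

So the difference discussed in this reply is mostly ergonomics (a decorator form with named keyword fields and a deferred transformation) rather than capability.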