Internals: jq Assignment Operators

Nico Williams edited this page Jul 12, 2023 · 10 revisions

Assignments in the compiler and in the block representation

The jq assignment operators =, //=, <op>= (e.g., +=, -=, etc.), and |= are very special. They're not like assignments in most languages -- they are just another kind of jq expression that produces zero, one, or more values. Each value produced is the input to the assignment with the paths selected by the left-hand side (LHS) changed as directed by the right-hand side (RHS).

The LHS is very special: it is a path expression (TODO: add wiki page about path expressions), which is an expression consisting only of sub-expressions like .a, if/then/else with path expressions as the actions, and/or calls to functions whose bodies are path expressions.
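
As a small illustration of what counts as a path expression, the path(...) builtin reports the paths that a given LHS would address. This is only a sketch: the inputs, the helper name f, and the file name example.jq are made up for the example, and a reasonably recent jq (1.6 or later) is assumed.

```jq
# Run with: jq -nc -f example.jq   (example.jq is a hypothetical file name)

# f is a hypothetical helper whose body is itself a path expression.
def f: .a.b;

(null        | path(.a[0].b)),                    # ["a",0,"b"]
({"x": true} | path(if .x then .a else .b end)),  # ["a"]
(null        | path(f))                           # ["a","b"]
```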

The RHS is some expression which, in the case of |=, receives the current value at the LHS as ., while in the other cases the RHS receives the same . as the whole assignment expression. The latter can be confusing.
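
The following sketch contrasts what . means on the RHS of each operator. The input object and the file name example.jq are invented for illustration.

```jq
# Run with: jq -nc -f example.jq   (example.jq is a hypothetical file name)
{"a": 1, "b": 10}
| (.a |= . + 1),   # RHS sees the value at .a (1):          {"a":2,"b":10}
  (.a  = .b),      # RHS sees the whole input:              {"a":10,"b":10}
  (.a += .b),      # RHS sees the whole input, 1 + 10:      {"a":11,"b":10}
  (.a //= .b)      # RHS sees the whole input, .a is kept:  {"a":1,"b":10}
```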

Inspecting src/parser.y is instructive.

First we have //= and <op>=

Exp "//=" Exp {
  $$ = gen_definedor_assign($1, $3);
} |

static block gen_definedor_assign(block object, block val) {
  block tmp = gen_op_var_fresh(STOREV, "tmp");
  return BLOCK(gen_op_simple(DUP),
               val, tmp,
               gen_call("_modify", BLOCK(gen_lambda(object),
                                         gen_lambda(gen_definedor(gen_noop(),
                                                                  gen_op_bound(LOADV, tmp))))));
}

Exp "+=" Exp {
  $$ = gen_update($1, $3, '+');
} |

static block gen_update(block object, block val, int optype) {
  block tmp = gen_op_var_fresh(STOREV, "tmp");
  return BLOCK(gen_op_simple(DUP),
               val,
               tmp,
               gen_call("_modify", BLOCK(gen_lambda(object),
                                         gen_lambda(gen_binop(gen_noop(),
                                                              gen_op_bound(LOADV, tmp),
                                                              optype)))));
}

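Placing val before gen_call("_modify", ...) has three consequences for //= and the <op>= operators: the RHS receives the same . as the whole assignment expression (not the value at the LHS path), the RHS is evaluated every time (for //=, even when the value at the LHS is already neither null nor false), and the assignment is performed once per value output by the RHS.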
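
A quick way to observe the "once per value output by the RHS" behavior is to give the RHS more than one output. This is a sketch that follows the code generation shown above; the input and the file name example.jq are made up.

```jq
# Run with: jq -nc -f example.jq   (example.jq is a hypothetical file name)
{"a": 1}
| (.a += (10, 20)),    # two RHS outputs, two results: {"a":11} then {"a":21}
  (.a //= (10, 20))    # still two results, both {"a":1}: the RHS ran twice
                       # even though .a was already defined
```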

Compare to |= which is coded like this:

Exp "|=" Exp {
  $$ = gen_call("_modify", BLOCK(gen_lambda($1), gen_lambda($3)));
} |

Ok, let's translate all of this to English:

  • First |=: gen_call("_modify", BLOCK(gen_lambda($1), gen_lambda($3))); means "generate a call to _modify with the LHS ($1) as the first argument and the RHS ($3) as the second argument" (jq function arguments are lambdas, hence the gen_lambda()s).

  • Now gen_definedor_assign() and gen_update() (which are very similar):

    • the DUP is memory management -- ignore for this analysis
    • val is the RHS, and we will invoke it immediately
    • store the val output(s) (RHS) in tmp (a gensym'ed $binding)
    • call _modify (the heart of modify-assign operators) with the LHS as the first argument and a second argument that amounts to . // $tmp (for //=) or . <op> $tmp (for <op>=, e.g. . + $tmp for +=), where $tmp is the gensym'ed binding mentioned above
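
The steps above can be written out by hand in jq itself. In this sketch an explicit $tmp binding stands in for the gensym'ed one. The input and the file name example.jq are invented, and the hand-written forms are an approximation of the generated bytecode, not a copy of it.

```jq
# Run with: jq -nc -f example.jq   (example.jq is a hypothetical file name)
{"a": null, "b": 7, "n": 5}
| (.a //= .b),                       # {"a":7,"b":7,"n":5}
  (.b as $tmp | .a |= (. // $tmp)),  # same result, written by hand
  (.n += .b),                        # {"a":null,"b":7,"n":12}
  (.b as $tmp | .n |= (. + $tmp))    # same result, written by hand
```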

The difference between //= and other op= assignments is that // is block-coded in gen_definedor() while the ops are builtins like _plus. // could have been jq-coded, but it's not.

Assignments in the jq-coded helpers

The jq-coded builtin _assign implements the jq-coded part of the = assignment operator:

def _assign(paths; $value): reduce path(paths) as $p (.; setpath($p; $value));
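
To see what that definition does, the sketch below compares = with the reduce/setpath loop written out directly. The input and the file name example.jq are invented for illustration.

```jq
# Run with: jq -nc -f example.jq   (example.jq is a hypothetical file name)
{"a": {"b": 1}, "c": 2}
| ((.a.b, .c) = 0),                                   # {"a":{"b":0},"c":0}
  (reduce path(.a.b, .c) as $p (.; setpath($p; 0)))   # same result, spelled out
```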

The jq-coded builtin _modify implements the jq-coded part of all the other assignment operators:

def _modify(paths; update):
    reduce path(paths) as $p ([., []];
        . as $dot
      | null
      | label $out
      | ($dot[0] | getpath($p)) as $v
      | (
          (   $$$$v
            | update
            | (., break $out) as $v
            | $$$$dot
            | setpath([0] + $p; $v)
          ),
          (
              $$$$dot
            | setpath([1, (.[1] | length)]; $p)
          )
        )
    ) | . as $dot | $dot[0] | delpaths($dot[1]);

The $$$$v thing is an internal-only hack where evaluating $$$$v produces $v's value, but also sets $v to null so that the next invocation of $$$$v or $v produces null. This is done to avoid holding on to a reference that would cause copy-on-write behavior that would make _modify accidentally quadratic.
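
Two visible effects of the _modify definition above can be checked from the command line. Only the first output of update is used, because of the break $out. Paths for which update produces no output are collected and deleted at the end with delpaths. This sketch assumes a jq whose _modify matches the definition shown here (jq 1.7 or later is the assumption); the inputs and the file name example.jq are made up.

```jq
# Run with: jq -nc -f example.jq   (example.jq is a hypothetical file name)
({"a": 1}        | .a |= (. + 1, . + 100)),            # {"a":2}  (first output only)
([1, 5, 3, 0, 7] | (.[] | select(. >= 2)) |= empty)    # [1,0]    (matching paths deleted)
```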

TBD: Explain more.
