Internals: jq Assignment Operators
The jq assignment operators `=`, `//=`, `<op>=` (e.g., `+=`, `-=`, etc.), and `|=` are very special. They're not like assignments in most languages: they are just another kind of jq expression that produces zero, one, or more values. Each value produced is the input to the assignment with the changes denoted by the right-hand side (RHS) applied at the paths denoted by the left-hand side (LHS).
The LHS is very special: it is a path expression (TODO: add wiki page about path expressions), which is an expression consisting only of sub-expressions like `.a`, if/then/else with path expressions as the actions, and/or calls to functions whose bodies are path expressions.
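The RHS is some expression which, in the case of `|=`, receives the current value at the LHS in `.`, while in the other cases the RHS receives `.` (the input to the whole assignment expression). The latter can be confusing. For example, in `.a = .b` the RHS `.b` is looked up in the whole input, while in `.a |= .b` it is looked up in the current value of `.a`.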
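The difference in what the RHS sees can be modelled outside of jq. The following is a minimal Python sketch, not jq's implementation: `getpath`, `setpath`, `assign` and `update` are hypothetical helpers named after the jq concepts, paths are lists of object keys, and each RHS is assumed to produce exactly one value.

```python
def getpath(doc, path):
    """Return the value at `path` (a list of object keys), or None if absent."""
    for key in path:
        doc = None if doc is None else doc.get(key)
    return doc

def setpath(doc, path, value):
    """Return a copy of `doc` with `value` stored at `path` (object keys only)."""
    if not path:
        return value
    out = dict(doc or {})
    out[path[0]] = setpath(out.get(path[0]), path[1:], value)
    return out

def assign(doc, path, rhs):
    """Model of `LHS = RHS`: the RHS sees the whole input."""
    return setpath(doc, path, rhs(doc))

def update(doc, path, rhs):
    """Model of `LHS |= RHS`: the RHS sees the current value at the LHS."""
    return setpath(doc, path, rhs(getpath(doc, path)))

doc = {"a": {"b": 10}, "b": 20}
rhs = lambda dot: dot.get("b")   # plays the role of the jq expression `.b`

print(assign(doc, ["a"], rhs))   # {'a': 20, 'b': 20}
print(update(doc, ["a"], rhs))   # {'a': 10, 'b': 20}
```

The same RHS gives different results only because it is handed a different `.` in each case.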
Inspecting `src/parser.y` is instructive.
First we have `//=` and `<op>=`:
    Exp "//=" Exp {
      $$ = gen_definedor_assign($1, $3);
    } |

    static block gen_definedor_assign(block object, block val) {
      block tmp = gen_op_var_fresh(STOREV, "tmp");
      return BLOCK(gen_op_simple(DUP),
                   val, tmp,
                   gen_call("_modify", BLOCK(gen_lambda(object),
                                             gen_lambda(gen_definedor(gen_noop(),
                                                                      gen_op_bound(LOADV, tmp))))));
    }

    Exp "+=" Exp {
      $$ = gen_update($1, $3, '+');
    } |

    static block gen_update(block object, block val, int optype) {
      block tmp = gen_op_var_fresh(STOREV, "tmp");
      return BLOCK(gen_op_simple(DUP),
                   val,
                   tmp,
                   gen_call("_modify", BLOCK(gen_lambda(object),
                                             gen_lambda(gen_binop(gen_noop(),
                                                                  gen_op_bound(LOADV, tmp),
                                                                  optype)))));
    }
Having `val` before the `gen_call("_modify", ...)` is the reason that the RHS of `//=` (and of `<op>=`) receives the same `.` as the LHS, that is, the input to the whole assignment. It is also the reason that the RHS is evaluated every time, and the reason that the assignment is done once per value output by the RHS.
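The order of operations described above can be imitated in a few lines of Python. This is only a sketch under assumptions: `op_assign` is a hypothetical helper, the LHS is given as a list of top-level keys, and the RHS is a generator function so that it can produce several values, as a jq expression can.

```python
import operator

def op_assign(doc, keys, rhs, op):
    """Model of `LHS <op>= RHS` as built by gen_update().

    The RHS runs first, against the whole input `doc`. Each value it
    produces is bound to `tmp`, and one complete update pass (the role
    played by `_modify`) is made per value: one output per RHS value.
    """
    for tmp in rhs(doc):                  # `val`, then STOREV into tmp
        out = dict(doc)                   # leave the input untouched
        for key in keys:                  # every path selected by the LHS
            out[key] = op(out[key], tmp)  # `. <op> $tmp` at that path
        yield out

def rhs(dot):
    """Plays the role of the jq expression `(.step, 100)`."""
    yield dot["step"]
    yield 100

doc = {"a": 1, "b": 2, "step": 10}
results = list(op_assign(doc, ["a", "b"], rhs, operator.add))
print(results)
# [{'a': 11, 'b': 12, 'step': 10}, {'a': 101, 'b': 102, 'step': 10}]
```

In this model the RHS reads `step` from the whole input rather than from `.a` or `.b`, and two RHS values give two complete outputs.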
Compare to `|=`, which is coded like this:
    Exp "|=" Exp {
      $$ = gen_call("_modify", BLOCK(gen_lambda($1), gen_lambda($3)));
    } |
Ok, let's translate all of this to English:
- First `|=`: `gen_call("_modify", BLOCK(gen_lambda($1), gen_lambda($3)));` means: "generate a call to `_modify` with the LHS (`$1`) as the first argument and the RHS (`$3`) as the second argument" (note that jq function arguments are lambdas, thus the `gen_lambda()`s).
- Now `gen_definedor_assign()` and `gen_update()` (which are very similar):
  - the `DUP` is memory management -- ignore for this analysis
  - `val` is the RHS, and we will invoke it immediately
  - store the `val` output(s) (RHS) in `tmp` (a gensym'ed `$binding`)
  - call `_modify` (the heart of modify-assign operators) with the LHS as the first argument and a second argument that amounts to `. // $tmp` (or `. <op> $tmp` in `gen_update()`), where `$tmp` is the gensym'ed binding mentioned above
The difference between `//=` and other `op=` assignments is that `//` is block-coded in `gen_definedor()` while the `op`s are builtins like `_plus`. `//` could have been jq-coded, but it's not.
The jq-coded builtin `_assign` implements the jq-coded part of the `=` assignment operator:
    def _assign(paths; $value): reduce path(paths) as $p (.; setpath($p; $value));
`_assign` is pretty self-explanatory. All it does is reduce over the paths, setting the given value at each path. It helps to first see the yacc/bison/compiler side of things (see above).
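As a rough illustration, the same reduction can be written in Python with `functools.reduce`. The `setpath` helper here is a hypothetical stand-in for jq's `setpath` that handles only object keys, and the list of paths is passed in directly rather than computed by `path(paths)`.

```python
from functools import reduce

def setpath(doc, path, value):
    """Return a copy of `doc` with `value` stored at `path` (object keys only)."""
    if not path:
        return value
    out = dict(doc or {})
    out[path[0]] = setpath(out.get(path[0]), path[1:], value)
    return out

def assign(doc, paths, value):
    """Model of `reduce path(paths) as $p (.; setpath($p; $value))`."""
    return reduce(lambda acc, p: setpath(acc, p, value), paths, doc)

doc = {"a": {"x": 1}, "b": 2}
print(assign(doc, [["a", "x"], ["c"]], 0))
# {'a': {'x': 0}, 'b': 2, 'c': 0}
```

The accumulator starts as the input and is threaded through one `setpath` per path, which is all that `_assign` does.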
The jq-coded builtin `_modify` implements the jq-coded part of all the other assignment operators:
    def _modify(paths; update):
        reduce path(paths) as $p ([., []];
            . as $dot
          | null
          | label $out
          | ($dot[0] | getpath($p)) as $v
          | (
              (   $$$$v
                | update
                | (., break $out) as $v
                | $$$$dot
                | setpath([0] + $p; $v)
              ),
              (
                  $$$$dot
                | setpath([1, (.[1] | length)]; $p)
              )
            )
        ) | . as $dot | $dot[0] | delpaths($dot[1]);
The `$$$$v` thing is an internal-only hack where evaluating `$$$$v` produces `$v`'s value, but also sets `$v` to `null` so that the next invocation of `$$$$v` or `$v` produces `null`. This is done to avoid holding on to a reference that would cause copy-on-write behavior that would make `_modify` accidentally quadratic.
In English: we're reducing over the `paths` using an array as the reduction state, containing `.` and an initially empty array of paths to delete. For any path for which `update` produces a value, we take the first value and alter `.` to set that value at that path. For any path for which `update` produces no value (`empty`), we add that path to the array of paths to delete. Once the reduction completes, we delete all the paths queued up for deletion. We delay deletions because we generally traverse array elements from the first to the last: if we deleted a non-last element right away, the indices of the remaining elements would each decrease by one, and the paths still to be visited, including any later deletions, would be off.
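The reduction and the reason for delaying deletions can be sketched in Python. This is a model under assumptions, not a port of jq: all helper names are hypothetical, the list of paths is passed in directly, `update` is a generator function whose first output (if any) is used, and the two-element reduction state is kept in two local variables. The `defer=False` variant exists only to show what goes wrong when deleting immediately.

```python
_EMPTY = object()  # marks "update produced no value"

def getpath(doc, path):
    """Value at `path`, or None when the path does not exist."""
    for key in path:
        try:
            doc = doc[key]
        except (KeyError, IndexError, TypeError):
            return None
    return doc

def setpath(doc, path, value):
    """Copy of `doc` with `value` stored at an existing `path`."""
    if not path:
        return value
    out = doc.copy()
    out[path[0]] = setpath(doc[path[0]], path[1:], value)
    return out

def delpath(doc, path):
    """Copy of `doc` without the element at `path` (missing path: no-op)."""
    out = doc.copy()
    try:
        if len(path) == 1:
            del out[path[0]]
        else:
            out[path[0]] = delpath(doc[path[0]], path[1:])
    except (KeyError, IndexError):
        pass
    return out

def modify(doc, paths, update, defer=True):
    pending = []                                  # paths queued for deletion
    for p in paths:
        value = next(iter(update(getpath(doc, p))), _EMPTY)  # first output only
        if value is not _EMPTY:
            doc = setpath(doc, p, value)
        elif defer:
            pending.append(p)
        else:
            doc = delpath(doc, p)                 # naive: delete right away
    for p in sorted(pending, reverse=True):       # highest indices first
        doc = delpath(doc, p)
    return doc

def keep_even(v):
    """Plays the role of `select(. % 2 == 0)`."""
    if v is not None and v % 2 == 0:
        yield v

paths = [[0], [1], [2], [3]]                      # one path per array element
print(modify([1, 3, 2, 4], paths, keep_even))               # [2, 4]
print(modify([1, 3, 2, 4], paths, keep_even, defer=False))  # [3, 2, 4]
```

In the naive variant, deleting index 0 shifts `3` into index 0, which has already been visited, so it is never examined and survives.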
What's really tricky here is that we need to make sure we have just one reference to the reduction state when we get to `setpath([0] + $p; $v)` (where `update` produced a value) or `setpath([1, (.[1] | length)]; $p)` (where `update` was `empty`, so we're queuing up a deletion of that path). We also need to have only one reference to the value to be altered.
TBD: Explain more.