Adds a `nickel test` subcommand for testing examples in docs. #2020

jneem · 2024-08-01T05:47:07Z

This adds a nickel test subcommand that extracts all nickel code blocks contained in doc fields, and runs them to make sure they work. The examples are run in the environment of the record they're contained in, so in

{
  foo = 3,
  bar | doc m%"
  ```nickel
    foo + 1
  ```
  "%
}

the "foo" variable in the doc test will evaluate to 3.

There is rudimentary support for tests that are expected to error, and tests that are expected to evaluate to a specific value: a test ending in

# => [1, 2]

will be wrapped in (...) | std.contract.Equal [1, 2], while a test ending in

# => error: foo

will be expected to error, with the main diagnostic message containing the string "foo".

Tests can be ignored, by starting them with ```nickel ignore

This mostly conforms to how the examples in the standard library are formatted, but a few changes are required:

each fenced code block can only be used to test a single example
the => expected bit needs to be on the last line, and in a comment

I converted the doctests in std.array to the new format, and discovered a couple of typos in the process. If the format is ok, I'll do the rest of the stdlib.

github-actions · 2024-08-01T05:54:14Z

Bencher

Report	Thu, September 12, 2024 at 11:40:25 UTC
Project	nickel
Branch	2020/merge
Testbed	ubuntu-latest

⚠️ WARNING: The following Measure does not have a Threshold. Without a Threshold, no Alerts will ever be generated!
Latency (latency)
Click here to create a new Threshold
For more information, see the Threshold documentation.
To only post results if a Threshold exists, set the --ci-only-thresholds CLI flag.

Click to view all benchmark results

Benchmark	Latency	Latency Results nanoseconds (ns)
fibonacci 10	➖ (view plot)	490,070.00
pidigits 100	➖ (view plot)	3,154,400.00
product 30	➖ (view plot)	782,240.00
scalar 10	➖ (view plot)	1,448,600.00
sum 30	➖ (view plot)	763,410.00

Bencher - Continuous Benchmarking
View Public Perf Page
Docs | Repo | Chat | Help

yannham · 2024-08-01T14:33:26Z

Idea: we could do as in the manual, and use a modifier such as nickel #repl or nickel #multilines (I don't know if the hash was a good idea, I think nowadays GH uses just a space for those kind of "language modifiers", but you get the spirit) for multiline examples, using a new line as a delimiter? It's basic and has a few gotcha when your example is long enough so that you want to have newlines in it, but it's been working rather well until now for the manual.

I fear a sequence of four or five 2-line fenced code blocks are going to be less readable and quite noisy in the source or in the markdown rendered version. What do you think?

jneem · 2024-08-01T15:26:53Z

I gave ```nickel multiline a try and it seems pretty reasonable. I don't think #parse or #repl make much sense in doctests.

One possible issue with modifiers is that typos can give confusing errors. I accidentally wrote multline instead of multiline just now and the result was a little confusing. One possibility would be to error on unknown modifiers, but that might limit the ability to postprocess the documentation with other tools that look for modifiers on code blocks...

jneem · 2024-08-07T04:14:57Z

Ok, I've cleaned up the implementation and I think it's ready for review.

jneem · 2024-08-07T15:13:12Z

A couple caveats:

I'd like to track the locations better, so that errors in the doctest will point to spans in the original files. This will require tracking the doc spans better (including tracking how much it got un-indented).
I've disabled the numerical tests because of precision issues (the error message printing out the actual value truncates it to f64, so copy-pasting that value back into the test doesn't work)

yannham

I'm very happy to see this going through; we have had our fair share of examples that aren't updated properly in the stdlib, for example.

In general, I found that it lacked a bit of documentation, both general and detailed. First the high-level scheme of the pipeline, which is: you go through a first time by applying the transformation and filling the registry with test entries, then you go through the spine to run the added tests. Additionally, each more specific small function and datatype could be commenter as well, even quickly.

Another remark is that we seem to expose a lot of the internals of Program and Cache, which doesn't sound so good - including "transmuting" the main_id. At this rate, do you think it could also be an alternative to just go with the "lower-level" objects instead (Cache + VirtualMachine), like what the REPL is doing? In the end, Program is mostly a struct and a few thin wrappers around VirtualMachine or Cache methods, at least when there are no overrides (and here the command doesn't allow them). Although there's still eval_record_spine which is non trivial and useful. Or to just use Program for that, but to use a stand-alone separate VirtualMachine to handle the generated snippets. Or to move part of the code that requires the access to the internals to Program itself? Maybe not, but from a distance we are breaking the abstraction barrier quite a lot here.

cli/src/cli.rs

cli/src/doctest.rs

yannham · 2024-09-04T12:43:35Z

cli/src/doctest.rs

+        DocTest { input, expected }
+    }
+
+    fn code(&self) -> String {


This method would deserve documentation

I've decided to do this a bit differently, building up the contract application programmatically instead of by modifying the text (which gives slightly weird error messages).

yannham · 2024-09-09T13:35:13Z

core/src/eval/mod.rs

@@ -124,7 +124,7 @@ pub struct VirtualMachine<R: ImportResolver, C: Cache> {
    // The call stack, for error reporting.
    call_stack: CallStack,
    // The interface used to fetch imports.
-    import_resolver: R,
+    pub import_resolver: R,


There are already import_resolver() and import_resolver_mut() methods; is this pub really needed? Although setting a field to be public is maybe simpler, in which case maybe we should get rid of the accessors, but having both looks redundant.

I think maybe it was originally for borrow-check reasons (because import_resolver_mut() borrows the whole VirtualMachine), but it isn't needed anymore.

core/src/program.rs

yannham · 2024-09-09T13:43:07Z

core/src/cache.rs

@@ -1068,6 +1068,18 @@ impl Cache {
        self.terms.get(&file_id).map(|TermEntry { term, .. }| term)
    }

+    /// Set a new value for a cached term.
+    pub fn set(&mut self, file_id: FileId, term: RichTerm, state: EntryState) {


Why do we need this? That is a bit suspicious.

It is definitely suspicious and I've removed it.

I replaced it with the custom_transform function, which I'm also not completely thrilled with but I think it's better than the previous poking around in Program's private fields.

Fundamentally, the problem is that we want to apply a custom transformation pass, but otherwise we want all of the behavior that Program provides (the input processing, the override mode, the error reporting...)

cli/src/doctest.rs

jneem · 2024-09-10T08:34:22Z

Thanks for the review, I think this is in a better state now. It's still missing tests and user-facing documentation. I'll try to get to that tomorrow.

yannham · 2024-09-10T09:57:53Z

core/src/program.rs

@@ -672,7 +701,7 @@ impl<EC: EvalCache> Program<EC> {
                        result.body.pos,
                    ))
                }
-                _ => Ok(result.body),
+                _ => Ok(result.body.closurize(&mut vm.cache, result.env)),


I wonder if that can break stuff. In particular extract_from_term is expecting the record spine to be fields with Record and RecRecord directly inside, while here it seems we're introducing intermediate Closure nodes everywhere.

PS: ok, I think I get it. We're introducing closures only when the result isn't a record, which is treated above. It's still a strange discrepancy: it means that if you take a random value somewhere in the field of a record spine, if it's a record and you try to evaluate it further you might get unbound identifiers, but if it's not a record then it's properly closurized and it's evaluating fine.

I have mixed feelings about this, although it doesn't seem to break existing code. Intuitively I would expect evaluate_record_spine to ensure that the spine is just nested records and terminal values with no Closures in-between (something stupid: imagine we want to list all the accessible functions in the future, eval_record_spine would be a good candidate, but now functions are hidden behind Closure nodes, for example).

I wonder if we should somehow specialize eval_record_spine for doc testing to make it clear that it's a strange version that might not make sense for something else, and keep eval_record_spine with the current semantics (of course, under the hood, we can use a generic common base that just takes an additional boolean-like argument to know if it should closurize non records or something like that). It's just a suggestion, there might be other ways to go about it,, but I have the overall feeling that we try to cram too many different things into one method.

it means that if you take a random value somewhere in the field of a record spine, if it's a record and you try to evaluate it further you might get unbound identifiers

I didn't follow this part -- as long as you're encountering records, there's nothing to evaluate and so no unbound identifiers. And then as soon as you encounter a non-record, it's closurized with the right environment, right?

It seems to me like the closurized version is the appropriate version if you're planning to do further evaluation, while the non-closurized version is the right one for any other analysis.

of course, under the hood, we can use a generic common base that just takes an additional boolean-like argument to know if it should closurize non records or something like that

Sure, I'll give that version a try.

I didn't follow this part -- as long as you're encountering records, there's nothing to evaluate and so no unbound identifiers. And then as soon as you encounter a non-record, it's closurized with the right environment, right?

I think you're right (although that might not be entirely true because contracts are evaluated but not closurized).

It seems to me like the closurized version is the appropriate version if you're planning to do further evaluation, while the non-closurized version is the right one for any other analysis.

I think my intuition about eval_record_spine was precisely that you don't want to perform further evaluation, but just peek at the term and gather whatever information you want. For example, nickel doc doesn't currently include default values, but we could imagine doing so in the future. It would be easy with the previous version, but in the new one we now need to get the content of the closure from the cache first (which isn't undoable, but looks a bit like a leaking abstraction IMHO).

However, I see that you could write the specification of your version as all records are evaluated recursively, and environment is pushed toward the "leaf" by closurizing. I didn't see a simple and correct explanation of the updated behaviour of eval_record_spine, but this may be reasonable. I'm still slightly in favour of separating the two flavour, but I don't have such a strong opinion anymore.

yannham

I think beside the reservation around eval_record_spine, it looks good now! It's just missing a bunch of tests indeed. In a second step we will also want to test the stdlib as part of the general Nickel testing suite.

jneem · 2024-09-12T05:15:56Z

I have some tests and docs now, but

I'm not sure where the docs should go. Maybe there should be a section for documenting the cli interface?
While writing the tests, I noticed that some outputs (i.e. any error messages raised during the tests) go to stderr, while others (i.e. the progress messages, and the summary at the end) go to stdout. I guess they should probably all go to the same place? But I'm not sure which one is the right one...

yannham · 2024-09-24T12:44:22Z

@jneem I lost track a bit - is there anything left to do on this PR, either on your side or mine? Is the stdout vs stderr stuff settled ?

jneem · 2024-09-24T13:02:11Z

I think stderr/stdout is settled. The only outstanding question IIRC is where the docs should go.

yannham · 2024-09-24T15:49:46Z

I see. I wonder if it would nice to do something like the Nix CLI - write the CLI documentation as markdown that can be exported as an mdbook but also as manpages, with a single source of truth. In the meantime I guess a new section of the manual for the CLI should do - note that adding a manual section might require some trivial adaptation of nickel-lang.org as well, at least after the next release.

jneem · 2024-09-26T12:50:28Z

@yannham I forgot to bring this up in the weekly, but can we merge this and do the CLI docs as a follow-up?

github-actions bot temporarily deployed to pull request August 1, 2024 05:49 Inactive

github-actions bot temporarily deployed to pull request August 1, 2024 07:34 Inactive

github-actions bot temporarily deployed to pull request August 1, 2024 08:03 Inactive

github-actions bot temporarily deployed to pull request August 6, 2024 15:35 Inactive

github-actions bot temporarily deployed to pull request August 7, 2024 04:14 Inactive

jneem marked this pull request as ready for review August 7, 2024 04:15

jneem requested a review from yannham August 7, 2024 04:15

yannham reviewed Sep 9, 2024

View reviewed changes

jneem added 2 commits September 10, 2024 09:35

Doctests

4e003d4

Add a custom_transform method

a0550b6

jneem force-pushed the doctests branch from c4c975d to a0550b6 Compare September 10, 2024 08:02

github-actions bot temporarily deployed to pull request September 10, 2024 08:05 Inactive

jneem added 2 commits September 10, 2024 15:29

Make the vm private again

c607555

Update doc

8a60420

github-actions bot temporarily deployed to pull request September 10, 2024 08:34 Inactive

yannham reviewed Sep 10, 2024

View reviewed changes

jneem added 2 commits September 11, 2024 18:05

Add some snapshot tests

64ef293

Split out closurized and non-closurized variants of eval_record_spine

1ccde4c

github-actions bot temporarily deployed to pull request September 11, 2024 11:47 Inactive

Fix doc link

6d5c6b1

github-actions bot temporarily deployed to pull request September 11, 2024 13:01 Inactive

Add some docs, but not sure where they should go

390ec4c

github-actions bot temporarily deployed to pull request September 12, 2024 05:09 Inactive

Make report_as_str honor the color setting

70dff8e

github-actions bot temporarily deployed to pull request September 12, 2024 06:54 Inactive

Separate the stdout and stderr outputs more intentionally

dc30074

github-actions bot temporarily deployed to pull request September 12, 2024 10:59 Inactive

Commit the other snapshots

452440b

github-actions bot temporarily deployed to pull request September 12, 2024 11:35 Inactive

jneem requested a review from yannham September 26, 2024 13:44

yannham approved these changes Sep 26, 2024

View reviewed changes

yannham added this pull request to the merge queue Sep 26, 2024

Merged via the queue into master with commit eee41e3 Sep 26, 2024
7 checks passed

yannham deleted the doctests branch September 26, 2024 14:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adds a `nickel test` subcommand for testing examples in docs. #2020

Adds a `nickel test` subcommand for testing examples in docs. #2020

jneem commented Aug 1, 2024 •

edited

Loading

github-actions bot commented Aug 1, 2024 •

edited

Loading

yannham commented Aug 1, 2024

jneem commented Aug 1, 2024

jneem commented Aug 7, 2024

jneem commented Aug 7, 2024

yannham left a comment •

edited

Loading

yannham Sep 4, 2024

jneem Sep 10, 2024

yannham Sep 9, 2024

jneem Sep 10, 2024

yannham Sep 9, 2024

jneem Sep 10, 2024

jneem commented Sep 10, 2024

yannham Sep 10, 2024 •

edited

Loading

jneem Sep 10, 2024

yannham Sep 10, 2024

yannham left a comment

jneem commented Sep 12, 2024

yannham commented Sep 24, 2024

jneem commented Sep 24, 2024

yannham commented Sep 24, 2024

jneem commented Sep 26, 2024

Adds a nickel test subcommand for testing examples in docs. #2020

Adds a nickel test subcommand for testing examples in docs. #2020

Conversation

jneem commented Aug 1, 2024 • edited Loading

github-actions bot commented Aug 1, 2024 • edited Loading

Bencher

yannham commented Aug 1, 2024

jneem commented Aug 1, 2024

jneem commented Aug 7, 2024

jneem commented Aug 7, 2024

yannham left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jneem commented Sep 10, 2024

yannham Sep 10, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yannham left a comment

Choose a reason for hiding this comment

jneem commented Sep 12, 2024

yannham commented Sep 24, 2024

jneem commented Sep 24, 2024

yannham commented Sep 24, 2024

jneem commented Sep 26, 2024

Adds a `nickel test` subcommand for testing examples in docs. #2020

Adds a `nickel test` subcommand for testing examples in docs. #2020

jneem commented Aug 1, 2024 •

edited

Loading

github-actions bot commented Aug 1, 2024 •

edited

Loading

yannham left a comment •

edited

Loading

yannham Sep 10, 2024 •

edited

Loading