Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds a nickel test subcommand for testing examples in docs. #2020

Merged
merged 11 commits into from
Sep 26, 2024
Merged

Conversation

jneem
Copy link
Member

@jneem jneem commented Aug 1, 2024

This adds a nickel test subcommand that extracts all nickel code blocks contained in doc fields, and runs them to make sure they work. The examples are run in the environment of the record they're contained in, so in

{
  foo = 3,
  bar | doc m%"
  ```nickel
    foo + 1
  ```
  "%
}

the "foo" variable in the doc test will evaluate to 3.

There is rudimentary support for tests that are expected to error, and tests that are expected to evaluate to a specific value: a test ending in

# => [1, 2]

will be wrapped in (...) | std.contract.Equal [1, 2], while a test ending in

# => error: foo

will be expected to error, with the main diagnostic message containing the string "foo".

Tests can be ignored, by starting them with ```nickel ignore

This mostly conforms to how the examples in the standard library are formatted, but a few changes are required:

  • each fenced code block can only be used to test a single example
  • the => expected bit needs to be on the last line, and in a comment

I converted the doctests in std.array to the new format, and discovered a couple of typos in the process. If the format is ok, I'll do the rest of the stdlib.

@github-actions github-actions bot temporarily deployed to pull request August 1, 2024 05:49 Inactive
Copy link
Contributor

github-actions bot commented Aug 1, 2024

🐰Bencher

ReportThu, September 12, 2024 at 11:40:25 UTC
Projectnickel
Branch2020/merge
Testbedubuntu-latest

⚠️ WARNING: The following Measure does not have a Threshold. Without a Threshold, no Alerts will ever be generated!

  • Latency (latency)

Click here to create a new Threshold
For more information, see the Threshold documentation.
To only post results if a Threshold exists, set the --ci-only-thresholds CLI flag.

Click to view all benchmark results
BenchmarkLatencyLatency Results
nanoseconds (ns)
fibonacci 10➖ (view plot)490,070.00
pidigits 100➖ (view plot)3,154,400.00
product 30➖ (view plot)782,240.00
scalar 10➖ (view plot)1,448,600.00
sum 30➖ (view plot)763,410.00

Bencher - Continuous Benchmarking
View Public Perf Page
Docs | Repo | Chat | Help

@github-actions github-actions bot temporarily deployed to pull request August 1, 2024 07:34 Inactive
@github-actions github-actions bot temporarily deployed to pull request August 1, 2024 08:03 Inactive
@yannham
Copy link
Member

yannham commented Aug 1, 2024

Idea: we could do as in the manual, and use a modifier such as nickel #repl or nickel #multilines (I don't know if the hash was a good idea, I think nowadays GH uses just a space for those kind of "language modifiers", but you get the spirit) for multiline examples, using a new line as a delimiter? It's basic and has a few gotcha when your example is long enough so that you want to have newlines in it, but it's been working rather well until now for the manual.

I fear a sequence of four or five 2-line fenced code blocks are going to be less readable and quite noisy in the source or in the markdown rendered version. What do you think?

@jneem
Copy link
Member Author

jneem commented Aug 1, 2024

I gave ```nickel multiline a try and it seems pretty reasonable. I don't think #parse or #repl make much sense in doctests.

One possible issue with modifiers is that typos can give confusing errors. I accidentally wrote multline instead of multiline just now and the result was a little confusing. One possibility would be to error on unknown modifiers, but that might limit the ability to postprocess the documentation with other tools that look for modifiers on code blocks...

@github-actions github-actions bot temporarily deployed to pull request August 6, 2024 15:35 Inactive
@github-actions github-actions bot temporarily deployed to pull request August 7, 2024 04:14 Inactive
@jneem
Copy link
Member Author

jneem commented Aug 7, 2024

Ok, I've cleaned up the implementation and I think it's ready for review.

@jneem jneem marked this pull request as ready for review August 7, 2024 04:15
@jneem jneem requested a review from yannham August 7, 2024 04:15
@jneem
Copy link
Member Author

jneem commented Aug 7, 2024

A couple caveats:

  • I'd like to track the locations better, so that errors in the doctest will point to spans in the original files. This will require tracking the doc spans better (including tracking how much it got un-indented).
  • I've disabled the numerical tests because of precision issues (the error message printing out the actual value truncates it to f64, so copy-pasting that value back into the test doesn't work)

Copy link
Member

@yannham yannham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very happy to see this going through; we have had our fair share of examples that aren't updated properly in the stdlib, for example.

In general, I found that it lacked a bit of documentation, both general and detailed. First the high-level scheme of the pipeline, which is: you go through a first time by applying the transformation and filling the registry with test entries, then you go through the spine to run the added tests. Additionally, each more specific small function and datatype could be commenter as well, even quickly.

Another remark is that we seem to expose a lot of the internals of Program and Cache, which doesn't sound so good - including "transmuting" the main_id. At this rate, do you think it could also be an alternative to just go with the "lower-level" objects instead (Cache + VirtualMachine), like what the REPL is doing? In the end, Program is mostly a struct and a few thin wrappers around VirtualMachine or Cache methods, at least when there are no overrides (and here the command doesn't allow them). Although there's still eval_record_spine which is non trivial and useful. Or to just use Program for that, but to use a stand-alone separate VirtualMachine to handle the generated snippets. Or to move part of the code that requires the access to the internals to Program itself? Maybe not, but from a distance we are breaking the abstraction barrier quite a lot here.

cli/src/cli.rs Outdated Show resolved Hide resolved
cli/src/doctest.rs Outdated Show resolved Hide resolved
cli/src/doctest.rs Outdated Show resolved Hide resolved
DocTest { input, expected }
}

fn code(&self) -> String {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method would deserve documentation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've decided to do this a bit differently, building up the contract application programmatically instead of by modifying the text (which gives slightly weird error messages).

@@ -124,7 +124,7 @@ pub struct VirtualMachine<R: ImportResolver, C: Cache> {
// The call stack, for error reporting.
call_stack: CallStack,
// The interface used to fetch imports.
import_resolver: R,
pub import_resolver: R,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are already import_resolver() and import_resolver_mut() methods; is this pub really needed? Although setting a field to be public is maybe simpler, in which case maybe we should get rid of the accessors, but having both looks redundant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe it was originally for borrow-check reasons (because import_resolver_mut() borrows the whole VirtualMachine), but it isn't needed anymore.

core/src/program.rs Show resolved Hide resolved
@@ -1068,6 +1068,18 @@ impl Cache {
self.terms.get(&file_id).map(|TermEntry { term, .. }| term)
}

/// Set a new value for a cached term.
pub fn set(&mut self, file_id: FileId, term: RichTerm, state: EntryState) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this? That is a bit suspicious.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is definitely suspicious and I've removed it.

I replaced it with the custom_transform function, which I'm also not completely thrilled with but I think it's better than the previous poking around in Program's private fields.

Fundamentally, the problem is that we want to apply a custom transformation pass, but otherwise we want all of the behavior that Program provides (the input processing, the override mode, the error reporting...)

cli/src/doctest.rs Show resolved Hide resolved
cli/src/doctest.rs Outdated Show resolved Hide resolved
@github-actions github-actions bot temporarily deployed to pull request September 10, 2024 08:05 Inactive
@github-actions github-actions bot temporarily deployed to pull request September 10, 2024 08:34 Inactive
@jneem
Copy link
Member Author

jneem commented Sep 10, 2024

Thanks for the review, I think this is in a better state now. It's still missing tests and user-facing documentation. I'll try to get to that tomorrow.

@@ -672,7 +701,7 @@ impl<EC: EvalCache> Program<EC> {
result.body.pos,
))
}
_ => Ok(result.body),
_ => Ok(result.body.closurize(&mut vm.cache, result.env)),
Copy link
Member

@yannham yannham Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if that can break stuff. In particular extract_from_term is expecting the record spine to be fields with Record and RecRecord directly inside, while here it seems we're introducing intermediate Closure nodes everywhere.

PS: ok, I think I get it. We're introducing closures only when the result isn't a record, which is treated above. It's still a strange discrepancy: it means that if you take a random value somewhere in the field of a record spine, if it's a record and you try to evaluate it further you might get unbound identifiers, but if it's not a record then it's properly closurized and it's evaluating fine.

I have mixed feelings about this, although it doesn't seem to break existing code. Intuitively I would expect evaluate_record_spine to ensure that the spine is just nested records and terminal values with no Closures in-between (something stupid: imagine we want to list all the accessible functions in the future, eval_record_spine would be a good candidate, but now functions are hidden behind Closure nodes, for example).

I wonder if we should somehow specialize eval_record_spine for doc testing to make it clear that it's a strange version that might not make sense for something else, and keep eval_record_spine with the current semantics (of course, under the hood, we can use a generic common base that just takes an additional boolean-like argument to know if it should closurize non records or something like that). It's just a suggestion, there might be other ways to go about it,, but I have the overall feeling that we try to cram too many different things into one method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it means that if you take a random value somewhere in the field of a record spine, if it's a record and you try to evaluate it further you might get unbound identifiers

I didn't follow this part -- as long as you're encountering records, there's nothing to evaluate and so no unbound identifiers. And then as soon as you encounter a non-record, it's closurized with the right environment, right?

It seems to me like the closurized version is the appropriate version if you're planning to do further evaluation, while the non-closurized version is the right one for any other analysis.

of course, under the hood, we can use a generic common base that just takes an additional boolean-like argument to know if it should closurize non records or something like that

Sure, I'll give that version a try.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't follow this part -- as long as you're encountering records, there's nothing to evaluate and so no unbound identifiers. And then as soon as you encounter a non-record, it's closurized with the right environment, right?

I think you're right (although that might not be entirely true because contracts are evaluated but not closurized).

It seems to me like the closurized version is the appropriate version if you're planning to do further evaluation, while the non-closurized version is the right one for any other analysis.

I think my intuition about eval_record_spine was precisely that you don't want to perform further evaluation, but just peek at the term and gather whatever information you want. For example, nickel doc doesn't currently include default values, but we could imagine doing so in the future. It would be easy with the previous version, but in the new one we now need to get the content of the closure from the cache first (which isn't undoable, but looks a bit like a leaking abstraction IMHO).

However, I see that you could write the specification of your version as all records are evaluated recursively, and environment is pushed toward the "leaf" by closurizing. I didn't see a simple and correct explanation of the updated behaviour of eval_record_spine, but this may be reasonable. I'm still slightly in favour of separating the two flavour, but I don't have such a strong opinion anymore.

Copy link
Member

@yannham yannham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think beside the reservation around eval_record_spine, it looks good now! It's just missing a bunch of tests indeed. In a second step we will also want to test the stdlib as part of the general Nickel testing suite.

@github-actions github-actions bot temporarily deployed to pull request September 11, 2024 11:47 Inactive
@github-actions github-actions bot temporarily deployed to pull request September 11, 2024 13:01 Inactive
@github-actions github-actions bot temporarily deployed to pull request September 12, 2024 05:09 Inactive
@jneem
Copy link
Member Author

jneem commented Sep 12, 2024

I have some tests and docs now, but

  • I'm not sure where the docs should go. Maybe there should be a section for documenting the cli interface?
  • While writing the tests, I noticed that some outputs (i.e. any error messages raised during the tests) go to stderr, while others (i.e. the progress messages, and the summary at the end) go to stdout. I guess they should probably all go to the same place? But I'm not sure which one is the right one...

@github-actions github-actions bot temporarily deployed to pull request September 12, 2024 06:54 Inactive
@github-actions github-actions bot temporarily deployed to pull request September 12, 2024 10:59 Inactive
@github-actions github-actions bot temporarily deployed to pull request September 12, 2024 11:35 Inactive
@yannham
Copy link
Member

yannham commented Sep 24, 2024

@jneem I lost track a bit - is there anything left to do on this PR, either on your side or mine? Is the stdout vs stderr stuff settled ?

@jneem
Copy link
Member Author

jneem commented Sep 24, 2024

I think stderr/stdout is settled. The only outstanding question IIRC is where the docs should go.

@yannham
Copy link
Member

yannham commented Sep 24, 2024

I see. I wonder if it would nice to do something like the Nix CLI - write the CLI documentation as markdown that can be exported as an mdbook but also as manpages, with a single source of truth. In the meantime I guess a new section of the manual for the CLI should do - note that adding a manual section might require some trivial adaptation of nickel-lang.org as well, at least after the next release.

@jneem
Copy link
Member Author

jneem commented Sep 26, 2024

@yannham I forgot to bring this up in the weekly, but can we merge this and do the CLI docs as a follow-up?

@jneem jneem requested a review from yannham September 26, 2024 13:44
@yannham yannham added this pull request to the merge queue Sep 26, 2024
Merged via the queue into master with commit eee41e3 Sep 26, 2024
7 checks passed
@yannham yannham deleted the doctests branch September 26, 2024 14:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants