Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: format #133

Merged
merged 61 commits into from
Oct 15, 2024
Merged

feat: format #133

merged 61 commits into from
Oct 15, 2024

Conversation

a-frantz
Copy link
Member

@a-frantz a-frantz commented Jul 17, 2024

Describe the problem or feature in addition to a link to the issues.

New crate, wdl-format! Handles all things formatting!

In the interest of keeping this PR from sprawling further, there are some known issues/TODOs that will be handled in follow-up PRs.

  1. Right now, every Expr is just copied verbatim instead of being actually formatted. Actually formatted them in all their complexity will involve a lot of new code.
  2. Placeholders in literal strings are similarly copied verbatim, but will be formatted in a follow-up
  3. Whitespace is always ignored in the current PR. A future PR will implement respecting certain line-breaks and blank lines.
  4. Sorting input sections

Before submitting this PR, please make sure:

  • You have added a few sentences describing the PR here.
  • You have added yourself or the appropriate individual as the assignee.
  • You have added at least one relevant code reviewer to the PR.
  • Your code builds clean without any errors or warnings.
  • You have added tests (when appropriate).
  • You have updated the README or other documentation to account for these
    changes (when appropriate).
  • You have added an entry to the relevant CHANGELOG.md (see
    "keep a changelog" for more information).
  • Your commit messages follow the conventional commit style.

@a-frantz a-frantz mentioned this pull request Jul 17, 2024
8 tasks
@a-frantz a-frantz self-assigned this Jul 24, 2024
@a-frantz a-frantz marked this pull request as ready for review July 25, 2024 12:57
README.md Show resolved Hide resolved
wdl-format/src/format.rs Outdated Show resolved Hide resolved
wdl-format/src/format.rs Outdated Show resolved Hide resolved
wdl-format/src/format.rs Outdated Show resolved Hide resolved
wdl-format/src/format.rs Outdated Show resolved Hide resolved
wdl-format/src/format.rs Outdated Show resolved Hide resolved
wdl-format/src/format.rs Outdated Show resolved Hide resolved
wdl-format/src/format.rs Outdated Show resolved Hide resolved
wdl-format/src/format.rs Outdated Show resolved Hide resolved
wdl-format/src/format.rs Outdated Show resolved Hide resolved
wdl-format/src/format/comments.rs Outdated Show resolved Hide resolved
wdl-format/src/format/comments.rs Outdated Show resolved Hide resolved
//! are considered.
//!
//! A preceding comment is a comment that appears on a line before an element.
//! There may be any number of preceding comments and they may be separated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I'm not sure that comments separated by blank lines should be considered a single unit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example something like this: https://github.com/stjude/seaseq/blob/3b6b503292a4ebb2c8e9053d10e04ffad82b3bad/seaseq-case.wdl#L400-L414. The section header isn't indented, but if it were, I'm not sure we should delete the blank lines and attach everything to File sample_bam.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great example. You're right that the current behavior is wrong for that case. But I'm not sure what the right approach is. I need to think on this case for a bit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the best approach I can think of for this case would be to leave blank lines as they appear. I don't see a way around "attaching" comments to non-trivia. It's required unless we want to visit each and every comment to ensure it's being included somewhere. So I think no-matter-what, all of those seaseq comments should be "attached" to File sample_bam. But we can respect the blank lines. How does that sound?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like a good compromise. No issue with attaching them to the element, as long as we leave blank lines.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added that seaseq file to the test cases. I think it looks ok. It's not perfect, but I think you've suggested a big improvement here.

}

let comments_found = !preceding_comments.is_empty();
if comments_found && would_be_interrupting && !state.interrupted() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I was confused by this for a moment, but this is an example of an interrupting, preceding comment.

version
# a comment
1.1

wdl-format/src/format/metadata.rs Outdated Show resolved Hide resolved
wdl-format/src/format/metadata.rs Outdated Show resolved Hide resolved
wdl-format/src/format/task.rs Outdated Show resolved Hide resolved
Comment on lines 191 to 219
fn format(&self, buffer: &mut String, state: &mut FormatState) -> Result<()> {
format_preceding_comments(&self.syntax_element(), buffer, state, false)?;

let name = self.name();
state.indent(buffer)?;
name.format(buffer, state)?;
format_inline_comment(&name.syntax_element(), buffer, state, true)?;

let colon = self
.syntax()
.children_with_tokens()
.find(|c| c.kind() == SyntaxKind::Colon)
.expect("Requirements item should have a colon");
format_preceding_comments(&colon, buffer, state, true)?;
if state.interrupted() {
state.reset_interrupted();
state.indent(buffer)?;
}
buffer.push(':');
format_inline_comment(&colon, buffer, state, true)?;

let expr = self.expr();
format_preceding_comments(&expr.syntax_element(), buffer, state, true)?;
state.space_or_indent(buffer)?;
expr.format(buffer, state)?;
format_inline_comment(&self.syntax_element(), buffer, state, false)?;

Ok(())
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this logic be abstracted out? It would be the same for RequirementsItem/RuntimeItem/HintsItem right?

Copy link
Member Author

@a-frantz a-frantz Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peterhuene how would you recommend going about this? I believe @adthrasher is correct, RequirementsItem, RuntimeItem, and HintsItem should receive identical formatting, but they are defined separately. Should there be a common representation for these 3 items?

Copy link
Collaborator

@peterhuene peterhuene Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to do that would be to define a trait (that is a supertrait of AstNode) that returns what's needed from self here, such as name() and expr().

With that, you'd implement the trait for RequirementsItem, RuntimeItem, and HintsItem to call the appropriate name and expr methods.

Then replace the bodies here with a generic helper that takes T constrained by the new trait.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ok if we leave this for another PR? I'd like to get this merged in soon. @peterhuene that's good advice, I'll work on it in another branch. Related, I think I can do the same in some other places as well. Mainly thinking of meta and parameter meta sections which are dang-near identical. Same maybe with input and output? I'll have to play around with it a bit.

wdl-format/src/format/workflow.rs Outdated Show resolved Hide resolved
wdl-format/src/format/workflow.rs Outdated Show resolved Hide resolved
wdl-format/src/format/workflow.rs Outdated Show resolved Hide resolved
wdl-format/CHANGELOG.md Outdated Show resolved Hide resolved
wdl-format/src/lib.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@peterhuene peterhuene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really great! I'm excited to get this feature out there.

Just a few feedback comments to go over.

wdl-format/src/import.rs Outdated Show resolved Hide resolved
wdl-format/src/import.rs Outdated Show resolved Hide resolved
wdl-format/src/lib.rs Show resolved Hide resolved
wdl-format/src/import.rs Outdated Show resolved Hide resolved
wdl-format/src/lib.rs Outdated Show resolved Hide resolved
wdl-format/src/lib.rs Outdated Show resolved Hide resolved
wdl-format/src/lib.rs Outdated Show resolved Hide resolved
wdl-format/src/v1.rs Outdated Show resolved Hide resolved
wdl-format/src/task.rs Outdated Show resolved Hide resolved
Comment on lines 46 to 48
/// This function assumes we are _not_ in the preamble, and thus any
/// double-pound-sign comments should be converted to single-pound-sign
/// comments.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is appropriate, considering it only checks for ## comments (note the space) which should be pretty error-proof. Double-pound comments are exclusively for the preamble, so this feels safer than the reverse operation (upgrading single-pound-comments in the preamble to double-pound).

}

/// Format a WDL document.
pub fn format_document(code: &str) -> Result<String, Vec<Diagnostic>> {
Copy link
Member

@claymcleod claymcleod Aug 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this entrypoint is a little bit too rigid:

  • First, this needs to be recursive all of the way down. What I mean by that is that Document should implement Formattable, and you should be able to call Document::format as your entrypoint (or format on any other SyntaxKind if you just want to format a subset of the file).
  • I think it will be up to the user as to how they'd like to collect the output. Here, you've forced them to allocate a String on each format, but they may not want to do that.
  • You've also used the AST nodes underneath document. I think this should only require working at the SyntaxKind level to avoid all of the casting everywhere (and, logically, this is probably the place that most closely matches what is "formattable").
  • The above also allows you to not have to do the document parsing yourself, which is good; we don't really care about how the Document was created here (it could even be hand crafted programmatically using some future API).

Here's a playground that shows what I had in mind: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=c976dc581a0ca2622ccf40fa5398fa9a. Note that, in the example, I allocate a String to collect the results as you did, but the approach of Document::format didn't force me to do that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • First, this needs to be recursive all of the way down. What I mean by that is that Document should implement Formattable, and you should be able to call Document::format as your entrypoint (or format on any other SyntaxKind if you just want to format a subset of the file).

That's done, but is complicated by the trait of Formattable that the format() signature takes a state: State. This isn't very ergonomic and requires a let mut state = State::default() statement before any call format(). The State requirement is so that an element knows how far it needs to be indented, but that really only applies to elements which can be nested (anything in a metadata section or an Expr, and WorkflowItem).

I'm wondering if there should be two different Formattable traits, one which needs a state (NestedFormattable) and one which doesn't (FlatFormattable?). Because many elements can only appear at certain indent levels. There's top level DocumentItem which must be indent_level = 0 and then TaskItem and WorkflowItem that must be indent_level = 1 etc.

  • I think it will be up to the user as to how they'd like to collect the output. Here, you've forced them to allocate a String on each format, but they may not want to do that.

👍

  • You've also used the AST nodes underneath document. I think this should only require working at the SyntaxKind level to avoid all of the casting everywhere (and, logically, this is probably the place that most closely matches what is "formattable").

I'm not sure I'm understanding you on this point. Are you suggesting that format only operate on the CST level, and ignores the AST level? I think this would actually greatly complicate things, particularly when we want to move things around and group them. Operating at the AST level lets me cherrypick elements in the order in which we want them in the output. Roughly only visiting things once. Operating at the CST level is going to require traversing up and down the CST multiple times.

  • The above also allows you to not have to do the document parsing yourself, which is good; we don't really care about how the Document was created here (it could even be hand crafted programmatically using some future API).

I believe this is already handled by the existing code? Most every AST node (with some exceptions) has implemented Formattable, allowing them to be called "individually" or outside the context of a Document.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thought: currently every format() call assumes the element is well-formed and has passed both CST and AST validation. Of course eventually we want format() to be infallible so it can still format problematic documents (that are perhaps being edited in VScode). But we aren't there in this PR. So that's a known limitation of the current implementation.

wdl-format/src/lib.rs Outdated Show resolved Hide resolved
let mut preamble_comments = Vec::new();
let mut lint_directives = Vec::new();
let comment_buffer = &mut String::new();
for sibling in self.syntax().siblings_with_tokens(Direction::Prev).skip(1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole endeavor seems like something that you're going to want to generalize out to be calculated every time you go to format a node. Just my two centsr: I don't think it has to be pulled out for the PR to move forward unless you do it many more times.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I get the distinction about preamble comments being specific to this Version—I'm just talking about gathering up the comments that directly preceed a particular area (and the whitespace in between).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing these comments were made before you reached comment.rs and saw format_preceding_comments? I did consider refactoring this chunk into its own format_preamble_comments func, but I couldn't think of a scenario where that would be called outside the context of VersionStatement::format, so I opted to keep it all in there.

let mut preamble_comments = Vec::new();
let mut lint_directives = Vec::new();
let comment_buffer = &mut String::new();
for sibling in self.syntax().siblings_with_tokens(Direction::Prev).skip(1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's up with the .skip(1)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rowan includes self as the first sibling, so I'm skipping over self. Otherwise I'd need another arm in the match for VersionStatementNode that just ignores it.

wdl-format/src/lib.rs Outdated Show resolved Hide resolved
wdl-format/src/lib.rs Outdated Show resolved Hide resolved
wdl-format/src/comments.rs Outdated Show resolved Hide resolved
impl FormatCommand {
async fn exec(self) -> Result<()> {
let source = read_source(&self.path)?;
match format_document(&source) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove format_document and call Document::format (with any boilerplate
needed to get to that point in this function).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the rationale here?

It seems to me that format_document is a useful utility function. Especially because the current Document::format is fallible if it's assumptions about the Document passing CST+AST validation are not true. I think it's good to have that validation boilerplate inside a util function rather than "hoping" whoever calls Document::format has performed the proper validation.

wdl-format/src/comments.rs Outdated Show resolved Hide resolved
)?;

let open_brace = first_child_of_kind(self.syntax(), SyntaxKind::OpenBrace);
format_preceding_comments(&open_brace, writer, formatter, true)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, I've seen this pattern of formatting the preceding comment,
formatting the element, and then formatting the inline comment enough that I'm
wondering if this should just be abstracted out to its own method. Whatever is
supposed to be sandwiched between the two could be provided in a closure.

Copy link
Member Author

@a-frantz a-frantz Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright I partially implemented this (well, fully implemented but only applied to metadata.rs): 5660c1e

Did I do this as you had envisioned? Want to check before carrying it through to the rest of the PR, as I'm not sure this is what you were asking for.

One thing to note is that it can only be used where format_preceding_comments and format_inline_comments are called with the same element. This pattern you've noticed does pop up frequently, but sometimes the elements in each of those calls is different (because we have to go up or down a level in the CST), in which case we'd need a different function.


formatter.increment_indent();

let mut commas = self
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of this whole comma checking business? Aren't we just going to
output a comma to every line regardless of what the source has?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required by how the comment checking works. Every comment in the file is "owned" by a SyntaxElement. A metadata Comma may own one or more comments, requiring us to iterate through each comma and check it for comments. We can't just ignore commas as trivia and insert them ourselves, because that would cause us to "lose" comments in the output.

Copy link
Member

@claymcleod claymcleod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's my latest thoughts (though not complete).

}
}

// TODO: sort inputs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we knock this out in this PR? Or is that a future one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it should wait till we have some configuration built-in. Don't have any data to back this claim up, but I think input sorting will be considered one of our most controversial features, and we probably want to provide an easy way to opt-out.

Copy link
Member

@claymcleod claymcleod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I did not get to review every single detail, but overall I think it looks good and we can "fall forward" so to speak if Peter thinks it looks good.

The only outstanding item that I see is changing placeholder options back to singular—I can do that, but it will take me to the end of today to get around to it. If someone else wants to do it beforehand, feel free to do so.

Copy link
Collaborator

@peterhuene peterhuene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reviewed through wdl-analysis and wdl-ast.

Stopping there to provide feedback on those two crates; I'll resume with another review on the wdl-format changes.

wdl-ast/src/lib.rs Outdated Show resolved Hide resolved
wdl-ast/src/registry.rs Outdated Show resolved Hide resolved
wdl-ast/src/v1/expr.rs Outdated Show resolved Hide resolved
wdl-ast/src/v1/decls.rs Outdated Show resolved Hide resolved
wdl-ast/src/v1/decls.rs Outdated Show resolved Hide resolved
wdl-ast/src/v1/workflow.rs Outdated Show resolved Hide resolved
wdl-ast/src/v1/workflow.rs Outdated Show resolved Hide resolved
wdl-ast/src/v1/workflow.rs Outdated Show resolved Hide resolved
wdl-ast/src/v1/task.rs Outdated Show resolved Hide resolved
wdl-ast/src/v1/task.rs Outdated Show resolved Hide resolved
@claymcleod
Copy link
Member

claymcleod commented Oct 14, 2024

FWIW, I'm good with reverting back to a single Hints construct—I recall having changed it for a particular reason, but that exact reason escapes me now. So, if we can change it back and nothing complains, then I'm fine with it.

Copy link
Collaborator

@peterhuene peterhuene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I've completed my review with some additional feedback.

Overall, this looks fantastic!

wdl-ast/src/lib.rs Outdated Show resolved Hide resolved
wdl-format/src/config/indent.rs Outdated Show resolved Hide resolved
wdl-format/src/token.rs Outdated Show resolved Hide resolved
wdl-format/src/token.rs Outdated Show resolved Hide resolved
wdl-format/src/token/post.rs Outdated Show resolved Hide resolved
wdl-grammar/src/tree.rs Outdated Show resolved Hide resolved
wdl-grammar/src/tree.rs Outdated Show resolved Hide resolved
wdl-grammar/src/tree.rs Outdated Show resolved Hide resolved
wdl-grammar/src/tree.rs Outdated Show resolved Hide resolved
wdl-grammar/src/tree.rs Outdated Show resolved Hide resolved
wdl-grammar/src/tree.rs Outdated Show resolved Hide resolved
@a-frantz a-frantz mentioned this pull request Oct 15, 2024
Copy link
Collaborator

@peterhuene peterhuene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fantastic! I'm looking forward to getting this into the Sprocket extension eventually tool.

@a-frantz a-frantz merged commit afb540a into main Oct 15, 2024
9 checks passed
@a-frantz a-frantz deleted the feat/format branch October 15, 2024 17:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants