
WIP: External code snippets and example code #81

Draft · wants to merge 2 commits into base: main
Conversation


@flaki flaki commented Mar 16, 2022

A first stab at #29

We want to move all example code out of the docs pages and into separate code snippets, so we can use a central repository for these (for testing, benchmarking and easier maintenance). The first step is making it possible to pull in external code snippets, which already works here for files included in the /website/code directory (these can be imported using the @code alias).

The above is achieved by using Webpack's asset modules feature (type: "asset/source"), and the next step would be to further reduce boilerplate by allowing <MultiLanguageCodeBlock>s to load their contents from external files as opposed to being provided inline. This is going to be tricky because of the sync/async import boundary, so I'm still figuring out our options.

Resources related to the initial commit in this PR:

After initial build this PR could be previewed at: https://suborbital.github.io/docs/flaki/29-code-examples/


flaki commented Mar 16, 2022

An example of an externally loaded snippet can be seen at: https://suborbital.github.io/docs/flaki/29-code-examples/atmo/runnable-api/http-client#example-runnable

@LauraLangdon LauraLangdon added the documentation Anything related to documentation (e.g. doc bugs or similar), *not* documenting new features label Mar 17, 2022

@hola-soy-milk hola-soy-milk left a comment


This looks fantastic, nice one!

Comment on lines +9 to +38
impl Runnable for Fetch {
    fn run(&self, input: Vec<u8>) -> Result<Vec<u8>, RunErr> {
        let url = util::to_string(input);

        let _ = match http::get(url.as_str(), None) {
            Ok(res) => res,
            Err(e) => return Err(RunErr::new(1, e.message.as_str())),
        };

        // test sending a POST request with headers and a body
        let mut headers = BTreeMap::new();
        headers.insert("Content-Type", "application/json");
        headers.insert("X-ATMO-TEST", "testvalgoeshere");

        let body = String::from("{\"message\": \"testing the echo!\"}").as_bytes().to_vec();

        match http::post("https://postman-echo.com/post", Some(body), Some(headers)) {
            Ok(res) => {
                log::info(util::to_string(res.clone()).as_str());
                Ok(res)
            },
            Err(e) => Err(RunErr::new(1, e.message.as_str())),
        }
    }
}

// initialize the runner, do not edit below //
static RUNNABLE: &Fetch = &Fetch{};

Contributor

This looks very exciting!

I would love to try and use it, but am running into bumps. Would it be helpful to have a handler and curl request example, too?

Or maybe make this a part of a greater sample project?

Contributor Author

Ah sorry, I just used this as an example and haven't verified that it actually works -- I just pulled it out of the rwasm-testdata 🙈


flaki commented Mar 22, 2022

I have been digging around this house of cards for a while, so I'm going to write up the results so that people with a better understanding of (or more experience with) the infrastructure can chime in on the best way forward. cc @ospencer

The goal

Is to be able to create tabbed <MultiLanguageCodeBlock>s from external files by pulling in the contents of e.g. a Rust or JS code snippet.

The markdown/mdx should look something like this:

<MultiLanguageCodeBlock>

 ` ` `rust
// this is an inline Rust code block
 ` ` `

<CodeBlock language="go">
// this is an inline Go codeblock
</CodeBlock>

<CodeBlock src="@code/examples/myexample/ts/lib.ts">
// this is an external typescript runnable example, loaded from the local "@code" alias folder
// the "language" prop can be elided if its value can be inferred automatically from the file extension
</CodeBlock>

</MultiLanguageCodeBlock>

Users can mix & match inline and external code block definitions within the same block, and the language tab will be inferred automatically from the snippet type or file extension.

It would also be really useful if we could simply reference external files, e.g. straight out of a separate repo via a URL. These resources would be fetched during the build and included in the document automagically. We could also provide a suitable alias for the resources to further reduce boilerplate.

Loading external code snippets straight from GitHub would also go well with the current effort of moving example runnables into a centralized repo, and referencing the files from Git would allow us to later reference explicit versions of the files that correspond to a given version of the docs, which we will need anyway once we start versioning the docs.

Preceding work

The initial <MultiLanguageCodeBlock> component made it possible to reduce the boilerplate of having to manually define the tabbed structure for every code block, and centralize language information for languages supported by Reactr.

All this initial work still relied solely on inline code blocks, but supported both markdown fenced code blocks and explicitly declared <CodeBlock> components.

Current approach

This pull request adds support for pulling in external files using static import statements in the individual markdown files and using their contents in <CodeBlock>s. As described above, this approach uses Webpack 5's source assets as opposed to the raw-loader method proposed by the Docusaurus docs.

The current approach also lives in its own plugin, which will make it possible to move all the earlier bits into the plugin itself (making them more reusable by others, eventually moving the plugin into its own, separately installable module, and potentially upstreaming it in full, or upstreaming certain modifications to e.g. the <CodeBlock> component).

The approach described above is already useful and a big step forward but, according to my research, it has several limitations that make it unlikely to be able to support further improvements. I will describe these findings in detail below.

Next steps

The next step to further enhance the current approach and reduce boilerplate has to balance two requirements:

  1. Further reduce boilerplate by making it possible to declare the file path on the <CodeBlock> component itself (or a wrapper component), dropping the requirement of explicitly importing the file with an import statement, e.g.:
    Instead of:

    import myCode from '@code/my/code.rs'
    
    <CodeBlock>{myCode}</CodeBlock>

    allow usage to be just:

    <[External]CodeBlock src='@code/my/code.rs' />
  2. Make it possible to load files from an arbitrary source. This would make it possible to store the code snippets in an entirely separate repository without having to explicitly pull in that repo (e.g. as a submodule), and would further expand the flexibility of the system.
    Such as allowing for usage like:

    <[External]CodeBlock src="https://raw.githubusercontent.com/suborbital/examples/my/code.rs" />

The solution to 1. may have an effect on the solvability of 2., so it's best to consider these requirements together. There are multiple avenues for solving these requirements, but some of them seem to lead to dead ends. I'm going to discuss my research into these avenues below.

Solving dynamic imports

To be able to load arbitrary files we can imagine multiple approaches, but I'm going to argue that we wouldn't want to do client-side loading of these files, and especially not from external URLs. Because of how Docusaurus is built, escape hatches are provided for executing client-side code that transforms the page beyond what is possible during server-side rendering. My view is that resorting to something like this would have to be a last resort, especially given that it limits really interesting future possibilities around static generation.

With that out of the way there are various approaches to making this happen:

  1. Dynamic import():
We load the URL referenced by the <[External]CodeBlock> component's src prop as a Webpack asset using dynamic import()s.
  2. Docusaurus plugin APIs:
We load the content as external content via Docusaurus' dedicated lifecycle APIs; components reference this content, exposed by the lifecycle APIs during the build
  3. Preprocessing step:
We preprocess the docs and preload/cache the URLs referenced by <[External]CodeBlock> src props, manually insert (prepend) the relevant import statements into the files, and leave it to Webpack etc. to take care of loading/bundling them (the rest is the same as the current approach)

Any approach that involves dynamic imports not only suffers from the added complexity of Webpack's dynamic imports, but will also require the component itself to be async.

While googling around I uncovered this thread that mentions some of the issues inherent to Webpack and dynamic imports. The linked comment in particular suggests building out a map of imports, to create a pseudo-dynamic import where at least each path is known ahead of time:

  "With this approach I was able to sort of achieve success, but only by defining the map statically. The closest I've got to dynamic imports is a large hand rolled map of dynamic imports." -- jemjam

It seems like Docusaurus does not even support rendering async components, and expects all data to already be available during rendering.

These hurdles pretty much exclude Option 1.

It is possible to do some processing in a plugin and expose that data via lifecycle hooks. It does not seem like this API can be efficiently used to expose data to the docs, and while in theory I think setGlobalData() could be twisted into something like this, it would be very bad for page load times, so I don't think Option 2 is feasible here, either.

Option 3 seems like the most feasible solution right now, and would probably be most easily achieved using a custom MDX plugin. Keep in mind that Docusaurus currently only supports the earlier MDX version, not MDX v2 (support for which is slated for Docusaurus v3). Also notable: Docusaurus is currently migrating to ES modules, so all plugins should still be written in require() module syntax (see note) and ported to ESM once that upgrade takes place.

Implementation

The current plan for implementing this feature is a completely new custom remark transformer, not unlike the one that implements Docusaurus Admonitions today.

The plugin would (pre)process every MDX document to:

  • Find where external CodeBlock src loads are present
  • If these are not local file references (e.g. HTTP URLs), cache the contents locally and rewrite the URL
  • Prepend the required import statement to the top of the file, which would later load the contents via Webpack
  • Replace the code block with a correctly parametrized <CodeBlock> element (e.g. detecting the highlighting type when necessary), passing in the file source content via the imported variable
  • Possibly further expand the markdown syntax with additional metadata and processing

It may be possible to completely replace the current <MultiLanguageCodeBlock> implementation with a sufficiently advanced remark transformer (whether it would be worth it is another question). We should also ensure that we do some caching of the external files, particularly to make sure that local builds (e.g. while developing) are not impacted by the extra fetches. Remark transformers are async functions, so the fetches and caching can be performed as part of the initial discovery/rewriting of the markdown content. There are existing plugins that can handle caching external content, but they seem to require a preconfigured list of content to be preloaded, so they are not really useful for our use case, though they may inform the implementation.


flaki commented Mar 23, 2022

Earlier @javorszky had a couple of questions, mostly regarding the Preceding work section. While a lot of this work is still evolving, I realize the current state wasn't really documented anywhere outside of PR descriptions, so I'm going to add some docs in #92


flaki commented Mar 30, 2022

The latest commit brings a prototype of the external code snippet caching.
It can be previewed using the test.md document as deployed here:
https://suborbital.github.io/docs/flaki/29-code-examples/test/

The missing ESM support mentioned above is causing a lot of headaches, and mandates that we use outdated modules in the transformer, as all newer versions are ESM-only. We might want to consider updating to ESM/MDX v2 manually (but right now I would not recommend it, as there are frequent version bumps in Docusaurus v2 due to the beta, so that would involve a lot of patching on every update).

Another thing: the boundaries of a custom Suborbital CodeBlockExtensions plugin are already taking shape, and initially (probably before this PR lands) I will move all our components into a separate plugin, still within this repo; later we can evaluate moving it out into its own repo.

Another thing I'm currently mulling over is how to handle the imported snippets. There are two possibilities, each with its own pros and cons:

  1. use asset imports:
    • this means pre-caching the files we pull from URLs (this already happens)
    • add the cached file paths as asset imports at the top of the md file
    • use a React {variableInterpolation} within a <CodeBlock> or fenced code block to pull in the file content
  2. use direct embeds:
    • this one only needs to load the file contents ephemerally
    • put the contents directly into the current md file's AST
    • that's it

Currently we use 2., and it seems fine, but I am not completely sure whether asset imports wouldn't still have an advantage. Asset imports definitely need caching, which is good from a local-development-rebuild perspective, but bad in a "now we may need a way to cache-bust" sense. Apropos local development: we need a story there that doesn't make users manually change paths in the files. I'm thinking of allowing changes to the .cache directory -- again, asset imports may make this easier. FWIW, we can probably make asset imports the default for local development, and direct embeds for docusaurus build.

Another note on versioning: external URLs should point to a very specific repo, and the URLs should link to a tagged version of the file. This ensures e.g. reproducible builds and paves the way to #23

Note that the above will probably require us to let people embed this version in URLs as a placeholder and to do a replacement on it during pre-processing. Since interpolations are not (yet) expanded at pre-processing time, this will probably need a custom mechanism.
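
Such a mechanism could be a plain string substitution performed on the src URL before it is fetched and cached. A sketch; the "{{version}}" placeholder syntax and function name are hypothetical choices, not an existing feature.

```javascript
// Hypothetical version-placeholder expansion for snippet URLs.
// The "{{version}}" token syntax is an assumption, not an existing feature.
function resolveVersionedUrl(src, version) {
  return src.replace(/\{\{version\}\}/g, version);
}

// e.g. during pre-processing, before the URL is fetched and cached:
// resolveVersionedUrl(
//   'https://raw.githubusercontent.com/suborbital/examples/{{version}}/my/code.rs',
//   'v0.4.0'
// )
```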
