Losing code cell language from MyST markdown inputs with --pipe black #1267

Open
davidorme opened this issue Aug 7, 2024 · 8 comments

@davidorme

We're using jupytext --pipe black to automatically format Python code in MyST markdown notebooks (as part of pre-commit, but I don't think that's relevant here). The problem we're having is that (IIUC) the round trip through the percent format to pass it to black strips out the language specification on the code-cell directives. So given:

---
jupytext:
  formats: md:myst
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
kernelspec:
  display_name: Python 3 (ipykernel)
  language: python
  name: python3
---

# Quantum yield efficiency of photosynthesis

```{code-cell} python
# I'm some code
x = 1
```

Running that through jupytext --pipe black results in:

---
jupytext:
  formats: md:myst
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
kernelspec:
  display_name: Python 3 (ipykernel)
  language: python
  name: python3
---

# Quantum yield efficiency of photosynthesis

```{code-cell}
# I'm some code
x = 1
```

That affects other tools that rely on the language specification for syntax highlighting of code cells (we're using VSCode). I wondered whether setting cell_metadata_filter = "all" would fix this, but I think the language specification is not part of the cell metadata? I also don't think that any of the other settings in config.py address this.
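For anyone wanting to check whether a file has been affected, here is a minimal sketch (not part of jupytext) that lists the `{code-cell}` fences with no language argument. The function name `code_cells_without_language` and the regular expression are made up for illustration, and it only handles backtick fences.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

# Opening MyST code-cell fence: three or more backticks, the {code-cell}
# directive, then an optional language/lexer argument.
CODE_CELL_FENCE = re.compile(r"^`{3,}\{code-cell\}[ \t]*(\S*)[ \t]*$")


def code_cells_without_language(text):
    """Return the 1-based line numbers of {code-cell} fences with no language."""
    missing = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = CODE_CELL_FENCE.match(line)
        if match and not match.group(1):
            missing.append(number)
    return missing


# The code cell from the example above, before and after the round trip.
before = f"# Title\n\n{FENCE}{{code-cell}} python\nx = 1\n{FENCE}\n"
after = f"# Title\n\n{FENCE}{{code-cell}}\nx = 1\n{FENCE}\n"

print(code_cells_without_language(before))  # []
print(code_cells_without_language(after))   # [3]
```

Running this over a file before and after `jupytext --pipe black` would show which cells lost their language.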

@mwouts
Owner

mwouts commented Aug 7, 2024

Hi @davidorme, thank you for reporting this! We would need to make sure that this language information is preserved when the MyST file is converted to a Jupyter notebook (the py:percent format will then, in turn, preserve the cell metadata).

Let me check with @chrisjsewell who knows that part better than I do, what happens to that language specification when the conversion occurs.

@chrisjsewell
Contributor

Will put it on the todo list to have a look 😅 but feel free to ping me again if I don't reply

@davidorme
Author

@chrisjsewell Sorry to ping you on this.

I've got jupyter-lab and jupytext --pipe black playing ping-pong with each other. When I'm writing docs in jupyter as MyST Markdown files, those language tags are automatically added when the file saves (I'm assuming that this is something that jupytext does?). But then when I commit the file, the pre-commit setup using jupytext --pipe black throws them all out again 😄.

It's not a huge deal - we're just only committing files stripped of code-cell language information - but it would be good to fix it.

@mwouts
Owner

mwouts commented Oct 5, 2024

Oh actually I realize that this is an issue that has been going on for a very long time! See #759, #778, #789.

What happens is that the language specification on the code cell comes from the language_info notebook metadata.

That information is in the notebook when you save it from Jupyter, but it is lost when you read the MyST file.
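As a rough illustration of that dependency, here is a toy model (not Jupytext's actual code) in which the opening line of a code cell gets a lexer only when the notebook metadata carries `language_info`. The function name `code_cell_opening` and the `default_lexer` parameter are assumptions made for this sketch.

```python
FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def code_cell_opening(notebook_metadata, default_lexer=None):
    """Toy model: build the opening line of a MyST code cell."""
    language_info = notebook_metadata.get("language_info", {})
    lexer = language_info.get("pygments_lexer", default_lexer)
    opening = FENCE + "{code-cell}"
    if lexer:
        opening += " " + lexer
    return opening


kernelspec = {
    "display_name": "Python 3 (ipykernel)",
    "language": "python",
    "name": "python3",
}

# Notebook as saved from Jupyter: language_info is present.
from_jupyter = {
    "kernelspec": kernelspec,
    "language_info": {"name": "python", "pygments_lexer": "ipython3"},
}
# Notebook as read back from a MyST file whose header only keeps kernelspec.
from_myst = {"kernelspec": kernelspec}

print(code_cell_opening(from_jupyter))  # ```{code-cell} ipython3
print(code_cell_opening(from_myst))     # ```{code-cell}
```

In this model the kernelspec alone is never consulted, which matches the behaviour reported in this issue.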

I see one immediate workaround: add the language_info metadata to your MyST notebooks by adding this to your jupytext.toml config:

notebook_metadata_filter="language_info"

On the longer term, I see two possible fixes:

  1. Apply the metadata filter before passing the notebook to MyST (e.g. if Jupytext is not configured to preserve the language info, then no cell would get the ipython3 lexer)
  2. Or, reconstruct the language_info within Jupytext, e.g. figure out how Jupyter does that, and do the same

My preference goes to 1 but I am curious to hear yours @chrisjsewell @davidorme @parmentelat

@davidorme
Author

I may have got this wrong but I have pyproject.toml with:

[tool.jupytext]
# Stop jupytext from removing mystnb and other settings in MyST Notebook YAML headers
notebook_metadata_filter = """
settings,
mystnb,
language_info
"""

And then a markdown file with YAML headers:

---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.16.4
kernelspec:
  display_name: Python 3 (ipykernel)
  language: python
  name: python3
---

If I run jupytext --pipe black file.md on that then the output reports:

[jupytext] Reading docs/source/users/demography/canopy.md in format md
[jupytext] Executing black -
All done! ✨ 🍰 ✨
1 file left unchanged.
[jupytext] Writing docs/source/users/demography/canopy.md in format md:myst

But all of the code-cell language specifications have been stripped.

@mwouts
Owner

mwouts commented Oct 5, 2024

I see! You still don't have the language_info metadata in your MyST file, which is why the Pygments lexers go away. To add that metadata to your MyST file, you will have to open it in Jupyter, and save it using the new config file.

@davidorme
Author

davidorme commented Oct 7, 2024

Alright. That took longer than expected:

  • You need to commit your updates to the config file (pyproject.toml in my case) before the actual notebooks if you are testing this workaround with pre-commit. Because 🤦 pre-commit stashes the changes to run the validation, jupytext uses the old config. I mean, it's sorta obvious but I stumbled.
  • You have to start jupyter in the same directory as the config file (or presumably point jupyter to it correctly). I'm in the habit of running jupyter in my docs directory, so of course it wasn't picking up the config.
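The second pitfall can be pictured with the sketch below. It assumes, based only on the behaviour described above, that the config search walks upward from the notebook's directory but never above the directory Jupyter was started in. The helper `find_config`, its `stop_dir` parameter, and the short `CONFIG_NAMES` list are hypothetical, not Jupytext's API.

```python
import tempfile
from pathlib import Path

# Hypothetical subset of config file names, for illustration only.
CONFIG_NAMES = ("jupytext.toml", "pyproject.toml")


def find_config(start_dir, stop_dir):
    """Walk from start_dir up to stop_dir (inclusive); return the first config found."""
    start = Path(start_dir).resolve()
    stop = Path(stop_dir).resolve()
    for directory in (start, *start.parents):
        for name in CONFIG_NAMES:
            candidate = directory / name
            if candidate.is_file():
                return candidate
        if directory == stop:
            break  # assumed: never look above the server's start directory
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    docs = root / "docs"
    docs.mkdir()
    (root / "pyproject.toml").write_text("[tool.jupytext]\n")

    # Jupyter started in docs/: the search stops before reaching the project root.
    print(find_config(docs, stop_dir=docs))
    # Jupyter started in the project root: the config is found.
    print(find_config(docs, stop_dir=root).name)
```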

But. With the config above committed and jupyter started in the project root so it actually reads that config, opening and saving a notebook in jupyter does add the following to the notebook YAML:

language_info:
  codemirror_mode:
    name: ipython
    version: 3
  file_extension: .py
  mimetype: text/x-python
  name: python
  nbconvert_exporter: python
  pygments_lexer: ipython3
  version: 3.11.9

Saving in jupyter also restores the language info to the code cells and now piping the notebook through black does not strip the language info. So the workaround works.
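A quick way to tell which files have already picked up the workaround is to look for a top-level `language_info:` key in the YAML header. This is a minimal sketch using a plain line scan rather than a YAML parser; the function name `header_has_language_info` is made up for illustration.

```python
def header_has_language_info(text):
    """Return True if the leading YAML header has a top-level language_info key."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False  # no YAML header at all
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the header
        if line.startswith("language_info:"):
            return True
    return False


without_info = "---\nkernelspec:\n  language: python\n  name: python3\n---\n\n# Title\n"
with_info = (
    "---\nkernelspec:\n  name: python3\n"
    "language_info:\n  name: python\n  pygments_lexer: ipython3\n---\n\n# Title\n"
)

print(header_has_language_info(without_info))  # False
print(header_has_language_info(with_info))     # True
```

Files that report False would need to be opened and saved in Jupyter once before `--pipe black` preserves their code-cell languages.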

> 1. Apply the metadata filter before passing the notebook to MyST (e.g. if Jupytext is not configured to preserve the language info, then no cell would get the ipython3 lexer)
> 2. Or, reconstruct the language_info within Jupytext, e.g. figure out how Jupyter does that, and do the same

I don't understand the boundaries between the different packages at all well, but if I understand correctly:

  • Jupyter uses the language_info metadata to record the lexer information and assign code cell level language information.
  • Jupytext uses metadata filtering to simplify the Jupyter metadata down to a minimal set and (as of "Request: minified header" #105?) that doesn't include language_info, because the language details are duplicated in kernelspec.
  • But the specific handling of the code-cell language information relies on the presence of the language_info notebook metadata.

I'm not sure what (1) adds beyond the workaround - does it mean that jupyter stops adding the code-cell lexer info so the notebook content is more stable? It seems like this could just be a documentation update to say that the default behaviour is not to retain lexer information in notebooks, but that adding the language_info back in to the retained metadata will allow lexer information to be retained?

@davidorme
Author

I think I've run into a workflow that - if I understand correctly - argues for option (2). This usage might be out of scope for jupytext but it feels like a reasonably natural thing to want to do.

The workflow is creating MyST markdown notebooks for rendering using sphinx. Users can of course create notebook content in jupyter, but one of the advantages (joys?) of the MyST markdown format is that you don't have to, because it is human readable. So:

  1. If I'm working in a code editor, I can create a new markdown file that I want to be a MyST notebook.

         touch simple.md

  2. I can then set up the header YAML.

         jupytext --set-format md:myst --set-kernel python3 simple.md

  3. I've now got a file that I can use with myst-nb in sphinx to generate content.

  4. But - if I've got this right - at present, the language_info metadata will only be inserted if I open the file in jupyter and then save it, having set the notebook metadata filter to preserve the language_info metadata.

  5. So in this use case, my simple.md file will only be able to preserve the language on code-cell blocks if I open and save it through jupyter.

That feels clunky. I get that jupytext is intended to primarily act as an interface with jupyter but with this workflow jupyter isn't really needed. If I understand right, your proposal (2) would allow jupytext to set the language_info in the same way that it sets format and kernel?
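To make option (2) concrete, here is a hedged sketch of what reconstructing a minimal language_info from the kernelspec could look like. This is not how Jupyter or Jupytext does it; the function `guess_language_info` and the `LEXER_BY_LANGUAGE` table are hypothetical, and a real implementation would more likely ask the kernel itself.

```python
# Hypothetical lookup from kernelspec language to a Pygments lexer name.
LEXER_BY_LANGUAGE = {"python": "ipython3"}


def guess_language_info(kernelspec):
    """Derive a minimal language_info dict from a kernelspec, or None if unknown."""
    language = kernelspec.get("language")
    if not language:
        return None
    return {
        "name": language,
        # Fall back to the language name when no lexer is listed for it.
        "pygments_lexer": LEXER_BY_LANGUAGE.get(language, language),
    }


kernelspec = {
    "display_name": "Python 3 (ipykernel)",
    "language": "python",
    "name": "python3",
}

print(guess_language_info(kernelspec))
# {'name': 'python', 'pygments_lexer': 'ipython3'}
```

With something along these lines, a file created purely from the command line could get its code-cell languages without a round trip through Jupyter.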
