Language injections for controlling syntax highlighting in string literals #3952

EmilStenstrom · 2022-12-24T21:26:58Z

EmilStenstrom
Dec 24, 2022

⚠️ Indicate you want this by adding a 👍 by clicking the emoji icon at the end of this issue

I think it would be very useful to have a way to get syntax highlighting for other languages inside a Python file, much like CSS inside <style> tags in an HTML file gets the correct highlighting.

Example:

class Calendar(component.Component):
   template_string = '<span class="calendar"></span>'
   css_string = '.calendar { background: pink }'
   js_string = 'document.getElementsByClassName("calendar)[0].onclick = function() { alert("click!") }'

Since there's no highlighting of the strings, it's very easy to miss the missing quote in js_string.

WAYS OF SOLVING THIS

A magical comment
PyCharm has this built in, so you can either manually mark a block as another language, or use the magical "# language=html" comment to mark the next string as a foreign language.

class Calendar(component.Component):
   # language=html
   template_string = '<span class="calendar"></span>'
   # language=css
   css_string = '.calendar { background: pink }'
   # language=js
   js_string = 'document.getElementsByClassName("calendar)[0].onclick = function() { alert("click!") }'

I don't quite like the syntax of this, but it solves the problem, and it would keep the two editors in sync on this.
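To make the comment approach concrete, here is a minimal sketch of how a tool could pair each marker comment with the string that follows it, using the standard `tokenize` module. The marker regex and the `find_injections` helper are assumptions made for illustration; this is not how PyCharm or any VS Code extension is known to implement it.

```python
import io
import re
import tokenize

# Hypothetical marker format, modelled on the "# language=html" comment above.
MARKER = re.compile(r"#\s*language=(\w+)")


def find_injections(source: str) -> list[tuple[str, str]]:
    """Return (language, string_literal_source) pairs for marked strings."""
    injections = []
    pending = None  # language named by the most recent marker comment
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            match = MARKER.match(tok.string)
            pending = match.group(1) if match else pending
        elif tok.type == tokenize.STRING and pending is not None:
            injections.append((pending, tok.string))
            pending = None  # a marker applies only to the next string
    return injections


sample = '''
# language=html
template_string = '<span class="calendar"></span>'
# language=css
css_string = '.calendar { background: pink }'
plain = 'not highlighted'
'''
injections = find_injections(sample)
```

Only the two marked strings are reported; the unmarked `plain` string is left alone because no marker is pending when it is reached.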

Tag-strings

Tag-strings is an idea for a new python language feature that allows you to write:

class Calendar(component.Component):
   template_string = html'<span class="calendar"></span>'
   css_string = css'.calendar { background: pink }'
   js_string = js'document.getElementsByClassName("calendar)[0].onclick = function() { alert("click!") }'

This might sound crazy, but there is a working Python fork by Jim Baker and Guido that explores this. The rationale for this feature is not explicitly syntax highlighting, but this is such a nice API that I think it would fit very well with syntax highlighting too.

I opened a thread on this on python-ideas, and got feedback in the form of "this is something that the editors should do, not the python language".

Dummy methods

Another proposal (in the python-ideas thread above) is being able to mark some functions in VSCode as special, so that strings passed to those functions get highlighted. In this case, I've set up html(), css(), and js() to be highlighted in VSCode.

def html(s): return s
def css(s): return s
def js(s): return s

class Calendar(component.Component):
   template_string = html('<span class="calendar"></span>')
   css_string = css('.calendar { background: pink }')
   js_string = js('document.getElementsByClassName("calendar)[0].onclick = function() { alert("click!") }')
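As a rough sketch of what "special" functions could mean for a tool, the standard `ast` module can locate string literals passed to calls with given names. The `LANGUAGE_FUNCTIONS` set and the `find_tagged_strings` helper are hypothetical names chosen for this example, not an existing VSCode or Pylance setting.

```python
import ast

# Assumed set of marker function names; a real editor would make this configurable.
LANGUAGE_FUNCTIONS = {"html", "css", "js"}


def find_tagged_strings(source: str) -> list[tuple[str, int, int]]:
    """Return (language, line, column) for string literals passed to marker calls."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in LANGUAGE_FUNCTIONS
            and len(node.args) == 1
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            arg = node.args[0]
            found.append((node.func.id, arg.lineno, arg.col_offset))
    # Sort by position so the result follows source order.
    return sorted(found, key=lambda item: item[1:])


sample = (
    "template_string = html('<span></span>')\n"
    "css_string = css('.calendar { background: pink }')\n"
    "other = len('plain')\n"
)
tagged = find_tagged_strings(sample)
```

The line and column of each literal are what a highlighter would need in order to hand that range to another grammar.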

New types
Another way would be to use the typing system, and use the new type to highlight.

from typing import NewType
html = NewType("html", str)
css = NewType("css", str)
js = NewType("js", str)

class Calendar(component.Component):
   template_string = html('<span class="calendar"></span>')
   css_string = css('.calendar { background: pink }')
   js_string = js('document.getElementsByClassName("calendar)[0].onclick = function() { alert("click!") }')

Use subclass of str

Which has the nice side effect that you could also do something with the strings.

class html(str): pass
class css(str): pass
class js(str): pass

class Calendar(component.Component):
   template_string = html('<span class="calendar"></span>')
   css_string = css('.calendar { background: pink }')
   js_string = js('document.getElementsByClassName("calendar)[0].onclick = function() { alert("click!") }')
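The difference between the NewType and subclass variants shows up at runtime. The sketch below uses a hypothetical `minified` method as the "something" you could do with the string; the `html_alias` name exists only to keep the two variants apart in one example.

```python
from typing import NewType

html_alias = NewType("html_alias", str)  # typing-only marker


class html(str):
    """A str subclass that can carry behaviour of its own."""

    def minified(self) -> "html":
        # Naive whitespace collapse, for illustration only.
        return html(" ".join(self.split()))


marked = html_alias("<span>  hi  </span>")
wrapped = html("<span>  hi  </span>")
```

`marked` is a plain `str` at runtime, while `wrapped` is an `html` instance that still behaves like a string and can offer helpers such as `wrapped.minified()`.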

Use Annotated type

from typing import Annotated

class Calendar(component.Component):
   template_string: Annotated[str, 'html'] = '<span class="calendar"></span>'
   css_string: Annotated[str, 'css'] = '.calendar { background: pink }'
   js_string: Annotated[str, 'js'] = 'document.getElementsByClassName("calendar)[0].onclick = function() { alert("click!") }'
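The metadata in an Annotated type can be read back, which is what makes it usable as a language marker. Here is a small runtime sketch using `typing.get_type_hints` with `include_extras=True`; an editor would do the equivalent statically, and the `embedded_languages` helper and the plain `Calendar` class (without the component base class) are assumptions for this example.

```python
from typing import Annotated, get_type_hints


class Calendar:
    template_string: Annotated[str, "html"] = '<span class="calendar"></span>'
    css_string: Annotated[str, "css"] = ".calendar { background: pink }"
    title: str = "plain"


def embedded_languages(cls: type) -> dict[str, str]:
    """Map attribute name -> language for Annotated[str, <language>] attributes."""
    hints = get_type_hints(cls, include_extras=True)
    result = {}
    for name, hint in hints.items():
        # Annotated types expose their extra arguments as __metadata__.
        metadata = getattr(hint, "__metadata__", ())
        if metadata and isinstance(metadata[0], str):
            result[name] = metadata[0]
    return result


languages = embedded_languages(Calendar)
```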

YOUR THOUGHTS

This is where I'm hoping for some feedback from you. Is this something you have wanted too? Do you think it's a useful addition to the python language extension? Is this at all doable in VSCode, or does there need to be upstream changes?

brettcannon · 2023-01-03T17:53:01Z

brettcannon
Jan 3, 2023

This isn't specific to Python but to VS Code in general, so transferring.

0 replies

EmilStenstrom · 2023-01-04T14:11:10Z

EmilStenstrom
Jan 4, 2023
Author

@brettcannon This has been discussed in vscode before with the conclusion that this should be done on a language by language basis. The idea is that you need to understand the Python grammar to be able to know what to highlight and not. Do you agree with that assessment? That's why I posted it in the python language repo.

0 replies

brettcannon · 2023-01-04T19:18:52Z

brettcannon
Jan 4, 2023

I hadn't realized it was already rejected as a core feature, with the suggestion that each language somehow support it. I've moved the issue back.

0 replies

EmilStenstrom · 2023-01-04T23:11:36Z

EmilStenstrom
Jan 4, 2023
Author

Fully understandable, I should have included that information in the original issue. Anyways, it's an awesome feature, if at all possible to build! :)

0 replies

karthiknadig · 2023-01-26T19:31:05Z

karthiknadig
Jan 26, 2023
Collaborator

Moving this to pylance for further investigation. See here for details on Embedded Programming Languages: microsoft/vscode-languageserver-node#1170 (comment)

0 replies

rchiodo · 2023-01-26T19:42:40Z

rchiodo
Jan 26, 2023
Maintainer

@EmilStenstrom do you have any use cases where this would be useful? I mean other than just creating strings with syntax highlighting.

I would think this would be something to add to LSP itself and not have it be language specific.

Pylance shouldn't parse HTML tags and provide completions for HTML. You'd want the HTML language server to parse them.

In Visual Studio, I believe this would be handled with projection buffers. Each language server would be responsible for just their portion. Something would have to indicate how to split the text buffer into its pieces.

VS code has at least one issue that sounds similar:
microsoft/vscode#13821

0 replies

rchiodo · 2023-01-26T19:44:50Z

rchiodo
Jan 26, 2023
Maintainer

Oh it seems VS code has a different way.
https://code.visualstudio.com/api/language-extensions/embedded-languages

That would mean Pylance would generate virtual documents and send each document off to the appropriate server.
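One way to picture a virtual document, sketched in Python under the assumption that the host language already knows the character offsets of each embedded region: blank out everything else with spaces while keeping newlines, so line and column positions still match the original file. The `virtual_document` helper is hypothetical and only illustrates the idea.

```python
def virtual_document(source: str, regions: list[tuple[int, int]]) -> str:
    """Blank out everything outside `regions` (start, end offsets), keeping newlines."""
    keep = [False] * len(source)
    for start, end in regions:
        for i in range(start, end):
            keep[i] = True
    return "".join(
        ch if keep[i] or ch == "\n" else " "
        for i, ch in enumerate(source)
    )


source = "css_string = '.calendar { background: pink }'\nx = 1\n"
start = source.index(".calendar")
end = source.index("'", start)
css_doc = virtual_document(source, [(start, end)])
```

Because `css_doc` has the same length and line structure as `source`, a position reported by a CSS server maps back to the Python file without any translation.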

0 replies

EmilStenstrom · 2023-01-26T19:49:03Z

EmilStenstrom
Jan 26, 2023
Author

@rchiodo By use-case, do you mean what the benefit would be of having the strings highlighted instead of being regular Python strings?

- It would be easier to find errors if they were highlighted, since the colouring would be clearly off.
- If there were autocomplete, the authoring experience would be vastly improved.

If by use-case you mean whether this is code that actually exists in the wild, then yes, it very much does. I'm the author of a library called django-components, which provides a container for reusable web components consisting of some Python glue code, HTML, CSS and JS. Currently users need to have four different files with the different formats, because otherwise there is no way to get syntax highlighting and autocompletion, but since most files are small, it would be a better experience for everyone if you could inline those small files as strings instead. This is possible today, but then authors lose all editor support for those languages...

0 replies

rchiodo · 2023-01-26T19:52:47Z

rchiodo
Jan 26, 2023
Maintainer

This reminds me of cell magics in Jupyter. Something like so:

%%sql

SELECT * FROM FOO...

Which we special case right now to just ignore everything.

It would be much nicer for the user if the %%sql turned the rest of the document into a virtual sql doc.
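A minimal sketch of that split, assuming the magic name on the first line is taken as the language of the rest of the cell. The `split_cell_magic` helper is hypothetical and ignores any arguments that follow the magic name.

```python
from typing import Optional


def split_cell_magic(cell: str) -> tuple[Optional[str], str]:
    """Return (language, body); language is None when there is no %% header."""
    first_line, _, rest = cell.partition("\n")
    if first_line.startswith("%%"):
        parts = first_line[2:].split()
        if parts:
            # The first word after %% names the magic, e.g. "sql".
            return parts[0], rest
    return None, cell


language, body = split_cell_magic("%%sql\nSELECT * FROM FOO\n")
```

The `body` is what would become the virtual SQL document, while a cell without a `%%` header is returned unchanged with no language.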

0 replies

rchiodo · 2023-01-26T19:54:20Z

rchiodo
Jan 26, 2023
Maintainer

I asked for use cases because this would be a lot of work to implement so we'd have to justify it based on how many people it would affect.

If we could use the same thing for Jupyter notebooks, that might raise the number of users affected.

0 replies

rchiodo · 2023-01-26T19:55:15Z

rchiodo
Jan 26, 2023
Maintainer

Personally, I like your NewType idea. Special casing the .ctor for NewTypes that have specific names.

0 replies

EmilStenstrom · 2023-01-26T21:18:49Z

EmilStenstrom
Jan 26, 2023
Author

Nice catch with the Jupyter case! I think there are many more. The general question is: do people write code in other languages inside Python? I have personally done this many times, especially for "glue scripts". The tagstr project (which is actively worked on) has a list of example tags in their repo, which includes the html, sh, and sql tags. I think these are likely the three most common languages that you write inside Python strings.

Object-Relational Mappers (ORMs): Almost all ORMs have a mode where you write raw SQL that gets sent to the database directly. Here's Django's documentation for this: https://docs.djangoproject.com/en/4.1/topics/db/sql/ - By marking those strings as SQL for VSCode you could greatly reduce the risk of errors inside those brittle SQL strings.

Unix shell scripts: Conditionally calling different shell scripts is VERY common in Python. So common that the standard library has a utility called shlex which lets you deal with them. This means shell scripts in code are very common, and single-letter errors could potentially delete all your files. By getting syntax highlighting for those strings, you could avoid such errors! :)

HTML templating: Python is used a lot on the web, and not all sites are backed by large templating libraries. Instead, HTML snippets are strung together with Python, leaving lots of room for errors when no highlighting is available for those strings. PyCharm uses HTML as their example for when language injections are needed.

Those three use-cases should touch a LOT of codebases out there, and if we include all languages that VSCode supports, configuration scripts, deployment code, templating languages, I think it would be hard to find ANY sizeable codebase that doesn't embed another language somewhere.
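To illustrate the raw SQL case, here is a small sketch using the standard `sqlite3` module rather than Django (an assumption made to keep the example self-contained). The typo lives inside an ordinary string, so nothing can flag it before the statement runs.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE person (name TEXT)")

# The typo ("SELEC") sits inside a plain Python string, so no Python tooling
# notices it; the database only rejects it when the statement is executed.
query = "SELEC name FROM person"
try:
    connection.execute(query)
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)
```

With SQL highlighting inside the string, the misspelled keyword would stand out while editing instead of surfacing as a runtime error.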

0 replies

EmilStenstrom · 2023-01-26T21:29:15Z

EmilStenstrom
Jan 26, 2023
Author

Just to test my point, I just picked a microsoft-related project I've recently worked with: the O365 bindings for python. They use HTML strings in their tests: test_teams.py and test_message.py.

0 replies

rchiodo · 2023-01-27T22:40:56Z

rchiodo
Jan 27, 2023
Maintainer

Not sure what the status of the extension client move is, but implementing this requires changes on the client side. So it would require that Pylance own the client side, or that the Python core extension provide some of the support here.

See the example here:
https://github.com/microsoft/vscode-extension-samples/blob/main/lsp-embedded-request-forwarding/client/src/extension.ts

0 replies

karthiknadig · 2023-01-27T22:45:29Z

karthiknadig
Jan 27, 2023
Collaborator

@rchiodo I am still investigating the requirements from the Jupyter side for the client move. I will update you and the team when that is done. For now this should be here.

0 replies

EmilStenstrom · 2023-02-03T20:23:47Z

EmilStenstrom
Feb 3, 2023
Author

Do I understand things correctly that implementing this requires changes to "pyrx" which is a closed source repo linked above? Is this issue still in triage stage? Let me know if there's something else I can do to help out!

0 replies

rchiodo · 2023-02-03T21:09:14Z

rchiodo
Feb 3, 2023
Maintainer

@EmilStenstrom this is in the looking for upvotes phase at the moment. It's why I asked for use cases too.

Changes for this would (at the moment) be in the Python core extension, the Pyright code, and in the private pyrx repo.

0 replies

EmilStenstrom · 2023-02-03T21:15:21Z

EmilStenstrom
Feb 3, 2023
Author

@rchiodo Do you collect the upvotes in this issue? I got an e-mail notice that this was moved to a discussion, but I can't find it there.

0 replies

rchiodo · 2023-02-03T21:20:53Z

rchiodo
Feb 3, 2023
Maintainer

For now, yes, the upvotes on the issue would be counted. I'm guessing we'll move it to a discussion at some point, once @karthiknadig comes back with the information on the client move. I think Jude was initially going to move this to a discussion, but we kept it here waiting for Karthik. That's likely the e-mail you got.

Right now the 'client' side code for pylance (pylance exists in two parts, server/client) is created in the Python core extension. We're trying to determine if we can move this to the pylance extension itself.

I think in order to support the idea you proposed, we need to create virtual documents as outlined in VS code's example. In order to create those, that has to happen on the client. Hence the need to know where the client is going first.

0 replies

EmilStenstrom · 2023-02-07T21:44:50Z

EmilStenstrom
Feb 7, 2023
Author

There are a lot of people talking about support for this in python in the original thread in vscode main. Unfortunately, they have no idea that this is discussed here, and since that thread is locked, there is no way I can bring their attention here.

Some of these should be here :)

0 replies

Archmonger · 2023-02-09T01:00:27Z

Archmonger
Feb 9, 2023

In my opinion, the Magical Comment interface seems to be the most pragmatic and extensible of the lot.

In terms of user experience and readability, I would rank them:

1. Tag Strings
   - If tag strings do make it into a Python release, then this is the way to go.
   - However, the currently proposed interface for tag strings is ugly. It visually looks like a typing error, where the user accidentally forgot to put a space between a variable and a string. I get trying to re-use the f-string interface, but f"{my_string}" feels good since f isn't a PEP8 variable name.
2. Magical Comment
3. NewType
4. Subclass
5. Annotated Type
   - This seems like it would have awkward limitations, such as requiring everything to be stored in variables for proper syntax highlighting.
6. Dummy Methods
   - To be honest, these just feel wrong.

0 replies

thebestnom · 2023-02-09T13:23:25Z

thebestnom
Feb 9, 2023

I have to say that I like the comment option the most, as I don't like injecting code only so that the IDE can do something. I much prefer that any IDE-specific things be either a manual process or a comment that will be ignored (or even be used the same way) in other IDEs.

What's more, it's not just PyCharm that allows language injection; any JetBrains IDE allows it, and a comment is the most cross-language way to support it.

0 replies

judej · 2023-02-13T19:22:08Z

judej
Feb 13, 2023
Maintainer

Moving this issue to discussion as an enhancement request for comments and upvotes.

0 replies

boxed · 2023-03-15T14:09:09Z

boxed
Mar 15, 2023

See also this discussion: python/typing#1370

My suggestion is different from this issue, in that I am arguing for specifying string languages as types. The difference is that the definition can be moved to the function definition or type stubs, rather than every call site.

Take the example of regexes. My suggestion is that re.sub should declare that the first argument is a regular expression string. So instead of all callers to re.sub marking the string language:

s2 = re.sub(
    # language=regex
    r'(\d+)foo',
    # language=regex replacement
    r'\1bar',
    s,
)

This is very verbose, requires tons of newlines, and again, it's at the call site even though it's always the same!

Instead logically it would be much better at the function definition:

def sub(pattern: Regex, replacement: RegexReplacement, string: str): ...

Now everyone who upgrades their type stubs or their Python version or whatever will get their code syntax highlighted for free.
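A runnable sketch of the idea, assuming `Regex` and `RegexReplacement` are defined as NewType markers (neither exists in the standard library or in typeshed today). The `language_parameters` helper is hypothetical and shows how a tool could discover, from the signature alone, which parameters hold another language.

```python
import re
from typing import NewType, get_type_hints

# Hypothetical marker types; neither exists in the standard library today.
Regex = NewType("Regex", str)
RegexReplacement = NewType("RegexReplacement", str)


def sub(pattern: Regex, replacement: RegexReplacement, string: str) -> str:
    return re.sub(pattern, replacement, string)


def language_parameters(func) -> dict[str, str]:
    """Map parameter name -> marker type name for NewType-annotated parameters."""
    hints = get_type_hints(func)
    return {
        name: hint.__name__
        for name, hint in hints.items()
        if hasattr(hint, "__supertype__")  # attribute set by NewType
    }


params = language_parameters(sub)
result = sub(Regex(r"(\d+)foo"), RegexReplacement(r"\1bar"), "42foo")
```

The marker lives once on the definition, so call sites need no comments or wrappers for a tool to know that the first two arguments are regex strings.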

20 replies

trevrobwhite · 2025-01-14T10:02:09Z

trevrobwhite
Jan 14, 2025

Can this be made to work with SQLTools and Copilot? See #6847.

Similar to how PyCharm implemented it; see Inject SQL & Injection Settings.

Schema and table suggestions

Choose from languages

IntelliSense now treats the code within the string as the specified language, it realises the table reference is invalid

Now go back to the *, get the option to expand to all the fields

Or if I remove the * I get to choose from the fields in the selected table:

0 replies
