i18n French/English language translation #23

Open
kousu opened this issue Nov 1, 2021 · 21 comments

Comments

@kousu
Member

kousu commented Nov 1, 2021

We should make an effort to make our site bilingual. We're contravening the language policy (unless Poly has a separate policy from UdeM?) and, I mean, the writing's on the wall with loi 96.

@kousu
Member Author

kousu commented Nov 1, 2021

There are probably lots of ways we can go about this, but down the #21 road, Sphinx has i18n built in via standard gettext .po files: https://www.sphinx-doc.org/en/master/usage/advanced/intl.html. Any phrase not translated comes out unchanged in the build, but a warning is issued, so it's easy, or at least feasible, to hunt them down and manage them.
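
As a rough illustration of what the Sphinx side of this could look like, here is a minimal conf.py fragment. The locale/ folder name and the choice of "fr" are assumptions for the sketch, not settings taken from the wiki's repository.

```python
# conf.py (fragment) -- a sketch, not the wiki's actual configuration.

# Language of this build; Sphinx looks up translated catalogs for it.
language = "fr"  # assumed value, for illustration only

# Directories (relative to conf.py) that Sphinx searches for
# <locale_dir>/<language>/LC_MESSAGES/<name>.mo message catalogs.
locale_dirs = ["locale/"]  # "locale/" is an assumed folder name

# False: one message catalog per source document rather than one per
# top-level directory.
gettext_compact = False
```

With settings like these, the .po files would live under locale/fr/LC_MESSAGES/, and any message without a translation falls back to the original text.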

@jcohenadad
Member

it's a lot of maintenance-- i'd rather do english only until someone from the francophone bureau sends me a warning

@kousu
Member Author

kousu commented Nov 1, 2021

I'm not saying we have to make CI a language cop blocking contributions. I'm envisioning a cronjob that detects untranslated texts and sends us an email about it once a month. I'm also not saying this is something we do right now, I just wanted to pin the issue so that like, in 18 months say we have a solution ready.
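
The "detect untranslated texts" part of such a cronjob could be sketched roughly as below. It is a deliberately small .po reader using only the standard library (no plural forms, escapes left untouched); the function names and the locale/ layout are hypothetical, and the emailing step is left out.

```python
from __future__ import annotations

import re
from pathlib import Path


def _unquote(token: str) -> str:
    """Strip the surrounding double quotes from one .po string token."""
    token = token.strip()
    return token[1:-1] if len(token) >= 2 and token[0] == token[-1] == '"' else ""


def count_untranslated(po_text: str) -> tuple[int, int]:
    """Return (untranslated, total) message counts for the text of one .po file.

    Entries flagged "fuzzy" are counted as untranslated, since they need review.
    """
    untranslated = total = 0
    for entry in re.split(r"\n\s*\n", po_text.strip()):
        parts = {"msgid": "", "msgstr": ""}
        field = None
        fuzzy = False
        for line in entry.splitlines():
            line = line.strip()
            if line.startswith("#,") and "fuzzy" in line:
                fuzzy = True
            elif line.startswith("#"):
                continue  # comments, source references, obsolete entries
            elif line.startswith(("msgid ", "msgstr ")):
                field, _, rest = line.partition(" ")
                parts[field] = _unquote(rest)
            elif line.startswith('"') and field:
                parts[field] += _unquote(line)  # continuation line
            else:
                field = None  # keyword this sketch does not handle
        if not parts["msgid"]:
            continue  # header entry
        total += 1
        if fuzzy or not parts["msgstr"]:
            untranslated += 1
    return untranslated, total


def report(locale_dir: str) -> list[str]:
    """One line per .po file under locale_dir that still has work to do."""
    lines = []
    for po in sorted(Path(locale_dir).rglob("*.po")):
        missing, total = count_untranslated(po.read_text(encoding="utf-8"))
        if missing:
            lines.append(f"{po}: {missing}/{total} untranslated")
    return lines
```

A monthly job could call report("locale") and mail the resulting lines if the list is not empty.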

Here's an example of what it can look like when it's functioning: https://docs.godotengine.org/fr/stable/

lang=fr

vs

lang=en

Here's their translation file: https://github.com/godotengine/godot/blob/master/doc/translations/fr.po. It looks like they're actually having to use a tool -- open source -- to manage this because they have a lot of text: https://hosted.weblate.org/projects/godot-engine/godot-docs/

[Screenshot: Godot Engine, Godot Documentation]

Looks like this thing even makes a TODO list that can be tackled a bit at a time -- even better than my cron idea.

Here's what an untranslated string looks like: https://docs.godotengine.org/fr/latest/tutorials/plugins/gdnative/gdnative-c-example.html -- it just gets inlined with a 🤷

[Screenshot: Exemple de GDNative en C]

This string was updated and so ended up back on weblate's TODO list, with the english diffs highlighted to make updating quick: https://hosted.weblate.org/translate/godot-engine/godot-docs/fr/?q=state%3A%3Ctranslated&offset=41

[Screenshot: Godot Engine, Godot Documentation, French]

@jcohenadad
Member

wow! beyond having a website in multiple languages, i'm mostly interested in trying it just because it looks very cool 😎

@kousu kousu added bug Something isn't working help wanted revision Content and reading flow supporting infrastructure Scripts, CI, build tools, hosting labels Nov 20, 2021
@kousu
Member Author

kousu commented Nov 21, 2021

@taowa corrected me, this is the language policy. Poly's more open to english than UdeM is, because

In a context where English is becoming the language of technology, Polytechnique recognizes the importance of encouraging its students to acquire an adequate knowledge of this second language.

but still

2.12 Websites
When Polytechnique publishes a text or a document on the Internet, it may present a translation of it in other languages. The French version is separately accessible.

@RignonNoel
Contributor

RignonNoel commented Jun 21, 2022

+1 for Weblate usage. I use it at my company for customers' projects and it's REALLY easy and nice to use.

@jcohenadad jcohenadad changed the title i18n i18n French/English language translation Jun 27, 2022
@RignonNoel RignonNoel self-assigned this Aug 1, 2022
@RignonNoel
Contributor

@jcohenadad @kousu Is it OK if I take the lead on this one and create/configure a Weblate project for the lab?

I will make sure to document my work, and I know how to configure Weblate to use a free ACL account for free software, like the lab's.

@RignonNoel
Contributor

Spoke with Julien and Nick today. I will prepare Weblate and plug it into the website.

@kousu
Member Author

kousu commented Aug 17, 2022

We have a blocker: if we use the free hosted Weblate, our documentation must be under a libre license. Right now there's no formal license attached to the text of the site/wiki, so it defaults to All Rights Reserved by all the individual contributors.

@RignonNoel did some research and decided that CC-BY-4.0 or CC-BY-NC-4.0 are the most common and simplest options right now.

We will need to get the consent of everyone who has contributed, like @alexfoias and @ahill187, for the transition.

@RignonNoel
Contributor

RignonNoel commented Aug 24, 2022

I did a bunch of integration tests yesterday with the official Weblate server and the official neuropoly repo, but I discovered some problems I did not expect...


1 - The generation of the POT file

In order for Weblate to work, we need to have the .pot file (i.e. the PO template) committed to the repository so Weblate can detect template changes and update the sentences that need a new translation.

Sphinx's make gettext target generates the .pot files inside the _build/ folder, but this folder is not committed by default.
I was also planning to modify the Makefile to change this location, but I discovered it was not a good idea: the Makefile changes between Sphinx versions, so our modification could create a breaking change on every dependency upgrade.

I see two options:

  1. We update the .pot files manually from time to time, or on every PR, and move them manually to another folder. That is the usual way with tools like Python/Django, but since we have a lot of contributors who make small changes, I think it would be hard to explain to them.
  2. We create a GitHub Action to auto-generate the .pot files and move them to another folder. I did some research and it seems possible to add a new commit every time we push something to master.
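
Either option boils down to the same two steps, which could be sketched as follows. The script avoids touching the Makefile by calling sphinx-build with the gettext builder directly; the folder names (_build/gettext, locale/pot) and the function names are assumptions for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def gettext_command(source_dir, build_dir):
    """Command line asking Sphinx to extract translatable strings as .pot files."""
    return ["sphinx-build", "-b", "gettext", str(source_dir), str(build_dir)]


def collect_pot_files(build_dir, tracked_dir):
    """Copy every .pot file from build_dir into tracked_dir, keeping relative paths.

    Only *.pot files are copied, so build caches next to them stay out of git.
    """
    build_dir, tracked_dir = Path(build_dir), Path(tracked_dir)
    copied = []
    for pot in sorted(build_dir.rglob("*.pot")):
        target = tracked_dir / pot.relative_to(build_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(pot, target)
        copied.append(target)
    return copied


def main(source_dir=".", build_dir="_build/gettext", tracked_dir="locale/pot"):
    """Regenerate the templates, then mirror them into the committed folder."""
    subprocess.run(gettext_command(source_dir, build_dir), check=True)
    for path in collect_pot_files(build_dir, tracked_dir):
        print("updated", path)
```

Calling main() from a scheduled or push-triggered job, followed by a commit of locale/pot, would cover the second option; running it by hand covers the first.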

2 - Bug on pages when I18N is activated

@kousu found out that when I activated I18N some pages became unavailable. We do not know the exact reason yet; we will need to do some analysis.

We can try to generate a single po/pot file for the whole website; maybe that will fix any naming conflicts (e.g. README.md vs README.po).
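
If I read the Sphinx documentation correctly, a single catalog can be requested from conf.py by giving gettext_compact a string value (accepted since Sphinx 3.3). The name "wiki" below is arbitrary and hypothetical, and whether this resolves the missing pages is untested.

```python
# conf.py (fragment) -- a sketch; "wiki" is an arbitrary, hypothetical name.
# With a string value, every document shares one text domain, so the gettext
# builder writes a single wiki.pot instead of one template per source file.
gettext_compact = "wiki"
```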

@lifetheater57
Contributor

To address the "1 - The generation of the POT file" issue in #23 (comment), another option may be to use pre-commit hooks.

@kousu
Member Author

kousu commented Aug 26, 2022

We can't really run pre-commit hooks because when someone clicks the green Commit button the commit happens, unless we start forcing every change to go through a PR first, and force every PR to have a pre-commit check that enforces translations. I think if we do that then we make maintaining the wiki less appealing than it already is.

I've been imagining that we rotate translation duty around the lab and have everyone spend one hour a week on it, patching up translations for contributions added in that week.

But I like the idea of automating as much as possible! What we can do is run a post-commit hook: add a .github/workflows/i18n.yml workflow with on: push that generates the .pot file and commits it. We'd have to be a bit careful to make sure rapid edits don't gum it up and cause bugs or merge conflicts, but I think we can probably get it working. Which is what @RignonNoel was suggesting.
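
One way to keep such a job from committing on every run is to compare the regenerated template with the committed one while ignoring the timestamp headers, which change on each extraction even when no text did. A minimal sketch, with hypothetical function names:

```python
import re

# Header lines that differ between two extractions of identical content.
_VOLATILE = re.compile(
    r'^"(POT-Creation-Date|PO-Revision-Date):.*"$', re.MULTILINE
)


def strip_volatile(pot_text):
    """Remove timestamp header lines from the text of a .pot file."""
    return _VOLATILE.sub("", pot_text)


def pot_changed(old_text, new_text):
    """True when two .pot files differ in anything other than timestamps."""
    return strip_volatile(old_text) != strip_volatile(new_text)
```

The workflow would then commit only when pot_changed() is true for at least one file, which should also cut down on conflicts from rapid edits.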

@kousu
Member Author

kousu commented Aug 26, 2022

Oh actually come to think of it:

If we do rotate translation duty around once a week, then we only need to update the .pot file once a week as well. We can use on: schedule instead of on: push, and set the time to 3am, and that will greatly reduce the chance of conflicts.

@RignonNoel
Contributor

I continued the investigation and I have something working locally. However, when I looked at how to toggle the language between French and English, I discovered that Sphinx does not support this part and that we need to create our own system on top of it.

Basically, Sphinx is just a static page generator. It does not really support I18N at runtime; it only lets us build in one language of our choice, based on the locales we have prepared beforehand. So we need to manually:

  1. Build the doc in French
  2. Build the doc in English
  3. Store the builds in a place in order to have something like:
    • neuropoly.polymtl.ca/fr
    • neuropoly.polymtl.ca/en
  4. Add a custom component with two links (one for French, one for English)
  5. Add a base redirect to French if somebody hits the base URL neuropoly.polymtl.ca
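
Steps 1, 2, 3 and 5 above could be sketched as a small build script. It relies on sphinx-build's -D option to override the language setting per run; the folder layout and function names are assumptions, and it presumes conf.py does not itself compute paths from the language.

```python
import subprocess
from pathlib import Path

# First entry is the default language the base URL redirects to.
LANGUAGES = ["fr", "en"]


def build_command(language, source_dir=".", build_root="_build"):
    """sphinx-build invocation that writes one language to <build_root>/<language>."""
    return [
        "sphinx-build", "-b", "html",
        "-D", f"language={language}",  # override conf.py for this run only
        str(source_dir), str(Path(build_root) / language),
    ]


def redirect_page(default_language):
    """Tiny HTML page that sends visitors of the base URL to the default language."""
    return (
        "<!DOCTYPE html>\n"
        f'<meta http-equiv="refresh" content="0; url=/{default_language}/">\n'
        f'<a href="/{default_language}/">{default_language}</a>\n'
    )


def build_all(source_dir=".", build_root="_build"):
    """Build every language, then write the redirect at the root of the output."""
    for language in LANGUAGES:
        subprocess.run(build_command(language, source_dir, build_root), check=True)
    Path(build_root, "index.html").write_text(
        redirect_page(LANGUAGES[0]), encoding="utf-8"
    )
```

Publishing the whole _build folder would then expose /fr and /en side by side; step 4, the language links, still has to come from the theme.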

During my research I found this nice article that explains the overall system we need.


Toggle of the language

I did not find out how to use the nice Read the Docs display at the bottom left of the Sphinx-Book-Theme

[Screenshot]

But I have a first solution with two links at the bottom left

[Screenshot]

The problem is that this manual solution redirects to the homepage, as in the article I linked, and I think it's much less interesting than the nice Read the Docs integration that keeps the URL.


Deployment

To build the different languages and store them in specific folders, we can try reusing the script from the article: buildDocs.sh

I already tried to use make html with some options, but I think we will be forced to use sphinx-build directly, as in the proposed script. It's the only way to store only the HTML result in another folder and not all the doctrees.


Since the scope of the project grows with every step I dig into it, it would be nice to have your opinion, @kousu, before I implement this new architecture, just to make sure I do not put too much time into it if you don't want to approve all these process changes.

@kousu
Member Author

kousu commented Sep 8, 2022

This is good research, @RignonNoel. I think it's fine that

I took a closer look at my original example. Godot actually has two repos: the first contains the .po files from Weblate, and the second the original source text. The first instructs ReadTheDocs to combine both and to run conf.py from the second, which has this code that handles language switching and, in particular, reads the language out of $READTHEDOCS_LANGUAGE:

language = os.getenv("READTHEDOCS_LANGUAGE", "en")

So their deployment is rather complicated, and that's what a successful i18n project looks like. It looks on par with what you found.

Still, I think we can probably shorten it. I don't think we need buildDocs.sh, though it's good to have for comparison.

Instead, if we patch the Makefile like this:

diff --git a/Makefile b/Makefile
index d4bb2cb..b9e83e2 100644
--- a/Makefile
+++ b/Makefile
@@ -14,6 +14,12 @@ help:
 
 .PHONY: help Makefile
 
+html: html-en html-fr
+       @echo -n # no-op, to override the catch-all below
+
+html-%:
+       @language=$* $(SPHINXBUILD) -b html "$(SOURCEDIR)" "$(BUILDDIR)/$*" $(SPHINXOPTS) $(O)
+
 # Catch-all target: route all unknown targets to Sphinx using the new
 # "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
 %: Makefile

And conf.py like this:

diff --git a/conf.py b/conf.py
index 88139f4..a6fb717 100644
--- a/conf.py
+++ b/conf.py
@@ -22,7 +22,7 @@ import os.path
 project = 'NeuroPoly Internal Wiki'
 copyright = '2021, NeuroPoly'
 author = 'NeuroPoly'
-
+language=os.environ.get('language', 'fr')
 
 # -- General configuration ---------------------------------------------------
 
@@ -94,7 +94,7 @@ for dir, dirs, fnames in os.walk("."): #XXX "." is possibly buggy? This isn't ne
     dirs[:] = [d for d in dirs if d not in exclude_patterns] # prune the search
     for fname in fnames:
         if os.path.splitext(fname)[0] == "README":
-            z = os.path.join("_build", "html", dir, "index.html")
+            z = os.path.join("_build", language, dir, "index.html")
             #html_extra_path.append(z)
             os.makedirs(os.path.dirname(z), exist_ok=True)
             if os.path.lexists(z):

then make html produces

p115628@joplin:~/src/intranet.neuro.polymtl.ca$ ls _build/*
_build/en:
agenda-and-calendar.html  conferences.html  geek-tips                     _images       objects.inv     practical-information  search.html     _static
bibliography              contact.html      genindex.html                 index.html    onboarding      README.html            searchindex.js  writing-articles.html
computing-resources       courses.html      ideas-for-cool-projects.html  mri-scanning  _panels_static  rf-lab                 _sources

_build/fr:
agenda-and-calendar.html  conferences.html  geek-tips                     _images       objects.inv     practical-information  search.html     _static
bibliography              contact.html      genindex.html                 index.html    onboarding      README.html            searchindex.js  writing-articles.html
computing-resources       courses.html      ideas-for-cool-projects.html  mri-scanning  _panels_static  rf-lab                 _sources

(of course, in my copy, the contents are both in English because I don't have the .po files you are working with).

And then we just have to adjust the publish script to match:

diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml
index 753be90..930cc2e 100644
--- a/.github/workflows/publish.yml
+++ b/.github/workflows/publish.yml
@@ -31,5 +31,5 @@ jobs:
         uses: peaceiris/actions-gh-pages@v3
         with:
           github_token: ${{ secrets.GITHUB_TOKEN }}
-          publish_dir: _build/html
+          publish_dir: _build/
           cname: intranet.neuro.polymtl.ca

(I'm not sure why but if I used sphinx-build -M html then the output ended up in _build/html/en/html and _build/html/fr/html; but using -b html put it in _build/en and _build/fr; we can try to figure out what the difference between -b and -M is later but for now at least this is a working prototype)

The problem is that this manual solution redirects to the homepage, as in the article I linked, and I think it's much less interesting than the nice Read the Docs integration that keeps the URL.

Right. This is because we're using sphinx-book-theme (and want to use furo), neither of which support i18n. But I think injecting code like you did is alright, it's a standard feature of furo/sphinx.

Moreover, using jinja2 we should be able to get the path to the current page, so that we can generate the link to the other language without losing the user's place. pathto() looks like the obvious function, but it returns a URL relative to the current page, so the pagename and file_suffix template variables are probably a better fit. So we'd just say something like

<a href="/fr/{{ pagename }}{{ file_suffix }}">fr</a>
<a href="/en/{{ pagename }}{{ file_suffix }}">en</a>
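
To make the intended URLs concrete, here is a small sketch of how the links could be derived from the name of the current page. It assumes the /fr and /en layout discussed in this thread and an .html suffix; the function names are hypothetical.

```python
from html import escape


def switch_url(pagename, language, file_suffix=".html"):
    """Site-absolute URL of the same page in another language's build."""
    return f"/{language}/{pagename}{file_suffix}"


def language_links(pagename, languages=("fr", "en")):
    """HTML anchors pointing at the current page in each language."""
    return "\n".join(
        f'<a href="{escape(switch_url(pagename, lang), quote=True)}">{escape(lang)}</a>'
        for lang in languages
    )
```

For a page named onboarding/README, switch_url("onboarding/README", "en") gives /en/onboarding/README.html, so switching languages keeps the reader on the same page.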

@kousu
Member Author

kousu commented Sep 8, 2022

@RignonNoel, is your prototype shareable? Would you send it as a PR? I could give you better feedback if I can run the same code.

@jcohenadad
Member

@RignonNoel do you know what technology is behind the translation of this page? https://engerlab.com/ (see button top right).

the source of the page mentions a "gtranslate" so I am wondering if google offers options to add a button at the top of a website for translation.

@RignonNoel
Contributor

@RignonNoel, is your prototype shareable? Would you send it as a PR? I could give you better feedback if I can run the same code.

Yes and no: there isn't much to share, since in terms of code it's very small. It's more a matter of process and trial and error to demystify what is or isn't possible across each of the documentation sources, since the complete process wasn't described anywhere.

But since we agree, I'll put all of this in place and share a PR.

@RignonNoel
Contributor

@RignonNoel do you know what technology is behind the translation of this page? https://engerlab.com/ (see button top right).

the source of the page mentions a "gtranslate" so I am wondering if google offers options to add a button at the top of a website for translation.

I did some research and I found this product: https://gtranslate.io/

It seems to be a plugin/widget/tool that you can install on top of your website and that will auto-translate your content based on a provider like Google Translate. Technically it seems to work with a simple jQuery (JavaScript) script that transforms the page dynamically on demand.

  1. It seems to be an automatic translation, not a manual one (a problem for the kind of content we have)
  2. It's proprietary software (even if it has a $0 plan available)

@jcohenadad
Member

Thank you for looking into this. Should we consider gtranslate in light of the possible difficulties/roadblocks with the current sphinx/weblate strategy?

@RignonNoel
Contributor

RignonNoel commented Sep 13, 2022

Thank you for looking into this. Should we consider gtranslate in light of the possible difficulties/roadblocks with the current sphinx/weblate strategy?

@jcohenadad Legally it works if we put the repo in French and add the button for auto-translation to English.

And as a bonus, auto-translation is a lot less work to maintain over time:

  • No need to wait for different native speakers to update the translation
  • No need to debate the "correct" translation
  • Technical integration is FAR easier! (from a complete architecture with dependencies down to just a widget)

But automatic translation is only good for "common" content that Google Translate manages to translate correctly. If you have fine details or a lot of acronyms and expert vocabulary, it's clearly not a good idea.

Sadly, I think that's the case for us. So IMHO it's not a good strategy for our use case.
