Add README.md instructions and test build environment #142

Closed

jodygarnett
Contributor

@jodygarnett jodygarnett commented May 22, 2020

During the course of testing I found that Transifex no longer auto-updates, reported as #141.

@juanluisrp
Contributor

I'm not sure if this is needed at all. I think language files are pushed to Transifex using the tx client every now and then. Maybe @fxprunayre knows whether these files are really needed or not.

@fxprunayre
Member

fxprunayre commented May 25, 2020

if these files are really needed or not.

They are, if we want to keep Transifex synchronized with the English PO files.

BUT, from my point of view, this approach has some issues:

a) Changing a single letter in an English line of the docs will reset all existing translations of that line

e.g. @archaeogeek improved the English version a lot during the November sprint
image

those changes will be updated by this PR
image

and the transifex key will no longer exist
image

and as a result, all the translation work done by translators will be lost.

b) Working line by line does not sound like the easiest way to translate. Would translating the full page, with figures, be more usable?

c) ...?
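To illustrate point a) above, here is a minimal sketch that assumes a translation catalog behaves like a lookup keyed on the exact English source string (gettext uses the `msgid` this way). The sample strings and the `translate` helper are made up for illustration and are not taken from the GeoNetwork docs.

```python
# Hypothetical catalog: gettext-style lookup keyed on the exact English msgid.
catalog_fr = {
    "Click on the button to save the record.":
        "Cliquez sur le bouton pour enregistrer la fiche.",
}


def translate(msgid, catalog):
    """Return the translation for msgid, or the source text if the key is missing."""
    return catalog.get(msgid, msgid)


old_source = "Click on the button to save the record."
new_source = "Click the button to save the record."  # small English edit

found = translate(old_source, catalog_fr)   # key still matches
missed = translate(new_source, catalog_fr)  # key changed, translation is orphaned

print(found)
print(missed)
```

The French text is still in the catalog after the edit, but nothing points to it any more, which is the loss described in a).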

So what are the options?

  1. do not use English lines as a key and continue to use transifex. Solves a).
  2. move back to github for translations
    • one folder per language
    • convert all languages from transifex with more than X% of translations to RST formats in github ?
  3. something else ?

@josegar74
Member

  1. do not use English lines as a key and continue to use transifex. Solves a).

Not really sure how this can be implemented. I guess it would require defining a key for each line or paragraph in the English file, which is quite cumbersome.

  2. move back to github for translations

To me this seems the best option, unless there is a better way to handle this in the current setup that I'm not aware of. With the current setup, almost any change to the English text resets the translations in the other languages, which is far from optimal.

+1 from me for this option.

@jodygarnett
Contributor Author

Both ways are horrible!

I would continue to use what we have now and rescue translations when the English text changes.

The trouble with using GitHub to manage the translations as independent rst files is that there is no way to catch these kinds of small changes (or even very large ones). The percentage translated is very useful for identifying that a change has occurred.

-1 for me

@juanluisrp
Contributor

Sphinx has an option to add a UUID to the message string. I don't know if that could help here:
https://www.sphinx-doc.org/en/latest/usage/configuration.html#confval-gettext_uuid

If true, Sphinx generates uuid information for version tracking in message catalogs. It is used for:

Add uid line for each msgids in .pot files.

Calculate similarity between new msgids and previously saved old msgids. This calculation takes a long time.

If you want to accelerate the calculation, you can use python-levenshtein 3rd-party package written in C by using pip install python-levenshtein.

The default is False.
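For reference, a minimal sketch of how the option quoted above could be switched on in a Sphinx `conf.py`. Only `gettext_uuid` comes from the quoted documentation; the other two settings and their values are assumptions about a typical layout, not this project's actual configuration.

```python
# conf.py (Sphinx configuration): illustrative fragment only

# From the quoted docs: add a uid line for each msgid in the generated .pot files.
gettext_uuid = True

# Assumption: write one .pot file per source document.
gettext_compact = False

# Assumption: directory where the translated .po catalogs are kept.
locale_dirs = ["locale/"]
```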

@jodygarnett
Contributor Author

That is really interesting. I would like to try the procedure outlined by Read the Docs that makes use of this setting.

This is in keeping with option 1, "do not use English lines as a key and continue to use transifex".

@juanluisrp
Contributor

juanluisrp commented Jun 3, 2020

Can you remove the .pot files again? I think we don't need them. They have gone years without any update. Or have you generated them from the current .rst files?

@jodygarnett
Contributor Author

I believe I generated them:

  • with current rst files
  • with the uuid (so we do not lose French translations when English words change)

I would be happy to have a breakout chat with @fxprunayre as this should resolve his concern ...

I do wish this build was faster :)

@fxprunayre
Member

I would be happy to have a breakout chat with @fxprunayre as this should resolve his concern ...

If you tested it and it works fine, super, then go for it.

@jodygarnett
Contributor Author

@fxprunayre I think we need to review the workflow above: https://docs.readthedocs.io/en/stable/guides/manage-translations.html

My understanding is that this approach (pot files in GitHub) is not something we can test in isolation; it must be made available to Transifex first.

Please note that this is just a community activity (the website was down and I tried to help) and I have no budget committed to this activity.

@jodygarnett
Contributor Author

Talking with @juanluisrp, he recommends trying a small project first in the doc folder just to see how UUIDs work.

For now I will remove all the POT changes so this PR can stay focused on a small change to the build system.

This was created when the website was unavailable and is part of moving instructions to README.md files as outlined in core-geonetwork/README.md

Signed-off-by: Jody Garnett <jody.garnett@gmail.com>
@jodygarnett jodygarnett changed the title from "Restore pot files used for external translation" to "Add README.md instructions and test build environment" on Jun 4, 2020
@volaya

volaya commented Jun 29, 2020

I have been testing the option of adding UUIDs, but it doesn't seem to solve the problem. Even with UUIDs, if the original string changes, it appears in Transifex as a new string with no translation, so Transifex doesn't seem able to link the new string to the translations available for the old one.

The UUIDs are not even constant for the same translation message; they change each time the doctree is built. Calling Sphinx to generate the .pot files results in new UUIDs. I don't see how Sphinx can recognize that one string is a new version of an old one and assign it the same UUID, so that it can be linked with the translations available for the older one.

Searching a bit, this seems to be a common problem with no easy solution other than manually "re-linking" new strings with the existing translations for older ones.

According to the documentation, the gettext_uuid parameter also causes Sphinx to compute Levenshtein distances, so those could be used to detect minor changes such as a typo fix, but I haven't seen them in the resulting files. In any case, it doesn't seem that Sphinx or Transifex can directly use those values to match old strings with their modified versions automatically.
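As a rough illustration of what automated "re-linking" by similarity could look like, here is a sketch that uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for a Levenshtein-based score. This is not how Sphinx or Transifex do it; the `relink` helper, the 0.8 threshold, and the sample strings are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def relink(old_catalog, new_msgids, threshold=0.8):
    """Carry translations over to changed msgids using closest-match similarity.

    old_catalog: dict mapping old English msgid -> translation
    new_msgids:  iterable of current English msgids
    Returns a dict mapping new msgid -> (translation, needs_review).
    Msgids with no sufficiently similar old string are left out.
    """
    relinked = {}
    for msgid in new_msgids:
        if msgid in old_catalog:
            # Unchanged source string: keep the translation as is.
            relinked[msgid] = (old_catalog[msgid], False)
            continue
        # Find the most similar old msgid.
        best_ratio, best_old = 0.0, None
        for old in old_catalog:
            ratio = SequenceMatcher(None, old, msgid).ratio()
            if ratio > best_ratio:
                best_ratio, best_old = ratio, old
        if best_old is not None and best_ratio >= threshold:
            # Reuse the old translation but flag it for a translator to check.
            relinked[msgid] = (old_catalog[best_old], True)
    return relinked


# Hypothetical data: a typo fix in the English source plus a brand new paragraph.
old = {"Click on the buton to save.": "Cliquez sur le bouton pour enregistrer."}
new = ["Click on the button to save.", "A completely new paragraph about harvesting."]

result = relink(old, new)
print(result)
```

Flagging reused translations for review, rather than accepting them silently, mirrors the "fuzzy" marker that gettext tooling uses for approximate matches.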

I will investigate this last option a bit more.

@jodygarnett
Contributor Author

With respect to:

The uuids are not even constant for the same translation message, and they change each time the doctree is built.

That is surprising. I understood that these UUIDs held some summary of the content that was more stable (not less) across builds, and that they were "untangled" by the Transifex website to help keep translations across changes.

@volaya

volaya commented Jun 30, 2020

An update to what I wrote above:

Sphinx seems to be smart enough to detect those small changes in the original strings, and keep the associated translated strings. That happens even if no uuids are added.

Creating the pot files from the docs using make gettext and then updating the po files with sphinx-intl update -p build/gettext -l <lang> should correctly handle those changes. If po files are kept on a repo and updated manually, that should be fine then.
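The two steps above could be scripted roughly as follows. This is a sketch only: the commands are the ones quoted in the comment, while the `catalog_update_commands` and `run_all` helpers, the default `build/gettext` path, and the `fr`/`es` language codes are assumptions for illustration.

```python
import subprocess


def catalog_update_commands(languages, pot_dir="build/gettext"):
    """Build the command lines that regenerate .pot files and refresh .po files."""
    commands = [["make", "gettext"]]
    for lang in languages:
        commands.append(["sphinx-intl", "update", "-p", pot_dir, "-l", lang])
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, check=True)


commands = catalog_update_commands(["fr", "es"])
run_all(commands)  # dry run: only prints the commands
```

Calling `run_all(commands, dry_run=False)` from the docs folder would actually execute the steps, provided `make` and `sphinx-intl` are installed.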

I wasn't able to integrate this with Transifex, though. Pushing changes was not working: I could push the pot files (which caused the removal of previous translations when the original strings had changed), but pushing the translated po files was not working for some reason. The Transifex-GitHub integration was also not working, but I had never used it before, so I am not completely sure that it was not my fault.

@jodygarnett
Contributor Author

I think you have to pay, or get permission as an open source project like geonetwork has done.

@jodygarnett
Contributor Author

The approach outlined in this PR is fine. It relies on the Transifex web client being used for translations, so that it can make suggestions based on previous translations that are close (even if the key changes).

@jodygarnett
Contributor Author

There is no interest in pursuing this PR.

@jodygarnett jodygarnett closed this Nov 6, 2020
5 participants