[Action] Automation to detect required website translations #325

Open
leonardpahlke opened this issue Jan 30, 2024 · 53 comments
Labels: board/wg-advocacy (Filter for the WG Comms project board), info/help-wanted (Extra attention is needed), wg-advocacy/website

Comments

@leonardpahlke
Member

leonardpahlke commented Jan 30, 2024

The TAG ENV website now supports multiple languages. With this new feature, it's essential to ensure that not only the main English version is updated but also the translated versions. Currently, translators must monitor the website; if they notice any changes, they open a PR. This process is error-prone and stressful for those maintaining the translated content. Ideally, when a PR updates the English version of the website (any part of the website), a follow-up issue would be automatically created for each language, which our team can then address.

  • This would likely require some custom code, which we could host in the tag-env-tooling repository.
  • A GitHub Action could check whether a PR touches website files and whether those files are in an en folder.
  • The action would then open an issue with details (label: translation required) and ideally ping, via GitHub, the group of people maintaining that translation.
  • Optional: Add the opened issues to a new GH project board.
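
A minimal sketch of the path check described in the bullets above, assuming the changed file paths of a PR are already available as a list of strings. The function name `english_changes` and the example paths are hypothetical; the only thing taken from the proposal is that English content lives in an `en` folder.

```python
from pathlib import PurePosixPath


def english_changes(changed_files, source_lang="en"):
    """Return the changed paths that sit inside a directory named `source_lang`."""
    return [
        path
        for path in changed_files
        # Look only at directory components, not at the file name itself.
        if source_lang in PurePosixPath(path).parts[:-1]
    ]


# Hypothetical change list for one PR.
changed = [
    "website/content/en/about/_index.md",
    "website/content/es/about/_index.md",
    "README.md",
]
print(english_changes(changed))
```

Only the first path would be reported here, since it is the only one under an `en` directory.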
leonardpahlke added the info/help-wanted label Jan 30, 2024
@SamYuan1990
Contributor

SamYuan1990 commented Jan 30, 2024

https://github.com/dorny/paths-filter
I hope this GitHub Action can give us a change list as part of monitoring the changes.

Does anyone know how to create a GitHub issue automatically from a change list?
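
One possible answer, sketched with the Python standard library only: the GitHub REST API accepts `POST /repos/{owner}/{repo}/issues` with a JSON body containing `title`, `body` and `labels`. The helper name `build_issue_request`, the repository `octo/demo` and the label are assumptions for illustration; the request is only built here, not sent.

```python
import json
import urllib.request


def build_issue_request(repo, token, title, changed_files, labels=()):
    """Build (but do not send) a request that creates an issue listing the changed files."""
    body = "The following files changed:\n\n" + "\n".join(
        f"- `{path}`" for path in changed_files
    )
    payload = {"title": title, "body": body, "labels": list(labels)}
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{repo}/issues",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


# Hypothetical values; a real workflow would pass its own repository and token.
request = build_issue_request(
    "octo/demo",
    "example-token",
    "Translation update needed",
    ["website/content/en/about/_index.md"],
    labels=["translation required"],
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would actually create the issue, provided the token is allowed to write issues in that repository.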

@kumarankit999
Contributor

I would also like to help here! @leonardpahlke

@sergiopsyalo

sergiopsyalo commented Jan 30, 2024

I would like to help too, @leonardpahlke

@leonardpahlke
Member Author

Assigned you both! Thanks!

@guidemetothemoon
Contributor

+1 for getting some automation in place for this!

For the QA checks workflow we're using a changed-files GH Action that works really nicely for identifying changes in specified files/directories, so this or a similar action could be relevant here as well. The implementation is in this workflow: https://github.com/cncf/tag-env-sustainability/blob/main/.github/workflows/checks.yml

There are multiple GH Actions to choose from that can be used to automatically create an issue based on some event. For example, this one: https://github.com/marketplace/actions/create-an-issue

Maybe we can even create groups of people who have been contributing to the different language translations, so that those groups can be added to the respective issues (e.g. a Spanish translators group on an issue about translating Spanish content).

I'll be happy to contribute both on the implementation and review. Don't hesitate to reach out @kumarankit999 @sergiopsyalo 😊

@leonardpahlke
Member Author

@kumarankit999 could you join the next TAG ENV meeting (Feb 14) to briefly report on the status and perhaps we can discuss any questions you may have. https://calendar.google.com/calendar/embed?src=72e93a411f02e5664bb4485c04311b83dae6a62574e4ab882a1ccf8526aa9bf1%40group.calendar.google.com

@sergioarmgpl
Contributor

Sorry, I commented with the wrong user. Could you assign the user @sergioarmgpl instead, @leonardpahlke?

@leonardpahlke
Member Author

@kumarankit999 & @sergioarmgpl pls let us know if you need any assistance here! thanks :)

@sergioarmgpl
Contributor

I will use this action as a reference: https://github.com/marketplace/actions/create-an-issue. I will need the ${{ secrets.GITHUB_TOKEN }} part, meaning that secret needs enough permissions to create the issue. cc: @leonardpahlke

@leonardpahlke
Member Author

I will use this action as a reference: https://github.com/marketplace/actions/create-an-issue

👍

AFAIK the GITHUB_TOKEN is a secret that GitHub Actions creates automatically for each workflow run, so it should exist by default.
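
A small sketch of how a script called from the workflow might pick the token up. It assumes the workflow step maps the secret into the script's environment (for example `env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}`), since secrets are not exported to scripts on their own; `GITHUB_REPOSITORY` is one of the default variables set by GitHub Actions. The function name `load_actions_context` is hypothetical.

```python
import os


def load_actions_context(env=os.environ):
    """Read the token and repository that a workflow step exposes to the script."""
    token = env.get("GITHUB_TOKEN")
    if not token:
        raise RuntimeError("GITHUB_TOKEN is not set; map it in the workflow step's env")
    # GITHUB_REPOSITORY has the form "owner/name".
    owner, _, name = env.get("GITHUB_REPOSITORY", "").partition("/")
    if not owner or not name:
        raise RuntimeError("GITHUB_REPOSITORY should look like 'owner/name'")
    return {"token": token, "owner": owner, "repo": name}


# Example with an explicit mapping instead of the real environment.
context = load_actions_context(
    {"GITHUB_TOKEN": "example-token", "GITHUB_REPOSITORY": "cncf/tag-env-sustainability"}
)
print(context["owner"], context["repo"])
```

Depending on the repository's default settings, the workflow may also need `permissions: issues: write` before the token is allowed to open issues.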

@sergioarmgpl
Contributor

@Dianmz will be shadowing me on this, just to introduce her to this kind of issue :) Sure, I will use the GITHUB_TOKEN that way :) @leonardpahlke

@sergioarmgpl
Contributor

Resuming this after completing the translation into Spanish

@SamYuan1990
Contributor

How is this issue going?

@leonardpahlke
Member Author

How is this issue going?

@sergioarmgpl ^ — do you need any support? Do we have a plan how to implement it?

@sergioarmgpl
Contributor

I will work on this this week. Sorry, I was a little bit absent.

@thelooter
Contributor

Another way to approach this would be to use a tool like Crowdin or Weblate to manage translations. You would essentially put in the translatable strings and then add the languages they should be translated to. This would also make tracking the progress of translations easier. It would also lower the bar to entry for translation, since you could much more easily translate small chunks instead of needing to translate a whole document at once.

@leonardpahlke
Member Author

Thanks @thelooter!

@cjyabraham would be interesting to hear your thoughts on this.

@leonardpahlke
Member Author

cc @nate-double-u

@cjyabraham
Contributor

I think the place for human-crafted website translations is shifting as automated in-browser AI translations get better and better. If we're not already there, I expect we will soon be at the point where automated translations are about as good as those done by a human. Given that, I would look at supporting human translations as more of a short-term thing rather than something we'll need to do indefinitely. I'm interested, of course, in other people's views on this :)

We've also been exploring this topic wrt the Events site and, as of now, have opted not to build out a complex translation infrastructure.

Another temporary solution for this current issue is to create an issue every time an edit needs to be done to the site with checkboxes for each of the languages. Once the English language edit is done, the other language teams can be assigned to the issue until they have done their translation and checked their box. I'm not sure if this is practical given the cadence of English-language edits but it would save having to have GH actions running to automate things. I'm curious to know how the Glossary team manages this...

I don't like the idea of integrating with a 3rd-party service to have them manage our translations. Integrating sites like that is rarely simple and ties us to that closed-source solution.
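
The checkbox idea above could look roughly like this. It is a sketch only: the helper name `checklist_issue_body` and the language list are assumptions, and the `- [ ]` lines rely on GitHub rendering Markdown task lists in issue bodies.

```python
def checklist_issue_body(summary, languages):
    """Build an issue body with one unchecked task-list item per language."""
    lines = [summary, "", "Translations:"]
    lines += [f"- [ ] {language}" for language in languages]
    return "\n".join(lines)


print(checklist_issue_body("Update the About page", ["Spanish", "Chinese"]))
```

Each language team would tick its own box once its translation is merged, and the issue can be closed when every box is checked.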

@sergioarmgpl
Contributor

At least on my side, the plan is just to detect the differences, make the list of files to update, and then send the notification. I have time to work on this this week, since I have some vacation :) News soon.

@sergioarmgpl
Contributor

I looked into the glossary and there is no automation for translation, so it's done by humans. I also think that human involvement encourages people to contribute in some way.

@leonardpahlke
Member Author

I also think that human involvement encourages people to contribute in some way.

Yes, translations are a low-barrier type of contribution aimed at parts of the community that are not yet represented in the TAG. However, in general contributions should not be obsolete, otherwise they lose their value.

@thelooter
Contributor

While I agree that AI gets better and better every day, it's IMO just not yet at the point where it can be reliably used for translations. Many languages rely on certain phrases that can't just be translated one to one, and reasoning about those phrases is still a big challenge for AI.

AI can maybe be used to create an initial translation that's refined by humans but I wouldn't just blindly let AI translate it.

I agree that binding ourselves to a closed source tool is not an ideal solution, that's why I suggested weblate. It's open source (https://github.com/WeblateOrg/weblate)

This also makes updating and tracking translations a lot easier since it's managed in smaller chunks and therefore easier to spot changes. As I stated before, it also allows for easier contribution, as one doesn't have to edit/translate a whole document but they can edit small snippets, e.g. while on the train.

@nate-double-u
Member

AI can maybe be used to create an initial translation that's refined by humans but I wouldn't just blindly let AI translate it.

This has been what I've heard from localization teams too. AI can be used for a first pass, but it doesn't quite reduce the human workload yet because of the amount of editing that is still required.

I'm curious to know how the Glossary team manages this...

@jihoon-seo & @seokho-son may have some insight on how the glossary localization teams manage this.

@sergioarmgpl
Contributor

sergioarmgpl commented Apr 5, 2024

Screenshot 2024-04-04 at 22 19 18

This is our progress: we have already detected the differences between the languages.
We found:

  • Some inconsistencies in a file called landscape/SustainabilityUseCasesAndLandscape2023.ko.md. We would like to remove the ko, just to standardize the files a little bit.

The other thing is that I will create an issue using a cron workflow in Actions, notifying the active people working on Spanish and Chinese. That's our idea.

Questions:

  • What do you think about this approach?
  • Would you prefer calling a script from the workflow or executing bash directly in the workflow?
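
If the script route is chosen, the comparison itself could be a few lines of Python. This is a sketch under assumptions, not necessarily what the actual workflow does: it treats a translation as outdated when it is missing or was last changed before its English source, and it expects last-modified Unix timestamps per relative path (for instance collected with `git log -1 --format=%ct -- <path>`). The function name and sample data are hypothetical.

```python
def outdated_translations(english, translated):
    """Return English paths whose translation is missing or older than the source.

    Both arguments map a relative content path to a Unix timestamp.
    """
    stale = []
    for path, source_time in english.items():
        translated_time = translated.get(path)
        if translated_time is None or translated_time < source_time:
            stale.append(path)
    return sorted(stale)


# Hypothetical timestamps for the English and Spanish content trees.
english = {
    "about/_index.md": 1714000000,
    "blog/post.md": 1713000000,
    "events/summit.md": 1711000000,
}
spanish = {
    "about/_index.md": 1712000000,  # older than the English source
    "blog/post.md": 1713500000,     # newer, so up to date
}
print(outdated_translations(english, spanish))
```

A cron-triggered workflow could run this once per language and put the resulting list into the issue body.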

@leonardpahlke @guidemetothemoon cc: @Dianmz

@Dianmz
Contributor

Dianmz commented Apr 5, 2024

To follow up on the previous comment, how would you prefer to visualize the changed files?

Personally I prefer to create an issue but we want to know your opinion.

cc: @sergioarmgpl @leonardpahlke @guidemetothemoon

@SamYuan1990
Contributor

for Chinese, please notify me :-)

@leonardpahlke
Member Author

Some inconsistencies in a file called landscape/SustainabilityUseCasesAndLandscape2023.ko.md, We would like to remove the ko, just to standardize the files a little bit.

We are currently in the process of rewriting the landscape document. This will take another month or two. We can remove the KO file and follow standards! (@seokho-son @sysnet4admin — we started to translate the TAG env website in different languages, like Spanish see PR, we could do the same for Korean)

The other thing is that I will create an issue using a cron workflow in Actions, notifying the active people working on Spanish and Chinese. That's our idea.

Sounds good! We could create a GitHub team tag-env-translators-spanish, tag-env-translators-korean, tag-env-translators-german, tag-env-translators-chinese, … and tag them in the issue. This can be defined in the CNCF repository “people” that has a manifest file https://github.com/cncf/people/blob/c7f8625ebd386574959bcb807de1b48ded6f4da2/config.yaml#L1160 (example team: tag-env-chairs)
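
To tag those teams from the automation, the issue body would need a team mention per language. A rough sketch, assuming the teams get created under the `cncf` organization with exactly the names proposed above; the language codes used as keys and the helper name `team_mention` are hypothetical.

```python
# Team names as proposed in this thread; they do not exist until defined in cncf/people.
TRANSLATOR_TEAMS = {
    "es": "tag-env-translators-spanish",
    "ko": "tag-env-translators-korean",
    "de": "tag-env-translators-german",
    "zh": "tag-env-translators-chinese",
}


def team_mention(lang, org="cncf"):
    """Return the @org/team mention for a language code, or None if no team is known."""
    team = TRANSLATOR_TEAMS.get(lang)
    return f"@{org}/{team}" if team else None


print(team_mention("es"))
print(team_mention("fr"))
```

Returning `None` for an unknown language leaves it to the caller to decide whether to skip the ping or fall back to individual assignees.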

@leonardpahlke
Member Author

To follow up on the previous comment, how would you prefer to visualize the changed files?

Personally I prefer to create an issue but we want to know your opinion.

I think issues are the best way for sure. One issue per language based on PR. So if we open a PR which makes changes to multiple files we just mirror that with one issue per language (not per file).
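
The "one issue per language, not per file" rule could be sketched as below. The function name `issues_for_pr`, the title format, the PR number and the label are assumptions for illustration; each returned dictionary has the shape of an issue-creation payload (`title`, `body`, `labels`).

```python
def issues_for_pr(pr_number, changed_english_files, languages):
    """Build one issue payload per language, each listing every changed English file."""
    file_list = "\n".join(f"- {path}" for path in changed_english_files)
    return [
        {
            "title": f"[{language}] Translation update needed for PR #{pr_number}",
            "body": f"PR #{pr_number} changed these English files:\n{file_list}",
            "labels": ["translation required"],
        }
        for language in languages
    ]


# Hypothetical PR touching two English files, with two active translations.
payloads = issues_for_pr(
    123,
    ["content/en/about/_index.md", "content/en/blog/post.md"],
    ["es", "zh"],
)
for payload in payloads:
    print(payload["title"])
```

Two files and two languages produce two issues here rather than four, which keeps the tracker readable when a PR touches many files.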

@guidemetothemoon
Contributor

Yes, I also vote for issues.
The suggestion regarding GitHub teams also makes a lot of sense.

@sergioarmgpl
Contributor

Thank you, I am resuming this issue this week. Yesterday was my birthday and I left my job 🙃 cc: @leonardpahlke @guidemetothemoon

@sysnet4admin
Contributor

sysnet4admin commented Apr 9, 2024

@leonardpahlke

We are currently in the process of rewriting the landscape document. This will take another month or two. We can remove the KO file and follow standards! (@seokho-son @sysnet4admin — we started to translate the TAG env website in different languages, like Spanish see #340, we could do the same for Korean)

I will take a look after finishing my duty work 😭

@sergioarmgpl
Contributor

Thank you, @Dianmz and I have a deadline to finish this next week. I will create the issues as suggested. cc: @leonardpahlke @guidemetothemoon

@guidemetothemoon
Contributor

Thanks for keeping us updated @sergioarmgpl and happy belated birthday!🥳 Don't hesitate to reach out if you need any support.

@Dianmz
Contributor

Dianmz commented Apr 18, 2024

image

Take a look at our progress 👀

cc: @sergioarmgpl @leonardpahlke @guidemetothemoon

@Dianmz
Contributor

Dianmz commented Apr 18, 2024

We are going to keep working on a general workflow for any language to avoid duplicating code

cc: @leonardpahlke @guidemetothemoon

leonardpahlke added the board/wg-advocacy and wg-advocacy/website labels Apr 18, 2024
@Dianmz
Contributor

Dianmz commented Apr 19, 2024

We ran into some trouble with the workflow, but we are basing it on the Kubernetes glossary

cc: @leonardpahlke @guidemetothemoon @sergioarmgpl

@Dianmz
Contributor

Dianmz commented Apr 24, 2024

We are currently testing the final configuration of the workflow!

Very close to reaching the goal 🎯

cc: @leonardpahlke @guidemetothemoon @sergioarmgpl

@guidemetothemoon
Contributor

Very exciting! Thanks for all your time and effort, and for sharing updates with the TAG @Dianmz @sergioarmgpl 💚

@sysnet4admin
Contributor

Hi @leonardpahlke
I committed and pushed the ko content.
It is only partly translated, so the quality may not be great ;)
It also failed the quality check, so it may be worth checking before merging it.

Please let me know if I could contribute more for it.
Have a nice weekend!

@sergioarmgpl
Contributor

This is the PR for this issue: #407

@sergioarmgpl
Contributor

Hi dear TAG Env Team, this is the final work that @Dianmz and I have been doing.

This includes a workflow called Check outdated content that generates new issues per language, for ES and ZH at the moment. Currently the issues are assigned to Diana and me, but the assignees should be changed to the new groups that you suggested, or to other new ones.

The issues will look like this:
Screenshot 2024-04-30 at 22 05 41

And the issue like this:
Screenshot 2024-04-30 at 22 05 53

The workflow is in .github/workflows/check-outdated-content.yml

@Dianmz
Contributor

Dianmz commented May 1, 2024

@sergioarmgpl and I have just finished the first version of the work for this issue!
We will create a second issue to optimize the per-language handling with a reusable workflow.

cc: @leonardpahlke @guidemetothemoon

@sergioarmgpl
Contributor

Also, we have to update the assignees to the new teams to be created in GitHub, such as tag-env-translators-spanish, tag-env-translators-korean, tag-env-translators-german, tag-env-translators-chinese. This is pending. cc: @leonardpahlke @guidemetothemoon

@claire-fletcher
Contributor

Hey! What is the status on this work? It would be great to get an update in the WG Comms meeting.

@sergioarmgpl
Contributor

sergioarmgpl commented Sep 12, 2024 via email

Projects
Status: In Progress
Development

No branches or pull requests