Skip to content

Commit

Permalink
Add: add proposal of i18n tool which is developed for ospp 2024 project
Browse files Browse the repository at this point in the history
  • Loading branch information
chouchongYHMing committed Sep 9, 2024
1 parent 705986f commit cde6e27
Show file tree
Hide file tree
Showing 5 changed files with 187 additions and 0 deletions.
187 changes: 187 additions & 0 deletions docs/i18n-docs.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,187 @@
---
title: Automation tools for i18n and translate
authors:
- "@chouchongYHMing" # Authors' github accounts here.
reviewers:
- TBD
approvers:
- "@warjiang"

creation-date: 2024-08-19

---

# Automation tools for i18n and translate

## Summary

Internationalization (also called i18n) is essential for software that aims to become global. During the development of the Karmada dashboard, we encountered inefficiencies, errors, and lack of engineering rigor in manually maintaining translation content (also called locales files). To solve these issues and meet the i18n needs of the Karmada dashboard project, we developed a ready-to-use automated i18n tool.

<!--
国际化(也称为i18n)对于需要全球化的软件来说是必须要的,在开发Karmada dashboard的过程中,我们遇到了手动维护翻译内容(也称为locales文件)效率低、容易出错、多人协作时缺乏工程化的问题,因此我们我们开发了一套开箱即用的i18n自动化工具,解决这些问题,满足Karmada dashboard项目的i18n的需求。
-->

## Motivation

During the development of the Karmada dashboard, early on, we maintained the locales files manually. If new Chinese content was added to the code, it required to replace by the corresponding i18n keys and functions. After that, a mapping of code points to the corresponding Chinese content was added to the Chinese locales file. Then translating the Chinese content and inserting a mapping of code points to the translated content in the locales files of the respective languages. When the project was small and the amount of Chinese characters was limited, the cost of manual maintenance was relatively acceptable. However, as the project scale expanded and the number of Chinese characters increased, the cost of manually maintaining locales files rose exponentially. Therefore, an automated i18n tool was needed to meet the i18n requirements of the Karmada dashboard. In the open source community, it was hard to find a ready-to-use i18n tool that perfectly supported the i18n needs of the Karmada dashboard. Therefore, we aimed to develop an automated i18n tool which is based on the perspective of the Karmada dashboard's i18n requirements, to satisfy the specific i18n needs of the Karmada dashboard.
<!--
在开发Karmada dashboard的过程中,早期我们是通过人工的方式来维护locales文件,如果代码中新增了中文内容,则需要替换中文内容为对应的码点+i18n的调用函数,之后在中文的locales文件中插入码点和对应的中文内容的映射关系,再翻译中文内容,并在对应的语言的locales文件插入码点和对应翻译后内容的映射关系。在项目规模小,中文字符少的情况下手动维护的成本相对可以接受,但是一旦项目规模逐渐扩大、中文字符变多的情况下,手动维护locales文件的方式的成本会呈指数倍上升,因此需要一套i18n自动化工具来满足Karmada dashboard的i18n需求。在开源社区中我们发现很难找到一套开箱即用的i18n工具能够完美支持Karmada dashboard的i18n需求,因此我们希望从Karmada dashboard的i18n需求出发,实现一套满足Karmada dashboard的i18n需求的i18n自动化工具。
-->

### Goals

1. Extract Chinese characters from the code and automatically replace them with calls to i18n functions
2. Automatically maintain multilingual locales dictionary files.
3. Support integration with third-party translation capabilities to automatically translate Chinese copy into the target language.

<!--
- 从代码中提取中文字符并自动将中文字符替换为i18n函数的调用
- 自动地维护多语言的locales字典文件
- 支持接入第三方翻译能力,自动的将中文文案翻译到目标语言
-->

### Non-Goals

1. Developers need to manually add the styles of Chinese characters types that are not already supported by the tool when encountering them.
2. Character length inconsistency encountered in multilingual display.

<!--
1. 开发人员需要在遇到工具不支持的汉字样式时手动添加样式。
2. 多语言显示过程中遇到的字符长度不一致现象
-->

## Proposal

<!--
1. 自动节点识别:增强工具以自动识别和处理源代码中的各种节点类型,包括纯字符串节点、模板文字节点、JSX文本节点和JSX属性节点。此功能将确保对可能需要国际化的所有潜在文本元素进行全面扫描和修改。
2. 汉字替换:实现一个健壮的机制,用相应的i18n密钥替换原始文件中所有检测到的汉字。这一步对于为多语言支持准备代码库和确保所有文本内容都为国际化做好准备至关重要。
3. 与本地化文件的集成:自动将新的i18n键包含到locale /zh-CN中。json文件。此过程应有效地处理更新,确保所有密钥都准确地反映在本地化文件中,而无需人工干预。
4. 翻译执行:开发和整合使用更新后的zh-CN的翻译程序。json文件生成对应的其他语言的翻译,如英语(en-US)和法语。该功能将允许快速和准确地制作多语言内容。
-->
### User Stories (Optional)

<!--
Detail the things that people will be able to do if this KEP is implemented.
Include as much detail as possible so that people can understand the "how" of
the system. The goal here is to make this feel real for users without getting
bogged down.
-->

#### Story 1
As a developer of the Karmada dashboard, when introducing new features with a significant amount of new Chinese content, there is an automated tool that can extract, translate, and generate corresponding locale files for this new content with a single click, thereby reducing the maintenance cost of i18n. The community administrator provides the corresponding i18n translation rules for specific descriptions. In automatic translation, you can preferentially use the i18n translation rules provided by the community administrator and combine them with automatic translation to improve the overall effect of i18n translation.

<!--
作为Karmada dashboard的项目开发人员,开发新feature的时候在项目中引入了大量的新增中文内容,对于这部分内容希望有一套自动化的工具可以一键完成新增中文内容的提取、翻译并生成对应的locale文件,降低i18n的维护成本。社区管理员对于特定的描述提供对应的i18n翻译规则,自动化翻译的时候可以优先采用社区管理员提供的i18n翻译规则再结合自动化翻译,提高整体i18n翻译的效果。
-->
#### Story 2
As software developers, they can improve the code for each module. The executable code for each module is stored in the 'src' directory. Developers simply need to open each file to directly modify and enhance it. When they want to develop new tools in the future and need to use these modules, they can directly reference them, avoiding redundant production.
<!--
作为程序开发者,它可以对每个模块的代码进行改进。在scr目录下存放了每个模块的执行代码。开发者只需点开每个文件即可直接对其进行修改和增强。当他们在未来想开发的新的工具,需要用到这些模块时,也可以直接进行引用,避免了重复生产。
-->

### Notes/Constraints/Caveats (Optional)

<!--
What are the caveats to the proposal?
What are some important details that didn't come across above?
Go in to as much detail as necessary here.
This might be a good place to talk about core concepts and how they relate.
-->

### Risks and Mitigations

The i18n tool modifies the original files, potentially causing irreversible changes. Developers need to save a version of the previous generation code before using it.
<!--
i18n工具会修改原始文件,可能导致不可逆的更改。开发人员需要在使用前保存上一代代码的一个版本。
-->

## Design Details
The overall i18n automation tool comprises two main modules: the translation module and the command-line module. The translation module is responsible for interfacing with mainstream translation APIs, and its interface-oriented programming approach ensures good extensibility. If a better translation implementation is invented in the future, it can be integrated by simply supporting the corresponding translation interface. The command-line module serves as the entry point for the entire i18n tool, handling the main workflow of the i18n functionality, which is showed as follows:
![process details](./images/i18n-tool-overview.png)
<!--
整体i18n自动化工具包含两个主要的模块:翻译模块以及命令行模块,翻译模块负责对接主流翻译API,面相接口的编程方式保证了良好的拓展性,如果未来有更好的翻译实现,只需要支持对应的翻译接口即可接入;命令行模块是整个i18n工具的入口,负责完成i18n功能的主流程,整体如下所示:
-->


### Modular Design:

1. processAST Function: Responsible for processing the AST, identifying and replacing Chinese strings.
2. generateKey Function: Generates unique keys for use in the i18n JSON file.
3. updateLocales Function: Update Locales files as material for subsequent translation functions.
4. translate Function: Choose the right way to translate from a variety of options.
<!--
功能模块化:
processAST功能: 负责AST的处理,识别和替换中文字符串.
generateKey功能: 生成i18n JSON文件中使用的唯一键。
updateLocales功能:更新Locales文件,作为后续翻译功能的材料。
translate功能: 从多种选择中挑选合适的方式进行翻译。
-->

### Process Design:
#### Translate Process:
The overall design adopts an interface-oriented development approach. In BaseTranslator, defining the translation implementation for single statements as well as for batch statements. Specific implementation classes only need to implement the corresponding translation methods. For example, integrating with Baidu's translation services, we have defined BaiduTranslator. After implementing the corresponding interface, one simply needs to instantiate a BaiduTranslator object to use its translation capabilities.

<!--
整体设计上采用面相接口的开发模式,在BaseTranslator 中定义了对单个语句的翻译实现以及批量语句的翻译实现,具体的实现类中只需要实现对应的翻译方法即可,比如对接baidu提供的翻译实现,我们定义了BaiduTranslator,实现对应的接口后,使用时仅需要实例化BaiduTranslator对象即可调用对应的翻译实现
-->

#### Command-line Process:
The program, which is developed based on Node.js's commander module, primarily handles the main workflow which includes parsing command-line arguments, collecting files to be processed based on configuration parameters which use glob expressions, and iterating over all the files to be processed. It sequentially executes scanning, code generation, translation and locale file generation modules to complete the entire i18n process. The specific internal implementation is as follows:
<!--
基于Node.js的commander模块开发实现的,程序的主流程中主要负责完成命令行参数的解析、根据配置参数通过glob表达式收集需要处理的文件、遍历所有待处理的文件,对待处理的文件执行依次执行扫描、代码生成模块、翻译&生成locale文件模块,完成整个i18n的过程,内部具体实现如下:
-->

1. Scanning Process: The scanning process is responsible for receiving external raw files and returning a processed AST (Abstract Syntax Tree). The internal process is illustrated in the diagram below:
![process details](./images/i18n-tool-scan.png)
<!--
扫描流程:扫描模块负责接收外部传入的原始文件,返回处理好的AST(Abstract Syntax Tree)树,内部流程如图:
-->
Convert the input files into an AST, traverse the AST to find nodes which containing Chinese characters, calculate the code points corresponding to the Chinese characters and save them in the i18nMap, and save the transformed AST nodes in CNpath. After completing one scan, continue to traverse CNpath, converting the Chinese characters in the corresponding AST nodes into the form of i18n function calls, and generate a new AST tree to return.
<!--
将输入的文件转换成AST树,遍历AST树找到包含中文字符的AST节点,计算中文字符对应的码点保存在i18nMap中,并将AST节点作统一转换后保存在CNpath中。完成一趟扫描后,继续遍历CNpath,将对应的AST节点中的中文字符转成i18n的函数调用的形式,生成新的AST树返回出去。
-->
2. Code generating process: This process receives the modified AST from the scanning module and converts this modified AST back into the original code. The internal process is as follows:
![process details](./images/i18n-tool-codeG.png)
<!--
代码生成流程:接收扫描模块返回的修改后的AST树,将修改后的AST树转换成原始代码,内部流程如图:
-->
If the original code does not include an import statement for the i18n instance, an appropriate import statement needs to be inserted. This completes the main workflow of code generation. The process of code generation might inadvertently alter statements unrelated to i18n. To address this issue, a prettier process is introduced to style the generated code according to the original coding style, minimizing changes which brought by code-gen. Finally, the newly generated code is written back into the original file.
<!--
如果原始代码中没有对i18n实例的导入,则需要插入对应的import语句。到这里已经完成代码生成的主流程,代码生成的过程可能会导致i18n无关的语句在做code-gen的过程中也被修改了,为了解决这个问题,额外引入了prettier流程,将生成后代码按照原始代码风格做一次prettier,最小化code-gen带来的变更,最后将生成好的代码覆盖写入原始文件中。
-->

3. UpdateLocales process: This part receives the i18nMap which returned from the scanning module and updates the locale files by various target languages. The internal process is as follows:
![process details](./images/i18n-tool-upLoc.png)
<!--
更新locale文件流程: 接收扫描模块返回的i18nMap ,更新多种目标语言的locales文件,内部流程如图:
-->
Based on the returned i18nMap, append updates to the Chinese locales file. Traverse through the target languages, and after the translation for each language is completed using the translator module, append the translated content to the respective languages’ locale files, thus automating the update of locale files.
<!--
根据返回i18nMap更新追加写入到中文locales文件中,遍历目标语言,在调用translator模块完成对应语言的翻译后,将翻译好的内容追加写入到对应语言的locales文件中,完成locales文件的自动更新
-->

### Test Plan

plan to implement the unit testing function of the i18n tool with vitest.

<!--
计划采用vitest实现i18n工具的单元测试功能
-->

## Alternatives

In addition to the AST-based approach for implementing the i18n tool, it is also possible to use regular expressions. However, considering code maintainability (extensive use of regular expressions may reduce code readability), stability of functionality (regular expressions may encounter tricky corner cases), and the more comprehensive features of the Babel ecosystem, the AST-based approach has been chosen for implementing the i18n tool.

<!--
实现i18n工具的除了基于AST的技术路线,还可以通过正则表达式的方式实现,但是考虑到代码的可维护性(大量正则可能会导致代码可读性变差)、功能的稳定性(正则可能遇到比较棘手的corner case)以及 babel生态功能更加完善,因此考虑基于AST的技术路线实现i18n工具.
-->

<!--
Note: This is a simplified version of kubernetes enhancement proposal template.
https://github.com/kubernetes/enhancements/tree/3317d4cb548c396a430d1c1ac6625226018adf6a/keps/NNNN-kep-template
-->
Binary file added docs/images/i18n-tool-codeG.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/images/i18n-tool-overview.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/images/i18n-tool-scan.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/images/i18n-tool-upLoc.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

0 comments on commit cde6e27

Please sign in to comment.