Markdown to PDF workflow automation (#391)
* added pdf workflow automation and headers for 5 languages
* added markdown to pdf folder
Showing 22 changed files with 1,628 additions and 3 deletions.
@@ -0,0 +1,63 @@
```yaml
## Code

name: Markdown to PDF

on:
  push:
    branches:
      - main
    paths:
      - '1_1_vulns/translations/**'
  pull_request:
    branches:
      - main
    paths:
      - '1_1_vulns/translations/**'

env:
  LANGUAGES: '["de", "it", "pt", "hi", "zh"]' # Add or remove language codes as needed

jobs:
  convert-markdown-to-pdf:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20' # Using Node.js version 20

      - name: Configure locale
        run: |
          sudo locale-gen en_US.UTF-8
          echo "LC_ALL=en_US.UTF-8" >> "$GITHUB_ENV"
          echo "LANG=en_US.UTF-8" >> "$GITHUB_ENV"
          echo "LANGUAGE=en_US.UTF-8" >> "$GITHUB_ENV"
      - name: Install necessary fonts
        run: |
          sudo apt-get update
          sudo apt-get install -y fonts-noto fonts-noto-cjk fonts-noto-color-emoji fonts-indic fonts-arphic-ukai fonts-arphic-uming fonts-ipafont-mincho fonts-ipafont-gothic fonts-unfonts-core
      - name: Install md-to-pdf
        run: npm install -g md-to-pdf

      - name: Run markdown_to_pdf.sh for each language
        run: |
          for lang in $(echo "$LANGUAGES" | jq -r '.[]'); do
            ./markdown_to_pdf.sh --language "$lang"
          done
        working-directory: ./markdown-to-pdf

      - name: Get current date and time
        id: date
        run: echo "date=$(date '+%Y-%m-%d-%H-%M-%S')" >> "$GITHUB_ENV"

      - name: Upload generated PDFs as artifact
        uses: actions/upload-artifact@v4
        with:
          name: pdf-translations-zipfile-${{ env.date }}
          path: ./markdown-to-pdf/generated/*.pdf
```
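The loop in the "Run markdown_to_pdf.sh for each language" step can be previewed locally before pushing. The following is a minimal sketch, not part of the commit: it assumes a bash array in place of the JSON `LANGUAGES` value (so `jq` is not needed), and `plan_conversions` is a hypothetical helper that only prints the commands the workflow would run.

```shell
#!/bin/bash
# Hypothetical dry-run helper (not part of the commit): prints one
# markdown_to_pdf.sh invocation per language code instead of running it.
plan_conversions() {
    local lang
    for lang in "$@"; do
        printf './markdown_to_pdf.sh --language %s\n' "$lang"
    done
}

# Same language codes as the workflow's LANGUAGES variable.
languages=(de it pt hi zh)
plan_conversions "${languages[@]}"
```

Removing the `printf` wrapper and calling the script directly from inside the `markdown-to-pdf` directory would approximate what the workflow step does for each language.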
@@ -1,4 +1,3 @@
<div class="frontpage"> | ||
<div class="smalllogo"> | ||
<img src="/img/OWASP-title-logo.svg"></img> | ||
@@ -0,0 +1,91 @@
# OWASP Top Ten for LLMs - Markdown to PDF

The contents of this directory are used to generate PDFs of the translated versions of the OWASP Top Ten for LLMs using the md-to-pdf npm package.

## How to Contribute Translations

To contribute translations to the OWASP Top Ten for LLMs project, please follow these steps:

1. Fork this repository on GitHub by clicking the "Fork" button at the top right corner of the repository page.

2. In your copy of the repository, create a subdirectory named with the ISO two-letter language code in the `1_1_vulns/translations` directory. This subdirectory should contain all the Markdown files of the translation. You can follow the layout used by the other languages.

3. Copy the Markdown files of the English version to your new directory and start translating. Follow the instructions in `_template.md` to ensure consistent styling, because the Markdown to PDF generator relies on this consistency. (There is no need to copy the `_template.md` file.)

4. Translate as accurately as possible and avoid deviating from the original meaning of the Top Ten for LLMs.

5. In the `LLM00_Introduction.md` file, there is a section **About this translation**. You can add your name as a translator in this section.

6. Once the translation is complete, open a descriptive pull request to this repository to get it merged.

7. You do not need to generate the PDF using the process in this document. If you want to validate that your Markdown is in the correct format (and possibly tweak the styling), follow the instructions below.

8. If you are reviewing an existing translation, open an issue and tag the original translator to request the change. Once both the original translator and the reviewer agree, open a pull request to this repository. (Remember to add your name to the **About this translation** section.)

9. Keep a summary of the discussion around translations in the GitHub issue, even if the conversation took place in the OWASP Slack channel, which is located here: [OWASP Slack Channel](https://owasp.slack.com/archives/C063W2E791U).

## How to generate a Translated PDF

### Requirements
1. To generate PDFs from the Markdown files, you need the [md-to-pdf](https://www.npmjs.com/package/md-to-pdf) npm package installed globally. If you have npm installed on your machine, run:
```shell
npm i -g md-to-pdf
```

2. You will need the translated Markdown files described above.

3. You will also need a CSS style file for the language in the `styles` directory, named `topten-<language_iso_code>.css`. For languages based on Latin characters, you can copy the Portuguese file `topten-pt.css` as a starting point.

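Before running the generator, it can help to confirm that the first requirement is met. The snippet below is a small sketch rather than part of this project: `require_command` is a hypothetical helper that relies only on the shell built-in `command -v` to test whether `md-to-pdf` is on the `PATH`.

```shell
#!/bin/bash
# Hypothetical helper (not part of this project): succeeds if the named
# command can be found on PATH, otherwise prints a message and fails.
require_command() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "Missing required command: $1" >&2
        return 1
    fi
}

if require_command md-to-pdf; then
    echo "md-to-pdf is installed"
else
    echo "Install it with: npm i -g md-to-pdf"
fi
```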
### Descriptions of contents

- `markdown-to-pdf/generated` directory: The generated PDFs are stored here. After the Markdown files are converted, the resulting PDF files are placed in this directory for easy access and distribution.

- `markdown-to-pdf/img` directory: This directory stores all the images included in the PDF files. Images referenced from the Markdown are embedded in the PDF during conversion, so they are kept here where the conversion process can find them.

- `markdown-to-pdf/styles` directory: This directory contains a custom CSS file for each language. During conversion, the Markdown is first converted to HTML, and the HTML is then "printed" using Puppeteer to generate the PDF. The CSS files ensure that the PDFs have consistent styling and alignment, and each language can have its own file to handle language-specific formatting requirements.

- `markdown-to-pdf/frontmatter.md`: This file serves as the configuration for Puppeteer, the tool used to generate the PDFs. It specifies how the PDFs should be generated and defines the header and footer for each page, which typically contain page numbers, the document title, and similar details. **Note that the title on line 57 of frontmatter.md is translated and needs to be changed before generating a PDF.** This ensures that the PDFs have the correct translated title.

- `markdown-to-pdf/markdown_to_pdf.sh`: This script runs the conversion from Markdown to PDF using the md-to-pdf npm package. The Usage section below explains how to run it.

### Usage

To generate PDFs from the Markdown files, follow these steps:

1. Modify line 57 of `frontmatter.md` to show the correct title of the OWASP Top Ten in the appropriate language.

2. Check that the ISO code directory for the language exists in the `../1_1_vulns/translations` directory and that the corresponding CSS file for the language exists in the `styles` directory.

3. Run the following command to generate the PDF:

   ```shell
   ./markdown_to_pdf.sh --language <language_iso_code>
   ```

   Example:

   ```shell
   ./markdown_to_pdf.sh --language pt
   ```

   The generated PDF is saved in the `generated` directory with the ISO code as the filename (for example, `generated/pt.pdf`). If a file with that name already exists, it is overwritten.

4. Check that the contents of the file look similar to those of the main English file.

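Step 2 above can also be checked from the command line. The following is a sketch under stated assumptions: `check_language_inputs` is a hypothetical helper, and the two paths it tests are the ones `markdown_to_pdf.sh` reads in this commit (`../1_1_vulns/translations/<code>` and `styles/topten-<code>.css`, both relative to the `markdown-to-pdf` directory).

```shell
#!/bin/bash
# Hypothetical pre-flight check mirroring the paths markdown_to_pdf.sh uses.
# $1: path to the markdown-to-pdf directory, $2: ISO language code.
check_language_inputs() {
    local base="$1" lang="$2" status=0
    if [ ! -d "$base/../1_1_vulns/translations/$lang" ]; then
        echo "Missing translation directory for '$lang'" >&2
        status=1
    fi
    if [ ! -f "$base/styles/topten-$lang.css" ]; then
        echo "Missing stylesheet topten-$lang.css" >&2
        status=1
    fi
    return "$status"
}

# Example: run from inside the markdown-to-pdf directory.
if check_language_inputs "." "pt"; then
    echo "Inputs for 'pt' look complete"
fi
```

Both problems are reported in a single run, which saves one round trip when a new language is missing its directory and its stylesheet at the same time.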
### Options

- **Keep Markdown** If you add the `--keep-markdown` flag at the end, the script does not delete the temporary Markdown file generated by concatenating all the source files. The temporary file is located in `./generated/tmp`. For example:
```shell
./markdown_to_pdf.sh --language pt --keep-markdown
```

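The flag has to come last because the script reads its arguments by position (`$1`, `$2`, `$3`). If order independence were wanted, one possible approach is a `while`/`case` loop. The sketch below is illustrative only and is not how `markdown_to_pdf.sh` currently works; `parse_args` and the `keep_markdown` variable are hypothetical names.

```shell
#!/bin/bash
# Hypothetical parser: accepts --language <code> and --keep-markdown in any order.
# Sets the variables `language` and `keep_markdown`; returns 1 on bad input.
parse_args() {
    language=""
    keep_markdown=false
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --language)
                # The option needs a value after it.
                [ "$#" -ge 2 ] || return 1
                language="$2"
                shift 2
                ;;
            --keep-markdown)
                keep_markdown=true
                shift
                ;;
            *)
                # Unknown argument.
                return 1
                ;;
        esac
    done
    # A language code is mandatory.
    [ -n "$language" ]
}

if parse_args --keep-markdown --language pt; then
    echo "language=$language keep_markdown=$keep_markdown"
fi
```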
## License

This project is licensed under the terms of the [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/).
@@ -0,0 +1,109 @@
```bash
#!/bin/bash

# Ensure UTF-8 encoding in the environment
export LC_ALL=C.UTF-8
export LANG=C.UTF-8

# Check that the --language option and its value are provided
if [ -z "$2" ] || [ "$1" != "--language" ]; then
    echo "Usage: $0 --language <language> [--keep-markdown]"
    exit 1
fi
language="$2"

# Define directories and files
current_directory=$(pwd)
directory="$current_directory/../1_1_vulns/translations/$language"
dir_name=$(basename "$directory")
generated_folder="$current_directory/generated"
tmp_folder="$generated_folder/tmp"
output_file="$tmp_folder/${dir_name}.md"
temp_pdf_file="$tmp_folder/${dir_name}.pdf"
pdf_file="$generated_folder/${dir_name}.pdf"
frontmatter="$current_directory/frontmatter.md"
stylesheet="$current_directory/styles/topten-$language.css"
intro_file="${directory}/LLM00_Introduction.md"

# Extract the document title from the introduction file, if it exists
header_title=""
if [[ -f "$intro_file" ]]; then
    # Use awk to handle multi-line patterns and extract the title, ensuring UTF-8 handling
    header_title=$(awk '/<div class="doctitle">/,/<\/div>/{ if ($0 ~ /<\/div>/) { print p; p=""; next } if ($0 ~ /<div class="doctitle">/) next; p=p $0 }' "$intro_file" | xargs)
    echo "Extracted header title: $header_title"
else
    # Not fatal: the PDF is still generated, but without a document title
    echo "Warning: '$intro_file' does not exist; the document title will be empty." >&2
fi

# Check if the translation directory exists
if [ ! -d "$directory" ]; then
    echo "Error: '$directory' is not a directory."
    exit 1
fi

# Check if the stylesheet for this language exists
if [ ! -f "$stylesheet" ]; then
    echo "Error: '$stylesheet' does not exist."
    exit 1
fi

# Create the 'generated' and 'generated/tmp' directories if they don't exist
mkdir -p "$tmp_folder"

# Delete the final PDF, the temporary PDF and the temporary Markdown file if they already exist
if [ -f "$pdf_file" ]; then
    echo "Deleting existing PDF file: $pdf_file"
    rm "$pdf_file"
fi
if [ -f "$temp_pdf_file" ]; then
    echo "Deleting existing temporary PDF file: $temp_pdf_file"
    rm "$temp_pdf_file"
fi
if [ -f "$output_file" ]; then
    echo "Deleting existing temporary Markdown file: $output_file"
    rm "$output_file"
fi

# Start with a clean output file
: > "$output_file"

# Add the frontmatter if it exists
if [ -f "$frontmatter" ]; then
    cat "$frontmatter" >> "$output_file"
    echo "" >> "$output_file"  # Adds a newline after the frontmatter
fi

# Sort markdown files alphabetically and concatenate them.
# Null-delimited names keep paths that contain spaces intact.
while IFS= read -r -d '' file; do
    # Skip the frontmatter
    if [[ "$file" != "$frontmatter" ]]; then
        cat "$file" >> "$output_file"
        echo "" >> "$output_file"  # Adds a newline between files
    fi
done < <(find "$directory" -maxdepth 1 -name '*.md' -print0 | sort -z)

echo "Combined markdown files into $output_file"

# Convert the combined Markdown file to PDF (written next to the input file)
if ! md-to-pdf --basedir "$current_directory" --stylesheet "$stylesheet" --document-title "$header_title" "$output_file"; then
    echo "Error: md-to-pdf failed." >&2
    exit 1
fi
mv "$temp_pdf_file" "$pdf_file"

if [ -f "$output_file" ] && [ "$3" != "--keep-markdown" ]; then
    echo "Deleting temporary Markdown file: $output_file"
    rm "$output_file"
fi

if [ -f "$pdf_file" ]; then
    echo -e "\033[32m###############################################################################################################\033[0m"
    echo -e "\033[32m###########################################     Success!!    ##################################################\033[0m"
    echo -e "\033[32m###############################################################################################################\033[0m"
    echo "PDF file generated: $pdf_file"
    echo -e "\033[32m###############################################################################################################\033[0m"
    echo -e "\033[32m###############################################################################################################\033[0m"
else
    echo "Error: PDF file was not generated." >&2
    exit 1
fi
```
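The title extraction in the script is the least obvious step, so it is worth trying on a small input. The sketch below wraps the same awk program in a hypothetical helper, `extract_title`, and runs it on a made-up sample file; the sample content is an assumption and is not taken from any real `LLM00_Introduction.md`.

```shell
#!/bin/bash
# Same awk program as in markdown_to_pdf.sh, wrapped in a hypothetical
# helper so it can be tried on a sample file.
extract_title() {
    awk '/<div class="doctitle">/,/<\/div>/{ if ($0 ~ /<\/div>/) { print p; p=""; next } if ($0 ~ /<div class="doctitle">/) next; p=p $0 }' "$1" | xargs
}

# Made-up sample input: the title sits on its own line inside the div.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
<div class="frontpage">
<div class="doctitle">
  OWASP Top 10 for LLMs
</div>
</div>
EOF

extract_title "$sample"   # prints: OWASP Top 10 for LLMs
rm -f "$sample"
```

Two limitations follow from how the program is written. If the opening tag, the title, and `</div>` all sit on one line, the range starts and ends on that line and the result is empty. And because `xargs` is used only to trim whitespace, a title containing an unbalanced quote character makes `xargs` report an error instead of returning the title.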