survey.tm
streamlines the process of managing and updating
translations for multilingual CAPI surveys, significantly easing the
task for users to maintain accuracy and consistency across multiple
languages, especially when questionnaires undergo changes and working
with multiple questionnaires.
Key features include:
- Automated Synchronization: Automatically extracts and consolidates survey text items from the Survey Solutions Designer, ensuring translations are always up-to-date with the latest questionnaire versions.
- Centralized Translation Management: Utilizes a Google Sheets-based ‘Translation Database’ for easy and intuitive management of translations by multiple translators. See example here.
- Efficiency and Consistency: Minimizes manual work, reduces errors, and enhances consistency across translations, saving hours in translation management for both survey manager and Translator(s).
- Quick Setup: Designed for a setup time of just 5 minutes, it’s scalable across multiple languages and questionnaires.
Initially tailored for Survey
Solutions with future plans to support
ODK-based systems, survey.tm
is an essential tool for anyone
conducting multilingual surveys seeking to improve their translation
workflow.
- R version 4.0.0 or higher must be installed on your machine. If you do not have it, you can download and install the latest version of R from CRAN.
- A valid Google account for accessing Google Sheets.
- A CAPI questionnaire within Survey Solutions.
Once you have the prerequisites ready, proceed with the installation of
survey.tm
as outlined below.
You can install the development version of survey.tm through
# If you don't have `devtools` installed, uncomment the line below to install it:
# install.packages("devtools")
devtools::install_github("RowSquared/survey.tm", ref = "main")
To start using survey.tm
for a data collection project, set up a
“Translation Database Sheet” on Google Sheets. A single sheet is
recommended per project, regardless of the number of questionnaires or
languages involved. There are two setup options:
Run setup_tdb() once(!) to create a “Translation Database Sheet”:
# First-time `googlesheets4` users need to authenticate. Follow the on-screen instructions.
# googlesheets4::gs4_auth(email = "your.email@address.com")
#library(survey.tm)
# Create the translation database sheet. Run once, then comment out or remove.
# The function will print the URL of the new sheet into the console
# setup_tdb(
# ssheet_name = "Project X: Translation Sheet", # Name to assign to new google worksheet
# lang_names = c("German", "French") # Translations/Languages needed. Needs to be ISO 639-1 language name.
# )
- Create a copy of this template sheet
- Rename the sheet for your target language (e.g., “German”).
- Add more sheets for additional languages as needed, naming each one appropriately.
After setting up your “Translation Database Sheet” using either option, ensure you copy and save the Google Sheet ID for future use in the workflow.
Having established your “Translation Database Sheet”, you can jump straight into using survey.tm for your project. Below is a quick start code snippet that encapsulates the essential steps of the translation management workflow. Simply copy and paste the following code into your R environment to begin. It’s recommended to examine each object after each step to gain insight into the process and understand the changes taking place.
library(survey.tm)
# Optional: Authenticate with Google Sheets (required once per session)
# googlesheets4::gs4_auth(email = "your.email@address.com")
# Define your project's Translation Database Sheet ID
translation_db_id = "your_google_sheet_id"
# Specify the Survey Solutions questionnaire IDs you're working with
# Locate your questionnaire ID in Survey Solutions by navigating to your
# questionnaire in the Designer. The ID appears in the URL as
#https://designer.mysurvey.solutions/questionnaire/details/<questionnaire_id>
questionnaire_ids = c("questionnaire1_id", "questionnaire2_id")
#Step 1: Pull (& compare) all data sources ----
#Step 1.1: Retrieve the 'Source Questionnaire' template files from Survey Solutions
suso_trans_templates <- get_suso_tfiles(
questionnaires = questionnaire_ids,
user = "your_email@example.com", # Registered with Survey Solutions Designer
password = "your_password"
)
#Step 1.2: Aggregate and consolidate text items from multiple questionnaires and sheets into one data.table
source_titems <- parse_suso_titems(
tmpl_list = suso_trans_templates, #Nested list as returned by `get_suso_tfiles()`
collapse = TRUE # Keep only unique text items across questionnaires
)
#Step 1.3: Pull 'Translation Database' Google Sheet data
#Pull all columns and rows found in all sheets in 'Translation Database' Google Sheet
tdb_data <- get_tdb_data(ss = translation_db_id)
#Please note: At first run, all elements of the list returned will be empty data.table's
#(as there is not data in the Google Sheet..)
#Step 1.4: Update the 'Translation Database' object based on the text items in current version of 'Source Questionnaire(s)'
# It changes the statuses of any text item in the translation database object that no longer is part of the source questionnaire(s).
# And adds any new text item from source questionnaire(s) not yet found in the database
new_tdb <- update_tdb(
tdb = tdb_data,
source_titems = source_titems
)
#Optional Step 1.5: Query API Translation Services for items without any translation
#Currently DeepL and Google API supported
# batchTranslate_Deepl2(new_tdb,
# API_key = my_api_key
# )
#Optional Step 1.6: Check translation for software-related issues
new_tdb <- syntax_check(
tdb=new_tdb
)
#Step 2: Update the Translation Database ----
# Write one particular (updated) language to the 'Translation Database' Google Sheet
write_tdb_data(new_tdb[["German"]],
ss = translation_db_id,
sheet = "German"
)
#Or write all languages found in 'Translation Database' object to the Google Sheet
# purrr::walk(
# .x = names(new_tdb),
# .f = ~ write_tdb_data(new_tdb[[.x]],
# ss = translation_db_id,
# sheet = .x
# )
# )
#Step 3: Generate outputs for CAPI program ----
# Take the 'German' Translation from database and merge into the source questionnaire as returned by `get_suso_tfiles()`
# Consider only text items of particular statuses defined by Translator
create_suso_file(
tdb.language = new_tdb[["German"]],
source_questionnaire = suso_trans_templates[["NAME-OF-QUESTIONNAIRE"]],
statuses = c("machine", "reviewed", "translated"),
path = "your-path/German_NAME-OF-QUESTIONNAIRE.xlsx"
)
Pull the Questionnaire Template(s) and current ‘Translation Database’, compare and produce an updated translation database object.
Click to see detailed description
To retrieve the current version and text items of the questionnaire(s)
you’re working with, you can use the get_suso_tfiles()
function. This
function downloads the questionnaire
templates
in Excel format from the Survey Solutions Designer and stores them as
nested lists in your environment. You can specify the questionnaires by
using their questionnaire ID, which is a 32-character alphanumeric
identifier.
You can find this ID by logging in to the Survey Solutions Designer and
accessing your questionnaire. The URL in your browser should look like
https://designer.mysurvey.solutions/questionnaire/details/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
,
where the combination of x’s after /details/
represents your
questionnaire ID.
#One needs to pull the Questionnaire Template(s) ('Translation File') from the Survey Solutions Designer
# Define the questionnaire IDs you want to retrieve translations for.
questionnaire_ids <- c("9e37f1c59e1c47039167928481cf42d2", "702bef443dfb4fc8bd8c754612590875")
# Retrieve the 'Source Questionnaire' template files
suso_trans_templates <- get_suso_tfiles(
questionnaires = questionnaire_ids,
sheets=NULL, #If you want to read only specific sheets (like sheets 'Translations' without any reusable categories which is stored on seperate sheet)
user = "your_email@example.com", # Registered with Survey Solutions Designer
password = "your_password"
)
Click to see detailed description
Text items to be translated are currently distributed across
questionnaires and sheets in the object returned by get_suso_tfiles()
.
To aggregate all these items into a single consolidated data.table, use
the parse_suso_titems()
function.
#Aggregate and consolidate text items from multiple questionnaires and sheets into a single data.table using parse_suso_titems().
source_titems <- parse_suso_titems(
tmpl_list = suso_trans_templates, #Nested list as returned by `get_suso_tfiles()`
collapse = TRUE, # Keep only unique text items across questionnaires
qcode_pattern = "^Q\\d+.\\s" #Remove question coding (e.g. Q1.). Needs to be added later in `create_suso_file`
)
Click to see detailed description
In order to add translations to the ‘Source Questionnaires’, it’s necessary to pull any existing translations (if available) specified by the Translator from the ‘Translation Database’ Google Sheet.
get_tdb_data()
is a simple wrapper for googlesheets4::read_sheet()
and returns a list where each element is a sheet in the Google Sheet,
representing translation data for a specific language.
Please note: At first run after setup_tdb()
, all elements of the list
returned will be empty data.table’s.
# googlesheets4::gs4_auth(email = "your.email@address.com")
#Pull all columns and rows found in all sheets in 'Translation Database' Google Sheet
tdb_data <- get_tdb_data(ss = "GOOGLE-SHEET-IDENTIFIER")
Click to see detailed description
Compares ‘Source Questionnaire’ text items (object returned by
parse_suso_titems()) against the 'Translation Database' list returned by
get_tdb_data()`.
Adds any new text item from source questionnaire(s) not yet found in the
database. Text items in any sheet in the translation database object
that no longer is part of the source questionnaire(s) does remain in the
object but status is changed to outdated
.
List returned will be used for subsequent processes, including creating ‘Translated Questionnaires’ for upload to CAPI system and/or pushed to the ‘Translation Database’
#Update the 'Translation Database' object based on the text items in current version of 'Source Questionnaire(s)'
new_tdb <- update_tdb(
tdb = tdb_data,
source_titems = source_titems
)
Click to see detailed description
Object new_tdb
contains the most recent text items from the source
questionnaire along with the previously added translations (if any) for
all languages in the ‘Translation Database’. As a next step, one should
update the Google Worksheet ‘Database’ so that Translators can continue
working on the most recent version of the questionnaire(s). To this end,
use write_tdb_data()
which writes one language from new_tdb
object
to one particular sheet in the Google Worksheet.
Please note: It is advisable to coordinate with the Translator(s) to not overwrite translations that she is adding to the Database sheet while you are about to push new ones.
#Write one particular (updated) language to the 'Translation Database' Google Sheet
write_tdb_data(new_tdb[["German"]],
ss = "GOOGLE-SHEET-IDENTIFIER",
sheet = "German"
)
#Or write all languages found in 'Translation Database' object to the Google Sheet
# purrr::walk(
# .x = names(new_tdb),
# .f = ~ write_tdb_data(new_tdb[[.x]],
# ss = ss,
# sheet = .x
# )
# )
Click to see detailed description
Using object new_tdb
, one can now add the most recent translation to a
questionnaire source template (as returned by get_suso_tfiles()
) of
your choice.
To this end, use create_suso_file()
which will write a .xlsx file that
one can use to upload to the Survey Solutions Designer.
# Take the 'German' Translation from database and merge into the source questionnaire as returned by `get_suso_tfiles()`}
# Consider only text items of particular statuses defined by Translator
#Attention: If you used parameter qcode_pattern at `parse_suso_titems()`, you should specify here again to ensure strings are merged correctly
create_suso_file(
tdb.language = new_tdb[["German"]],
source_questionnaire = suso_trans_templates[["NAME-OF-QUESTIONNAIRE"]],
statuses = c("machine", "reviewed", "translated"),
qcode_pattern = "^Q\\d+.\\s",
path = "your-path/German_NAME-OF-QUESTIONNAIRE.xlsx"
)
# Or loop through all source questionnaires pulled via `get_suso_tfiles()`and create a file for each language sheet on 'Translation Database'
#languages <- names(new_tdb)
#questionnaires <- names(suso_trans_templates)
#purrr::walk(languages, ~{
# lang <- .x
# purrr::walk(questionnaires, ~create_suso_file(
# tdb.language = new_tdb[[lang]],
# source_questionnaire = suso_trans_templates[[.x]],
# statuses = c("machine", "reviewed", "translated"),
# path = paste0("your-path/", lang, "_", .x, ".xlsx")
# ))
#})
In addition to the basic workflow functions, there are a number of additional wrappers that help in translation as well as validating the translation.
The syntax_check
function serves to identify software-related
mismatches between the ‘Text_Item’ and ‘Translation’ columns of a given
translation data table. These mismatches could arise due to:
- HTML-tag issues: Mismatches in HTML tags or their count between the ‘Text_Item’ and ‘Translation’ columns.
- Text Substitution issues: Discrepancies in text substitutions
between the ‘Text_Item’ and ‘Translation’ columns. For instance, if
a text substitution like
%rostertitle%
is present in the ‘Text_Item’ but missing in the ‘Translation’.
new_tdb <- syntax_check(
tdb=new_tdb, # Your list of translations (usually created by `get_tdb_data()` or `update_tdb()`).
pattern = "%[a-zA-Z0-9_]+%" # Regular expression that identifies text items in 'Text_Item' and 'Translation'. The default pattern is `%[a-zA-Z0-9_]+%`, which matches substitutions like `%rostertitle%`.
)
Quickly translate missing ‘Translation’ items using API providers:
Simple wrapper for the Google Translate API, utilizing the
googleLanguageR
package. For more details on setting up Google Translate API and
obtaining authentication details, visit Google Cloud
Translation.
# Query Google API
batchTranslate_GApi(new_tdb,
auth="my_path/my_service_account_key.json")
Simple wrapper for the DeepL API, utilizing the
deeplr
package. For more
details on setting up the API, visit DeepL
API.
batchTranslate_Deepl2(new_tdb,
API_key = my_api_key
)