Simply simplify language

Use LLMs to simplify your institutional communication. Get rid of «Behördendeutsch».

Usage

  • You can run the app locally, in the cloud or in a GitHub Codespace.
  • If you only have an OpenAI account and do not want to use other LLMs, you can also run a variant of the app that uses only OpenAI models. However, we recommend giving the Mistral and Anthropic models a spin too. These models are also very powerful, and we consistently achieve very good results with them.
  • We also added an app version that uses the Azure OpenAI Service.
  • We also added an app version that only leverages the Google Gemini models (1.5 / 2.0 Flash and Pro).

Run the app locally

  • Create a Conda environment: conda create -n simplify python=3.9
  • Activate environment: conda activate simplify
  • Clone this repo.
  • Change into the project directory: cd simply-simplify-language/
  • Install packages: pip install -r requirements.txt
  • Install the spaCy language model: python -m spacy download de_core_news_sm
  • Create an .env file and input your API keys:
    OPENAI_API_KEY=sk-...
    ANTHROPIC_API_KEY=sk-...
    MISTRAL_API_KEY=KGT...
    GOOGLE_API_KEY=...
  • Change into app directory: cd _streamlit_app/
  • Start app: streamlit run sprache-vereinfachen.py
  • To run the OpenAI only version use streamlit run sprache-vereinfachen_openai.py.
  • To run the Google Gemini only version use streamlit run sprache-vereinfachen_google.py. Get your API key from here: Google AI Studio.
  • To run the Azure OpenAI only version use streamlit run sprache-vereinfachen_azure.py. Have a look here to learn more about how to set up the app with Azure.
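
Before starting the app, it can help to check that the .env file contains all four keys. The following is a minimal sketch using only the standard library. The key names come from the example above; the helpers `parse_env_text` and `missing_keys` are hypothetical and not part of the app, which may load the file differently.

```python
from typing import Dict, List

# Key names taken from the .env example above.
EXPECTED_KEYS = [
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "MISTRAL_API_KEY",
    "GOOGLE_API_KEY",
]


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(env: Dict[str, str], expected: List[str] = EXPECTED_KEYS) -> List[str]:
    """Return the expected keys that are absent or empty."""
    return [key for key in expected if not env.get(key)]


# Illustrative content only; in practice, read the text from your .env file.
sample = "OPENAI_API_KEY=sk-test\n# a comment\nMISTRAL_API_KEY=\n"
print(missing_keys(parse_env_text(sample)))
```

If you only run the OpenAI, Gemini, or Azure variant, pass a shorter `expected` list.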

Run the app in the cloud

  • Instantiate a small virtual machine with the cloud provider of your choice. Suggested size: 2 vCPUs, 2 GB RAM, and an SSD with a few GB are sufficient. This will set you back no more than a couple of Francs per month.
  • Install Conda and set up the repo and app as described above.
  • Recommendation: To use a proper domain and HTTPS it makes sense to install a reverse proxy. We very much like Caddy server for this due to its simplicity and ease of installation and usage. It's also simple to request certificates – Caddy does this automatically for you.

Run the app in a GitHub Codespace

  • This will enable you to develop and run the app in a cloud-hosted development workspace, using GitHub Codespaces.
  • Some benefits: There is no need for any local installation; you can do everything right from your web browser. You get some free hours with your GitHub account, so this should not be expensive at all. However, do not forget to delete unused Codespaces to avoid being billed unnecessarily. It is also sensible to make sure that Auto-delete codespace is activated in the settings.
  • Create a GitHub codespace on this repository by clicking Code > Codespaces > Create codespace on main.
  • Wait until the codespace has started. You'll get a new URL like https://scaling-pancake-jwjjw54r4r7hpqpg.github.dev/
  • If you run into network connection issues try another browser. In our testing Firefox sometimes threw errors, Chrome worked fine.
  • Install the project requirements from the terminal: pip install -r requirements.txt
  • Install the spaCy language model: python -m spacy download de_core_news_sm
  • Create an .env file and input your API keys as described above.
  • Alternatively, create Repository Secrets on GitHub, which become available to your codespaces automatically on startup (only if you are a repo owner or are using your own fork).
  • Start app: python -m streamlit run _streamlit_app/sprache-vereinfachen.py
  • Codespaces auto-proxies and forwards Port 8501 to something like https://scaling-pancake-jwjjw54r4r7hpqpg.github.dev/
  • In case you don't like coding in your browser, you can also use a local Visual Studio Code IDE and connect to the remote Codespace.

Note

The app logs user interactions to a file named app.log on your local computer or virtual machine. If you do not want analytics, simply comment out the logging function call in the code.
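
As an alternative to commenting out code, logging can be put behind a switch. The sketch below illustrates that idea with Python's standard `logging` module. The function names, the logger name, and the logged fields are assumptions for illustration and do not reflect what the app actually records.

```python
import logging
import os
import tempfile


def make_interaction_logger(log_path: str, enabled: bool = True) -> logging.Logger:
    """Return a logger that writes to log_path, or discards everything if disabled."""
    logger = logging.getLogger("simplify.interactions")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    # Drop handlers from earlier calls so repeated setup does not duplicate lines.
    for old_handler in list(logger.handlers):
        logger.removeHandler(old_handler)
        old_handler.close()
    if enabled:
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
    else:
        logger.addHandler(logging.NullHandler())
    return logger


def log_event(logger: logging.Logger, model: str, text_length: int) -> None:
    """Record metadata only (model name and input length), not the text itself."""
    logger.info("model=%s chars=%d", model, text_length)


# Demo in a temporary directory; the app itself writes to app.log.
log_file = os.path.join(tempfile.mkdtemp(), "app.log")
logger = make_interaction_logger(log_file, enabled=True)
log_event(logger, "example-model", 1234)
```

Passing `enabled=False` attaches only a `NullHandler`, so no file is created and no interaction data is stored.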

Project information

Institutional communication is often overly complicated and hard to understand. This particularly affects citizens who do not speak German as their first language or who struggle with complex texts for other reasons. Clear and simple communication is essential to ensure everyone can participate in public processes and access services equally.

For many years, the cantonal administration of Zurich has gone to great lengths to make communication more inclusive and accessible. With the increasing volume of content, we wanted to explore the potential of AI to assist in this effort. In autumn 2023, we launched a pilot project. This app is one of the results. The code in this repository represents a snapshot of our ongoing efforts.

We developed this app following our communication guidelines. However, we believe it can be easily adapted for use by other public institutions.

What does the app do?

  • This app simplifies complex texts, rewriting them according to rules for «Einfache Sprache» or «Leichte Sprache». To simplify your source text, the app applies effective prompting, and uses your chosen LLM.
  • The app also offers coaching to improve your writing. Its analysis function provides detailed, sentence-by-sentence feedback to enhance your communication.
  • It measures the understandability of your text on a scale from 0 (very complex) to 20 (very easy to understand).
  • The One-Click feature sends your text to six LLMs simultaneously, delivering six drafts in a formatted Word document within seconds, ready for download.
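
The fan-out behind a One-Click style feature can be sketched with a thread pool. This is an illustration under stated assumptions, not the app's implementation: `simplify_with_all` and `fake_model` are hypothetical names, the model calls are local stand-ins instead of real API clients, and the Word export is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# A model call takes the source text and returns a simplified draft.
ModelCall = Callable[[str], str]


def simplify_with_all(text: str, models: Dict[str, ModelCall]) -> Dict[str, str]:
    """Send the text to every model in parallel and collect the drafts by model name."""
    drafts: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = {name: pool.submit(call, text) for name, call in models.items()}
        for name, future in futures.items():
            try:
                drafts[name] = future.result(timeout=60)
            except Exception as error:
                # One failing provider should not discard the other drafts.
                drafts[name] = f"[no draft: {error}]"
    return drafts


def fake_model(label: str) -> ModelCall:
    """Stand-in for a real API client; it only lowercases the text."""
    return lambda text: f"{label}: {text.lower()}"


models = {f"model_{i}": fake_model(f"model_{i}") for i in range(1, 7)}
drafts = simplify_with_all("Ein SEHR komplizierter Satz.", models)
for name, draft in drafts.items():
    print(name, "->", draft)
```

Threads fit here because each call mostly waits on the network, so the total time is close to that of the slowest model, not the sum of all six.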

In English, «Einfache Sprache» is roughly equivalent to «Plain English», while «Leichte Sprache» has similarities to «Easy English».

Important

At the risk of stating the obvious: By using the app, you send data to a third-party provider (OpenAI, Anthropic, and Mistral AI in the current state of the app). Therefore, use non-sensitive data only. Again, stating the obvious: LLMs make errors. They regularly hallucinate, make things up, and get things wrong. They often do so in subtle, non-obvious ways that may be hard to detect. This app is meant to be used as an assistive system. It only yields a draft that you must always double- and triple-check.

At the time of writing many users in our administration have extensively used the app with thousands of texts over several months. The results are very promising. With the prototype app, our experts have saved time, improved their output, and made public communication more inclusive.

Note

This app is optimized for Swiss German («Swiss High German», not dialect). Some rules in the prompts steer the models toward this. The app is also set up to use the Swiss ss rather than the German ß. The understandability index assumes the Swiss ss for the common word scoring, and we replace ß with ss in the results.
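
The ß replacement can be sketched in one line. The function name `to_swiss_spelling` is hypothetical, and handling the rare capital ẞ is an assumption that goes beyond what is described above.

```python
def to_swiss_spelling(text: str) -> str:
    """Replace the German ß with the Swiss ss (and capital ẞ with SS, an assumption)."""
    return text.replace("ß", "ss").replace("ẞ", "SS")


print(to_swiss_spelling("Die Straße ist groß."))
```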

What does it cost?

Usage is inexpensive. You only pay OpenAI & Co. for the tokens that you use. For example, translating 100 separate «Normseiten» (standard pages of 250 German words each) to Einfache Sprache or Leichte Sprache costs roughly between 0.15 CHF with Gemini Flash 1.5/2.0 and around 25 CHF with the o1 model, depending on the model's token price (as of December 2024). The hardware requirements to run the app are modest too. As mentioned above, a small VM for a couple of Francs per month will suffice.
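
To estimate costs for your own volume, the arithmetic can be sketched as follows. Every number in the example call is a placeholder, not a real price or measurement: tokens per word, prompt overhead, output length, and the prices per million tokens are assumptions that you need to replace with your provider's current values. The function name `estimate_cost` is hypothetical.

```python
def estimate_cost(
    pages: int,
    price_in_per_million: float,
    price_out_per_million: float,
    words_per_page: int = 250,          # one «Normseite», as defined above
    tokens_per_word: float = 2.0,       # placeholder; depends on the tokenizer
    prompt_tokens_per_page: int = 0,    # placeholder for the rules sent with each request
    output_ratio: float = 1.0,          # placeholder: output length relative to the source text
) -> float:
    """Estimate the total cost in the currency of the given prices."""
    text_tokens_per_page = words_per_page * tokens_per_word
    input_tokens = pages * (text_tokens_per_page + prompt_tokens_per_page)
    output_tokens = pages * text_tokens_per_page * output_ratio
    return (
        input_tokens / 1_000_000 * price_in_per_million
        + output_tokens / 1_000_000 * price_out_per_million
    )


# Placeholder prices, chosen only to make the arithmetic easy to follow.
cost = estimate_cost(
    pages=100,
    price_in_per_million=2.0,
    price_out_per_million=8.0,
    prompt_tokens_per_page=1000,
)
print(round(cost, 2))
```

With these placeholders, 100 pages yield 150,000 input tokens and 50,000 output tokens. Reasoning models may also bill for hidden reasoning tokens, which this sketch ignores.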

Our language guidelines

You can find the current rules that are being prompted in utils_prompts.py. Have a look and change these according to your needs and organizational communication guidelines.

We derived the current rules in the prompts mainly from these of our language guidelines:

A couple of findings

  • Large Language Models (LLMs) already have an understanding of Einfache Sprache, Leichte Sprache, and CEFR levels (A1, A2, B1, etc.) from their pretraining. It's impressive how well they can translate text by simply being asked to rewrite it according to these terms or levels. We have also successfully created test data by asking models to e.g. describe a situation at each of the six CEFR levels (A1 to C2).
  • LLMs produce varied rewrites, which is beneficial. By offering multiple model options, users receive a range of suggestions, helping them achieve a good result. It's often effective to use the One-Click mode, which consolidates results from all models.
  • Measuring text understandability is really helpful. Early in our project, we realized the need for a quantitative metric to evaluate our outputs, such as comparing different prompts, models, and preprocessing steps. We developed an index for this purpose that we call the «Zürcher Verständlichkeits-Index» or «ZIX» 😉. We created the ZIX using a dataset of complex legal and administrative texts, as well as many samples of Einfache and Leichte Sprache. We trained a classification model to differentiate between complex and simple texts. By selecting the most significant model coefficients, we devised a formula to estimate a text's understandability. This pragmatic metric has been useful to us in practice. We plan to publish the code for it in the coming weeks.
  • Finally, validating your results with your target audience is crucial, especially for Leichte Sprache, which requires expert and user validation to be effective.

How does the understandability score work?

  • The score takes into account sentence lengths, the readability metric RIX, the occurrence of common words and overlap with the standard CEFR vocabularies A1, A2 and B1.
  • At the moment the score does not take into account other language properties that are essential for e.g. Einfache Sprache (B1 or easier, similar to «Plain English») or Leichte Sprache (A2, A1, similar to «Easy English») like use of passive voice, subjunctives, negations, etc.
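
Two of the inputs named above, sentence length and RIX, are easy to compute. RIX is the number of long words (seven or more characters) divided by the number of sentences. The sketch below uses naive regex-based splitting and is not the ZIX implementation; the function names are hypothetical, and the common-word and CEFR vocabulary features are left out because they need word lists.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive split after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_words(text: str) -> List[str]:
    """Runs of letters, including umlauts; digits and punctuation are ignored."""
    return re.findall(r"[^\W\d_]+", text)


def rix(text: str) -> float:
    """RIX: long words (7+ characters) per sentence."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    long_words = [w for w in split_words(text) if len(w) >= 7]
    return len(long_words) / len(sentences)


def mean_sentence_length(text: str) -> float:
    """Average number of words per sentence."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return len(split_words(text)) / len(sentences)


sample = "Die Verwaltung schreibt oft kompliziert. Wir helfen Ihnen gern."
print(rix(sample))                   # 3 long words / 2 sentences = 1.5
print(mean_sentence_length(sample))  # 9 words / 2 sentences = 4.5
```

Abbreviations such as «z.B.» do not cause a false split here because no whitespace follows the inner period, but forms like «Dr. Muster» do, which is one reason the app relies on a language model from spaCy.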

We have published the ZIX understandability index as a pip-installable package. You can find it here.

Note

The index is slightly adjusted to Swiss German. Specifically, we use ss instead of ß in our vocabulary lists. In practice, this should not make a big difference. For High German text that actually contains ß, the index will likely underestimate the understandability slightly, by around 0.1.

What does the score mean?

  • Negative scores indicate difficult texts in the range of B2 to C2. These texts will likely be very hard to understand for many people (this is classic «Behördendeutsch» or legal text territory...).
  • Positive scores indicate a language level of B1 or easier.

Outlook

These are a couple of areas that we are actively working on:

  • Conduct more quantitative tests: We aim to quantitatively evaluate LLM responses for completeness and accuracy. One approach we are testing is using LLMs as judges to assess these responses.
  • Enhance our understandability index: We plan to improve word scoring by detecting issues like passive voice, subjunctives and other linguistic properties that are currently missed.
  • Establish standard vocabularies for administrative terms: Consistent output for terms and names is crucial for our clients. We need to create a system that allows clients to manage these vocabularies themselves.
  • Experiment with open-weight models on-premise: To process sensitive data, we are exploring lightweight models fine-tuned with German data that can be used on-premise.

Project team

This project is a collaborative effort of these people of the cantonal administration of Zurich:

A special thanks goes to Government Councillor Jacqueline Fehr, who came up with the idea and initiated and supported the project.

Feedback and contributing

We are interested to hear from you. Please share your feedback and let us know how you use the app in your institution. You can write an email or share your ideas by opening an issue or a pull request.

Please note that we use Ruff for linting and code formatting with default settings.

Miscellaneous