👀 2023 Solution Challenge: ImReader 👀

2023 Solution Challenge demo video of ImReader

☘️ What's ImReader?

ImReader is a solution addressing the 10th of the UN's Sustainable Development Goals (SDGs), Reduced Inequalities. It is a service for people who rely on a Voice Assistant program (VoiceOver on iPhone, Voice Assistant on Android), such as people with visual impairments.

The Voice Assistant program is especially helpful for blind people using smartphones. However, some images cannot be read aloud and are simply announced as a "detailed image." As a result, visually impaired users have more difficulty acquiring information than others.

To solve this problem, we developed a service called ImReader! When the voice assistant program we developed ourselves encounters an image, it does not announce it as a "detailed image." Instead, it passes the image to a deep learning OCR model to extract the text in it, and that text is then read to the user through a TTS API. This lets users access not only plain text but also text embedded in images.
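The decision described above can be sketched in a few lines. This is a minimal Python sketch, not the team's code (the real client is written in Kotlin): `announce_image`, `run_ocr`, and `speak` are hypothetical names, with `run_ocr` standing in for the OCR model and `speak` for the TTS API. Falling back to "detailed image" when no text is found is an assumption; the README only describes the case where the image contains text.

```python
from typing import Callable

# Generic label a stock screen reader announces for unreadable images.
FALLBACK_LABEL = "detailed image"


def announce_image(image_bytes: bytes,
                   run_ocr: Callable[[bytes], str],
                   speak: Callable[[str], None]) -> str:
    """Speak the text found in an image, or the generic label if OCR finds none.

    run_ocr and speak are hypothetical stand-ins for the OCR model and TTS API.
    Returns the message that was spoken.
    """
    text = run_ocr(image_bytes).strip()
    # Assumption: fall back to the stock label only when no text is recognized.
    message = text if text else FALLBACK_LABEL
    speak(message)
    return message
```

Passing the OCR and TTS steps in as callables keeps the decision logic independent of whichever model or speech engine is actually used.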


☘️ How to Use

We tried hard to complete the service, but unfortunately we did not finish it in time. Instead, we show a prototype of the UI and the results of communication between the server and the model.

  • Prototype (scenario: ordering food)

    1. Turn off the system's built-in voice assistant so that ImReader can be used.
    2. The user browses the app to find the food they want to order.
    3. If an image contains text, the text recognized by the model is read aloud to the user through TTS.
    4. The user completes the rest of the order, including payment.
  • Communication result (between the server and the model)

You can test it by following these steps:

  1. Open Postman.
  2. Set the request URL to http://35.234.33.62/img-src.
  3. Put the base64-encoded image in the request body in the following format: { "base64": "base64-string" }
  4. Press Send to get the result.
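The same test can be scripted instead of run in Postman. Below is a minimal Python sketch using only the standard library. It assumes the endpoint accepts an HTTP POST with a JSON body (the README gives only the URL and the body format) and returns the raw response text, since the response format is not documented. `build_payload` and `send_image` are hypothetical names, and the server address from this README may no longer be reachable.

```python
import base64
import json
import urllib.request

# Endpoint taken from this README; it may no longer be online.
SERVER_URL = "http://35.234.33.62/img-src"


def build_payload(image_bytes: bytes) -> bytes:
    """Encode image bytes as the JSON body {"base64": "<base64-string>"}."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"base64": encoded}).encode("utf-8")


def send_image(path: str, url: str = SERVER_URL, timeout: float = 30.0) -> str:
    """Send the image at `path` to the OCR endpoint and return the raw response body."""
    with open(path, "rb") as f:
        payload = build_payload(f.read())
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: the README does not state the HTTP method
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `print(send_image("menu.png"))` sends a local image (the file name is a placeholder) and prints whatever the server returns.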

☘️ Used Technology & Architecture


  1. The client, implemented in Kotlin, encodes the image in base64 and sends it to the server.
  2. The server, running on a virtual machine in GCP, runs the OCR model on that image.
  3. The recognized text returned by the model is sent back to the client.
  4. The client reads the text aloud to the user through the TTS API.
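The server side of steps 1 to 3 can be sketched with Python's standard library. This is not the team's back end, whose framework and code are not described in this README: `run_ocr` is a hypothetical placeholder for the deep learning OCR model, `handle_img_src` and `ImgSrcHandler` are hypothetical names, and the `text` and `error` response fields are assumptions.

```python
import base64
import json
from http.server import BaseHTTPRequestHandler
from typing import Callable, Tuple


def run_ocr(image_bytes: bytes) -> str:
    """Hypothetical placeholder for the deep learning OCR model (not shown in this README)."""
    raise NotImplementedError("plug in an actual OCR model here")


def handle_img_src(body: bytes,
                   ocr: Callable[[bytes], str] = run_ocr) -> Tuple[int, dict]:
    """Decode a {"base64": ...} body, run OCR, and return (HTTP status, JSON-able dict)."""
    try:
        encoded = json.loads(body)["base64"]
        # validate=True rejects characters outside the base64 alphabet.
        image_bytes = base64.b64decode(encoded, validate=True)
    except (ValueError, KeyError, TypeError):
        # Covers malformed JSON, a missing "base64" key, and invalid base64
        # (json.JSONDecodeError and binascii.Error both subclass ValueError).
        return 400, {"error": 'expected a JSON body like {"base64": "..."}'}
    return 200, {"text": ocr(image_bytes)}  # assumption: response field name


class ImgSrcHandler(BaseHTTPRequestHandler):
    """Serves POST /img-src by delegating to handle_img_src."""

    def do_POST(self):
        if self.path != "/img-src":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_img_src(self.rfile.read(length))
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)
```

To try it locally, replace the body of `run_ocr` with a call to a real OCR model, then start the server with `HTTPServer(("0.0.0.0", 8080), ImgSrcHandler).serve_forever()` after importing `HTTPServer` from `http.server`; the port is arbitrary. Keeping the request parsing in `handle_img_src` separates it from the HTTP plumbing, so it can be tested without a network.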

☘️ Team Helppy

  • Android: Park Jaeyoung
  • Back-End: Park Injae
  • Deep Learning: Lee Seulbi
  • Deep Learning: Jeon Junseok