This project implements text detection and text recognition using OpenCV and Google Tesseract (via their Python wrappers) to extract Field Test Mode measurements from iOS, obtained via the built-in screen recording and subsequent post-processing.
This project serves purely as a learning exercise to practice text detection and optical character recognition (OCR).
There are, of course, many things that can be improved; feel free to experiment, open a discussion, or submit a pull request.
Field Test Mode on the iPhone provides information about radio channel conditions, such as received power levels or signal-to-noise ratios, as well as information about which base station the phone is currently connected to. However, unlike Android, iOS does not provide any API to access this data.
The input for the processing is a video file, which is then transformed into a sequence of unique images. Since the frame rate is higher than 1 frame per second and the data changes even less frequently, we only need to extract the frames that contain new information.
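As a rough illustration of this idea (not the project's actual extraction code in cv_helper.py), consecutive frames can be compared and a frame kept only when it differs enough from the previously kept one; the function name and diff_threshold value below are made up:

```python
import cv2

def extract_unique_frames(video_path, diff_threshold=2.0):
    """Yield only frames that differ noticeably from the previously kept frame."""
    cap = cv2.VideoCapture(video_path)
    prev = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # keep the frame only if the mean absolute pixel difference is large enough
        if prev is None or cv2.absdiff(gray, prev).mean() > diff_threshold:
            prev = gray
            yield frame
    cap.release()
```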
A frame sample and processing example
The areas of interest are the labels on the left-hand side and the values on the right-hand side, and how the latter change over time. That is what we will detect and recognize.
As a result, I want to get a CSV file with a time trace of the received signal strength. All values are converted to the corresponding type (integer, float, date and time) and validated, for example against a valid range. The figures below show the output generated by the script and a sample plot of SINR values over time.
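Once such a CSV exists, it can be inspected and plotted with pandas, roughly as sketched below (the column names datetime and sinr are assumptions, the real header depends on EXPECTED_KEYS, and plotting additionally requires matplotlib):

```python
import pandas as pd

# hypothetical column names; the actual CSV header depends on the expected keys
df = pd.read_csv("sample/video.csv", parse_dates=["datetime"], index_col="datetime")
print(df["sinr"].describe())  # quick sanity check of the SINR trace
df["sinr"].plot()             # time plot; requires matplotlib to be installed
```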
- Python 3
- Google Tesseract
- pytesseract
- opencv-python
- numpy and pandas
- imutils
- dateutil
- scikit-image
All dependencies are listed in requirements.txt and can be installed using pip, optionally inside a virtual environment:
pip3 install -r requirements.txt
# or with virtual environment
virtualenv venv  # --no-site-packages is no longer needed, it is the default in modern virtualenv
source venv/bin/activate
pip install -r requirements.txt
Usage example: the resulting CSV file will be saved to sample/video.csv and the extracted frame images to the sample/video folder, where -i is the path to the input video file.
python3 cli_video.py -i=sample/video.mp4
# or specifying number of CPU cores to use for multithreading
python3 cli_video.py -i=sample/video.mp4 --n-proc=10
If the video frames have already been extracted, you can skip the extraction step:
python3 cli_video.py -i=sample/video.mp4 --skip-extracting
Likewise, you can run only the frame extraction and skip the parsing step:
python3 cli_video.py -i=sample/video.mp4 --skip-parsing
The main script is located in the cli.py file. Use python3 cli.py -h to explore all available options.
The script file consts.py contains constants, such as types and folder names. recognizer.py implements the text recognition procedure, along with key and value detection and correction. text_helper.py contains functions to validate and correct strings. cv_helper.py is a set of methods related to OpenCV, e.g. to extract video frames or pre-process images.
The script file consts.py contains the check type constants:
CHECK_NONE = 0
CHECK_REPLACE = 1
CHECK_DISTANCE = 2
It also contains replacement rules in the REPLACE_RULES dictionary:
REPLACE_RULES = {
"1,-1": ["i", "[", "]", "l", "7", "?", "t"],
"0,-1": ["o"],
"q,-2": ["g"],
"0,": ["0o", "o0", "00", "oo"],
"q,": ["qg","qq","gg","gq"]
}
The key is comma-separated: the first element is the desired character and the second is the position at which to make the replacement. The value is a list of characters that might be recognized instead. In other words, the characters i, [, ], l, etc. at position -1 will be replaced by 1.
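To make the rule semantics concrete, here is a small sketch of how such rules could be applied to a recognized token. This is an illustration only, not the actual implementation in text_helper.py:

```python
def apply_replace_rules(token, rules):
    """Illustrative only: apply character replacement rules to one recognized token."""
    for key, wrong_chars in rules.items():
        target, _, pos = key.partition(",")
        if pos:  # positional rule, e.g. "1,-1": fix a single character at that index
            i = int(pos)
            if -len(token) <= i < len(token) and token[i] in wrong_chars:
                chars = list(token)
                chars[i] = target
                token = "".join(chars)
        else:    # substring rule, e.g. "0,": collapse "0o", "oo", ... into "0"
            for wrong in wrong_chars:
                token = token.replace(wrong, target)
    return token

# e.g. apply_replace_rules("-10l", REPLACE_RULES) -> "-101"
```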
Another helpful dictionary is EXPECTED_KEYS, in which we describe the expected data, for example:
"phy_cell_id": {
"corr": CHECK_DISTANCE,
"map": int
},
"rsrp0": {
"corr": CHECK_REPLACE,
"map": int
},
In this case we declare that the phy_cell_id value should be an integer and that its correction function is a Levenshtein distance calculation. The rsrp0 value will be corrected by replacing symbols using the REPLACE_RULES dictionary.
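For illustration, correction by edit distance could look roughly like the sketch below; the helper names and the max_distance cutoff are made up, and the actual logic in recognizer.py may differ:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def correct_key(raw_key, expected_keys, max_distance=2):
    """Map a recognized label to the closest expected key, if it is close enough."""
    best = min(expected_keys, key=lambda k: levenshtein(raw_key, k))
    return best if levenshtein(raw_key, best) <= max_distance else None

# e.g. correct_key("phy_ce11_id", EXPECTED_KEYS) -> "phy_cell_id"
```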
TODO: Extract frames description
Google provides a very powerful and helpful library: Tesseract. Using pytesseract, the text in an image can be recognized with a single line of code:
text = pytesseract.image_to_string(image)
So, we can just feed the whole image to Tesseract and get the text. Well, yes and no.
Here is the result: TODO: Image with recognised text here
In order to improve the accuracy of Tesseract, we need to extract the individual words and process them separately.
There are dedicated libraries and neural networks for the text detection problem, such as EAST or Core ML Vision.
Here is a Core ML result (without pre-processing):
The EAST frozen model with TensorFlow also gives a result that can be used with some post-processing:
However, I later realized that using these libraries is overkill for this case, and the processing takes quite some time. While experimenting with image pre-processing, I noticed that the text can be transformed into filled rectangles. This makes it easier to distinguish the text from the background, and using OpenCV we can then detect the contours of those areas.
Pre-processing of the original image affects the result a lot, but preparing an image for contour detection is a much easier task than preparing it for text detection. We use the threshold function to increase contrast and better separate the text from the background. Then we enlarge the text blocks with dilate and transform them into rectangular shapes.
new_image = image.copy()
# binarize and invert, so that the text becomes white on a black background
_, new_image = cv2.threshold(new_image, 0, 255, cv2.THRESH_BINARY_INV)
# dilate to merge the characters of a word into one filled rectangular block
kernel = np.ones((15, 15), np.uint8)
new_image = cv2.dilate(new_image, kernel, iterations=2)
After this pre-processing we can use OpenCV's findContours to locate those blocks; as output we get an array of contour coordinates.
cntrs = cv2.findContours(proc_image.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cntrs = imutils.grab_contours(cntrs)
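Each contour can then be reduced to an axis-aligned bounding box around a word, roughly as sketched below (the minimum box size used for filtering noise is an arbitrary assumption):

```python
# turn each contour into an (x, y, w, h) box and sort top-to-bottom, left-to-right
boxes = [cv2.boundingRect(c) for c in cntrs]
boxes = [(x, y, w, h) for (x, y, w, h) in boxes if w > 10 and h > 10]  # drop tiny noise
boxes.sort(key=lambda b: (b[1], b[0]))
```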
Now we need to tweak the pytesseract call a bit to better handle these small cropped regions, by setting the psm parameter to 6, so that Tesseract treats the input as a single uniform block of text:
text = pytesseract.image_to_string(image, config='--psm 6')
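Putting the pieces together, each detected box can be cropped from the original (non-dilated) image and passed to Tesseract individually. A minimal sketch, reusing the image and boxes variables from the snippets above:

```python
words = []
for (x, y, w, h) in boxes:
    roi = image[y:y + h, x:x + w]  # crop the word region from the original frame
    text = pytesseract.image_to_string(roi, config='--psm 6').strip()
    if text:
        words.append(((x, y, w, h), text))
```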
A video containing 68 unique frames takes 236.84 seconds to process with a single thread. Using some straightforward multiprocessing, splitting the frames across CPU cores, the same processing takes 57.95 seconds with 10 parallel processes.
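A rough sketch of such frame-level parallelism using the standard library (the folder layout and file extension are assumptions, and the actual CLI implementation may differ):

```python
import glob
from multiprocessing import Pool

def parse_frame(frame_path):
    # placeholder for the per-frame detection and OCR pipeline described above
    ...

if __name__ == "__main__":
    frame_paths = sorted(glob.glob("sample/video/*.png"))  # hypothetical frame folder
    with Pool(processes=10) as pool:
        results = pool.map(parse_frame, frame_paths)
```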
Igor Kim
The repository is distributed under the MIT license.