This challenge is meant for candidates who wish to intern at Fyle and work with the ML team.
- You should be available to start by Sept 7, 2021
- You should be able to commit to at least 3 months (we strongly prefer 6 months)
Fyle is a fast-growing Expense Management SaaS product. We are a ~40-strong engineering team at the moment. About 60% of our engineers started off as interns. Interns at Fyle do extremely challenging and impactful work.
People love working at Fyle. Check out our Glassdoor reviews here. You can read stories from our teammates here.
Under the data directory, you will find 20 receipt directories. Each directory has the following files:
- An image file that corresponds to a receipt (e.g. data/receipt1/recpu6in7u.jpeg)
- OCR output that was obtained by running the receipt through AWS Textract (e.g. data/receipt1/ocr.json). You can learn about this file's structure in this document by AWS (link).
- An expected.json file that contains the receipt amount that should've been extracted
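As a rough illustration of the OCR file's shape, the sketch below collects the text of every LINE block. It assumes ocr.json holds a standard Textract response, i.e. a top-level Blocks list whose entries carry BlockType and Text keys. The helper name load_ocr_lines is hypothetical and not part of this repo.

```python
import json
from pathlib import Path


def load_ocr_lines(receipt_dir):
    """Return the text of each LINE block in <receipt_dir>/ocr.json.

    Assumption: the file is a standard Textract response of the form
    {"Blocks": [{"BlockType": ..., "Text": ...}, ...]}.
    """
    ocr_path = Path(receipt_dir) / "ocr.json"
    with open(ocr_path, encoding="utf-8") as ocr_file:
        response = json.load(ocr_file)

    lines = []
    for block in response.get("Blocks", []):
        # Textract also emits PAGE and WORD blocks; LINE blocks hold whole text lines.
        if block.get("BlockType") == "LINE" and "Text" in block:
            lines.append(block["Text"])
    return lines
```

Calling load_ocr_lines("data/receipt1") should then give a list of strings, one per printed line, which is usually easier to work with than the raw block list.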
You'll need to fill in a stub function in extract.py called extract_amount that extracts the amount, given the receipt directory. You can extract from the receipt image, the ocr.json, or a combination of both.
Please don't use specific markers in the given receipts in your submission - you need to write a generic solution that works across the test data. You will be disqualified if we see hacks like this.
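To make "generic" concrete, here is a minimal heuristic sketch that works on plain OCR text lines rather than on anything specific to one receipt. The function name pick_amount, the regular expression, and the "total" keyword rule are all assumptions chosen for illustration; this is not the expected solution and will not handle every receipt.

```python
import re

# Matches numbers such as 12.50, 1,234.56 or 7
# (optional thousands separators, optional one or two decimals).
AMOUNT_PATTERN = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d{1,2})?|\d+(?:\.\d{1,2})?")


def pick_amount(lines):
    """Guess the receipt amount from a list of OCR text lines.

    Heuristic: prefer the largest number on any line mentioning "total";
    otherwise fall back to the largest number found anywhere.
    Returns 0.0 when no number is found.
    """
    def amounts_in(text):
        return [float(match.replace(",", "")) for match in AMOUNT_PATTERN.findall(text)]

    total_candidates = []
    all_candidates = []
    for line in lines:
        found = amounts_in(line)
        all_candidates.extend(found)
        if "total" in line.lower():
            total_candidates.extend(found)

    candidates = total_candidates or all_candidates
    return max(candidates) if candidates else 0.0
```

This is only a starting point. When no line mentions a total, it can mistake a year or a phone number for the amount, and it misses amounts printed on the line after the label.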
First, fork this repo to your GitHub account (keep it public so it is easy for us to check the submission later).
Then, clone the repo to your laptop.
This codebase requires Python 3.7+. It is recommended to use virtualenv.
Then install all the dependencies.
pip install -r requirements.txt
You're ready to begin your task.
Your task is to fix up the extract_amount function so that all the tests pass, i.e. amounts in all 20 receipts are extracted correctly. You are free to use the receipt image or the AWS Textract output for this purpose - please do not ask us which one to use.
Once all the tests pass locally, take a screenshot of the successful run with 100% tests passing. Commit and push your code to your repository.
Please do not spend more than 3 hours on this task.
Run the tests that validate whether your extract_amount works correctly against the test data. You can run all the tests using:
python -m pytest
You will initially see failures. This is expected since the stub function returns a constant 0.0. The output should look like this.
collected 20 items
test_extract.py::test_extract[./data/receipt8] FAILED [ 5%]
test_extract.py::test_extract[./data/receipt1] FAILED [ 10%]
test_extract.py::test_extract[./data/receipt6] FAILED [ 15%]
test_extract.py::test_extract[./data/receipt7] FAILED [ 20%]
test_extract.py::test_extract[./data/receipt9] FAILED [ 25%]
...
If you'd like to run the test against a single directory, run it like this:
python -m pytest test_extract.py::test_extract[./data/receipt1]
Once you finish your task successfully, all tests should pass.
Please run this command to check for any linting errors:
pylint extract.py
If this shows any warnings or errors, please fix them and commit your changes.
Once you are done with your task, please use this form to complete your submission.
You will hear back from us via email within 48 hours. We may request some changes after reviewing your code.
Subsequently, we will schedule a phone interview with a Fyle Engineer.
If that goes well, we'll make an offer.