Given a list of urls in urls.txt: https://s3.amazonaws.com/fieldlens-public/urls.txt, write a program that will fetch each page and determine whether a search term exists on the page (this search can be a really rudimentary regex - this part isn't too important).
You can make up the search terms. Ignore the addition information in the urls.txt file.
Constraints
- Search is case insensitive
- Should be concurrent.
- But! It shouldn't have more than 20 HTTP requests at any given time.
- The results should be writted out to a file results.txt
- Avoid using thread pooling libraries like Executor, ThreadPoolExecutor, Celluloid, or Parallel streams.
- The solution must be written in Kotlin or Java.
Sample urls.txt: https://s3.amazonaws.com/fieldlens-public/urls.txt
The solution must be able to be run from the command line (dont assume JDK is available): java -jar ./jarFileName.jar
The current problem requires us to create a multi-threaded implementation of a Website Searcher, limited to 20 threads at any given instance. Furthermore, we are asked to avoid using any thread pooling libraries which make this task trivial.
- The solution was designed keeping in mind how thread pools work.
- A main driver program derives the
WebsiteSearcherRunnable
task and submits it to the implementation of a blocking queue. - The blocking queue implemented here, mimics the functionality of a blocking queue and enqueues and de-queues a particular task.
- The
WebsiteSearcherRunnable
task implements the JavaRunnable
task and in it'srun
method parses the given URL, extracts the html data and does a rudimentary word search. - Basically a thread pool executor, here implemented as
WebsiteSearcherTask
de-queuesRunnable
objects and executes them. This class controls the number of thread which can simultaneously be run. Not more thann
thread can be started off at a particular time (here n = 20)
- Gradle is used as a build system. Install gradle and run a
gradle build
in the main project structure to compile the code - The jar file is uploaded which can be directly run as
java -jar build/libs/website-searcher.jar
or can be run with this command after agradle build
- The results are located in
results.txt
which contain the list of URLS with the search term