Problem:

Given a list of urls in urls.txt: https://s3.amazonaws.com/fieldlens-public/urls.txt, write a program that will fetch each page and determine whether a search term exists on the page (this search can be a really rudimentary regex - this part isn't too important).

You can make up the search terms. Ignore the addition information in the urls.txt file.

Constraints

Search is case insensitive
Should be concurrent.
But! It shouldn't have more than 20 HTTP requests at any given time.
The results should be writted out to a file results.txt
Avoid using thread pooling libraries like Executor, ThreadPoolExecutor, Celluloid, or Parallel streams.
The solution must be written in Kotlin or Java.

Sample urls.txt: https://s3.amazonaws.com/fieldlens-public/urls.txt

The solution must be able to be run from the command line (dont assume JDK is available): java -jar ./jarFileName.jar

Solution:

Description

The current problem requires us to create a multi-threaded implementation of a Website Searcher, limited to 20 threads at any given instance. Furthermore, we are asked to avoid using any thread pooling libraries which make this task trivial.

The solution was designed keeping in mind how thread pools work.
A main driver program derives the WebsiteSearcherRunnable task and submits it to the implementation of a blocking queue.
The blocking queue implemented here, mimics the functionality of a blocking queue and enqueues and de-queues a particular task.
The WebsiteSearcherRunnable task implements the Java Runnable task and in it's run method parses the given URL, extracts the html data and does a rudimentary word search.
Basically a thread pool executor, here implemented as WebsiteSearcherTask de-queues Runnable objects and executes them. This class controls the number of thread which can simultaneously be run. Not more than n thread can be started off at a particular time (here n = 20)

Requirements to run this solution:

Gradle is used as a build system. Install gradle and run a gradle build in the main project structure to compile the code
The jar file is uploaded which can be directly run as java -jar build/libs/website-searcher.jar or can be run with this command after a gradle build
The results are located in results.txt which contain the list of URLS with the search term

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
.idea		.idea
build		build
out/production/website-searcher/main		out/production/website-searcher/main
src/main		src/main
.gitignore		.gitignore
README.md		README.md
build.gradle		build.gradle
results.txt		results.txt
website-searcher.iml		website-searcher.iml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Problem:

Solution:

Description

Requirements to run this solution:

About

Releases

Packages

Languages

Nisarga/website-searcher

Folders and files

Latest commit

History

Repository files navigation

Problem:

Solution:

Description

Requirements to run this solution:

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages