Implementation of a web searcher based on the requirements specified on the page below.
The application flows as follows (thread-pool-related calls are omitted for brevity):
Main -> loads the file with links
-> extracts URLs
-> creates the callback object (PageContentSearcher class)
-> creates the Controller with the injected callback
Controller -> creates tasks and queues them in the thread pool
-> uses HttpConnectionManager to create a URL connection for each generated link (HTTP protocol)
-> uses UrlReader to load the contents of the link
-> uses the callback (PageContentSearcher class) to search the loaded contents for the provided search terms
-> waits for all tasks to complete
-> uses ReportWriter to create the results file and write the results to it (see VII for the file format)
-> exits the program
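The flow above can be illustrated with a minimal sketch. It is not the project's code: `FlowSketch`, `ContentCallback`, `termMatcher`, the fake loader, and the 30-second wait are all hypothetical names and values chosen for illustration, and the real Controller, HttpConnectionManager, and UrlReader classes may be structured differently. The loader is passed in as a function so the sketch runs without network access.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical callback: reports which search terms occur in a page. */
interface ContentCallback {
    List<String> search(String url, String contents);
}

/** Hypothetical stand-in for the flow: one task per URL, wait, collect results. */
final class FlowSketch {

    static Map<String, List<String>> run(List<String> urls,
                                         Function<String, String> loader,
                                         ContentCallback callback,
                                         int poolSize) {
        Map<String, List<String>> results = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (String url : urls) {
            // Each task loads one page and hands its contents to the callback.
            pool.execute(() -> results.put(url, callback.search(url, loader.apply(url))));
        }
        pool.shutdown(); // accept no new tasks, let queued ones finish
        try {
            // Assumed timeout; the real value comes from configuration.
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        return results;
    }

    /** Case-insensitive substring matcher used as the callback in this sketch. */
    static ContentCallback termMatcher(List<String> terms) {
        return (url, contents) -> {
            List<String> found = new ArrayList<>();
            String lower = contents.toLowerCase();
            for (String term : terms) {
                if (lower.contains(term.toLowerCase())) {
                    found.add(term);
                }
            }
            return found;
        };
    }

    public static void main(String[] args) {
        // Fake loader so the sketch needs no HTTP connection.
        Function<String, String> loader = url -> url.contains("a.example")
                ? "<html>document with a Scrollbar</html>"
                : "<html>nothing here</html>";
        Map<String, List<String>> results = run(
                Arrays.asList("http://a.example", "http://b.example"),
                loader,
                termMatcher(Arrays.asList("document", "Region", "scrollbar")),
                2);
        results.forEach((url, found) -> System.out.println(url + " -> " + found));
    }
}
```

Injecting the callback, as the flow describes, keeps the task-scheduling code independent of how page contents are searched.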
(See item III if Java is not installed on this machine.)
- Java 8
- Maven 3.3+
mvn clean package
cd target
java -Xmx1g -jar website-searcher-1.0-SNAPSHOT.jar app.WebContentsSearcher
cd target
java -cp test-classes;lib/takari-cpsuite-1.2.7.jar;lib/hamcrest-core-1.3.jar;lib/junit-4.12.jar;website-searcher-1.0-SNAPSHOT.jar org.junit.runner.JUnitCore RunAllTests
Note: The ; classpath separator is Windows-specific; on Linux or macOS use : instead.
A Java 8 distribution must be present on the PC.
Note: I was unable to add a JDK distribution to the project because GitHub rejects files larger than 100 MB.
Execute the batch scripts:
Ex: build [Full path to JDK distribution location on PC]
Ex: run [Full path to JDK distribution location on PC]
Note: If [Full path to JDK distribution location on PC] contains spaces, wrap the path in double quotes.
All parameters used by app.WebContentsSearcher are configured in the file website-searcher\src\main\resources\
1) To control log level
2) To specify a link to load Urls for processing
3) To specify a thread pool size
4) To specify a thread pool queue's capacity
5) To specify the sleep time of a worker thread
6) To specify the time to wait for thread pool termination
7) To specify the terms to search for on downloaded pages (comma-separated), e.g. document,Region,scrollbar
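As a rough illustration of how such settings could be consumed, the sketch below reads a thread pool size, a queue capacity, and a comma-separated term list from a `java.util.Properties` object. The property keys (`thread.pool.size`, `thread.pool.queue.capacity`, `search.terms`), the default values, and the `CallerRunsPolicy` rejection handler are assumptions made for this example; the actual keys and behaviour are defined by the project's configuration file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class ConfigSketch {

    /** Parses properties from text; a real application would read the file instead. */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Splits a comma-separated value into trimmed, non-empty terms. */
    static List<String> parseTerms(String raw) {
        List<String> terms = new ArrayList<>();
        for (String part : raw.split(",")) {
            String term = part.trim();
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    /** Builds a fixed-size pool with a bounded queue from the hypothetical keys. */
    static ThreadPoolExecutor buildPool(Properties props) {
        int size = Integer.parseInt(props.getProperty("thread.pool.size", "4"));
        int capacity = Integer.parseInt(props.getProperty("thread.pool.queue.capacity", "100"));
        return new ThreadPoolExecutor(
                size, size, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(capacity),
                // Assumption: when the queue is full, the submitting thread runs the task.
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    public static void main(String[] args) {
        Properties props = load(
                "thread.pool.size=2\n"
              + "thread.pool.queue.capacity=5\n"
              + "search.terms=document,Region,scrollbar\n");
        ThreadPoolExecutor pool = buildPool(props);
        System.out.println("pool size: " + pool.getCorePoolSize());
        System.out.println("terms: " + parseTerms(props.getProperty("search.terms")));
        pool.shutdown();
    }
}
```

A bounded queue is used here because the configuration exposes a queue capacity; what happens when that queue fills up is a design choice this sketch only guesses at.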
All matching results will be saved in the file: results_{timestamp}.txt
Ex: results_2018-09-18T10-15-30.150.txt
Links that failed to load are listed with the details of the exception.
A log file is generated in the logs folder as websearcher_log_{timestamp}.txt, where {timestamp} is the current timestamp.
Ex: websearcher_log_2018-09-18T10-15-30.150.txt
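The timestamped names shown in the examples can be produced with `java.time`. The sketch below is one possible way to do it; the class and method names (`FileNameSketch`, `resultsFileName`, `logFileName`) are hypothetical, and the pattern is inferred from the example file names rather than taken from the project's source.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class FileNameSketch {

    // Hyphens replace the usual colons in the time part,
    // since colons are not allowed in Windows file names.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH-mm-ss.SSS");

    static String resultsFileName(LocalDateTime when) {
        return "results_" + STAMP.format(when) + ".txt";
    }

    static String logFileName(LocalDateTime when) {
        return "websearcher_log_" + STAMP.format(when) + ".txt";
    }

    public static void main(String[] args) {
        // 150 ms expressed in nanoseconds, matching the examples above.
        LocalDateTime when = LocalDateTime.of(2018, 9, 18, 10, 15, 30, 150_000_000);
        System.out.println(resultsFileName(when)); // results_2018-09-18T10-15-30.150.txt
        System.out.println(logFileName(when));     // websearcher_log_2018-09-18T10-15-30.150.txt
    }
}
```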
Refer to the ./diagrams directory for diagrams of various parts of the application flow.