This Java program extracts links from a list of URLs, saves the extracted links to files, and removes the processed URLs from the input list.
- Open your terminal or command prompt.
- Navigate to the directory where you want to clone the repository.
- Use the following command to clone the repository:
git clone https://github.com/akumathedyn123/java-url-collector.git
Prerequisites:
- Java 11 or above installed on your system.
- Jsoup library (https://jsoup.org/download) added to your project. You can download the Jsoup jar file and place it in the same directory as your project or use a dependency management tool like Maven or Gradle.
Steps:
- Open a terminal or command prompt and navigate to the project directory.
- Compile the Java source code:
javac main.java
This will create an executable file named main.class
.
- Run the program with the following command:
java main [arguments]
Arguments:
-i|--input <filename>
: Path to the text file containing the list of URLs (default: urls.txt).-o|--output <directory>
: Directory path to save the extracted link files (default: /path/to/text).
Example Usage:
java main -i my_urls.txt -o extracted_links
This will process the URLs in my_urls.txt
and save the extracted links to files in the extracted_links
directory.
- The program reads a list of URLs from a text file.
- It iterates through each URL in the list.
- For each URL, it downloads the HTML content using the
downloadHtml
function. - It then parses the downloaded HTML with Jsoup to extract links from anchor tags (
<a>
) withhref
attributes using theextractLinks
function. - The extracted links are saved to a file named after the processed URL using the
saveLinksToFile
function. - Finally, the processed URL is removed from the input file using the
deleteUrl
function.
This project is licensed under the MIT License (see LICENSE file for details).
We welcome contributions to this project. Feel free to fork the repository and submit pull requests with your improvements.