GitHub - akumathedyn123/java-url-collector: This Java program extracts links from a list of URLs, saves the extracted links to files, and removes the processed URLs from the input list.

ExtractLinks - A Java Web Crawler

This Java program extracts links from a list of URLs, saves the extracted links to files, and removes the processed URLs from the input list.

How to Clone

Open your terminal or command prompt.
Navigate to the directory where you want to clone the repository.
Use the following command to clone the repository:

git clone https://github.com/akumathedyn123/java-url-collector.git

How to Run

Prerequisites:

Java 11 or above installed on your system.
Jsoup library (https://jsoup.org/download) added to your project. You can download the Jsoup jar file and place it in the same directory as your project or use a dependency management tool like Maven or Gradle.

Steps:

Open a terminal or command prompt and navigate to the project directory.
Compile the Java source code:

javac main.java

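This will create a compiled bytecode file named main.class, which the java command runs. If you are using the standalone Jsoup jar rather than Maven or Gradle, the jar has to be on the classpath both when compiling and when running (for example via the -cp option of javac and java).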

Run the program with the following command:

java main [arguments]

Arguments:

-i|--input <filename>: Path to the text file containing the list of URLs (default: urls.txt).
-o|--output <directory>: Directory path to save the extracted link files (default: /path/to/text).
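
As an illustration of how these two options could be handled, here is a minimal sketch. It is not taken from main.java: the class name CliOptions and the method names parse and requireValue are hypothetical, and only the flag names and default values come from the list above.

```java
/** Hypothetical helper for illustration; not the code used in main.java. */
class CliOptions {
    final String input;
    final String output;

    CliOptions(String input, String output) {
        this.input = input;
        this.output = output;
    }

    /** Parses -i/--input and -o/--output, falling back to the documented defaults. */
    static CliOptions parse(String[] args) {
        String input = "urls.txt";        // documented default for --input
        String output = "/path/to/text";  // documented default for --output
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "-i":
                case "--input":
                    input = requireValue(args, ++i, "--input");
                    break;
                case "-o":
                case "--output":
                    output = requireValue(args, ++i, "--output");
                    break;
                default:
                    // Assumption: unknown flags are rejected rather than ignored.
                    throw new IllegalArgumentException("Unknown argument: " + args[i]);
            }
        }
        return new CliOptions(input, output);
    }

    /** Returns args[index], or fails if the flag was given without a value. */
    private static String requireValue(String[] args, int index, String name) {
        if (index >= args.length) {
            throw new IllegalArgumentException("Missing value for " + name);
        }
        return args[index];
    }

    public static void main(String[] args) {
        CliOptions options = parse(args);
        System.out.println("input=" + options.input + ", output=" + options.output);
    }
}
```

Rejecting unknown flags is a design choice made for this sketch; the actual program may behave differently.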

Example Usage:

java main -i my_urls.txt -o extracted_links

This will process the URLs in my_urls.txt and save the extracted links to files in the extracted_links directory.

How it Works

The program reads a list of URLs from a text file.
It iterates through each URL in the list.
For each URL, it downloads the HTML content using the downloadHtml function.
It then parses the downloaded HTML with Jsoup to extract links from anchor tags (<a>) with href attributes using the extractLinks function.
The extracted links are saved to a file named after the processed URL using the saveLinksToFile function.
Finally, the processed URL is removed from the input file using the deleteUrl function.
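
The steps above can be sketched with the standard library alone. This is an approximation under stated assumptions, not the code in main.java: the class PipelineSketch and the methods fileNameFor and run are hypothetical, the file-naming scheme is a guess, and the download and Jsoup parsing steps (downloadHtml and extractLinks) are replaced by a pluggable linkSource function so that the sketch runs without network access or the Jsoup jar.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical outline of the read, extract, save, remove loop. */
class PipelineSketch {

    /** Turns a URL into a file name. Assumption: the real naming scheme may differ. */
    static String fileNameFor(String url) {
        return url.replaceFirst("^[a-zA-Z][a-zA-Z0-9+.-]*://", "") // drop the scheme
                  .replaceAll("[^A-Za-z0-9._-]", "_")               // replace unsafe characters
                + ".txt";
    }

    /** Rewrites the input file without the given URL (one URL per line). */
    static void deleteUrl(Path inputFile, String url) throws IOException {
        List<String> remaining = new ArrayList<>();
        for (String line : Files.readAllLines(inputFile, StandardCharsets.UTF_8)) {
            if (!line.trim().equals(url)) {
                remaining.add(line);
            }
        }
        Files.write(inputFile, remaining, StandardCharsets.UTF_8);
    }

    /**
     * Processes every URL in inputFile. linkSource stands in for the
     * download-and-extract steps and returns the links found for a URL.
     */
    static void run(Path inputFile, Path outputDir,
                    Function<String, List<String>> linkSource) throws IOException {
        Files.createDirectories(outputDir);
        // Snapshot the list first, since the file is rewritten inside the loop.
        for (String line : Files.readAllLines(inputFile, StandardCharsets.UTF_8)) {
            String url = line.trim();
            if (url.isEmpty()) {
                continue;
            }
            List<String> links = linkSource.apply(url);
            Files.write(outputDir.resolve(fileNameFor(url)), links, StandardCharsets.UTF_8);
            deleteUrl(inputFile, url); // remove the URL only after its links are saved
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("pipeline-sketch");
        Path input = dir.resolve("urls.txt");
        Files.write(input, List.of("https://example.com/a", "https://example.org/b"),
                StandardCharsets.UTF_8);
        // Stub link source: a real run would download the page and parse it with Jsoup.
        run(input, dir.resolve("out"), url -> List.of(url + "/link1", url + "/link2"));
        System.out.println("URLs left in input: " + Files.readAllLines(input).size());
    }
}
```

Removing each URL only after its links have been written means an interrupted run can be restarted with the remaining URLs still in the input file.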

License

This project is licensed under the MIT License (see LICENSE file for details).

Contribution

We welcome contributions to this project. Feel free to fork the repository and submit pull requests with your improvements.

