Skip to content

Python scraper to collect apartment listing data from Aruodas.lt. Based on BeautifulSoup4. Installable as a python package

License

Notifications You must be signed in to change notification settings

valdas-v1/scrape_aruodas

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

33 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Aruodas.lt Real Estate Listing Scraper

Overview

The Aruodas.lt Scraper was a robust tool designed for efficiently extracting apartment listing data from Aruodas.lt. Initially tailored for data analysts and real estate professionals seeking comprehensive market insights, this scraper served as an efficient tool for data extraction.

Important Update

Since the development of this project, Aruodas.lt has significantly enhanced its anti-scraping measures. They have implemented advanced security features like Cloudflare protection and mandatory JavaScript execution. As a result, this version of the scraper is no longer functional. However, the HTML extraction and parsing logic remain largely unchanged, making this project a valuable example of how web scraping was previously accomplished.

For historical data and insights into Aruodas.lt's listings, visit my Lithuanian Real Estate Listings Repository.

Features

  • Apartment Listings: Extracted key details such as price, build year, and number of rooms.
  • Bot Detection Evasion: Previously utilized techniques to avoid detection by website security.
  • Data Management: Seamlessly integrated with Pandas DataFrame for efficient data handling.

Technologies

Built on a foundation of open-source technologies, the Aruodas.lt Scraper leveraged:

  • Python: Core programming language.
  • Pandas: Data analysis and manipulation tool.
  • Beautiful Soup: For efficient HTML parsing.
  • Pickle: Python object serialization.

Getting Started

Prerequisites

  • Python 3

Installation Options

As a Python Module:

  1. Setup Virtual Environment:
    python -m venv venv
  2. Activate Virtual Environment:
    venv\Scripts\activate.bat
  3. Install with pip:
    pip install git+https://github.com/valdas-v1/scrape_aruodas

As a Standalone Script:

  1. Clone Repository:
    git clone https://github.com/valdas-v1/scrape_aruodas
  2. Navigate to Directory:
    cd scrape_aruodas
  3. Setup Virtual Environment:
    python -m venv venv
  4. Activate Virtual Environment:
    venv\Scripts\activate.bat
  5. Install Requirements:
    pip install -r requirements.txt

Contact for Current Data

If you are interested in current listing data, I have been able to circumvent Aruodas.lt's security measures and continue to scrape their website. Feel free to contact me on LinkedIn. We can discuss potential collaborations or solutions.

License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).

About

Python scraper to collect apartment listing data from Aruodas.lt. Based on BeautifulSoup4. Installable as a python package

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages