This project provides a comprehensive solution for web scraping and data extraction. It combines the power of Scrapy and Django to efficiently scrape data from the ACM and IEEE digital libraries and expose it through a user-friendly API. The Scrapy spiders extract relevant information such as titles, links, abstracts, citation counts, and author details, while the Django project provides a structured way to access and utilize this data through well-defined API endpoints.
This project contains two Scrapy spiders that scrape research paper titles, links, authors, abstracts, and citation counts from the IEEE Xplore and ACM Digital Library websites.
The ACM and IEEE spiders scrape research paper metadata from their respective digital libraries. Both handle dynamic content with scrapy-splash, ensuring that JavaScript-rendered pages load properly.
- Scrapes research paper titles, links, abstracts, citation counts, and author details.
- Handles JavaScript-rendered content using Splash.
- Clone the repository:
  git clone https://github.com/yourusername/your-repo.git
  cd your-repo
- Install the dependencies, Scrapy and scrapy-splash (see the configuration sketch after this list):
  pip install scrapy scrapy-splash
- Install Splash: install Docker, then run the Splash container to handle JavaScript rendering:
  docker run -p 8050:8050 scrapinghub/splash
- Ensure Splash is running: the Splash container started in the previous step must stay up at http://localhost:8050 while the spiders run.
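With Splash running, scrapy-splash has to be enabled in the Scrapy project's settings. The project's settings file isn't reproduced here, so the snippet below is a minimal sketch of the standard scrapy-splash configuration, assuming Splash listens on the default port 8050 mapped by the Docker command above.

```python
# settings.py -- minimal scrapy-splash wiring (sketch; the project's actual
# settings may differ). Values follow the standard scrapy-splash setup.

# Splash instance started by the Docker command above
SPLASH_URL = "http://localhost:8050"

# Route requests through Splash so JavaScript-rendered pages are fully loaded
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Deduplicate requests based on their Splash arguments as well as the URL
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```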
- Description: The IEEE spider scrapes research papers from IEEE Xplore. It extracts information like the paper title, link, abstract, citation count, and authors.
- Command (a structural sketch of such a spider is shown after the ACM section below):
  scrapy crawl ieee_spider -a search_term="Your Search Term" -o results.json
- Description: The ACM spider scrapes research papers from ACM Digital Library. It extracts information like the paper title, link, abstract, citation count, and authors.
- Command:
  scrapy crawl acm_spider -a search_term="Your Search Term" -o results.json
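Neither spider's source is reproduced in this section. The sketch below shows the general shape such a spider can take with scrapy-splash, reusing the `ieee_spider` name and the `search_term` argument from the commands above; the search URL and all CSS selectors are placeholders, not the project's real ones.

```python
import scrapy
from scrapy_splash import SplashRequest
from urllib.parse import quote_plus


class IeeeSpider(scrapy.Spider):
    """Sketch of a Splash-backed spider; the selectors below are placeholders."""
    name = "ieee_spider"

    def __init__(self, search_term="", *args, **kwargs):
        # Populated from: scrapy crawl ieee_spider -a search_term="..."
        super().__init__(*args, **kwargs)
        self.search_term = search_term

    def start_requests(self):
        # Placeholder search URL; the real spider may build it differently
        url = ("https://ieeexplore.ieee.org/search/searchresult.jsp"
               f"?queryText={quote_plus(self.search_term)}")
        # Render the page in Splash and wait briefly for JavaScript to finish
        yield SplashRequest(url, self.parse, args={"wait": 2})

    def parse(self, response):
        for result in response.css(".result-item"):  # placeholder selector
            yield {
                "title": result.css("h3 a::text").get(),
                "link": response.urljoin(result.css("h3 a::attr(href)").get() or ""),
                "abstract": result.css(".abstract::text").get(),
                "authors": result.css(".authors a::text").getall(),
                "citation_count": result.css(".citations::text").get(),
            }
```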
After running a spider, the output is stored in results.json. Each scraped item has the following structure:
{
  "title": "Example Research Paper Title",
  "link": "https://example.com",
  "details": "Example details of publication",
  "abstract": "This is a sample abstract.",
  "authors": ["Author One", "Author Two"],
  "citation_count": 120
}
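Because `-o results.json` uses Scrapy's JSON feed exporter, the file is a JSON array of such items and can be read back with the standard library. A small example, using the field names shown above:

```python
import json

# Load the feed exported by `scrapy crawl ... -o results.json`
with open("results.json", encoding="utf-8") as f:
    papers = json.load(f)  # a list of item dicts like the example above

# e.g. list the most-cited papers first
for paper in sorted(papers, key=lambda p: p.get("citation_count") or 0, reverse=True):
    print(f'{paper.get("citation_count", 0):>5}  {paper.get("title")}')
```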
The Django project acts as a gateway between users and the Scrapy spiders. It handles requests, interacts with the spiders, and processes the extracted data, providing a structured interface for accessing scraped information from ACM and IEEE.
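How the Django views hand requests off to the spiders isn't shown here. One common pattern, sketched below under that assumption, is for a view to launch `scrapy crawl` as a subprocess with the requested search term and return the exported JSON; the view name, query parameter, and file handling are illustrative, not the project's actual code.

```python
# views.py -- hypothetical sketch of a view that delegates to the IEEE spider.
import json
import subprocess
import tempfile
from pathlib import Path

from django.http import JsonResponse


def scrape_ieee(request):
    """Run the IEEE spider for ?q=<search term> and return its results."""
    search_term = request.GET.get("q", "")
    with tempfile.TemporaryDirectory() as tmp:
        out_file = Path(tmp) / "results.json"
        # Equivalent to: scrapy crawl ieee_spider -a search_term="..." -o results.json
        # (assumes it runs from the Scrapy project directory; pass cwd=... otherwise)
        subprocess.run(
            ["scrapy", "crawl", "ieee_spider",
             "-a", f"search_term={search_term}",
             "-o", str(out_file)],
            check=True,
        )
        items = json.loads(out_file.read_text(encoding="utf-8"))
    return JsonResponse({"results": items})
```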
Install the Django dependencies, then start the development server:
pip install django djangorestframework
python manage.py runserver
The Django API is now running at http://localhost:8000/. You can use a web browser or an API testing tool to interact with the API endpoints.
The current API endpoint is /api/hello. When accessed, it returns a JSON response containing the following message:
{
  "message": "This is a test endpoint for Scraper API."
}
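For reference, here is a minimal sketch of how an endpoint like this is commonly defined with Django REST Framework; the project's actual view and URL configuration may differ.

```python
# views.py
from rest_framework.decorators import api_view
from rest_framework.response import Response


@api_view(["GET"])
def hello(request):
    # Matches the JSON response shown above
    return Response({"message": "This is a test endpoint for Scraper API."})


# urls.py
from django.urls import path
# from scraper_api.views import hello  # import path depends on the app layout (hypothetical)

urlpatterns = [
    path("api/hello", hello),
]
```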