Beagle

A text search engine over the Stanford CS276 document collection.

Install

Run

pip3 install -e .

to install the package.

Usage

python3 -m beagle

Tests

pytest

Dataset

The collection can be downloaded here: http://web.stanford.edu/class/cs276/pa/pa1-data.zip.

This is a 170 MB corpus organized in 10 folders. Each file contains the tokenized contents of a web page.
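
As a rough illustration of how such a layout can be traversed, here is a minimal sketch. It is not taken from the beagle package: the function name `iter_documents`, the `pa1-data` directory name, and the assumption that tokens are separated by whitespace are all hypothetical.

```python
from pathlib import Path
from typing import Iterator, List, Tuple


def iter_documents(root: Path) -> Iterator[Tuple[str, List[str]]]:
    """Yield (document id, tokens) for every file in the sub-folders of root.

    Hypothetical helper: assumes one level of folders, each holding plain-text
    files whose tokens are separated by whitespace.
    """
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        for path in sorted(p for p in folder.iterdir() if p.is_file()):
            # The files are described as already tokenized, so a plain
            # whitespace split is assumed to be enough here.
            text = path.read_text(encoding="utf-8", errors="replace")
            yield f"{folder.name}/{path.name}", text.split()


if __name__ == "__main__":
    # "pa1-data" is an assumed name for the unzipped archive.
    corpus_root = Path("pa1-data")
    if corpus_root.is_dir():
        for doc_id, tokens in iter_documents(corpus_root):
            print(doc_id, len(tokens))
            break  # show only the first document
```

Sorting the folders and files keeps document ids in a stable order between runs, which tends to help when an index assigns numeric ids by position.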

Stop words

The English stop word list we use (saved in stop_words.json) comes from this gist: https://gist.github.com/sebleier/554280.
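
For illustration only, stop word filtering over a token list could look like the sketch below. It assumes stop_words.json holds a flat JSON array of strings, which this README does not confirm; `load_stop_words` and `remove_stop_words` are hypothetical names, not functions from the beagle package.

```python
import json
from pathlib import Path
from typing import Iterable, List, Set


def load_stop_words(path: Path) -> Set[str]:
    """Load stop words into a set for constant-time membership tests.

    Assumption: the file contains a JSON array of strings.
    """
    with path.open(encoding="utf-8") as handle:
        return {word.lower() for word in json.load(handle)}


def remove_stop_words(tokens: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Return the tokens that are not stop words, compared case-insensitively."""
    return [token for token in tokens if token.lower() not in stop_words]


if __name__ == "__main__":
    stop_words_path = Path("stop_words.json")
    if stop_words_path.is_file():
        stop_words = load_stop_words(stop_words_path)
        print(remove_stop_words(["the", "history", "of", "stanford"], stop_words))
```

A set is used rather than a list because each token is checked for membership once, and the corpus contains many tokens.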

Report

More details can be found in the project report.
