Add project to PyPI
- Add setup and make sortgs available via CLI
- Test CLI from setup.py
- Add CLI commands to test
- GitHub Action to publish to PyPI when merging to master
- Add bump2version to manage pip versioning
- Use build
- Update readme
WittmannF committed Nov 16, 2023
1 parent e329efe commit 5f1a4ce
Showing 13 changed files with 228 additions and 2,595 deletions.
6 changes: 6 additions & 0 deletions .bumpversion.cfg
@@ -0,0 +1,6 @@
[bumpversion]
current_version = 1.0.1
commit = True
tag = True

[bumpversion:file:setup.py]
42 changes: 42 additions & 0 deletions .github/workflows/deploy-to-pypi.yml
@@ -0,0 +1,42 @@
name: Publish Python Package to PyPI

on:
  push:
    branches:
      - master

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
      with:
        fetch-depth: 0  # Ensures tags are fetched

    - name: Set up Python
      uses: actions/setup-python@v2
      with:
        python-version: '3.x'  # Use the version appropriate for your project

    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install build twine bump2version
    #- name: Configure Git
    #  run: |
    #    git config --global user.name "GitHub Actions"
    #    git config --global user.email "41898282+github-actions[bot]@users.noreply.github.com"

    #- name: Bump version and push tag
    #  run: |
    #    bump2version patch  # or 'minor' or 'major' depending on the release type
    #    git push --tags

    - name: Build and publish
      env:
        TWINE_USERNAME: __token__
        TWINE_PASSWORD: ${{ secrets.PYPI_API_TOKEN }}
      run: |
        python -m build
        twine upload dist/*
5 changes: 4 additions & 1 deletion .github/workflows/test.yml
@@ -8,7 +8,6 @@ on:

jobs:
  build:

    runs-on: ubuntu-latest

    strategy:
@@ -26,5 +25,9 @@ jobs:
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements.txt
    - name: Install the package
      run: pip install -e .
    - name: Test with unittest
      run: python -m unittest
    - name: Test CLI Command
      run: sortgs 'machine learning' --nresults 10
7 changes: 6 additions & 1 deletion .gitignore
@@ -20,4 +20,9 @@ ipython_config.py

# Other variables
.DS_Store
.vscode/
.vscode/

# Ignore egg-info, dist folders and build
*.egg-info
dist
build
198 changes: 93 additions & 105 deletions README.md
@@ -1,22 +1,30 @@
# Sort Google Scholar by the Number of Citations V2.0b
This Python code ranks publications data from Google Scholar by the number
of citations. It is useful for finding relevant papers in a specific field.

The data acquired from Google Scholar is Title, Citations, Links and Rank. A new
column with the number of citations per year is also included.
The example of the code will look for the top 100 papers related to the keyword,
and rank them by the number of citations. This keyword can either be included in
the command line terminal (`$python sortgs.py --kw 'my keyword'`) or edited in
the original file.
As output, a .csv file will be returned with the name of the chosen keyword
ranked by the number of citations.

## UPDATES
- Try running the code using Google Colab -> [<img src="https://colab.research.google.com/assets/colab-badge.svg" align="center">](https://colab.research.google.com/github/WittmannF/sort-google-scholar/blob/master/Test_sortgs_py_on_Colab.ipynb)
- No install requirements! Limitations: Can't handle robot checking, so use it carefully.
- Command line arguments. Ex: `$python sortgs.py --kw "deep learning"` (results saved in `deep_learning.csv`)
- Handling robot checking with selenium.
- OBS: You might be asked to manually solve the first captcha for retrieving the content of the pages
# Sort Google Scholar by the Number of Citations
[![PyPI Version](https://img.shields.io/pypi/v/sortgs.svg)](https://pypi.org/project/sortgs/)

sortgs is a Python tool for ranking Google Scholar publications by the number of citations. It is useful for finding relevant papers in a specific field. The data acquired from Google Scholar includes Title, Citations, Links, Rank, and a new column with the number of citations per year. In the background, it first tries to fetch results using Python requests. If that fails, it falls back to selenium to fetch the results.
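
The requests-then-selenium behaviour described above is a fallback pattern. The following is a minimal sketch of that pattern only, not the actual `sortgs` source: the function `fetch_with_fallback` and the two stand-in fetchers are hypothetical names, and it assumes a failed fetch surfaces as an exception, which may not be how `sortgs` detects robot checking.

```python
from typing import Callable, Optional


def fetch_with_fallback(
    url: str,
    primary: Callable[[str], str],
    fallback: Optional[Callable[[str], str]] = None,
) -> str:
    """Return page HTML from `primary`, retrying with `fallback` if it raises."""
    try:
        return primary(url)
    except Exception as primary_error:
        if fallback is None:
            # No browser-based fetcher available: surface the original error.
            raise
        print(f"Primary fetch failed ({primary_error}); retrying with fallback")
        return fallback(url)


# Hypothetical stand-in fetchers so the sketch runs offline. In practice,
# `primary` could wrap a requests call and `fallback` a selenium session.
def blocked_fetch(url: str) -> str:
    raise RuntimeError("robot check detected")


def browser_fetch(url: str) -> str:
    return f"<html>fetched {url} via browser</html>"


html = fetch_with_fallback("https://example.com/search", blocked_fetch, browser_fetch)
print(html)
```

Passing the fetchers in as callables keeps the fallback logic testable without network access or a browser driver.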

## Try on Google Colab: [<img src="https://colab.research.google.com/assets/colab-badge.svg" align="center">](https://colab.research.google.com/github/WittmannF/sort-google-scholar/blob/add_project_to_pypi/examples/run_sortgs_on_colab.ipynb)
- No install requirements! Limitations: Can't handle robot checking, so use it carefully.

## Installation

You can now install `sortgs` directly using `pip`:

```bash
pip install sortgs
```

This will install the latest version of `sortgs` and its dependencies.

## Usage

Once installed, you can run `sortgs` directly from the command line:

```bash
sortgs "your keyword"
```

Replace `"your keyword"` with any keyword you'd like to search for. A CSV file with the name `your_keyword.csv` will be created in your current directory.
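
To illustrate working with the output, here is a small sketch that derives the CSV name from a keyword and picks the most-cited rows. It assumes the file name is the keyword with spaces replaced by underscores (consistent with `your_keyword.csv` above) and that the CSV has `Rank`, `Title` and `Citations` columns; the exact header names and the handling of quoted keywords are assumptions, and the sample rows are made up.

```python
import csv
import io


def keyword_to_filename(keyword: str) -> str:
    """Assumed naming rule: spaces become underscores, then '.csv' is appended."""
    return keyword.replace(" ", "_") + ".csv"


def top_cited(csv_text: str, n: int = 3) -> list:
    """Return the n rows with the most citations from CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: int(row["Citations"]), reverse=True)
    return rows[:n]


# Made-up sample rows; column names are assumed from the README description.
sample = """Rank,Title,Citations
1,Paper A,120
2,Paper B,950
3,Paper C,40
"""

print(keyword_to_filename("your keyword"))
for row in top_cited(sample, n=2):
    print(row["Title"], row["Citations"])
```

For a real run, replace `sample` with the contents of the generated file, for example via `open(keyword_to_filename("your keyword")).read()`.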

## Misc
If this project was helpful to you in any way, feel free to buy me a cup of coffee :)
@@ -25,106 +33,95 @@ If this project was helpful to you in any way, feel free to buy me a cup of coff

For feedback, send me an email: fernando [dot] wittmann [at] gmail [dot] com

### Command Line Arguments

## Usage of `sortgs.py`
```
usage: sortgs.py [-h] [--kw KEYWORD] [--sortby SORTBY] [--nresults NRESULTS]
[--csvpath CSVPATH] [--notsavecsv] [--plotresults]
[--startyear STARTYEAR] [--endyear ENDYEAR]
```bash
usage: sortgs [-h] [--sortby SORTBY] [--nresults NRESULTS] [--csvpath CSVPATH]
[--notsavecsv] [--plotresults] [--startyear STARTYEAR]
[--endyear ENDYEAR] [--debug] kw

Example: $python sortgs.py --kw 'deep learning'
positional arguments:
kw Keyword to be searched. Use double quote followed by
simple quote for an exact keyword.
Example: sortgs "'exact keyword'"

optional arguments:
-h, --help show this help message and exit
--kw KEYWORD Keyword to be searched. Default is 'machine learning'
Use double quote followed by simple quote to search
for an exact keyword. Example: "'exact keyword'"
--sortby SORTBY Column to be sorted by. Default is by the columns
"Citations", i.e., it will be sorted by the number of
citations. If you want to sort by citations per year,
use --sortby "cit/year"
--nresults NRESULTS Number of articles to search on Google Scholar.
Default is 100. (careful with robot checking if value
is too high)
--csvpath CSVPATH Path to save the exported csv file. By default it is
the current folder
--notsavecsv By default results are going to be exported to a csv
file. Select this option to just print results but not
store them
--plotresults Use this flag in order to plot the results with the
original rank in the x-axis and the number of citations
in the y-axis. Default is False
--sortby SORTBY Column to be sorted by. Default is "Citations". To sort
by citations per year, use --sortby "cit/year"
--nresults NRESULTS Number of articles to search on Google Scholar. Default
is 100. (careful with robot checking if value is high)
--csvpath CSVPATH Path to save the exported csv file. Default is the
current folder
--notsavecsv By default, results are exported to a csv file. Select
this option to just print results but not store them
--plotresults Use this flag to plot results with the original rank on
the x-axis and the number of citations on the y-axis.
Default is False
--startyear STARTYEAR
Start year when searching. Default is None
--endyear ENDYEAR End year when searching. Default is current year
--debug Debug mode. Used for unit testing. It will get pages
stored on web archive
```
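
The new interface above (a positional `kw` plus optional flags) can be reproduced with `argparse`. This is a sketch under stated assumptions rather than the package's actual parser: flag names and defaults are taken from the help text, while the argument types, the `build_parser` helper, and the use of the current year for `--endyear` are assumptions.

```python
import argparse
from datetime import datetime


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented sortgs command line (sketch)."""
    parser = argparse.ArgumentParser(prog="sortgs")
    # Positional keyword replaces the old `--kw` option.
    parser.add_argument("kw", type=str, help="Keyword to be searched")
    parser.add_argument("--sortby", type=str, default="Citations",
                        help='Column to sort by, e.g. "cit/year"')
    parser.add_argument("--nresults", type=int, default=100,
                        help="Number of articles to search")
    parser.add_argument("--csvpath", type=str, default=None,
                        help="Path to save the exported csv file")
    parser.add_argument("--notsavecsv", action="store_true",
                        help="Print results without saving a csv file")
    parser.add_argument("--plotresults", action="store_true",
                        help="Plot original rank against citations")
    parser.add_argument("--startyear", type=int, default=None,
                        help="Start year when searching")
    parser.add_argument("--endyear", type=int, default=datetime.now().year,
                        help="End year when searching")
    parser.add_argument("--debug", action="store_true",
                        help="Debug mode, used for unit testing")
    return parser


args = build_parser().parse_args(
    ["machine learning", "--sortby", "cit/year", "--nresults", "10"]
)
print(args.kw, args.sortby, args.nresults)
```

Making the keyword positional is what allows the shorter `sortgs "machine learning"` form used in the examples.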
## Example
The following code will search for the top 100 results, rank by number of citations and save as a .csv file (same name of the keyword):
```
$python sortgs.py --kw "machine learning"
```
### Examples
Sorted by number of citations per year (**HIGHLY RECOMMENDED**):
```
$python sortgs.py --kw "machine learning" --sortby "cit/year"
```
1. **Default Search**:
```bash
sortgs "machine learning"
```
This command searches for the top 100 results related to "machine learning" and saves them as a CSV file.
From 2005 to 2015:
```
$python sortgs.py --kw "machine learning" --startyear 2005 --endyear 2015
```
2. **Sort by Citations per Year**:
```bash
sortgs "machine learning" --sortby "cit/year"
```
Search for "machine learning" and sort by the number of citations per year.
Search for an exact keyword:
```
$python sortgs.py --kw "'machine learning'"
```
3. **Specify Date Range**:
```bash
sortgs "machine learning" --startyear 2005 --endyear 2015
```
Search for papers from 2005 to 2015.
Save results under a subfolder called 'examples'
```
$python sortgs.py --kw 'neural networks' --csvpath './examples/'
```
4. **Search for an Exact Keyword**:
```bash
sortgs "'machine learning'"
```
You can also add multiple keywords by fencing them with a single quote:
```
$python sortgs.py --kw '"deep learning" OR "neural networks" OR "machine learning"' --sortby "cit/year"
```
5. **Save Results in a Specific Path**:
```bash
sortgs 'neural networks' --csvpath './examples/'
```
This will save the results under a subfolder called 'examples'.
6. **Multiple Keywords**:
```bash
sortgs '"deep learning" OR "neural networks" OR "machine learning"' --sortby "cit/year"
```
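
Sorting by `cit/year` can change the ranking considerably compared with raw citation counts. The sketch below shows why, assuming citations per year are computed as citations divided by the paper's age in years with the publication year counted as a full year. That formula is an assumption for illustration, not confirmed from the `sortgs` source, and the two papers are made up.

```python
def citations_per_year(citations: int, year: int, end_year: int) -> float:
    """Average citations per year, counting the publication year as a full year."""
    age = end_year + 1 - year
    if age <= 0:
        raise ValueError("publication year must not be after end_year")
    return citations / age


# Made-up papers as (title, citations, publication year).
papers = [("Older paper", 2100, 2003), ("Newer paper", 900, 2020)]

by_citations = sorted(papers, key=lambda p: p[1], reverse=True)
by_rate = sorted(
    papers, key=lambda p: citations_per_year(p[1], p[2], end_year=2023), reverse=True
)

print("By citations:", [title for title, _, _ in by_citations])
print("By cit/year: ", [title for title, _, _ in by_rate])
```

Under this assumption the older paper leads on total citations, while the newer one leads on citations per year, which is why the rate-based ordering tends to surface recent influential work.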
### Output Example
While running, `sortgs` will provide updates in the terminal:
Example of output while running:
```
❯ sortgs "'machine learning'"
Running with the following parameters:
Keyword: 'machine learning', Number of results: 100, Save database: True, Path: /Users/wittmann/sort-google-scholar, Sort by: Citations, Plot results: False, Start year: None, End year: 2023, Debug: False
Loading next 10 results
Robot checking detected, handling with selenium (if installed)
Loading...
Solve captcha manually and press enter here to continue...
year not found, appending 0
Loading next 20 results
Robot checking detected, handling with selenium (if installed)
Loading next 30 results
Robot checking detected, handling with selenium (if installed)
Loading next 40 results
...
```
## Installation
SortGS is not available (yet) on PyPI. The most straightforward way to use it is the following:
## Step-by-Step Installation
1. Install Python 3 and its dependencies from **Requirements** (suggestion: use Anaconda https://www.anaconda.com/distribution/)
2. Download the repository. Two ways to do this:
- Use the command `git clone https://github.com/WittmannF/sort-google-scholar.git` in your terminal (if linux/MAC) or CMD (if windows)
- Or download using this link: https://github.com/WittmannF/sort-google-scholar/archive/master.zip and unzip
3. Open the folder of sortgs on your terminal (if linux/MAC) or CMD (if windows)
4. Use the command `python sortgs.py --kw "your keyword"` (replace "your keyword" with any keyword that you'd like to search)
5. A CSV file with the name `your_keyword.csv` should be created.
2. In the terminal (or cmd if using Windows), run `pip install sortgs`
3. Use the command `sortgs "your keyword"` (replace "your keyword" with any keyword that you'd like to search)
4. A CSV file with the name `your_keyword.csv` should be created.
If those steps are too complicated for you, send me an email with a list of keywords that you'd like ranked to: fernando [dot] wittmann [at] gmail [dot] com
## Requirements
If you install anaconda, all of those requirements (except selenium) are going to be met:
- Python 2.7 or Python 3
- Install from the requirements file: `pip install -r requirements.txt`

Highly suggested, if having problems with robot checking:
- ChromeDriver: http://chromedriver.chromium.org/
- After downloading chromedriver, rename it to `chromedriver` and add it in a folder accessible by the PATH (Example: your python directory. Mine is at `/Users/.../anaconda/bin/`)

## Running Project Using Docker
This guide will walk you through the process of installing Docker, pulling the `fernandowittmann/sort-google-scholar` Docker image, and running the project.
@@ -165,20 +162,11 @@ This guide will walk you through the process of installing Docker, pulling the `
## Contributing
In order to make contributions, all of the tests must be passed. In order to test the code, we will be using the DEBUG mode which is going to use a URL from web archive. Please make sure to save the URL you want to test on web archive in case it is different from the one I already saved. By default it only works in debug mode when using the keywords 'machine learning'. There are 6 tests and all of them are testing different aspects that should match when using SortGS. In order to run the test cases, just run:
Just run:
```
$python -m unittest
```
And check that all tests pass. Alternatively, send a PR and GitHub Actions will run the tests for you.
## LICENSE
- MIT


### Citation
This code was originally developed for my [MS Dissertation](http://repositorio.unicamp.br/jspui/handle/REPOSIP/330610). For referencing this tool, you can use the following:

```
WITTMANN, Fernando Marcos. Optimization applied to residential non-intrusive load monitoring. 2017.
Dissertation (Masters) - University of Campinas, School of Electrical and Computer Engineering, Campinas, SP.
Available in: <http://www.repositorio.unicamp.br/handle/REPOSIP/330610>.
```