diff --git a/examples/Sort_Google_Scholar_No_Code_Version.ipynb b/examples/Sort_Google_Scholar_No_Code_Version.ipynb new file mode 100644 index 0000000..c098247 --- /dev/null +++ b/examples/Sort_Google_Scholar_No_Code_Version.ipynb @@ -0,0 +1,929 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [], + "include_colab_link": true + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "M36VFanosbkb" + }, + "source": [ + "# Sort Google Scholar - No Code Version\n", + "\n", + "\n", + "## 1. Type or paste your search query below (Google Scholar operators such as AND/OR and \"exact phrase\" quotes are supported)\n", + "For more details on search operators, see [this guide](https://guides.library.ucsc.edu/c.php?g=745384&p=5361954).\n", + "\n" + ] + }, + { + "cell_type": "code", + "source": [ + "search_query = \"large language models\" # @param {type:\"string\"}" + ], + "metadata": { + "cellView": "form", + "id": "xlpCibrIV4Nk" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "\n", + "Some examples:\n", + "\n", + "- `Large Language Models` → General search\n", + "- `\"Large Language Models\"` → Exact phrase search\n", + "- `Large Language Models -transformer` → Exclude a specific term\n", + "- `Large Language Models author:\"Geoffrey Hinton\"` → Search by author\n", + "- `Large Language Models source:Nature` → Search within a specific publication\n", + "- `(\"Large Language Models\" OR \"Transformer Models\") AND (GPT OR BERT)` → Boolean search\n", + "- `intitle:\"Large Language Models\"` → Search in the title only\n" + ], + "metadata": { + "id": "fawjia86vL63" + } + }, + { + "cell_type": "markdown", + "source": [ + "### Optional Parameters" + ], + "metadata": { +
"id": "MnCQ9KOmYD3z" + } + }, + { + "cell_type": "code", + "source": [ + "import shlex # quotes the search query safely for the shell\n", + "sortby = \"cit/year\" # @param [\"Citations\", \"cit/year\"] {type:\"string\"}\n", + "nresults = 100 # @param {type:\"number\"}\n", + "startyear = None # @param {type:\"string\"}\n", + "endyear = None # @param {type:\"string\"}\n", + "\n", + "# Construct the base command; shlex.quote keeps queries containing apostrophes or quotes intact\n", + "cmd = f\"sortgs {shlex.quote(search_query)} --sortby '{sortby}' --nresults {nresults}\"\n", + "\n", + "if startyear:\n", + " cmd += f\" --startyear {startyear}\"\n", + "\n", + "if endyear:\n", + " cmd += f\" --endyear {endyear}\"\n", + "\n", + "\n" + ], + "metadata": { + "cellView": "form", + "id": "VbVaoz3wYGQY" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## 2. Next, click Runtime > Run All" + ], + "metadata": { + "id": "HOM1xu6daIGD" + } + }, + { + "cell_type": "code", + "metadata": { + "id": "oPot8aWcsfei", + "cellView": "form" + }, + "source": [ + "# @title\n", + "!pip install sortgs --quiet" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# @title\n", + "!{cmd}" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "cellView": "form", + "id": "qgfQT7i2XrPf", + "outputId": "a3ea8b37-154f-420e-95df-72471e0a53c0" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Running with the following parameters:\n", + "Keyword: large language models, Number of results: 100, Save database: True, Path: /content, Sort by: cit/year, Plot results: False, Start year: None, End year: 2024, Debug: False\n", + "Loading next 10 results\n", + "Loading next 20 results\n", + "Loading next 30 results\n", + "Loading next 40 results\n", + "Loading next 50 results\n", + "Loading next 60 results\n", + "Loading next 70 results\n", + "Loading next 80 
results\n", + "Loading next 90 results\n", + "Loading next 100 results\n", + " Author ... cit/year\n", + "Rank ... \n", + "57 Wei, X Wang, D Schuurmans… ... 2406\n", + "80 Hu, Y Shen, P Wallis, Z Alle ... 1652\n", + "1 Kasneci, K Seßler, S Küchemann, M Bannert… ... 1390\n", + "78 Yao, D Yu, J Zhao, I Shafran… ... 1304\n", + "3 Chang, X Wang, J Wang, Y Wu, L Yang… ... 1198\n", + "... ... ... ...\n", + "84 Maatouk, N Piovesan, F Ayed… ... 29\n", + "45 Li, L Xia, J Tang, Y Xu, L Shi, L Xia, D Yin… ... 23\n", + "82 Cheng, S Huang, F Wei ... 22\n", + "50 Zhu, Q Zhao, H Chen, J Wang, X Xie ... 14\n", + "73 Ren, J Tang, D Yin, N Chawla, C Huang ... 6\n", + "\n", + "[100 rows x 8 columns]\n", + "Results saved to /content/large_language_models.csv\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "s_nuxpy_s_9c" + }, + "source": [ + "> _**NOTE:** Some warnings, such as \"year not found\" or \"author not found\", are normal. However, if you see a robot-check (CAPTCHA) warning, Google Scholar has probably blocked the IP address of your Colab runtime. Try 'Runtime' > 'Disconnect and delete runtime' to get a new IP address. If the problem persists, run sortgs locally with Selenium and solve the CAPTCHAs manually. To avoid triggering the robot check, do not run this notebook too often._" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QQIb9oYou9GM" + }, + "source": [ + "## 3. 
Download the results" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "09OFwDdM2K5h" + }, + "source": [ + "\n", + "To download the `.csv` file, click the **folder icon** on the left to open the **Files** panel, locate the file with the same name as your search keyword, click the **three dots** next to the file, and select **Download** from the options menu.\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "source": [ + "You can also visualize the top results here:" + ], + "metadata": { + "id": "LsZmVT5rd7-8" + } + }, + { + "cell_type": "code", + "metadata": { + "id": "pM_Bb4MH14eI", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 2951 + }, + "outputId": "8cb7847d-28b4-47cd-ad16-21f20e72b745", + "cellView": "form" + }, + "source": [ + "# @title\n", + "import pandas as pd\n", + "results = pd.read_csv(search_query.replace(' ', '_')+'.csv')\n", + "results" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " Rank Author \\\n", + "0 57 Wei, X Wang, D Schuurmans… \n", + "1 80 Hu, Y Shen, P Wallis, Z Alle \n", + "2 1 Kasneci, K Seßler, S Küchemann, M Bannert… \n", + "3 78 Yao, D Yu, J Zhao, I Shafran… \n", + "4 3 Chang, X Wang, J Wang, Y Wu, L Yang… \n", + ".. ... ... \n", + "95 84 Maatouk, N Piovesan, F Ayed… \n", + "96 45 Li, L Xia, J Tang, Y Xu, L Shi, L Xia, D Yin… \n", + "97 82 Cheng, S Huang, F Wei \n", + "98 50 Zhu, Q Zhao, H Chen, J Wang, X Xie \n", + "99 73 Ren, J Tang, D Yin, N Chawla, C Huang \n", + "\n", + " Title Citations Year \\\n", + "0 Chain-of-thought prompting elicits reasoning i... 7219 2022 \n", + "1 Lora: Low-rank adaptation of large language mo... 6608 2021 \n", + "2 ChatGPT for good? On opportunities and challen... 2779 2023 \n", + "3 Tree of thoughts: Deliberate problem solving w... 1304 2024 \n", + "4 A survey on evaluation of large language models 1198 2024 \n", + ".. ... ... ... 
\n", + "95 Large language models for telecom: Forthcoming... 29 2024 \n", + "96 Urbangpt: Spatio-temporal large language models 23 2024 \n", + "97 Adapting large language models via reading com... 44 2023 \n", + "98 Promptbench: A unified library for evaluation ... 14 2024 \n", + "99 A survey of large language models for graphs 6 2024 \n", + "\n", + " Publisher Venue \\\n", + "0 proceedings.neurips.cc Advances in neural … \n", + "1 arxiv.org arXiv preprint arXiv … \n", + "2 Elsevier Learning and individual … \n", + "3 proceedings.neurips.cc Advances in … \n", + "4 dl.acm.org ACM Transactions on … \n", + ".. ... ... \n", + "95 ieeexplore.ieee.org IEEE … \n", + "96 dl.acm.org Proceedings of the 30th … \n", + "97 openreview.net The Twelfth International Conference on … \n", + "98 jmlr.org Journal of Machine Learning … \n", + "99 dl.acm.org Proceedings of the 30th … \n", + "\n", + " Source cit/year \n", + "0 https://proceedings.neurips.cc/paper_files/pap... 2406 \n", + "1 https://arxiv.org/abs/2106.09685 1652 \n", + "2 https://www.sciencedirect.com/science/article/... 1390 \n", + "3 https://proceedings.neurips.cc/paper_files/pap... 1304 \n", + "4 https://dl.acm.org/doi/abs/10.1145/3641289 1198 \n", + ".. ... ... \n", + "95 https://ieeexplore.ieee.org/abstract/document/... 29 \n", + "96 https://dl.acm.org/doi/abs/10.1145/3637528.367... 23 \n", + "97 https://openreview.net/forum?id=y886UXPEZ0 22 \n", + "98 https://www.jmlr.org/papers/v25/24-0023.html 14 \n", + "99 https://dl.acm.org/doi/abs/10.1145/3637528.367... 6 \n", + "\n", + "[100 rows x 9 columns]" + ], + "text/html": [ + "\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "results", + "summary": "{\n \"name\": \"results\",\n \"rows\": 100,\n \"fields\": [\n {\n \"column\": \"Rank\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 29,\n \"min\": 1,\n \"max\": 100,\n \"num_unique_values\": 100,\n \"samples\": [\n 62,\n 47,\n 51\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Author\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 100,\n \"samples\": [\n \"Yu, D Iter, S Wang, Y Xu, M Ju, S Sanyal\\u2026\",\n \"Hou, Y Zhao, Y Liu, Z Yang, K Wang, L Li\\u2026\",\n \"Ab\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Title\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 99,\n \"samples\": [\n \"Tidybot: Personalized robot assistance with large language models\",\n \"Exploring the potential of large language models (llms) in learning on graphs\",\n \"Urbangpt: Spatio-temporal large language models\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Citations\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1078,\n \"min\": 6,\n \"max\": 7219,\n \"num_unique_values\": 92,\n \"samples\": [\n 350,\n 310,\n 247\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Year\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1,\n \"min\": 2007,\n \"max\": 2024,\n \"num_unique_values\": 6,\n \"samples\": [\n 2022,\n 2021,\n 2019\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Publisher\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 21,\n \"samples\": [\n \" proceedings.neurips.cc\",\n \" openreview.net\",\n \" thelancet.com\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Venue\",\n \"properties\": {\n \"dtype\": \"string\",\n 
\"num_unique_values\": 56,\n \"samples\": [\n \" Advances in neural\\u00a0\\u2026\",\n \" Nature medicine\",\n \" Authorea\\u00a0\\u2026\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Source\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 100,\n \"samples\": [\n \"https://arxiv.org/abs/2209.10063\",\n \"https://dl.acm.org/doi/abs/10.1145/3695988\",\n \"https://mededu.jmir.org/2023/1/e48291/\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"cit/year\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 374,\n \"min\": 6,\n \"max\": 2406,\n \"num_unique_values\": 85,\n \"samples\": [\n 42,\n 2406,\n 73\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 20 + } + ] + }, + { + "source": [ + "# @title Rank vs Citations\n", + "\n", + "import plotly.express as px\n", + "view = results.reset_index().copy()\n", + "\n", + "# Wrap long titles onto several lines of at most max_length characters for the hover labels\n", + "def shorten_title(title, max_length=60):\n", + " words = str(title).split()\n", + " shortened_lines = []\n", + " current_line = []\n", + "\n", + " # Start a new line when adding the next word would exceed max_length\n", + " for word in words:\n", + " if current_line and len(' '.join(current_line + [word])) > max_length:\n", + " shortened_lines.append(' '.join(current_line))\n", + " current_line = [word]\n", + " else:\n", + " current_line.append(word)\n", + "\n", + " # Add the last line\n", + " if current_line:\n", + " shortened_lines.append(' '.join(current_line))\n", + "\n", + " return '<br>'.join(shortened_lines)\n", + "\n", + "\n", + "# Apply this function to the 'Title' column and create a new column for the shortened titles\n", + "view['Short_Title'] = view['Title'].apply(shorten_title)\n", + "\n", + "# Now use 'Short_Title' for hover_name\n", + "fig = 
px.scatter(view,\n", + " x='Rank',\n", + " y='Citations',\n", + " title='Number of Citations vs Google Scholar Rank',\n", + " hover_name='Short_Title',\n", + " hover_data=['Rank', 'Author', 'Citations', 'Year', 'Publisher', 'Venue', 'cit/year']\n", + ")\n", + "fig.show()" + ], + "cell_type": "code", + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/html": [ + "\n", + "\n", + "\n", + "
\n", + "\n", + "" + ] + }, + "metadata": {} + } + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 542 + }, + "id": "GrkKI27neO8g", + "outputId": "85d20f7a-2e1b-4214-9d61-150b5e0a68c4", + "cellView": "form" + } + }, + { + "cell_type": "code", + "source": [ + "# @title\n", + "# Now use 'Short_Title' for hover_name\n", + "fig = px.scatter(view,\n", + " x='Rank',\n", + " y='cit/year',\n", + " title='Number of Citations/Year vs Google Scholar Rank',\n", + " hover_name='Short_Title',\n", + " hover_data=['Rank', 'Author', 'Citations', 'Year', 'Publisher', 'Venue', 'cit/year']\n", + ")\n", + "fig.show()" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 542 + }, + "id": "w5uEKLUpb7rh", + "outputId": "b96ee593-5188-4f93-ff2c-f71802803b04", + "cellView": "form" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/html": [ + "\n", + "\n", + "\n", + "
\n", + "\n", + "" + ] + }, + "metadata": {} + } + ] + }, + { + "cell_type": "code", + "source": [], + "metadata": { + "id": "ro9NutXdhYoL" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [], + "metadata": { + "id": "oh0fK7o5b7oq" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [], + "metadata": { + "id": "_4DLe4Usb7mF" + }, + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file
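Note on the optional-parameters cell: it builds the `sortgs` command by string interpolation, so a query containing an apostrophe (for example an author name such as O'Neil) would break the shell quoting. The sketch below is one way to avoid that, not the notebook's own method. It assembles an argument list using only the flags this notebook already passes (`--sortby`, `--nresults`, `--startyear`, `--endyear`) and lets `shlex.join` (Python 3.8+) do the quoting. The helper name `build_sortgs_args` is hypothetical.

```python
import shlex


def build_sortgs_args(query, sortby="cit/year", nresults=100,
                      startyear=None, endyear=None):
    """Hypothetical helper: assemble the sortgs command as an argument list.

    Uses only the flags passed by this notebook. Empty year fields are
    skipped, mirroring the `if startyear:` / `if endyear:` checks.
    """
    args = ["sortgs", query, "--sortby", sortby, "--nresults", str(nresults)]
    if startyear:
        args += ["--startyear", str(startyear)]
    if endyear:
        args += ["--endyear", str(endyear)]
    return args


# A query mixing double quotes (exact phrase) and an apostrophe
args = build_sortgs_args('"large language models" author:"O\'Neil"', startyear=2020)

# shlex.join quotes every argument, so the string can be handed to a shell as-is
cmd = shlex.join(args)
print(cmd)
```

Outside Colab's `!` syntax, the list itself can be passed to `subprocess.run(args, check=True)`, which runs the program without a shell, so no quoting is needed at all.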
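The `cit/year` column in the sample run can be checked by hand. The values shown are consistent with total citations divided by the number of calendar years from publication through the end year printed in the run's parameters (2024), rounded to the nearest integer; for example, 7219 citations for a 2022 paper gives 7219 / 3 ≈ 2406. This formula is inferred from the output rows, not taken from the `sortgs` source, and `citations_per_year` is a hypothetical name.

```python
def citations_per_year(citations: int, year: int, end_year: int) -> int:
    """Hypothetical reconstruction of the cit/year column.

    The publication year counts as a full year, so a 2024 paper
    evaluated with end_year=2024 is divided by 1.
    """
    years = end_year - year + 1
    return round(citations / years)


# (Citations, Year, cit/year) copied from the sample output in this notebook
sample = [
    (7219, 2022, 2406),  # row 0
    (6608, 2021, 1652),  # row 1
    (2779, 2023, 1390),  # row 2
    (1304, 2024, 1304),  # row 3
    (44, 2023, 22),      # row 97
]

for citations, year, shown in sample:
    print(citations, year, shown, citations_per_year(citations, year, 2024))
```

Row 2 is a half-way case (2779 / 2 = 1389.5). Python's `round` rounds half to even and returns 1390, which matches, but these rows cannot distinguish that from round-half-up.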
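The results cell reads `search_query.replace(' ', '_') + '.csv'`, which matches the file name in the sample run (`/content/large_language_models.csv`). For queries containing quotes or operators, the file name `sortgs` chooses may differ; its naming rules are not shown in this notebook. A hedged fallback, assuming the `sortgs` run was the last thing to write a CSV into the working directory, is to load the most recently modified `.csv` file. `newest_csv` is a hypothetical helper.

```python
import glob
import os


def newest_csv(directory="."):
    """Hypothetical helper: path of the most recently modified .csv in `directory`."""
    paths = glob.glob(os.path.join(directory, "*.csv"))
    if not paths:
        raise FileNotFoundError(f"no .csv files found in {directory!r}")
    # os.path.getmtime returns each file's last-modification timestamp
    return max(paths, key=os.path.getmtime)
```

In the notebook this would replace the file-name guess with `results = pd.read_csv(newest_csv("/content"))`. The trade-off is that an unrelated CSV written after the search would be picked up instead.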