Incorporate Public Search Engine Comparison #9

Merged
merged 16 commits into from
Apr 11, 2024

Conversation

ibrahim-kabir
Collaborator

@ibrahim-kabir ibrahim-kabir commented Mar 22, 2024

Bing Search API

Bing is a prominent search engine, and since we simply wanted a comparison against another popular search engine, the Bing Search API is a viable option.

Cons

However, after examining the pricing, I discovered that only 1,000 transactions are free per month. This equals approximately 33 free requests per day. Two requests are needed for a single test (QnA) file.
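To make the quota concrete, here is a small back-of-the-envelope calculation. It assumes a 30-day month (an assumption, not stated in the pricing) and uses the figures quoted above.

```python
# Figures from the pricing note above; DAYS_PER_MONTH is an assumption.
FREE_TRANSACTIONS_PER_MONTH = 1_000
DAYS_PER_MONTH = 30
REQUESTS_PER_TEST_FILE = 2

# Whole requests available per day on the free tier.
requests_per_day = FREE_TRANSACTIONS_PER_MONTH // DAYS_PER_MONTH

# Whole test (QnA) files that fit in one day and in one month.
files_per_day = requests_per_day // REQUESTS_PER_TEST_FILE
files_per_month = FREE_TRANSACTIONS_PER_MONTH // REQUESTS_PER_TEST_FILE

print(requests_per_day, files_per_day, files_per_month)  # 33 16 500
```

In other words, the free tier covers roughly 16 test files per day if usage is spread evenly over the month.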

Tasks

  • Refactored a snippet of code.
  • Integrated Bing Global Web Search.
  • Added Bing Search filtered to results from the website inspection.canada.ca only.
  • Revised the accuracy score calculation so that it considers multiple URLs for the same query.
  • Developed a script to convert an Excel file into JSON files.
  • Added a new data table for the highest and null scores.
  • Fixed response time variations.
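For illustration, the site-filtered search in the list above could be requested roughly as follows. This is a minimal sketch, not the code in `finesse/bing_search.py`: it assumes the Bing Web Search API v7 endpoint and the `Ocp-Apim-Subscription-Key` header as documented by Microsoft at the time of this PR, and the function name `build_bing_request` is hypothetical. It only builds the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint of the Bing Web Search API v7.
BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"


def build_bing_request(query, subscription_key, site=None, count=10):
    """Build (but do not send) a Bing Web Search request.

    When `site` is given, the query is prefixed with the `site:` operator
    so that only results from that website are returned.
    """
    q = f"site:{site} {query}" if site else query
    params = urlencode({"q": q, "count": count})
    return Request(
        f"{BING_ENDPOINT}?{params}",
        headers={"Ocp-Apim-Subscription-Key": subscription_key},
    )


# Hypothetical query and key, for illustration only.
req = build_bing_request(
    "food labelling requirements", "KEY", site="inspection.canada.ca"
)
print(req.full_url)
```

Passing `site=None` gives the global web search variant, so one helper covers both tasks.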

Closes

closes #6
closes #11
closes #10

Alternative considered

Google API

Since the Google Search API returns at most 10 results per request and we have at least 20 files, each needing 100 results for testing, we need at least 200 requests to obtain all the answers. Google offers only 100 free requests per day. Beyond that, $5 is charged per 1,000 requests, which comes to around $1 per test.

num integer: Number of search results to return. Valid values are integers between 1 and 10, inclusive.
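The request count and cost estimate above can be reproduced with a short calculation. This sketch assumes results are paged with a 1-based `start` offset alongside `num`; the constants come from the figures in this comment, and the helper name `start_offsets` is hypothetical.

```python
MAX_NUM = 10                 # upper bound of `num`, per the documentation quoted above
RESULTS_PER_FILE = 100       # results needed for each test file
FILES = 20                   # "at least 20 files"
PRICE_PER_1000_REQUESTS = 5.00


def start_offsets(total_results, page_size=MAX_NUM):
    """Return the 1-based `start` value for each page of results."""
    return list(range(1, total_results + 1, page_size))


offsets = start_offsets(RESULTS_PER_FILE)   # one request per offset
total_requests = FILES * len(offsets)

# Cost of one full test run if every request were billed.
cost_per_test = total_requests * PRICE_PER_1000_REQUESTS / 1000

print(offsets)
print(total_requests, cost_per_test)
```

Each file needs 10 requests, so 20 files need 200 requests, twice the daily free allowance.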

Documentation

Library tested

google-api-python-client

Issues consulted

Why Does The Google Search API Disallow More Than 100 Results? How Can I Get More?

Google web scraping

Web scraping was attempted because it allows querying completely free of charge. However, Google has stringent security measures that limit the number of requests. Since Google displays only 10 results at a time and we have at least 20 files, each needing 100 results for testing, we need at least 200 Google requests to obtain all the answers. Even with time delays, the 200 requests never complete successfully, and the machine's IP address ends up blocked for a while. We must then wait more than 30 minutes, or use proxies or VPNs to work around the block. Today, web scraping is complex and only feasible on a small scale. Doing it on a large scale would require several VPNs and switching between them to avoid detection.
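As an illustration of the "time delays" approach mentioned above, here is a minimal sketch of retrying on HTTP 429 with exponential backoff. It is not the code that was tested, and as noted above, delays alone did not prevent the block. The `fetch` callable is hypothetical: it stands in for whatever performs the request and returns a `(status, body)` pair. The 1800-second cap mirrors the 30-minute wait mentioned above.

```python
import random
import time


def backoff_delays(base=2.0, factor=2.0, max_delay=1800.0, retries=5):
    """Yield exponentially growing delays in seconds, capped at max_delay."""
    delay = base
    for _ in range(retries):
        yield min(delay, max_delay)
        delay *= factor


def fetch_with_backoff(fetch, sleep=time.sleep, jitter=random.uniform, **kwargs):
    """Call fetch() until the status is not 429 or the retries run out.

    `fetch` is a hypothetical zero-argument callable returning (status, body).
    `sleep` and `jitter` are injectable so the function can be tested offline.
    """
    status, body = fetch()
    for delay in backoff_delays(**kwargs):
        if status != 429:
            break
        # Random jitter makes the retry timing less regular.
        sleep(delay + jitter(0, delay / 2))
        status, body = fetch()
    return status, body


# Offline demonstration with a fake fetch: two 429 responses, then success.
responses = iter([(429, ""), (429, ""), (200, "ok")])
slept = []
result = fetch_with_backoff(
    lambda: next(responses), sleep=slept.append, jitter=lambda a, b: 0
)
print(result, slept)
```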

Libraries tested

abenassi Google-Search-API
Nv7-GitHub googlesearch

Issues consulted

GitHub issue.
How to fix python requests module 429 error for google search?
Error 429 with simple query on google with requests python

Problem encountered

[Image attached to the original comment]

@ibrahim-kabir ibrahim-kabir self-assigned this Mar 22, 2024
@ibrahim-kabir ibrahim-kabir linked an issue Mar 22, 2024 that may be closed by this pull request
4 tasks
@ibrahim-kabir ibrahim-kabir changed the title from "Incorporate Google Comparison" to "Incorporate Public Search Engine Comparison" Mar 25, 2024
@ibrahim-kabir ibrahim-kabir requested a review from a team March 27, 2024 21:11

@RussellJimmies RussellJimmies left a comment

Looks good to me! Small typos

finesse/FINESSE_USAGE.md (outdated, resolved)
tests/test_bing_search.py (outdated, resolved)
@ibrahim-kabir ibrahim-kabir marked this pull request as ready for review March 28, 2024 20:05
@ibrahim-kabir ibrahim-kabir requested a review from rngadam March 28, 2024 20:05
@SonOfLope
Collaborator

Failing pipeline will be fixed with ai-cfia/github-workflows#110


@rngadam rngadam left a comment

This will need to be rebased, as it conflicts with files on the main branch. Also, we'd want to see that all checks pass; right now, no checks are running.

finesse/scripts/xlsx_converter_json.py (outdated, resolved)
Collaborator

@MaxenceGui MaxenceGui left a comment

You have a lot of functions that I feel could be generalized so other AI-Lab projects can benefit from them. It could be cool to start a folder of common functions that can be used for more than one project.

finesse/bing_search.py (resolved)
finesse/finesse_test.py (resolved)
finesse/finesse_test.py (resolved)
@ibrahim-kabir
Collaborator Author

@RussellJimmies, do you have any more change requests?

@ibrahim-kabir
Collaborator Author

@RussellJimmies your change request is blocking my merge

@ibrahim-kabir ibrahim-kabir merged commit 24965cb into main Apr 11, 2024
3 checks passed
5 participants