Incorporate Public Search Engine Comparison #9
… better rounding, sorted json files
…pt to the repo+ Review csv function+ Sort files by number
Looks good to me! Small typos
Failing pipeline will be fixed with ai-cfia/github-workflows#110
This will need to be rebased, as it conflicts with files on the main branch. We'd also want to see that all checks pass; right now, no checks are running.
You have a lot of functions that I feel could be generalized so that other AI-Lab projects can benefit from them. It could be cool to start a folder of common functions that can be used by more than one project.
@RussellJimmies, do you have any more change requests?
@RussellJimmies, your change request is blocking my merge.
Bing Search API
The Bing Search API is an option because Bing is a search engine that stands out, and because we simply wanted another popular search engine to compare against.
Cons
However, after examining the pricing, I discovered that only 1,000 transactions per month are free, which is approximately 33 free requests per day. Each test (QnA) file requires 2 requests.
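To make the quota arithmetic concrete, here is a minimal sketch. It assumes a 30-day month; the function names are hypothetical and only the figures (1,000 free transactions per month, 2 requests per file) come from the description above.

```python
# Sketch of the Bing free-tier arithmetic. DAYS_PER_MONTH is an assumption.
FREE_TRANSACTIONS_PER_MONTH = 1000
REQUESTS_PER_FILE = 2
DAYS_PER_MONTH = 30  # assumption: average month length


def free_requests_per_day(monthly_quota: int = FREE_TRANSACTIONS_PER_MONTH,
                          days: int = DAYS_PER_MONTH) -> int:
    """Whole number of free requests available per day."""
    return monthly_quota // days


def files_testable_per_day(requests_per_file: int = REQUESTS_PER_FILE) -> int:
    """Whole number of QnA files that fit in one day's free quota."""
    return free_requests_per_day() // requests_per_file


if __name__ == "__main__":
    print(free_requests_per_day())
    print(files_testable_per_day())
```

Under these assumptions, the free tier covers roughly 16 QnA files per day.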
Tasks
Closes
closes #6
closes #11
closes #10
Alternative considered
Google API
Since the Google Search API returns at most 10 results per request and we have at least 20 files, each needing 100 results for testing, we need at least 200 requests to obtain all the answers. Google offers only 100 free requests per day. Beyond that, requests are charged at $5 per 1,000, which comes to around $1 per test.
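The estimate above can be checked with a short sketch. The function names are hypothetical, and the default of zero free requests mirrors the "around $1 per test" figure, which does not subtract the daily free quota.

```python
import math


def requests_needed(num_files: int, results_per_file: int,
                    results_per_request: int = 10) -> int:
    """Requests required when each request returns at most `results_per_request` results."""
    pages_per_file = math.ceil(results_per_file / results_per_request)
    return num_files * pages_per_file


def cost_usd(requests: int, price_per_1000: float = 5.0,
             free_requests: int = 0) -> float:
    """Cost of `requests` at `price_per_1000`, after subtracting any free requests."""
    billable = max(0, requests - free_requests)
    return billable * price_per_1000 / 1000


if __name__ == "__main__":
    n = requests_needed(num_files=20, results_per_file=100)
    print(n, cost_usd(n), cost_usd(n, free_requests=100))
```

If the 100 free daily requests are counted, a single test run on one day costs about half as much.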
Library tested
google-api-python-client
Issues consulted
Why Does The Google Search API Disallow More Than 100 Results? How Can I Get More?
Google web scraping
Web scraping was attempted because it allows querying completely free of charge. However, Google has stringent security measures that limit the number of requests. Since Google displays only 10 results at a time and we have at least 20 files, each needing 100 results for testing, we need at least 200 Google requests to obtain all the answers. Even with time delays between requests, the 200 requests never all succeed, and the machine's IP address ends up blocked for a while. We must then either wait for more than 30 minutes or use proxies or VPNs to work around the block. Today, web scraping is complex and only feasible on a small scale. To do it on a large scale, we would need several VPNs and switch between them to avoid detection.
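For illustration only, one common way to space out retries after a 429 response is exponential backoff. The PR does not say which delay strategy was tried, so the sketch below is an assumption, with a hypothetical function name and a cap of 1,800 seconds chosen to match the 30-minute wait mentioned above. As noted, delays alone did not prevent the block.

```python
def backoff_delays(attempts: int, base: float = 1.0,
                   cap: float = 1800.0) -> list[float]:
    """Delay in seconds before each retry: base * 2**i, never exceeding `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


if __name__ == "__main__":
    # Delays grow quickly and then stay at the 30-minute cap.
    print(backoff_delays(12))
```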
Libraries tested
abenassi Google-Search-API
Nv7-GitHub googlesearch
Issues consulted
GitHub issue
How to fix python requests module 429 error for google search?
Error 429 with simple query on google with requests python
Problem encountered