Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Repairs broken changes, Cleaned up Browsing, added switch to choose Selenium or Headless #1534

Conversation

bszollosinagy
Copy link
Contributor

@bszollosinagy bszollosinagy commented Apr 15, 2023

Repairs broken changes, Cleaned up Browsing, added switch to choose Selenium or Headless, and reduced the code duplication between Selenium and Requests based browsing.

Background

AutoGPT received a working implementation to help it browse the web with Selenium. Amongst other reasons, this is good because each page can load with javascript enabled, and show its full content, as required by many websites today.

However, the implementation was a bit hasty, concerning existing features, (such as PR #968), and duplicated much code, thereby introducing the possibility of creating bugs when only one copy of the near identical implementations is changed.

This was the motivation behind making the code more DRY.

Using a headless browser for servers was also raised in PR #1520 and PR #1473, and the switch implemented here allows a slightly simpler headless mode: the original requests based one. Nevertheless headless Selenium might also be a good idea to allow a page to use its javascripts.

Changes

  • Combined summary.py web.py and browse.py because they were duplicating each others efforts in many aspects.

  • added a new config parameter in .env to control which kind of browser the user wants: headless or full Selenium with Chrome

  • restored browse_website() to commands.py

  • PR Implemented Selenium based web browsing. #1397 introduced a working Selenium adapter, but inadvertently clobbered PR Add visited website to memory for recalling content without being limited by the website summary. #968, and replicated most of the stuff in browse.py, but based on an old version, without any merge conflicts. This is now rectified by moving Selenium code into browse.py, and reducing duplication as much as possible.

  • there was a small typo, because an object reference was also returned along with the links in the link scraper.

  • listed the PROs and CONs of each browser in the source code

Documentation

In code comments, and "readable code" serve as documentation, and the messages in the git commits.

Test Plan

This was tested manually by trying both browsers. Now the Selenium part also makes use of the changes in PR #968.

Some unit tests are not passing:
That is because they are closely coupled with 'requests', and probably have not been passing since AutoGPT started using sessions.get instead of requests.get, thereby fouling the Mock.
For now, it is out of the scope of this PR to also correct that.

PR Quality Checklist

  • My pull request is atomic and focuses on a single change.
  • I have thoroughly tested my changes with multiple different prompts.
  • I have considered potential risks and mitigations for my changes.
  • I have documented my changes clearly and comprehensively.
  • I have not snuck in any "extra" small tweaks changes

…ing each others efforts in many aspects.

* added a new config parameter in .env to control which kind of browser the user wants: headless or full Selenium with Chrome
* restored browse_website() to commands.py
* PR Significant-Gravitas#1397 introduced a working Selenium adapter, but inadvertently clobbered PR Significant-Gravitas#968, and replicated most of the stuff in browse.py, but based on an old version, without any merge conflicts. This is now rectified by moving Selenium code into browse.py, and reducing duplication as much as possible.
* there was a small typo, because an object reference was also returned along with the links in the link scraper.
* listed the PROs and CONs of each browser in the source code
@bszollosinagy bszollosinagy mentioned this pull request Apr 15, 2023
1 task
@nponeccop
Copy link
Contributor

@bszollosinagy There are conflicts now

@bszollosinagy
Copy link
Contributor Author

Too many changes have happened due to a large scale source code reorg, but the core issue remains. There is duplication of nearly identical code between the web_requests.py and web_selenium.py. I might re-implement the refactoring for the new state of the source code. Some part of this PR is obselete: the memory storage of webpages from PR #968 is now not clobbered anymore. And that was my main motivation.

Btw, web_requests.py is not used by AutoGPT, maybe someone can deprecate it if the headless Selenium PR is merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants