Repairs broken changes, Cleaned up Browsing, added switch to choose Selenium or Headless #1534

bszollosinagy · 2023-04-15T10:39:51Z

Repairs broken changes, Cleaned up Browsing, added switch to choose Selenium or Headless, and reduced the code duplication between Selenium and Requests based browsing.

Background

AutoGPT received a working implementation to help it browse the web with Selenium. Amongst other reasons, this is good because each page can load with javascript enabled, and show its full content, as required by many websites today.

However, the implementation was a bit hasty, concerning existing features, (such as PR #968), and duplicated much code, thereby introducing the possibility of creating bugs when only one copy of the near identical implementations is changed.

This was the motivation behind making the code more DRY.

Using a headless browser for servers was also raised in PR #1520 and PR #1473, and the switch implemented here allows a slightly simpler headless mode: the original requests based one. Nevertheless headless Selenium might also be a good idea to allow a page to use its javascripts.

Changes

Combined summary.py web.py and browse.py because they were duplicating each others efforts in many aspects.
added a new config parameter in .env to control which kind of browser the user wants: headless or full Selenium with Chrome
restored browse_website() to commands.py
PR Implemented Selenium based web browsing. #1397 introduced a working Selenium adapter, but inadvertently clobbered PR Add visited website to memory for recalling content without being limited by the website summary. #968, and replicated most of the stuff in browse.py, but based on an old version, without any merge conflicts. This is now rectified by moving Selenium code into browse.py, and reducing duplication as much as possible.
there was a small typo, because an object reference was also returned along with the links in the link scraper.
listed the PROs and CONs of each browser in the source code

Documentation

In code comments, and "readable code" serve as documentation, and the messages in the git commits.

Test Plan

This was tested manually by trying both browsers. Now the Selenium part also makes use of the changes in PR #968.

Some unit tests are not passing:
That is because they are closely coupled with 'requests', and probably have not been passing since AutoGPT started using sessions.get instead of requests.get, thereby fouling the Mock.
For now, it is out of the scope of this PR to also correct that.

PR Quality Checklist

My pull request is atomic and focuses on a single change.
I have thoroughly tested my changes with multiple different prompts.
I have considered potential risks and mitigations for my changes.
I have documented my changes clearly and comprehensively.
I have not snuck in any "extra" small tweaks changes

…ing each others efforts in many aspects. * added a new config parameter in .env to control which kind of browser the user wants: headless or full Selenium with Chrome * restored browse_website() to commands.py * PR Significant-Gravitas#1397 introduced a working Selenium adapter, but inadvertently clobbered PR Significant-Gravitas#968, and replicated most of the stuff in browse.py, but based on an old version, without any merge conflicts. This is now rectified by moving Selenium code into browse.py, and reducing duplication as much as possible. * there was a small typo, because an object reference was also returned along with the links in the link scraper. * listed the PROs and CONs of each browser in the source code

nponeccop · 2023-04-15T13:47:29Z

@bszollosinagy There are conflicts now

bszollosinagy · 2023-04-15T20:13:05Z

Too many changes have happened due to a large scale source code reorg, but the core issue remains. There is duplication of nearly identical code between the web_requests.py and web_selenium.py. I might re-implement the refactoring for the new state of the source code. Some part of this PR is obselete: the memory storage of webpages from PR #968 is now not clobbered anymore. And that was my main motivation.

Btw, web_requests.py is not used by AutoGPT, maybe someone can deprecate it if the headless Selenium PR is merged.

bszollosinagy mentioned this pull request Apr 15, 2023

PR Batch 5 (WIP) #1162

Closed

1 task

bszollosinagy closed this Apr 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repairs broken changes, Cleaned up Browsing, added switch to choose Selenium or Headless #1534

Repairs broken changes, Cleaned up Browsing, added switch to choose Selenium or Headless #1534

bszollosinagy commented Apr 15, 2023 •

edited

Loading

nponeccop commented Apr 15, 2023

bszollosinagy commented Apr 15, 2023

Repairs broken changes, Cleaned up Browsing, added switch to choose Selenium or Headless #1534

Repairs broken changes, Cleaned up Browsing, added switch to choose Selenium or Headless #1534

Conversation

bszollosinagy commented Apr 15, 2023 • edited Loading

Background

Changes

Documentation

Test Plan

PR Quality Checklist

nponeccop commented Apr 15, 2023

bszollosinagy commented Apr 15, 2023

bszollosinagy commented Apr 15, 2023 •

edited

Loading