Disallow forms (with CAPTCHA) to bots #3936
Conversation
Works as intended. But keep in mind that the instructions in robots.txt files cannot enforce crawler behavior on a site; they can only suggest it 😄 To stop crawlers from accessing these pages we could try to manually whitelist user agents on the backend, but I don't think that is needed.
I don't think we should manually detect robots and disable those pages, because we cannot be 100% sure whether a request comes from a robot just by looking at its headers. That's why we have a CAPTCHA on those pages.
One change requested as in: vivo-project/Vitro#438 (review)
Co-authored-by: Ivan R. Mršulja <nighteliteace@gmail.com>
I have re-run all the tests using Merkle and now everything works as intended. Steps to reproduce the tests:
- Set up a publicly available VIVO server (I recommend using a tool like ngrok)
- If VIVO does not run on the root URL and you instead have to go to /vivo or something similar, you have to provide robots.txt manually in the text editor
- Choose the crawler of your choice from the dropdown menu and try to fetch any of the disallowed paths
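Besides an online validator, the same check can be sketched locally with Python's standard-library `urllib.robotparser`. The rules below are an illustration based on the paths mentioned in this thread (submitFeedback, contact, forgotPassword) — the exact file contents are in the PR diff, and `somedomain.com` is a placeholder:

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules mirroring what this PR describes; see the diff for the real file
robots_txt = """\
User-agent: *
Disallow: /contact
Disallow: /forgotPassword
Disallow: /submitFeedback
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Crawlers that respect robots.txt should be kept off the form pages...
print(parser.can_fetch("Googlebot", "http://somedomain.com/contact"))         # False
print(parser.can_fetch("Googlebot", "http://somedomain.com/forgotPassword"))  # False
# ...while the rest of the site stays crawlable
print(parser.can_fetch("Googlebot", "http://somedomain.com/"))                # True
```

As the earlier comment notes, this only tests what compliant crawlers will do; nothing stops a bot from ignoring the file, which is why the CAPTCHA remains on the forms.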
This works as advertised. Tested with Merkle and confirmed that submitFeedback, contact and forgotPassword are all disallowed.
VIVO GitHub issue: 3935
Linked Vitro PR
What does this pull request do?
Disallow access to /contact and /forgot-password for bots (at least for bots which respect robots.txt)
What's new?
robots.txt is updated
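For reference, the added rules would look something like the sketch below — this is an assumption based on the paths discussed in this PR, not the exact diff; a reviewer reports submitFeedback is covered as well:

```
User-agent: *
Disallow: /contact
Disallow: /forgotPassword
Disallow: /submitFeedback
```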
How should this be tested?
Run VIVO and try to access /contact and /forgotPassword from a web browser (this should work), then test the robots.txt file using a validator such as this one. Please note that you must run VIVO at a public address as the root application (meaning it should be http://somedomain.com, not http://somedomain.com/vivo)
Interested parties
Tag (@ mention) interested parties or, if unsure, @VIVO-project/vivo-committers