Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add possibility to reanalyze pending comments #374

Draft
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

florianbrinkmann
Copy link
Member

@florianbrinkmann florianbrinkmann commented Dec 10, 2020

I started work on #281 with a button that allows the user to reanalyze all pending comments.

For now, the button lives in the row with the filter and bulk actions:

Screenshot_2020-12-10 Comments ‹ WordPress oder — WordPress(1)

The button label is a little bit long, but I am not sure if Reanalyze or something like that would be clear enough.

  • Test with large amount of pending comments

@florianbrinkmann
Copy link
Member Author

I am not sure what happens when a few thousand comments are pending, like in #281 😬 Do we need some form of queueing for such cases?

@florianbrinkmann florianbrinkmann marked this pull request as draft January 20, 2021 10:23
@florianbrinkmann
Copy link
Member Author

I testet the feature with 29.500 pending comments (thanks @2ndkauboy for the SQL export), and on my local test environment it processed close to 6.000 before a 504 error. So, it needs some sort of background processing and run the comments in batches.

@florianbrinkmann
Copy link
Member Author

florianbrinkmann commented Jan 27, 2021

I added Action Scheduler (https://github.com/woocommerce/action-scheduler) for the background processing. Currently, batches of 500 comments are reanalyzed, before the next action is created for the next batch. That takes some time when we have so many pending comments like in my testcase, but using a low value here makes sure it should run on most servers (I did not test how long the script runs with 500, it is just a guess that it should work fine on most server configurations).

We could also add a filter to let the user raise or lower the value.

Edit: I added Action Scheduler via Composer. It is not included in the Composer autoloader file, that is the reason why I direclty load the action-scheduler.php. We could also just copy the Action Scheduler files in a action-scheduler directory directly, but I thought using Composer would be the best way.

I added that part of vendor to the repo because we also have the minified CSS files in the repo.

@florianbrinkmann florianbrinkmann self-assigned this Jan 27, 2021
@florianbrinkmann florianbrinkmann linked an issue Jan 27, 2021 that may be closed by this pull request
@florianbrinkmann florianbrinkmann marked this pull request as ready for review January 27, 2021 13:59
@pfefferle
Copy link
Member

The actionscheduler can cause "redeclare" issues if it is required by more than one plugin. Maybe we should rewrite the namespace using php-scoper (see https://github.com/pfefferle/wordpress-webmention/blob/4.X/composer.json#L40) or check if the class/lib is already loaded.

@florianbrinkmann
Copy link
Member Author

I thought that, too, but Action Scheduler already handles that: https://github.com/pluginkollektiv/antispam-bee/blob/281-bulk-analyzing/vendor/woocommerce/action-scheduler/action-scheduler.php

Off topic: php-scoper looks very cool!

@florianbrinkmann
Copy link
Member Author

florianbrinkmann commented Feb 10, 2021

  • Add Button to Settings page
  • Only show (or enable) option with more than X pending comments

@florianbrinkmann florianbrinkmann marked this pull request as draft February 10, 2021 10:17
@florianbrinkmann
Copy link
Member Author

The latest commit adds an additional checkbox to the settings page. Currently when more than 20 comments are pending, I thought that would be a good count because it is the default count of one comments page, but of course we can increase or decrease that, or even include a filter.

My first thought was to include the option as a button instead of the checkbox, but that would require JavaScript to send the button action as an AJAX request or to add a additional form element somewhere on the settings page. And if the user clicks the button without JS/AJAX, he would loose other modifications he made during that session.

@florianbrinkmann florianbrinkmann marked this pull request as ready for review February 13, 2021 08:37
@florianbrinkmann
Copy link
Member Author

PS: for now I left the reanalyze button on the edit-comments.php page, and it is displayed as soon as comments are pending.

@florianbrinkmann
Copy link
Member Author

florianbrinkmann commented Feb 24, 2021

  • Rerun the reanalyzation until no more comments are detected as spam.

@florianbrinkmann florianbrinkmann marked this pull request as draft March 19, 2021 08:49
@florianbrinkmann
Copy link
Member Author

Switched it to WP cron and it seems to work fine (running the first test currently with over 10.000 pending comments). When it finishes without issues, I will implement the rerun until no more comments are detected as spam.

@florianbrinkmann florianbrinkmann marked this pull request as ready for review March 20, 2021 09:16
@florianbrinkmann
Copy link
Member Author

This PR is ready for review and testing again :) Reanalyzing reruns until no new comments are marked as spam during a run.

@pfefferle I removed Action Scheduler and replaced it with the core cron functions, that’s much better, you are right.

@florianbrinkmann
Copy link
Member Author

florianbrinkmann commented Mar 31, 2021

  • add note to option description that this is only useful with existing spam comments

@Zodiac1978
Copy link
Member

I have a friends site with 100k+ spam comments and 20k+ pending comments. >700 MB comments table ... Great for testing! So after some problems importing the DB I got it running in my local MAMP install.

PS: for now I left the reanalyze button on the edit-comments.php page, and it is displayed as soon as comments are pending.

The first thing that got me confused is why is this button on the published comments page too and not just on the pending comments page. It will just process the pending comments, so I think it could be confusing to show it on the other pages.

Then I clicked the button and it grays out showing "Reanalyzing in progress…". Now I don't have any more visual hint what is happening. Does it work? After minutes it changed from 21.927 to 21.925. Not sure if this is the outcome or just a very slow process ... After 5 minutes more I am on 21.923.

This is either tremendously slow on my local machine or we don't find much spam with the rest of the methods (look at the statistics below).

Then I looked at the settings screen: I was wondering if the checkbox in the settings is clear. At the moment it shows "Reanalyzing in progress…", the checkbox is grayed out and I am not sure what happens if the checkbox is active. Can I start the re-analyzing here (if it is not started yet)?

Fun fact: Our honeypot is by faaaaar the best method to find spam. Look at this data:

Spam: 109.282

109.228x Honeypot
7x BBCode
8x Empty fields
13x RegEx
6x Identical Title

Unfortunately they missed to activate the local spam db setting ...

Which leads to my next question: Does the re-analyze setting works with the active settings? So in my case I should have activated the local spam db setting before I've started the re-analyzing, or not?

Which brings me to my final question: How can I stop the re-analyzing and re-start it with a different setting? Do I need to wait until it's done? What happens then? A notice? Just the button from the beginning?

@florianbrinkmann
Copy link
Member Author

florianbrinkmann commented Apr 8, 2021

Hi @Zodiac1978,

thanks for testing and your feedback!

The first thing that got me confused is why is this button on the published comments page too and not just on the pending comments page. It will just process the pending comments, so I think it could be confusing to show it on the other pages.

Yes, you are right, that might be confusing. I will check if I can modify it to display only on the pending comments screen.

Then I clicked the button and it grays out showing "Reanalyzing in progress…". Now I don't have any more visual hint what is happening. Does it work? After minutes it changed from 21.927 to 21.925. Not sure if this is the outcome or just a very slow process ... After 5 minutes more I am on 21.923.

This is either tremendously slow on my local machine or we don't find much spam with the rest of the methods (look at the statistics below).

It reanalyzes 500 comments per batch before creating a new cron event, that needs to be triggered by a page load. When the button changes its state to »Reanalyzing in progress…« it is working, maybe I could try to add the count of reanalyzed comments there 🤔

I am thinking about if there is a way to directly trigger the cron event from ASB via a HTTP request after creating the event, so that there does not need to be a page load from someone, I will check that out.

Then I looked at the settings screen: I was wondering if the checkbox in the settings is clear. At the moment it shows "Reanalyzing in progress…", the checkbox is grayed out and I am not sure what happens if the checkbox is active. Can I start the re-analyzing here (if it is not started yet)?

Yes, you can start the reanalyzation there by checking the checkbox and saving the settings. After reanalyzation is finished, the checkbox is unchecked again. Reason for why I added it as a checkbox there was:

My first thought was to include the option as a button instead of the checkbox, but that would require JavaScript to send the button action as an AJAX request or to add a additional form element somewhere on the settings page. And if the user clicks the button without JS/AJAX, he would loose other modifications he made during that session.

Which leads to my next question: Does the re-analyze setting works with the active settings? So in my case I should have activated the local spam db setting before I've started the re-analyzing, or not?

Yes, it works with the active settings, but you can change the settings during reanalyzation, and it should take effect from the next reanalyze batch (the next 500 comments).

Which brings me to my final question: How can I stop the re-analyzing and re-start it with a different setting? Do I need to wait until it's done? What happens then? A notice? Just the button from the beginning?

Hm, currently there is no way to cancel it. Might be a good idea to add a button for that. Thinking about where to add that button in the comment list, I guess the best way would be to just remove the button from the comment list view and just have it in the settings, where we have more space.

When reanalyzation is finished, nothing happens except the button state changes.

  • Only display reanalyze button on pending comments screen
  • Display number of reanalyzed comments in button when reanalyzation is in progress
  • Add button to cancel reanalyzation

@Zodiac1978
Copy link
Member

Final result was: 21.896 (without local spam db, just bbcode and regex.
Now testing with restart and local spam db (100k!).

@florianbrinkmann
Copy link
Member Author

My last changes:

  • add the number of processed comments to the setting card. That number is only updated after a batch of comments was processed, so the first number to appear there is 500, the next 1000, et cetera. The reason for that is, that – at least in my opinion – there is no easy way to get the correct number in the current version of ASB. We might be able to change that with ASB3.
  • Removed the reanalyze button from the comments list screen.
  • Added a new setting that appears when reanalyzation is in process. It allows to stop the reanalyzation.

I think it is not possible to trigger the cron from the plugin, because it is a cron event itself. So, there need to be requests by users to make the reanalyzation continue after a batch finished.

@florianbrinkmann
Copy link
Member Author

florianbrinkmann commented Apr 22, 2021

  • add note that this might not be very useful without the local spam DB option

@Zodiac1978
Copy link
Member

With the latest version I don't see the button anywhere and therefore can't see, start or stop anything.

The I tried to start the re-analyzing with the checkbox. Needed some trial & error to understand, that I need to save with checked checkbox to start it. It is not very intuitive UX I think.

@florianbrinkmann
Copy link
Member Author

Yes, I removed the button from the comments list because there is no place for the additional information there. And yes, I guess you are right that the checkbox control is not the best option there. Would it be good UX when we use a second submit button for the form there (not sure if it is possible, just thinking), so when the user clicks the button, all made changes are saved and the reanalyzation starts? Or would it be better to have a button that just starts reanalyzation without any other side effects.

@Zodiac1978
Copy link
Member

Or would it be better to have a button that just starts reanalyzation without any other side effects.

This. I think a new section above or below the settings would be best. It is not a setting, but an action item and therefore needs to be separated, I think.

Without any visual representation it is really hard to tell if there something in the works and what. Maybe we could use the WooCommerce DB background upgrade method here. Displaying a notice that there is something running/has successfully ended.

@florianbrinkmann florianbrinkmann marked this pull request as draft May 6, 2021 09:47
@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 4 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@pfefferle pfefferle added this to the 2.11 milestone Sep 9, 2021
@Zodiac1978 Zodiac1978 modified the milestones: 2.12.0, 2.13.0 Aug 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bulk analyzing or How to deal with huge spam amounts
3 participants