Evaluate if increased timeouts or other parameter changes improve crawl analysis accuracy #135
Comments
@franciscawijaya, for the Optanonconsent after GPC:
@natelevinson10 and @eakubilo, if you have a chance, could you please also look into this issue? This is the most important issue at the moment. It would be good to understand it well before we start the next crawl.
I wanted to be precise, so I checked how the data progressed across the different rounds of crawls that we did. These are the findings:
Analysis:
Since I used a Colorado VPN for the recent small crawl, I also crawled the same focus sites with a California VPN. However, after analyzing the California results, I realized that the VPN itself could be a problem, as the outputs showed minor differences. I will therefore redo these focus mini-crawls with more VPN IP addresses (i.e., more than one Colorado and more than one California VPN) to have more data to compare, and confirm before writing the analysis here.
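One way to make the comparison across VPN runs concrete is to list only the site/field pairs whose values disagree between runs. The following is a minimal sketch, not the crawler's actual tooling: the function name, the run labels, and the field names are all hypothetical, and it assumes each run has already been loaded into a nested dictionary.

```python
def fields_that_differ(runs):
    """Return the (site, field) pairs whose values disagree across runs.

    `runs` maps a run label (hypothetical, e.g. "CO-1") to
    {site: {field: value}}. Values are assumed to be hashable
    (strings, booleans, None). A field that is absent from a run is
    ignored for that run rather than counted as a disagreement.
    """
    seen = {}
    for label, sites in runs.items():
        for site, fields in sites.items():
            for field, value in fields.items():
                seen.setdefault((site, field), {})[label] = value
    # Keep only the pairs where at least two runs recorded different values.
    return {key: by_run for key, by_run in seen.items()
            if len(set(by_run.values())) > 1}
```

If most disagreements cluster on a few fields or sites, that points to something other than random VPN noise.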
Thanks, @franciscawijaya!
That is helpful to know! So, before starting the next crawl we should try to understand what the reasons for these performance drops are.
Yes, that is a good thing to try. @natelevinson10, can you coordinate with @franciscawijaya and also look into this as a team?
Before the next crawl (#118), we should look into what caused the divergence of crawl results from the manual analysis results in our most recent analysis. See the red-labeled fields:
Do increased timeouts help? Maybe some sites were not fully loaded before the data was captured. Are there other parameters that we can fine-tune to improve accuracy?
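To answer the timeout question with numbers, one option is to crawl the same focus sites at several timeout settings and score each setting against the manual analysis. The sketch below assumes the crawl output has been flattened into `(timeout_s, site, field, value)` tuples and the manual results into a dictionary. These shapes and the function name are illustrative assumptions, not the crawler's real data format.

```python
from collections import defaultdict


def accuracy_by_timeout(crawl_rows, manual_truth):
    """Share of crawled values that match the manual analysis, per timeout.

    crawl_rows: iterable of (timeout_s, site, field, value) tuples
                (hypothetical layout).
    manual_truth: {(site, field): value} taken from the manual analysis.
    Rows with no manual value to compare against are skipped.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for timeout_s, site, field, value in crawl_rows:
        key = (site, field)
        if key not in manual_truth:
            continue
        totals[timeout_s] += 1
        if value == manual_truth[key]:
            hits[timeout_s] += 1
    return {t: hits[t] / totals[t] for t in totals}
```

If accuracy stops improving beyond some timeout, the remaining mismatches are more likely caused by something other than page load time.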
@franciscawijaya will take the lead here and work with @natelevinson10 and @eakubilo before starting the next crawl.