Possible miscategorizing of aws-hosted sites #50

Open
S4lt5 opened this issue Oct 27, 2022 · 2 comments
Labels
improvement This issue or pull request will add new or improve existing functionality

Comments

@S4lt5

S4lt5 commented Oct 27, 2022

💡 Summary

When running findcdn against a site like www.achp.gov, I get the following output:

❯ findcdn list www.achp.gov

{
   "date": "10/27/2022, 14:22:16",
   "cdn_count": "1",
   "domains": {
       "www.achp.gov": {
           "IP": "'3.32.142.183', '3.32.4.248'",
           "cdns": "'.amazonaws.com'",
           "cdns_by_names": "'Amazon AWS'"
       }
   }
}

I also put some debug prints in to see the following values:

HEADERS:  ['Apache']
Whois:  ['AMAZON EXPANSION, IE', 'AMAZON-EC2-USGOVCLD', 'AMAZON EXPANSION, IE', 'AMAZON-EC2-USGOVCLD']

None of the static files I looked at on the site show any indication of being served via a CDN (no x-cache-* or via headers, no .cloudfront URL, etc.).
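To make that header check repeatable, here is a minimal sketch. The helper name and the header list are assumptions for illustration and are not part of findcdn; X-Cache, Via, and X-Amz-Cf-* are headers CloudFront commonly sets, but Via can also come from an ordinary proxy, so treat a match as a hint rather than proof.

```python
# Hypothetical helper, not part of findcdn: look for common CDN hints in
# HTTP response headers.
CDN_HEADER_PREFIXES = ("x-cache", "x-amz-cf-")  # e.g. X-Cache, X-Amz-Cf-Id
CDN_HEADER_NAMES = {"via"}


def cdn_header_hints(headers):
    """Return the sorted, lowercased header names that hint at a CDN."""
    hints = []
    for name in headers:
        lowered = name.lower()
        if lowered in CDN_HEADER_NAMES or lowered.startswith(CDN_HEADER_PREFIXES):
            hints.append(lowered)
    return sorted(hints)


# Headers like the ones seen in the debug output: only a Server header.
print(cdn_header_hints({"Server": "Apache"}))  # → []

# Illustrative values shaped like a typical CloudFront response.
cloudfront_like = {
    "X-Cache": "Hit from cloudfront",
    "Via": "1.1 example.cloudfront.net (CloudFront)",
    "Server": "Apache",
}
print(cdn_header_hints(cloudfront_like))  # → ['via', 'x-cache']
```

An empty result does not rule out a CDN on its own, but combined with the whois data it supports treating the host as a plain origin server.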

The whois data also gives at least some hint that this is an EC2 instance serving static files, which probably does not fall into our "has a CDN" category.

What do you think?

Motivation and context

Accuracy of the reported output is important to me, and I'm not sure whether the way such a site is classified should change.

Implementation notes

Unsure

Acceptance criteria

Unsure

S4lt5 changed the title from "Possible miscategorizing of AWS EC2 Users" to "Possible miscategorizing of AWS EC2 Sites" on Oct 27, 2022
S4lt5 changed the title from "Possible miscategorizing of AWS EC2 Sites" to "Possible miscategorizing of aws-hosted sites" on Oct 27, 2022
@Pascal-0x90
Collaborator

This is a good thing to fix. The classification of CloudFront domains should be more accurate than just assuming every AWS-hosted site uses CloudFront. Stashing this in a TODO for now.
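One possible way to separate CloudFront from other AWS hosting is the `service` field in the ip-ranges.json file that AWS publishes. The sketch below is only an illustration: the function names are hypothetical, findcdn does not currently do this, and the sample prefixes are documentation-only ranges (RFC 5737) standing in for real AWS data. It also only looks at the IPv4 `prefixes` list.

```python
import ipaddress

# Illustrative sample in the shape of AWS's published ip-ranges.json
# (https://ip-ranges.amazonaws.com/ip-ranges.json). The prefixes below are
# documentation-only ranges (RFC 5737), NOT real AWS data.
SAMPLE_RANGES = {
    "prefixes": [
        {"ip_prefix": "192.0.2.0/24", "service": "CLOUDFRONT"},
        {"ip_prefix": "198.51.100.0/24", "service": "EC2"},
    ]
}


def aws_services_for_ip(ip, ranges):
    """Return the set of AWS service names whose prefixes contain `ip`."""
    addr = ipaddress.ip_address(ip)
    return {
        entry["service"]
        for entry in ranges["prefixes"]
        if addr in ipaddress.ip_network(entry["ip_prefix"])
    }


def is_cloudfront_ip(ip, ranges):
    """True only when the address falls in a prefix tagged CLOUDFRONT."""
    return "CLOUDFRONT" in aws_services_for_ip(ip, ranges)


print(is_cloudfront_ip("192.0.2.10", SAMPLE_RANGES))    # → True
print(is_cloudfront_ip("198.51.100.7", SAMPLE_RANGES))  # → False
```

With the real file loaded in place of `SAMPLE_RANGES`, an address that only matches EC2-style entries would not be reported as a CDN.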

Pascal-0x90 added the improvement label on Nov 23, 2022
@Pascal-0x90
Collaborator

The suggestion in #43 to use IP blocks instead of just domain names should help. Also, I think this line is what causes it:

".amazonaws.com": "Amazon AWS",

It treats anything with amazonaws.com in its name as an "Amazon AWS" CDN, which is not really correct, as you mention above. I think a lot of the other domains in this list suffer from the same problem.

For example, a name ending in .discord.com is not necessarily hosted on cdn.discord.com.
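A rough sketch of the difference between a broad suffix and a CDN-specific one follows. The two maps and `match_cdns` are hypothetical and only assume suffix matching on hostnames; findcdn's real list and matching logic may differ. The hostnames are example-style names in the formats EC2 and CloudFront use, not names observed for this site.

```python
# Hypothetical mappings for illustration; findcdn's real list differs.
BROAD = {".amazonaws.com": "Amazon AWS"}
NARROW = {".cloudfront.net": "Amazon CloudFront"}


def match_cdns(hostname, suffix_map):
    """Return the sorted CDN names whose suffix the hostname ends with."""
    host = hostname.lower().rstrip(".")
    return sorted(
        name for suffix, name in suffix_map.items() if host.endswith(suffix)
    )


ec2_style = "ec2-198-51-100-1.compute-1.amazonaws.com"
cloudfront_style = "d111111abcdef8.cloudfront.net"

print(match_cdns(ec2_style, BROAD))          # → ['Amazon AWS']
print(match_cdns(ec2_style, NARROW))         # → []
print(match_cdns(cloudfront_style, NARROW))  # → ['Amazon CloudFront']
```

The broad entry flags an ordinary EC2 hostname as a CDN, while the narrower suffix only matches names that actually point at the CDN service.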
