ABOUT

The premise of this project is to examine the C functions (system calls, as recorded by strace) invoked while loading web pages and see what information can be inferred from them. Variations on this concept include classifying from limited information, i.e. small windows of the call sequence rather than a full trace.
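As a rough illustration of the "small windows" idea, the sketch below pulls syscall names out of strace output lines and cuts the resulting sequence into fixed-size windows. The names `syscall_names` and `windows`, the regular expression, and the window size are assumptions made for this example; they are not taken from `chunker.py` or the other scripts in this repo.

```python
import re

# Matches the syscall name at the start of an strace line. The optional
# prefix covers the PID that strace adds when run with -f.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def syscall_names(lines):
    """Return the syscall names found in an iterable of strace lines.

    Lines that do not start with a call (signals, exit markers,
    "<... resumed>" continuations) are skipped.
    """
    names = []
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


def windows(names, size, step=None):
    """Cut a list of names into windows of `size` items.

    `step` defaults to `size` (non-overlapping windows); a smaller step
    gives overlapping windows. A trailing partial window is dropped.
    """
    step = step or size
    return [names[i:i + size] for i in range(0, len(names) - size + 1, step)]
```

With a real trace this could be used as `windows(syscall_names(open(path)), size=50)`, where `path` and the size of 50 are placeholders.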

This repo contains the standard program, which reads an entire strace file, as well as a program that reads small chunks of strace files and determines which site generated them.
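To make "determines which site generated them" concrete, here is a minimal sketch of one way a chunk could be matched to a site. It counts syscall names per site and picks the site whose counts are most similar (by cosine similarity) to the chunk. This nearest-centroid approach is a standard-library stand-in chosen for the example; it is not the scikit-learn classifier used by the scripts in this repo, and `fit_centroids` and `predict` are hypothetical names.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two Counters of syscall counts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def fit_centroids(labelled_windows):
    """Sum syscall counts per label.

    `labelled_windows` is an iterable of (site_label, list_of_syscall_names).
    Summing instead of averaging is fine here because cosine similarity
    ignores vector length.
    """
    centroids = {}
    for label, window in labelled_windows:
        centroids.setdefault(label, Counter()).update(window)
    return centroids


def predict(centroids, window):
    """Return the label whose summed counts are most similar to `window`."""
    counts = Counter(window)
    return max(centroids, key=lambda label: cosine(centroids[label], counts))
```

This ignores call order entirely; a sequence-aware feature (for example n-grams of call names) would be a natural next step if order turns out to matter.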

#Requirements#

SciPy
NumPy
Scikit-learn

#Project Notes#

Sites:

info.cern.ch/hypertext/WWW/TheProject.html
www.google.com
en.wikipedia.org/wiki/South_African_labour_law
www.reddit.com
www.yahoo.com
www.youtube.com/watch?v=dQw4w9WgXcQ
www.cs.unm.edu/~forrest/publications/acsac08.pdf
www.nyu.edu
www.wfu.edu
www.soundcloud.com/flume/lorde-tennis-court-flume-remix

Command to use:

strace -o ./[etc.] wget -e robots=off --wait 1 --page-requisites [link]
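For collecting traces of several sites, the command above could be wrapped in a small helper like the sketch below. The output directory, the file naming scheme in `trace_name`, and the function names are all assumptions for illustration; the README does not say where the trace files are written.

```python
import re
import subprocess
from pathlib import Path


def trace_name(url):
    """Hypothetical naming scheme: turn a URL into a safe file name."""
    return re.sub(r"[^A-Za-z0-9]+", "_", url).strip("_") + ".strace"


def strace_wget_command(url, out_path):
    """Build the argument list for the strace + wget command shown above."""
    return [
        "strace", "-o", str(out_path),
        "wget", "-e", "robots=off", "--wait", "1", "--page-requisites", url,
    ]


def collect(urls, out_dir):
    """Run the command once per URL, writing one trace file per site.

    Requires strace and wget to be installed. Failed downloads are not
    treated as fatal, so a partial trace is still written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for url in urls:
        cmd = strace_wget_command(url, out_dir / trace_name(url))
        subprocess.run(cmd, check=False)
```

A call such as `collect(["www.nyu.edu", "www.wfu.edu"], "traces")` would then write `traces/www_nyu_edu.strace` and `traces/www_wfu_edu.strace`; the `traces` directory name is a placeholder.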

Soundcloud test links (streaming site example):

CERN test links (lightweight site example):

//TODO:

Group websites into categories, e.g. university sites, streaming sites, news sites, Wikipedia pages (lists vs. articles), etc.

Can the classifier differentiate between website types?

Search site terms:

homepage
wake forest
nyu
linux
computer science

University sites:

wfu.edu
nyu.edu
duke.edu
unc.edu
utexas.edu
berkeley.edu
usc.edu
ucla.edu
cornell.edu
uchicago.edu
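One possible way to approach the grouping TODO is to relabel each trace by site category rather than by site, so that the same classifier can be asked about website types. The sketch below maps a URL to a category. The university set is the list above; the other category assignments follow the examples named in these notes (Soundcloud as a streaming site, CERN as a lightweight site, Wikipedia pages) and should be read as assumptions, as should the function names.

```python
from urllib.parse import urlsplit

# University hosts, taken from the list above.
UNIVERSITY_HOSTS = {
    "wfu.edu", "nyu.edu", "duke.edu", "unc.edu", "utexas.edu",
    "berkeley.edu", "usc.edu", "ucla.edu", "cornell.edu", "uchicago.edu",
}

# Assumed labels for the other example sites in these notes.
OTHER_CATEGORIES = {
    "soundcloud.com": "streaming",
    "info.cern.ch": "lightweight",
    "en.wikipedia.org": "wikipedia",
}


def host_of(url):
    """Return the host of a URL, without a leading "www."."""
    if "//" not in url:
        url = "//" + url  # lets urlsplit parse scheme-less URLs
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def category_for(url, default="other"):
    """Map a URL to a site category, falling back to `default`."""
    host = host_of(url)
    if host in UNIVERSITY_HOSTS:
        return "university"
    return OTHER_CATEGORIES.get(host, default)
```

Training on `category_for(url)` labels instead of site names would show whether traces from different sites of the same type look alike.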

Valgrind on server:

Run strace on Valgrind runs.
See whether the traces show a difference between high and low memory usage.

Repository: noahfl/motif-site_grouping
