Skip to content

Generate representative samples from Pwned Passwords (HIBP)

Notifications You must be signed in to change notification settings

openwall/pwned-passwords-sampler

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 

Repository files navigation

Generate representative samples from Pwned Passwords (HIBP)

This program generates representative samples from Pwned Passwords (HIBP), taking the count fields into account.

To use it, you need a file such as pwned-passwords-ntlm-ordered-by-hash-v8.txt from https://haveibeenpwned.com/Passwords

Compile and invoke the program on Linux as follows:

$ gcc pwned-passwords-sampler.c -o pwned-passwords-sampler -O2 -s -Wall
$ ./pwned-passwords-sampler < pwned-passwords-ntlm-ordered-by-hash-v8.txt > pp-sample
Total 5579399834
$ wc -l pp-sample
1000000 pp-sample

With everything already optimally cached in RAM, this takes under 1 minute.

The input file is expected to use CRLF linefeeds exactly as provided by HIBP, whereas the output has LF-only linefeeds.

You need to be on a 64-bit system with at least 48 GB RAM, preferably 72+ GB. Usage on non-Linux might require minor changes to the code.

About

Generate representative samples from Pwned Passwords (HIBP)

Resources

Stars

Watchers

Forks

Languages