Consecutive Codes #3

Open
chancyk opened this issue May 26, 2015 · 0 comments

Consecutive codes may not be handled correctly, as shown by the Pfister and Tymczak test cases listed at http://www.archives.gov/research/census/soundex.html.

The original Russell and census versions of the algorithm seem to collapse consecutive identical codes for adjacent letters only (that is, letters not separated by a vowel or other '0' code character).

The archives.gov reference also mentions another special case: the second of two identical codes is still discarded when the letters are separated by an 'H' or 'W'.

EDIT: The 'H' or 'W' rule is actually used in the SQL Server implementation. I removed my earlier comment saying that it's not.

EDIT2: I was both right and wrong before my first edit. MSSQL is case sensitive in its handling of 'H' and 'W'. Consecutive codes are discarded across an upper-case 'H' or 'W' but not across a lower-case one...
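
For reference, here is a minimal Python sketch of the census-style rules as I read them from the archives.gov page. It is not this project's code: the `soundex` function and its `hw_transparent` flag are hypothetical names for illustration. The flag only switches the 'H'/'W' rule on or off; it does not try to reproduce the MSSQL case-sensitivity quirk.

```python
# Standard Soundex letter-to-digit table; vowels, H, W and Y have no code.
GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
CODES = {ch: digit for digit, letters in GROUPS.items() for ch in letters}


def soundex(name, hw_transparent=True):
    """Census-style Soundex sketch (hypothetical helper, not a library API)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = [letters[0]]
    # The first letter's own code counts as "previous", so the F in
    # Pfister is dropped as a repeat of P.
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if hw_transparent and ch in "HW":
            # H/W rule: skip without resetting prev, so identical codes
            # on either side of the H or W still collapse.
            continue
        code = CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        # An uncoded letter (a vowel, for example) resets prev to "",
        # so the same code may appear again after it.
        prev = code
    # Truncate or zero-pad to one letter plus three digits.
    return "".join(result)[:4].ljust(4, "0")


print(soundex("Pfister"))   # P236
print(soundex("Tymczak"))   # T522
print(soundex("Ashcraft"))  # A261
print(soundex("Ashcraft", hw_transparent=False))  # A226
```

With the 'H'/'W' rule turned off, Ashcraft comes out as A226 instead of the A261 given on the archives.gov page, which is the kind of difference this issue is about.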
