Voter Verifier is offered "BYOD" (bring your own data), meaning the raw data from voter records necessary to make it useful is not included. But if you already have voter record data, this is what general data points Voter Verifier assumes to exist in an ElasticSearch index. They are divided into groups below and explained in more detail. Some groups are used in the matching process, and some are only returned in results.
Group | Used in Matching? |
---|---|
Identifier | No |
Name | Yes |
Date of Birth | Yes |
Address | Yes |
Contact Info | Yes |
Voter Details | No |
Issue and Characteristic Scores | No |
These are the fields and their types (yes...ElasticSearch does have types) that comprise the voter record index.
A stable identifier for the record. This may not, and frequently will not, map 1:1 to people.
Field | Type | Notes |
---|---|---|
id | string |
There are two fields for each name component (except suffix) because it's useful to use two different sets of tokenizers and analyzers when looking for name matches to account for a lot of different weird things with names: McDonald vs Mc Donald, O'Rourke vs O Rourke vs O' Rourke for example. The same raw value is stored in each field, but the fields use different types (with different tokenizers and analyzers applied to them).
Field | Type | Notes |
---|---|---|
first_name | string | |
first_name_compact | string | |
middle_name | string | |
middle_name_compact | string | |
last_name | string | |
last_name_compact | string | |
suffix | string |
Date of birth is stored as three separate fields and as ints to be able to do range and arithmetic queries.
Field | Type | Notes |
---|---|---|
dob_year | int | |
dob_month | int | |
dob_day | int |
There are two sets of address fields for a given record. The first is the registered address per the voter record. The registered address on the voter file and the current address (that most users would enter in a search) are frequently different. Fields prefixed with ts_
are for a best-guess-actual-address (usually commercially sourced). The query logic takes both addresses into account when ranking matches.
Field | Type | Notes |
---|---|---|
address | string | full (unparsed) street address |
city | string | |
st | string | two-letter state code (insert Gary Gulman clip here) |
zip_code | string | either zip-5 of zip+4 is fine, only zip-5 is used |
lat_lng_location | geo_point | derived from the zip code, a comma-separated pair of lat,lng |
address_street_name | string | "Main St" of "000 Main St, Apt 4" |
address_street_number | string | "000" of "000 Main St, Apt 4" |
address_unit_designator | string | "Apt" of "000 Main St, Apt 4" |
address_apt_number | string | "4" of "000 Main St, Apt 4" |
ts_address | string | |
ts_city | string | |
ts_st | string | |
ts_zip_code | string | |
ts_lat_lng_location | geo_point | this is derived from geocoding the zip code like above |
ts_address_street_name | string | |
ts_address_street_number | string | |
ts_address_unit_designator | string | |
ts_address_apt_number | string |
Similar to an address, phone numbers have multiple fields. The current index allows for 4 (potentially but not necessarily) distinct phone number fields on a given record, two of which are specifically intended to be a cell phone number, which is what most end users would provide in a search.
Field | Type | Notes |
---|---|---|
phone | string | Registered on the actual voter record |
vb_phone | string | Best guess current phone |
vb_phone_type | string | Type of phone that the best guess is. see thrift defs for an enum of expected values. |
vb_phone_wireless | string | Best guess cell phone |
ts_wireless_phone | string | Alternate best guess cell phone |
string | ||
email_append_level | string | See thrift defs for an enum of expected values. |
These fields are not used in searching, but are returned with the results.
Field | Type | Notes |
---|---|---|
registration_date | string | any valid date format |
party | string_enum | Political party affiliation. See the thrift defs for an enum of expected values. |
voter_score | string_enum | A categorization of someone's frequency of voting. |
general_YYYY | bool | a boolean flag indicating whether someone participated in the general election in year YYYY any number of years may be added as fields (see updating the index for a discussion of what changes to the index mean for the application code). |
vf_gYYYY | string_enum | a categorization of someone's vote type (e.g. early, absentee, etc) in the (g)eneral election of year YYYY. Any number of such fields may be added for different years. |
vf_pYYYY | string_enum | a categorization of someone's vote type (e.g. early, absentee, etc) in the (p)rimary of year YYYY. Any number of such fields may be added for different years. |
num_general_election_votes | int | |
num_primary_election_votes | int |