- Both `clean_variable_spelling()` and `clean_spelling()` have been migrated to the {matchmaker} package, and arguments from the aforementioned functions are passed to the {matchmaker} functions. Tests and documentation have been updated to reflect this.
- Remove {rlang} from imports (it is still imported by {matchmaker}).
- `clean_variable_spelling()` and `clean_spelling()` gain the option to specify which columns contain the keys (`from`) and values (`to`). These default to 1 and 2, respectively, which ensures that backwards compatibility is retained (this fixes #99).
- `linelist_example()` is a new function that serves as an alias for `system.file("extdata", thing, package = "linelist")`, which is much easier for new R users to understand.
- `top_values()` no longer throws a spurious warning when the levels in the subset data are identical to the levels in the full data (#96).
- `top_values()` gains a new `subset` argument that allows the user to retain the top levels of a subset of a vector. This is particularly useful for retrospective analysis based on current trends (fixes #92 via #94 and #95, @thibautjombart).
- `top_values()` gains the explicit `ties.method` parameter, which defaults to "first" to fix issue #88 (thanks to @cwhittaker1000 for spotting the issue and providing a detailed explanation).
- `top_values()` issues a warning if one of the top values had a tied value that was not included.
- `top_values()` issues a warning if the user uses a `ties.method` that is not guaranteed to return exactly `n` top values.
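The interplay between `ties.method` and the two warnings above can be illustrated with base R's `rank()`. The helper `keep_top()` below is hypothetical and is not the implementation of `top_values()`; it assumes only that the top `n` values are chosen by ranking frequencies and that everything else is replaced with `"other"`.

```r
# Hypothetical helper illustrating top-n selection with ties.
# This is NOT linelist's top_values() implementation.
keep_top <- function(x, n, replacement = "other", ties.method = "first") {
  counts <- table(x)
  vals <- names(counts)
  freq <- as.vector(counts)
  # rank by decreasing frequency; ties.method decides how ties are ranked
  ranks <- rank(-freq, ties.method = ties.method)
  keep <- ranks <= n
  top <- vals[keep]
  # some ties methods (e.g. "min") can keep more than n values
  if (length(top) != min(n, length(vals))) {
    warning("ties.method '", ties.method, "' kept ", length(top),
            " values instead of ", n)
  }
  # warn if a value tied with the least frequent kept value was dropped
  cutoff <- min(freq[keep])
  dropped <- vals[freq == cutoff & !keep]
  if (length(dropped) > 0) {
    warning("tied value(s) not included: ", paste(dropped, collapse = ", "))
  }
  out <- as.character(x)
  out[!out %in% top] <- replacement
  factor(out, levels = c(top, replacement))
}

x <- c("a", "a", "b", "b", "c")
keep_top(x, n = 1)                       # keeps "a", warns that "b" was dropped
keep_top(x, n = 1, ties.method = "min")  # keeps "a" and "b", warns about n
```

With `"first"`, exactly one value survives and the dropped tie is reported; `"min"` ranks both tied values first, so more than `n` values are kept.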
- `clean_spelling()` gains the `anchor_regex` argument, which wraps all regex keyword entries in "^" and "$" before processing.
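The effect of anchoring can be sketched with base R's `grepl(perl = TRUE)`. The helper `apply_regex_keys()` is hypothetical: it takes bare patterns instead of parsing `.regex` keyword entries from a wordlist, so it illustrates anchoring only, not how `clean_spelling()` reads its wordlist.

```r
# Hypothetical helper: replace values matching each Perl-style pattern.
# anchor_regex = TRUE wraps every pattern in "^" and "$", so a pattern
# must match the whole value rather than any substring of it.
apply_regex_keys <- function(x, patterns, values, anchor_regex = TRUE) {
  out <- as.character(x)
  # patterns are applied in order, so later ones see earlier replacements
  for (i in seq_along(patterns)) {
    pattern <- patterns[i]
    if (anchor_regex) {
      pattern <- paste0("^", pattern, "$")
    }
    hit <- grepl(pattern, out, perl = TRUE)
    out[hit] <- values[i]
  }
  out
}

x <- c("yes", "Yes", "yess", "eyes")
apply_regex_keys(x, "[Yy]es", "yes")                        # whole-value matches only
apply_regex_keys(x, "[Yy]es", "yes", anchor_regex = FALSE)  # substrings match too
```

Without anchoring, "yess" and "eyes" are rewritten as well because they contain "yes", which is the kind of over-matching the anchors prevent.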
- The linelist class and all associated epivars/dictionary functions have been removed as out of scope for this package. Without any validation, these functions were no more than a fancy wrapper around `dplyr::rename()`, so they are being removed after fda9e18b02f5853cd311ddcc513c427244b21dd7. If the linelist class is resurrected (e.g. to implement an HXL validator package), it can be taken from that commit. This is related to #29.
- `clean_spelling()` gains the `.regex` keyword, which allows the user to supply Perl-style regular expressions to change words that may have similar spelling.
- `guess_dates()` now processes at double the speed of the previous version.
- `guess_dates()` now properly constrains date vectors to the start and end dates.
- `guess_dates()` correctly parses dates represented as integers from Excel (#73).
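A rough base-R sketch of trying several formats in turn and then discarding dates outside a plausible window. `parse_dates_constrained()`, its format list and its default window are assumptions for illustration; `guess_dates()` itself relies on lubridate orders (see #30) rather than these `strptime` formats.

```r
# Hypothetical sketch, not guess_dates(): try each format in order,
# keep the first successful parse, then set out-of-range dates to NA.
parse_dates_constrained <- function(x,
                                    formats = c("%Y-%m-%d", "%d/%m/%Y"),
                                    first_date = as.Date("1900-01-01"),
                                    last_date = Sys.Date()) {
  x <- as.character(x)
  out <- as.Date(x, format = formats[1])
  for (f in formats[-1]) {
    # only re-parse entries that are still unparsed
    todo <- is.na(out)
    out[todo] <- as.Date(x[todo], format = f)
  }
  # constrain to the [first_date, last_date] window
  out[!is.na(out) & (out < first_date | out > last_date)] <- NA
  out
}

parse_dates_constrained(c("2019-02-19", "19/02/2019", "1850-01-01", "not a date"))
```

The third entry parses but falls before `first_date`, so it is dropped; the fourth never parses.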
- `print.data_comparison()` now sets `diff_only = TRUE` by default (#71).
- `compare_data()` gains the option `columns`, which allows users to choose which columns they want to compare. It defaults to `TRUE`, which compares all columns (#58).
- `guess_dates()` can now handle dates that were imported from Excel as integers (#66).
- `guess_dates()` gains the argument `modern_excel` to indicate how integer dates should be interpreted.
- `getOption("linelist_guess_orders")` replaces the explicit list of orders in `guess_dates()` for easier access.
- `guess_dates()` no longer throws an error if passed a Date object (#65).
- `guess_dates()` has been better documented to reflect the above changes (#64).
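Integer dates from Excel are day counts from an epoch that depends on the workbook's date system. The sketch below assumes the two common systems: the 1900 system of modern Excel, which base R handles with `origin = "1899-12-30"` (the offset absorbs Excel's fictitious 29 February 1900), and the 1904 system of older Excel for Mac. `excel_to_date()` is a hypothetical helper showing the choice a flag like `modern_excel` has to make; it is not `guess_dates()`.

```r
# Hypothetical helper, not part of linelist: convert Excel serial day
# numbers to Dates.
# modern_excel = TRUE assumes the 1900 date system (origin 1899-12-30,
# correct from serial 61, i.e. 1 March 1900, onward);
# modern_excel = FALSE assumes the 1904 date system.
excel_to_date <- function(x, modern_excel = TRUE) {
  origin <- if (modern_excel) "1899-12-30" else "1904-01-01"
  as.Date(as.numeric(x), origin = origin)
}

excel_to_date(43466)                        # 2019-01-01
excel_to_date(42004, modern_excel = FALSE)  # 2019-01-01
```

The two systems differ by 1462 days, so guessing the wrong one shifts every date by about four years.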
- `clean_spelling()` gains a new keyword: `.na` (or should I say "valueword"). When this keyword is in the values (second) column of the wordlist, the keys will be replaced with a missing (`<NA>`) value. Together with the `.missing` keyword, this is useful for contrasting the presence of an absence with the absence of a presence. See #55 and #57 for details.
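The difference between the `.missing` key and the `.na` value, together with `from`/`to` column selection, can be sketched in base R. `recode_with_wordlist()` is hypothetical: it handles only exact matches plus these two keywords, and it is not how `clean_spelling()` (now in {matchmaker}) is implemented.

```r
# Hypothetical sketch of wordlist-based recoding; not clean_spelling().
# - `from`/`to` select the key and value columns (by position or name)
# - a ".na" value turns the matching key into NA (presence of an absence)
# - a ".missing" key replaces NA and "" cells (absence of a presence)
recode_with_wordlist <- function(x, wordlist, from = 1, to = 2) {
  keys <- as.character(wordlist[[from]])
  values <- as.character(wordlist[[to]])
  values[values == ".na"] <- NA_character_
  out <- as.character(x)
  # remember which cells were originally missing or blank
  was_missing <- is.na(out) | out == ""
  is_missing_key <- keys == ".missing"
  # exact matches on ordinary keys
  idx <- match(out, keys[!is_missing_key])
  hit <- !is.na(idx)
  out[hit] <- values[!is_missing_key][idx[hit]]
  if (any(is_missing_key)) {
    out[was_missing] <- values[is_missing_key][1]
  }
  out
}

wordlist <- data.frame(
  bad  = c("y", "n", "unk", ".missing"),
  good = c("yes", "no", ".na", "missing"),
  stringsAsFactors = FALSE
)
x <- c("y", "n", NA, "", "unk", "maybe")
recode_with_wordlist(x, wordlist)
# the same result with the columns reordered and selected by name
recode_with_wordlist(x, wordlist[, c("good", "bad")], from = "bad", to = "good")
```

Here the recorded "unk" becomes `NA`, while the empty and `NA` cells become "missing"; unmatched values such as "maybe" pass through unchanged.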
- `print.data_comparison()` gains the logical arguments `common_values` and `diff_only` to control the length of print output (see #61).
- `compare_data()` now correctly accounts for different values in variables. Thanks to @ffinger for finding the bug (#56).
- The pre-release in-development numbering scheme was updated to increment only the patch version, indicating ongoing work in progress. Release to CRAN will shift to 0.1.0.
- `compare_data()` now returns a list of variable classes instead of `TRUE` if the classes match (see #53 for details).
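A minimal base-R sketch of the kind of structural comparison `compare_data()` reports (dimensions, column names and per-column classes). `compare_structure()` and the shape of the list it returns are assumptions for illustration; they do not reproduce the `data_comparison` object.

```r
# Hypothetical sketch, not compare_data(): report structural differences
# between a reference data frame and a new version of it.
compare_structure <- function(ref, x) {
  common <- intersect(names(ref), names(x))
  ref_class <- vapply(ref[common], function(col) class(col)[1], character(1))
  x_class <- vapply(x[common], function(col) class(col)[1], character(1))
  changed <- common[ref_class != x_class]
  list(
    dim_ref = dim(ref),
    dim_new = dim(x),
    dropped_cols = setdiff(names(ref), names(x)),
    added_cols = setdiff(names(x), names(ref)),
    # for shared columns whose class changed, record old and new class
    class_changes = data.frame(
      column = changed,
      ref = unname(ref_class[changed]),
      new = unname(x_class[changed]),
      stringsAsFactors = FALSE
    )
  )
}

old <- data.frame(id = 1:3, age = c(10, 20, 30), sex = c("f", "m", "f"))
new <- data.frame(id = 1:3, age = c("10", "20", "30"), onset = Sys.Date() + 0:2)
compare_structure(old, new)
```

Comparing only the shared columns keeps class changes separate from added or dropped columns.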
- `clean_variable_spelling()` now processes global variables before named variables instead of in tandem. This allows the user to define misspellings in the `.global` variable. See reconhub#51 for details.
- `clean_spelling()` no longer throws a warning if there is no value for `.default` to replace.
- `clean_variable_spelling()`, `clean_variables()`, and `clean_data()` gain the `warn` and `warn_spelling` arguments, which capture all errors and warnings issued from `clean_spelling()` for each variable (see reconhub#48 for details).
- `compare_data()` allows users to compare structural changes to data frames. This includes names, classes, dimensions, and values in matching categorical variables (see reconhub#50 for details).
- `top_values()` masks all but the top `n` values in a factor.
- The `crayon` package is added to imports.
- `clean_spelling()` wordlists now allow the optional `.missing` keyword to replace both `NA` and blank ("") cells in the data. Values that are `NA` will be converted to "NA" (character) with a warning. See reconhub#44 and reconhub#45 for details.
- `guess_dates()` can once again parse dates embedded in file names, such as `example_format_2019-02-19.xlsx` (see #43 for details).
- `clean_spelling()` gains a `quiet` argument to suppress warnings.
- `clean_variable_spelling()` no longer errors if there are variable specifications that don't exist in the data. It also suppresses all warnings from `clean_spelling()` (see #41 for details).
- `clean_spelling()` checks the spelling of a vector against a wordlist.
- `clean_variable_spelling()` applies `clean_spelling()` to all specified columns in a data frame.
- `clean_variables()` wraps `clean_variable_labels()` and `clean_variable_spelling()`.
- `clean_data()` can now optionally check labels against a wordlist.
(see #38 for details)
- `mask()` temporarily replaces column names with epivars.
- `unmask()` reverses the effect of `mask()`.
- New Imports: tidyselect and purrr (see #37 for details).
- The `geo` epivar was replaced with `geo_lat` and `geo_lon` (see #35).
- add optional constraints for what columns can be manipulated and make clean_data() faster (see #32)
- use lubridate package to parse dates (see #30)
- The `lookup()` function can look up the column name corresponding to an epivar (see #28).
- `add_epivars()` adds epivars to the global dictionary.
- `add_description()` updates the description of one of the epivars (see #26).
- add `template_linelist()` function (see #24)
- add rio to imports (see #23)
- rename all_dictionary argument to full_dict (see #22)
- re-instate validator of dots (see #21)
- re-instate data validation (see #20)
- restructure linelist class to make dictionary global (see #19)
- dictionary validation and tibble import (see #17)
- new functions to handle epivars (see #16)
- `get_vars()` can take multiple variables (see #15)
- adds linelist class (see #9)
- `guess_dates()` now throws an appropriate error if a vector is passed instead of a data frame. See reconhub#4 for details.
- Added a `NEWS.md` file to track changes to the package.