Showing 241 changed files with 32,575 additions and 3,485 deletions.
@@ -3,7 +3,7 @@
^\.Rproj\.user$
^.*\.kdev4$
^\.kdev4
-^devel
+^.devel
kate-swp$
^README
LICENSE
File renamed without changes.
Binary file not shown.
@@ -0,0 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 90245c3729073b508ce8579ed96408b3
tags: 645f666f9bcd5a90fca523b33c5a78b7
@@ -0,0 +1,130 @@
stringx: Drop-in replacements for base R string functions powered by stringi
============================================================================

.. epigraph::

    English is the native language of only 5% of the world's population.
    Also, only 17% of us can understand this text. Moreover, the Latin alphabet
    is the main one for merely 36% of the total. The early computer era,
    now a very long time ago, was dominated by the US. Due to the proliferation
    of the internet, smartphones, social media, and other technologies
    and communication platforms, this is no longer the case.

    This package replaces base R string functions with ones that fully
    support the Unicode standards related to natural language
    and date-time processing.
    Thanks to `ICU <https://icu.unicode.org/>`_
    (International Components for Unicode) and
    `stringi <https://stringi.gagolewski.com/>`_,
    they are fast, reliable, and portable across different platforms.


`R <https://www.r-project.org/>`_'s ambitions go far beyond being merely the
"free software environment for statistical computing and graphics".
It has proven effective in developing whole data analysis pipelines:
from gathering information through the discovery of knowledge to
the communication of results.

**Modern data science is no longer just about number crunching.**
Text is a rich source of new knowledge, from natural language
processing to bioinformatics. It also gives powerful
means to represent or transfer unstructured data.

**stringx brings R string processing abilities into the 21st century.**
It replaces functions like ``paste()``, ``grep()``, ``tolower()``,
``strptime()``, and ``sprintf()`` with ones that:

* support a wide range of languages and scripts and
  fully conform to `Unicode <https://www.unicode.org/>`_ standards
  (see also `this video <https://www.youtube.com/watch?v=-n2nlPHEMG8>`_),
* work in the same way on every platform,
* fix some long-standing inconsistencies in the base R functions
  (related to vectorisation, handling of missing values,
  preservation of attributes, order of arguments, interoperability
  with other procedures, etc.;
  they are all thoroughly documented in this online manual,
  happy reading! 🤓),
* are friendlier to the forward-pipe operators (``|>`` or ``magrittr::%>%``).

A few new, useful operations are introduced as well.

.. code-block:: r

    install.packages("stringx")  # install from CRAN

    suppressMessages(library("stringx"))
    c("ACTGCT", "42", "stringx \U0001f970") |> grepv2("\\p{EMOJI_PRESENTATION}")
    ## [1] "stringx 🥰"

    toupper("gro\u00DF")  # replaces base::toupper()
    ## [1] "GROSS"

    l <- c("e", "e\u00b2", "\u03c0", "\u03c0\u00b2", "\U0001f602\U0001f603")
    r <- c(exp(1), exp(2), pi, pi^2, NaN)
    cat(sprintf("%8s=%+.3f", l, r), sep="\n")  # replaces base::sprintf()
    ##        e=+2.718
    ##       e²=+7.389
    ##        π=+3.142
    ##       π²=+9.870
    ##     😂😃= NaN

.. COMMENT

    but we do not aim to fix the whole nam.ING_meSS

    99% compatible (cannot be 100% as they use a different regex engine,
    for example, and some inconsistencies are quite obvious and can be a push
    for a change in the right direction)

    * collator - portable (locales), Unicode-correct (normalisation)
    * date/time - portable (locales)
    * iconv - portable
    * regex - Unicode-correct, portable
    * speed

    TODO: mention https://unicode-org.github.io/icu/userguide/icu/posix.html

**stringx** is a set of wrappers around
`stringi <https://stringi.gagolewski.com/>`_, a mature
`R <https://www.r-project.org/>`_ package for
fast, consistent, convenient, and portable string/text/natural language
processing in any locale. *stringi*, in turn, relies on
`ICU – International Components for Unicode <https://icu.unicode.org/>`_.

*stringx*'s source code is hosted on
`GitHub <https://github.com/gagolews/stringx>`_. Its official releases
are available on `CRAN <https://cran.r-project.org/package=stringx>`_.
It is distributed under the terms of the GNU General Public License,
either Version 2 or Version 3; see
`license <https://raw.githubusercontent.com/gagolews/stringx/master/LICENSE>`_.


.. toctree::
    :maxdepth: 2
    :caption: stringx
    :hidden:

    About <self>
    Author <https://www.gagolewski.com/>


.. toctree::
    :maxdepth: 2
    :caption: Reference Manual
    :glob:

    rapi/*

.. rapi.md

.. toctree::
    :maxdepth: 1
    :caption: Other

    Source Code (GitHub) <https://github.com/gagolews/stringx>
    Bug Tracker and Feature Suggestions <https://github.com/gagolews/stringx/issues>
    CRAN Entry <https://cran.r-project.org/package=stringx>
    news.md

.. COMMENT

    .. |downloads1| image:: https://cranlogs.r-pkg.org/badges/grand-total/stringx
    .. |downloads2| image:: https://cranlogs.r-pkg.org/badges/last-month/stringx
@@ -0,0 +1,96 @@
# What Is New in *stringx*

> Note that the date-time processing functions in *stringx* are a work
> in progress. Feature requests, comments, and remarks are welcome;
> see <https://github.com/gagolews/stringx/issues>.


## 0.2.3 (2022-10-13)

* [BUGFIX] Fixed failing checks/tests.


## 0.2.2 (2021-09-03)

* [DOCUMENTATION] The ICU Project site has moved to <https://icu.unicode.org/>.


## 0.2.1 (2021-08-27)

* [BACKWARD INCOMPATIBILITY, BUGFIX] #7: Dates without times are now always
  treated as being at midnight in the local (default) time zone.

* [BACKWARD INCOMPATIBILITY] Date-time functions now yield objects
  of class `POSIXxt`, which extends `POSIXct` (and allows for custom
  formatting, etc.).

* [BACKWARD INCOMPATIBILITY, BUGFIX] #7: `strftime` uses the `tzone` attribute
  by default.

* [NEW FEATURE] Added functions: `as.POSIXxt`, `is.POSIXxt`,
  `Sys.time`, `ISOdatetime`, `ISOdate`, `Ops.POSIXxt`,
  `c.POSIXxt`, `rep.POSIXxt`, `seq.POSIXxt`.


## 0.1.3 (2021-08-05)

* [BUGFIX] #4: Fixed failing check with ICU 55.

* [BUGFIX] #5: Fixed failing check under POSIX/C locale.


## 0.1.2 (2021-07-27)

* First [CRAN](https://cran.r-project.org/package=stringx) release.


## 0.1.1 (2021-07-15)

* [GENERAL] [Online manual](https://stringx.gagolewski.com) is now available.

* [GENERAL] Using [*realtest*](https://realtest.gagolewski.com)
  for documenting base R behaviour, unit testing, and desired outcomes.

* [NEW FEATURE] Added constants: `letters_greek`, `digits_hex`, etc.

* [NEW FEATURE] Added functions and operators:
  `strcat`, `%x+%`, `%x*%`,
  `chartr2`, `strtrans`,
  `printf`,
  `xtfrm2`,
  `strftime`, `strptime`,
  `strcoll`, `%x==%`, `%x!=%`, `%x<%`, `%x<=%`, `%x>%`, `%x>=%`,
  `substrl`, `substrl<-`,
  `sub2`, `gsub2`,
  `grepl2`, `grepv2`, `grepv2<-`,
  `regexpr2`, `gregexpr2`,
  `regexec2`, `gregexec2`,
  `gsubstrl`, `gsubstrl<-`,
  `gsubstr`, `gsubstr<-`,
  `regextr2`, `regextr2<-`,
  `gregextr2`, `gregextr2<-`.

* [NEW FEATURE] Rewritten functions:
  `paste`, `paste0`,
  `strrep`,
  `chartr`, `tolower`, `toupper`, `casefold`,
  `sprintf`,
  `strftime`, `strptime`,
  `nchar`, `nzchar`,
  `strtrim`,
  `trimws`,
  `startsWith`, `endsWith`,
  `sort`,
  `strwrap`,
  `substr`, `substring`, `substr<-`, `substring<-`,
  `strsplit`,
  `sub`, `gsub`,
  `grep`, `grepl`,
  `regexpr`, `gregexpr`,
  `regexec`, `gregexec`.
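
As a rough illustration of a few of the operators and functions listed for this release, consider the sketch below. It assumes that *stringx* is installed; the input strings are made up for this example and do not come from the package's own documentation.

```r
library("stringx")

# `%x+%` concatenates strings element-wise; `%x*%` repeats them.
"abc" %x+% "def"
## [1] "abcdef"
"ab" %x*% 3
## [1] "ababab"

# `%x<%` and its siblings compare strings using the ICU collator.
"a" %x<% "b"
## [1] TRUE

# Once the package is attached, `paste` refers to the stringx version.
paste(c("a", "b"), 1:2, sep = "-")
```

Attaching the package masks the base functions of the same name; calling `base::paste()` explicitly still reaches the original.
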

## 0.0.0 (2021-05-07)

* The *stringx* project has been started.
File renamed without changes.
.devel/sphinx/_build/html/_sources/rapi/ISOdatetime.md.txt (80 changes: 80 additions & 0 deletions)
@@ -0,0 +1,80 @@
# ISOdatetime: Construct Date-time Objects

## Description

`ISOdate` and `ISOdatetime` construct date-time objects from numeric representations. `Sys.time` returns the current time.

## Usage

``` r
ISOdatetime(
    year,
    month,
    day,
    hour,
    min,
    sec,
    tz = "",
    lenient = FALSE,
    locale = NULL
)

ISOdate(
    year,
    month,
    day,
    hour = 0L,
    min = 0L,
    sec = 0L,
    tz = "",
    lenient = FALSE,
    locale = NULL
)

Sys.time()
```

## Arguments

|                                    |                                                                                                                                                                                                                                                                         |
|------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `year, month, day, hour, min, sec` | numeric vectors                                                                                                                                                                                                                                                         |
| `tz`                               | `NULL` or `''` for the default time zone (see [`stri_timezone_get`](https://stringi.gagolewski.com/rapi/stri_timezone_set.html)) or a single string with a timezone identifier, see [`stri_timezone_list`](https://stringi.gagolewski.com/rapi/stri_timezone_list.html) |
| `lenient`                          | single logical value; should date/time parsing be lenient?                                                                                                                                                                                                              |
| `locale`                           | `NULL` or `''` for the default locale (see [`stri_locale_get`](https://stringi.gagolewski.com/rapi/stri_locale_set.html)) or a single string with a locale identifier, see [`stri_locale_list`](https://stringi.gagolewski.com/rapi/stri_locale_list.html)              |

## Value

These functions return an object of class `POSIXxt`, which extends [`POSIXct`](https://stat.ethz.ch/R-manual/R-devel/library/base/help/POSIXct.html); see also [`strptime`](strptime.md).

You might wish to consider calling [`as.Date`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/as.Date.html) on the result yielded by `ISOdate`.

No attributes are preserved (because there are too many of them).

## Differences from Base R

Replacements for base [`ISOdatetime`](https://stat.ethz.ch/R-manual/R-devel/library/base/help/ISOdatetime.html) and [`ISOdate`](https://stat.ethz.ch/R-manual/R-devel/library/base/help/ISOdate.html) implemented with [`stri_datetime_create`](https://stringi.gagolewski.com/rapi/stri_datetime_create.html).

- base `ISOdate` does not treat dates as being at midnight by default (its `hour` argument defaults to 12) **\[fixed here\]**
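
The difference can be seen by calling both versions side by side. This is a minimal sketch, assuming *stringx* is installed; the printed *stringx* result depends on the default time zone of the machine, so no output is shown for it.

```r
# Base R: `hour` defaults to 12 and `tz` to "GMT", so the result is noon GMT.
b <- base::ISOdate(1970, 1, 1)
format(b, "%H:%M", tz = "GMT")
## [1] "12:00"

# stringx: `hour`, `min`, and `sec` default to 0, so the result is
# midnight in the default time zone.
x <- stringx::ISOdate(1970, 1, 1)
print(x)
```
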

## Author(s)

[Marek Gagolewski](https://www.gagolewski.com/)

## See Also

The official online manual of <span class="pkg">stringx</span> at <https://stringx.gagolewski.com/>

Related function(s): [`strptime`](strptime.md)

## Examples

```r
ISOdate(1970, 1, 1)
## [1] "1970-01-01T00:00:00+1000"
ISOdatetime(1970, 1, 1, 12, 0, 0)
## [1] "1970-01-01T12:00:00+1000"
```
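
The output above reflects the default time zone of the machine on which the manual was built (UTC+10). For results that do not depend on the local settings, an explicit `tz` can be passed. The following is a minimal sketch, assuming *stringx* is installed and that `"UTC"` is accepted as a time zone identifier:

```r
library("stringx")

# Noon on 1 January 1970 in UTC, i.e., 12 * 3600 seconds after the epoch.
x <- ISOdatetime(1970, 1, 1, 12, 0, 0, tz = "UTC")
is.POSIXxt(x)
as.numeric(x)  # seconds since 1970-01-01T00:00:00 UTC
```
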
.devel/sphinx/_build/html/_sources/rapi/about_stringx.md.txt (25 changes: 25 additions & 0 deletions)
@@ -0,0 +1,25 @@
# about_stringx: Drop-in Replacements for Base String Functions Powered by Stringi

## Description

<span class="pkg">stringx</span> reimplements the built-in R string processing functions based on <span class="pkg">stringi</span> -- a mature R package for fast, correct, consistent, and convenient text manipulation. Thanks to the <span class="pkg">ICU</span> library, we obtain predictable results on every platform, in each locale, and under any native character encoding.

**Keywords**: R, text processing, character strings, internationalisation, localisation, ICU, ICU4C, i18n, l10n, Unicode

**License**: GNU General Public License version 2 or later

## Author(s)

[Marek Gagolewski](https://www.gagolewski.com/)

## References

*<span class="pkg">stringi</span> Package homepage*, <https://stringi.gagolewski.com/>

*ICU -- International Components for Unicode*, <https://icu.unicode.org/>

*The Unicode Consortium*, <https://home.unicode.org/>

## See Also

The official online manual of <span class="pkg">stringx</span> at <https://stringx.gagolewski.com/>