stri_opts_collator: Generate a List with Collator Settings
Description
A convenience function to tune the ICU Collator’s behavior, e.g., in stri_compare, stri_order, stri_unique, stri_duplicated, as well as stri_detect_coll and other stringi-search-coll functions.
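A single settings list can be built once and passed as opts_collator to several of the functions named above. The sketch below is a minimal illustration: the word list is arbitrary, and locale = "en" is set explicitly so the results do not depend on the system's default locale.

```r
library(stringi)

# Build one collator configuration and reuse it. strength = 1 compares
# base letters only, so accents and case are ignored.
opts <- stri_opts_collator(locale = "en", strength = 1)

words <- c("resume", "Resume", "r\u00e9sum\u00e9", "cv")  # 3rd item is "résumé"

# stri_unique() keeps the first element of each group of collation-equal strings
u <- stri_unique(words, opts_collator = opts)         # "resume" "cv"

# stri_duplicated() flags the later members of such groups
d <- stri_duplicated(words, opts_collator = opts)     # FALSE TRUE TRUE FALSE

# stri_detect_coll() applies the same settings to pattern matching
found <- stri_detect_coll("See the attached R\u00e9sum\u00e9.", "resume",
                          opts_collator = opts)       # TRUE
```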
Usage
stri_opts_collator(
locale = NULL,
strength = 3L,
alternate_shifted = FALSE,
french = FALSE,
uppercase_first = NA,
case_level = FALSE,
normalization = FALSE,
normalisation = normalization,
numeric = FALSE,
...
)
stri_coll(
locale = NULL,
strength = 3L,
alternate_shifted = FALSE,
french = FALSE,
uppercase_first = NA,
case_level = FALSE,
normalization = FALSE,
normalisation = normalization,
numeric = FALSE,
...
)
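stri_coll() takes exactly the same arguments as stri_opts_collator(), so the two calls below should yield the same settings list. This is a small sketch; the file names are made up for illustration.

```r
library(stringi)

# Both constructors should build the same list of settings.
a <- stri_opts_collator(locale = "en", numeric = TRUE)
b <- stri_coll(locale = "en", numeric = TRUE)
same <- identical(a, b)  # TRUE

# The shorter name keeps inline option lists compact:
ord <- stri_order(c("file10", "file2", "file1"),
                  opts_collator = stri_coll(locale = "en", numeric = TRUE))
# ord is 3 2 1, i.e., "file1", "file2", "file10"
```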
Arguments
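| Argument | Description |
|---|---|
| locale | single string, NULL or '' for the default locale |
| strength | single integer in {1,2,3,4}, which defines collation strength; 1 for the most permissive collation rules, 4 for the strictest ones |
| alternate_shifted | single logical value; FALSE treats all the code points with non-ignorable primary weights in the same way, TRUE causes code points with primary weights that are equal to or below the variable top value to be ignored on the primary level and moved to the quaternary level |
| french | single logical value; used in Canadian French; TRUE results in secondary weights being considered backwards |
| uppercase_first | single logical value; NA orders upper and lower case letters in accordance with their tertiary weights, TRUE forces upper case letters to sort before lower case letters, FALSE does the opposite |
| case_level | single logical value; controls whether an extra case level (positioned before the third level) is generated or not |
| normalization | single logical value; if TRUE, an incremental check is performed to see whether the input data is in the FCD form; if it is not, incremental NFD normalization is performed |
| normalisation | alias of normalization |
| numeric | single logical value; when turned on, this attribute generates a collation key for the numeric value of substrings of digits; this is a way to get ‘100’ to sort AFTER ‘2’; note that negative or non-integer numbers will not be ordered properly |
| ... | [DEPRECATED] any other arguments passed to this function generate a warning; this argument will be removed in the future |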
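The effects of strength, case_level, french, and uppercase_first can be checked with a few small comparisons. This is a sketch only: the words are arbitrary, and locale = "en" is passed explicitly so the results do not depend on the system locale.

```r
library(stringi)

plain    <- "resume"
capital  <- "Resume"
accented <- "r\u00e9sum\u00e9"  # "résumé"

# strength: each level adds a distinction (1 = base letters, 2 = + accents, 3 = + case)
s3  <- stri_cmp_equiv(plain, capital, locale = "en", strength = 3)   # FALSE: case differs
s2  <- stri_cmp_equiv(plain, capital, locale = "en", strength = 2)   # TRUE: case ignored
s2a <- stri_cmp_equiv(plain, accented, locale = "en", strength = 2)  # FALSE: accents differ
s1a <- stri_cmp_equiv(plain, accented, locale = "en", strength = 1)  # TRUE: accents ignored

# case_level = TRUE with strength = 1: accents are ignored, but case still counts
c1 <- stri_cmp_equiv(plain, capital, locale = "en", strength = 1, case_level = TRUE)   # FALSE
c2 <- stri_cmp_equiv(plain, accented, locale = "en", strength = 1, case_level = TRUE)  # TRUE

# french = TRUE compares accents (secondary weights) starting from the end of the string
x  <- c("cote", "cot\u00e9", "c\u00f4te", "c\u00f4t\u00e9")  # cote, coté, côte, côté
f0 <- stri_sort(x, locale = "en")                 # cote, coté, côte, côté
f1 <- stri_sort(x, locale = "en", french = TRUE)  # cote, côte, coté, côté

# uppercase_first = TRUE puts upper case first among otherwise equal letters
y  <- c("b", "A", "a", "B")
u0 <- stri_sort(y, locale = "en")                          # "a" "A" "b" "B"
u1 <- stri_sort(y, locale = "en", uppercase_first = TRUE)  # "A" "a" "B" "b"
```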
Details
ICU’s collator performs locale-aware, natural-language-like string comparison. It establishes relationships between strings more reliably than the comparison provided by base R, and it is far more appropriate for natural-language text than ordinary bytewise comparison, although it is also more complex.
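The difference from bytewise ordering shows up even on plain ASCII words. The sketch below contrasts base R's radix sort, which orders character vectors in C-locale (bytewise) order regardless of the session locale, with stri_sort() under an English collator; the word list is arbitrary.

```r
library(stringi)

x <- c("cherry", "apple", "Banana")

# C-locale (bytewise) order: every upper-case ASCII letter precedes
# every lower-case one, so "Banana" comes first.
bytewise <- sort(x, method = "radix")    # "Banana" "apple" "cherry"

# ICU collation orders by letter first and considers case only afterwards.
collated <- stri_sort(x, locale = "en")  # "apple" "Banana" "cherry"
```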
Value
Returns a named list object; settings that are not specified explicitly are left at their default values.
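Because the result is an ordinary R list, it can be inspected, stored, and passed to any function that accepts opts_collator. A quick sketch; the particular settings and strings are arbitrary.

```r
library(stringi)

opts <- stri_opts_collator(locale = "en", strength = 1, numeric = TRUE)
str(opts)  # show which entries the list holds

# Reuse: case is ignored (strength = 1) and digit runs compare by value (numeric),
# so "item 9" sorts before "Item 10".
lt <- stri_cmp_lt("item 9", "Item 10", opts_collator = opts)  # TRUE
```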
References
Collation – ICU User Guide, https://unicode-org.github.io/icu/userguide/collation/
ICU Collation Service Architecture – ICU User Guide, https://unicode-org.github.io/icu/userguide/collation/architecture.html
icu::Collator Class Reference – ICU4C API Documentation, https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/classicu_1_1Collator.html
See Also
The official online manual of stringi at https://stringi.gagolewski.com/
Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1-59, doi:10.18637/jss.v103.i02
Other locale_sensitive: %s<%(), about_locale, about_search_boundaries, about_search_coll, stri_compare(), stri_count_boundaries(), stri_duplicated(), stri_enc_detect2(), stri_extract_all_boundaries(), stri_locate_all_boundaries(), stri_order(), stri_rank(), stri_sort_key(), stri_sort(), stri_split_boundaries(), stri_trans_tolower(), stri_unique(), stri_wrap()
Other search_coll: about_search_coll, about_search
Examples
stri_cmp('number100', 'number2')
## [1] -1
stri_cmp('number100', 'number2', opts_collator=stri_opts_collator(numeric=TRUE))
## [1] 1
stri_cmp('number100', 'number2', numeric=TRUE) # equivalent
## [1] 1
stri_cmp('above mentioned', 'above-mentioned')
## [1] -1
stri_cmp('above mentioned', 'above-mentioned', alternate_shifted=TRUE)
## [1] 0
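The caveat on numeric = TRUE can be seen directly: ICU compares each run of digits as a non-negative integer, so a minus sign or a decimal point is treated as ordinary punctuation. A sketch; the comments assume the "en" collator.

```r
library(stringi)

# Negative numbers: "-" is punctuation, not a sign, so "-10" lands after "-1".
neg <- stri_sort(c("2", "-10", "-1"), locale = "en", numeric = TRUE)
# "-1" "-10" "2"  (numerically expected: "-10" "-1" "2")

# Decimal fractions: the digits after "." form a separate integer,
# so "1.10" (ten) lands after "1.5" (five).
dec <- stri_sort(c("1.10", "1.5"), locale = "en", numeric = TRUE)
# "1.5" "1.10"  (numerically expected: "1.10" "1.5")
```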