Replace domains information #134

Open
vidonne opened this issue Jan 31, 2024 · 8 comments
Labels
enhancement New feature or request

Comments

@vidonne

vidonne commented Jan 31, 2024

Is there a way to replace the values in a feature service with the corresponding domain values?

For example, when I run the following, I get the code for the "iso3" column:

library(arcgislayers)

furl <- "https://gis.unhcr.org/arcgis/rest/services/core_v2/wrl_polbnd_int_15m_a_unhcr/FeatureServer/0"

pop_fl <- arc_open(furl)

arc_select(pop_fl, fields = c("iso3"), where = "iso3='SSD'")
#> Simple feature collection with 1 feature and 1 field
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 2696545 ymin: 388463.4 xmax: 4001680 ymax: 1371522
#> Projected CRS: WGS 84 / Pseudo-Mercator
#>   iso3                       geometry
#> 1  SSD MULTIPOLYGON (((3797469 106...

But it would be good to be able to get the associated domain values, as you can with the {esri2sf} package:

library(esri2sf)

furl <- "https://gis.unhcr.org/arcgis/rest/services/core_v2/wrl_polbnd_int_15m_a_unhcr/FeatureServer/0"

esri2sf(furl, outFields = c("iso3"), where = "iso3='SSD'", replaceDomainInfo = TRUE)
#> Layer Type: Feature Layer
#> Geometry Type: esriGeometryPolygon
#> Service Coordinate Reference System: 3857
#> Output Coordinate Reference System: 4326
#> Simple feature collection with 1 feature and 1 field
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 24.22348 ymin: 3.487471 xmax: 35.9477 ymax: 12.22673
#> Geodetic CRS:  WGS 84
#>          iso3                          geoms
#> 1 South Sudan MULTIPOLYGON (((34.11325 9....

Did I miss it somewhere in the documentation, or is there a web-service-related way to do so? I know that "iso3" is not the best use case for this, but it's a simple example, and we do have services that code location or population types, where it would be good to get the label and not the code out of the arcgislayers call.

Thanks for the great package and your support on this.
Cedric

@vidonne vidonne added the enhancement New feature or request label Jan 31, 2024
@JosiahParry
Collaborator

Thanks, Cedric! There's no support for it yet. But this is really great feedback. I'd prefer to handle this outside of the arc_select() function, to keep the scope of each function as minimal as possible.

It would probably be something like domain_substitute(feature_layer, .data, vars = c("a", "b", "c")).

Here's how you can accomplish it today, though!

library(dplyr)
library(arcgislayers)

furl <- "https://gis.unhcr.org/arcgis/rest/services/core_v2/wrl_polbnd_int_15m_a_unhcr/FeatureServer/0"

pop_fl <- arc_open(furl)

res <- arc_select(pop_fl, fields = c("iso3"), n_max = 10)

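# start replacing domains
var_names <- setdiff(names(res), attr(res, "sf_column"))
non_null_domains <- list_fields(pop_fl) |>
  # `domain` is a list column, so test each element for NULL
  filter(name %in% var_names, !vapply(domain, is.null, logical(1))) |>
  select(name, domain) |>
  tibble::deframe()

# iterate by field name; a domain's own `name` can differ from the field name
for (field_name in names(non_null_domains)) {
  .x <- non_null_domains[[field_name]]
  # create a lookup table: names are codes, values are labels
  lu <- .x$codedValues[, 2:1] |>
    tibble::deframe()

  # modify in place
  res[[field_name]] <- unname(lu[res[[field_name]]])
}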

res
#> Simple feature collection with 10 features and 1 field
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -6278153 ymin: -7018201 xmax: 18710210 ymax: 5957233
#> Projected CRS: WGS 84 / Pseudo-Mercator
#>                                       iso3                       geometry
#> 1             No code (ISO user specified) MULTIPOLYGON (((8835003 390...
#> 2             No code (ISO user specified) MULTIPOLYGON (((10836631 32...
#> 3                     Norfolk Island (AUS) MULTIPOLYGON (((18706427 -3...
#> 4             No code (ISO user specified) MULTIPOLYGON (((3693874 251...
#> 5             No code (ISO user specified) MULTIPOLYGON (((3970901 262...
#> 6             No code (ISO user specified) MULTIPOLYGON (((8788443 394...
#> 7                   Christmas Island (AUS) MULTIPOLYGON (((11768307 -1...
#> 8             No code (ISO user specified) MULTIPOLYGON (((3207325 106...
#> 9           Saint Pierre et Miquelon (FRA) MULTIPOLYGON (((-6271573 59...
#> 10 Heard Island and McDonald Islands (AUS) MULTIPOLYGON (((8195908 -70...

Created on 2024-01-31 with reprex v2.0.2

@JosiahParry
Collaborator

There's probably a use case for modifying them from code to label and back. One use case would be getting data from an external source that needs to be appended or updated in a feature service. The data you get uses the labels and not the code.

Perhaps domain_encode() and domain_decode() could be provided?
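
A minimal sketch of what such a pair could look like, assuming a coded-value domain has already been reduced to a data frame with `code` and `name` columns. `domain_encode()` and `domain_decode()` are only the hypothetical names floated above, not existing arcgislayers functions, and the toy domain is made up for illustration:

```r
# Hypothetical helpers; not part of arcgislayers.
# `coded_values` is assumed to be a data frame with `code` and `name` columns.

# code -> label
domain_decode <- function(x, coded_values) {
  lu <- stats::setNames(coded_values[["name"]], coded_values[["code"]])
  unname(lu[as.character(x)])
}

# label -> code
domain_encode <- function(x, coded_values) {
  lu <- stats::setNames(coded_values[["code"]], coded_values[["name"]])
  unname(lu[as.character(x)])
}

# toy domain, made up for illustration
cv <- data.frame(
  code = c("SSD", "KEN"),
  name = c("South Sudan", "Kenya"),
  stringsAsFactors = FALSE
)

# values without a match in the domain come back as NA
labels <- domain_decode(c("SSD", "KEN", "XXX"), cv)
codes <- domain_encode(labels, cv)
```

Both directions are the same named-vector lookup with names and values swapped, so the two helpers could plausibly share one internal function.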

@vidonne
Author

vidonne commented Jan 31, 2024

Thanks for the feedback and current solution, really helpful.
I don't really have a strong position on the implementation but it would be a nice thing to have.
Thanks again for the great work on this package.

@dickoa

dickoa commented Feb 1, 2024

Another option is to encode domains using the {labelled} package. It adds a new dependency but provides a mechanism to switch between labels and values. It's also supported by many packages, since {haven} uses it for labelled data from Stata, SPSS, SAS, etc.

With labelled columns, you'll have this type of output:

arc_select(pop_fl, fields = "iso3", where = "iso3='SSD'")
#> Simple feature collection with 1 feature and 1 field
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 2696545 ymin: 388463.4 xmax: 4001680 ymax: 1371522
#> Projected CRS: WGS 84 / Pseudo-Mercator
#>   iso3                       geometry
#>  <chr+lbl>              <MULTIPOLYGON [m]> 
#> 1  SSD [South Sudan] MULTIPOLYGON (((3797469 106...

And you can play with methods like labelled::to_character/labelled::to_factor (applied to specific columns or to the whole data) to get the labels on-demand.
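
As a rough illustration of that switch, here is a small sketch on a toy vector rather than a real query result. It assumes the {labelled} package is installed; the single code/label pair is taken from the example above:

```r
library(labelled)

# toy column of codes with one value label attached
iso3 <- labelled::labelled(
  c("SSD", "SSD"),
  labels = c("South Sudan" = "SSD")
)

# switch to the label representation on demand
iso3_chr <- labelled::to_character(iso3)
iso3_fct <- labelled::to_factor(iso3)
```

The underlying codes stay in the column, so the labels only replace them when one of the conversion functions is called.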

@elipousson
Contributor

FWIW - esri2sf has some code for handling domain information where available. I believe @jacpete contributed this code to the original version of esri2sf but I've never touched it myself: https://github.com/elipousson/esri2sf/blob/master/R/addDomainInfo.R

@elipousson
Contributor

I am interested in putting in some time to implement this feature. I put together draft list_field_domains() and pull_coded_values() functions that could help.

library(arcgislayers)

list_field_domains <- function(x, field = NULL, keep_null = FALSE) {
  fields <- list_fields(x)
  nm <- fields[["name"]]
  
  domains <- rlang::set_names(
    fields[["domain"]],
    nm
  )
  
  if (!is.null(field)) {
    # validate the requested field names against the layer's field names
    field <- rlang::arg_match(field, values = nm, multiple = TRUE)
    domains <- domains[nm %in% field]
  }
  
  if (keep_null) {
    return(domains)
  }
  
  domains[!vapply(domains, is.null, logical(1))]
}

pull_coded_values <- function(x, field = NULL) {
  domains <- list_field_domains(x, field = field, keep_null = FALSE)
  
  domains <- lapply(
    domains,
    \(x)  {
      if (x[["type"]] != "codedValue") {
        return(NULL)
      }
      
      values <- x[["codedValues"]]
      
      rlang::set_names(
        values[["code"]],
        values[["name"]]
      )
    }
  )
  
  domains
}


layer <- arcgislayers::arc_open(
  "https://geodata.baltimorecity.gov/egis/rest/services/Housing/dmxOwnership/MapServer/0"
)

list_field_domains(layer)
#> $RESPAGCY
#> $RESPAGCY$type
#> [1] "codedValue"
#> 
#> $RESPAGCY$name
#> [1] "Responsible Agency_1_1"
#> 
#> $RESPAGCY$description
#> [1] "Responsible Agency"
#> 
#> $RESPAGCY$codedValues
#>                            name code
#> 1                    Unassigned   01
#> 2                   BC Hospital   02
#> 3                           DGS   03
#> 4             Docks and Wharves   04
#> 5                     Education   05
#> 6               Fire Department   07
#> 7                     Libraries   06
#> 8                     Mayoralty   08
#> 9                        Health   10
#> 10                     Highways   11
#> 11                         Jail   12
#> 12                       M & CS   13
#> 13                   Mec / Elec   14
#> 14                      Museums   15
#> 15                  Rec & Parks   18
#> 16                   Sanitation   19
#> 17                       Sewers   20
#> 18      Supervisor of Elections   21
#> 19        Post Mortem Examiners   22
#> 20                 Water Supply   23
#> 21                       Police   24
#> 22               Central Garage   25
#> 23                          BCC   26
#> 24               Urban Services   27
#> 25      War Memorial Commission   28
#> 26                     Aquarium   29
#> 27                    Tax Sales   30
#> 28        Excess Street Opening   31
#> 29                  Unallocated   32
#> 30           Off Street Parking   33
#> 31                      NPA/HCD   34
#> 32              Social Services   35
#> 33 Civic and Convention Centers   36
#> 34            Transit & Traffic   37
#> 35                      Finance   38
#> 36         Commission for Aging   39
#> 37                          BDC   43
#> 
#> $RESPAGCY$mergePolicy
#> [1] "esriMPTDefaultValue"
#> 
#> $RESPAGCY$splitPolicy
#> [1] "esriSPTDefaultValue"

pull_coded_values(layer)
#> $RESPAGCY
#>                   Unassigned                  BC Hospital 
#>                         "01"                         "02" 
#>                          DGS            Docks and Wharves 
#>                         "03"                         "04" 
#>                    Education              Fire Department 
#>                         "05"                         "07" 
#>                    Libraries                    Mayoralty 
#>                         "06"                         "08" 
#>                       Health                     Highways 
#>                         "10"                         "11" 
#>                         Jail                       M & CS 
#>                         "12"                         "13" 
#>                   Mec / Elec                      Museums 
#>                         "14"                         "15" 
#>                  Rec & Parks                   Sanitation 
#>                         "18"                         "19" 
#>                       Sewers      Supervisor of Elections 
#>                         "20"                         "21" 
#>        Post Mortem Examiners                 Water Supply 
#>                         "22"                         "23" 
#>                       Police               Central Garage 
#>                         "24"                         "25" 
#>                          BCC               Urban Services 
#>                         "26"                         "27" 
#>      War Memorial Commission                     Aquarium 
#>                         "28"                         "29" 
#>                    Tax Sales        Excess Street Opening 
#>                         "30"                         "31" 
#>                  Unallocated           Off Street Parking 
#>                         "32"                         "33" 
#>                      NPA/HCD              Social Services 
#>                         "34"                         "35" 
#> Civic and Convention Centers            Transit & Traffic 
#>                         "36"                         "37" 
#>                      Finance         Commission for Aging 
#>                         "38"                         "39" 
#>                          BDC 
#>                         "43"

Created on 2024-11-14 with reprex v2.1.1

@JosiahParry
Collaborator

Thanks, @elipousson! pull_coded_values() has the lookup vector order backwards if we want to use it to replace values: the names need to be the codes and the values the labels.

I'm wondering what we want users to do next with this. What if we wrap this all up in a recode_layer_df()? The idea is that you can recode after (note: after, not during) having brought some features into memory.

The workflow could look something like this:

Function definitions
library(arcgislayers)

list_field_domains <- function(x, field = NULL, keep_null = FALSE) {
  fields <- list_fields(x)
  nm <- fields[["name"]]
  
  domains <- rlang::set_names(
    fields[["domain"]],
    nm
  )
  
  if (!is.null(field)) {
    # validate the requested field names against the layer's field names
    field <- rlang::arg_match(field, values = nm, multiple = TRUE)
    domains <- domains[nm %in% field]
  }
  
  if (keep_null) {
    return(domains)
  }

  domains[!vapply(domains, is.null, logical(1))]
}

pull_coded_values <- function(x, field = NULL) {
  domains <- list_field_domains(x, field = field, keep_null = FALSE)
  
  domains <- lapply(
    domains,
    \(x)  {
      if (x[["type"]] != "codedValue") {
        return(NULL)
      }

      values <- x[["codedValues"]]
      rlang::set_names(
        values[["name"]], 
        values[["code"]]
      )
    }
  )
  domains
}

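recode_layer_df <- function(.data, .layer) {
  encodings <- pull_coded_values(.layer)
  # drop fields whose domain is not a coded-value domain (returned as NULL)
  encodings <- encodings[!vapply(encodings, is.null, logical(1))]
  # only recode columns that are actually present in the data
  to_replace <- intersect(names(encodings), names(.data))

  for (col in to_replace) {
    # look up each code's label; codes without a match become NA
    .data[[col]] <- unname(encodings[[col]][as.character(.data[[col]])])
  }
  .data
}

layer <- arcgislayers::arc_open(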
  "https://geodata.baltimorecity.gov/egis/rest/services/Housing/dmxOwnership/MapServer/0"
)

my_df <- arc_select(layer, n_max = 1000)

recode_layer_df(my_df, layer) |> 
  sf::st_drop_geometry() |> 
  dplyr::count(RESPAGCY)
#>    RESPAGCY   n
#> 1 Education   1
#> 2   NPA/HCD  50
#> 3 Tax Sales  12
#> 4      <NA> 937

@elipousson
Contributor

I opened a PR that exports something similar to recode_layer_df() but also (mostly?) supports the value labelling from {labelled} and {haven}. It should work as long as fields with codedValue domains are character vectors only - so please let me know if that is not a safe assumption!

At the risk of making the PR too big, I also exported the set_layer_col_names() function in the same PR, since it follows the same pattern of post-processing for a data frame that takes the original Table or FeatureLayer object as an input. Symbology could be handled using a similar pattern, e.g. set_layer_symbology() per #106
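
On the character-vector assumption mentioned above, here is a small base R sketch of why it matters for a named-vector lookup. The lookup reuses a few entries from the RESPAGCY domain earlier in the thread; the integer codes are a made-up case, not something observed in that service:

```r
# toy lookup: names are codes, values are labels
lu <- c("10" = "Health", "11" = "Highways", "12" = "Jail")

codes_chr <- c("11", "12")
codes_int <- c(11L, 12L)

# character codes are matched against the names of `lu`
by_name <- unname(lu[codes_chr])

# integer codes are treated as positions, and `lu` has no 11th or 12th element
by_position <- unname(lu[codes_int])

# coercing to character first restores the match by name
by_name_coerced <- unname(lu[as.character(codes_int)])
```

If numeric coded values do turn up, coercing with `as.character()` before the lookup is one possible guard, though whether the codes survive that coercion unchanged (for example, values with leading zeros) would need checking against real services.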
