-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement subset() and transform() methods for network objects #3
Comments
would subset() be different from network::get.inducedSubraph() ? or are you thinking of it as a clearer alias? |
A clearer alias that lets the user specify the subset with vertex attributes as variable names. |
* subset() would drop vertices or edges based on a specification.
* transform() would rewrite or construct vertex or edge attributes based on a specification.
nice.
|
I may have implemented equivalents of these functions elsewhere.
|
That's the plan. |
Cool. Here's an implementation for library(rlang) # tibble dependency
suppressPackageStartupMessages(library(network)) This extracts vertex attributes (adapted from my own package, but may be handy if it were native to {network}): vert_tibble <- function(x) {
.vert_attr_names <- unique(unlist(lapply(x[["val"]], names)))
.vert_attrs <- set_names(lapply(.vert_attr_names, function(attr_name) {
unlist(lapply(lapply(x[["val"]], `[[`, attr_name), `%||%`, NA))
}), .vert_attr_names)
tibble::as_tibble(.vert_attrs)
} The actual subset function: subset.network <- function(x, .predicate, what = c("vertices", "edges")) {
what <- arg_match(what, c("vertices", "edges"))
if (what == "edges") {
.e_df <- network:::as_tibble.network(x, attrnames = TRUE)
.eids <- eval_tidy(quo(.e_df[!!enquo(.predicate), , drop = FALSE]),
data = .e_df)[[".eid"]]
return(get.inducedSubgraph(x, eid = .eids))
}
.v_df <- vert_tibble(x)
.v_df[[".vid"]] <- seq_len(nrow(.v_df))
.vids <- eval_tidy(quo(.v_df[!!enquo(.predicate), , drop = FALSE]),
data = .v_df)[[".vid"]]
get.inducedSubgraph(x, v = .vids)
} Some quick helpers for visual inspection: edge_tibble <- function(x) {
network:::as_tibble.network(x, attrnames = TRUE)
}
`%>%` <- magrittr::`%>%` Example data: data("sampson", package = "ergm")
subset(samplike, group == "Loyal" & !cloisterville)
#> Network attributes:
#> vertices = 5
#> directed = TRUE
#> hyper = FALSE
#> loops = FALSE
#> multiple = FALSE
#> total edges= 8
#> missing edges= 0
#> non-missing edges= 8
#>
#> Vertex attribute names:
#> cloisterville group vertex.names
#>
#> Edge attribute names:
#> nominations
samplike %>%
subset(group == "Loyal" & !cloisterville) %>%
vert_tibble()
#> # A tibble: 5 x 4
#> na group vertex.names cloisterville
#> <lgl> <chr> <chr> <lgl>
#> 1 TRUE Loyal Berthold FALSE
#> 2 TRUE Loyal Victor FALSE
#> 3 TRUE Loyal Ambrose FALSE
#> 4 TRUE Loyal Romauld FALSE
#> 5 TRUE Loyal Louis FALSE
samplike %>%
subset(nominations == 1, what = "edges")
#> Network attributes:
#> vertices = 18
#> directed = TRUE
#> hyper = FALSE
#> loops = FALSE
#> multiple = FALSE
#> total edges= 38
#> missing edges= 0
#> non-missing edges= 38
#>
#> Vertex attribute names:
#> cloisterville group vertex.names
#>
#> Edge attribute names:
#> nominations
samplike %>%
subset(nominations == 1, what = "edges") %>%
edge_tibble()
#> # A tibble: 38 x 5
#> .tail .head .eid na nominations
#> <int> <int> <int> <lgl> <dbl>
#> 1 5 1 1 FALSE 1
#> 2 7 1 2 FALSE 1
#> 3 1 2 3 FALSE 1
#> 4 3 2 4 FALSE 1
#> 5 6 5 20 FALSE 1
#> 6 10 8 32 FALSE 1
#> 7 10 9 41 FALSE 1
#> 8 8 10 49 FALSE 1
#> 9 18 13 72 FALSE 1
#> 10 2 15 77 FALSE 1
#> # ... with 28 more rows
samplike %>%
subset(group == "Outcasts") %>%
subset(nominations == 3, what = "edges")
#> Network attributes:
#> vertices = 4
#> directed = TRUE
#> hyper = FALSE
#> loops = FALSE
#> multiple = FALSE
#> total edges= 4
#> missing edges= 0
#> non-missing edges= 4
#>
#> Vertex attribute names:
#> cloisterville group vertex.names
#>
#> Edge attribute names:
#> nominations Thoughts? sessionInfo()
#> R version 3.5.1 (2018-07-02)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 17134)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United States.1252
#> [2] LC_CTYPE=English_United States.1252
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] network_1.14-355 rlang_0.2.2.9001
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_0.12.18 fansi_0.3.0 assertthat_0.2.0
#> [4] utf8_1.1.4 crayon_1.3.4 digest_0.6.17
#> [7] magrittr_1.5 evaluate_0.11 pillar_1.3.0.9000
#> [10] cli_1.0.0 stringi_1.2.4 rmarkdown_1.10.13
#> [13] htmldeps_0.1.1 tools_3.5.1 stringr_1.3.1
#> [16] xfun_0.3 yaml_2.2.0 compiler_3.5.1
#> [19] htmltools_0.3.6 knitr_1.20.15 tibble_1.4.2 |
is there a reason to not use the network packages standard accessors in |
I’ll change that. After putting it through more tests, any performance gained is negligible, even for large, heavily-decorated graphs. |
I can't say I'm thrilled with the additional dependencies that have been added - we should not be contributing to the ever-growing problem of R toolchain cruft. The idea was for network to be a minimal package, with minimal dependencies. Adding extra stuff just because it is convenient is a departure from the path of wisdom. |
|
@knapply , thank you for your hard work.
I wonder if it might be worth folding it into Speaking of that, in the |
Speaking of that, in the egor package, we ended up using the argument name unit= to specify
whether a function refers to vertices or edges (as in, "units of analysis"). I wonder if it
might be worth using instead of what=.
that sounds good to me.
|
The statnet team has produced invaluable tools for my work and the work of my colleagues. I'm very happy to help. I wasn't familiar with I believe the following addresses everyone's concerns. Here's an The comments are intentionally verbose to clarify the thought process and steps as much as possible. suppressPackageStartupMessages(library(network))
as.data.frame.network <- function(x, unit = c("edges", "vertices")) {
unit <- match.arg(unit, c("edges", "vertices"))
if (unit == "edges") {
# get valid, non-missing edge indices
edge_ids <- which(!vapply(x[["mel"]], is.null, logical(1L)))
# use those to get valid, non-missing tails and heads
tails <- lapply(x[["mel"]], `[[`, "outl")[edge_ids]
heads <- lapply(x[["mel"]], `[[`, "inl")[edge_ids]
# create "metadata" data.frame to use in returned object
if (is.hyper(x)) {
# keep hypergraph edges as list columns
edge_df <- data.frame(.tail = tails,
.head = heads,
.eid = edge_ids)
} else {
# cast non-hypergraph edges to atomic vectors, retaining the integer casting from
# network:::as_tibble.network()
edge_df <- data.frame(.tail = as.integer(unlist(tails)),
.head = as.integer(unlist(heads)),
.eid = edge_ids)
}
attr_names <- list.edge.attributes(x)
# extract edge attributes, subsetting on valid indices
out <- lapply(attr_names, function(e_attr) {
get.edge.attribute(x, e_attr, unlist = FALSE)[edge_ids]
})
} else if (unit == "vertices") {
# get valid, non-missing vertex indices (as a 1 column data.frame)
vert_df <- data.frame(.vid = which(!vapply(x[["val"]], is.null, logical(1L))))
# get vertex attr names
attr_names <- list.vertex.attributes(x)
# extract vertex attributes, subsetting on valid indices
out <- lapply(attr_names, function(v_attr) {
get.vertex.attribute(x, v_attr, unlist = FALSE)[vert_df[[".vid"]]]
})
}
# attach attr names to resulting list
names(out) <- attr_names
# probe attributes to determine which will be list columns (i.e. they contain non-scalar elements)
# in returned data.frame and create a logical vector to use in subsetting
list_col_index <- vapply(out, function(attr_name) {
any(vapply(attr_name, function(attr) length(attr) != 1L, logical(1L)))
}, logical(1))
# `I()`nhibit `base::data.frame`s default behavior of converting list columns
out[list_col_index] <- lapply(out[list_col_index], I)
# flatten attributes that only contain scalar elements to atomic vectors
atomic_col_names <- names(list_col_index[!list_col_index])
out[atomic_col_names] <- lapply(out[atomic_col_names], unlist)
# convert the prepped list to a data.frame
out <- as.data.frame(out, stringsAsFactors = FALSE)
# `I()` works by prepending "AsIs" to the provided argument's class, which is no longer
# necessary and will only cause confusion, so it's dropped.
out[list_col_index] <- lapply(out[list_col_index], `class<-`, "list")
# combine edge/vertex metadata with attribute data
if (unit == "edges") {
out <- cbind.data.frame(edge_df, out, stringsAsFactors = FALSE)
} else if (unit == "vertices") {
out <- cbind.data.frame(vert_df, out, stringsAsFactors = FALSE)
}
# attach the attributes used in the current network:::as_tibble.network()
attr(out, "n") <- network.size(x)
attr(out, "vnames") <- network.vertex.names(x)
if (is.bipartite(x)) {
attr(out, "bipartite") <- get.network.attribute(x, "bipartite")
}
# rownames are unnecessary for these purposes and most people will convert this to a
# `tibble`, where they're not permitted, anyways
rownames(out) <- NULL
out
} A "simple" network, that doesn't have nested attributes. `%>%` <- magrittr::`%>%`
data("sampson", package = "ergm")
as.data.frame.network(samplike, unit = "vertices") %>% head(3)
#> .vid cloisterville group na vertex.names
#> 1 1 TRUE Turks TRUE John Bosco
#> 2 2 TRUE Turks TRUE Gregory
#> 3 3 TRUE Outcasts TRUE Basil An object that does have nested attributes ( data("windsurfers", package = "networkDynamic")
as.data.frame.network(windsurfers, unit = "vertices") %>% head(3)
#> .vid active group1
#> 1 1 0, 17, 26, 28, 16, 24, 27, 31 1
#> 2 2 0, 5, 9, 18, 23, 29, 2, 8, 11, 21, 24, 31 1
#> 3 3 0, 6, 8, 10, 18, 23, 25, 29, 4, 7, 9, 14, 21, 24, 28, 31 0
#> group2 na regular vertex.names
#> 1 0 FALSE 1 1
#> 2 0 FALSE 1 2
#> 3 0 FALSE 1 3 ... and the matrix dimensions are safely preserved. as.data.frame.network(windsurfers)$active[[1]]
#> [,1] [,2]
#> [1,] 0 1
#> [2,] 2 3
#> [3,] 7 8
#> [4,] 9 14
#> [5,] 17 18
#> [6,] 19 24
#> [7,] 26 27
#> [8,] 28 29
#> [9,] 30 31 Returned objects are still compatible with modern data frames like identical(
network:::as_tibble.network(windsurfers, attrnames = TRUE),
tibble::as_tibble(as.data.frame.network(windsurfers))
)
#> [1] TRUE
tibble::as_tibble(as.data.frame.network(windsurfers))
#> # A tibble: 556 x 5
#> .tail .head .eid active na
#> <int> <int> <int> <list> <lgl>
#> 1 4 1 1 <dbl [9 x 2]> FALSE
#> 2 5 1 2 <dbl [2 x 2]> FALSE
#> 3 6 1 3 <dbl [5 x 2]> FALSE
#> 4 8 1 4 <dbl [1 x 2]> FALSE
#> 5 9 1 5 <dbl [1 x 2]> FALSE
#> 6 11 1 6 <dbl [2 x 2]> FALSE
#> 7 4 2 7 <dbl [6 x 2]> FALSE
#> 8 5 2 8 <dbl [3 x 2]> FALSE
#> 9 7 2 9 <dbl [1 x 2]> FALSE
#> 10 5 4 10 <dbl [3 x 2]> FALSE
#> # ... with 546 more rows
data.table::data.table(as.data.frame.network(windsurfers))
#> .tail .head .eid active na
#> 1: 4 1 1 0, 2, 7, 9,17,19, 1, 3, 8,14,18,24,... FALSE
#> 2: 5 1 2 0,10, 2,11 FALSE
#> 3: 6 1 3 0, 6,10,23,30, 1, 8,11,24,31,... FALSE
#> 4: 8 1 4 0,1 FALSE
#> 5: 9 1 5 0,1 FALSE
#> ---
#> 552: 25 14 552 30,31 FALSE
#> 553: 25 68 553 30,31 FALSE
#> 554: 38 25 554 30,31 FALSE
#> 555: 16 15 555 30,31 FALSE
#> 556: 95 16 556 30,31 FALSE If there is ultimately a decision to drop the as_tibble.network <- function(x, unit = "edges") {
if (!requireNamespace("tibble", quietly = TRUE)) {
stop("The tibble package is required for this functionality.", call. = FALSE)
}
tibble::as_tibble(as.data.frame.network(x, unit = unit))
}
as_tibble.network(samplike, unit = "vertices")
#> # A tibble: 18 x 5
#> .vid cloisterville group na vertex.names
#> <int> <lgl> <chr> <lgl> <chr>
#> 1 1 TRUE Turks TRUE John Bosco
#> 2 2 TRUE Turks TRUE Gregory
#> 3 3 TRUE Outcasts TRUE Basil
#> 4 4 TRUE Loyal TRUE Peter
#> 5 5 TRUE Loyal TRUE Bonaventure
#> 6 6 FALSE Loyal TRUE Berthold
#> 7 7 TRUE Turks TRUE Mark
#> 8 8 FALSE Loyal TRUE Victor
#> 9 9 FALSE Loyal TRUE Ambrose
#> 10 10 FALSE Loyal TRUE Romauld
#> 11 11 FALSE Loyal TRUE Louis
#> 12 12 FALSE Turks TRUE Winfrid
#> 13 13 FALSE Outcasts TRUE Amand
#> 14 14 FALSE Turks TRUE Hugh
#> 15 15 FALSE Turks TRUE Boniface
#> 16 16 FALSE Turks TRUE Albert
#> 17 17 FALSE Outcasts TRUE Elias
#> 18 18 FALSE Outcasts TRUE Simplicius I'd also recommend that a parameter be given to list.vertex.attributes <- function(x, .sort = TRUE) {
if (!is.network(x)) {
stop("list.vertex.attributes requires an argument of class network.\n")
}
if (network.size(x) == 0) {
return(NULL)
}
allnam <- unlist(sapply(x$val, names))
out <- unique(as.vector(allnam))
if (!.sort) {
return(out)
}
sort(out)
} Apologies for the amount of text here. I can split this up into different issues if that's more palatable. Cheers. |
…butes instead of edge attributes. References #3.
Apologies for the force-updates. Note that |
* add UTF-8 encoding (roxygen throws warning without it), update zzz.R (documentation doesn't build with old .onLoad() and network-package.R now has @useDynLib) git push * finish up documentation, verify all is still good * rebuild * fixed botched parentheses local test doesn't catch
subset()
would drop vertices or edges based on a specification.transform()
would rewrite or construct vertex or edge attributes based on a specification.The text was updated successfully, but these errors were encountered: