as.network.data.frame does not handle two-mode adjacency matrices correctly #64

Open
CarterButts opened this issue Sep 19, 2021 · 10 comments

@CarterButts
Contributor

CarterButts commented Sep 19, 2021

When called with a two-mode adjacency matrix, as.network.matrix correctly interprets it as a graph with an enforced bipartition, with the passed matrix being the off-diagonal block of the full adjacency matrix. as.network.data.frame does not, and indeed throws errors if, e.g., the matrix is square and has non-zero diagonal entries when loops==FALSE. (See also the related issue of as.network.data.frame not respecting the same semantics as as.network.matrix.) Setting loops=TRUE and bipartite=TRUE does not rectify the problem, because loops are not allowed on bipartite graphs and that combination throws an error instead.

For this issue, the needed fix is for as.network.data.frame to correctly detect and implement two-mode matrix processing. Here is a demonstration:

library(network)

# Create a two-mode adjacency structure
set.seed(1331)
g <- matrix(rbinom(25, 1, 0.2), 5, 5)
diag(g) <- 1

# Coerce to a graph the traditional way
gn <- as.network.matrix(g, bipartite = TRUE)
gn
gn[, ]

# Now, with data frames.
gf <- as.data.frame(g)
gfn <- as.network(gf, bipartite = TRUE)                # Loop error!
gfn <- as.network(gf, bipartite = TRUE, loops = TRUE)  # Still an error: loops are not allowed on bipartite graphs
as.network.matrix(gf, bipartite = TRUE)                # Works fine!

We should be seeing the same behavior for as.network.data.frame as as.network.matrix here, and are not.

@knapply
Contributor

knapply commented Sep 30, 2021

(for this and #65)

When I was asked to use S3 dispatch instead of the function I originally proposed (network_from_data_frame()), I pointed out that doing so would be a breaking change.

#20 (comment)

@martinamorris
Member

Ok, so it sounds like we have an issue to address. IIUC, using the S3 dispatch for this function is what causes the breakage. I believe this choice was originally motivated by maintainability concerns. @CarterButts do you have a preferred solution?

@krivit
Member

krivit commented May 21, 2022

I have mixed feelings about this. To me, a data frame is not a generalisation of a matrix or an array, though for bipartite networks, it's a bit less clear-cut.

That having been said, if bipartite=TRUE and the input looks like an adjacency matrix, it makes more sense for as.network.data.frame() to interpret it the way as.network.matrix() does: rows are actors and columns are events. From what I understand, it currently interprets it as the "expanded bipartite" representation, in which both rows and columns contain both actors and events, and the actor-actor and event-event blocks are fixed at 0.

I think this would fix @CarterButts's issue. @knapply, is there any reason not to change the bipartite=TRUE handling of adjacency data frames to be consistent with the matrix method?
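A minimal base-R sketch of the two representations described above: an n x m two-mode (biadjacency) matrix with actors as rows and events as columns, versus the (n + m) x (n + m) "expanded bipartite" matrix whose actor-actor and event-event blocks are fixed at 0. The helper `expand_bipartite()` is hypothetical and not part of network. Using the matrix from the original report, it also illustrates why the loop error is spurious: `g[i, i] = 1` links actor i to event i, which lands off the diagonal once the two modes are separated.

```r
# Hypothetical helper (not part of network): expand an n x m biadjacency
# matrix into the full (n + m) x (n + m) adjacency matrix of an undirected
# bipartite graph. Actors occupy indices 1..n, events n+1..n+m.
expand_bipartite <- function(b) {
  b <- as.matrix(b)
  n <- nrow(b)
  m <- ncol(b)
  full <- matrix(0, n + m, n + m)
  full[seq_len(n), n + seq_len(m)] <- b     # actor -> event block
  full[n + seq_len(m), seq_len(n)] <- t(b)  # event -> actor block (symmetric)
  full                                      # actor-actor and event-event blocks stay 0
}

# Same two-mode structure as in the original report.
set.seed(1331)
g <- matrix(rbinom(25, 1, 0.2), 5, 5)
diag(g) <- 1

full <- expand_bipartite(g)
print(dim(full))               # [1] 10 10
print(all(diag(full) == 0))    # [1] TRUE: the "diagonal" of g is not a set of loops
```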

@knapply
Contributor

knapply commented May 21, 2022

The input shouldn't be a data frame if it's supposed to be a matrix.

The errors could probably be more informative ("is this supposed to be an adjacency matrix? If so, use as.matrix() first."), but this is not a bug -- it's user error.

If memory serves, the reason this is an issue is that the original as.network() default skipped S3 dispatch and called as.network.matrix() directly instead of attempting to coerce the input to a matrix first. Something like as.network(as.matrix(x)).

I'm assuming this normalized the behavior of passing data frames as input that really should've been matrices.

@krivit
Member

krivit commented May 21, 2022

@knapply, perhaps I misremembered. Does as.network.data.frame() always treat the input data frame as an edge list of some type?

@jdohmen

jdohmen commented Jun 15, 2022

Has anything changed in the as.network command? After a statnet update I can no longer read in all of my empirical data. It used to work fine. I can't find the error. Thank you guys!

> WissOperativeAnpass1 <- read_excel("WissOperativeAnpass.xlsx")
> NetWissOperativeAnpass1 <- as.network(WissOperativeAnpass1)
Error: `loops` is `FALSE`, but `x` contains loops.
The following values are affected:
	- `x[1, 1:2]`
	- `x[2, 1:2]`
	- `x[3, 1:2]`
	- `x[4, 1:2]`
	- `x[5, 1:2]`
	- `x[6, 1:2]`

Data

Also an issue here: https://community.rstudio.com/t/as-network-file-ergm-error-loops-is-false-but-x-contains-loops/115793
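Why the loop error fires, assuming the data-frame method reads the first two columns as sender and receiver (the edgelist interpretation confirmed later in this thread): any row whose first two cells are equal looks like a self-loop, which matches the `x[i, 1:2]` pattern in the error above. A small base-R check, with made-up 0/1 values standing in for the spreadsheet:

```r
# Hypothetical two-mode adjacency data read in as a data frame
# (values are made up; they stand in for the spreadsheet above).
x <- data.frame(
  e1 = c(0, 0, 1, 1),
  e2 = c(0, 0, 1, 0),
  e3 = c(1, 1, 0, 0)
)

# Under an edgelist reading, column 1 is the sender and column 2 the
# receiver, so every row where they are equal would be flagged as a loop.
apparent_loops <- which(x[[1]] == x[[2]])
print(apparent_loops)  # [1] 1 2 3
```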

@mbojan
Member

mbojan commented Jun 15, 2022

@jdohmen indeed, in recent versions of network a data.frame is interpreted as an edgelist (first two columns) plus optional edge attributes (the remaining columns, if any). In your case the data frame is a "two-mode" (non-square) adjacency matrix. What you need to do is convert it to an R matrix, e.g. with data.matrix(), for example:

library(network)

d <- data.frame(
  a = c(0,0,1,1),
  b = c(0,0,1,0),
  c = c(1,1,0,0)
)

net <- as.network(data.matrix(d), bipartite = TRUE)
as.matrix(net)
#   a b c
# 1 0 0 1
# 2 0 0 1
# 3 1 1 0
# 4 1 0 0

In your case it will be something like:

library(readxl)
WissOperativeAnpass1 <- read_excel("WissOperativeAnpass.xlsx")
NetWissOperativeAnpass1 <- as.network(data.matrix(WissOperativeAnpass1), bipartite = TRUE)

... assuming you have no other columns in Excel beyond the adjacency information.
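Alternatively, since the data-frame method expects an edgelist, the two-mode data frame can be reshaped into one. A base-R sketch using the same `d` as above; the actor labels `actor1` ... `actor4` are hypothetical, added only so that actors and events get distinct vertex names:

```r
# Same two-mode adjacency data as above, as a data frame.
d <- data.frame(
  a = c(0, 0, 1, 1),
  b = c(0, 0, 1, 0),
  c = c(1, 1, 0, 0)
)

m <- data.matrix(d)
# Hypothetical actor labels so that actor and event names never collide.
rownames(m) <- paste0("actor", seq_len(nrow(m)))

# Row/column positions of every non-zero cell, in column-major order.
idx <- which(m != 0, arr.ind = TRUE)

# Two-column edgelist: actor in column 1, event in column 2.
el <- data.frame(
  from = rownames(m)[idx[, "row"]],
  to   = colnames(m)[idx[, "col"]],
  stringsAsFactors = FALSE
)
print(el)
```

The resulting `from`/`to` data frame is in the edgelist form described above. How `as.network()` then handles `bipartite` for such input depends on the data-frame method, so check `?as.network.data.frame` before relying on it.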

@mbojan
Member

mbojan commented Jun 15, 2022

@krivit @knapply @CarterButts, is it feasible to retain the original behavior by adding an argument to as.network.data.frame() for the case above (#64 (comment))? I'm thinking input = c("adjacency", "edgelist") (and then match.arg() internally), or simply adjacency = TRUE (or FALSE if edgelist)?
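A minimal sketch of how the proposed `input` switch could route a data frame, written as a stand-alone pre-processing function so it runs without the network package. `prepare_network_input()` is hypothetical and not part of network's API. Note that `match.arg()` takes the first listed value as the default, so with this ordering "adjacency" would be the default, unlike the current edgelist interpretation.

```r
# Hypothetical sketch of the proposed `input` argument; prepare_network_input()
# is NOT part of the network package.
prepare_network_input <- function(x, input = c("adjacency", "edgelist")) {
  input <- match.arg(input)  # first value is the default; unknown values error
  stopifnot(is.data.frame(x))
  switch(input,
    # Treat the data frame as a (bi)adjacency matrix, so that as.network()
    # would dispatch to the matrix method.
    adjacency = data.matrix(x),
    # Leave it as a data frame, so that as.network() would dispatch to the
    # edgelist-based data-frame method.
    edgelist = x
  )
}

d <- data.frame(a = c(0, 1), b = c(1, 0))
class(prepare_network_input(d))              # matrix
class(prepare_network_input(d, "edgelist"))  # data.frame
```

The result would then be passed to `as.network()`, whose ordinary S3 dispatch picks the matrix or data-frame method based on the class.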

@jdohmen

jdohmen commented Jun 15, 2022

Then PLEASE also add a NODELIST input (ego, alter1, alter2). Empirical survey data mostly comes as a NODELIST. I have spent so much time getting nodelists into statnet:

NODELIST

@mbojan
Member

mbojan commented Jun 15, 2022

@jdohmen I've opened a separate issue, #79, about such structured input. I believe this is a so-called "adjacency list" (an ego id and the ids of its "neighbors").
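For the adjacency-list layout described above (ego, alter1, alter2, ...), one way to get it into the two-column edgelist form the data-frame method expects is to stack the alter columns. A base-R sketch with hypothetical survey data, where `NA` marks an unused alter slot:

```r
# Hypothetical survey data: one row per respondent (ego), one column per
# named alter; NA marks an alter slot the respondent left empty.
adj <- data.frame(
  ego    = c("A", "B", "C"),
  alter1 = c("B", "A", "A"),
  alter2 = c("C", NA, NA),
  stringsAsFactors = FALSE
)

# Stack each alter column into (from, to) pairs.
alter_cols <- grep("^alter", names(adj), value = TRUE)
el <- do.call(rbind, lapply(alter_cols, function(col) {
  data.frame(from = adj$ego, to = adj[[col]], stringsAsFactors = FALSE)
}))

el <- el[!is.na(el$to), ]  # drop empty alter slots
rownames(el) <- NULL
print(el)
```

One design caveat: egos who named no alters drop out of this edgelist entirely, so they would have to be supplied separately as vertices.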

No branches or pull requests

6 participants