---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
```
### String2AdjMatrix
This package provides functions for building adjacency matrices from character strings. It can generate an empty adjacency matrix from a given list of strings, with each unique substring used as a row name and a column name, and then fill that matrix by counting how often substrings appear together. This is useful when performing social network analysis. In the current release, co-occurrence is bidirectional: it does not matter where two unique substrings occur within a string, or in which order.
The output matrices can be used directly with packages such as `igraph`.
Generating large matrices is computationally intensive and may take a while.
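As a rough illustration of the hand-off to `igraph`, the sketch below builds a small symmetric matrix by hand and converts it to a weighted, undirected graph. The counts are hand-written for illustration and are not output from this package; the `igraph` call is guarded so the snippet still runs where `igraph` is not installed.

```r
# Hand-written example counts (an assumption, not package output)
labels <- c("apples", "pears", "bananas")
adj <- matrix(c(0, 2, 2,
                2, 0, 1,
                2, 1, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(labels, labels))

# Convert to a graph only if igraph is available
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_adjacency_matrix(adj, mode = "undirected",
                                           weighted = TRUE, diag = FALSE)
  # Edge weights carry the co-occurrence counts
  print(igraph::E(g)$weight)
}
```

Using `mode = "undirected"` matches the bidirectional treatment of substrings described above, and `weighted = TRUE` keeps the counts as edge weights rather than drawing repeated edges.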
#### Example
```{r, echo = TRUE}
library(String2AdjMatrix)

# Start with the character strings to generate an adjacency matrix from
string_in <- c('apples, pears, bananas', 'apples, bananas', 'apples, pears')

# Generate a new blank matrix
blank_matrix <- generate_adj_matrix(string_in)

# Now fill the matrix
string_2_matrix(blank_matrix, string_in)
```
#### Installation instructions
##### From CRAN - stable release
`install.packages('String2AdjMatrix')`
##### From GitHub - development version
`devtools::install_github('tomdrake/String2AdjMatrix')`
#### Dependencies
Requires `stringr`.
#### Overview
This package comprises three functions: `generate_adj_matrix()`, `string_2_matrix()` and `string_2_matrix_x()`.
##### 1. `generate_adj_matrix()`
Generates a blank adjacency matrix from a given set of strings. It detects the unique values in the supplied strings and returns a blank matrix whose column names and row names are those unique values.
The `string_data` argument is the string data from which the unique values and the matrix will be generated.
The `data_separator` argument is the character separating the substrings within each string. The default is `,`.
The `remove_spaces` argument removes spaces from the header values. This is useful when the same substring appears with an irregular number of spaces between its words. Note that it disrupts the later search unless all spaces are also removed from the strings supplied in the next steps.
Data must be provided as character strings.
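The idea of a blank, labelled matrix can be sketched in base R as follows. This is not the package's implementation: `blank_adj_sketch()` and its arguments `separator` and `strip_spaces` are hypothetical names chosen for illustration, and trimming the whitespace around each substring is an assumption of the sketch.

```r
# Hypothetical helper, not part of String2AdjMatrix: returns a zero-filled
# square matrix labelled by the unique substrings found in `strings`.
blank_adj_sketch <- function(strings, separator = ",", strip_spaces = FALSE) {
  # Split every string on the separator and pool the pieces
  parts <- unlist(strsplit(strings, separator, fixed = TRUE))
  if (strip_spaces) {
    # Remove every space, so 'red  apples' and 'red apples' share one label
    parts <- gsub(" ", "", parts, fixed = TRUE)
  } else {
    # Assumption: only trim the whitespace surrounding each piece
    parts <- trimws(parts)
  }
  # Drop empty pieces and keep each label once, in order of first appearance
  labels <- unique(parts[nzchar(parts)])
  matrix(0, nrow = length(labels), ncol = length(labels),
         dimnames = list(labels, labels))
}

string_in <- c('apples, pears, bananas', 'apples, bananas', 'apples, pears')
blank <- blank_adj_sketch(string_in)
rownames(blank)  # "apples" "pears" "bananas"
```

With `strip_spaces = TRUE` the labels no longer contain spaces, which is why the strings searched afterwards would need the same treatment for matches to be found.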
##### 2. `string_2_matrix()`
Iteratively applies `string_2_matrix_x()` to each column of a matrix. Use it to generate an entire adjacency matrix.
The `new_matrix` argument should be either the matrix generated by `generate_adj_matrix()` or an empty matrix with an equal number of rows and columns. Its row names and column names should be the unique values.
The `supplied_string` argument is the set of strings in which the search is performed, e.g. `c('apples, pears, bananas', 'apples, bananas', 'apples, pears')`.
The `self` option specifies how to handle the case where the specified substring itself is found within a string. The default is 0, meaning that finding the substring on its own is not counted; it is counted only when found in combination with another unique substring.
Generating large matrices is computationally intensive and may take a while.
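For intuition about what a filled matrix contains, here is a minimal base R sketch that counts, for every pair of substrings, the number of strings in which both appear. It is not the package's implementation: `cooccurrence_sketch()` is a hypothetical name, it matches whole separated tokens rather than searching for substrings, and it zeroes the diagonal to mirror the default `self = 0` described above.

```r
# Hypothetical helper, not part of String2AdjMatrix
cooccurrence_sketch <- function(strings, separator = ",") {
  # One vector of unique, trimmed tokens per input string
  items <- lapply(strsplit(strings, separator, fixed = TRUE),
                  function(p) unique(trimws(p)))
  labels <- unique(unlist(items))
  # Incidence matrix: one row per string, one column per label, 1 if present
  incidence <- do.call(rbind, lapply(items, function(it) {
    as.numeric(labels %in% it)
  }))
  # t(incidence) %*% incidence counts shared strings for every pair of labels
  adj <- crossprod(incidence)
  dimnames(adj) <- list(labels, labels)
  # A substring paired with itself is not counted
  diag(adj) <- 0
  adj
}

string_in <- c('apples, pears, bananas', 'apples, bananas', 'apples, pears')
adj <- cooccurrence_sketch(string_in)
adj["apples", "pears"]    # 2: both appear in the first and third strings
adj["pears", "bananas"]   # 1: both appear only in the first string
```

The matrix product handles all columns at once, whereas the package is described as filling one column at a time; the result is symmetric because co-occurrence is bidirectional.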
##### 3. `string_2_matrix_x()`
This function takes a specified column of a matrix and counts how many times the substring naming that column appears together with each row name across a given set of strings. Use it to generate one column of adjacency data. It is also applied iteratively as part of `string_2_matrix()`.
The `new_matrix` argument should be either the matrix generated by `generate_adj_matrix()` or an empty matrix with an equal number of rows and columns. Its row names and column names should be the unique values.
The `supplied_string` argument is the set of strings in which the search is performed, e.g. `c('apples, pears, bananas', 'apples, bananas', 'apples, pears')`.
The `coord_x` argument specifies the number of the column to convert, e.g. `coord_x = 1` is the first column of the matrix and `coord_x = 12` is the twelfth.
The `self` option specifies how to handle the case where the specified substring itself is found within a string. The default is 0, meaning that finding the substring on its own is not counted; it is counted only when found in combination with another unique substring.
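Filling a single column can be sketched in base R as below. Again this is an illustration under assumptions, not the package's code: `fill_column_sketch()` and its `column` argument are hypothetical names, tokens are matched exactly after trimming whitespace, and the diagonal cell is left untouched to mirror the default `self = 0`.

```r
# Hypothetical helper, not part of String2AdjMatrix
fill_column_sketch <- function(adj, strings, column, separator = ",") {
  target <- colnames(adj)[column]
  items <- lapply(strsplit(strings, separator, fixed = TRUE), trimws)
  # Keep only the strings that contain the column's substring
  has_target <- vapply(items, function(it) target %in% it, logical(1))
  for (label in rownames(adj)) {
    # Skip the diagonal cell: the substring paired with itself
    if (label == target) next
    # Count the retained strings that also contain this row's substring
    adj[label, column] <- sum(vapply(items[has_target],
                                     function(it) label %in% it,
                                     logical(1)))
  }
  adj
}

labels <- c("apples", "pears", "bananas")
blank <- matrix(0, nrow = 3, ncol = 3, dimnames = list(labels, labels))
string_in <- c('apples, pears, bananas', 'apples, bananas', 'apples, pears')

# Fill only the first column ("apples"); the other columns stay at zero
filled <- fill_column_sketch(blank, string_in, column = 1)
filled[, "apples"]
```

Calling the helper once per column index would fill the whole matrix, which is the relationship the README describes between `string_2_matrix_x()` and `string_2_matrix()`.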