-
Notifications
You must be signed in to change notification settings - Fork 0
/
1ccombineaccounts_sqldf.Rmd
496 lines (376 loc) · 22.6 KB
/
1ccombineaccounts_sqldf.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
---
title: "Council accounts"
output: html_notebook
---
# Council accounts: querying change across different years
This notebook details the process of querying multiple council accounts to make comparisons.
The data comes from the Ministry of Housing, Communities & Local Government's [Local authority revenue expenditure and financing collection](https://www.gov.uk/government/collections/local-authority-revenue-expenditure-and-financing).
We have downloaded the data for 2010/11 onwards with the intention of using the script to add new figures when new data is published.
To make this script work with the new data, download it to the same location as this file, and change the line below:
```{r}
#Change this name to the name of the latest dataset
filename <- "RS_2017-18_data_by_LA.xlsx"
```
## Breaking down the problem
Here are the tasks we need to complete:
1. Load the spreadsheets into R
2. Perform basic cleaning (remove redundant rows, combine/clarify headings across multiple cells)
3. Create a subset containing the data we want to focus on
4. Perform analysis across multiple data frames
The files are in the same folder in this script with names like *RS_2015-16_data_by_LA_-_Revision.xlsx*. Some use one format while the latest 2 use a different format. This makes it problematic for combining using functions like `cbind` (see other notebooks in this folder), but SQL queries mean we don't have to combine at all - the combination takes place within analysis.
Let's start with activating some useful packages.
```{r}
#Activate the library for handling Excel files, and for calling URLs
library(readxl)
#Activate the library for running SQL-style queries
library(sqldf)
#You could download the files directly, in which case you need this library
#library(httr)
```
## Clean the spreadsheet while importing
Now to import those sheets, and clean along the way.
There are two problems here: first, the column headers don't start until row 4; and second, those headers are spread across 3 rows.
The most detailed headings are in row 7, with related codes for each heading in row 6, and broader categories in row 5. So we skip to row 7 for the main dataset, and store the other headings in another data frame just in case.
This is best stored in a function so we can run it on the new data - but we test it first before creating any functions.
```{r}
#Read that file into a new data frame
#We want the 3rd sheet, and to skip the first 6 lines so row 7 is used for headers
#We also ask it to class dashes and lack of data as NA
ro1516 <- read_excel("RS_2015-16_data_by_LA_-_Revision.xlsx",
sheet = 3,
skip = 6,
na = "c('-','')",
col_types = "guess") #This helps numeric columns be treated as such
summary(ro1516)
```
Let's also store the other headings in case we need to reintegrate them:
```{r}
#Grab the other headings
ro1516.otherheads <- read_excel("RS_2015-16_data_by_LA_-_Revision.xlsx",
sheet = 3, skip = 4, n_max = 1, range = "A6:IR6")
```
Now to try the same process on another year's spreadsheet, checking if the column names are the same as our other imported data:
```{r}
#Read file into a new data frame
#We want the 3rd sheet, and to skip the first 6 lines so row 7 is used for headers
#We also ask it to class dashes and lack of data as NA
ro1617 <- read_excel("RS_2016-17_data_by_LA.xlsx",
sheet = 3,
skip = 6,
na = "c('-','')")
# Test the columns are the same
colnames(ro1516) == colnames(ro1617)
```
The columns are not all the same. This makes it harder to combine, so we'll come back to that.
We can make it a bit easier to compare by bringing in the *codes* associated with each column of data. That was stored in the 'otherheads' data frame we created. This is how we can add those to the front of each column in our data:
```{r}
#Check the colnames
colnames(ro1516)
#Add to the front of each the code from the other list
colnames(ro1516) <- paste(head(colnames(ro1516.otherheads),length(colnames(ro1516))),
colnames(ro1516), sep=":")
#Show us the column names now
colnames(ro1516)
```
## Import the latest data (2017/18)
Now we can run the lines below which import the latest data.
```{r}
#Read that file into a new data frame
#We want the 3rd sheet, and to skip the first 6 lines so row 7 is used for headers
#We also ask it to class dashes and lack of data as NA
ro1718 <- read_excel(filename,
sheet = 3,
skip = 6,
na = "c('-','')",
col_types = "guess") #This helps numeric columns be treated as such
summary(ro1718)
```
### Import the older data (2010 - 2015)
The older datasets have a different structure so need to be imported and handled separately before being combined. We don't use most of these again in this workbook, but in a separate workbook (1ccombineaccounts_cbind.Rmd) these are cleaned and subsetted.
```{r}
#Read the file into a new data frame
#We want the 3rd sheet, and to skip the first 5 lines so row 6 is used for headers
#We also ask it to class dashes and lack of data as NA
ro1112 <- read_excel("Revenue_Outturn_Summary__RS__data_2010-11_by_LA_-_27-Nov-2012-v2.xls", sheet = 3, skip = 5, na = "c('-','')",col_types = "guess")
ro1112 <- read_excel("Revenue_Outturn_Summary__RS__data_2011-12_by_LA_-_Revised_28-Nov-2013.xls", sheet = 3, skip = 5, na = "c('-','')",col_types = "guess")
ro1213 <- read_excel("Revenue_Outturn_Summary__RS__data_2012-13_by_LA__Revised__-_18-Feb-2014.xls", sheet = 3, skip = 5, na = "c('-','')",col_types = "guess")
ro1314 <- read_excel("Revenue_Outturn_Summary__RS__data_2013-14_by_LA_-_Revised_28-Jan-2015.xls", sheet = 3, skip = 5, na = "c('-','')",col_types = "guess")
ro1415 <- read_excel("Revenue_Outturn_Summary__RS__data_2014-15_by_LA_-_02-Feb-2016.xls", sheet = 3, skip = 5, na = "c('-','')",col_types = "guess")
```
## Create a subset for main overall categories of spending
There are 100 variables in this spreadsheet, so let's narrow down to the ones we want to focus on.
We can use regex to find the columns where 'Social Care' is mentioned:
```{r}
#The grepl function is used to generate a series of TRUE/FALSE matches for each column
#That is then nested within square brackets after the colnames function to show those that are TRUE
colnames(ro1516)[grepl("Social Care",colnames(ro1516))]
#Let's try some others
colnames(ro1516)[grepl(".*Highway.*",colnames(ro1516))]
colnames(ro1516)[grepl(".*GFRA*",colnames(ro1516))]
colnames(ro1516)[grepl("Public Health",colnames(ro1516))]
colnames(ro1516)[grepl(".*Cultural*",colnames(ro1516))]
colnames(ro1516)[grepl(".*Environmental*",colnames(ro1516))]
colnames(ro1516)[grepl(".*Planning*",colnames(ro1516))]
colnames(ro1516)[grepl(".*Education*",colnames(ro1516))]
#And reserves
colnames(ro1516)[grepl(".*at 1 April*",colnames(ro1516))]
```
We also need to grab the first few columns:
```{r}
ro1516.las <- ro1516[c(1,2,3,5)]
colnames(ro1516.las)
```
Now that we think we have the right columns, let's generate that subset:
```{r}
#Create a list of key terms that we can use to match columns against
listofterms <- "Social Care|Highway|GFRA|Public Health|Cultural|Environmental|Planning|Education|at 1 April"
#This tells us which columns match that list
grepl(listofterms, colnames(ro1516))
#We nest that in square brackets to index the relevant columns from the data frame as a whole
ro1516.sub <- ro1516[grepl(listofterms, colnames(ro1516))]
#Check the results
colnames(ro1516.sub)
#Add in the LA details extracted earlier
ro1516.sub <- cbind(ro1516.las,ro1516.sub)
```
We've now gone from 100 variables to 16, which is going to be much easier to deal with.
## Storing in a function
The question is, can we store all those steps in a function so we don't have to write them all over again?
```{r}
stripbackro.horiz <- function(filename, year){
#read from Excel into an object
roobject <- read_excel(filename,
sheet = 3,
skip = 6,
na = "c('-','')",
col_types = "guess")
#Store the first few columns separately
roobject.las <- roobject[c(1,2,3,5)]
#Read from Excel again to grab the codes
roobject.otherheads <- read_excel(filename,
sheet = 3, skip = 4, n_max = 1, range = "A6:IR6")
#Combine the codes with the relevant column headings
colnames(roobject) <- paste(head(colnames(roobject.otherheads),length(colnames(roobject))),
colnames(roobject), sep=": ")
#Add the year as a prefix
colnames(roobject) <- paste(year, colnames(roobject), sep=": ")
#Create a list of terms that we can use to filter
listofterms <- "Social Care|Highway|GFRA|Public Health|Cultural|Environmental|Planning|Education|at 1 April"
#Use grepl to filter to just those columns containing the keywords
roobject.sub <- roobject[grepl(listofterms,
colnames(roobject))]
#Add in the LA details extracted earlier
roobject.sub <- cbind(roobject.las,roobject.sub)
#No need to add the year in this version
#roobject.sub$year <- year
#Return to whatever called this function
return(roobject.sub)
}
```
And can we remove the 'other' and shire districts..?
## SQL queries across multiple objects
Instead of querying the combined data we can use SQL to query the earliest and latest datasets.
The query is quite long. The first part selects all the fields it wants to use. In some cases it renames those using `AS`, and in some cases it creates new fields by performing calculations. For example:
`(ro1617.`Cultural and related services` - ro1112.`Cultural and related services`) / ro1112.`Cultural and related services` AS culpercchange1117`
This subtracts one figure (the 2012 figures - used because social and adult care were not separated in 2010/11) from another (the 2017 figures), then divides the result by the original figure (2011) to calculate a percentage change.
The second part uses `LEFT JOIN` to join the 2016/17 dataset to the 2010/11 dataset and specifies `ON` which column the two can be matched.
The third part uses `WHERE` to filter so we don't get shire districts or 'other' authorities.
And the final part uses `ORDER BY` so we get it ordered by authorities with the largest percentage change.
```{r}
sqldf("SELECT ro1112.`E-code`, ro1112.`Local authority`, ro1112.`Class`, ro1112.`Cultural and related services` AS culture1112, ro1617.`Cultural and related services` AS culture1617, ro1617.`Cultural and related services` - ro1112.`Cultural and related services` AS culchange1117, (ro1617.`Cultural and related services` - ro1112.`Cultural and related services`)/ro1112.`Cultural and related services` AS culpercchange1117 from ro1112
LEFT JOIN ro1617
ON ro1112.`E-code` = ro1617.`E-code`
WHERE ro1112.`Class` != 'O' AND ro1112.`Class` != 'SD' AND ro1112.`E-code` != 'NA'
ORDER BY culpercchange1117 ASC
")
```
Now the code is adapted for the latest data (2017/18:
```{r}
sqldf("SELECT ro1112.`E-code`, ro1112.`Local authority`, ro1112.`Class`, ro1112.`Cultural and related services` AS culture1112, ro1718.`Cultural and related services` as culture1718, ro1718.`Cultural and related services` - ro1112.`Cultural and related services` AS culchange1118, (ro1718.`Cultural and related services` - ro1112.`Cultural and related services`)/ro1112.`Cultural and related services` AS culpercchange1118 from ro1112
LEFT JOIN ro1718
ON ro1112.`E-code` = ro1718.`E-code`
WHERE ro1112.`Class` != 'O' AND ro1112.`Class` != 'SD' AND ro1112.`E-code` != 'NA'
ORDER BY culpercchange1118 ASC
")
```
At the moment this isn't stored anywhere, but we test it against other categories first...
### For culture
```{r}
sqldf("SELECT ro1112.`E-code`, ro1112.`Local authority`, ro1112.`Class`, ro1112.`Cultural and related services` AS culture1112, ro1718.`Cultural and related services` as culture1718, ro1718.`Cultural and related services` - ro1112.`Cultural and related services` AS culchange1118, (ro1718.`Cultural and related services` - ro1112.`Cultural and related services`)/ro1112.`Cultural and related services` AS culpercchange1118 from ro1112
LEFT JOIN ro1718
ON ro1112.`E-code` = ro1718.`E-code`
WHERE ro1112.`Class` != 'O' AND ro1112.`Class` != 'SD' AND ro1112.`E-code` != 'NA'
ORDER BY culpercchange1118 ASC
")
```
## Automatically generate the SQL query based on a column
We need to find a way to generate the SQL query, given a column heading. First let's see what we have:
```{r}
head(colnames(ro1617),20)
```
Let's try to generate a query that should give us the same results as the cultural one:
```{r}
#Specify the column we want to grab
colwewant <- "`Cultural and related services`"
#Generate the query by using paste to insert that column in the appropriate places
querywewant <- paste("SELECT ro1112.`E-code`, ro1112.`Class`, ro1112.`Local authority`, ro1112.",", ro1112."," AS culture1112, ro1718."," as culture1718, ro1718."," - ro1112."," AS culchange1118, (ro1718."," - ro1112.",")/ro1112."," AS culpercchange1118 from ro1112 LEFT JOIN ro1718 ON ro1112.`E-code` = ro1718.`E-code` WHERE ro1112.`Class` != 'O' AND ro1112.`Class` != 'SD' AND ro1112.`E-code` != 'NA' ORDER BY culpercchange1118 ASC","",sep=colwewant)
#The paste function inserts that string at the end too but we need to remove that
#So we work out how many characters are in the query
nchar(querywewant)
#How many are in the text we want to remove
nchar(colwewant)
#And the characters we need by subtracting one from the other
nchar(querywewant) - nchar(colwewant)
#Now 'clean up' the query by reducing it to the characters up to the point where the unneeded text appears
querywewant <- substr(querywewant,1,nchar(querywewant) - nchar(colwewant))
```
Now to test:
```{r}
querywewant
sqldf(querywewant)
```
Let's store this in a function:
```{r}
givemesqlquery <- function(colwewant){
#Generate the query by using paste to insert that column in the appropriate places
querywewant <- paste("SELECT ro1112.`E-code`, ro1112.`Class`, ro1112.`Local authority`, ro1112.",", ro1112."," AS figs1112, ro1718."," as figs1718, ro1718."," - ro1112."," AS change1118, (ro1718."," - ro1112.",")/ro1112."," AS percchange1118 from ro1112 LEFT JOIN ro1718 ON ro1112.`E-code` = ro1718.`E-code` WHERE ro1112.`Class` != 'O' AND ro1112.`Class` != 'SD' AND ro1112.`E-code` != 'NA' ORDER BY percchange1118 ASC","",sep=colwewant)
#The paste function inserts that string at the end too but we need to remove that
#Now 'clean up' the query by reducing it to the characters up to the point where the unneeded text appears
querywewant <- substr(querywewant,1,nchar(querywewant) - nchar(colwewant))
#Run that query and store results
resultsofquery <- sqldf(querywewant)
return(resultsofquery)
}
```
And test again, this time storing the results in a new dataframe:
```{r}
changeculture1218 <- givemesqlquery("`Cultural and related services`")
changeculture1218
```
### For highways
Try it for highways:
```{r}
changehighway1218 <- givemesqlquery("`Highways and transport services`")
changehighway1218
```
### For housing, environment, planning
Try it for 3 other columns:
```{r}
head(colnames(ro1617),20)
changehousing1218 <- givemesqlquery("`Housing services (GFRA only)`")
changeenviro1218 <- givemesqlquery("`Environmental and regulatory services`")
changeplanning1218 <- givemesqlquery("`Planning and development services`")
changehousing1218
changeenviro1218
changeplanning1218
```
### For social care
Social care and public health weren't in the 2010/11 figures so we need to generate a different query. This means adapting our function:
```{r}
givemesqlquery.fromto <- function(colwewant,fromdata,todata){
#Generate the query by using paste to insert that column in the appropriate places
querywewant <- paste("SELECT ro1112.`E-code`, ro1112.`Local authority`, ro1112.`Class`, ro1112.",", ro1112."," AS figs1112, ro1718."," as figs1718, ro1718."," - ro1112."," AS change, (ro1718."," - ro1112.",")/ro1112."," AS percchange from ro1112 LEFT JOIN ro1718 ON ro1112.`E-code` = ro1718.`E-code` WHERE ro1112.`Class` != 'O' AND ro1112.`Class` != 'SD' AND ro1112.`E-code` != 'NA' ORDER BY percchange ASC","",sep=colwewant)
#The paste function inserts that string at the end too but we need to remove that
#Now 'clean up' the query by reducing it to the characters up to the point where the unneeded text appears
querywewant <- substr(querywewant,1,nchar(querywewant) - nchar(colwewant))
#replace the specific datasets with those passed to the function
querywewant <- gsub("ro1112",fromdata,querywewant)
querywewant <- gsub("ro1718",todata,querywewant)
querywewant <- gsub("figs1112",fromdata,querywewant)
querywewant <- gsub("figs1718",todata,querywewant)
#Uncomment for debugging
#print(querywewant)
#Run that query and store results
resultsofquery <- sqldf(querywewant)
return(resultsofquery)
}
```
### Child social care, adult social care: 2012-18
Now using the new function
```{r}
changechildsoc1218 <- givemesqlquery.fromto("`Children Social Care`","ro1112","ro1718")
changeadultsoc1218 <- givemesqlquery.fromto("`Adult Social Care`","ro1112","ro1718")
changechildsoc1218
changeadultsoc1218
```
### Aside: data checking
It's worth pointing out that if we matched on local authority name we would have missed a few councils beacuse Middlesbrough is misspelt as "Middlesborough" in the earlier years.
```{r}
middbro <- subset(ro1718, ro1718$`E-code` == "E0702")[grep("Local authority|nallocated|earmarked",colnames(ro1718))]
middbro
middbro <- subset(ro1112, ro1112$`E-code` == "E0702")[grep("Local authority|nallocated|earmarked",colnames(ro1112))]
middbro
```
### Changes in reserves
When it comes to comparing reserves across different years we have a problem with naming:
```{r}
colnames(ro1112)[grep("reserves",colnames(ro1112))]
colnames(ro1718)[grep("reserves",colnames(ro1718))]
```
We can look for patterns in these to specify better terms rather than 'reserves', which match fewer columns:
```{r}
colnames(ro1112)[grep("unallocated|earmarked",colnames(ro1112))]
colnames(ro1718)[grep("unallocated|earmarked",colnames(ro1718))]
```
We can edit a previous SQL query directly to get some results:
```{r}
#Show us the column names so we can use them in the query
colnames(ro1112)[grep("nallocated|earmarked",colnames(ro1112))]
colnames(ro1718)[grep("nallocated|earmarked",colnames(ro1718))]
#Run a SQL query and store the results in a new data frame
changeunallocated1218 <- sqldf("SELECT ro1112.`E-code`, ro1112.`Local authority`, ro1112.`Class`, ro1112.`Unallocated financial reserves level`, ro1112.`Unallocated financial reserves level` AS ro1112, ro1718.`Estimated other earmarked financial reserves level at 31 March` as ro1718, ro1718.`Estimated unallocated financial reserves level at 31 March` - ro1112.`Unallocated financial reserves level` AS change, (ro1718.`Estimated unallocated financial reserves level at 31 March` - ro1112.`Unallocated financial reserves level`)/ro1112.`Unallocated financial reserves level` AS percchange from ro1112
LEFT JOIN ro1718 ON ro1112.`E-code` = ro1718.`E-code`
WHERE ro1112.`Class` != 'O' AND ro1112.`Class` != 'SD' AND ro1112.`E-code` != 'NA'
ORDER BY percchange ASC",
#This just ensures that columns aren't reclassed as text when first row is text
method = c("character","character","numeric","numeric","numeric","numeric","numeric"))
changeunallocated1218
```
```{r}
changeearmarked1218 <- sqldf("SELECT ro1112.`E-code`, ro1112.`Local authority`, ro1112.`Class`, ro1112.`Other earmarked financial reserves level`, ro1112.`Other earmarked financial reserves level` AS ro1112, ro1718.`Estimated other earmarked financial reserves level at 31 March` as ro1718, ro1718.`Estimated other earmarked financial reserves level at 31 March` - ro1112.`Other earmarked financial reserves level` AS change, (ro1718.`Estimated other earmarked financial reserves level at 31 March` - ro1112.`Other earmarked financial reserves level`)/ro1112.`Other earmarked financial reserves level` AS percchange from ro1112
LEFT JOIN ro1718 ON ro1112.`E-code` = ro1718.`E-code`
WHERE ro1112.`Class` != 'O' AND ro1112.`Class` != 'SD' AND ro1112.`E-code` != 'NA'
ORDER BY percchange ASC",
#This just ensures that columns aren't reclassed as text when first row is text
method = c("character","character","numeric","numeric","numeric","numeric","numeric"))
changeearmarked1218
```
## Write to XLSX
We can use the `openxlsx` package to write an Excel workbook with all the tables as sheets.
```{r}
#Uncomment this if you get an error message about the package not being installed
#install.packages("openxlsx")
library(openxlsx)
#Store a list of sheet names twinned with data frame names
listofdatasets <- list("changeadultsoc1218" = changeadultsoc1218, "changechildsoc1218" = changechildsoc1218, "changeculture1218" = changeculture1218, "changeenviro1218" = changeenviro1218,"changehighway1218" = changehighway1218, "changehousing1218" = changehousing1218, "changeplanning1218" = changeplanning1218, "changeunallocated1218" = changeunallocated1218, "changeearmarked1218", changeearmarked1218)
#Write to XLSX
write.xlsx(listofdatasets, file = "changes1218.xlsx")
```
We can also generate a query that pulls in from all tables:
```{r}
alltables <- sqldf("SELECT * FROM changeadultsoc1218,changechildsoc1218,changeculture1218,changeenviro1218,changehighway1218,changehousing1218,changeplanning1218")
```
```{r}
alltables <- sqldf("SELECT * FROM changeadultsoc1218,changechildsoc1218,changeenviro1218,changehighway1218,changehousing1218,changeplanning1218,changeunallocated1218,changeearmarked1218
WHERE changeadultsoc1218.`E-code` = changechildsoc1218.`E-code`
AND changeadultsoc1218.`E-code` = changeenviro1218.`E-code`
AND changeadultsoc1218.`E-code` = changehighway1218.`E-code`
AND changeadultsoc1218.`E-code` = changehousing1218.`E-code`
AND changeadultsoc1218.`E-code` = changeplanning1218.`E-code`
AND changeadultsoc1218.`E-code` = changeunallocated1218.`E-code`
AND changeadultsoc1218.`E-code` = changeearmarked1218.`E-code`")
write.csv(alltables,"changetablescombined.csv")
```
Can we extract averages, etc.
```{r}
#The categories are always in the 3rd column of each 7-column range
#so we can access them like this:
sheetnameindexes = seq(3,56,7)
colnames(alltables)[sheetnameindexes]
#The percentages are always in col 7
percindexes = seq(7,56,7)
colnames(alltables)[percindexes]
```
```{r}
alltables[percindexes]
```