-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy path04-airtable.qmd
362 lines (231 loc) · 22.4 KB
/
04-airtable.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
---
output-file: airtable.html
---
::: {.callout-note title="Airtable not anymore used to host the database"}
As of May 2024, the Africa CDC Database backend has shifted from Airtable to DoltHub.
This chapter has been kept here to document how the database started and to show the history of its development.
To read about DoltHub, see @sec-dolt. To learn more about the use of dolt and DoltHub as the current backend for the database, read @sec-database-dolthub.
:::
# Airtable {#sec-airtable}
[Airtable](https://airtable.com) is a cloud-based software platform that allows users to create and manage databases, spreadsheets, and other types of organizational tools. It can be used for a variety of purposes, including project management, customer relationship management, inventory tracking, event planning, and much more.
One of the key features of Airtable is its flexible and customizable nature. Users can create and customize their own database structures, and can choose from a wide range of data types, including text, attachments, checkboxes, and more. This allows for a high degree of customization and adaptability to different use cases and workflows.
Airtable also offers a variety of collaboration features, including real-time syncing and commenting, as well as integrations with other popular tools such as Slack, Google Drive, and Trello. Additionally, Airtable has a robust API that allows developers to build custom integrations and applications on top of the platform.
Airtable's flexibility, customizability, and features that support collaboration along with its spreadsheet-style interface that is familiar to most that have used other spreadsheet software such as Microsoft Excel or Google Sheets are the key reasons why it was chosen to be the database tool for RIG.
@sec-airtable provides an overview of Airtable and good practices for designing data models in relational databases.
## Key Terms {#sec-airtable-key-terms}
- **Workspace** - A collection of bases
- **Base** - A database. Each is identified by the Airtable API via a `base id`
- **Table** - A tabular data set within a base. Each table is identified by the Airtable API via the name of the table e.g. "Demo Table"
- **Record** - An individual cell within a table. Each record is identified by the Airtable API via a `record id`
- **Field** - A property of data in a table
- **Views** - A specific way of displaying a table. Default is grid.
- **Entity** - Something that either physically or logically exists whose properties are typically stored in a table and composed of data elements.
- **Element** - an attribute of a Entity (a field)
## Security and Access Control {#sec-airtable-security}
Airtable maintains physical and technological security as part of its ISO IEC 27001:2013 and SOC 2 compliance measures. Data are 256-bit encrypted when storing on the server and also when transferring data over the internet. To find vulnerabilities in their software, they run daily, weekly, and monthly scans on different components of their system and regularly commission external penetration tests. They also run a bug bounty program to help identify issues. Their data centers have fire detection and suppression systems, redundant power systems, and strict control for physical access. Because Airtable relies on Amazon Web Services (AWS) for its cloud infrastructure (the same providers used by previous EHA projects), data are geo-redundantly replicated in backups across multiple zones to increase data durability. They also have a team monitoring services at all times. Airtable employees are thoroughly vetted before hiring and continually trained on data protection best practices. Their workstations are secured by using full-disk encryption, automatic locking, and strong password requirements.
### User and Administrator Security Features {#sec-airtable-user-security}
#### Access Controls {#sec-airtable-access-controls}
Airtable provides database (referred to as "base") and workspace administrators with granular controls over who can view, edit, comment, or otherwise modify data at the base, table, and field levels. There are four levels of Airtable user permissions:
- **Owner/Creator**: Full administrative control of base
- **Editor**: Sees full base, create and modify records and views, create and modify view share links
- **Commenter**: Sees full base, comment on records
- **Reader**: Sees full base
Direct access to a base or workspace is granted or removed by base owners and creators to Airtable users. Base owners and creators can control who has access to a base and can control any “share” links created for that base. They may also restrict editing of tables or fields within a base. Any collaborator given direct access to a base at any permission level will be able to duplicate that base and share that data further. It is important that direct access to the base is limited to individuals with a need to curate or analyze the data.
#### Share Links and Interfaces {#sec-airtable-interfaces}
To further restrict access to a base, users can be given indirect access via revocable share links or interfaces.
[Share links](https://support.airtable.com/docs/creating-a-base-share-link-or-a-view-share-link) can be customized to prevent users from seeing the full base, prevent duplicating the base, and prevent copying data from the base. The ability to use the link can be password protected, restricted to people with certain email domains, and may be revoked at any time. If there are concerns about data leaks via base or table duplication, inviting people with a need to view the data via share links constrains their ability to extract data from the base.
[Interfaces](https://support.airtable.com/docs/interface-designer-overview) are dynamic dashboards built on a limited set of data in an airtable base. Users can explore or even edit data based on the permissions provided by the interface creator. Access can be further tuned by setting up a "current-user filter". See [this guide](https://support.airtable.com/docs/interface-designer-permissions) for more information.
## Data in Airtable {#sec-airtable-data}
See [Airtable plan comparison](https://support.airtable.com/hc/en-us/articles/115010928147-Airtable-plans) for more information on the size of bases and features available. Information in this section pertains to all plans unless specified.
### Workspaces, Bases, Tables, Fields, Records {#sec-airtable-workspaces}
Airtable uses workspaces, bases, tables, and fields to manage data. A workspace generally pertains to a particular project and contains all bases relevant to that project. Sharing a workspace with someone allows them to see all bases within that workspace.
Bases are equivalent to databases. They consist of a set of tables that can be linked and allow you to perform some task (e.g. IRB tracking, capturing research data, etc.). Bases can be duplicated and shared across your workspaces and you can share bases with other users.
Tables are where most of the action happens. Data is entered in tables, tables can be transformed via views into calendars, dashboards, or galleries, and tables can be manipulated via the API. They describe a data entity and are composed of fields. In the bat sampling example, each circle would be a table in the bat sampling base (see @fig-bat-sampling).
```{dot}
//| label: fig-bat-sampling
//| fig-cap: Data model for bat sampling example
digraph bat_sampling {
graph [overlap = true,
nodesep = .5,
ranksep = .25,
compound = true,
color = "MediumAquamarine",
fontsize = 10,
rankdir = LR]
node [shape = circle,
fixedsize = true,
width = 1,
fontname = "Helvetica",
style = filled,
fillcolor = "MediumAquamarine"]
a [label = "Collector"]
b [label = "Site"]
c [label = "Bat"]
d [label = "Antibody\ntest result"]
e [label = "PCR\ntest result"]
f [label = "Physical\nsample"]
edge [color = "grey",
arrowhead = none,
arrowtail = none,
penwidth = 3]
a -> c [label = "Collected by"]
b -> c [label = "Collected at"]
c -> f [label = "Collected from"]
f -> d [label = "Derived from"]
f -> e [label = "Derived from"]
}
```
Fields represent properties of a data entity. In a spreadsheet view, fields are columns. Airtable allows you to control field types (date, number, text, file attachments, logical, etc.) and paid plans allows you to control who can edit fields. Fields can be linked between tables creating links.
Records are the individual data points in a table. In a spreadsheet they would be the rows. Records are shaped by the structure you have created in tables and fields. Each record has a URL that can be used to access it programmatically or share it.
## Views {#sec-airtable-views}
In Airtable, tables can be displayed in different views to emphasis different components of the data. Views are great for creating concise presentations of data, especially in sprawling tables. The default view in an table is the grid (spreadsheet) view. All other views will derive from the data entered in this view.
For more on views, see [the guide to views](https://support.airtable.com/hc/en-us/articles/202624989-Guide-to-views).
## Internal Backups: Record History, Base Snapshots {#sec-airtable-internal-backups}
Airtable has system for tracking changes to a base. They provide revision history for individual records (how long those histories are stored varies by plan). Any comments made in the revision history will be stored for the life of the record (on any plan). The current state of an entire base may be captured through snapshots. Should a systemic issue arise, the base can be restored to a snapshot at a later date. Restoring from a snapshot will remove the revision history for a record but comments on that record will be maintained.
As revision histories are maintained, "deleted" records may be retrieved. In the event of the need to permanently remove data, the revision history of the base may be removed.
Airtable allows data to be exported as CSVs from individual tables. At this time there is not an Airtable supported base export function.
## Importing data {#sec-airtable-import-data}
Data can be imported to Airtable from a number of sources including CSV files, excel, Google sheets, XML, and via copy paste. Airtable will guess what the most appropriate field type is, so make sure the field type is appropriate for the data (e.g. convert from text to date type fields). Certain sources, like Google sheets, can be imported as bases.
For more on importing data see: [Importing and Adding data](https://support.airtable.com/hc/en-us/sections/200928025-Importing-and-adding-data)
## Base Design {#sec-airtable-base-design}
Having a good base design will make using your data easier. Generally, the process looks like this:
1. Describe what the database will do and collect use cases
2. Determine the roles of various stakeholders
3. List out the entities in the database and define their properties
4. Map out how the entities fit together (which properties link them)
5. Check that the mapping meets the use cases
6. Build base in Airtable
7. Check that the base meets the use cases
If you are migrating from spreadsheets, you likely already have an idea of what you need the base to do and a collection of data properties. It is still a good idea to follow the steps outlined above for mapping out entities. You may find that entire sheets can be replaced by views or that the data in one sheet should actually be stored in two different tables.
## Automating Airtable {#sec-airtable-automation}
Airtable has five main routes for automating processes.
1. [Automations](https://support.airtable.com/hc/en-us/articles/360050974153-Automations-overview) - a drag and drop visual programming tool
2. [Extensions](https://support.airtable.com/docs/airtable-extensions-overview) - pre-built applications that perform some task
3. [Scripting](https://support.airtable.com/hc/en-us/articles/360043041074-Scripting-app-overview) - use JavaScript to automate tasks within Airtable
4. [Blocks](https://www.airtable.com/developers/apps/guides/getting-started) - use JavaScript to create custom applications
5. [REST API](https://airtable.com/api) - use whatever programming language you like to automate processes
### Automations with Drag and Drop Programming (pro and above) {#sec-airtable-automation-pro}
The automations feature within Airtable allows you to visually program routines. Each automation has three basic components (see @fig-actions-diagram):
- Status - controls whether or not the automation will run when the trigger condition is met
- Triggers - condition for automation to run: the creation or change of a record, a scheduled time, or some other external action
- Actions - what the automation will do
```{r fig.height = 2}
#| label: fig-actions-diagram
#| fig-cap: Actions diagram
#| echo: false
library(ggplot2)
nodes <- data.frame(
x = c(1, 2, 3), y = 0,
label = c("Status", "Trigger", "Action")
)
edges <- data.frame(
x1 = c(1, 2), x2 = c(2, 3), y1 = 0, y2 = 0,
seg_label = c("Automation is On", "Condition is Met")
)
ggplot()+
geom_segment(
data = edges,
mapping = aes(x = x1, y = y1, xend = x2, yend = y2),
colour = "light gray"
) +
geom_text(
data = edges,
mapping = aes( label = seg_label),y = 0.005, x = c(1.5, 2.5),
size = 5
) +
geom_point(
data = nodes, aes(x =x, y = y), size = 30, color = "mediumaquamarine"
) +
geom_text(
data = nodes,
mapping = aes(x =x, y=y, label = label), size = 5
) +
theme_void() +
xlim(c(.9, 3.1))
```
Automations are commonly used to augment the synced tables feature, send notifications, check data quality, and manipulate or create records.
This automation (see @fig-airtable-weekly-summary) creates a weekly summary of applicants who applied for a position. Its trigger is time based, then it finds records that match a condition, and finally it generates an email from those records and sends to the appropriate recipients.
![Airtable weekly summary](images/airtable_Weekly Digest.png){#fig-airtable-weekly-summary fig-align="center"}
### Scripting and Blocks (pro plan and above) {#sec-airtable-scripting}
Scripting uses JavaScript to manipulate a base from within an application in Airtable. Scripting is flexible but has a steep learning curve. The scripting environment Airtable provides can be helpful as it provides code [linting](https://en.wikipedia.org/wiki/Lint_(software)), direct access to documentation, and example scripts to build from. The major drawback to scripting is that scripts live in the base and files are not version controlled.
- There are a LOT of [pre-written scripts, in a shared "marketplace"](https://airtable.com/marketplace/category/scripts) to perform automatic actions. Search the marketplace before you start writing bespoke code.
Blocks are custom applications built in JavaScript and node.js that add to base functionality. They are created in a development environment outside of Airtable then brought back into platform. There are number of [tutorials for getting started with blocks](https://www.airtable.com/developers/apps/guides/getting-started).
## Using the REST API {#sec-airtable-api}
All Airtable bases are automatically accessible to authorized users via a REST API. The list of API accessible bases you have access to can be found here: https://airtable.com/api. By clicking on a base you will be able to see the full API documentation for that base.
### Scoped Tokens {#sec-airtable-tokens}
Airtable is moving to a [scoped tokens](https://airtable.com/developers/web/guides/personal-access-tokens) based approach to api access. Scoped personal access tokens allow you to create a token for a specific base with specific permissions - e.g. token has read-only access to a bat sampling base. Using scoped tokens in this way means that if the token is compromised (leaked, stolen, accidentally committed unencrypted to a github repo, etc), you can delete that token to remove any access it might have had and the limited scope means that you know exactly what a person would have been able to access.
To create a personal access token go here: [https://airtable.com/create/tokens](https://airtable.com/create/tokens)
:::{.callout-warning}
Remember to save the token in a secure place. Do not store unencrypted tokens on the web (e.g. pushing them to GitHub).
:::
Airtable has also deployed [OAuth tokens](https://airtable.com/developers/web/api/oauth-reference) for all users.
### Airtable and R {#sec-airtable-r}
The Airtable REST API can be used via R with the [airtabler package](https://github.com/ecohealthalliance/airtabler). EHA has started a fork of the package that has additional functionality so it is recommended to use that version. The original package design works well for exploring the data. Our extension adds additional functionality to help use `airtabler` in automation via continuous integration such as GitHub Actions.
```R
devtools::install_github("ecohealthalliance/airtabler")
```
The Airtable API serves up data as JSON, which has a hierarchical structure similar to a list in R. To handle JSON, `airtabler` uses the `jsonlite` package. Its helpful to understand how `jsonlite` handles different JSON structures when working with more complicated Airtable data. See the `jsonlite` [quick-start guide](https://cran.r-project.org/web/packages/jsonlite/vignettes/json-aaquickstart.html) for a basic overview. The [`purrr` package](https://purrr.tidyverse.org/) is extremely helpful when dealing with data objects derived from JSON because it facilitates navigating nested data structures.
The `airtabler` package provides instructions for setting up access to the Airtable API. You will need to follow those instructions for @exm-one-to-one-join and @exm-one-to-many-join to work.
::: {#exm-one-to-one-join}
```R
library(airtabler)
table1 <- fetch_all(base = "app49bbyLczZxX9PM",table_name = "One To One")
table2 <- fetch_all(base = "app49bbyLczZxX9PM",table_name = "Table 2")
## Linked records are stored as JSON arrays so they become lists
## when part of a data frame. Because each array has a length of 1
## we can safely unlist the arrays and add them back as to the
## data frame.
table1$LinkedRec <- unlist(table1$LinkedRec)
joinedTables <- dplyr::left_join(table1,table2, by = c("LinkedRec"="id"))
recordKey <- joinedTables[c("Name","LinkedRec","Number")]
recordKey
```
One to One join
:::
::: {#exm-one-to-many-join}
```R
library(airtabler)
library(dplyr)
library(purrr)
oneToMany <- fetch_all(base = "app49bbyLczZxX9PM", table_name = "One To Many")
table2 <- fetch_all(base = "app49bbyLczZxX9PM",table_name = "Table 2")
# Depending on how you want to work with the data joins are a
# little trickier here.
## to replace the values but keep the structure
oneToMany$LinkedRecReplace <- purrr::map(oneToMany$LinkedRec, function(x){
table2 %>%
filter(id %in% x) %>%
select(c("id","Number")) %>%
pull("Number")
})
data.frame(
id = unlist(oneToMany$LinkedRec),
label = unlist(oneToMany$LinkedRecReplace)
)
```
One to Many Join
:::
## Data Management {#sec-airtable-data-management}
Because of its flexibility and ease of use, it is extremely important that data
management for airtable be taken seriously. Unlike most other relational databases, the fundamental properties of your base can be changed easily by multiple users without warning! Documenting the structure and purpose of your base, as well as creating regular external backups could save you from catastrophe.
### Metadata {#sec-airtable-metadata}
- **Structural metadata** - Information about a resource that tells you how its
put together - e.g. the relationship between a table, field, and automation. With the exception of users on the enterprise plan, you must be deliberate about
creating and maintaining structural metadata for your base.
Example of Airtable Structural Metadata:
<iframe class="airtable-embed" src="https://airtable.com/embed/shrsv0CWMSERgUIR8?backgroundColor=gray&viewControls=on" frameborder="0" onmousewheel="" width="100%" height="533" style="background: transparent; border: 1px solid #ccc;"></iframe>
- **Descriptive metadata** - Information about a resource that makes it easier to find and attribute data. This includes things attributes like author, title, description, keywords, etc. This table becomes especially important when data are transitioned out of airtable, at the end of a project, or if the base will be shared broadly with collaborators.
Example of Airtable Descriptive Metadata:
<iframe class="airtable-embed" src="https://airtable.com/embed/shrFr7eafWtH4s7ll?backgroundColor=gray&viewControls=on" frameborder="0" onmousewheel="" width="100%" height="533" style="background: transparent; border: 1px solid #ccc;"></iframe>
To see the template base, follow this [link]( https://airtable.com/invite/l?inviteId=inv5ycCAbIVw1WM3Z&inviteToken=ef7b72e998a0a69d835244f87f88b6b9b6b1094915239ebc93423e1257fcd72b&utm_medium=email&utm_source=product_team&utm_content=transactional-alerts)
#### Metadata for automations and extensions {#sec-airtable-metadata-extension}
Automations and extensions live entirely inside airtable. As of August 2022, Airtable introduced a "Manage Fields" tab that provides metadata for a specific table, including any automation dependencies. It is not possible to export those dependency tables or the code used in automations automatically. This makes it extremely important to document any automations and extensions outside your base.
### External Backups {#sec-airtable-external-backups}
It is a good idea to create regular external backups of airtable data in the event that something catastrophic happens to your base or you simply decide you no longer wish to use airtable as your data store.Unfortunately, airtable does not provide an off the shelf solution for this. Below are three options for extracting all data from your base.
#### Using airtabler {#sec-airtabler-airtabler}
Using `airtabler::air_dump()` and `airtabler::air_dump_to_csv()` functions, you can export all tables to R then create a versioned folder of CSVs. See the `air_dump_to_csv` help page for more information.
#### Other Open Source Projects {#sec-airtable-other}
UnlyEd - use this template to push airtable backups to AWS S3.
- [https://github.com/UnlyEd/airtable-backups-boilerplate](https://github.com/UnlyEd/airtable-backups-boilerplate)
#### Paid services {#sec-airtable-paid-services}
Sequin - replicate airtable to postGres database ([https://www.sequin.io/sources/airtable](https://www.sequin.io/sources/airtable))