diff --git a/01_considerations.Rmd b/01_considerations.Rmd
index 3c75447..43f3edd 100644
--- a/01_considerations.Rmd
+++ b/01_considerations.Rmd
@@ -52,15 +52,16 @@ and space.
 There are different considerations for data management when you are working on a project alone, as a collaboration, or archiving data for future use.
 However, similar best practices for data management will benefit them all.
 
-Even if you are the only person who will ever interact with your data, data
-management is still something to consider. If you collect or modify your data
-over time, you need to make sure you keep track of those changes, and make sure
-you are always working with the most up to date version of your data. This could
-be because you are adding new data or correcting errors in older data. This is
-doubly important if data gets entered by hand. Some data stores keep track of
-changes to your data for you and even provide some quality control tools. Others
-require you to do track versions of your data and do quality control on your
-own.
+Data management is a crucial part of valid and reproducible research. Even if
+you are the only person who will ever interact with your data, you must manage
+it in a way that lets your future self recall what you did, and why. If you
+collect or modify your data over time, you need to keep track of those changes
+and make sure you are always working with the most up-to-date version of your
+data. Changes may come from adding new data or correcting errors in older data.
+Tracking changes is doubly important if data gets entered by hand. Some data
+stores keep track of changes to your data for you and even provide some quality
+control tools. Others require you to track versions of your data and do quality
+control manually.
 
 However, science is a collaborative process, so it is more than likely you won't
 be the only one working with your data. Any time more than one person needs to
diff --git a/02_data.Rmd b/02_data.Rmd
index f4e7d21..2f830c7 100644
--- a/02_data.Rmd
+++ b/02_data.Rmd
@@ -31,18 +31,19 @@ Note: there are other definitions of data structure from other disciplines (ex.
 in comp sci), we will not be using those definitions
 -->
 
-If you are doing research, chances are you are going to be working with data in
-some form or another. Even if you do not use traditional rows and columns, you
-are certainly working with some form of information. And if you have translated
-information into a form that is more easily shared or stored for later use, you
-are working with data. Information becomes data when we give it some sort of
-structure. That could be the rows and columns of a table, or something more
+Information becomes data when we give it some sort of structure. If you are
+doing research, chances are you are going to be working with data in some form
+or another. If you have translated information into a form that is more easily
+shared or stored for later use, you are working with data. That structure could
+be the rows and columns of a table, or something more
 human readable, like the grammar and syntax of natural language in text form.
+Even if you do not use traditional rows and columns, you are certainly working
+with some form of information that you have turned into data.
 
 One of the challenges of working with data is figuring out a useful way to store
 it. Thankfully, we have mostly left behind the 20th century's favorite way of
 storing data: paper in filing cabinets. Modern digital methods definitely take
-up less space but also have a higher barrier to entry. There are many more types
+up less space but also require more technical knowledge. There are many more types
 of data stores available now as well. This means you will need to make a
 decision on which one to use.
 **Ultimately, the best way to store your data** **will depend on the data you want to store and the questions you want to ask**
@@ -56,13 +57,13 @@ type and data structure.
 Your data's type tells the computer what sorts of operations make sense. Common
 data types include:
 
-- integers
-- decimal
-- floating point numbers
-- categories (small, medium, large)
-- characters (text)
+- Integers
+- Decimal
+- Floating point numbers
+- Categories (small, medium, large)
+- Characters (text)
 - Boolean values (TRUE and FALSE)
-- dates and times
+- Dates and times
 
 All data, including individual data points, have a type. However, a given piece
 of information can be stored using multiple data types. For example, you can
diff --git a/03_data-stores.Rmd b/03_data-stores.Rmd
index 888a65d..c58c8d9 100644
--- a/03_data-stores.Rmd
+++ b/03_data-stores.Rmd
@@ -27,8 +27,11 @@ computer through text commands.
 **Query:** commands that create, read, update, or delete information from a data
 store, generally entered through a command line interface.
 
-**Transaction:** the unit of work in a data store. Ideally a transaction
-succeeds or fails as a unit.
+**Transaction:** the unit of work in a data store. The data store keeps track
+of the order of transactions submitted by each user to prevent conflicts. A
+transaction succeeds or fails as a unit: if an update affects multiple parts of
+a data store, the update succeeds only if every individual part of it is
+allowed; otherwise, none of it is applied.
 
 ## Types of Data Stores
 
diff --git a/data/sql_software.csv b/data/sql_software.csv
index 5059436..4cc4607 100644
--- a/data/sql_software.csv
+++ b/data/sql_software.csv
@@ -4,3 +4,4 @@ PostgreSQL,"Technical documentation, with robust community support",Free,Open So
 MySQL,"Large community base, No professional support without a paid subscription",Free,Partially Open Source,Built-in vector support
 Microsoft SQL Server,Robust professional and community support,Paid,Proprietary,Built-in vector support
 Oracle,Robust professional and some community support,Paid,Proprietary,Raster and vector support with Spatial Studio
+Microsoft Access,"Not supported, do not use",NA,Proprietary,None