incorporating more of Aviva's comments, #30
elisehellwig committed Apr 10, 2024
1 parent 52df097 commit 6152c89
Showing 4 changed files with 30 additions and 24 deletions.
19 changes: 10 additions & 9 deletions 01_considerations.Rmd
@@ -52,15 +52,16 @@ and space. There are different considerations for data management when you are
 working on a project alone, as a collaboration, or archiving data for future
 use. However, similar best practices for data management will benefit them all.

-Even if you are the only person who will ever interact with your data, data
-management is still something to consider. If you collect or modify your data
-over time, you need to make sure you keep track of those changes, and make sure
-you are always working with the most up to date version of your data. This could
-be because you are adding new data or correcting errors in older data. This is
-doubly important if data gets entered by hand. Some data stores keep track of
-changes to your data for you and even provide some quality control tools. Others
-require you to do track versions of your data and do quality control on your
-own.
+Data Management is a crucial part of valid and reproducible research. Even if
+you are the only person who will ever interact with your data, you must manage
+your data in a way that you in the future will be able to recall what past you
+did, and why. If you collect or modify your data over time, you need to make
+sure you keep track of those changes, and make sure you are always working with
+the most up to date version of your data. This could be because you are adding
+new data or correcting errors in older data. This is doubly important if data
+gets entered by hand. Some data stores keep track of changes to your data for
+you and even provide some quality control tools. Others require you to track
+versions of your data and do quality control manually.

 However, science is a collaborative process, so it is more than likely you won't
 be the only one working with your data. Any time more than one person needs to
27 changes: 14 additions & 13 deletions 02_data.Rmd
@@ -31,18 +31,19 @@ Note: there are other definitions of data structure from other disciplines
 (ex. in comp sci), we will not be using those definitions
 -->

-If you are doing research, chances are you are going to be working with data in
-some form or another. Even if you do not use traditional rows and columns, you
-are certainly working with some form of information. And if you have translated
-information into a form that is more easily shared or stored for later use, you
-are working with data. Information becomes data when we give it some sort of
-structure. That could be the rows and columns of a table, or something more
+I think the first paragraph should be “Information becomes data when we give it
+some sort of structure. If you are doing research, chances are you are going to
+be working with data in some form or another. If you have translated information
+into a form that is more easily shared or stored for later use, you are working
+with data. That could be the rows and columns of a table, or something more
 human readable, like the grammar and syntax of natural language in text form.
+Even if you do not use traditional rows and columns, you are certainly working
+with some form of information that you have turned into data.

 One of the challenges of working with data is figuring out a useful way to store
 it. Thankfully, we have mostly left behind the 20th century's favorite way of
 storing data: paper in filing cabinets. Modern digital methods definitely take
-up less space but also have a higher barrier to entry. There are many more types
+up less space but also require more technical knowledge. There are many more types
 of data stores available now as well. This means you will need to make a
 decision on which one to use. **Ultimately, the best way to store your data**
 **will depend on the data you want to store and the questions you want to ask**
@@ -56,13 +57,13 @@ type and data structure.
 Your data's type tells the computer what sorts of operations make sense. Common
 data types include:

-- integers
-- decimal
-- floating point numbers
-- categories (small, medium, large)
-- characters (text)
+- Integers
+- Decimal
+- Floating point numbers
+- Categories (small, medium, large)
+- Characters (text)
 - Boolean values (TRUE and FALSE)
-- dates and times
+- Dates and times

 All data, including individual data points, have a type. However, a given piece
 of information can be stored using multiple data types. For example, you can
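The data types listed in the 02_data.Rmd change determine which operations make sense. The Python sketch below is illustrative only: the values, the `Size` category, and the dates are made up, and Python's `int`, `str`, `float`, `Decimal`, `IntEnum`, `bool`, and `date` stand in for the general types in the list.

```python
from datetime import date
from decimal import Decimal
from enum import IntEnum

# Integers vs. characters (text): the same digits, different operations.
as_integers = 3 + 4        # arithmetic addition
as_text = "3" + "4"        # string concatenation

# Floating point numbers are binary approximations; Decimal stores base-10 values exactly.
float_sum = 0.1 + 0.2
decimal_sum = Decimal("0.1") + Decimal("0.2")


# Ordered categories: comparisons follow the declared order, not alphabetical order.
class Size(IntEnum):
    SMALL = 1
    MEDIUM = 2
    LARGE = 3


small_before_large = Size.SMALL < Size.LARGE

# Boolean values combine with logical operators.
is_valid = True and not False

# Dates and times support calendar arithmetic (hypothetical sampling dates).
days_between = (date(2024, 4, 10) - date(2024, 3, 1)).days

print(as_integers, as_text)                              # 7 34
print(float_sum == 0.3, decimal_sum == Decimal("0.3"))   # False True
print(small_before_large, is_valid, days_between)        # True True 40
```

Storing "3" as text rather than as an integer is not wrong in itself, but it changes what `+` means, which is why choosing a type is part of deciding how to store data.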
7 changes: 5 additions & 2 deletions 03_data-stores.Rmd
@@ -27,8 +27,11 @@ computer through text commands.
 **Query:** commands that create, read, update, or delete information from a data
 store, generally entered through a command line interface.

-**Transaction:** the unit of work in a data store. Ideally a transaction
-succeeds or fails as a unit.
+**Transaction:** the unit of work in a data store. The database keeps track of
+the order of transactions submitted by each user, to prevent conflicts. A
+transaction succeeds or fails as a unit, so if an update affects multiple parts
+of a database, the update only succeeds if every individual part of it is
+allowed.


 ## Types of Data Stores
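The revised **Transaction** definition in the 03_data-stores.Rmd change says a transaction succeeds or fails as a unit. A minimal sketch of that all-or-nothing behavior, using Python's built-in `sqlite3` module with an in-memory database; the `samples` table, its columns, and the `record_batch` and `snapshot` helpers are hypothetical names chosen for illustration, not part of the repository:

```python
import sqlite3

# Hypothetical table for illustration: one row per sampling site.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samples ("
    " id INTEGER PRIMARY KEY,"
    " site TEXT NOT NULL,"
    " n_observed INTEGER NOT NULL)"
)
conn.execute("INSERT INTO samples (site, n_observed) VALUES ('A', 10)")
conn.commit()


def record_batch(conn, new_total, new_rows):
    """Hypothetical helper: update site A and add new rows in ONE transaction.

    Returns True if the whole transaction committed, False if it was rolled back.
    """
    try:
        # The connection's context manager commits if the block succeeds
        # and rolls back every statement in it if an exception is raised.
        with conn:
            conn.execute(
                "UPDATE samples SET n_observed = ? WHERE site = 'A'", (new_total,)
            )
            conn.executemany(
                "INSERT INTO samples (site, n_observed) VALUES (?, ?)", new_rows
            )
        return True
    except sqlite3.IntegrityError:
        return False


def snapshot(conn):
    """Return (site A's n_observed, total number of rows)."""
    a_total = conn.execute(
        "SELECT n_observed FROM samples WHERE site = 'A'"
    ).fetchone()[0]
    n_rows = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
    return a_total, n_rows


# The second new row has no site, violating NOT NULL, so the UPDATE is undone too.
failed = record_batch(conn, 99, [("B", 5), (None, 7)])
after_failure = snapshot(conn)   # (10, 1): nothing from the batch was kept

# A valid batch commits every part together.
succeeded = record_batch(conn, 99, [("B", 5)])
after_success = snapshot(conn)   # (99, 2)
```

Because the `UPDATE` and both `INSERT`s share one transaction, the failed batch leaves site A at 10 rather than 99. If each statement had been committed separately, the update and the first insert would have been kept, leaving the table partly updated.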
1 change: 1 addition & 0 deletions data/sql_software.csv
@@ -4,3 +4,4 @@ PostgreSQL,"Technical documentation, with robust community support",Free,Open So
 MySQL,"Large community base, No professional support without a paid subscription",Free,Partially Open Source,Built-in vector support
 Microsoft SQL Server,Robust professional and community support,Paid,Proprietary,Built-in vector support
 Oracle,Robust professional and some community support,Paid,Proprietary,Raster and vector support with Spatial Studio
+Microsoft Access,"Not supported, do not use",NA,Proprietary,None

0 comments on commit 6152c89
