Run SQL queries on Snowflake, Redshift, or a postgres database from an R script.
This package is designed to make it easy to run SQL queries from R. It is designed to work with Snowflake, Redshift, or a postgres database. It is not designed to work with other databases, but it could be extended to do so.
# The easiest way to get snowquery
install.packages("snowquery")
# Or you can get the development version from GitHub
# install.packages("devtools")
devtools::install_github("mermelstein/snowquery")
Redshift is currently only available on the development version of this package. See installation instructions above.
When connecting to a Redshift DWH you might need to specify an SSL connection. You can do this with a sslmode='require'
connection variable or by passing that to the queryDB()
function directly.
Because Snowflake's driver requires a ton of fiddling in order to make it work for R. It sucks. A lot.
To sum up the current experience of running SQL against Snowflake from:
- python: good ✅
- R: bad ❌
That's why the snowquery
package takes the Snowflake python connector and leverages it in the background to run queries from R.
For more information on using snowquery
, please see the package website.
Redshift and Postgres db connections are entirely contained by this package. If querying Snowflake you must have a local python installation and the Snowflake python connector installed. If you need to install python you can do that with Homebrew from the terminal:
# for example to install python 3.10 on MacOS
brew install python@3.10
If you need to install the Snowflake python connector, you can do that with the following command from the terminal:
pip install "snowflake-connector-python[pandas]"
For all db connections you will need to have your database credentials in a YAML file called snowquery_creds.yaml
. The file should be located in the root directory of your machine and should have the following format (depending on which database type you are connecting to):
---
my_snowflake_dwh:
db_type: snowflake
account:
warehouse:
database:
username:
password:
role:
my_redshift_dwh:
db_type: redshift
sslmode: require
host:
port:
database:
username:
password:
my_postgres_db:
db_type: postgres
host:
port:
database:
username:
password:
This follows a named connection format, where you can have multiple named connections in the same file. For example you might have a my_snowflake_dwh
connection and a my_snowflake_admin
connection, each with their own credentials.
This package looks for the credential file at this location: ~/snowquery_creds.yaml
. If it is in any other location it will not work. If the package cannot locate the file you will receive an error like: cannot open file '/expected/path/to/file/snowquery_creds.yaml': No such file or directory
. You can manually pass credentials to the queryDB()
function but it is recommended to use the YAML file.
You are now ready to query away!
Load this library in your R environment with library(snowquery)
.
There is one function you need: queryDB()
. It will take a SQL query as a string parameter and run it on the db.
For example:
library(snowquery)
query <- "SELECT * FROM MY_AWESOME_TABLE"
result_dataframe <- queryDB(query, conn_name='my_snowflake_dwh')
print(result_dataframe)
or
library(snowquery)
queryDB("SELECT * FROM MY_AWESOME_TABLE", conn_name='my_snowflake_dwh')
or
library(snowquery)
# You can also pass in credentials manually
result <- queryDB("SELECT * FROM my_table",
db_type='snowflake',
username='my_username',
password='my_password',
account='my_account',
database='my_database',
warehouse='my_warehouse',
role='my_role',
timeout=30)
print(result)