-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tutorials needed on connecting to and using INBO data from R #37
Comments
With respect to GIS-data, the available dbase of mercator-vonet should be in scope ass well. I cite here the information provided by Carine Wils:
|
Just tested
Best for INBO would be to provide examples that can be added to the connections pane in RStudio. Or bundle it into something that can easily be installed for everyone? |
With respect to |
at https://inbo.github.io/tutorials/tutorials/r_database_access/ a introduction is provided about R access to databases |
Woohaa! This rocks. I'll certainly test this further, for now I'll propose a few tweaks here or there. |
Sorry for a late response, but I only now noticed this issue. With respect to RODBC and DBI, our styleguide recommends to use RODBC to connect to SQL Server and Access, and DBI to connect to other databases. For me as a Windows user, this seems the right guideline to follow: the odbc-driver that is used with DBI for SQL Server on Windows causes a lot of difficulties with special characters. (Apparently Linux machines have a better driver and do not have these problems.) As @peterdesmet already mentions: the documentation in RODBC is better, and to be honest: I think the error handling in DBI is terrible. Often I have to paste my query in SQL Server Management Studio to find out what is wrong. For databases without any special characters, it could be ok to suggest to use DBI (e.g. advantage to use dbplyr), but otherwise I would recommend to test first in a Windows environment if all characters can be imported well. The majority of the INBO users has Windows, so I think choices should mainly be based on this OS. |
A way to circumvent problems with special characters, yet still use DBI is given in this function. The input parameter |
As you know, I knew this solution, and it works indeed for some special characters, but unfortunately not for all (e.g. characters for male, female, <= in one character,...). (The examples indeed have an alternative and can be replaced in the database, but there will probably pop up others without alternative that give the same problem.) But what actually worries me, is that this all makes it more complicated for starting or occasional R programmers to use DBI. The whole process of connecting to a database and running a query is already complicated, and then they have to add another statement to translate these characters that are not presenting well,... Furthermore, we already searched for weeks to find this (partial) solution, they might already search for some time before they figure out it is not their mistake (which can be demotivating) and ask for a solution. So my main concern was not if there is a solution, but: if we know there are some difficulties with DBI that will make things complicated for less experienced users and there is an alternative (RODBC), will we still recommend DBI for using with SQL Server? What is actually the advantage of using DBI over RODBC? Except for the presence of a connection pane in RStudio, I don't see any arguments to choose for DBI in the discussion of this issue. So please convince me with some good arguments why DBI is prefered over RODBC. (I don't know why @ThierryO mentioned RODBC as a recommended package for SQL Server in the styleguide. It seems he wanted to promote DBI (as he did for most database types), but for some reason he made an exception for SQL Server and Access.) |
I was just pointing to that solution because it matters a lot where in your code the conversion to the correct encoding is done. For instance, encoding the input parameter and then passing it to glue_sql will not work. I don't have strong opinions, about DBI or RODBC, but I find exploring the database via the connection pane a big plus and it is considered the "best in class" package by rstudio. |
Something else to consider here is that the tidyverse package dbplyr is designed to work with DBI connections (see next coding club). |
Fast forward to 2020: we now have the inbodb package to address some challenges! It would be good to have a pointer to Also the INBOVEG tutorial will need some updating @ElsDeBie Beside that, I think we should further investigate / specify the need for functionality for specific databases, i.e. database-specific queries and preprocessing through R functions. This relates to the second sub-part of this issue, i.e. "best practices using each specific database". |
Some ideas for the database-R tutorial:
INBO software: I'll add inbodb and a short description I'll start working on a PR with the first idea for the database tutorial and the software and I'll wait on some more opinions for the other ideas. Maybe it would be better if someone with experience on it would add something on Maybe it is a good idea to involve @hansvancalster in this discussion as well? (For the tutorials on Florabank and INBOVEG) |
Considering the INBOVEG tutorial: maybe it better fits as a vignet in inbodb? |
Fine ideas @ElsLommelen !
perhaps
You could simply refer to https://dbplyr.tidyverse.org/index.html
If you would do that, I'd prefer two tutorials rather than three, and which should clearly refer to each other at the beginning. Otherwise, the interconnection gets lost (the website doesn't provide tutorial-hierarchy) and in such case I think it's better to keep it together as one. |
Because the general use is similar for all database types, I am not sure the best option is to give each database type its own tutorial. Also, maybe (part of) the database tutorial should better be part of a vignet in |
Sure, common parts better aren't duplicated. Regarding the tutorials-website vs. vignette, I leave such preference to you, it is not that obvious to me what is best. It's certainly a good thing to have self-contained help within the package. If the tutorial (or parts of it) were to move to the |
Part of the tutorial describes methods not using |
Another issue with |
Maybe in the table add a row |
I agree with Hans to add a row |
Are there any good sites about 'how to write a vignette'? I quickly searched and find this: https://kbroman.org/pkg_primer/pages/vignettes.html |
Hi @ElsDeBie, I found Hadley Wickham's book very helpful; see the vignettes chapter: http://r-pkgs.had.co.nz/vignettes.html |
Euh, no idea. Easiest to generate a vignette is using the function And for the content, to me it seems the same as any other text: focus on a (potential) user group for your package and write down what they should really know to use your package. I generally start with writing the goal of the vignette (and the focal user group), and try to use a lot of examples. |
Hm, I should have looked better, you've listed r-pkgs already 😉 Anyway, thanks for the tutorial of Broman, seems interesting - didn't know it. |
@ElsDeBie In the link by Hadley Wickham, just replace package name |
There is a 2nd edition of the book under development at https://r-pkgs.org. It seems that the authors did already make updates to the vignettes chapter at https://r-pkgs.org/vignettes.html. |
Els Lommelen: > Another issue with --> Is it an idea to put between brackets the databases mentioned in this package? Otherwise users can be lost if they don't no where to search for ..."Retrieve data: general | inbodb (technical support, Inboveg, FLoradatabank)". |
@ElsDeBie Adding the explicit databases has both pros (more informative) and cons (needs maintenance). @ElsLommelen what do you prefer? Adding the explicit databases (in the description) was actually her first approach; I suggested generalization 🤔 (see #171 (comment)). Regarding the form: I would not insert package-specific information in the table, as the table intends to give a very global overview. Package-specific information is given in the text below. @ElsDeBie Maybe you can propose a better category name than 'Retrieve data: general', that would be a better fit ? Still another possibility could be to repeat the package under more than 1 category in the table. |
@ElsDeBie I agree with @florisvdh and I would rather prefer to not mention specific databases for maintenance reasons. When adding new databases to inbodb, it is easily added in for instance the readme of inbodb itself, but keeping the tutorials website up-to-date as well (and especially remembering to do so whenever we make changes in inbodb), will require an additional effort. On the other hand it would be nice to name all these specific databases (inboveg and florabank for now) in the readme of inbodb, so visitors will immediately have this information while using the link in the tutorials website. |
INBO hosts many databases, so tutorials are needed for scientific and scientific support staff on how to use them from within R.
These tutorials should focus on:
There is an urgent need for the following INBO databases:
filegdb
#29Further, the need is also urgent for these databases:
This list is most probably incomplete. (So having an overview of data is another, related need.)
The text was updated successfully, but these errors were encountered: