Javier Otegui edited this page Mar 24, 2015 · 2 revisions

# Biodiversity Data from Social Networking Sites

Summary: Discover primary biodiversity occurrence records from Social Networking Sites and make them directly available via R

Description: Primary Biodiversity Data (PBD), the most fundamental pieces of information describing the presence of a species in time and space, are a cornerstone of ecological research. The field is gaining importance because of its potential to improve our understanding of species distributions, for example in light of climate change. The [Global Biodiversity Information Facility](http://www.gbif.org) is an international initiative that currently serves ca. 530 million PBD records, but several studies suggest that this is still not enough for many purposes and that gaps across data sources remain to be filled. There is great interest in exploring the potential of Social Networking Sites (SNS) such as [Flickr](https://www.flickr.com) and Picasaweb as new sources of PBD. We propose the design and development of a new R package that wraps the APIs of these SNS, with a special focus on biodiversity data. The package would mine the content of these SNS and make the data available to users in international standard formats widely used within the biodiversity community, such as [Darwin Core](http://rs.tdwg.org/dwc/), [Darwin Core Archive](http://rs.tdwg.org/dwc/terms/guides/text/index.htm) and [Audubon Core](http://terms.tdwg.org/wiki/Audubon_Core).

Related work: Packages that mine SNS already exist, e.g. Rflickr and Rfacebook, but a seamless interface that meets the needs of biodiversity data users is still missing.

Potential tasks:

  • Design a wrapper for the APIs of several SNS in R
  • Create and test functions to fetch, parse and store data for multiple species effectively
  • Create and test functions to store a standardized version of the retrieved data by converting it into formats such as Darwin Core and Audubon Core
  • Create and test functions to export the data in Darwin Core Archive format
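The conversion and export tasks above can be sketched as follows. This is a minimal illustration in Python (the package itself would be written in R), assuming a photo record has already been fetched from an SNS and stored as a dict with hypothetical keys (`id`, `date_taken`, `latitude`, `longitude`, `owner`, `url`). The Darwin Core term names and the `meta.xml` layout follow the Darwin Core text guide. The `sns:` identifier prefix and the use of `HumanObservation` as `basisOfRecord` are assumptions, and a complete archive would normally also carry an `eml.xml` metadata file, which is omitted here.

```python
import zipfile

DWC = "http://rs.tdwg.org/dwc/terms/"

# Darwin Core terms written to occurrence.txt, in column order;
# column 0 (occurrenceID) doubles as the record identifier.
TERMS = ["occurrenceID", "basisOfRecord", "scientificName", "eventDate",
         "decimalLatitude", "decimalLongitude", "recordedBy",
         "associatedMedia"]


def photo_to_dwc(photo, species):
    """Map one photo record (a dict with hypothetical keys) to Darwin Core."""
    return {
        "occurrenceID": "sns:" + str(photo["id"]),  # hypothetical ID scheme
        "basisOfRecord": "HumanObservation",        # assumed for user photos
        "scientificName": species,
        "eventDate": photo.get("date_taken", ""),
        "decimalLatitude": photo.get("latitude", ""),
        "decimalLongitude": photo.get("longitude", ""),
        "recordedBy": photo.get("owner", ""),
        "associatedMedia": photo.get("url", ""),
    }


def _clean(value):
    """Replace tabs and line breaks, which would break the tab-delimited rows."""
    return str(value).replace("\t", " ").replace("\r", " ").replace("\n", " ")


def occurrence_text(records):
    """Render records as a tab-delimited table with a header row."""
    rows = ["\t".join(TERMS)]
    rows += ["\t".join(_clean(r.get(t, "")) for t in TERMS) for r in records]
    return "\n".join(rows) + "\n"


def meta_xml():
    """Minimal meta.xml declaring occurrence.txt as an Occurrence core."""
    fields = "\n".join(f'    <field index="{i}" term="{DWC}{t}"/>'
                       for i, t in enumerate(TERMS))
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<archive xmlns="http://rs.tdwg.org/dwc/text/">\n'
            '  <core encoding="UTF-8" fieldsTerminatedBy="\\t"'
            ' linesTerminatedBy="\\n" fieldsEnclosedBy=""'
            f' ignoreHeaderLines="1" rowType="{DWC}Occurrence">\n'
            '    <files><location>occurrence.txt</location></files>\n'
            '    <id index="0"/>\n'
            f'{fields}\n'
            '  </core>\n'
            '</archive>\n')


def write_dwca(records, path):
    """Write occurrence.txt and meta.xml into a zip file at `path`."""
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("occurrence.txt", occurrence_text(records))
        zf.writestr("meta.xml", meta_xml())
```

For example, `write_dwca([photo_to_dwc(photo, "Puma concolor")], "dwca.zip")` would produce an archive with a single occurrence row. Writing the table by hand rather than through a CSV writer keeps `fieldsEnclosedBy=""` accurate: no quoting is ever emitted, and tabs or line breaks inside values are replaced instead of escaped.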

Skills required:

  • Understanding of biodiversity data and the needs of biodiversity data users
  • Understanding of APIs and effective data retrieval
  • Experience with biodiversity data retrieval packages like rgbif or rvertnet
  • Experience with social networking sites packages like Rflickr or Rfacebook
  • Familiarity with data standards such as Darwin Core and Audubon Core
  • Understanding of and experience with the Darwin Core Archive data format

Test:

  • Demonstrate the ability to discover, retrieve, parse and store data related to a single species from one or more SNS
  • Demonstrate ability to build a package
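For Flickr, the discovery and retrieval part of the test could start from something like the sketch below, again in Python for brevity rather than R. It assumes Flickr's REST endpoint and its `flickr.photos.search` method with the `text`, `has_geo`, `extras`, `format` and `nojsoncallback` parameters; a real API key and network access are needed to run `fetch_species`. The flat output keys (`id`, `owner`, `date_taken`, `latitude`, `longitude`, `url`) are hypothetical choices, and treating coordinates of exactly 0,0 as missing is a defensive assumption rather than documented Flickr behaviour.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Flickr REST endpoint (requests must include an API key issued by Flickr).
FLICKR_REST = "https://api.flickr.com/services/rest/"


def flickr_search_url(api_key, species, page=1, per_page=100):
    """Build a flickr.photos.search request for geotagged photos of a species."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "text": species,          # free-text match on title, description, tags
        "has_geo": 1,             # only photos that carry coordinates
        "extras": "geo,date_taken,owner_name,url_m",
        "per_page": per_page,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,      # plain JSON rather than a JSONP wrapper
    }
    return FLICKR_REST + "?" + urlencode(params)


def parse_photos(body):
    """Turn a flickr.photos.search JSON response into flat records."""
    data = json.loads(body)
    if data.get("stat") != "ok":
        raise ValueError("Flickr error: " + str(data.get("message")))
    records = []
    for p in data["photos"]["photo"]:
        lat = float(p.get("latitude", 0))
        lon = float(p.get("longitude", 0))
        if lat == 0 and lon == 0:  # assumption: 0,0 means "not geotagged"
            continue
        records.append({
            "id": p["id"],
            "owner": p.get("ownername", p.get("owner", "")),
            # "YYYY-MM-DD HH:MM:SS" rewritten in ISO 8601 form.
            "date_taken": p.get("datetaken", "").replace(" ", "T"),
            "latitude": lat,
            "longitude": lon,
            "url": p.get("url_m", ""),
        })
    return records


def fetch_species(api_key, species, page=1):
    """Fetch and parse one page of search results (needs network access)."""
    with urlopen(flickr_search_url(api_key, species, page=page)) as resp:
        return parse_photos(resp.read().decode("utf-8"))
```

Free-text search returns any photo whose title, description or tags mention the name, so results would still need taxonomic filtering before being treated as occurrence records.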

Mentors: [Robert Guralnick](mailto:robgur@gmail.com), [Javier Otegui](http://about.me/jotegui) ([@](mailto:javier.otegui@gmail.com)), [Jorge Soberón](mailto:jsoberon@ku.edu)