pydap is an implementation of the OPeNDAP/DODS protocol, written from scratch in pure Python. You can use pydap to access scientific data on the internet without having to download it in full; instead, you work with special array and iterable objects that download data on-the-fly as necessary, saving bandwidth and time. The module also comes with a robust-but-lightweight OPeNDAP server, implemented as a WSGI application.
This fork adds automatic citation generation to pydap, implemented as a "citation response". The citation is constructed from metadata in the DAS, the date of access, and the subsetting (selection) parameters. The citation response (a citation representation of the data) can be accessed by appending ".citation" to the URL of the dataset.
The citation response is implemented in a fork of pydap, rather than as a standalone additional response, for two reasons:
- Since the citation response contains the subsetting parameters, it requires knowledge of the request rather than just the dataset. Therefore, changes to the data handler library had to be made.
- For the purpose of presentation, the web interface was modified to include buttons to access the citation response.
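As a minimal sketch of how a client might request the citation response, the helper below appends ".citation" to a dataset URL and optionally adds a DAP constraint expression. The function names, the localhost URL, and the assumption that the fork reads the subsetting parameters from the query string (as the standard DAP responses do) are illustrative and not taken from the fork's code.

```python
from urllib.parse import quote
from urllib.request import urlopen


def citation_url(dataset_url, constraint=""):
    """Return the URL of the citation response for a dataset.

    `constraint` is an optional DAP2 constraint expression such as
    "SST[0:1:0][10:1:13][10:1:13]" (assumed to be passed as the query string).
    """
    url = dataset_url.rstrip("/") + ".citation"
    if constraint:
        # Leave the characters used by constraint expressions unescaped.
        url += "?" + quote(constraint, safe="[]:,&=<>!()")
    return url


def fetch_citation(dataset_url, constraint="", timeout=30):
    """Download the citation text; needs a server running this fork.

    Decoding as UTF-8 is an assumption about the response encoding.
    """
    with urlopen(citation_url(dataset_url, constraint), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Hypothetical dataset on a local server; only the URL is built here.
print(citation_url("http://localhost:8001/coads_climatology.nc",
                   "SST[0:1:0][10:1:13][10:1:13]"))
```

Calling fetch_citation with the same arguments would perform the actual request once a server running this fork is reachable at that address.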
You can install the latest version using pip. After installing pip you can install pydap with this command:
$ pip install pydap
This will install pydap together with all the required dependencies.
This fork can be installed with:
$ pip install git+https://github.com/NiklasPhabian/pydap
To embed the server in another application, you can use the pydap.wsgi.app:DapServer class, initialized with the path to your data files (a DapServer object is a WSGI callable), and expose it as your "app" to any WSGI framework (from pydap#46).
You can now open any remotely served dataset, and pydap will download the accessed data on-the-fly as needed:
>>> from pydap.client import open_url
>>> dataset = open_url('http://test.opendap.org/dap/data/nc/coads_climatology.nc')
>>> var = dataset['SST']
>>> var.shape
(12, 90, 180)
>>> var.dtype
dtype('>f4')
>>> data = var[0,10:14,10:14] # this will download data from the server
>>> data
<GridType with array 'SST' and maps 'TIME', 'COADSY', 'COADSX'>
>>> print(data.data)
[array([[[ -1.26285708e+00, -9.99999979e+33, -9.99999979e+33,
-9.99999979e+33],
[ -7.69166648e-01, -7.79999971e-01, -6.75454497e-01,
-5.95714271e-01],
[ 1.28333330e-01, -5.00000156e-02, -6.36363626e-02,
-1.41666666e-01],
[ 6.38000011e-01, 8.95384610e-01, 7.21666634e-01,
8.10000002e-01]]], dtype=float32), array([ 366.]), array([-69., -67., -65., -63.]), array([ 41., 43., 45., 47.])]
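The slice var[0,10:14,10:14] in the session above is what ends up as the subsetting parameters of the request. The sketch below shows how such an index could be written as a DAP2 hyperslab, where ranges take the form [start:stride:stop] and the stop index is inclusive. The helper name to_hyperslab is hypothetical and is not part of pydap; it only handles non-negative, fully specified, non-empty indices.

```python
def to_hyperslab(index):
    """Render a tuple of ints and slices as a DAP2 hyperslab string.

    Python slices exclude their stop, DAP2 ranges include it, so the
    stop is converted to the last index that is actually selected.
    """
    parts = []
    for item in index:
        if isinstance(item, int):
            start, step, stop = item, 1, item
        else:
            step = item.step or 1
            start = item.start
            # Last index reached when stepping from start up to item.stop - 1.
            stop = start + ((item.stop - 1 - start) // step) * step
        parts.append("[%d:%d:%d]" % (start, step, stop))
    return "".join(parts)


# The index used in the session above, applied to the SST variable.
print("SST" + to_hyperslab((0, slice(10, 14), slice(10, 14))))
# prints: SST[0:1:0][10:1:13][10:1:13]
```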
For more information, please check the documentation on using pydap as a client. pydap also comes with a simple server, implemented as a WSGI application. To use it, you first need to install the server and optionally a data handler:
$ pip install pydap[server,handlers.netcdf]
This will install support for netCDF files; more handlers for different formats are available, if necessary. Now create a directory for your server data.
To run the server just issue the command:
$ pydap --data ./myserver/data/ --port 8001
This will start a standalone server running on http://localhost:8001/, serving netCDF files from ./myserver/data/, similar to the test server at http://test.pydap.org/. Since the server uses the WSGI standard, it can easily be run behind Apache. The server documentation has more information on how to better deploy pydap.
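As a rough sketch of the DapServer approach mentioned earlier, the snippet below wraps a data directory in a DapServer object and serves it with the standard library's reference WSGI server. The helper names make_app and serve are illustrative, the data path and port simply mirror the command above, and wsgiref is only meant for local testing, not for deployment.

```python
from wsgiref.simple_server import make_server


def make_app(data_dir):
    """Build the WSGI callable that serves the files under data_dir."""
    # Imported here so this module loads even before pydap[server] is installed.
    from pydap.wsgi.app import DapServer
    return DapServer(data_dir)


def serve(data_dir="./myserver/data/", host="localhost", port=8001):
    """Serve data_dir with the reference WSGI server from the standard library."""
    with make_server(host, port, make_app(data_dir)) as httpd:
        httpd.serve_forever()  # blocks until interrupted
```

Calling serve() starts the server; any other WSGI container can be handed the object returned by make_app in the same way.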
For more information, see the pydap documentation.
If you need any help with pydap, please feel free to send an email to the mailing list.
An example pydap application is available at https://github.com/NiklasPhabian/occur_pydap