Merge pull request #122 from nyuhsl/feature/improve-readme
Feature/improve readme
ianlamb77 authored Mar 7, 2019
2 parents d0537b0 + fbc83fc commit 0b8b68f
Showing 1 changed file with 13 additions and 9 deletions.
22 changes: 13 additions & 9 deletions README.md
@@ -3,11 +3,15 @@
Welcome to the NYU Health Sciences Library's Data Catalog project. Our aim is to encourage the sharing and reuse of research data among institutions and individuals by providing a simple yet powerful search platform to expose existing datasets to the researchers who can use them. There is a basic backend interface for administrators to manage the metadata which describes these datasets.

## Components
The Data Catalog runs on **Symfony2**, a popular PHP application framework. Installation and management of this package is best performed by a PHP developer familiar with this framework.
The Data Catalog runs on **Symfony2**, a popular PHP application framework. Installation and management of this package should be performed by a PHP developer familiar with this framework. Typically, Symfony is run with an HTTP server such as **Apache**, and a database such as **MySQL**. Installation of the Data Catalog will require a working knowledge of these packages.

The search functionality is powered by **Solr**, which will need to be running and accessible by the server hosting the website. A sample Solr schema is included with this package. The Solr index can be updated regularly by setting up a cron job which calls an update script. A sample update script is also included with this package.
The search functionality is powered by **Apache Solr**, which will need to be installed separately from this project. Solr comes packaged with its own web server (Jetty) and can be run on the same machine as this website, or on its own machine. We recommend using Solr version 6; version 7 should also work but we have not tested this. Detailed information on installing Solr is outside the scope of this documentation, but the basic steps are as follows (this is also covered in the general installation steps below):
1. [Download and install the Solr package](https://lucene.apache.org/solr/guide/6_6/getting-started.html#getting-started)
2. [Start the Solr server and create a Solr core for this project](https://lucene.apache.org/solr/guide/6_6/running-solr.html#RunningSolr-CreateaCore)
3. [Configure Solr to use our custom schema](https://lucene.apache.org/solr/guide/6_6/schema-factory-definition-in-solrconfig.html#SchemaFactoryDefinitioninSolrConfig-SwitchingfromManagedSchematoManuallyEditedschema.xml), which is included in the root directory of this project (SolrV6SchemaExample.xml)
4. Add the URL of your new Solr core to Symfony's parameters.yml file (step 4 in the install instructions below).

The metadata and some information about users is stored in a database. We used **MySQL** and there's a good chance you will too.
Datasets are added using the Data Catalog's administrative interface, and then sent to Solr for indexing. Solr's index therefore needs to be kept in sync with any changes made in the Data Catalog. We've provided a sample indexing script ("SolrIndexerExample") in the root directory of this project. We recommend setting this up to run automatically either daily or weekly depending on your usage.

**IMPORTANT NOTE:** This package comes with a very basic form of authentication that should only be used in a local development environment. There are methods in place to use your institution's LDAP server, or you can use Symfony's built-in user management. Please read `app/config/common/security.yml` for more info.

@@ -19,18 +23,18 @@ This repository is essentially a Symfony2 distribution (i.e. it is not simply a
```
git clone https://github.com/nyuhsl/data-catalog.git
```
3. [Create a Solr core for your project](https://lucene.apache.org/solr/guide/6_6/running-solr.html#RunningSolr-CreateaCore). Your core's name will become part of the URL that goes into the parameters.yml file in the next step. For example, if you create a core called "datacatalog" your Solr URL would look something like "http://localhost:8983/solr/datacatalog".
4. Read `app/config/parameters.yml.example`. Fill in the information about your MySQL server, and the URL where your Solr installation lives (the `solrsearchr_server` parameter). You'll need a version of this in `app/config/dev` and `app/config/prod`. Remember to choose a "secret" according to the documentation [here](http://symfony.com/doc/current/reference/configuration/framework.html#secret). Then read through `app/config/security.yml.example` and copy it to `app/config/common/security.yml`. Please also read the README file in `app/config` which contains more information.
3. [Start Solr and create a new core for your project](https://lucene.apache.org/solr/guide/6_6/running-solr.html#RunningSolr-CreateaCore). Your core's name will become part of the URL that goes into the parameters.yml file in the next step. For example, if you create a core called "datacatalog" your Solr URL would look something like "http://localhost:8983/solr/datacatalog".
4. Next we'll be setting up the Symfony configuration files. Check the [Symfony documentation](https://symfony.com/doc/2.8/configuration.html) for some background information about how these files work. In this project, we have additional info in our `app/config/parameters.yml.example`. Fill in the information about your MySQL server, and the URL where your Solr installation lives (the `solrsearchr_server` parameter). You'll need a version of this file in `app/config/dev` and, later, in `app/config/prod`. Remember to choose a "secret" according to the documentation [here](http://symfony.com/doc/current/reference/configuration/framework.html#secret). Then read through `app/config/security.yml.example` and copy it to `app/config/common/security.yml`. Please also read the README file in `app/config` which contains some more information.
5. On a command line, navigate to your project's root directory and run `composer install` to install Symfony and any dependencies.
6. [Configure your web server](http://symfony.com/doc/current/cookbook/configuration/web_server_configuration.html) to work with Symfony. NOTE: You will eventually have to require HTTPS connections on the login and administrative pages (at least), so remember to set up an SSL certificate for your server when you move the site to production. There is some sample code in app/config/common/security.yml that will tell Symfony to require HTTPS connections.
7. [Configure the file
system](https://symfony.com/doc/2.8/setup/file_permissions.html). This
means at the very least that `app/config/cache` and `app/config/logs` are
writeable by Apache and by your account.
8. To set up the database, there are two options. First, there is a "starter database" prepopulated with several public datasets which can be loaded directly into the empty database schema you created in step 1. We recommend this option. Just extract the file `starterDatabase.sql.tar.gz` which is in the root of this repo, and [import the \*.sql file into your schema](https://stackoverflow.com/a/17666279). However, if you'd prefer to start totally from scratch, navigate to the root of your Symfony installation and run `php app/console doctrine:schema:update --force`. If you have configured your database correctly in parameters.yml, this will set up your empty database to match the data model used in this application. If you haven't configured it correctly, this command will let you know.
writeable by the Apache web server and by your account.
8. To set up the database, there are two options. First, there is a "starter database" prepopulated with several public datasets which can be loaded directly into the empty database schema you created in step 1. We recommend this option. Just extract the file `starterDatabase.sql.tar.gz` which is in the root of this repo, and [import the \*.sql file into your schema](https://stackoverflow.com/a/17666279). However, due to updates to the metadata, this file may become out of date. In this case, or if you'd just prefer to start with an empty database, you can create the table structure using [a Symfony console command](https://symfony.com/doc/2.8/doctrine.html#creating-the-database-tables-schema). Navigate to the root of your Symfony installation and run `php app/console doctrine:schema:update --force`. If you have configured your database correctly in parameters.yml, this will set up your empty database to match the data model used in this application. If you haven't configured it correctly, this command will let you know.
9. If using Solr v6+, you will need to switch from the "managed-schema" to use our custom schema, which is defined in `SolrV6SchemaExample.xml`. This involves some minor changes to `solrconfig.xml` as described [here](https://cwiki.apache.org/confluence/display/solr/Schema+Factory+Definition+in+SolrConfig#SchemaFactoryDefinitioninSolrConfig-Classicschema.xml) and [here](http://stackoverflow.com/a/31721587). Then place `SolrV6SchemaExample.xml` in the Solr config directory, named `schema.xml`. Perform any customizations you require, or leave as is.
10. At this point, the site should function, but you won't see any search results because there is nothing in the database, and thus nothing to be indexed by Solr. Click on the "Admin" tab, click "Add a New Dataset" in the sidebar menu, and get going!
11. Once you've added some test data, you'll want to index it in Solr. Navigate to your site's base directory and edit the file `SolrIndexerExample.py` (or `SolrIndexerExample.php` if you only run PHP) to specify the URL of your Solr server where indicated. Then, run the script.
10. At this point, the site should function, but it still may not look right. Chances are you won't see any search results because there is nothing in the database, and nothing has been indexed in Solr. Click on the "Admin" tab, click "Add a New Dataset" in the sidebar menu, and get going!
11. Once you've added some test datasets, you'll have to index them in Solr for them to become visible in the search interface. Navigate to your site's base directory and edit the file `SolrIndexerExample.py` (or `SolrIndexerExample.php` if you run PHP) to specify the URL of your Solr server where indicated. Then, run the script.
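
Steps 3 and 4 above depend on getting the Solr core URL right, so it can help to check it before editing `parameters.yml`. The following is a minimal sketch, not part of this project: the function names are hypothetical, and it assumes a default Solr install where each core exposes the standard `/admin/ping` handler. It builds the core URL from the core name (as in the "datacatalog" example above) and asks Solr whether the core responds.

```python
from urllib.error import URLError
from urllib.parse import quote
from urllib.request import urlopen


def solr_core_url(core, host="localhost", port=8983):
    """Build the base URL of a Solr core from its name (hypothetical helper)."""
    return "http://{}:{}/solr/{}".format(host, port, quote(core, safe=""))


def solr_ping_url(core_url):
    """Build the URL of the core's ping handler, asking for a JSON response."""
    return core_url.rstrip("/") + "/admin/ping?wt=json"


def core_is_reachable(core_url, timeout=5):
    """Return True if the ping handler answers with HTTP 200, False otherwise."""
    try:
        with urlopen(solr_ping_url(core_url), timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, unknown host, or an HTTP error status.
        return False
```

For example, `core_is_reachable(solr_core_url("datacatalog"))` should return `True` once Solr is running and the core exists; the URL returned by `solr_core_url` is the value that would go into the `solrsearchr_server` parameter.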

### Follow-up Tasks
1. You'll most likely want to regularly re-index Solr to account for datasets you add, delete, or update using the Admin section. In the root directory of this repo, there are PHP and Python examples of a script which can update a Solr index, called `SolrIndexerExample`. You'll probably want to call this script or something similar with a cron job every Sunday or every night or whatever seems appropriate, depending on how much updating you do. I recommend weekly, since you can also run this script on-demand from the command line if you want.
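
To illustrate what a re-indexing run sends to Solr, here is a rough sketch using Solr's JSON update handler. It is not the bundled `SolrIndexerExample` script, and the field names (`id`, `dataset_title`) are placeholders; the real fields are defined in `SolrV6SchemaExample.xml`. The request is built separately from sending it so it can be inspected first.

```python
import json
from urllib.request import Request, urlopen


def build_update_request(core_url, documents):
    """Build (but do not send) a POST that adds documents and commits them."""
    body = json.dumps(documents).encode("utf-8")
    return Request(
        core_url.rstrip("/") + "/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send a prepared request to Solr and return the HTTP status code."""
    with urlopen(request, timeout=timeout) as response:
        return response.status


# Placeholder documents; a real run would read these from the Data Catalog.
example_documents = [{"id": "1", "dataset_title": "Example dataset"}]
example_request = build_update_request(
    "http://localhost:8983/solr/datacatalog", example_documents
)
```

Calling `send(example_request)` would push the documents to a running core. For the weekly schedule suggested above, a crontab entry along the lines of `0 2 * * 0 python /path/to/SolrIndexerExample.py` (the path is a placeholder) runs the indexer every Sunday at 02:00.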