
Zooniverse Aggregation Code (mostly for Panoptes; for further details see http://zooniverse-aggregation.readthedocs.org/en/latest/)

The main code is in the algorithms folder, specifically in the aggregation_api.py file. In aggregation_api.py there is a class Aggregation, whose constructor takes three parameters:

  • project_name
  • environment
  • project_id

The first two options are mostly just for development - in production they can be ignored. project_id is the numerical id of the project according to Panoptes. Note that if project_name and project_id are both None, Aggregation tries to work with Ouroboros projects (currently just Penguins).
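A minimal sketch of how the constructor might be called is below. The keyword names mirror the three parameters above, but the exact signature has not been checked against the code, and the project id is made up, so treat both as assumptions.

```python
# Minimal sketch of constructing the aggregation engine for a Panoptes
# project. Keyword names follow the parameter list above; they and the
# project id (593) are illustrative assumptions, not verified values.
from aggregation_api import Aggregation

# Typical production use: only the Panoptes numerical project id matters.
engine = Aggregation(project_name=None, environment="production", project_id=593)

# Development use: look the project up by name against a development environment.
dev_engine = Aggregation(project_name="penguins", environment="development", project_id=None)
```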

There are two main functions in Aggregation: migrate() and aggregate().

__migrate__ is used to transfer classifications from Postgres over to Cassandra, until we can get the classifications going directly into Cassandra.
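The migration itself is essentially a read-from-Postgres, write-to-Cassandra loop. The sketch below is only meant to convey that shape, assuming psycopg2 and the DataStax cassandra-driver; the connection settings, keyspace, and table/column names are hypothetical and are not what migrate() actually uses.

```python
# Rough sketch of a Postgres -> Cassandra migration step, assuming psycopg2
# and the DataStax cassandra-driver. Connection settings, keyspace, table and
# column names are all hypothetical - not what migrate() actually uses.
import psycopg2
from cassandra.cluster import Cluster

project_id = 593  # hypothetical Panoptes project id

pg_conn = psycopg2.connect(dbname="panoptes", user="panoptes")
cass_session = Cluster(["127.0.0.1"]).connect("zooniverse")

insert = cass_session.prepare(
    "INSERT INTO classifications (project_id, workflow_id, subject_id, annotations) "
    "VALUES (?, ?, ?, ?)")

with pg_conn.cursor() as cur:
    cur.execute(
        "SELECT project_id, workflow_id, subject_id, annotations "
        "FROM classifications WHERE project_id = %s", (project_id,))
    # copy each classification row from Postgres into Cassandra
    for proj_id, workflow_id, subject_id, annotations in cur:
        cass_session.execute(insert, (proj_id, workflow_id, subject_id, str(annotations)))
```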

__aggregate__ is the function you call to do the actual aggregation. It takes the following parameters (= gives the default value used if none is supplied; note that Python is restrictive in what can be a default argument value, so the defaults listed below may differ from what you actually see in the function definition in the code, in which case the default values are assigned later). A sketch of a typical call is given after the parameter list. (For details of this function see)

workflows = self.workflows

  • which workflows you want to aggregate for the given project. Default is all workflows

subject_set = self.get_retired_subjects(workflow_id)

  • for a given project and workflow, what subjects you want to aggregate. Default is all retired subjects in that project/workflow

gold_standard_clusters = ([],[])

  • gold standard clusters are clusters we know for certain exist (or don't exist). This is mostly just for IBCC use, so it is still very much in development.

expert = None

  • a list of experts whose classifications we want to ignore. This is useful when testing the accuracy of other users. Also still in development.

store_results = True

  • do we want to store the results back to Postgres? Default is yes - if False, the values are returned instead of being stored. Useful for exploring results (see marmot.py) - still in development.
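Putting these parameters together, here is the call sketch referred to above. The workflow and subject ids are made up, and passing workflows as a list is an assumption about the expected type.

```python
# Hypothetical call to aggregate() using the parameters described above.
# The workflow id, the subject ids, and the assumption that workflows is a
# list are all illustrative - they are not taken from the code.
from aggregation_api import Aggregation

engine = Aggregation(project_name=None, environment="production", project_id=593)
engine.migrate()  # copy any new classifications from Postgres into Cassandra first

# Aggregate a single workflow for three specific subjects, and return the
# results instead of writing them back to Postgres.
results = engine.aggregate(
    workflows=[121],
    subject_set=[1001, 1002, 1003],
    gold_standard_clusters=([], []),
    expert=None,
    store_results=False)
```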

AggregationAPI maintains connections to the Panoptes API, Postgres and Cassandra. In addition, its two most important instance variables are self.cluster_algs and self.classification_alg:

self.cluster_algs

  • is a dictionary which maps from shape to the corresponding clustering algorithm. (Note - we cluster based on shape, not on tool type.) This dictionary setup means that we can use different clustering algorithms for different shapes.

self.classification_alg

  • what classification algorithm we are going to use.
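To make the shape-to-algorithm mapping concrete, here is a hypothetical illustration of what these two instance variables might hold; PointClustering, RectangleClustering and MajorityVote are placeholder names, not the actual classes in the algorithms folder.

```python
# Hypothetical illustration of how the two instance variables might be set up.
# PointClustering, RectangleClustering and MajorityVote are placeholder names,
# not the real classes from the algorithms folder.
class PointClustering: ...
class RectangleClustering: ...
class MajorityVote: ...

# one clustering algorithm per marking *shape* (not per tool type)
cluster_algs = {
    "point": PointClustering(),
    "rectangle": RectangleClustering(),
    "ellipse": RectangleClustering(),
}
# a single classification algorithm for the project
classification_alg = MajorityVote()
```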