# Problem Notation

## Supervised Scenario

- Consider an input pattern $x \in \mathcal{X}$ observed with probability distribution $p(x)$ and a ground-truth label $z \in \mathcal{Z}$ observed with conditional probability distribution $p(z \mid x)$.
- Given a finite sample $\{(x_i, z_i)\}_{i=1}^{N}$, where $z_i \sim p(z \mid x_i)$ (see the sketch after this list).
- Objective: estimate a predictive model $f: \mathcal{X} \to \mathcal{Z}$ that maps $x \mapsto z$, or learn statistics of $p(z \mid x)$.
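
A minimal array-based sketch of the supervised sample (the names `X`, `z`, and the dimensions are placeholders for illustration, not part of the notation above):

```python
import numpy as np

# Hypothetical supervised sample: N input patterns with D features each and
# one ground-truth label per pattern, drawn from K classes.
N, D, K = 100, 5, 3
rng = np.random.default_rng(0)

X = rng.normal(size=(N, D))      # input patterns x_i ~ p(x)
z = rng.integers(0, K, size=N)   # ground-truth labels z_i ~ p(z | x_i)

# A predictive model f: X -> Z would be fit on (X, z); here we only check
# the shapes that the notation implies.
assert X.shape == (N, D) and z.shape == (N,)
```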

## Crowdsourcing Scenario

- Same objective as in the supervised scenario, but the ground-truth labels $z_i$ corresponding to the input patterns $x_i$ are not directly observed.
- Consider labels $y \in \mathcal{Z}$ that do not follow the ground-truth distribution $p(z \mid x)$. Instead, they are generated from an unknown process $p(y \mid x, z)$ that represents the annotators' ability to detect the ground truth (a simulation sketch follows this list).
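
As a rough illustration (assumed here, not a method from this repository), noisy labels can be simulated from the ground truth through a per-annotator corruption process $p(y \mid z, a)$; the `ability` parameter is a hypothetical stand-in for annotator reliability:

```python
import numpy as np

rng = np.random.default_rng(1)
K, T, N = 3, 4, 100
z = rng.integers(0, K, size=N)            # ground truth, unobserved in practice

# Hypothetical annotator behaviour: annotator t reports the true label with
# probability ability[t] and otherwise answers uniformly at random.
ability = rng.uniform(0.6, 0.95, size=T)

def annotate(z_i, t):
    """Draw one noisy label y ~ p(y | z = z_i, a = t)."""
    if rng.random() < ability[t]:
        return int(z_i)
    return int(rng.integers(0, K))

Y = np.array([[annotate(z_i, t) for t in range(T)] for z_i in z])  # shape (N, T)
```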

### Individual

- Consider multiple noisy labels $y_i^{(t)}$ given by annotators $t$.
- These annotations come from a subset $A_i$ of the set $\mathcal{A}$ of all the annotators participating in the labelling process ($A_i \subseteq \mathcal{A}$, with $|\mathcal{A}| = T$).
- The annotator identity could be defined as an input variable $a$, with $a \in \mathcal{A}$.
  - Then the observed labels are modelled by $p(y \mid x, a)$.
- Given a sample $\{(x_i, \{y_i^{(t)}\}_{t \in A_i})\}_{i=1}^{N}$ (see the sketch after this list).
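
A sketch of the individual-scenario sample (the dictionary layout and names are assumptions for illustration): each instance keeps the labels of its own annotator subset $A_i$ together with the annotator identities.

```python
import numpy as np

rng = np.random.default_rng(2)
K, T, N = 3, 4, 6

# sample[i] is a dict {annotator t: label y_i^(t)} over a random subset A_i.
sample = []
for i in range(N):
    A_i = rng.choice(T, size=int(rng.integers(1, T + 1)), replace=False)
    sample.append({int(t): int(rng.integers(0, K)) for t in A_i})

# e.g. sample[0] == {2: 1, 0: 0} means annotators 2 and 0 labelled instance 0.
```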

### Global

- Consider that we do not know or do not care which annotators provided the labels: we know the labels $y_i$ but not the annotator identities $t$.
- Consider the number of times that all the annotators give each possible label: a count vector $r_i$ with entries $r_{i,\ell} = \sum_{t \in A_i} \mathbb{1}[y_i^{(t)} = \ell]$ for each $\ell \in \mathcal{Z}$.
- Given a sample $\{(x_i, r_i)\}_{i=1}^{N}$ (see the sketch after this list).
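
The count representation can be built directly from the per-instance annotations by discarding the annotator identities; a small sketch reusing the `sample` structure from the previous snippet:

```python
import numpy as np

def to_counts(sample, K):
    """R[i, k] = number of annotators that assigned label k to instance i."""
    R = np.zeros((len(sample), K), dtype=int)
    for i, annotations in enumerate(sample):
        for label in annotations.values():
            R[i, label] += 1
    return R
```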

## Focus

In this implementation, we study the pattern recognition case, that is, we let $\mathcal{Z}$ be a small set of $K$ categories or classes, $\mathcal{Z} = \{1, \ldots, K\}$.

One can also define two scenarios based on the annotation density and assumptions (contrasted in the sketch after this list):

- Dense:
  - All the annotators label each data point: $|A_i| = T$ for every $i$.
  - The implementation is simpler since fixed-size matrices are assumed.
- Sparse:
  - The number of labels collected per data point and annotator varies: $|A_i| \leq T$.
  - An appropriate implementation leads to computational efficiency.
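
One possible way (an assumption of this sketch, not a prescription of the document) to contrast the two regimes in code: dense annotations fit in a fixed $(N, T)$ matrix, while sparse annotations are better kept as observed triples or a masked matrix.

```python
import numpy as np

N, T, K = 6, 4, 3
MISSING = -1  # sentinel for "annotator t did not label instance i"
rng = np.random.default_rng(3)

# Dense: every annotator labels every instance, so a fixed (N, T) matrix suffices.
Y_dense = rng.integers(0, K, size=(N, T))

# Sparse: store only the observed (instance, annotator, label) triples ...
triples = [(0, 2, 1), (0, 3, 0), (1, 0, 2)]  # hypothetical observations

# ... or, if a matrix is still convenient, use the sentinel plus a mask.
Y_sparse = np.full((N, T), MISSING)
for i, t, y in triples:
    Y_sparse[i, t] = y
mask = Y_sparse != MISSING  # True where an annotation exists
```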

## Confusion Matrices

- Individual confusion matrix (for an annotator $t$): $\pi^{(t)} \in [0, 1]^{K \times K}$, with entries $\pi^{(t)}_{k,\ell} = p(y = \ell \mid z = k, a = t)$.
- Global confusion matrix (for all the annotations): $\pi \in [0, 1]^{K \times K}$, with entries $\pi_{k,\ell} = p(y = \ell \mid z = k)$ (an estimation sketch follows this list).
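
A minimal counting sketch of both matrices, assuming the dense label matrix `Y_dense` from above and, for illustration only, access to the ground truth `z`; in practice $z$ is latent and the matrices are estimated jointly with the predictive model:

```python
import numpy as np

def individual_confusion(z, Y, K):
    """pi[t, k, l] ~ p(y = l | z = k, a = t), estimated by counting."""
    N, T = Y.shape
    pi = np.zeros((T, K, K))
    for t in range(T):
        for k in range(K):
            counts = np.bincount(Y[z == k, t], minlength=K).astype(float)
            pi[t, k] = counts / counts.sum() if counts.sum() > 0 else np.full(K, 1.0 / K)
    return pi

def global_confusion(z, Y, K):
    """pi[k, l] ~ p(y = l | z = k), pooling annotations from all annotators."""
    pi = np.zeros((K, K))
    for k in range(K):
        counts = np.bincount(Y[z == k].ravel(), minlength=K).astype(float)
        pi[k] = counts / counts.sum() if counts.sum() > 0 else np.full(K, 1.0 / K)
    return pi
```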