One of the core operations in Snorkel is _annotating_ the candidates in various ways. We can think of generating features for the candidates as annotating them (creating a Feature object), and can also view supervision via labeling functions as annotating them (creating a Label object).

Core Data Models

class snorkel.models.annotation.AnnotationKeyMixin

Mixin class for defining annotation key tables. An AnnotationKey is the unique name associated with a set of Annotations, corresponding e.g. to a single labeling or feature function. An AnnotationKey may have an associated weight (Parameter).

class snorkel.models.annotation.AnnotationMixin

Mixin class for defining annotation tables. An annotation is a value associated with a Candidate. Examples include labels, features, and predictions. New types of annotations can be defined by creating an annotation class and corresponding annotation, for example:

from sqlalchemy import Column, Float

from snorkel.models.annotation import AnnotationMixin
from snorkel.models.meta import SnorkelBase

class NewAnnotation(AnnotationMixin, SnorkelBase):
    # Every annotation class needs a Column attribute named `value`
    value = Column(Float, nullable=False)

# The entire storage schema, including NewAnnotation, can now be initialized with the following import
import snorkel.models

The annotation class should include a Column attribute named value.

class snorkel.models.annotation.Feature(**kwargs)

An element of a representation of a Candidate in a feature space.

A Feature’s annotation key identifies the definition of the Feature, e.g., a function that implements it or the library name and feature name in an automatic featurization library.

class snorkel.models.annotation.GoldLabel(**kwargs)

A separate class for labels from human annotators or other gold standards.

class snorkel.models.annotation.Label(**kwargs)

A discrete label associated with a Candidate, indicating a target prediction value.

Labels are used to represent the output of labeling functions.

A Label’s annotation key identifies the labeling function that provided the Label.

class snorkel.models.annotation.Prediction(**kwargs)

A probability associated with a Candidate, indicating the degree of belief that the Candidate is true.

A Prediction’s annotation key indicates which process or method produced the Prediction, e.g., which model with which ParameterSet.

class snorkel.models.annotation.StableLabel(**kwargs)

A special secondary table for preserving labels created by human annotators (e.g. in the Viewer) in a stable format that does not cascade, and is independent of the Candidate ids.

Core Objects for Annotations (Features, Labels)

class snorkel.annotations.Annotator(annotation_class, annotation_key_class, f_gen)

Abstract class for annotating candidates and persisting these annotations to the database.

apply_existing(split=0, key_group=0, cids_query=None, **kwargs)

Alias for apply that emphasizes we are using an existing AnnotationKey set.
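
As a toy picture of what "using an existing key set" could mean, the sketch below annotates one split freely and then annotates a second split while keeping only keys already seen. It is not Snorkel's implementation: the annotate helper, the word_features generator, and the assumption that f_gen yields (key, value) pairs are all hypothetical and exist only for this example.

```python
def annotate(candidates, f_gen, key_set=None):
    """Collect {(candidate, key): value}; optionally keep only existing keys."""
    annotations = {}
    for c in candidates:
        for key, value in f_gen(c):
            # With an existing key set, annotations under unseen keys are dropped.
            if key_set is None or key in key_set:
                annotations[(c, key)] = value
    if key_set is None:
        keys = sorted({k for _, k in annotations})  # build a fresh key set
    else:
        keys = sorted(key_set)                      # reuse the existing one
    return annotations, keys


def word_features(text):
    """Hypothetical feature generator: one binary feature per lowercase word."""
    for w in text.lower().split():
        yield "WORD_" + w, 1


# First split: the key set is created from whatever the generator emits.
train_ann, train_keys = annotate(["spouse of alice"], word_features)

# Second split: reuse the key set, so "WORD_bob" is not added as a new key.
test_ann, test_keys = annotate(["spouse of bob"], word_features, key_set=train_keys)
```

Reusing the key set keeps the column space of the second split aligned with the first, which is what a downstream model trained on the first split would need.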

clear(session, split=0, key_group=0, replace_key_set=True, cids_query=None, **kwargs)

Deletes the Annotations for the Candidates in the given split. If replace_key_set=True, deletes all Annotations (of this Annotation sub-class) and also deletes all AnnotationKeys (of this sub-class).

class snorkel.annotations.FeatureAnnotator(f=<function get_span_feats>)

Apply feature generators to the candidates, generating Feature annotations

class snorkel.annotations.LabelAnnotator(lfs=None, label_generator=None)

Apply labeling functions to the candidates, generating Label annotations

Parameters: lfs – A _list_ of labeling functions (LFs)
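
As a rough illustration of what a list of labeling functions might look like, the sketch below uses a hypothetical Candidate tuple and two made-up LFs. The 1 / -1 / 0 (positive / negative / abstain) convention and the apply_lfs helper are assumptions for this example, not something stated above; real candidates are database-backed objects and the list would be passed as lfs to LabelAnnotator.

```python
from collections import namedtuple

# Hypothetical stand-in for a Snorkel Candidate (real ones are ORM objects).
Candidate = namedtuple("Candidate", ["id", "text"])


# Assumed convention for this sketch: 1 = positive, -1 = negative, 0 = abstain.
def lf_contains_married(c):
    return 1 if "married" in c.text.lower() else 0


def lf_contains_sibling(c):
    text = c.text.lower()
    return -1 if "sister" in text or "brother" in text else 0


lfs = [lf_contains_married, lf_contains_sibling]


def apply_lfs(candidates, lfs):
    """Return {(candidate id, LF name): label}, skipping abstentions."""
    labels = {}
    for c in candidates:
        for lf in lfs:
            value = lf(c)
            if value != 0:
                # The LF name plays the role of the annotation key.
                labels[(c.id, lf.__name__)] = value
    return labels


candidates = [
    Candidate(1, "Alice married Bob in 2010."),
    Candidate(2, "Carol is the sister of Dan."),
    Candidate(3, "Eve met Frank at work."),
]
labels = apply_lfs(candidates, lfs)
```

Keying each label by the name of the function that produced it mirrors the statement that a Label's annotation key identifies the labeling function that provided it.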

snorkel.annotations.load_marginals(session, X=None, split=0, training=True)

Load the marginal probabilities for a given split of Candidates.

snorkel.annotations.load_matrix(matrix_class, annotation_key_class, annotation_class, session, split=0, cids_query=None, key_group=0, key_names=None, zero_one=False, load_as_array=False)

Returns the annotations corresponding to a split of candidates with N members and an AnnotationKey group with M distinct keys as an N x M CSR sparse matrix.
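
To make the N x M layout concrete, here is a minimal pure-Python sketch that turns a handful of annotations into the three arrays of the CSR format (data, indices, indptr), with one row per candidate and one column per key. The to_csr helper, the candidate ids, and the key names are hypothetical; this shows the storage layout only and is not how load_matrix is implemented.

```python
def to_csr(annotations, candidate_ids, key_names):
    """Build CSR arrays (data, indices, indptr) for an N x M matrix.

    annotations: dict mapping (candidate_id, key_name) -> nonzero value.
    Rows follow the order of candidate_ids, columns the order of key_names.
    """
    col_of = {name: j for j, name in enumerate(key_names)}
    data, indices, indptr = [], [], [0]
    for cid in candidate_ids:
        # Gather this candidate's nonzero entries, sorted by column index.
        row = sorted(
            (col_of[k], v) for (c, k), v in annotations.items() if c == cid
        )
        for j, v in row:
            indices.append(j)
            data.append(v)
        # indptr[i + 1] marks where row i ends in data/indices.
        indptr.append(len(data))
    return data, indices, indptr


# Three candidates (N = 3) and three keys (M = 3); candidate 11 has no annotations.
annotations = {(10, "lf_a"): 1, (10, "lf_c"): -1, (12, "lf_b"): 1}
data, indices, indptr = to_csr(annotations, [10, 11, 12], ["lf_a", "lf_b", "lf_c"])
```

These three lists use the same layout that scipy.sparse.csr_matrix accepts as (data, indices, indptr), so only the nonzero annotations take up storage.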

snorkel.annotations.save_marginals(session, X, marginals, training=True)

Save marginal probabilities for a set of Candidates to the database.

Parameters:
  • X – Either an M x N csr_AnnotationMatrix-class matrix, where M is the number of candidates and N the number of LFs/features, or a list of arbitrary objects with candidate ids accessible via an .id attribute
  • marginals – A dense M x K matrix of marginal probabilities, where K is the cardinality of the candidates, or an M-dimensional list/array if K=2
  • training – If True, these are training marginals / labels; otherwise they are saved as end model predictions

Note: The marginals for k=0 are not stored, only for k = 1,...,K
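
The storage convention in the note can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the marginal_rows helper is hypothetical, a single number p in the binary case is read as the probability of k=1, and each per-candidate row is assumed to be indexed from k=0 so that its first entry is the one that is dropped.

```python
def marginal_rows(candidate_ids, marginals):
    """Expand marginals into (candidate_id, k, probability) rows for k >= 1."""
    rows = []
    for cid, m in zip(candidate_ids, marginals):
        if isinstance(m, (int, float)):
            # Binary shorthand (assumption): p is P(k=1), so expand to [1 - p, p].
            dist = [1.0 - m, m]
        else:
            dist = list(m)
        # Skip index 0: the k=0 marginal is not stored.
        for k in range(1, len(dist)):
            rows.append((cid, k, dist[k]))
    return rows


# Binary case: one probability per candidate.
binary_rows = marginal_rows([1, 2], [0.9, 0.25])

# Categorical case: one row of probabilities per candidate.
multi_rows = marginal_rows([7], [[0.2, 0.5, 0.3]])
```

Dropping the k=0 entry loses no information when each row sums to one, since the missing value can be recovered as one minus the sum of the stored ones.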