class snorkel.analysis.Scorer(metrics=None, custom_metric_funcs=None, abstain_label=-1)

Bases: object

Calculate one or more scores from user-specified and/or user-defined metrics.

Parameters

  • metrics (Optional[List[str]]) – A list of metric names, all of which must be defined in METRICS.

  • custom_metric_funcs (Optional[Mapping[str, Callable[…, float]]]) – An optional dictionary mapping the names of custom metrics to the functions that produce them. Each custom metric function should accept golds, preds, and probs as input (just like the standard metrics in METRICS) and return either a single score (float) or a dictionary of metric names to scores (for example, if the function calculates multiple values). See the unit tests for an example.

  • abstain_label (Optional[int]) – Examples whose gold label equals this value are ignored when scoring. Defaults to -1, following the convention that abstains are labeled -1.


Raises

ValueError – If a specified standard metric is not found in the METRICS dictionary.


Variables

metrics – A dictionary mapping metric names to the corresponding functions for calculating each metric.
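The custom_metric_funcs contract can be illustrated without Snorkel itself. The sketch below is a hypothetical, pure-Python stand-in that uses plain lists instead of ndarrays. The names error_rate, gold_class_counts, and combine_scores are illustrative and are not part of snorkel.analysis. It shows one metric that returns a single float and one that returns a dictionary of scores, and merges both kinds of output into a single results dictionary, as the parameter description implies.

```python
from typing import Callable, Dict, Mapping, Optional, Sequence, Union

MetricResult = Union[float, Dict[str, float]]


def error_rate(golds: Sequence[int], preds: Optional[Sequence[int]] = None, probs=None) -> float:
    """Hypothetical single-valued metric: fraction of preds that disagree with golds."""
    if preds is None:
        raise ValueError("error_rate requires preds")
    wrong = sum(1 for g, p in zip(golds, preds) if g != p)
    return wrong / len(golds)


def gold_class_counts(golds: Sequence[int], preds=None, probs=None) -> Dict[str, float]:
    """Hypothetical multi-valued metric: number of gold labels per class."""
    counts: Dict[str, float] = {}
    for g in golds:
        key = f"n_gold_{g}"
        counts[key] = counts.get(key, 0.0) + 1.0
    return counts


def combine_scores(
    golds: Sequence[int],
    preds: Optional[Sequence[int]],
    probs,
    funcs: Mapping[str, Callable[..., MetricResult]],
) -> Dict[str, float]:
    """Hypothetical helper: call each metric and flatten float or dict results."""
    results: Dict[str, float] = {}
    for name, func in funcs.items():
        value = func(golds, preds, probs)
        if isinstance(value, dict):
            results.update(value)  # multi-valued metric keeps its own keys
        else:
            results[name] = value  # single-valued metric is stored under its name
    return results


result = combine_scores(
    golds=[0, 1, 1, 0],
    preds=[0, 1, 0, 0],
    probs=None,
    funcs={"error_rate": error_rate, "class_counts": gold_class_counts},
)
print(result)  # {'error_rate': 0.25, 'n_gold_0': 2.0, 'n_gold_1': 2.0}
```

Note that the dictionary-returning metric contributes its own keys, so the name it is registered under ("class_counts") does not appear in the merged result.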

__init__(metrics=None, custom_metric_funcs=None, abstain_label=-1)

Initialize self. See help(type(self)) for accurate signature.

Return type

None

Methods

__init__([metrics, custom_metric_funcs, …]) – Initialize self.

score(golds[, preds, probs]) – Calculate scores for one or more user-specified metrics.

score_slices(S, golds, preds, probs[, …]) – Calculate user-specified and/or user-defined metrics overall and for each slice.

score(golds, preds=None, probs=None)

Calculate scores for one or more user-specified metrics.

Parameters

  • golds (ndarray) – An array of gold (int) labels to base scores on.

  • preds (Optional[ndarray]) – An [n_datapoints,] or [n_datapoints, 1] array of (int) predictions to score.

  • probs (Optional[ndarray]) – An [n_datapoints, n_classes] array of probabilistic (float) predictions.

Because most metrics require either preds or probs, but not both, these values are optional; it is up to the metric function that will be called to raise an exception if a field it requires is not passed to the score() method.


Returns

A dictionary mapping metric names to metric scores.

Return type

Dict[str, float]


Raises

ValueError – If no gold labels were provided.
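A minimal sketch of the score() semantics described above, assuming plain Python lists in place of ndarrays. The names score_sketch and accuracy are hypothetical, not Snorkel's implementation. The sketch rejects empty golds with ValueError, unwraps [n_datapoints, 1]-shaped predictions, drops examples whose gold label equals abstain_label, and leaves it to each metric to raise if a field it needs (here, preds) was not passed.

```python
from typing import Callable, Dict, List, Mapping, Optional, Sequence


def accuracy(golds: List[int], preds: Optional[List[int]] = None, probs=None) -> float:
    """Hypothetical metric that needs preds and raises if they are missing."""
    if preds is None:
        raise ValueError("accuracy requires preds")
    return sum(g == p for g, p in zip(golds, preds)) / len(golds)


def score_sketch(
    golds: Sequence[int],
    preds: Optional[Sequence] = None,
    probs: Optional[Sequence[Sequence[float]]] = None,
    metrics: Optional[Mapping[str, Callable[..., float]]] = None,
    abstain_label: int = -1,
) -> Dict[str, float]:
    """Hypothetical stand-in for score(); each metric here returns one float."""
    if len(golds) == 0:
        raise ValueError("No gold labels were provided")
    # Accept [n_datapoints, 1]-shaped preds by unwrapping single-element rows.
    if preds is not None:
        preds = [p[0] if isinstance(p, (list, tuple)) else p for p in preds]
    # Drop examples whose gold label is the abstain label.
    keep = [i for i, g in enumerate(golds) if g != abstain_label]
    kept_golds = [golds[i] for i in keep]
    kept_preds = [preds[i] for i in keep] if preds is not None else None
    kept_probs = [list(probs[i]) for i in keep] if probs is not None else None
    metrics = metrics if metrics is not None else {"accuracy": accuracy}
    # preds and probs are optional; each metric decides whether it can run without them.
    return {name: fn(kept_golds, kept_preds, kept_probs) for name, fn in metrics.items()}


result = score_sketch([0, 1, -1, 1], preds=[[0], [1], [1], [0]])
print(result)  # {'accuracy': 0.6666666666666666}
```

Calling score_sketch([0, 1]) without preds raises ValueError from inside accuracy rather than from score_sketch itself. This mirrors the division of responsibility described above.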

score_slices(S, golds, preds, probs, as_dataframe=False)

Calculate user-specified and/or user-defined metrics overall and for each slice.

Parameters

  • S (recarray) – A recarray of length n_examples with one field per slice name; each field indicates which examples belong to that slice.

  • golds (ndarray) – Gold (aka ground truth) labels (integers).

  • preds (ndarray) – Predictions (integers).

  • probs (ndarray) – Probabilities (floats).

  • as_dataframe (bool) – Whether to return results as a pandas DataFrame (True) or a dict (False).


Returns

A dictionary mapping each slice name (together with the overall results) to a dictionary of metric names and scores, or, if as_dataframe is True, the same metrics formatted as a pandas DataFrame.

Return type

Union[Dict, pd.DataFrame]
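The overall-plus-slices logic can be sketched as follows. The sketch makes three assumptions. First, a plain dict of per-slice 0/1 membership lists stands in for the recarray S. Second, a caller-supplied score_fn stands in for the configured metrics. Third, "overall" is used as the key for the unsliced results, which is this sketch's choice. score_slices_sketch, accuracy_fn, and the slice names are all hypothetical. The sketch scores every example once, then scores each slice on the subset of examples that belong to it.

```python
from typing import Callable, Dict, List, Mapping, Sequence

ScoreFn = Callable[[List[int], List[int], List[List[float]]], Dict[str, float]]


def score_slices_sketch(
    S: Mapping[str, Sequence[int]],
    golds: Sequence[int],
    preds: Sequence[int],
    probs: Sequence[Sequence[float]],
    score_fn: ScoreFn,
) -> Dict[str, Dict[str, float]]:
    """Hypothetical stand-in for score_slices(): score everything, then each slice."""
    n = len(golds)
    if len(preds) != n or len(probs) != n or any(len(m) != n for m in S.values()):
        raise ValueError("S, golds, preds, and probs must all have n_examples entries")
    # Unsliced results first ("overall" is this sketch's key name).
    results = {"overall": score_fn(list(golds), list(preds), [list(p) for p in probs])}
    for slice_name, membership in S.items():
        # A nonzero membership value marks an example as belonging to the slice.
        idx = [i for i, in_slice in enumerate(membership) if in_slice]
        results[slice_name] = score_fn(
            [golds[i] for i in idx], [preds[i] for i in idx], [list(probs[i]) for i in idx]
        )
    return results


def accuracy_fn(golds: List[int], preds: List[int], probs: List[List[float]]) -> Dict[str, float]:
    # Assumes the slice is non-empty; a real metric would need to handle empty slices.
    return {"accuracy": sum(g == p for g, p in zip(golds, preds)) / len(golds)}


out = score_slices_sketch(
    S={"short_text": [1, 1, 0, 0], "has_url": [0, 0, 1, 1]},  # hypothetical slice names
    golds=[0, 1, 1, 0],
    preds=[0, 1, 0, 0],
    probs=[[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.7, 0.3]],
    score_fn=accuracy_fn,
)
print(out)  # {'overall': {'accuracy': 0.75}, 'short_text': {'accuracy': 1.0}, 'has_url': {'accuracy': 0.5}}
```

If pandas is available, pd.DataFrame.from_dict(out, orient="index") turns this dictionary into a table with one row per slice and one column per metric. That is one plausible shape for the as_dataframe=True result.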