snorkel.analysis.get_label_instances

snorkel.analysis.get_label_instances(bucket, x, *y)
Return instances in x with the specified combination of labels.
 Parameters
bucket (Tuple[int, …]) – A tuple of label values specifying which instances from x are returned
x (ndarray) – NumPy array of data instances to be returned
*y – A list of np.ndarray of (int) labels, one array per entry in bucket
 Returns
NumPy array of instances from x with the specified combination of labels
 Return type
np.ndarray
Example
A common use case is calling get_label_instances(bucket, x.to_numpy(), Y_gold, Y_pred), where x is a pandas DataFrame whose NumPy form, x.to_numpy(), holds the data instances that the labels correspond to, Y_gold is a list of gold (i.e. ground truth) labels, and Y_pred is a corresponding list of predicted labels.
>>> import numpy as np
>>> import pandas as pd
>>> from snorkel.analysis import get_label_instances
>>> x = pd.DataFrame(data={'col1': ["this is a string", "a second string", "a third string"], 'col2': ["1", "2", "3"]})
>>> Y_gold = np.array([1, 1, 1])
>>> Y_pred = np.array([1, 0, 0])
>>> bucket = (1, 0)
The returned NumPy array of data instances from x will correspond to the rows where the first list had a 1 and the second list had a 0.
>>> get_label_instances(bucket, x.to_numpy(), Y_gold, Y_pred)
array([['a second string', '2'],
       ['a third string', '3']], dtype=object)
More generally, given bucket (i, j, ...) and lists y1, y2, ... the returned data instances from x will correspond to the rows where y1 had label i, y2 had label j, and so on. Note that x and every array in y must all have the same length.