snorkel.analysis.get_label_instances

snorkel.analysis.get_label_instances(bucket, x, *y)
Return instances in x with the specified combination of labels.
- Parameters
bucket (Tuple[int, …]) – A tuple of label values selecting which instances from x are returned
x (ndarray) – NumPy array of data instances to be returned
*y – A list of np.ndarray of (int) labels
- Returns
NumPy array of instances from x with the specified combination of labels
- Return type
np.ndarray
Example
A common use case is calling get_label_instances(bucket, x.to_numpy(), Y_gold, Y_pred), where x is a pandas DataFrame of data instances that the labels correspond to (converted to a NumPy array with to_numpy()), Y_gold is an array of gold (i.e. ground truth) labels, and Y_pred is a corresponding array of predicted labels.
>>> import numpy as np
>>> import pandas as pd
>>> from snorkel.analysis import get_label_instances
>>> x = pd.DataFrame(data={'col1': ["this is a string", "a second string", "a third string"], 'col2': ["1", "2", "3"]})
>>> Y_gold = np.array([1, 1, 1])
>>> Y_pred = np.array([1, 0, 0])
>>> bucket = (1, 0)
The returned NumPy array of data instances from x will correspond to the rows where the first label array had a 1 and the second label array had a 0.
>>> get_label_instances(bucket, x.to_numpy(), Y_gold, Y_pred)
array([['a second string', '2'],
       ['a third string', '3']], dtype=object)
More generally, given bucket (i, j, ...) and label arrays y1, y2, ..., the returned data instances from x will correspond to the rows where y1 had label i, y2 had label j, and so on. Note that x and every array in y must all be the same length.
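The selection rule described above can be sketched with a plain NumPy boolean mask. This is only an illustration under stated assumptions, not Snorkel's actual source: the helper name select_bucket and its error messages are hypothetical, and the real function may validate its inputs differently.

```python
import numpy as np


def select_bucket(bucket, x, *y):
    """Hypothetical stand-in for illustration; not Snorkel's implementation."""
    if len(bucket) != len(y):
        raise ValueError("bucket needs one label value per label array")
    x = np.asarray(x)
    labels = [np.asarray(arr) for arr in y]
    if any(arr.shape[0] != x.shape[0] for arr in labels):
        raise ValueError("x and every label array must have the same length")
    # Start with every row selected, then AND in one condition per label array.
    mask = np.ones(x.shape[0], dtype=bool)
    for value, arr in zip(bucket, labels):
        mask &= arr == value
    return x[mask]


x = np.array(
    [["this is a string", "1"], ["a second string", "2"], ["a third string", "3"]],
    dtype=object,
)
Y_gold = np.array([1, 1, 1])
Y_pred = np.array([1, 0, 0])

# Rows where the gold label is 1 and the predicted label is 0.
selected = select_bucket((1, 0), x, Y_gold, Y_pred)
```

Building one mask and indexing once keeps the rows in their original order, which is convenient when the selected instances are inspected by hand during error analysis.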