snorkel.labeling.LFAnalysis

class snorkel.labeling.LFAnalysis(L, lfs=None)
Bases: object
Run analyses on LFs using label matrix.
 Parameters
L (ndarray) – Label matrix where L_{i,j} is the label given by the jth LF to the ith candidate (using -1 for abstain)
lfs (Optional[List[LabelingFunction]]) – Labeling functions used to generate L
 Raises
ValueError – If the number of LFs and the number of label matrix columns differ

__init__(L, lfs=None)
Initialize self. See help(type(self)) for accurate signature.
 Return type
None
Methods
__init__(L[, lfs]) – Initialize self.
label_conflict() – Compute the fraction of data points with conflicting (non-abstain) labels.
label_coverage() – Compute the fraction of data points with at least one label.
label_overlap() – Compute the fraction of data points with at least two (non-abstain) labels.
lf_conflicts([normalize_by_overlaps]) – Compute the fraction of examples each LF labels that another LF labels differently.
lf_coverages() – Compute the fraction of examples each LF labels.
lf_empirical_accuracies(Y) – Compute empirical accuracy against a set of labels Y for each LF.
lf_empirical_probs(Y, k) – Estimate conditional probability tables for each LF.
lf_overlaps([normalize_by_coverage]) – Compute the fraction of examples each LF labels that are also labeled by another LF.
lf_polarities() – Infer the polarities of each LF based on evidence in a label matrix.
lf_summary([Y, est_weights]) – Create a pandas DataFrame with the various per-LF statistics.

label_conflict()
Compute the fraction of data points with conflicting (non-abstain) labels.
 Returns
Fraction of data points with conflicting labels
 Return type
float
Example
>>> import numpy as np
>>> from snorkel.labeling import LFAnalysis
>>> L = np.array([
...     [-1, 0, 0],
...     [-1, -1, -1],
...     [1, 0, -1],
...     [-1, 0, -1],
...     [0, 0, 0],
... ])
>>> LFAnalysis(L).label_conflict()
0.2

label_coverage()
Compute the fraction of data points with at least one label.
 Returns
Fraction of data points with labels
 Return type
float
Example
>>> import numpy as np
>>> from snorkel.labeling import LFAnalysis
>>> L = np.array([
...     [-1, 0, 0],
...     [-1, -1, -1],
...     [1, 0, -1],
...     [-1, 0, -1],
...     [0, 0, 0],
... ])
>>> LFAnalysis(L).label_coverage()
0.8

label_overlap()
Compute the fraction of data points with at least two (non-abstain) labels.
 Returns
Fraction of data points with overlapping labels
 Return type
float
Example
>>> import numpy as np
>>> from snorkel.labeling import LFAnalysis
>>> L = np.array([
...     [-1, 0, 0],
...     [-1, -1, -1],
...     [1, 0, -1],
...     [-1, 0, -1],
...     [0, 0, 0],
... ])
>>> LFAnalysis(L).label_overlap()
0.6
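The three dataset-level fractions (coverage, overlap, conflict) can be checked by hand on the example matrix. The following is a minimal sketch, not Snorkel's implementation: it uses plain Python lists so it runs without NumPy or Snorkel, and the helper names `votes` and `fraction` are hypothetical.

```python
# Sketch only: plain-Python versions of the three dataset-level statistics.
ABSTAIN = -1  # abstain value used by the label matrix on this page

L = [
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
]


def votes(row):
    """Return the non-abstain labels in one row of the label matrix."""
    return [label for label in row if label != ABSTAIN]


def fraction(rows, predicate):
    """Return the fraction of rows for which predicate(row) is true."""
    return sum(1 for row in rows if predicate(row)) / len(rows)


# At least one non-abstain label.
coverage = fraction(L, lambda row: len(votes(row)) >= 1)
# At least two non-abstain labels.
overlap = fraction(L, lambda row: len(votes(row)) >= 2)
# At least two distinct non-abstain labels.
conflict = fraction(L, lambda row: len(set(votes(row))) >= 2)

print(coverage, overlap, conflict)  # 0.8 0.6 0.2
```

Only the third row carries two different labels (1 and 0), which is why the conflict fraction is 1/5.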

lf_conflicts(normalize_by_overlaps=False)
Compute the fraction of examples that each LF labels and that another LF labels differently.
A conflicting example is one for which at least one other LF returns a different (non-abstain) label.
Note that the maximum possible conflict fraction for an LF is the LF's overlaps fraction, unless normalize_by_overlaps=True, in which case it is 1.
 Parameters
normalize_by_overlaps (bool) – Normalize by the overlaps of the LF, so that the result is the fraction of the LF's overlaps that have conflicts.
 Returns
Fraction of conflicting examples for each LF
 Return type
numpy.ndarray
Example
>>> import numpy as np
>>> from snorkel.labeling import LFAnalysis
>>> L = np.array([
...     [-1, 0, 0],
...     [-1, -1, -1],
...     [1, 0, -1],
...     [-1, 0, -1],
...     [0, 0, 0],
... ])
>>> LFAnalysis(L).lf_conflicts()
array([0.2, 0.2, 0. ])
>>> LFAnalysis(L).lf_conflicts(normalize_by_overlaps=True)
array([0.5       , 0.33333333, 0.        ])
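To see how the normalization changes the denominator, the per-LF conflict fraction can be recomputed by hand. This is a plain-Python sketch under stated assumptions, not the library's code: `lf_conflicts_sketch` is a hypothetical name, and returning 0.0 for an LF with no overlaps is an assumption.

```python
# Sketch only: per-LF conflict fractions on the example label matrix.
ABSTAIN = -1

L = [
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
]


def lf_conflicts_sketch(L, normalize_by_overlaps=False):
    n, m = len(L), len(L[0])
    conflicts = [0] * m  # rows where LF j disagrees with another LF
    overlaps = [0] * m   # rows where LF j and some other LF both label
    for row in L:
        for j, label in enumerate(row):
            if label == ABSTAIN:
                continue
            others = [x for i, x in enumerate(row) if i != j and x != ABSTAIN]
            if others:
                overlaps[j] += 1
            if any(x != label for x in others):
                conflicts[j] += 1
    if normalize_by_overlaps:
        # Assumption: an LF with no overlaps gets 0.0 rather than NaN.
        return [c / o if o else 0.0 for c, o in zip(conflicts, overlaps)]
    return [c / n for c in conflicts]


plain = lf_conflicts_sketch(L)
normalized = lf_conflicts_sketch(L, normalize_by_overlaps=True)
print(plain)  # [0.2, 0.2, 0.0]
```

For the second LF, one of its three overlapping rows is a conflict, so the normalized value is 1/3 while the unnormalized value is 1/5.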

lf_coverages()
Compute the fraction of examples each LF labels.
 Returns
Fraction of labeled examples for each LF
 Return type
numpy.ndarray
Example
>>> import numpy as np
>>> from snorkel.labeling import LFAnalysis
>>> L = np.array([
...     [-1, 0, 0],
...     [-1, -1, -1],
...     [1, 0, -1],
...     [-1, 0, -1],
...     [0, 0, 0],
... ])
>>> LFAnalysis(L).lf_coverages()
array([0.4, 0.8, 0.4])

lf_empirical_accuracies(Y)
Compute empirical accuracy against a set of labels Y for each LF.
Usually, Y represents development set labels.
 Parameters
Y (ndarray) – [n] or [n, 1] np.ndarray of gold labels
 Returns
Empirical accuracies for each LF
 Return type
numpy.ndarray
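As an illustration of what a per-LF empirical accuracy can look like, the sketch below scores each LF only on the data points it labels. That scoring rule, the NaN result for an LF that never votes, the function name `lf_accuracies_sketch`, and the gold labels `Y` are all assumptions made for this example; it is plain Python, not Snorkel's implementation.

```python
# Sketch only: accuracy of each LF over its own non-abstain votes.
ABSTAIN = -1

L = [
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
]
Y = [0, 1, 1, 0, 0]  # hypothetical gold labels, one per row of L


def lf_accuracies_sketch(L, Y):
    m = len(L[0])
    accuracies = []
    for j in range(m):
        # Keep only the rows where LF j did not abstain.
        voted = [(row[j], y) for row, y in zip(L, Y) if row[j] != ABSTAIN]
        if not voted:
            accuracies.append(float("nan"))  # assumption: undefined without votes
            continue
        correct = sum(1 for label, y in voted if label == y)
        accuracies.append(correct / len(voted))
    return accuracies


accuracies = lf_accuracies_sketch(L, Y)
print(accuracies)  # [1.0, 0.75, 1.0]
```

The second LF votes 0 on four rows and is wrong on the one whose gold label is 1, giving 3/4.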

lf_empirical_probs(Y, k)
Estimate conditional probability tables for each LF.
Computes conditional probability tables, P(L | Y), for each LF using the provided true labels Y.
 Parameters
Y (ndarray) – The n-dim array of true labels in {1,…,k}
k (int) – The cardinality, i.e. the number of classes
 Returns
An m x (k+1) x k np.ndarray representing the m (k+1) x k conditional probability tables P_i, where P_i[l,y] represents P(LF_i = l | Y = y), calculated empirically
 Return type
np.ndarray
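The shape of the returned tables is easier to see on a small case. The sketch below builds m tables of size (k+1) x k from counts, in plain Python. It is not Snorkel's implementation and rests on several assumptions: classes are zero-indexed (0..k-1) with -1 for abstain, as in the examples on this page; row 0 of each table holds the abstain probability and rows 1..k hold labels 0..k-1; a class with no gold examples leaves its column at zero. The name `lf_empirical_probs_sketch` and the gold labels `Y` are hypothetical.

```python
# Sketch only: empirical P(LF_i = l | Y = y) from raw counts.
ABSTAIN = -1

L = [
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
]
Y = [0, 1, 1, 0, 0]  # hypothetical gold labels, one per row of L


def lf_empirical_probs_sketch(L, Y, k):
    m = len(L[0])
    # tables[i][l + 1][y]: index 0 is abstain, indices 1..k are labels 0..k-1.
    tables = [[[0.0] * k for _ in range(k + 1)] for _ in range(m)]
    for y in range(k):
        rows_with_y = [row for row, gold in zip(L, Y) if gold == y]
        if not rows_with_y:
            continue  # assumption: no evidence for this class, keep zeros
        for i in range(m):
            for label in range(ABSTAIN, k):
                count = sum(1 for row in rows_with_y if row[i] == label)
                tables[i][label + 1][y] = count / len(rows_with_y)
    return tables


tables = lf_empirical_probs_sketch(L, Y, k=2)
print(tables[0][2][1])  # 0.5: the first LF outputs 1 on half of the rows with Y = 1
```

Each column of a table is a probability distribution over the LF's k+1 possible outputs, so it sums to 1 whenever the class has at least one gold example.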

lf_overlaps(normalize_by_coverage=False)
Compute the fraction of examples each LF labels that are also labeled by another LF.
An overlapping example is one for which at least one other LF returns a (non-abstain) label.
Note that the maximum possible overlap fraction for an LF is the LF's coverage, unless normalize_by_coverage=True, in which case it is 1.
 Parameters
normalize_by_coverage (bool) – Normalize by the coverage of the LF, so that the result is the fraction of the LF's labels that have overlaps.
 Returns
Fraction of overlapping examples for each LF
 Return type
numpy.ndarray
Example
>>> import numpy as np
>>> from snorkel.labeling import LFAnalysis
>>> L = np.array([
...     [-1, 0, 0],
...     [-1, -1, -1],
...     [1, 0, -1],
...     [-1, 0, -1],
...     [0, 0, 0],
... ])
>>> LFAnalysis(L).lf_overlaps()
array([0.4, 0.6, 0.4])
>>> LFAnalysis(L).lf_overlaps(normalize_by_coverage=True)
array([1.  , 0.75, 1.  ])
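The relationship between overlap and coverage can be traced with a short hand computation. This plain-Python sketch is not the library's code; `lf_overlaps_sketch` is a hypothetical name, and returning 0.0 for an LF that never labels is an assumption.

```python
# Sketch only: per-LF overlap fractions on the example label matrix.
ABSTAIN = -1

L = [
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
]


def lf_overlaps_sketch(L, normalize_by_coverage=False):
    n, m = len(L), len(L[0])
    labeled = [0] * m      # rows LF j labels (its coverage count)
    overlapping = [0] * m  # rows LF j labels together with another LF
    for row in L:
        n_votes = sum(1 for x in row if x != ABSTAIN)
        for j, label in enumerate(row):
            if label == ABSTAIN:
                continue
            labeled[j] += 1
            if n_votes >= 2:
                overlapping[j] += 1
    if normalize_by_coverage:
        # Assumption: an LF with zero coverage gets 0.0 rather than NaN.
        return [o / c if c else 0.0 for o, c in zip(overlapping, labeled)]
    return [o / n for o in overlapping]


plain = lf_overlaps_sketch(L)
normalized = lf_overlaps_sketch(L, normalize_by_coverage=True)
print(plain)       # [0.4, 0.6, 0.4]
print(normalized)  # [1.0, 0.75, 1.0]
```

The second LF labels four rows but is alone on the fourth, so three of its four labels overlap.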

lf_polarities()
Infer the polarities of each LF based on evidence in a label matrix.
 Returns
Unique output labels for each LF
 Return type
List[List[int]]
Example
>>> import numpy as np
>>> from snorkel.labeling import LFAnalysis
>>> L = np.array([
...     [-1, 0, 0],
...     [-1, -1, -1],
...     [1, 0, -1],
...     [-1, 0, -1],
...     [0, 0, 0],
... ])
>>> LFAnalysis(L).lf_polarities()
[[0, 1], [0], [0]]
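Polarity here amounts to the set of distinct non-abstain labels found in each column. A minimal plain-Python sketch of that idea follows; `lf_polarities_sketch` is a hypothetical name, and sorting the labels is an assumption made to get a stable order.

```python
# Sketch only: distinct non-abstain labels emitted by each LF (column).
ABSTAIN = -1

L = [
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
]


def lf_polarities_sketch(L):
    m = len(L[0])
    return [
        sorted({row[j] for row in L if row[j] != ABSTAIN})
        for j in range(m)
    ]


polarities = lf_polarities_sketch(L)
print(polarities)  # [[0, 1], [0], [0]]
```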

lf_summary(Y=None, est_weights=None)
Create a pandas DataFrame with the various per-LF statistics.
 Parameters
Y (Optional[ndarray]) – [n] or [n, 1] np.ndarray of gold labels. If provided, the empirical weight for each LF will be calculated.
est_weights (Optional[ndarray]) – Learned weights for each LF
 Returns
Summary statistics for each LF
 Return type
pandas.DataFrame