snorkel.labeling.LFAnalysis

class snorkel.labeling.LFAnalysis(L, lfs=None)
Bases: object
Run analyses on labeling functions (LFs) using a label matrix.
- Parameters
L (ndarray) – Label matrix where L_{i,j} is the label given by the jth LF to the ith candidate (using -1 for abstain)
lfs (Optional[List[LabelingFunction]]) – Labeling functions used to generate L
- Raises
ValueError – If the number of LFs and the number of columns in L differ
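The label matrix convention and the condition under Raises can be illustrated with plain NumPy. This is a minimal sketch, not Snorkel's implementation: the helper check_label_matrix is hypothetical and only reproduces the documented rule that the number of LFs must equal the number of columns of L.

```python
import numpy as np


def check_label_matrix(L: np.ndarray, n_lfs: int) -> None:
    """Hypothetical helper: raise ValueError if L's column count differs from n_lfs."""
    if L.shape[1] != n_lfs:
        raise ValueError(
            f"Number of LFs ({n_lfs}) and number of label matrix columns "
            f"({L.shape[1]}) differ"
        )


# Row i holds every LF's output on data point i; column j holds LF j's outputs.
# -1 marks an abstain.
L = np.array([
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
])

check_label_matrix(L, n_lfs=3)  # three columns, three LFs: no error

try:
    check_label_matrix(L, n_lfs=2)
except ValueError as err:
    print(err)  # Number of LFs (2) and number of label matrix columns (3) differ
```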
__init__(L, lfs=None)
Initialize self. See help(type(self)) for accurate signature.
- Return type
None
Methods
__init__(L[, lfs]) – Initialize self.
label_conflict() – Compute the fraction of data points with conflicting (non-abstain) labels.
label_coverage() – Compute the fraction of data points with at least one label.
label_overlap() – Compute the fraction of data points with at least two (non-abstain) labels.
lf_conflicts([normalize_by_overlaps]) – Compute the fraction of examples each LF labels that are labeled differently by another LF.
lf_coverages() – Compute the fraction of examples each LF labels.
lf_empirical_accuracies(Y) – Compute empirical accuracy against a set of labels Y for each LF.
lf_empirical_probs(Y, k) – Estimate conditional probability tables for each LF.
lf_overlaps([normalize_by_coverage]) – Compute the fraction of examples each LF labels that are also labeled by another LF.
lf_polarities() – Infer the polarities of each LF based on evidence in a label matrix.
lf_summary([Y, est_weights]) – Create a pandas DataFrame with the various per-LF statistics.
label_conflict()
Compute the fraction of data points with conflicting (non-abstain) labels.
- Returns
Fraction of data points with conflicting labels
- Return type
float
Example
>>> L = np.array([
...     [-1, 0, 0],
...     [-1, -1, -1],
...     [1, 0, -1],
...     [-1, 0, -1],
...     [0, 0, 0],
... ])
>>> LFAnalysis(L).label_conflict()
0.2
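To make the definition concrete, here is a minimal NumPy sketch of the same quantity: a data point counts as conflicting when its non-abstain labels contain at least two distinct values. The function name mirrors the method for readability, but it is an illustrative reimplementation assuming -1 is the abstain value, not Snorkel's code.

```python
import numpy as np


def label_conflict(L: np.ndarray, abstain: int = -1) -> float:
    """Fraction of data points whose non-abstain labels disagree."""
    n_conflicting = 0
    for row in L:
        votes = set(row[row != abstain].tolist())  # distinct non-abstain labels
        if len(votes) > 1:
            n_conflicting += 1
    return n_conflicting / L.shape[0]


L = np.array([
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
])
print(label_conflict(L))  # 0.2 (only the third row mixes labels 0 and 1)
```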
label_coverage()
Compute the fraction of data points with at least one label.
- Returns
Fraction of data points with labels
- Return type
float
Example
>>> L = np.array([
...     [-1, 0, 0],
...     [-1, -1, -1],
...     [1, 0, -1],
...     [-1, 0, -1],
...     [0, 0, 0],
... ])
>>> LFAnalysis(L).label_coverage()
0.8
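Coverage reduces to a row-wise "any non-abstain label" test. The sketch below, assuming -1 as the abstain value, computes it with a boolean mask; it is an illustration of the definition rather than the library's source.

```python
import numpy as np


def label_coverage(L: np.ndarray, abstain: int = -1) -> float:
    """Fraction of data points with at least one non-abstain label."""
    has_label = (L != abstain).any(axis=1)  # one boolean per data point
    return float(has_label.mean())


L = np.array([
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
])
print(label_coverage(L))  # 0.8 (every row except the all-abstain second row)
```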
label_overlap()
Compute the fraction of data points with at least two (non-abstain) labels.
- Returns
Fraction of data points with overlapping labels
- Return type
float
Example
>>> L = np.array([
...     [-1, 0, 0],
...     [-1, -1, -1],
...     [1, 0, -1],
...     [-1, 0, -1],
...     [0, 0, 0],
... ])
>>> LFAnalysis(L).label_overlap()
0.6
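Overlap differs from coverage only in the threshold: a data point needs at least two non-abstain votes. A minimal sketch under the same -1 abstain assumption (illustrative, not Snorkel's implementation):

```python
import numpy as np


def label_overlap(L: np.ndarray, abstain: int = -1) -> float:
    """Fraction of data points with at least two non-abstain labels."""
    n_labels = (L != abstain).sum(axis=1)  # non-abstain votes per data point
    return float((n_labels >= 2).mean())


L = np.array([
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
])
print(label_overlap(L))  # 0.6 (rows 1, 3 and 5 carry two or more votes)
```

Because conflicts require at least two votes, label_conflict can never exceed label_overlap on the same matrix.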
lf_conflicts(normalize_by_overlaps=False)
Compute the fraction of examples each LF labels that are labeled differently by another LF.
A conflicting example is one for which at least one other LF returns a different (non-abstain) label.
Note that the maximum possible conflict fraction for an LF is the LF’s overlap fraction, unless normalize_by_overlaps=True, in which case it is 1.
- Parameters
normalize_by_overlaps (bool) – Normalize by the overlaps of the LF, so that it returns the fraction of the LF’s overlapping examples that have conflicts.
- Returns
Fraction of conflicting examples for each LF
- Return type
numpy.ndarray
Example
>>> L = np.array([
...     [-1, 0, 0],
...     [-1, -1, -1],
...     [1, 0, -1],
...     [-1, 0, -1],
...     [0, 0, 0],
... ])
>>> LFAnalysis(L).lf_conflicts()
array([0.2, 0.2, 0. ])
>>> LFAnalysis(L).lf_conflicts(normalize_by_overlaps=True)
array([0.5       , 0.33333333, 0.        ])
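The per-LF conflict fraction and its normalized variant can be sketched with explicit loops, which keeps the definition readable: for each point an LF labels, check whether any other LF also votes (an overlap) and whether any of those votes differs (a conflict). This is an illustrative reimplementation assuming -1 as the abstain value; returning 0 for an LF with no overlaps is a choice made here, not documented library behavior.

```python
import numpy as np


def lf_conflicts(L: np.ndarray, normalize_by_overlaps: bool = False,
                 abstain: int = -1) -> np.ndarray:
    """Per-LF fraction of data points labeled by the LF where another LF disagrees."""
    n, m = L.shape
    conflicts = np.zeros(m)
    overlaps = np.zeros(m)
    for j in range(m):
        others = np.delete(L, j, axis=1)  # votes from every other LF
        for i in range(n):
            if L[i, j] == abstain:
                continue
            other_votes = others[i][others[i] != abstain]
            if other_votes.size > 0:
                overlaps[j] += 1  # at least one other LF also labeled point i
                if np.any(other_votes != L[i, j]):
                    conflicts[j] += 1  # ...and at least one of them disagrees
    if normalize_by_overlaps:
        # Divide by each LF's overlap count; LFs with no overlaps get 0 here.
        return np.divide(conflicts, overlaps, out=np.zeros(m), where=overlaps > 0)
    return conflicts / n


L = np.array([
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
])
print(lf_conflicts(L))
print(lf_conflicts(L, normalize_by_overlaps=True))
```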
lf_coverages()
Compute the fraction of examples each LF labels.
- Returns
Fraction of labeled examples for each LF
- Return type
numpy.ndarray
Example
>>> L = np.array([
...     [-1, 0, 0],
...     [-1, -1, -1],
...     [1, 0, -1],
...     [-1, 0, -1],
...     [0, 0, 0],
... ])
>>> LFAnalysis(L).lf_coverages()
array([0.4, 0.8, 0.4])
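Per-LF coverage is the column-wise counterpart of label_coverage: the mean of the non-abstain mask down each column. A one-line sketch, assuming -1 as the abstain value (illustrative, not the library's source):

```python
import numpy as np


def lf_coverages(L: np.ndarray, abstain: int = -1) -> np.ndarray:
    """Per-LF fraction of data points with a non-abstain label."""
    return (L != abstain).mean(axis=0)  # column-wise mean of the "labeled" mask


L = np.array([
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
])
print(lf_coverages(L))  # [0.4 0.8 0.4]
```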
lf_empirical_accuracies(Y)
Compute empirical accuracy against a set of labels Y for each LF.
Usually, Y represents development set labels.
- Parameters
Y (ndarray) – [n] or [n, 1] np.ndarray of gold labels
- Returns
Empirical accuracies for each LF
- Return type
numpy.ndarray
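A minimal sketch of an empirical accuracy computation, assuming accuracy is measured only over the points each LF actually labels (abstains excluded) and that -1 is the abstain value. The gold labels Y below are made up for illustration, and scoring an LF that never votes as 0 is a choice made in this sketch.

```python
import numpy as np


def lf_empirical_accuracies(L: np.ndarray, Y: np.ndarray, abstain: int = -1) -> np.ndarray:
    """Per-LF accuracy on the data points each LF labels (abstains excluded)."""
    Y = np.asarray(Y).reshape(-1)  # accept gold labels of shape [n] or [n, 1]
    labeled = L != abstain
    correct = (L == Y[:, None]) & labeled  # broadcast gold labels across LF columns
    n_labeled = labeled.sum(axis=0)
    # LFs that never vote get accuracy 0 here rather than NaN.
    return np.divide(correct.sum(axis=0), n_labeled,
                     out=np.zeros(L.shape[1]), where=n_labeled > 0)


L = np.array([
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
])
Y = np.array([0, 1, 1, 0, 0])  # hypothetical gold labels
# The second LF votes 0 on the third point, whose gold label is 1, so it scores 3/4.
print(lf_empirical_accuracies(L, Y))
```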
lf_empirical_probs(Y, k)
Estimate conditional probability tables for each LF.
Computes conditional probability tables, P(L | Y), for each LF using the provided true labels Y.
- Parameters
Y (ndarray) – The n-dim array of true labels in {0, …, k-1}
k (int) – The cardinality, i.e., the number of classes
- Returns
An m x (k+1) x k np.ndarray holding the m empirically estimated (k+1) x k conditional probability tables P_i, where P_i[l + 1, y] = P(LF_i = l | Y = y) for l in {-1, 0, …, k-1}, so the first row of each table corresponds to abstain
- Return type
np.ndarray
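The table layout can be reproduced by counting, for each class y, how often each LF emits each label (including abstain) on the points whose gold label is y. This sketch assumes labels in {0, …, k-1}, -1 for abstain stored in row 0, and made-up gold labels Y; it is an illustration of the estimator, not Snorkel's source.

```python
import numpy as np


def lf_empirical_probs(L: np.ndarray, Y: np.ndarray, k: int) -> np.ndarray:
    """Estimate P(LF_j = l | Y = y); row l + 1 of each table holds label l (row 0 = abstain)."""
    n, m = L.shape
    Y = np.asarray(Y).reshape(-1)
    P = np.zeros((m, k + 1, k))
    for y in range(k):
        is_y = Y == y
        n_y = is_y.sum()
        if n_y == 0:
            continue  # no gold examples of class y: leave that column at zero
        for j in range(m):
            for l in range(-1, k):  # -1 (abstain), then the classes 0..k-1
                P[j, l + 1, y] = np.sum((L[:, j] == l) & is_y) / n_y
    return P


L = np.array([
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
])
Y = np.array([0, 1, 1, 0, 0])  # hypothetical gold labels
P = lf_empirical_probs(L, Y, k=2)
print(P.shape)  # (3, 3, 2)
print(P[0])     # conditional probability table for the first LF
```

Each column of a table sums to 1 whenever its class appears in Y, since every LF output falls into exactly one row.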
lf_overlaps(normalize_by_coverage=False)
Compute the fraction of examples each LF labels that are also labeled by another LF.
An overlapping example is one for which at least one other LF returns a (non-abstain) label.
Note that the maximum possible overlap fraction for an LF is the LF’s coverage, unless normalize_by_coverage=True, in which case it is 1.
- Parameters
normalize_by_coverage (bool) – Normalize by the coverage of the LF, so that it returns the fraction of the LF’s labels that have overlaps.
- Returns
Fraction of overlapping examples for each LF
- Return type
numpy.ndarray
Example
>>> L = np.array([
...     [-1, 0, 0],
...     [-1, -1, -1],
...     [1, 0, -1],
...     [-1, 0, -1],
...     [0, 0, 0],
... ])
>>> LFAnalysis(L).lf_overlaps()
array([0.4, 0.6, 0.4])
>>> LFAnalysis(L).lf_overlaps(normalize_by_coverage=True)
array([1.  , 0.75, 1.  ])
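An LF overlaps on a point when it votes there and the point carries at least two votes in total, so the whole computation vectorizes. A sketch assuming -1 as the abstain value; giving an LF with zero coverage a normalized value of 0 is a choice made here, not documented behavior.

```python
import numpy as np


def lf_overlaps(L: np.ndarray, normalize_by_coverage: bool = False,
                abstain: int = -1) -> np.ndarray:
    """Per-LF fraction of data points labeled by the LF and by at least one other LF."""
    labeled = L != abstain
    n_labels_per_row = labeled.sum(axis=1)
    # LF j overlaps on row i if it labels i and at least one other LF does too.
    overlapping = labeled & (n_labels_per_row[:, None] >= 2)
    overlaps = overlapping.mean(axis=0)
    if normalize_by_coverage:
        coverage = labeled.mean(axis=0)
        return np.divide(overlaps, coverage, out=np.zeros(L.shape[1]), where=coverage > 0)
    return overlaps


L = np.array([
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
])
print(lf_overlaps(L))
print(lf_overlaps(L, normalize_by_coverage=True))
```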
lf_polarities()
Infer the polarities of each LF based on evidence in a label matrix.
- Returns
Unique output labels for each LF
- Return type
List[List[int]]
Example
>>> L = np.array([
...     [-1, 0, 0],
...     [-1, -1, -1],
...     [1, 0, -1],
...     [-1, 0, -1],
...     [0, 0, 0],
... ])
>>> LFAnalysis(L).lf_polarities()
[[0, 1], [0], [0]]
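A polarity here is simply the set of distinct non-abstain labels an LF emits on the data. A sketch under the -1 abstain assumption (illustrative, not Snorkel's source):

```python
import numpy as np


def lf_polarities(L: np.ndarray, abstain: int = -1) -> list:
    """Sorted distinct non-abstain labels emitted by each LF."""
    return [sorted(set(L[:, j].tolist()) - {abstain}) for j in range(L.shape[1])]


L = np.array([
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
])
print(lf_polarities(L))  # [[0, 1], [0], [0]]
```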
lf_summary(Y=None, est_weights=None)
Create a pandas DataFrame with the various per-LF statistics.
- Parameters
Y (Optional[ndarray]) – [n] or [n, 1] np.ndarray of gold labels. If provided, the empirical weight for each LF will be calculated.
est_weights (Optional[ndarray]) – Learned weights for each LF
- Returns
Summary statistics for each LF
- Return type
pandas.DataFrame
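To show how the per-LF statistics above fit together, the sketch below assembles comparable values as a list of dictionaries using only NumPy, so it runs without pandas. The function name lf_summary_rows and its keys are hypothetical and need not match the columns of the DataFrame that lf_summary returns; est_weights is omitted, and -1 is assumed to be the abstain value.

```python
import numpy as np


def lf_summary_rows(L: np.ndarray, Y=None, abstain: int = -1) -> list:
    """Hypothetical stand-in for lf_summary: one dict of statistics per LF."""
    n, m = L.shape
    labeled = L != abstain
    n_per_row = labeled.sum(axis=1)  # non-abstain votes per data point
    rows = []
    for j in range(m):
        col, mask = L[:, j], labeled[:, j]
        others = np.delete(L, j, axis=1)
        # Point i conflicts for LF j if LF j votes and another LF casts a different vote.
        conflict = mask & ((others != abstain) & (others != col[:, None])).any(axis=1)
        row = {
            "lf": j,
            "polarity": sorted(set(col[mask].tolist())),
            "coverage": float(mask.mean()),
            "overlaps": float((mask & (n_per_row >= 2)).mean()),
            "conflicts": float(conflict.mean()),
        }
        if Y is not None:
            y = np.asarray(Y).reshape(-1)
            n_correct = int(((col == y) & mask).sum())
            n_votes = int(mask.sum())
            row["correct"] = n_correct
            row["incorrect"] = n_votes - n_correct
            row["emp_acc"] = n_correct / n_votes if n_votes else 0.0
        rows.append(row)
    return rows


L = np.array([
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
])
Y = np.array([0, 1, 1, 0, 0])  # hypothetical gold labels
for row in lf_summary_rows(L, Y):
    print(row)
```

If pandas is installed, passing the returned list to pandas.DataFrame produces one row per LF.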