# snorkel.labeling.LFAnalysis

class `snorkel.labeling.LFAnalysis`(L, lfs=None)

Bases: `object`

Run analyses on labeling functions (LFs) using a label matrix.

Parameters
• L (`ndarray`) – Label matrix where L_{i,j} is the label given by the jth LF to the ith candidate (using -1 for abstain)

• lfs (`Optional`[`List`[`LabelingFunction`]]) – Labeling functions used to generate `L`

Raises

ValueError – If the number of LFs and the number of label matrix columns differ

`L`

See above.

`__init__`(L, lfs=None)

Initialize the analysis from a label matrix and, optionally, the LFs that generated it.

Return type

`None`

Methods

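| Method | Description |
| --- | --- |
| `__init__`(L[, lfs]) | Initialize self. |
| `label_conflict`() | Compute the fraction of data points with conflicting (non-abstain) labels. |
| `label_coverage`() | Compute the fraction of data points with at least one label. |
| `label_overlap`() | Compute the fraction of data points with at least two (non-abstain) labels. |
| `lf_conflicts`([normalize_by_overlaps]) | Compute the fraction of examples each LF labels that another LF labels differently. |
| `lf_coverages`() | Compute the fraction of examples each LF labels. |
| `lf_empirical_accuracies`(Y) | Compute empirical accuracy against a set of labels Y for each LF. |
| `lf_empirical_probs`(Y, k) | Estimate conditional probability tables for each LF. |
| `lf_overlaps`([normalize_by_coverage]) | Compute the fraction of examples each LF labels that are labeled by another LF. |
| `lf_polarities`() | Infer the polarities of each LF based on evidence in a label matrix. |
| `lf_summary`([Y, est_weights]) | Create a pandas DataFrame with the various per-LF statistics. |

`label_conflict`()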

Compute the fraction of data points with conflicting (non-abstain) labels.

Returns

Fraction of data points with conflicting labels

Return type

float

Example

```
>>> import numpy as np
>>> from snorkel.labeling import LFAnalysis
>>> L = np.array([
...     [-1, 0, 0],
...     [-1, -1, -1],
...     [1, 0, -1],
...     [-1, 0, -1],
...     [0, 0, 0],
... ])
>>> LFAnalysis(L).label_conflict()
0.2
```
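To make the definition concrete, the following NumPy sketch recomputes the same statistic by hand. It assumes a data point is "conflicting" when its row contains two or more distinct non-abstain labels; `label_conflict_sketch` is a hypothetical helper written for illustration and is not part of Snorkel.

```python
import numpy as np

ABSTAIN = -1  # abstain value used on this page


def label_conflict_sketch(L: np.ndarray) -> float:
    """Fraction of rows whose non-abstain labels disagree (illustrative only)."""
    conflicted = 0
    for row in L:
        votes = row[row != ABSTAIN]
        # Assumption: a row conflicts when it holds 2+ distinct non-abstain labels.
        if np.unique(votes).size > 1:
            conflicted += 1
    return conflicted / L.shape[0]


L = np.array([
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
])
print(label_conflict_sketch(L))  # 0.2 (only the third row disagrees)
```

Only the row `[1, 0, -1]` contains two different labels, so one of five data points conflicts.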
`label_coverage`()

Compute the fraction of data points with at least one label.

Returns

Fraction of data points with labels

Return type

float

Example

```
>>> L = np.array([
...     [-1, 0, 0],
...     [-1, -1, -1],
...     [1, 0, -1],
...     [-1, 0, -1],
...     [0, 0, 0],
... ])
>>> LFAnalysis(L).label_coverage()
0.8
```
`label_overlap`()

Compute the fraction of data points with at least two (non-abstain) labels.

Returns

Fraction of data points with overlapping labels

Return type

float

Example

```
>>> L = np.array([
...     [-1, 0, 0],
...     [-1, -1, -1],
...     [1, 0, -1],
...     [-1, 0, -1],
...     [0, 0, 0],
... ])
>>> LFAnalysis(L).label_overlap()
0.6
```
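Both `label_coverage` and `label_overlap` can be understood as thresholds on the number of non-abstain votes per data point. The sketch below illustrates that reading with plain NumPy; `votes_per_point` is a hypothetical helper, and the code is not taken from the library.

```python
import numpy as np


def votes_per_point(L: np.ndarray, abstain: int = -1) -> np.ndarray:
    """Count the non-abstain labels in each row (illustrative helper)."""
    return (L != abstain).sum(axis=1)


L = np.array([
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
])

n_votes = votes_per_point(L)              # array([2, 0, 2, 1, 3])
coverage = float((n_votes >= 1).mean())   # at least one label  -> 0.8
overlap = float((n_votes >= 2).mean())    # at least two labels -> 0.6
print(coverage, overlap)
```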
`lf_conflicts`(normalize_by_overlaps=False)

Compute the fraction of examples that each LF labels and that at least one other LF labels differently.

An example is conflicting for an LF if at least one other LF returns a different (non-abstain) label for it.

Note that the maximum possible conflict fraction for an LF is the LF’s overlaps fraction, unless `normalize_by_overlaps=True`, in which case it is 1.

Parameters

normalize_by_overlaps (`bool`) – Normalize by the overlaps of the LF, so that the result is the fraction of the LF's overlaps that have conflicts.

Returns

Fraction of conflicting examples for each LF

Return type

numpy.ndarray

Example

```
>>> L = np.array([
...     [-1, 0, 0],
...     [-1, -1, -1],
...     [1, 0, -1],
...     [-1, 0, -1],
...     [0, 0, 0],
... ])
>>> LFAnalysis(L).lf_conflicts()
array([0.2, 0.2, 0. ])
>>> LFAnalysis(L).lf_conflicts(normalize_by_overlaps=True)
array([0.5       , 0.33333333, 0.        ])
```
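The per-LF conflict fraction and its normalized variant can be sketched with explicit loops, which makes the definition above easy to check by hand. `lf_conflicts_sketch` is a hypothetical helper, not Snorkel's implementation; in particular, reporting 0 for an LF with no overlaps is an assumption of this sketch.

```python
import numpy as np


def lf_conflicts_sketch(L, normalize_by_overlaps=False, abstain=-1):
    """Per-LF conflict fractions, computed row by row (illustrative only)."""
    n, m = L.shape
    labeled = L != abstain
    conflicts = np.zeros(m)
    overlaps = np.zeros(m)
    for j in range(m):
        others = np.delete(np.arange(m), j)  # indices of all other LFs
        for i in range(n):
            if not labeled[i, j]:
                continue  # LF j abstained, so it can neither overlap nor conflict
            other_votes = L[i, others][labeled[i, others]]
            overlaps[j] += other_votes.size > 0
            conflicts[j] += np.any(other_votes != L[i, j])
    if normalize_by_overlaps:
        # Assumption: LFs with zero overlaps report 0 rather than NaN.
        return np.divide(conflicts, overlaps, out=np.zeros(m), where=overlaps > 0)
    return conflicts / n


L = np.array([
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
])
raw = lf_conflicts_sketch(L)                                     # about [0.2, 0.2, 0.0]
normalized = lf_conflicts_sketch(L, normalize_by_overlaps=True)  # about [0.5, 0.333, 0.0]
```

Because a conflict requires an overlap, the raw value for an LF can never exceed that LF's overlap fraction, which is why the normalized value is bounded by 1.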
`lf_coverages`()

Compute the fraction of examples each LF labels.

Returns

Fraction of labeled examples for each LF

Return type

numpy.ndarray

Example

```
>>> L = np.array([
...     [-1, 0, 0],
...     [-1, -1, -1],
...     [1, 0, -1],
...     [-1, 0, -1],
...     [0, 0, 0],
... ])
>>> LFAnalysis(L).lf_coverages()
array([0.4, 0.8, 0.4])
```
`lf_empirical_accuracies`(Y)

Compute empirical accuracy against a set of labels Y for each LF.

Usually, Y represents development set labels.

Parameters

Y (`ndarray`) – [n] or [n, 1] np.ndarray of gold labels

Returns

Empirical accuracies for each LF

Return type

numpy.ndarray
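As an illustration, empirical accuracy can be read as the share of an LF's non-abstain votes that match the gold label. The NumPy sketch below follows that reading. `lf_empirical_accuracies_sketch` and the gold labels `Y` are hypothetical, and returning 0 for an LF that never votes is an assumption of the sketch rather than documented library behavior.

```python
import numpy as np


def lf_empirical_accuracies_sketch(L, Y, abstain=-1):
    """Accuracy of each LF over the examples it labels (illustrative only)."""
    Y = np.asarray(Y).reshape(-1)          # accept shape [n] or [n, 1]
    voted = L != abstain
    correct = voted & (L == Y[:, None])    # compare every column against Y
    n_voted = voted.sum(axis=0)
    # Assumption: an LF with no votes gets accuracy 0 instead of NaN.
    return np.divide(
        correct.sum(axis=0), n_voted,
        out=np.zeros(L.shape[1]), where=n_voted > 0,
    )


L = np.array([
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
])
Y = np.array([0, 1, 1, 0, 0])  # hypothetical development-set labels
acc = lf_empirical_accuracies_sketch(L, Y)  # about [1.0, 0.75, 1.0]
```

Here the second LF votes 0 on four examples and is wrong on one of them (the third row, where the gold label is 1), giving 3/4.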

`lf_empirical_probs`(Y, k)

Estimate conditional probability tables for each LF.

Computes conditional probability tables, P(L | Y), for each LF using the provided true labels Y.

Parameters
• Y (`ndarray`) – The n-dim array of true labels in {1,…,k}

• k (`int`) – The cardinality i.e. number of classes

Returns

An m x (k+1) x k np.ndarray holding one (k+1) x k conditional probability table P_i per LF, where P_i[l,y] is the empirical estimate of P(LF_i = l | Y = y)

Return type

np.ndarray
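The shape of the returned tables is easiest to see in a small sketch. The code below estimates P(LF_i = l | Y = y) by counting, under two assumptions that should be checked against the installed Snorkel version: gold labels are 0-indexed (0 to k-1, unlike the {1,…,k} range stated above), and row 0 of each table holds the abstain value -1, so label l sits in row l + 1. `lf_empirical_probs_sketch` and `Y` are hypothetical.

```python
import numpy as np


def lf_empirical_probs_sketch(L, Y, k):
    """Empirical P(LF_j = l | Y = y) as an m x (k+1) x k array (illustrative only)."""
    n, m = L.shape
    Y = np.asarray(Y).reshape(-1)
    P = np.zeros((m, k + 1, k))
    for y in range(k):                  # assumption: classes are 0..k-1
        is_y = Y == y
        if is_y.sum() == 0:
            continue                    # no examples of this class: leave zeros
        for j in range(m):
            for l in range(-1, k):      # assumption: -1 (abstain) maps to row 0
                P[j, l + 1, y] = np.mean(L[is_y, j] == l)
    return P


L = np.array([
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
])
Y = np.array([0, 1, 1, 0, 0])  # hypothetical gold labels
P = lf_empirical_probs_sketch(L, Y, k=2)
print(P.shape)  # (3, 3, 2)
```

Each column of each table is a distribution over the LF's k+1 possible outputs, so the columns sum to 1 whenever the class appears in `Y`.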

`lf_overlaps`(normalize_by_coverage=False)

Compute the fraction of examples that each LF labels and that at least one other LF also labels.

An example is overlapping for an LF if at least one other LF returns a (non-abstain) label for it.

Note that the maximum possible overlap fraction for an LF is the LF’s coverage, unless `normalize_by_coverage=True`, in which case it is 1.

Parameters

normalize_by_coverage (`bool`) – Normalize by the coverage of the LF, so that the result is the fraction of the LF's labels that have overlaps.

Returns

Fraction of overlapping examples for each LF

Return type

numpy.ndarray

Example

```
>>> L = np.array([
...     [-1, 0, 0],
...     [-1, -1, -1],
...     [1, 0, -1],
...     [-1, 0, -1],
...     [0, 0, 0],
... ])
>>> LFAnalysis(L).lf_overlaps()
array([0.4, 0.6, 0.4])
>>> LFAnalysis(L).lf_overlaps(normalize_by_coverage=True)
array([1.  , 0.75, 1.  ])
```
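The relationship between overlaps and coverage can be sketched in vectorized NumPy. The code below assumes that an LF overlaps on a data point when it votes there and the row holds at least two non-abstain votes in total; `lf_overlaps_sketch` is a hypothetical helper, and returning 0 for an LF with zero coverage is an assumption of the sketch.

```python
import numpy as np


def lf_overlaps_sketch(L, normalize_by_coverage=False, abstain=-1):
    """Per-LF overlap fractions (illustrative only)."""
    labeled = L != abstain
    n_votes = labeled.sum(axis=1, keepdims=True)  # non-abstain votes per row
    # LF j overlaps on row i when it votes and some other LF votes as well.
    overlapping = labeled & (n_votes >= 2)
    overlaps = overlapping.mean(axis=0)
    if normalize_by_coverage:
        coverage = labeled.mean(axis=0)           # per-LF coverage
        # Assumption: LFs with zero coverage report 0 rather than NaN.
        return np.divide(
            overlaps, coverage,
            out=np.zeros_like(overlaps), where=coverage > 0,
        )
    return overlaps


L = np.array([
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
])
raw = lf_overlaps_sketch(L)                                     # about [0.4, 0.6, 0.4]
normalized = lf_overlaps_sketch(L, normalize_by_coverage=True)  # about [1.0, 0.75, 1.0]
```

The second LF labels four examples but is the only voter on one of them, so three of its four labels overlap.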
`lf_polarities`()

Infer the polarities of each LF based on evidence in a label matrix.

Returns

Unique output labels for each LF

Return type

List[List[int]]

Example

```
>>> L = np.array([
...     [-1, 0, 0],
...     [-1, -1, -1],
...     [1, 0, -1],
...     [-1, 0, -1],
...     [0, 0, 0],
... ])
>>> LFAnalysis(L).lf_polarities()
[[0, 1], [0], [0]]
```
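Polarities amount to the set of distinct non-abstain values in each column of the label matrix. A minimal NumPy sketch of that idea follows; `lf_polarities_sketch` is a hypothetical helper, not the library's code.

```python
import numpy as np


def lf_polarities_sketch(L, abstain=-1):
    """Sorted unique non-abstain labels emitted by each LF (illustrative only)."""
    # Iterate over columns (one per LF) and drop the abstain value.
    return [
        sorted(int(v) for v in np.unique(col) if v != abstain)
        for col in L.T
    ]


L = np.array([
    [-1, 0, 0],
    [-1, -1, -1],
    [1, 0, -1],
    [-1, 0, -1],
    [0, 0, 0],
])
print(lf_polarities_sketch(L))  # [[0, 1], [0], [0]]
```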
`lf_summary`(Y=None, est_weights=None)

Create a pandas DataFrame with the various per-LF statistics.

Parameters
• Y (`Optional`[`ndarray`]) – [n] or [n, 1] np.ndarray of gold labels. If provided, the empirical accuracy for each LF will be calculated.

• est_weights (`Optional`[`ndarray`]) – Learned weights for each LF

Returns

Summary statistics for each LF

Return type

pandas.DataFrame