snorkel.labeling.LabelModel

class snorkel.labeling.LabelModel(cardinality=2, **kwargs)[source]

Bases: torch.nn.modules.module.Module

A model for learning the LF accuracies and combining their output labels.

This class learns a model of the labeling functions' conditional output probabilities given the true (unobserved) label Y, P(lf | Y), and uses this learned model to re-weight and combine their output labels.

This class is based on the approach in [Training Complex Models with Multi-Task Weak Supervision](https://arxiv.org/abs/1810.02840), published in AAAI'19. In this approach, we compute the inverse generalized covariance matrix of the junction tree of a given LF dependency graph, and apply a matrix completion-style approach with respect to these empirical statistics. The result is an estimate of the conditional LF probabilities, P(lf | Y), which are then set as the parameters of the label model used to re-weight and combine the labels output by the LFs.

Currently this class uses a conditionally independent label model, in which the LFs are assumed to be conditionally independent given Y.

Examples

>>> label_model = LabelModel()
>>> label_model = LabelModel(cardinality=3)
>>> label_model = LabelModel(cardinality=3, device='cpu')
Parameters
  • cardinality (int) – Number of classes, by default 2

  • **kwargs – Arguments for changing config defaults

Raises

ValueError – If the config device is set to cuda but only cpu is available

cardinality[source]

Number of classes, by default 2

config[source]

Training configuration

seed[source]

Random seed

__init__(cardinality=2, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

Return type

None

Methods

__init__([cardinality])

Initialize self.

add_module(name, module)

Adds a child module to the current module.

apply(fn)

Applies fn recursively to every submodule (as returned by .children()) as well as self.

buffers([recurse])

Returns an iterator over module buffers.

children()

Returns an iterator over immediate children modules.

cpu()

Moves all model parameters and buffers to the CPU.

cuda([device])

Moves all model parameters and buffers to the GPU.

double()

Casts all floating point parameters and buffers to double datatype.

eval()

Sets the module in evaluation mode.

extra_repr()

Set the extra representation of the module.

fit(L_train[, Y_dev, class_balance])

Train label model.

float()

Casts all floating point parameters and buffers to float datatype.

forward(*input)

Defines the computation performed at every call.

get_conditional_probs()

Return the estimated conditional probabilities table.

get_weights()

Return the vector of learned LF weights for combining LFs.

half()

Casts all floating point parameters and buffers to half datatype.

load(source)

Load existing label model.

load_state_dict(state_dict[, strict])

Copies parameters and buffers from state_dict into this module and its descendants.

modules()

Returns an iterator over all modules in the network.

named_buffers([prefix, recurse])

Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

named_children()

Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

named_modules([memo, prefix])

Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

named_parameters([prefix, recurse])

Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

parameters([recurse])

Returns an iterator over module parameters.

predict(L[, return_probs, tie_break_policy])

Return predicted labels, with ties broken according to policy.

predict_proba(L)

Return label probabilities P(Y | lambda).

register_backward_hook(hook)

Registers a backward hook on the module.

register_buffer(name, tensor)

Adds a persistent buffer to the module.

register_forward_hook(hook)

Registers a forward hook on the module.

register_forward_pre_hook(hook)

Registers a forward pre-hook on the module.

register_parameter(name, param)

Adds a parameter to the module.

save(destination)

Save label model.

score(L, Y[, metrics, tie_break_policy])

Calculate one or more scores from user-specified and/or user-defined metrics.

share_memory()

See torch.Tensor.share_memory_().

state_dict([destination, prefix, keep_vars])

Returns a dictionary containing a whole state of the module.

to(*args, **kwargs)

Moves and/or casts the parameters and buffers.

train([mode])

Sets the module in training mode.

type(dst_type)

Casts all parameters and buffers to dst_type.

zero_grad()

Sets gradients of all model parameters to zero.

Attributes

dump_patches
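
A minimal end-to-end sketch of the typical workflow: fit the model on a label matrix produced by the LFs, then generate probabilistic (or hard) labels for downstream use. The label matrix L below is illustrative.

>>> import numpy as np
>>> from snorkel.labeling import LabelModel
>>> L = np.array([[0, 0, -1], [1, 1, -1], [0, -1, 0]])
>>> label_model = LabelModel(cardinality=2, verbose=False)
>>> label_model.fit(L, seed=123)
>>> probs = label_model.predict_proba(L)  # [n, k] array of probabilistic labels
>>> preds = label_model.predict(L)  # hard labels; ties abstain by default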

fit(L_train, Y_dev=None, class_balance=None, **kwargs)[source]

Train label model.

Train label model to estimate mu, the parameters used to combine LFs.

Parameters
  • L_train (ndarray) – An [n,m] matrix with values in {-1,0,1,…,k-1}

  • Y_dev (Optional[ndarray]) – Gold labels for dev set for estimating class_balance, by default None

  • class_balance (Optional[List[float]]) – Each class’s percentage of the population, by default None

  • **kwargs – Arguments for changing train config defaults

Raises

Exception – If the loss is NaN

Examples

>>> L = np.array([[0, 0, -1], [-1, 0, 1], [1, -1, 0]])
>>> Y_dev = [0, 1, 0]
>>> label_model = LabelModel(verbose=False)
>>> label_model.fit(L)
>>> label_model.fit(L, Y_dev=Y_dev)
>>> label_model.fit(L, class_balance=[0.7, 0.3])
Return type

None
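
Training hyperparameters are passed through **kwargs. A short sketch, assuming the train config defaults include n_epochs, lr, log_freq, and seed (the values shown are illustrative):

>>> label_model.fit(L, n_epochs=500, lr=0.01, log_freq=100, seed=123)  # doctest: +SKIP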

get_conditional_probs()[source]

Return the estimated conditional probabilities table.

Return the estimated conditional probabilities table cprobs, where cprobs is an (m, k+1, k)-dim np.ndarray with:

cprobs[i, j, y] = P(lf_i = j-1 | Y = y)

where m is the number of LFs, k is the cardinality, and cprobs includes the conditional abstain probabilities P(lf_i = -1 | Y = y).

Returns

An [m, k + 1, k] np.ndarray conditional probabilities table.

Return type

np.ndarray
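
A short sketch of inspecting the returned table: for each LF i and class y, the entries cprobs[i, :, y] form a distribution over the k+1 possible LF outputs (abstain plus the k labels), so they should sum to 1. The shape shown assumes m=3 LFs and cardinality k=2.

>>> cprobs = label_model.get_conditional_probs()  # doctest: +SKIP
>>> cprobs.shape  # doctest: +SKIP
(3, 3, 2)
>>> round(float(cprobs[0, :, 0].sum()), 2)  # doctest: +SKIP
1.0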

get_weights()[source]

Return the vector of learned LF weights for combining LFs.

Returns

[m,1] vector of learned LF weights for combining LFs.

Return type

np.ndarray

Example

>>> L = np.array([[1, 1, 1], [1, 1, -1], [-1, 0, 0], [0, 0, 0]])
>>> label_model = LabelModel(verbose=False)
>>> label_model.fit(L, seed=123)
>>> np.around(label_model.get_weights(), 2)  # doctest: +SKIP
array([0.99, 0.99, 0.99])
load(source)[source]

Load existing label model.

Parameters

source (str) – Filename to load model from

Example

Load parameters saved in saved_label_model

>>> label_model.load('./saved_label_model.pkl')  # doctest: +SKIP
Return type

None

predict(L, return_probs=False, tie_break_policy='abstain')[source]

Return predicted labels, with ties broken according to policy.

Policies to break ties include:

  • “abstain”: return an abstain vote (-1)

  • “true-random”: randomly choose among the tied options

  • “random”: randomly choose among the tied options using a deterministic hash

NOTE: if tie_break_policy=”true-random”, repeated runs may yield slightly different results due to randomly broken ties

Parameters
  • L (ndarray) – An [n,m] matrix with values in {-1,0,1,…,k-1}

  • return_probs (Optional[bool]) – Whether to return probs along with preds

  • tie_break_policy (str) – Policy to break ties when converting probabilistic labels to predictions

Return type

Union[ndarray, Tuple[ndarray, ndarray]]

Returns

  • np.ndarray – An [n,1] array of integer labels

  • (np.ndarray, np.ndarray) – An [n,1] array of integer labels and an [n,k] array of probabilistic labels

Example

>>> L = np.array([[0, 0, -1], [1, 1, -1], [0, 0, -1]])
>>> label_model = LabelModel(verbose=False)
>>> label_model.fit(L)
>>> label_model.predict(L)
array([0, 1, 0])
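
return_probs and tie_break_policy can be combined in the same call; a short sketch:

>>> preds, probs = label_model.predict(L, return_probs=True)  # doctest: +SKIP
>>> preds_random = label_model.predict(L, tie_break_policy="random")  # doctest: +SKIP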
predict_proba(L)[source]

Return label probabilities P(Y | lambda).

Parameters

L (ndarray) – An [n,m] matrix with values in {-1,0,1,…,k-1}

Returns

An [n,k] array of probabilistic labels

Return type

np.ndarray

Example

>>> L = np.array([[0, 0, 0], [1, 1, 1], [1, 1, 1]])
>>> label_model = LabelModel(verbose=False)
>>> label_model.fit(L, seed=123)
>>> np.around(label_model.predict_proba(L), 1)  # doctest: +SKIP
array([[1., 0.],
       [0., 1.],
       [0., 1.]])
save(destination)[source]

Save label model.

Parameters

destination (str) – Filename for saving model

Example

>>> label_model.save('./saved_label_model.pkl')  # doctest: +SKIP
Return type

None

score(L, Y, metrics=['accuracy'], tie_break_policy='abstain')[source]

Calculate one or more scores from user-specified and/or user-defined metrics.

Parameters
  • L (ndarray) – An [n,m] matrix with values in {-1,0,1,…,k-1}

  • Y (ndarray) – Gold labels associated with data points in L

  • metrics (Optional[List[str]]) – A list of metric names

  • tie_break_policy (str) – Policy to break ties when converting probabilistic labels to predictions

Returns

A dictionary mapping metric names to metric scores

Return type

Dict[str, float]

Example

>>> L = np.array([[1, 1, -1], [0, 0, -1], [1, 1, -1]])
>>> label_model = LabelModel(verbose=False)
>>> label_model.fit(L)
>>> label_model.score(L, Y=np.array([1, 1, 1]))
{'accuracy': 0.6666666666666666}
>>> label_model.score(L, Y=np.array([1, 1, 1]), metrics=["f1"])
{'f1': 0.8}
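
Multiple metrics can be requested in a single call; the combined output below simply merges the two documented results:

>>> label_model.score(L, Y=np.array([1, 1, 1]), metrics=["accuracy", "f1"])  # doctest: +SKIP
{'accuracy': 0.6666666666666666, 'f1': 0.8}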