snorkel.slicing.SliceAwareClassifier

class snorkel.slicing.SliceAwareClassifier(base_architecture, head_dim, slice_names, input_data_key='input_data', task_name='task', scorer=<snorkel.analysis.scorer.Scorer object>, **multitask_kwargs)[source]

Bases: snorkel.classification.multitask_classifier.MultitaskClassifier

A slice-aware classifier that supports training and scoring on slice labels.

NOTE: This model currently only supports binary classification.

Parameters
  • base_architecture (Module) – A network architecture that accepts input data and outputs a representation

  • head_dim (int) – Output feature dimension of the base_architecture, and input dimension of the internal prediction head: nn.Linear(head_dim, 2).

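  • slice_names (List[str]) – A list of slice names that the model will initialize as tasks and accept as corresponding labels

  • input_data_key (str) – The key in each batch’s X_dict whose value is passed to base_architecture as input

  • task_name (str) – The name of the base task; each dataset’s Y_dict should hold the base labels under this key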

  • scorer (Scorer) – A Scorer to be used for initialization of the MultitaskClassifier superclass.

  • **multitask_kwargs – Arbitrary keyword arguments to be passed to the MultitaskClassifier superclass.
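The head_dim contract can be illustrated without PyTorch. The sketch below is a pure-Python stand-in for the internal nn.Linear(head_dim, 2) prediction head, assuming base_architecture emits a flat vector of head_dim features per example. As in PyTorch, the weight has shape (out_features, in_features), here (2, head_dim). The helper linear_head and the weight values are hypothetical and are not part of Snorkel.

```python
from typing import List, Sequence


def linear_head(
    representation: Sequence[float],
    weight: Sequence[Sequence[float]],  # shape (2, head_dim), like nn.Linear(head_dim, 2).weight
    bias: Sequence[float],              # shape (2,)
) -> List[float]:
    """Map a head_dim-sized representation to two class logits (binary task)."""
    head_dim = len(weight[0])
    if len(representation) != head_dim:
        # A base_architecture whose output size differs from head_dim cannot feed the head
        raise ValueError(
            f"representation has {len(representation)} features, expected head_dim={head_dim}"
        )
    # One dot product per output class, plus that class's bias
    return [
        sum(w * x for w, x in zip(row, representation)) + b
        for row, b in zip(weight, bias)
    ]


head_dim = 3
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # hypothetical weights for the 2 classes
bias = [0.0, 0.5]

logits = linear_head([2.0, 1.0, 3.0], weight, bias)
print(logits)  # [2.0, 4.5]
```

A representation of any other length is rejected, which mirrors the requirement that head_dim equal the output feature dimension of base_architecture.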

base_task[source]

A base snorkel.classification.Task that the model will learn. This becomes a master_head_module that combines information from the slice tasks. For more, see snorkel.slicing.convert_to_slice_tasks.
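To make the relationship between base_task and the slice tasks more concrete, the sketch below lists the task names one might expect in a sliced model. It assumes a naming convention of {task_name}_slice:{slice_name}_ind for slice indicator tasks and {task_name}_slice:{slice_name}_pred for slice predictor tasks, and that a "base" slice covering every example is added automatically. Both are assumptions to verify against snorkel.slicing.convert_to_slice_tasks in your installed version, and the list order here is illustrative only. The helper expected_task_names is hypothetical.

```python
from typing import List


def expected_task_names(task_name: str, slice_names: List[str]) -> List[str]:
    """Return the task names assumed to exist in a sliced model.

    Assumptions: one indicator ("_ind") and one predictor ("_pred") task per
    slice, an implicit "base" slice, and the original base task itself.
    """
    all_slices = list(slice_names)
    if "base" not in all_slices:
        all_slices.append("base")  # assumed: a slice that contains every example
    names = [task_name]  # the base task, whose head combines the slice heads
    for slice_name in all_slices:
        names.append(f"{task_name}_slice:{slice_name}_ind")
        names.append(f"{task_name}_slice:{slice_name}_pred")
    return names


names = expected_task_names("task", ["short_text", "has_link"])
print(len(names))  # 7
for name in names:
    print(name)
```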

slice_names[source]

The list of slice names the model was initialized with; see the slice_names parameter above.

__init__(base_architecture, head_dim, slice_names, input_data_key='input_data', task_name='task', scorer=<snorkel.analysis.scorer.Scorer object>, **multitask_kwargs)[source]

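Builds base_task from base_architecture and an nn.Linear(head_dim, 2) prediction head, converts it into slice tasks (see snorkel.slicing.convert_to_slice_tasks), and initializes the MultitaskClassifier superclass with those tasks.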

Return type

None

Methods

__init__(base_architecture, head_dim, …[, …])

Builds base_task and its slice tasks, then initializes the MultitaskClassifier superclass.

add_module(name, module)

Adds a child module to the current module.

add_task(task)

Add a single task to the network.

apply(fn)

Applies fn recursively to every submodule (as returned by .children()) as well as self.

bfloat16()

Casts all floating point parameters and buffers to bfloat16 datatype.

buffers([recurse])

Returns an iterator over module buffers.

calculate_loss(X_dict, Y_dict)

Calculate the loss for each task and the number of data points contributing.

children()

Returns an iterator over immediate children modules.

cpu()

Moves all model parameters and buffers to the CPU.

cuda([device])

Moves all model parameters and buffers to the GPU.

double()

Casts all floating point parameters and buffers to double datatype.

eval()

Sets the module in evaluation mode.

extra_repr()

Set the extra representation of the module.

float()

Casts all floating point parameters and buffers to float datatype.

forward(X_dict, task_names)

Do a forward pass through the network for all specified tasks.

half()

Casts all floating point parameters and buffers to half datatype.

load(model_path)

Load a saved model from the provided file path and move it to a device.

load_state_dict(state_dict[, strict])

Copies parameters and buffers from state_dict into this module and its descendants.

make_slice_dataloader(dataset, S, …)

Create DictDataLoader with slice labels, initialized from specified dataset.

modules()

Returns an iterator over all modules in the network.

named_buffers([prefix, recurse])

Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

named_children()

Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

named_modules([memo, prefix])

Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

named_parameters([prefix, recurse])

Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

parameters([recurse])

Returns an iterator over module parameters.

predict(dataloader[, return_preds, remap_labels])

Calculate probabilities, (optionally) predictions, and pull out gold labels.

register_backward_hook(hook)

Registers a backward hook on the module.

register_buffer(name, tensor[, persistent])

Adds a buffer to the module.

register_forward_hook(hook)

Registers a forward hook on the module.

register_forward_pre_hook(hook)

Registers a forward pre-hook on the module.

register_full_backward_hook(hook)

Registers a backward hook on the module.

register_parameter(name, param)

Adds a parameter to the module.

requires_grad_([requires_grad])

Change if autograd should record operations on parameters in this module.

save(model_path)

Save the model to the specified file path.

score(dataloaders[, remap_labels, as_dataframe])

Calculate scores for the provided DictDataLoaders.

score_slices(dataloaders[, as_dataframe])

Scores appropriate slice labels using the overall prediction head.

share_memory()

Moves the underlying storage to shared memory.

state_dict([destination, prefix, keep_vars])

Returns a dictionary containing a whole state of the module.

to(*args, **kwargs)

Moves and/or casts the parameters and buffers.

train([mode])

Sets the module in training mode.

type(dst_type)

Casts all parameters and buffers to dst_type.

xpu([device])

Moves all model parameters and buffers to the XPU.

zero_grad([set_to_none])

Sets gradients of all model parameters to zero.

Attributes

T_destination

dump_patches

add_task(task)[source]

Add a single task to the network.

Parameters

task (Task) – A Task to add

Return type

None

calculate_loss(X_dict, Y_dict)[source]

Calculate the loss for each task and the number of data points contributing.

Parameters
  • X_dict (Dict[str, Any]) – A dict of data fields

  • Y_dict (Dict[str, Tensor]) – A dict from task names to label sets

Returns

A tuple of two dicts: the loss for each task name, and the number of examples seen for each task name

Return type

Tuple[Dict[str, torch.Tensor], Dict[str, float]]
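Because calculate_loss returns two parallel dicts, a caller usually reduces them to fewer numbers. The sketch below uses plain floats in place of tensors (an assumption for illustration) and shows two reasonable reductions: summing the per-task losses into one training objective, and a count-weighted mean for logging. Which reduction a particular trainer applies is not specified on this page; the helper aggregate_losses and the task names are hypothetical.

```python
from typing import Dict, Tuple


def aggregate_losses(
    loss_dict: Dict[str, float], count_dict: Dict[str, float]
) -> Tuple[float, float]:
    """Return (summed loss, count-weighted mean loss) across tasks."""
    if not loss_dict:
        # No task received labeled examples in this batch
        return 0.0, 0.0
    total = sum(loss_dict.values())
    seen = sum(count_dict[name] for name in loss_dict)
    weighted = (
        sum(loss_dict[name] * count_dict[name] for name in loss_dict) / seen
        if seen
        else 0.0
    )
    return total, weighted


# Hypothetical per-task outputs for one batch (floats stand in for tensors)
losses = {"task": 0.5, "task_slice:short_text_pred": 1.0}
counts = {"task": 6.0, "task_slice:short_text_pred": 2.0}
print(aggregate_losses(losses, counts))  # (1.5, 0.625)
```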

forward(X_dict, task_names)[source]

Do a forward pass through the network for all specified tasks.

Parameters
  • X_dict (Dict[str, Any]) – A dict of data fields

  • task_names (Iterable[str]) – The names of the tasks to execute the forward pass for

Returns

A dict mapping each operation name to its corresponding output

Return type

OutputDict

Raises
  • TypeError – If an Operation input has an invalid type

  • ValueError – If a specified Operation failed to execute

load(model_path)[source]

Load a saved model from the provided file path and move it to a device.

Parameters

model_path (str) – The path to a saved model

Return type

None

make_slice_dataloader(dataset, S, **dataloader_kwargs)[source]

Create DictDataLoader with slice labels, initialized from specified dataset.

Parameters
  • dataset (DictDataset) – A DictDataset that will be converted into a slice-aware dataloader

  • S (recarray) – A [num_examples, num_slices] slice matrix indicating whether each example is in every slice

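  • slice_names – Not part of this method’s signature; slices are identified by the columns (field names) of S, which should match the slice_names the model was initialized with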

  • dataloader_kwargs (Any) – Arbitrary kwargs to be passed to DictDataLoader. See DictDataLoader.__init__.

Return type

DictDataLoader
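The sketch below illustrates, in pure Python, what a slice matrix S looks like and how per-slice labels could be derived from it. It assumes that (a) S is indexed by slice name, as the fields of a NumPy recarray are, (b) indicator labels are the 0/1 slice memberships, and (c) predictor labels copy the base label for in-slice examples and use -1 to mark examples to ignore. These conventions and the label-name format are assumptions to check against your installed Snorkel version. The slicing functions short_text and has_link and the helpers build_slice_matrix and slice_labels are hypothetical.

```python
from typing import Callable, Dict, List


# Hypothetical slicing functions: each returns 1 if the example is in the slice
def short_text(x: str) -> int:
    return int(len(x.split()) < 5)


def has_link(x: str) -> int:
    return int("http" in x)


def build_slice_matrix(
    examples: List[str], sfs: Dict[str, Callable[[str], int]]
) -> Dict[str, List[int]]:
    """Return S as {slice_name: [membership per example]} (a stand-in for a recarray)."""
    return {name: [sf(x) for x in examples] for name, sf in sfs.items()}


def slice_labels(
    task_name: str, base_labels: List[int], S: Dict[str, List[int]]
) -> Dict[str, List[int]]:
    """Derive assumed indicator ("_ind") and predictor ("_pred") labels per slice."""
    Y_dict = {task_name: list(base_labels)}
    for slice_name, membership in S.items():
        Y_dict[f"{task_name}_slice:{slice_name}_ind"] = list(membership)
        # Out-of-slice examples get -1 so the slice's predictor head ignores them
        Y_dict[f"{task_name}_slice:{slice_name}_pred"] = [
            y if in_slice else -1 for y, in_slice in zip(base_labels, membership)
        ]
    return Y_dict


examples = [
    "check out http://example.com now",
    "this is a much longer example sentence here",
    "hi",
]
S = build_slice_matrix(examples, {"short_text": short_text, "has_link": has_link})
Y = slice_labels("task", [1, 0, 1], S)
print(S)  # {'short_text': [1, 0, 1], 'has_link': [1, 0, 0]}
print(Y["task_slice:short_text_pred"])  # [1, -1, 1]
```

In Snorkel itself, S would typically come from applying slicing functions with an applier such as PandasSFApplier, which returns a NumPy recarray rather than a dict.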

predict(dataloader, return_preds=False, remap_labels={})[source]

Calculate probabilities, (optionally) predictions, and pull out gold labels.

Parameters
  • dataloader (DictDataLoader) – A DictDataLoader to make predictions for

  • return_preds (bool) – If True, include predictions in the return dict (not just probabilities)

  • remap_labels (Dict[str, Optional[str]]) – A dict specifying which labels in the dataset’s Y_dict (key) to remap to a new task (value)

Returns

A dictionary mapping each result type (‘golds’, ‘probs’, and ‘preds’ if return_preds is True) to a dict of values

Return type

Dict[str, Dict[str, torch.Tensor]]
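The remap_labels argument (accepted by both predict and score) is easiest to read as a label-to-task lookup. The sketch below is a pure-Python model of that lookup under stated assumptions: a label listed in remap_labels is evaluated with the mapped task, or skipped when mapped to None (as the Optional[str] value type suggests); any other label is evaluated by the task of the same name if the model has one, and ignored otherwise. The helper resolve_label_tasks and all label names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


def resolve_label_tasks(
    label_names: Iterable[str],
    model_tasks: Set[str],
    remap_labels: Dict[str, Optional[str]],
) -> Dict[str, str]:
    """Map each label in a dataset's Y_dict to the task that should score it."""
    labels_to_tasks: Dict[str, str] = {}
    for label in label_names:
        if label in remap_labels:
            task = remap_labels[label]
            if task is not None:  # None means: do not evaluate this label
                labels_to_tasks[label] = task
        elif label in model_tasks:
            labels_to_tasks[label] = label  # default: label and task share a name
    return labels_to_tasks


Y_keys = ["task", "task_slice:short_text_ind", "task_slice:short_text_pred", "other"]
tasks = {"task", "task_slice:short_text_ind", "task_slice:short_text_pred"}
mapping = resolve_label_tasks(
    Y_keys,
    tasks,
    remap_labels={"task_slice:short_text_pred": "task", "task_slice:short_text_ind": None},
)
print(mapping)  # {'task': 'task', 'task_slice:short_text_pred': 'task'}
```

Here the label "other" has no matching task and is silently dropped, while the indicator label is excluded explicitly through its None mapping.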

save(model_path)[source]

Save the model to the specified file path.

Parameters

model_path (str) – The path where the model should be saved

Raises

BaseException – If the torch.save() method fails

Return type

None

score(dataloaders, remap_labels={}, as_dataframe=False)[source]

Calculate scores for the provided DictDataLoaders.

Parameters
  • dataloaders (List[DictDataLoader]) – A list of DictDataLoaders to calculate scores for

  • remap_labels (Dict[str, Optional[str]]) – A dict specifying which labels in the dataset’s Y_dict (key) to remap to a new task (value)

  • as_dataframe (bool) – A boolean indicating whether to return results as pandas DataFrame (True) or dict (False)

Returns

A dictionary mapping metric names to corresponding scores (or a pandas DataFrame if as_dataframe is True). Metric names will be of the form “task/dataset/split/metric”.

Return type

Union[Dict[str, float], pandas.DataFrame]
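Because metric names are slash-delimited, the dict returned by score can be regrouped with ordinary string handling. The sketch below splits each key into its four fields and groups scores by split. It assumes no field itself contains a "/", and the scores and the dataset name my_data are made-up placeholders, not real results. The helper by_split is hypothetical.

```python
from collections import defaultdict
from typing import Dict


def by_split(scores: Dict[str, float]) -> Dict[str, Dict[str, float]]:
    """Group 'task/dataset/split/metric' keys by split, keyed by 'task/metric'."""
    grouped: Dict[str, Dict[str, float]] = defaultdict(dict)
    for key, value in scores.items():
        task, _dataset, split, metric = key.split("/")  # assumes exactly four fields
        grouped[split][f"{task}/{metric}"] = value
    return dict(grouped)


# Placeholder values for illustration only
scores = {
    "task/my_data/valid/accuracy": 0.9,
    "task/my_data/test/accuracy": 0.8,
    "task/my_data/test/f1": 0.75,
}
print(by_split(scores)["test"])  # {'task/accuracy': 0.8, 'task/f1': 0.75}
```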

score_slices(dataloaders, as_dataframe=False)[source]

Scores appropriate slice labels using the overall prediction head.

In other words, uses base_task (NOT slice_tasks) to evaluate slices.

In practice, we’d like to use a final prediction from a _single_ task head. To do so, self.base_task makes its prediction from a reweighted combination of the slice representations. In this method, we therefore remap all slice-specific pred labels to self.base_task for evaluation.

Parameters
  • dataloaders (List[DictDataLoader]) – A list of DictDataLoaders to calculate scores for

  • as_dataframe (bool) – A boolean indicating whether to return results as pandas DataFrame (True) or dict (False)

  • eval_slices_on_base_task – Not part of this method’s signature; as described above, slice labels are always remapped to the base task for evaluation.

Returns

A dictionary mapping metric names to corresponding scores (or a pandas DataFrame if as_dataframe is True). Metric names will be of the form “task/dataset/split/metric”.

Return type

Union[Dict[str, float], pandas.DataFrame]
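The remapping that score_slices describes can be sketched as building a remap_labels dict and handing it to score. The sketch assumes, as a convention to verify against the Snorkel source, that slice predictor labels end in _pred and slice indicator labels end in _ind: predictor labels are redirected to the base task, and indicator labels are mapped to None so they are not scored. The helper slice_eval_mapping and the label names are hypothetical.

```python
from typing import Dict, Iterable, Optional


def slice_eval_mapping(
    label_names: Iterable[str], base_task_name: str
) -> Dict[str, Optional[str]]:
    """Build a remap_labels dict that scores every slice with the base task's head."""
    mapping: Dict[str, Optional[str]] = {}
    for label in label_names:
        if label.endswith("_pred"):
            mapping[label] = base_task_name  # evaluate slice members with the base head
        elif label.endswith("_ind"):
            mapping[label] = None  # skip slice-membership (indicator) labels
    return mapping


labels = ["task", "task_slice:short_text_ind", "task_slice:short_text_pred"]
print(slice_eval_mapping(labels, "task"))
# {'task_slice:short_text_ind': None, 'task_slice:short_text_pred': 'task'}
```

Labels left out of the mapping, such as the base task’s own label, keep the default behavior of score.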