snorkel.slicing.SliceAwareClassifier¶
-
class
snorkel.slicing.SliceAwareClassifier(base_architecture, head_dim, slice_names, input_data_key='input_data', task_name='task', scorer=<snorkel.analysis.scorer.Scorer object>, **multitask_kwargs)[source]¶ Bases:
snorkel.classification.multitask_classifier.MultitaskClassifierA slice-aware classifier that supports training + scoring on slice labels.
NOTE: This model currently only supports binary classification.
- Parameters
base_architecture (
Module) – A network architecture that accepts input data and outputs a representationhead_dim (
int) – Output feature dimension of the base_architecture, and input dimension of the internal prediction head:nn.Linear(head_dim, 2).slice_names (
List[str]) – A list of slice names that the model will accept initialize as tasks and accept as corresponding labelsscorer (
Scorer) – A Scorer to be used for initialization of theMultitaskClassifiersuperclass.**multitask_kwargs – Arbitrary keyword arguments to be passed to the
MultitaskClassifiersuperclass.
-
base_task[source]¶ A base
snorkel.classification.Taskthat the model will learn. This becomes amaster_head_modulethat combines slice tasks information. For more, seesnorkel.slicing.convert_to_slice_tasks.
-
__init__(base_architecture, head_dim, slice_names, input_data_key='input_data', task_name='task', scorer=<snorkel.analysis.scorer.Scorer object>, **multitask_kwargs)[source]¶ Initializes internal Module state, shared by both nn.Module and ScriptModule.
- Return type
None
Methods
__init__(base_architecture, head_dim, …[, …])Initializes internal Module state, shared by both nn.Module and ScriptModule.
add_module(name, module)Adds a child module to the current module.
add_task(task)Add a single task to the network.
apply(fn)Applies
fnrecursively to every submodule (as returned by.children()) as well as self.bfloat16()Casts all floating point parameters and buffers to
bfloat16datatype.buffers([recurse])Returns an iterator over module buffers.
calculate_loss(X_dict, Y_dict)Calculate the loss for each task and the number of data points contributing.
children()Returns an iterator over immediate children modules.
cpu()Moves all model parameters and buffers to the CPU.
cuda([device])Moves all model parameters and buffers to the GPU.
double()Casts all floating point parameters and buffers to
doubledatatype.eval()Sets the module in evaluation mode.
extra_repr()Set the extra representation of the module
float()Casts all floating point parameters and buffers to float datatype.
forward(X_dict, task_names)Do a forward pass through the network for all specified tasks.
half()Casts all floating point parameters and buffers to
halfdatatype.load(model_path)Load a saved model from the provided file path and moves it to a device.
load_state_dict(state_dict[, strict])Copies parameters and buffers from
state_dictinto this module and its descendants.make_slice_dataloader(dataset, S, …)Create DictDataLoader with slice labels, initialized from specified dataset.
modules()Returns an iterator over all modules in the network.
named_buffers([prefix, recurse])Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
named_children()Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
named_modules([memo, prefix])Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
named_parameters([prefix, recurse])Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
parameters([recurse])Returns an iterator over module parameters.
predict(dataloader[, return_preds, remap_labels])Calculate probabilities, (optionally) predictions, and pull out gold labels.
register_backward_hook(hook)Registers a backward hook on the module.
register_buffer(name, tensor[, persistent])Adds a buffer to the module.
register_forward_hook(hook)Registers a forward hook on the module.
register_forward_pre_hook(hook)Registers a forward pre-hook on the module.
register_full_backward_hook(hook)Registers a backward hook on the module.
register_parameter(name, param)Adds a parameter to the module.
requires_grad_([requires_grad])Change if autograd should record operations on parameters in this module.
save(model_path)Save the model to the specified file path.
score(dataloaders[, remap_labels, as_dataframe])Calculate scores for the provided DictDataLoaders.
score_slices(dataloaders[, as_dataframe])Scores appropriate slice labels using the overall prediction head.
share_memory()- rtype
~T
state_dict([destination, prefix, keep_vars])Returns a dictionary containing a whole state of the module.
to(*args, **kwargs)Moves and/or casts the parameters and buffers.
train([mode])Sets the module in training mode.
type(dst_type)Casts all parameters and buffers to
dst_type.xpu([device])Moves all model parameters and buffers to the XPU.
zero_grad([set_to_none])Sets gradients of all model parameters to zero.
Attributes
T_destinationdump_patches-
add_task(task)[source]¶ Add a single task to the network.
- Parameters
task (
Task) – ATaskto add- Return type
None
-
calculate_loss(X_dict, Y_dict)[source]¶ Calculate the loss for each task and the number of data points contributing.
- Parameters
X_dict (
Dict[str,Any]) – A dict of data fieldsY_dict (
Dict[str,Tensor]) – A dict from task names to label sets
- Returns
A dict of losses by task name and seen examples by task name
- Return type
Dict[str, torch.Tensor], Dict[str, float]
-
forward(X_dict, task_names)[source]¶ Do a forward pass through the network for all specified tasks.
- Parameters
X_dict (
Dict[str,Any]) – A dict of data fieldstask_names (
Iterable[str]) – The names of the tasks to execute the forward pass for
- Returns
A dict mapping each operation name to its corresponding output
- Return type
OutputDict
- Raises
TypeError – If an Operation input has an invalid type
ValueError – If a specified Operation failed to execute
-
load(model_path)[source]¶ Load a saved model from the provided file path and moves it to a device.
- Parameters
model_path (
str) – The path to a saved model- Return type
None
-
make_slice_dataloader(dataset, S, **dataloader_kwargs)[source]¶ Create DictDataLoader with slice labels, initialized from specified dataset.
- Parameters
dataset (
DictDataset) – A DictDataset that will be converted into a slice-aware dataloaderS (
recarray) – A [num_examples, num_slices] slice matrix indicating whether each example is in every sliceslice_names – A list of slice names corresponding to columns of
Sdataloader_kwargs (
Any) – Arbitrary kwargs to be passed to DictDataLoader SeeDictDataLoader.__init__.
- Return type
DictDataLoader
-
predict(dataloader, return_preds=False, remap_labels={})[source]¶ Calculate probabilities, (optionally) predictions, and pull out gold labels.
- Parameters
dataloader (
DictDataLoader) – A DictDataLoader to make predictions forreturn_preds (
bool) – If True, include predictions in the return dict (not just probabilities)remap_labels (
Dict[str,Optional[str]]) – A dict specifying which labels in the dataset’s Y_dict (key) to remap to a new task (value)
- Returns
A dictionary mapping label type (‘golds’, ‘probs’, ‘preds’) to values
- Return type
Dict[str, Dict[str, torch.Tensor]]
-
save(model_path)[source]¶ Save the model to the specified file path.
- Parameters
model_path (
str) – The path where the model should be saved- Raises
BaseException – If the torch.save() method fails
- Return type
None
-
score(dataloaders, remap_labels={}, as_dataframe=False)[source]¶ Calculate scores for the provided DictDataLoaders.
- Parameters
dataloaders (
List[DictDataLoader]) – A list of DictDataLoaders to calculate scores forremap_labels (
Dict[str,Optional[str]]) – A dict specifying which labels in the dataset’s Y_dict (key) to remap to a new task (value)as_dataframe (
bool) – A boolean indicating whether to return results as pandas DataFrame (True) or dict (False)
- Returns
A dictionary mapping metric names to corresponding scores Metric names will be of the form “task/dataset/split/metric”
- Return type
Dict[str, float]
-
score_slices(dataloaders, as_dataframe=False)[source]¶ Scores appropriate slice labels using the overall prediction head.
In other words, uses
base_task(NOTslice_tasks) to evaluate slices.In practice, we’d like to use a final prediction from a _single_ task head. To do so,
self.base_taskleverages reweighted slice representation to make a prediction. In this method, we remap all slice-specificpredlabels toself.base_taskfor evaluation.- Parameters
dataloaders (
List[DictDataLoader]) – A list of DictDataLoaders to calculate scores foras_dataframe (
bool) – A boolean indicating whether to return results as pandas DataFrame (True) or dict (False)eval_slices_on_base_task – A boolean indicating whether to remap slice labels to base task. Otherwise, keeps evaluation of slice labels on slice-specific heads.
- Returns
A dictionary mapping metric¡ names to corresponding scores Metric names will be of the form “task/dataset/split/metric”
- Return type
Dict[str, float]