snorkel.slicing.SliceAwareClassifier
-
class snorkel.slicing.SliceAwareClassifier(base_architecture, head_dim, slice_names, input_data_key='input_data', task_name='task', scorer=<snorkel.analysis.scorer.Scorer object>, **multitask_kwargs)
Bases: snorkel.classification.multitask_classifier.MultitaskClassifier
A slice-aware classifier that supports training + scoring on slice labels.
NOTE: This model currently only supports binary classification.
- Parameters
base_architecture (Module) – A network architecture that accepts input data and outputs a representation.
head_dim (int) – Output feature dimension of the base_architecture, and input dimension of the internal prediction head: nn.Linear(head_dim, 2).
slice_names (List[str]) – A list of slice names that the model will initialize as tasks and accept as corresponding labels.
scorer (Scorer) – A Scorer used to initialize the MultitaskClassifier superclass.
**multitask_kwargs – Arbitrary keyword arguments passed to the MultitaskClassifier superclass.
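The head_dim contract (and the binary-only restriction) can be illustrated with a minimal pure-Python sketch. It assumes the base architecture emits a feature vector of length head_dim that feeds a two-output linear layer, mirroring nn.Linear(head_dim, 2), followed by a softmax over the two classes. The function binary_head and the weights are hypothetical; the real model uses PyTorch modules.

```python
import math
from typing import List, Sequence


def binary_head(
    features: Sequence[float],
    weight: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Hypothetical stand-in for nn.Linear(head_dim, 2) followed by softmax.

    features: output of the base architecture, length head_dim
    weight:   2 rows of length head_dim (one row per class)
    bias:     2 entries (one per class)
    """
    head_dim = len(features)
    if len(weight) != 2 or any(len(row) != head_dim for row in weight):
        raise ValueError("weight must have shape [2, head_dim]")
    if len(bias) != 2:
        raise ValueError("bias must have 2 entries")
    # Linear layer: one logit per class
    logits = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weight, bias)
    ]
    # Numerically stable softmax over the two logits
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Usage: head_dim = 3, so each weight row has 3 entries
probs = binary_head([1.0, 0.0, -1.0], [[0.5, 0.0, 0.0], [0.0, 0.0, 0.5]], [0.0, 0.0])
print(probs)  # two class probabilities that sum to 1
```

A mismatch between the base architecture's output width and head_dim surfaces here as a ValueError; in PyTorch it would typically surface as a shape error inside the linear layer.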
-
base_task
A base snorkel.classification.Task that the model will learn. This becomes a master_head_module that combines slice-task information. For more, see snorkel.slicing.convert_to_slice_tasks.
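To make the slice-task conversion more concrete, here is a minimal sketch of how one base task might expand into slice-specific tasks. It assumes, for illustration only, that each slice (plus an implicit "base" slice covering all examples) gets an indicator task and a predictor task named with a "<task>_slice:<slice>_ind" / "_pred" pattern, and that the base task itself acts as the master head. The helper slice_task_names is hypothetical; snorkel.slicing.convert_to_slice_tasks is the authoritative behavior.

```python
from typing import List


def slice_task_names(task_name: str, slice_names: List[str]) -> List[str]:
    """Hypothetical helper: list the task names a base task might expand into.

    Assumed convention: one indicator ("_ind") and one predictor ("_pred")
    task per slice, an implicit "base" slice, and the base task as master head.
    """
    names = list(slice_names)
    if "base" not in names:
        names.append("base")  # assumed catch-all slice
    tasks = []
    for s in names:
        tasks.append(f"{task_name}_slice:{s}_ind")
        tasks.append(f"{task_name}_slice:{s}_pred")
    tasks.append(task_name)  # master head combining the slice heads
    return tasks


print(slice_task_names("task", ["short_text"]))
```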
-
__init__(base_architecture, head_dim, slice_names, input_data_key='input_data', task_name='task', scorer=<snorkel.analysis.scorer.Scorer object>, **multitask_kwargs)
Initializes internal Module state, shared by both nn.Module and ScriptModule. (This docstring is inherited from torch.nn.Module; the constructor parameters are described under the class above.)
- Return type
None
Methods
__init__(base_architecture, head_dim, …[, …]) – Initializes internal Module state, shared by both nn.Module and ScriptModule.
add_module(name, module) – Adds a child module to the current module.
add_task(task) – Add a single task to the network.
apply(fn) – Applies fn recursively to every submodule (as returned by .children()) as well as self.
bfloat16() – Casts all floating point parameters and buffers to bfloat16 datatype.
buffers([recurse]) – Returns an iterator over module buffers.
calculate_loss(X_dict, Y_dict) – Calculate the loss for each task and the number of data points contributing.
children() – Returns an iterator over immediate children modules.
cpu() – Moves all model parameters and buffers to the CPU.
cuda([device]) – Moves all model parameters and buffers to the GPU.
double() – Casts all floating point parameters and buffers to double datatype.
eval() – Sets the module in evaluation mode.
extra_repr() – Set the extra representation of the module.
float() – Casts all floating point parameters and buffers to float datatype.
forward(X_dict, task_names) – Do a forward pass through the network for all specified tasks.
half() – Casts all floating point parameters and buffers to half datatype.
load(model_path) – Load a saved model from the provided file path and move it to a device.
load_state_dict(state_dict[, strict]) – Copies parameters and buffers from state_dict into this module and its descendants.
make_slice_dataloader(dataset, S, …) – Create a DictDataLoader with slice labels, initialized from the specified dataset.
modules() – Returns an iterator over all modules in the network.
named_buffers([prefix, recurse]) – Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
named_children() – Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
named_modules([memo, prefix]) – Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
named_parameters([prefix, recurse]) – Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
parameters([recurse]) – Returns an iterator over module parameters.
predict(dataloader[, return_preds, remap_labels]) – Calculate probabilities, (optionally) predictions, and pull out gold labels.
register_backward_hook(hook) – Registers a backward hook on the module.
register_buffer(name, tensor[, persistent]) – Adds a buffer to the module.
register_forward_hook(hook) – Registers a forward hook on the module.
register_forward_pre_hook(hook) – Registers a forward pre-hook on the module.
register_full_backward_hook(hook) – Registers a backward hook on the module.
register_parameter(name, param) – Adds a parameter to the module.
requires_grad_([requires_grad]) – Change if autograd should record operations on parameters in this module.
save(model_path) – Save the model to the specified file path.
score(dataloaders[, remap_labels, as_dataframe]) – Calculate scores for the provided DictDataLoaders.
score_slices(dataloaders[, as_dataframe]) – Scores appropriate slice labels using the overall prediction head.
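share_memory() – Moves the underlying storage of the module's parameters and buffers to shared memory (see torch.Tensor.share_memory_) and returns the module itself.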
state_dict([destination, prefix, keep_vars]) – Returns a dictionary containing a whole state of the module.
to(*args, **kwargs) – Moves and/or casts the parameters and buffers.
train([mode]) – Sets the module in training mode.
type(dst_type) – Casts all parameters and buffers to dst_type.
xpu([device]) – Moves all model parameters and buffers to the XPU.
zero_grad([set_to_none]) – Sets gradients of all model parameters to zero.
Attributes
T_destination
dump_patches
-
add_task(task)
Add a single task to the network.
- Parameters
task (Task) – A Task to add.
- Return type
None
-
calculate_loss(X_dict, Y_dict)
Calculate the loss for each task and the number of data points contributing.
- Parameters
X_dict (Dict[str, Any]) – A dict of data fields.
Y_dict (Dict[str, Tensor]) – A dict from task names to label sets.
- Returns
A dict of losses by task name and a dict of seen examples by task name.
- Return type
Dict[str, torch.Tensor], Dict[str, float]
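Since calculate_loss returns two dicts keyed by task name (per-task losses and the number of contributing examples), a caller has to decide how to summarize them. The pure-Python sketch below, with floats standing in for tensors, shows one reasonable choice: a count-weighted average. This is an illustrative assumption, not necessarily how snorkel's training loop combines task losses, and weighted_mean_loss is hypothetical.

```python
from typing import Dict


def weighted_mean_loss(
    loss_dict: Dict[str, float], count_dict: Dict[str, float]
) -> float:
    """Hypothetical helper: average per-task losses, weighted by example count.

    Tasks with zero contributing examples are skipped.
    """
    active = [t for t in loss_dict if count_dict.get(t, 0) > 0]
    total_count = sum(count_dict[t] for t in active)
    if total_count == 0:
        return 0.0
    return sum(loss_dict[t] * count_dict[t] for t in active) / total_count


losses = {"task_a": 0.6, "task_b": 0.4}
counts = {"task_a": 8.0, "task_b": 2.0}
print(weighted_mean_loss(losses, counts))  # (0.6*8 + 0.4*2) / 10
```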
-
forward(X_dict, task_names)
Do a forward pass through the network for all specified tasks.
- Parameters
X_dict (Dict[str, Any]) – A dict of data fields.
task_names (Iterable[str]) – The names of the tasks to execute the forward pass for.
- Returns
A dict mapping each operation name to its corresponding output.
- Return type
OutputDict
- Raises
TypeError – If an Operation input has an invalid type.
ValueError – If a specified Operation failed to execute.
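Conceptually, forward runs a sequence of operations and records each operation's output under the operation's name, which is what the returned OutputDict holds. The pure-Python sketch below mimics that bookkeeping with plain functions in place of modules. The Op tuple, run_ops, and the "_input_" key for reading fields from X_dict are assumptions made for illustration; the error types mirror the Raises section above.

```python
from typing import Any, Callable, Dict, List, NamedTuple, Tuple, Union

# An input is either the name of an earlier op, or (op_name, field) to index into its output
InputSpec = Union[str, Tuple[str, str]]


class Op(NamedTuple):
    """Hypothetical operation: apply `fn` to the outputs named in `inputs`."""

    name: str
    fn: Callable[..., Any]
    inputs: List[InputSpec]


def run_ops(X_dict: Dict[str, Any], ops: List[Op]) -> Dict[str, Any]:
    """Execute ops in order, storing each result under its op name."""
    outputs: Dict[str, Any] = {"_input_": X_dict}  # raw data exposed as "_input_"
    for op in ops:
        args = []
        for spec in op.inputs:
            if isinstance(spec, str):
                args.append(outputs[spec])
            elif isinstance(spec, tuple) and len(spec) == 2:
                src, field = spec
                args.append(outputs[src][field])
            else:
                raise TypeError(f"Invalid input spec for {op.name}: {spec!r}")
        try:
            outputs[op.name] = op.fn(*args)
        except Exception as e:
            raise ValueError(f"Operation {op.name} failed to execute") from e
    return outputs


ops = [
    Op("input_op", lambda xs: [2 * x for x in xs], [("_input_", "input_data")]),
    Op("head_op", lambda h: sum(h), ["input_op"]),
]
out = run_ops({"input_data": [1, 2, 3]}, ops)
print(out["head_op"])
```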
-
load(model_path)
Load a saved model from the provided file path and move it to a device.
- Parameters
model_path (str) – The path to a saved model.
- Return type
None
-
make_slice_dataloader(dataset, S, **dataloader_kwargs)
Create a DictDataLoader with slice labels, initialized from the specified dataset.
- Parameters
dataset (DictDataset) – A DictDataset that will be converted into a slice-aware dataloader.
S (recarray) – A [num_examples, num_slices] slice matrix indicating whether each example is in each slice.
slice_names – A list of slice names corresponding to the columns of S. Note that slice_names is not a separate argument in this signature; the slice names are the column (field) names of the recarray S.
dataloader_kwargs (Any) – Arbitrary kwargs to be passed to DictDataLoader. See DictDataLoader.__init__.
- Return type
DictDataLoader
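A slice-aware dataloader carries extra labels derived from S. The pure-Python sketch below shows one plausible derivation, assuming each slice contributes an indicator label (1 if the example is in the slice, else 0) and a predictor label that copies the base label for in-slice examples and masks the rest with -1. The label-key format, the -1 mask value, and make_slice_labels are assumptions for illustration; S is modeled as a plain dict of columns rather than a NumPy recarray.

```python
from typing import Dict, List


def make_slice_labels(
    task_name: str, Y: List[int], S: Dict[str, List[int]]
) -> Dict[str, List[int]]:
    """Hypothetical: derive per-slice indicator and predictor labels.

    Y: base-task gold labels, one per example
    S: slice name -> membership column (1 = in slice, 0 = not), one entry per example
    """
    Y_dict: Dict[str, List[int]] = {task_name: list(Y)}
    for slice_name, membership in S.items():
        if len(membership) != len(Y):
            raise ValueError(
                f"Slice {slice_name!r} has {len(membership)} rows, expected {len(Y)}"
            )
        # Indicator labels: is the example in this slice?
        Y_dict[f"{task_name}_slice:{slice_name}_ind"] = [int(m) for m in membership]
        # Predictor labels: keep the gold label only for slice members, mask others with -1
        Y_dict[f"{task_name}_slice:{slice_name}_pred"] = [
            y if m else -1 for y, m in zip(Y, membership)
        ]
    return Y_dict


labels = make_slice_labels("task", [0, 1, 1], {"short_text": [1, 0, 1]})
print(labels["task_slice:short_text_pred"])  # [0, -1, 1]
```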
-
predict(dataloader, return_preds=False, remap_labels={})
Calculate probabilities, (optionally) predictions, and pull out gold labels.
- Parameters
dataloader (DictDataLoader) – A DictDataLoader to make predictions for.
return_preds (bool) – If True, include predictions in the return dict (not just probabilities).
remap_labels (Dict[str, Optional[str]]) – A dict specifying which labels in the dataset’s Y_dict (key) to remap to a new task (value).
- Returns
A dictionary mapping label type (‘golds’, ‘probs’, ‘preds’) to values.
- Return type
Dict[str, Dict[str, torch.Tensor]]
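The return value of predict is nested: first by label type ('golds', 'probs', 'preds'), then by task name. For a classification task, predictions can be read off the probabilities by taking the most likely class per example. The pure-Python sketch below does that with lists standing in for tensors; preds_from_probs and the sample values are hypothetical, and this is not a claim about how snorkel computes 'preds' internally.

```python
from typing import Dict, List


def preds_from_probs(probs: Dict[str, List[List[float]]]) -> Dict[str, List[int]]:
    """Hypothetical: per task, pick the index of the largest class probability."""
    return {
        task: [max(range(len(row)), key=row.__getitem__) for row in rows]
        for task, rows in probs.items()
    }


results = {
    "golds": {"task": [1, 0]},
    "probs": {"task": [[0.2, 0.8], [0.9, 0.1]]},
}
results["preds"] = preds_from_probs(results["probs"])
print(results["preds"])  # {'task': [1, 0]}
```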
-
save(model_path)
Save the model to the specified file path.
- Parameters
model_path (str) – The path where the model should be saved.
- Raises
BaseException – If the torch.save() method fails.
- Return type
None
-
score(dataloaders, remap_labels={}, as_dataframe=False)
Calculate scores for the provided DictDataLoaders.
- Parameters
dataloaders (List[DictDataLoader]) – A list of DictDataLoaders to calculate scores for.
remap_labels (Dict[str, Optional[str]]) – A dict specifying which labels in the dataset’s Y_dict (key) to remap to a new task (value).
as_dataframe (bool) – A boolean indicating whether to return results as a pandas DataFrame (True) or a dict (False).
- Returns
A dictionary mapping metric names to corresponding scores (or a pandas DataFrame if as_dataframe=True). Metric names will be of the form “task/dataset/split/metric”.
- Return type
Dict[str, float] (a pandas DataFrame if as_dataframe=True)
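Because metric names follow the “task/dataset/split/metric” pattern, the flat dict returned by score with as_dataframe=False can be regrouped without pandas. A minimal sketch, with made-up metric values and a hypothetical helper name:

```python
from typing import Dict, List, Tuple


def split_metric_names(
    scores: Dict[str, float]
) -> List[Tuple[str, str, str, str, float]]:
    """Hypothetical: turn {"task/dataset/split/metric": value} into sorted row tuples."""
    rows = []
    for name, value in scores.items():
        parts = name.split("/")
        if len(parts) != 4:
            raise ValueError(f"Unexpected metric name: {name!r}")
        task, dataset, split, metric = parts
        rows.append((task, dataset, split, metric, value))
    return sorted(rows)


scores = {"task/my_data/valid/accuracy": 0.9, "task/my_data/valid/f1": 0.8}
for row in split_metric_names(scores):
    print(row)
```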
-
score_slices(dataloaders, as_dataframe=False)
Scores appropriate slice labels using the overall prediction head.
In other words, uses base_task (NOT slice_tasks) to evaluate slices.
In practice, we’d like to use a final prediction from a single task head. To do so, self.base_task leverages the reweighted slice representation to make a prediction. In this method, we remap all slice-specific pred labels to self.base_task for evaluation.
- Parameters
dataloaders (List[DictDataLoader]) – A list of DictDataLoaders to calculate scores for.
as_dataframe (bool) – A boolean indicating whether to return results as a pandas DataFrame (True) or a dict (False).
eval_slices_on_base_task – Listed in the docstring but not accepted by this signature; this method always remaps slice labels to the base task, as described above.
- Returns
A dictionary mapping metric names to corresponding scores (or a pandas DataFrame if as_dataframe=True). Metric names will be of the form “task/dataset/split/metric”.
- Return type
Dict[str, float] (a pandas DataFrame if as_dataframe=True)
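The remapping described for score_slices can be sketched in pure Python. This assumes, for illustration, that slice labels are distinguished by "pred" and "ind" substrings in their names: every "pred" label is remapped to the base task, so the base task's head is scored on that slice's examples, and every "ind" label is mapped to None, which this sketch assumes excludes it from evaluation. The result has the shape of the remap_labels argument accepted by score; build_slice_remap and the label names are hypothetical.

```python
from typing import Dict, Iterable, Optional


def build_slice_remap(
    base_task_name: str, label_names: Iterable[str]
) -> Dict[str, Optional[str]]:
    """Hypothetical: map slice 'pred' labels to the base task and drop 'ind' labels."""
    remap: Dict[str, Optional[str]] = {}
    for label in label_names:
        if "pred" in label:
            remap[label] = base_task_name  # score the base head on this slice
        elif "ind" in label:
            remap[label] = None  # indicator labels are assumed to be skipped
        # other labels (e.g. the base task's own) are left unmapped
    return remap


labels = ["task", "task_slice:base_ind", "task_slice:base_pred"]
print(build_slice_remap("task", labels))
```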