snorkel.classification.DictDataset
class snorkel.classification.DictDataset(name, split, X_dict, Y_dict)
Bases: torch.utils.data.dataset.Dataset
A dataset where both the data fields and the labels are stored as dictionaries.
- Parameters
  name (str) – The name of the dataset (e.g., this will be used to report metrics on a per-dataset basis)
  split (str) – The name of the split that the data in this object represents
  X_dict (Dict[str, Any]) – A map from field name to values (e.g., {"tokens": …, "uids": …})
  Y_dict (Dict[str, Tensor]) – A map from task name to its corresponding set of labels
- Raises
  ValueError – If any value in Y_dict is not of type torch.Tensor
__init__(name, split, X_dict, Y_dict)
Initialize self. See help(type(self)) for accurate signature.
- Return type
  None
Methods
__init__(name, split, X_dict, Y_dict) – Initialize self.
from_tensors(X_tensor, Y_tensor, split[, …]) – Initialize a DictDataset from PyTorch Tensors.
classmethod from_tensors(X_tensor, Y_tensor, split, input_data_key='input_data', task_name='task', dataset_name='SnorkelDataset')
Initialize a DictDataset from PyTorch Tensors.
- Parameters
  X_tensor (Tensor) – Input data of shape [num_examples, feature_dim]
  Y_tensor (Tensor) – Labels of shape [num_samples, num_classes]
  split (str) – Name of data split corresponding to this dataset.
  input_data_key (str) – Name of data field to initialize in X_dict
  task_name (str) – Name of task and corresponding label key in Y_dict
  dataset_name (str) – Name of DictDataset to be initialized; see __init__ above.
- Returns
  A DictDataset initialized with a single task whose labels correspond to the input data
- Return type