snorkel.labeling.PandasLFApplier

class snorkel.labeling.PandasLFApplier(lfs)

Bases: snorkel.labeling.apply.core.BaseLFApplier

LF applier for a Pandas DataFrame.

Each data point is a row of the DataFrame and is passed to the LFs as a pandas Series. The LFs are executed via a pandas.DataFrame.apply call, which runs in a single process and can be slow for large DataFrames. For large datasets, consider DaskLFApplier or SparkLFApplier instead.
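To make the row-wise execution model concrete, here is a minimal sketch that mimics it with plain pandas and NumPy. It is not Snorkel's implementation: apply_lfs_rowwise and mentions_hi are hypothetical names, plain functions stand in for @labeling_function-decorated LFs, and -1 is assumed to be the abstain label.

```python
from typing import Callable, Sequence

import numpy as np
import pandas as pd

ABSTAIN = -1  # assumed abstain label


# Plain functions standing in for LFs: each takes one row (a Series)
# and returns an integer label.
def is_big_num(x: pd.Series) -> int:
    return 1 if x.num > 42 else 0


def mentions_hi(x: pd.Series) -> int:  # hypothetical second LF
    return 1 if "hi" in x["text"] else ABSTAIN


def apply_lfs_rowwise(
    df: pd.DataFrame, lfs: Sequence[Callable[[pd.Series], int]]
) -> np.ndarray:
    """Hypothetical helper: run every LF on every row, one row at a time.

    DataFrame.apply with axis=1 calls the lambda once per row, passing the
    row as a Series. result_type="reduce" keeps each per-row list as a single
    value instead of expanding it into columns.
    """
    rows = df.apply(lambda row: [lf(row) for lf in lfs], axis=1, result_type="reduce")
    # Stack the per-row label lists into an (n_rows, n_lfs) integer matrix.
    return np.array(rows.tolist(), dtype=int)


df = pd.DataFrame(dict(num=[10, 100], text=["hello", "hi"]))
L = apply_lfs_rowwise(df, [is_big_num, mentions_hi])
print(L)
```

Because every row goes through a Python-level function call in one process, the cost grows linearly with the number of rows times the number of LFs, which is why the distributed appliers are suggested for large datasets.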

Parameters

lfs (List[LabelingFunction]) – LFs that this applier executes on each data point

Example

>>> import pandas as pd
>>> from snorkel.labeling import PandasLFApplier, labeling_function
>>> @labeling_function()
... def is_big_num(x):
...     return 1 if x.num > 42 else 0
>>> applier = PandasLFApplier([is_big_num])
>>> applier.apply(pd.DataFrame(dict(num=[10, 100], text=["hello", "hi"])))
array([[0],
       [1]])

__init__(lfs)

Initialize the applier with the list of LFs to execute.

Return type

None

Methods

__init__(lfs)

Initialize the applier.

apply(df[, progress_bar, fault_tolerant, …])

Label Pandas DataFrame of data points with LFs.

apply(df, progress_bar=True, fault_tolerant=False, return_meta=False)

Label Pandas DataFrame of data points with LFs.

Parameters
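  • df (DataFrame) – Pandas DataFrame containing the data points to be labeled by the LFs

  • progress_bar (bool) – Whether to display a progress bar

  • fault_tolerant (bool) – If True, output -1 for any LF that raises an exception on a data point instead of propagating the exception

  • return_meta (bool) – If True, also return an ApplierMetadata object describing the apply call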
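The fault_tolerant flag can be illustrated with a small stdlib-only sketch of the behavior described above: an LF that raises yields -1 for that data point, and faults are counted per LF, loosely mirroring the fault counts reported in ApplierMetadata. run_lfs_fault_tolerant and ratio_lf are hypothetical names, rows are plain dicts rather than pandas Series, and Snorkel's internal bookkeeping may differ.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

ABSTAIN = -1  # label emitted when an LF fails


def run_lfs_fault_tolerant(
    rows: Sequence[dict],
    lfs: List[Callable[[dict], int]],
) -> Tuple[List[List[int]], Dict[str, int]]:
    """Hypothetical sketch of fault_tolerant=True semantics.

    Any exception raised by an LF is caught, the label for that
    (row, LF) pair is set to -1, and a per-LF fault count is recorded.
    """
    faults = Counter()
    matrix = []
    for row in rows:
        labels = []
        for lf in lfs:
            try:
                labels.append(lf(row))
            except Exception:
                labels.append(ABSTAIN)
                faults[lf.__name__] += 1
        matrix.append(labels)
    return matrix, dict(faults)


def ratio_lf(x: dict) -> int:
    # Raises ZeroDivisionError when den == 0.
    return 1 if x["num"] / x["den"] > 1 else 0


rows = [{"num": 3, "den": 2}, {"num": 1, "den": 0}]
matrix, faults = run_lfs_fault_tolerant(rows, [ratio_lf])
```

With fault_tolerant=False, the applier would instead let the ZeroDivisionError propagate and abort the apply call.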

Return type

Union[ndarray, Tuple[ndarray, ApplierMetadata]]

Returns

  • np.ndarray – Label matrix emitted by the LFs, with one row per data point and one column per LF

  • ApplierMetadata – Metadata for the apply call, such as fault counts; returned only if return_meta is True
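The returned label matrix can be consumed with ordinary NumPy operations. The sketch below assumes that row i corresponds to the i-th row of df, column j to the j-th LF passed to the applier, and -1 to an abstain; the matrix values are made up, and computing per-LF coverage this way is an illustration rather than a Snorkel API.

```python
import numpy as np

ABSTAIN = -1  # assumed abstain label

# Made-up label matrix of the shape apply() returns:
# one row per data point, one column per LF.
L = np.array([
    [0, -1],
    [1, 1],
    [1, -1],
])

n_points, n_lfs = L.shape
# Fraction of data points on which each LF emitted a non-abstain label.
coverage = (L != ABSTAIN).mean(axis=0)
```

When return_meta=True, the call returns a tuple, so unpack it first (L, metadata = applier.apply(df, return_meta=True)) before indexing into L.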