snorkel.labeling.apply.dask.PandasParallelLFApplier

class snorkel.labeling.apply.dask.PandasParallelLFApplier(lfs)

Bases: snorkel.labeling.apply.dask.DaskLFApplier

Parallel LF applier for a Pandas DataFrame.

Creates a Dask DataFrame from a Pandas DataFrame, then uses DaskLFApplier to label data in parallel. See DaskLFApplier.
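The partition-then-label idea described above can be illustrated without Dask. The following is a minimal sketch using only the standard library (a thread pool stands in for the Dask scheduler), not Snorkel's implementation. The helper names (split_into_partitions, label_partition, apply_in_parallel) and the two labeling functions are hypothetical, and rows are plain dicts rather than DataFrame rows.

```python
from concurrent.futures import ThreadPoolExecutor

ABSTAIN = -1  # label an LF emits when it does not vote

# Hypothetical labeling functions: plain callables mapping a row to an int label.
def lf_has_free(row):
    return 1 if "free" in row["text"].lower() else ABSTAIN

def lf_is_short(row):
    return 0 if len(row["text"]) < 15 else ABSTAIN

def split_into_partitions(rows, n_parallel):
    """Split rows into n_parallel contiguous chunks of near-equal size."""
    size, extra = divmod(len(rows), n_parallel)
    partitions, start = [], 0
    for i in range(n_parallel):
        end = start + size + (1 if i < extra else 0)
        partitions.append(rows[start:end])
        start = end
    return partitions

def label_partition(partition, lfs):
    """Apply every LF to every row of one partition."""
    return [[lf(row) for lf in lfs] for row in partition]

def apply_in_parallel(rows, lfs, n_parallel=2):
    """Label each partition on its own worker, then rejoin results in row order."""
    partitions = split_into_partitions(rows, n_parallel)
    with ThreadPoolExecutor(max_workers=n_parallel) as pool:
        # pool.map preserves input order, so rows stay aligned with labels.
        results = pool.map(label_partition, partitions, [lfs] * n_parallel)
        return [labels for part in results for labels in part]

rows = [
    {"text": "Free tickets, click now"},
    {"text": "See you at 5"},
    {"text": "Quarterly report attached for review"},
]
label_matrix = apply_in_parallel(rows, [lf_has_free, lf_is_short], n_parallel=2)
print(label_matrix)  # [[1, -1], [-1, 0], [-1, -1]]
```

The result has one row per data point and one column per LF, which mirrors the shape of the label matrix that apply returns.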

__init__(lfs)

Initialize the applier with the labeling functions (lfs) to apply.

Return type

None

Methods

__init__(lfs)

Initialize self.

apply(df[, n_parallel, scheduler, …])

Label Pandas DataFrame of data points with LFs in parallel using Dask.

apply(df, n_parallel=2, scheduler='processes', fault_tolerant=False)

Label Pandas DataFrame of data points with LFs in parallel using Dask.

Parameters
  • df (DataFrame) – Pandas DataFrame containing data points to be labeled by LFs

  • n_parallel (int) – Parallelism level for LF application. Corresponds to npartitions in the constructed Dask DataFrame. With scheduler="processes", this is also the number of processes launched. It should be no more than the number of cores on the running machine.

  • scheduler (Union[str, dask.distributed.Client]) – A Dask scheduling configuration: either a string option or a Client. For more information, see https://docs.dask.org/en/stable/scheduling.html#

  • fault_tolerant (bool) – If True, output -1 for a data point when LF execution fails instead of raising the error

Returns

Matrix of labels emitted by LFs

Return type

np.ndarray
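To make the fault_tolerant behavior concrete, here is a minimal standard-library sketch of the idea: wrap each LF so that a failure yields -1 rather than an exception. It is an illustration under stated assumptions, not Snorkel's code; make_caller and lf_long_text are hypothetical names, and rows are plain dicts.

```python
ABSTAIN = -1  # value emitted in place of a label when an LF fails

def make_caller(lf, fault_tolerant=False):
    """Wrap an LF; when fault_tolerant is True, map any exception to ABSTAIN."""
    def call(row):
        if not fault_tolerant:
            return lf(row)  # errors propagate to the caller
        try:
            return lf(row)
        except Exception:
            return ABSTAIN
    return call

# Hypothetical LF that raises TypeError when the text field is None.
def lf_long_text(row):
    return 1 if len(row["text"]) > 20 else ABSTAIN

rows = [{"text": "a reasonably long example sentence"}, {"text": None}]

tolerant = make_caller(lf_long_text, fault_tolerant=True)
labels = [tolerant(row) for row in rows]
print(labels)  # [1, -1]

strict = make_caller(lf_long_text, fault_tolerant=False)
try:
    strict(rows[1])
    strict_failed = False
except TypeError:
    strict_failed = True  # without fault tolerance the error surfaces
```

One consequence of this design is that a failed LF call is indistinguishable from an abstain in the output, so fault tolerance is best suited to runs where occasional bad rows are expected.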