snorkel.slicing.apply.dask.PandasParallelSFApplier

class snorkel.slicing.apply.dask.PandasParallelSFApplier(lfs)[source]

Bases: snorkel.labeling.apply.dask.PandasParallelLFApplier

Parallel SF applier for a Pandas DataFrame.

See snorkel.labeling.apply.dask.PandasParallelLFApplier for details.

__init__(lfs)[source]

Initialize the applier with lfs, the functions it applies to each data point (slicing functions for this class).

Return type

None

Methods

__init__(lfs)

Initialize the applier with the functions to apply.

apply(df[, n_parallel, scheduler])

Label Pandas DataFrame of data points with LFs in parallel using Dask.

apply(df, n_parallel=2, scheduler='processes')[source]

Label Pandas DataFrame of data points with LFs in parallel using Dask.

Parameters
  • df (DataFrame) – Pandas DataFrame containing data points to be labeled by LFs

  • n_parallel (int) – Parallelism level for LF application. Corresponds to npartitions in the constructed Dask DataFrame. With scheduler="processes", this is also the number of processes launched. It should be no more than the number of cores on the running machine.

  • scheduler (Union[str, dask.distributed.Client]) – A Dask scheduling configuration: either a string option or a Client. For more information, see https://docs.dask.org/en/stable/scheduling.html

Returns

Matrix of labels emitted by LFs

Return type

np.ndarray
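
The partition-and-apply idea behind n_parallel can be illustrated with a small standard-library sketch. This is not Snorkel's implementation and does not use Dask: it assumes plain dict rows in place of a DataFrame, threads in place of Dask workers, and two hypothetical slicing functions (short_text, has_question) invented for illustration. It only shows how rows might be split into n_parallel partitions, processed independently, and reassembled into a matrix with one row per data point and one column per function.

```python
from concurrent.futures import ThreadPoolExecutor


def short_text(row):
    """Hypothetical slicing function: 1 if the text is under 10 characters."""
    return int(len(row["text"]) < 10)


def has_question(row):
    """Hypothetical slicing function: 1 if the text contains a question mark."""
    return int("?" in row["text"])


def split_rows(rows, n_parallel):
    """Split rows into at most n_parallel contiguous partitions, preserving order."""
    size, extra = divmod(len(rows), n_parallel)
    partitions, start = [], 0
    for i in range(n_parallel):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty partitions when n_parallel > len(rows)
            partitions.append(rows[start:end])
        start = end
    return partitions


def apply_partition(partition, sfs):
    """Apply every function to every row of one partition."""
    return [[sf(row) for sf in sfs] for row in partition]


def parallel_apply(rows, sfs, n_parallel=2):
    """Return a matrix with one row per data point and one column per function."""
    partitions = split_rows(rows, n_parallel)
    with ThreadPoolExecutor(max_workers=n_parallel) as pool:
        # map() yields results in partition order, so row order is preserved.
        results = pool.map(lambda part: apply_partition(part, sfs), partitions)
        return [labels for part in results for labels in part]


rows = [
    {"text": "hi"},
    {"text": "is this a long question?"},
    {"text": "a fairly long statement"},
]
matrix = parallel_apply(rows, [short_text, has_question], n_parallel=2)
print(matrix)  # [[1, 0], [0, 1], [0, 0]]
```

In this sketch, raising n_parallel beyond the number of rows simply produces fewer, single-row partitions; with the real applier, the recommendation above (no more than the number of cores) still applies.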