snorkel.slicing.apply.dask.PandasParallelSFApplier
class snorkel.slicing.apply.dask.PandasParallelSFApplier(lfs)
Bases: snorkel.labeling.apply.dask.PandasParallelLFApplier
Parallel SF applier for a Pandas DataFrame.
See snorkel.labeling.apply.dask.PandasParallelLFApplier for details.
__init__(lfs)
Initialize self. See help(type(self)) for accurate signature.
Return type
None
Methods
__init__(lfs): Initialize self.
apply(df[, n_parallel, scheduler, …]): Label Pandas DataFrame of data points with LFs in parallel using Dask.
apply(df, n_parallel=2, scheduler='processes', fault_tolerant=False)
Label Pandas DataFrame of data points with LFs in parallel using Dask.
Parameters
df (DataFrame) – Pandas DataFrame containing data points to be labeled by LFs.
n_parallel (int) – Parallelism level for LF application. Corresponds to npartitions in the constructed Dask DataFrame. For scheduler="processes", this is the number of processes launched. Recommended to be no more than the number of cores on the running machine.
scheduler (Union[str, dask.distributed.Client]) – A Dask scheduling configuration: either a string option or a Client. For more information, see https://docs.dask.org/en/stable/scheduling.html
fault_tolerant (bool) – Whether to output -1 when LF execution fails.
Returns
Matrix of labels emitted by LFs
Return type
np.ndarray
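The interaction between n_parallel and fault_tolerant can be illustrated with a small conceptual sketch. This is not Snorkel's implementation and does not use Dask: it relies only on the Python standard library, uses threads rather than processes, and every name in it (split_rows, apply_chunk, apply_parallel, is_short, has_question) is hypothetical. It only aims to show the idea of splitting the data into n_parallel partitions, applying each function to each row, and recording -1 when a call fails and fault tolerance is enabled.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Sequence

Row = Dict[str, Any]
Func = Callable[[Row], Any]


def split_rows(rows: Sequence[Row], n_parallel: int) -> List[List[Row]]:
    """Split rows into at most n_parallel contiguous, order-preserving chunks."""
    size = max(1, -(-len(rows) // n_parallel))  # ceiling division
    return [list(rows[i:i + size]) for i in range(0, len(rows), size)]


def apply_chunk(
    rows: List[Row], funcs: Sequence[Func], fault_tolerant: bool
) -> List[List[int]]:
    """Run every function on every row of one chunk."""
    out = []
    for row in rows:
        labels = []
        for func in funcs:
            try:
                labels.append(int(func(row)))
            except Exception:
                if not fault_tolerant:
                    raise
                labels.append(-1)  # sentinel for a failed call
        out.append(labels)
    return out


def apply_parallel(
    rows: Sequence[Row],
    funcs: Sequence[Func],
    n_parallel: int = 2,
    fault_tolerant: bool = False,
) -> List[List[int]]:
    """Apply funcs to rows chunk by chunk and return one row of labels per input row."""
    chunks = split_rows(rows, n_parallel)
    with ThreadPoolExecutor(max_workers=n_parallel) as pool:
        # pool.map yields results in chunk order, so row order is preserved.
        results = pool.map(lambda c: apply_chunk(c, funcs, fault_tolerant), chunks)
        return [labels for chunk in results for labels in chunk]


# Hypothetical slice-style predicates over a "text" field.
def is_short(row: Row) -> bool:
    return len(row["text"]) < 10


def has_question(row: Row) -> bool:
    return "?" in row["text"]


rows = [
    {"text": "hi there"},
    {"text": "is this a long question?"},
    {"text": None},  # both predicates raise TypeError on this row
]
matrix = apply_parallel(rows, [is_short, has_question], n_parallel=2, fault_tolerant=True)
print(matrix)
```

In this sketch, calling apply_parallel with fault_tolerant=False on the same rows propagates the TypeError from the malformed row instead of recording -1.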