snorkel.labeling.apply.dask.PandasParallelLFApplier
class snorkel.labeling.apply.dask.PandasParallelLFApplier(lfs)
Bases: snorkel.labeling.apply.dask.DaskLFApplier
Parallel LF applier for a Pandas DataFrame.
Creates a Dask DataFrame from a Pandas DataFrame, then uses DaskLFApplier to label data in parallel. See DaskLFApplier.
__init__(lfs)
Initialize self. See help(type(self)) for accurate signature.
- Return type
None
Methods
__init__(lfs) – Initialize self.
apply(df[, n_parallel, scheduler, …]) – Label Pandas DataFrame of data points with LFs in parallel using Dask.
apply(df, n_parallel=2, scheduler='processes', fault_tolerant=False)
Label Pandas DataFrame of data points with LFs in parallel using Dask.
- Parameters
df (DataFrame) – Pandas DataFrame containing data points to be labeled by LFs
n_parallel (int) – Parallelism level for LF application. Corresponds to npartitions in the constructed Dask DataFrame. For scheduler="processes", this is the number of processes launched. Recommended to be no more than the number of cores on the running machine.
scheduler (Union[str, dask.distributed.Client]) – A Dask scheduling configuration: either a string option or a Client. For more information, see https://docs.dask.org/en/stable/scheduling.html#
fault_tolerant (bool) – Whether to output -1 if LF execution fails
- Returns
Matrix of labels emitted by LFs
- Return type
np.ndarray
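The roles of n_parallel and fault_tolerant can be pictured with a small standard-library sketch. This is not Snorkel's or Dask's implementation: it uses plain dicts instead of DataFrame rows, threads instead of a Dask scheduler, and nested lists instead of an np.ndarray. The helper names (`apply_lfs_to_partition`, `parallel_apply`) and the two toy LFs are hypothetical. It only illustrates the idea of splitting the data into n_parallel partitions, labeling each partition independently, and writing -1 when an LF raises and failures are tolerated.

```python
from concurrent.futures import ThreadPoolExecutor

ABSTAIN = -1  # label recorded when an LF abstains or, if tolerated, fails


def apply_lfs_to_partition(rows, lfs, fault_tolerant):
    """Return one list of labels (one per LF) for each row in the partition."""
    labels = []
    for row in rows:
        row_labels = []
        for lf in lfs:
            try:
                row_labels.append(lf(row))
            except Exception:
                if not fault_tolerant:
                    raise
                row_labels.append(ABSTAIN)
        labels.append(row_labels)
    return labels


def parallel_apply(rows, lfs, n_parallel=2, fault_tolerant=False):
    """Split rows into up to n_parallel contiguous chunks and label them concurrently."""
    chunk = max(1, -(-len(rows) // n_parallel))  # ceiling division
    partitions = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
    with ThreadPoolExecutor(max_workers=n_parallel) as pool:
        results = pool.map(
            lambda part: apply_lfs_to_partition(part, lfs, fault_tolerant),
            partitions,
        )
    # Concatenate per-partition results in the original row order.
    return [row_labels for part in results for row_labels in part]


def lf_has_free(row):
    return 1 if "free" in row["text"] else ABSTAIN


def lf_long(row):
    # Raises KeyError when the "length" field is missing.
    return 0 if row["length"] > 10 else ABSTAIN


rows = [
    {"text": "free offer", "length": 2},
    {"text": "see you tomorrow"},  # no "length": lf_long fails on this row
    {"text": "a long free message", "length": 12},
]
lfs = [lf_has_free, lf_long]

L = parallel_apply(rows, lfs, n_parallel=2, fault_tolerant=True)
print(L)
```

In this sketch, calling `parallel_apply` with `fault_tolerant=False` on the same rows propagates the `KeyError` from `lf_long` instead of recording -1.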