snorkel.augmentation.PandasTFApplier

class snorkel.augmentation.PandasTFApplier(tfs, policy)

Bases: snorkel.augmentation.apply.core.BaseTFApplier

TF applier for a Pandas DataFrame.

Data points are stored as Series in a DataFrame. The TFs run on data points obtained via a pandas.DataFrame.iterrows call, which is single-process and can be slow for large DataFrames. For large datasets, consider DaskTFApplier or SparkTFApplier.
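Each data point reaches the TFs as one row Series produced by iterrows, so a TF is essentially a function from one Series to a transformed Series. The pandas-only sketch below illustrates that shape. It assumes a hypothetical text column and a hypothetical swap_case TF written as a plain function (in Snorkel itself, TFs are usually defined with the transformation_function decorator). Returning None to mean "this TF does not apply here" is a convention assumed for this sketch.

```python
from typing import Optional

import pandas as pd


def swap_case(x: pd.Series) -> Optional[pd.Series]:
    """Hypothetical TF: return a transformed copy of one data point, or None."""
    text = x["text"]
    if not isinstance(text, str) or not text:
        return None  # assumed convention: the TF does not apply to this data point
    x = x.copy()  # work on a copy so the input row is left untouched
    x["text"] = text.swapcase()
    return x


df = pd.DataFrame({"text": ["Hello World", ""], "label": [1, 0]})

# iterrows() yields one (index, Series) pair at a time in a single process,
# which is why row-at-a-time application gets slow on large DataFrames.
for _, row in df.iterrows():
    print(swap_case(row))
```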

__init__(tfs, policy)

Initialize the applier with a sequence of transformation functions (TFs) and a policy that determines which TFs are applied to each data point.

Return type

None

Methods

__init__(tfs, policy)

Initialize the applier with TFs and a policy.

apply(df[, progress_bar])

Augment a Pandas DataFrame of data points using TFs and policy.

apply_generator(df, batch_size)

Augment a Pandas DataFrame of data points using TFs and policy in batches.

apply(df, progress_bar=True)

Augment a Pandas DataFrame of data points using TFs and policy.

Parameters
  • df (DataFrame) – Pandas DataFrame containing data points to be transformed

  • progress_bar (bool) – Whether to display a progress bar (default True)

Returns

Pandas DataFrame of data points in the augmented data set

Return type

pd.DataFrame
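apply collects every augmented data point into a single in-memory DataFrame. The pandas-only sketch below approximates that loop under a simple assumed policy: apply each TF once, on its own, to every row and keep the original row (similar in spirit to Snorkel's ApplyEachPolicy, but not Snorkel's code). The names upper, add_bang, and apply_each are hypothetical, and the progress bar is omitted.

```python
from typing import Callable, List, Optional

import pandas as pd

# A TF maps one row (Series) to a transformed row, or None if it does not apply.
TF = Callable[[pd.Series], Optional[pd.Series]]


def upper(x: pd.Series) -> Optional[pd.Series]:
    """Hypothetical TF: upper-case the text column."""
    x = x.copy()
    x["text"] = x["text"].upper()
    return x


def add_bang(x: pd.Series) -> Optional[pd.Series]:
    """Hypothetical TF: append an exclamation mark to the text column."""
    x = x.copy()
    x["text"] = x["text"] + "!"
    return x


def apply_each(df: pd.DataFrame, tfs: List[TF], keep_original: bool = True) -> pd.DataFrame:
    """Hypothetical apply() pass: each TF is applied once, on its own, to every row."""
    rows: List[pd.Series] = []
    for _, x in df.iterrows():
        if keep_original:
            rows.append(x)
        for tf in tfs:
            x_t = tf(x)
            if x_t is not None:  # skip TFs that declined this data point
                rows.append(x_t)
    # Stack the row Series into one DataFrame and renumber the index,
    # since original and augmented rows share the same source index label.
    return pd.DataFrame(rows).reset_index(drop=True)


df = pd.DataFrame({"text": ["hi", "ok"], "label": [1, 0]})
print(apply_each(df, [upper, add_bang]))
```

The whole augmented set is materialized before returning, which is convenient for small data but scales memory with len(df) times the number of augmentations per row.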

apply_generator(df, batch_size)

Augment a Pandas DataFrame of data points using TFs and policy in batches.

This method acts as a generator, yielding augmented data points for a given input batch of data points. This can be useful in a training loop when it is too memory-intensive to pregenerate all transformed examples.

Parameters
  • df (DataFrame) – Pandas DataFrame containing data points to be transformed

  • batch_size (int) – Batch size for generator. Yields augmented data points for the next batch_size input data points.

Yields

Pandas DataFrame of data points in the augmented data set, one DataFrame per batch

Return type

Iterator[pd.DataFrame]
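The batching behavior can be sketched as a plain Python generator that buffers augmented rows and yields a DataFrame after every batch_size input rows. This is a hypothetical pandas-only illustration, not Snorkel's implementation: apply_each_batched and upper are made-up names, the "apply each TF once and keep the original" policy is assumed, and the ValueError guard on batch_size is an addition of this sketch.

```python
from typing import Callable, Iterator, List, Optional

import pandas as pd

TF = Callable[[pd.Series], Optional[pd.Series]]


def upper(x: pd.Series) -> Optional[pd.Series]:
    """Hypothetical TF: upper-case the text column."""
    x = x.copy()
    x["text"] = x["text"].upper()
    return x


def apply_each_batched(
    df: pd.DataFrame, tfs: List[TF], batch_size: int
) -> Iterator[pd.DataFrame]:
    """Hypothetical generator: yield augmented rows for every batch_size input rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    rows: List[pd.Series] = []
    for i, (_, x) in enumerate(df.iterrows(), start=1):
        rows.append(x)  # keep the original data point
        for tf in tfs:
            x_t = tf(x)
            if x_t is not None:
                rows.append(x_t)
        if i % batch_size == 0:
            # Only this batch's rows are held in memory at once.
            yield pd.DataFrame(rows).reset_index(drop=True)
            rows = []
    if rows:  # final, possibly smaller, batch
        yield pd.DataFrame(rows).reset_index(drop=True)


df = pd.DataFrame({"text": ["a", "b", "c", "d", "e"], "label": [0, 1, 0, 1, 0]})
for batch in apply_each_batched(df, [upper], batch_size=2):
    # A training loop would consume `batch` here instead of printing its size.
    print(len(batch))
```

With five input rows, batch_size=2, and one TF plus the kept original, this sketch yields batches of 4, 4, and 2 rows.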