snorkel.augmentation.PandasTFApplier
class snorkel.augmentation.PandasTFApplier(tfs, policy)
Bases: snorkel.augmentation.apply.core.BaseTFApplier
TF applier for a Pandas DataFrame.
Data points are stored as Series in a DataFrame. The transformation functions (TFs) run on data points obtained via a pandas.DataFrame.iterrows call, which is single-process and can be slow for large DataFrames. For large datasets, consider DaskTFApplier or SparkTFApplier.
__init__(tfs, policy)
Initialize self. See help(type(self)) for accurate signature.
- Return type
None
Methods
__init__(tfs, policy)              Initialize self.
apply(df[, progress_bar])          Augment a Pandas DataFrame of data points using TFs and policy.
apply_generator(df, batch_size)    Augment a Pandas DataFrame of data points using TFs and policy in batches.
apply(df, progress_bar=True)
Augment a Pandas DataFrame of data points using TFs and policy.
- Parameters
df (DataFrame) – Pandas DataFrame containing data points to be transformed
progress_bar (bool) – Whether to display a progress bar
- Returns
Pandas DataFrame of data points in augmented data set
- Return type
pd.DataFrame
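The row-by-row augmentation described above can be pictured with a small standard-library sketch. This is a conceptual illustration only, not Snorkel's implementation and not its API: plain dicts stand in for DataFrame rows, and the names `augment`, `shout`, `exclaim`, and `sequences` are hypothetical. It also assumes that a policy can be viewed as a list of TF index sequences, and that a sequence is dropped when one of its TFs returns None.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical stand-ins: a dict plays the role of one DataFrame row (a Series),
# and a TF returns a new data point, or None when it cannot transform its input.
DataPoint = Dict[str, str]
TF = Callable[[DataPoint], Optional[DataPoint]]


def shout(x: DataPoint) -> Optional[DataPoint]:
    """Upper-case the text field, returning a new data point."""
    return {**x, "text": x["text"].upper()}


def exclaim(x: DataPoint) -> Optional[DataPoint]:
    """Append '!' unless the text already ends with one."""
    if x["text"].endswith("!"):
        return None
    return {**x, "text": x["text"] + "!"}


def augment(
    rows: Sequence[DataPoint],
    tfs: Sequence[TF],
    sequences: Sequence[Sequence[int]],
) -> List[DataPoint]:
    """Apply every TF index sequence to every row, one row at a time."""
    augmented: List[DataPoint] = []
    for row in rows:  # single-process, row-by-row loop
        for seq in sequences:
            current: Optional[DataPoint] = dict(row)  # never mutate the input row
            for idx in seq:
                current = tfs[idx](current)
                if current is None:  # sketch assumption: drop failed sequences
                    break
            if current is not None:
                augmented.append(current)
    return augmented


rows = [{"text": "hello world"}, {"text": "hi!"}]
# In this sketch, an empty sequence keeps the original data point unchanged.
sequences = [[], [0], [0, 1]]
augmented = augment(rows, [shout, exclaim], sequences)
print([x["text"] for x in augmented])
```

Because every row is visited in a single Python loop, the cost grows linearly with the number of rows and the number of sequences per row, which is why a distributed applier may be preferable for large inputs.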
apply_generator(df, batch_size)
Augment a Pandas DataFrame of data points using TFs and policy in batches.
This method acts as a generator, yielding augmented data points for a given input batch of data points. This can be useful in a training loop when it is too memory-intensive to pregenerate all transformed examples.
- Parameters
df (DataFrame) – Pandas DataFrame containing data points to be transformed
batch_size (int) – Batch size for generator. Yields augmented data points for the next batch_size input data points.
- Returns
Pandas DataFrame of augmented data points for the current input batch, yielded one batch at a time
- Return type
pd.DataFrame
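The batching behaviour can be sketched with a plain Python generator. This is a hedged, standard-library illustration of the pattern, not Snorkel's code: lists of dicts stand in for DataFrames, and `augment_in_batches` and `with_scaled_copy` are hypothetical names introduced only for this example.

```python
from typing import Callable, Dict, Iterator, List, Sequence

DataPoint = Dict[str, int]  # hypothetical stand-in for one DataFrame row


def with_scaled_copy(x: DataPoint) -> List[DataPoint]:
    """Hypothetical augmentation: keep the original and add a scaled copy."""
    return [dict(x), {**x, "value": x["value"] * 10}]


def augment_in_batches(
    rows: Sequence[DataPoint],
    batch_size: int,
    augment_one: Callable[[DataPoint], List[DataPoint]],
) -> Iterator[List[DataPoint]]:
    """Yield the augmented data points for each slice of batch_size input rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(rows), batch_size):
        batch_out: List[DataPoint] = []
        for row in rows[start:start + batch_size]:
            batch_out.extend(augment_one(row))
        yield batch_out  # only one augmented batch is held in memory at a time


rows = [{"value": i} for i in range(5)]
batch_sizes = []
for batch in augment_in_batches(rows, 2, with_scaled_copy):
    batch_sizes.append(len(batch))  # a training step would consume `batch` here
print(batch_sizes)
```

Note that the batch size counts input data points, so each yielded batch can hold more items than batch_size once augmented copies are added.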