snorkel.preprocess.Preprocessor

class snorkel.preprocess.Preprocessor(name, field_names=None, mapped_field_names=None, pre=None, memoize=False, memoize_key=None)

Bases: snorkel.map.core.Mapper

Base class for preprocessors.

See snorkel.map.core.Mapper for details.

__init__(name, field_names=None, mapped_field_names=None, pre=None, memoize=False, memoize_key=None)

Initialize self. See help(type(self)) for accurate signature.

Return type

None

Methods

__init__(name[, field_names, …]) – Initialize self.

reset_cache() – Reset the memoization cache.

run(**kwargs) – Run the mapping operation using the input fields.

__call__(x)

Run mapping function on input data point.

Deep-copies the data point first to avoid accidental in-place changes. If memoize is set to True, an internal cache is checked for a previously computed result. If none is found, the result is computed and added to the cache.

Parameters

x (Any) – Data point to run mapping function on

Returns

Mapped data point of same format but possibly different fields

Return type

DataPoint
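
The copy-then-cache behaviour described for __call__ can be illustrated with a small stand-alone sketch. This is not Snorkel's implementation: ToyMapper, its fn argument, the calls counter, and the repr-based cache key are all hypothetical stand-ins, chosen so the example runs with only the standard library.

```python
from copy import deepcopy
from types import SimpleNamespace


class ToyMapper:
    """Hypothetical stand-in for a mapper; not Snorkel's implementation."""

    def __init__(self, fn, memoize=False):
        self.fn = fn            # function that edits and returns a data point
        self.memoize = memoize
        self._cache = {}
        self.calls = 0          # counts how often fn actually runs

    def reset_cache(self):
        # Drop all memoized results.
        self._cache = {}

    def __call__(self, x):
        # Assumption: a sorted repr of the attributes serves as the cache key.
        key = repr(sorted(vars(x).items()))
        if self.memoize and key in self._cache:
            return self._cache[key]
        # Work on a deep copy so the caller's data point is never modified.
        x_mapped = self.fn(deepcopy(x))
        self.calls += 1
        if self.memoize:
            self._cache[key] = x_mapped
        return x_mapped


def add_lower(x):
    # Adds a new field to the (copied) data point.
    x.text_lower = x.text.lower()
    return x


mapper = ToyMapper(add_lower, memoize=True)
point = SimpleNamespace(text="Hello World")
first = mapper(point)
second = mapper(point)  # served from the cache; add_lower is not run again
```

In this sketch the original point never gains a text_lower attribute, and calling reset_cache() forces the next call to recompute the result.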

reset_cache()

Reset the memoization cache.

Return type

None

run(**kwargs)

Run the mapping operation using the input fields.

The arguments to this method are populated by extracting fields from the input data point using the keys of field_names. The output field names returned by the method are renamed using mapped_field_names, and the resulting fields are added to the data point.

Returns

A mapping from canonical output field names to their values.

Return type

Optional[FieldMap]

Raises

NotImplementedError – Subclasses must implement this method
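
The flow around run (extract input fields, call run, rename the outputs, attach them to the data point) can be sketched without Snorkel installed. The helper apply_mapping, the example run function, and the field names below are hypothetical. The direction of the field_names mapping used here (run argument name to data point attribute name) is an assumption and should be checked against snorkel.map.core.Mapper.

```python
from copy import deepcopy
from types import SimpleNamespace


def run(text):
    # Plays the role of a subclass's run(): returns canonical output names.
    return {"lower": text.lower(), "num_words": len(text.split())}


def apply_mapping(x, field_names, mapped_field_names):
    """Hypothetical helper mimicking the documented flow around run()."""
    x_out = deepcopy(x)
    # Assumption: field_names maps run() argument names to attribute names.
    kwargs = {arg: getattr(x_out, attr) for arg, attr in field_names.items()}
    outputs = run(**kwargs)
    if outputs is None:  # run() is documented as returning Optional[FieldMap]
        return None
    for canonical, value in outputs.items():
        # Rename canonical output names; unmapped names are kept unchanged.
        setattr(x_out, mapped_field_names.get(canonical, canonical), value)
    return x_out


doc = SimpleNamespace(body="Snorkel Labels Data")
mapped = apply_mapping(
    doc,
    field_names={"text": "body"},
    mapped_field_names={"lower": "body_lower", "num_words": "n_words"},
)
```

Here run only ever sees the canonical name text, while the data point keeps its own attribute names (body, body_lower, n_words), which is the separation that field_names and mapped_field_names are meant to provide.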