class snorkel.slicing.sf.nlp.NLPSlicingFunction(name, f, resources=None, pre=None, text_field='text', doc_field='doc', language='en_core_web_sm', disable=None, memoize=True)[source]

Bases: snorkel.labeling.lf.nlp.BaseNLPLabelingFunction

Special labeling function type for spaCy-based LFs.

This class is a special version of LabelingFunction. It integrates a SpacyPreprocessor that shares a cache with all other NLPLabelingFunction instances, which makes it easy to define LFs that take a text input field and apply logic to spaCy Doc objects. Each example passed into an NLPLabelingFunction gains a new field containing a spaCy Doc. By default, this field is called doc.

A Doc object is a sequence of Token objects, which carry information such as lemmas and part-of-speech tags. Doc objects also expose fields like Doc.ents, the named entities in the document, and Doc.noun_chunks, its noun phrases. For details of spaCy Doc objects and a full attribute listing, see the spaCy Doc API documentation.

Simple NLPLabelingFunctions can be defined via a decorator. See nlp_labeling_function.
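Because the function f only reads the doc field of its input, its logic can be exercised without loading a spaCy model. The sketch below is an illustration under that assumption: fake_doc is a hypothetical helper that mimics only Doc.ents, and a real spaCy Doc carries many more attributes.

```python
from types import SimpleNamespace


def has_person_mention(x):
    """Return True if the parsed document contains a PERSON entity."""
    return any(ent.label_ == "PERSON" for ent in x.doc.ents)


def fake_doc(ents):
    # Hypothetical stand-in: mimics only the `ents` attribute of a spaCy Doc.
    # Each entity exposes `text` and `label_`, the two attributes used here.
    return SimpleNamespace(
        ents=[SimpleNamespace(text=text, label_=label) for text, label in ents]
    )


# Data points that already carry a `doc` field, as f would see them.
x_person = SimpleNamespace(
    text="Ada Lovelace wrote notes.",
    doc=fake_doc([("Ada Lovelace", "PERSON")]),
)
x_plain = SimpleNamespace(text="The movie was good.", doc=fake_doc([]))

print(has_person_mention(x_person))
print(has_person_mention(x_plain))
```

The entity labels in this sketch are supplied by hand. With a real model, they depend on what the spaCy pipeline predicts for the text.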

Parameters

  • name (str) – Name of the LF

  • f (Callable[…, int]) – Function that implements the core LF logic

  • resources (Optional[Mapping[str, Any]]) – Labeling resources passed in to f via kwargs

  • pre (Optional[List[BaseMapper]]) – Preprocessors to run before SpacyPreprocessor is executed

  • text_field (str) – Name of data point text field to input

  • doc_field (str) – Name of data point field to output parsed document to

  • language (str) – spaCy model to load; see the spaCy models documentation

  • disable (Optional[List[str]]) – List of pipeline components to disable; see the spaCy processing pipelines documentation

  • memoize (bool) – Whether to memoize preprocessor outputs
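The roles of text_field, doc_field, and resources can be pictured with a small stand-alone sketch. It does not use the library: run_sketch is a hypothetical helper, str.split stands in for the spaCy parse, and working on a shallow copy is an assumption of the sketch, not a statement about how the class handles data points.

```python
import copy
from types import SimpleNamespace


def run_sketch(x, f, resources=None, text_field="text", doc_field="doc"):
    """Hypothetical stand-in for how the field and resource arguments are used."""
    x = copy.copy(x)  # assumption: leave the caller's data point untouched
    # Read raw text from `text_field`; write the parsed result to `doc_field`.
    # str.split is a toy replacement for the spaCy model.
    setattr(x, doc_field, getattr(x, text_field).split())
    # Entries of `resources` reach f as keyword arguments.
    return f(x, **(resources or {}))


def mentions_keyword(x, keywords):
    # Reads the field named by doc_field ("parsed" in the call below).
    return any(token.lower() in keywords for token in x.parsed)


x = SimpleNamespace(body="The Movie was good")
result = run_sketch(
    x,
    mentions_keyword,
    resources={"keywords": {"movie"}},
    text_field="body",
    doc_field="parsed",
)
print(result)
```

The function f must read the same field name that doc_field specifies. Here both use the illustrative name parsed.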


Raises

ValueError – Calling incorrectly defined preprocessors


Example

>>> from snorkel.slicing.sf.nlp import NLPSlicingFunction
>>> def f(x):
...     person_ents = [ent for ent in x.doc.ents if ent.label_ == "PERSON"]
...     return len(person_ents) > 0
>>> has_person_mention = NLPSlicingFunction(name="has_person_mention", f=f)
>>> has_person_mention
NLPSlicingFunction has_person_mention, Preprocessors: [SpacyPreprocessor...]
>>> from types import SimpleNamespace
>>> x = SimpleNamespace(text="The movie was good.")
>>> has_person_mention(x)
False

See above

__init__(name, f, resources=None, pre=None, text_field='text', doc_field='doc', language='en_core_web_sm', disable=None, memoize=True)[source]

Initialize self. See help(type(self)) for accurate signature.

Return type



Methods

__init__(name, f[, resources, pre, …]) – Initialize self.


__call__(x)

Label data point.

Runs all preprocessors, then passes preprocessed data point to LF.


Parameters

x (Any) – Data point to label


Returns

Label for data point

Return type
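The call order described above (user-supplied preprocessors first, then the parsing preprocessor, then f) and the effect of memoize can be sketched in plain Python. Everything here is a hypothetical stand-in: CountingParser replaces SpacyPreprocessor, str.split replaces the spaCy model, and keying the cache on the text is a simplification chosen for the sketch.

```python
import copy
from types import SimpleNamespace


class CountingParser:
    """Toy memoized parser; a hypothetical stand-in for SpacyPreprocessor."""

    def __init__(self, memoize=True):
        self.memoize = memoize
        self.cache = {}
        self.calls = 0  # number of times the "expensive" parse actually ran

    def __call__(self, x):
        if self.memoize and x.text in self.cache:
            doc = self.cache[x.text]
        else:
            self.calls += 1
            doc = x.text.split()  # toy replacement for the spaCy parse
            if self.memoize:
                self.cache[x.text] = doc
        out = copy.copy(x)
        out.doc = doc
        return out


def lowercase(x):
    # Example of a preprocessor that would be supplied through `pre`.
    out = copy.copy(x)
    out.text = x.text.lower()
    return out


def call_sketch(x, f, pre, parser):
    for preprocessor in pre:  # 1. preprocessors from `pre`, in order
        x = preprocessor(x)
    x = parser(x)             # 2. parsing step adds the `doc` field
    return f(x)               # 3. f sees the preprocessed data point


def mentions_movie(x):
    return "movie" in x.doc


parser = CountingParser(memoize=True)
x = SimpleNamespace(text="The MOVIE was good")
first = call_sketch(x, mentions_movie, [lowercase], parser)
second = call_sketch(x, mentions_movie, [lowercase], parser)
print(first, second, parser.calls)
```

With memoization on, the second call reuses the cached parse, so the toy parser runs only once for the two calls.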