snorkel.labeling.lf.nlp.NLPLabelingFunction

class snorkel.labeling.lf.nlp.NLPLabelingFunction(name, f, resources=None, pre=None, text_field='text', doc_field='doc', language='en_core_web_sm', disable=None, memoize=True, memoize_key=None, gpu=False)

Bases: snorkel.labeling.lf.nlp.BaseNLPLabelingFunction

Special labeling function type for spaCy-based LFs.

This class is a special version of LabelingFunction with an integrated SpacyPreprocessor, which shares a cache with all other NLPLabelingFunction instances. This makes it easy to define LFs that take a text input field and whose logic is written over spaCy Doc objects. Each data point passed into an NLPLabelingFunction gains a new field containing the parsed spaCy Doc; by default, this field is called doc. A Doc object is a sequence of Token objects, which carry information such as lemmas and part-of-speech tags. Doc objects also expose attributes such as Doc.ents, the named entities found in the text, and Doc.noun_chunks, the base noun phrases. For details of spaCy Doc objects and a full attribute listing, see https://spacy.io/api/doc.
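The LF logic passed as f only reads attributes of the parsed document. The sketch below shows such logic for a hypothetical "plot complaint" heuristic that combines Doc.noun_chunks with token lemmas. To keep it runnable without spaCy or a downloaded model, it uses small stand-in classes (FakeToken, FakeSpan, FakeDoc, all hypothetical) that model only the spaCy attribute names it reads. In a real NLPLabelingFunction, x.doc would be a spaCy Doc; note that in spaCy, noun_chunks relies on the dependency parser, so it is not available if that component is disabled.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Iterator, List

ABSTAIN = -1
NEGATIVE = 0


# Hypothetical stand-ins that expose only the spaCy attribute names read below.
@dataclass
class FakeToken:
    text: str
    lemma_: str


@dataclass
class FakeSpan:
    text: str


@dataclass
class FakeDoc:
    tokens: List[FakeToken]
    noun_chunks: List[FakeSpan] = field(default_factory=list)

    def __iter__(self) -> Iterator[FakeToken]:
        # Like a spaCy Doc, iterating yields tokens.
        return iter(self.tokens)


def complains_about_plot(x) -> int:
    # NEGATIVE if a noun phrase mentions "plot" and some token lemmatizes to "drag".
    plot_np = any("plot" in chunk.text.lower() for chunk in x.doc.noun_chunks)
    drags = any(tok.lemma_ == "drag" for tok in x.doc)
    return NEGATIVE if plot_np and drags else ABSTAIN


doc = FakeDoc(
    tokens=[FakeToken("The", "the"), FakeToken("plot", "plot"), FakeToken("drags", "drag")],
    noun_chunks=[FakeSpan("The plot")],
)
x = SimpleNamespace(text="The plot drags", doc=doc)
print(complains_about_plot(x))  # 0
```

With a real model, the same function would be passed as f=complains_about_plot, and the lemmas and noun chunks would come from the spaCy pipeline instead of being written by hand.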

Simple NLPLabelingFunctions can be defined via a decorator. See nlp_labeling_function.

Parameters
  • name (str) – Name of the LF

  • f (Callable[…, int]) – Function that implements the core LF logic

  • resources (Optional[Mapping[str, Any]]) – Labeling resources passed to f as keyword arguments

  • pre (Optional[List[BaseMapper]]) – Preprocessors to run before the SpacyPreprocessor is executed

  • text_field (str) – Name of the data point field containing the input text

  • doc_field (str) – Name of the data point field to which the parsed Doc is written

  • language (str) – spaCy model to load. See https://spacy.io/usage/models#usage

  • disable (Optional[List[str]]) – List of pipeline components to disable. See https://spacy.io/usage/processing-pipelines#disabling

  • memoize (bool) – Whether to memoize preprocessor outputs

  • memoize_key (Optional[Callable[[Any], Hashable]]) – Hashing function used for memoization (defaults to snorkel.map.core.get_hashable)

  • gpu (bool) – Whether to prefer spaCy GPU processing

Raises

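ValueError – If the preprocessors are incorrectly defined (for example, if this LF's spaCy preprocessor settings conflict with those of the preprocessor already shared by other NLPLabelingFunction instances)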

Example

>>> def f(x):
...     person_ents = [ent for ent in x.doc.ents if ent.label_ == "PERSON"]
...     return 0 if len(person_ents) > 0 else -1
>>> has_person_mention = NLPLabelingFunction(name="has_person_mention", f=f)
>>> has_person_mention
NLPLabelingFunction has_person_mention, Preprocessors: [SpacyPreprocessor...]
>>> from types import SimpleNamespace
>>> x = SimpleNamespace(text="The movie was good.")
>>> has_person_mention(x)
-1
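The doctest exercises only the abstain branch, since "The movie was good." contains no person. To see the other branch without downloading en_core_web_sm, the sketch below calls the same f on data points whose doc field is a hand-built stand-in (hypothetical, not a spaCy Doc) modeling only doc.ents and ent.label_. With the real model, a sentence naming a person would usually, but not necessarily, yield a PERSON entity.

```python
from types import SimpleNamespace


def f(x):
    # Same logic as the doctest: 0 if any PERSON entity is present, else -1 (abstain).
    person_ents = [ent for ent in x.doc.ents if ent.label_ == "PERSON"]
    return 0 if len(person_ents) > 0 else -1


# Hypothetical stand-ins: only the attributes f reads (doc.ents, ent.label_) are modeled.
person = SimpleNamespace(text="Meryl Streep", label_="PERSON")
studio = SimpleNamespace(text="Pixar", label_="ORG")

x_person = SimpleNamespace(text="Meryl Streep was superb.", doc=SimpleNamespace(ents=(person,)))
x_org = SimpleNamespace(text="Pixar made it.", doc=SimpleNamespace(ents=(studio,)))

print(f(x_person))  # 0
print(f(x_org))     # -1
```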
name

Name of the LF (see the name parameter above)

__init__(name, f, resources=None, pre=None, text_field='text', doc_field='doc', language='en_core_web_sm', disable=None, memoize=True, memoize_key=None, gpu=False)

Initialize self. See help(type(self)) for accurate signature.

Return type

None

Methods

__init__(name, f[, resources, pre, …])

Initialize self.

__call__(x)

Label data point.

Runs all preprocessors, then passes preprocessed data point to LF.

Parameters

x (Any) – Data point to label

Returns

Label for data point

Return type

int
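The order described for __call__ (the pre preprocessors first, then the spaCy step, then f) can be sketched in plain Python. Everything here is a hypothetical stand-in rather than Snorkel's implementation: strip_markup plays the role of a user-supplied pre mapper, add_doc stands in for the SpacyPreprocessor by reading text_field and writing a whitespace-token list to doc_field, and label drives the sequence.

```python
import re
from types import SimpleNamespace
from typing import Any, Callable, List, Mapping, Optional

Preprocessor = Callable[[Any], Any]


def strip_markup(x: Any) -> Any:
    # Hypothetical `pre` mapper: remove HTML tags, returning a new data point.
    return SimpleNamespace(**{**vars(x), "text": re.sub(r"<[^>]+>", "", x.text)})


def add_doc(x: Any, text_field: str = "text", doc_field: str = "doc") -> Any:
    # Stand-in for the SpacyPreprocessor: read text_field, write a "parsed" doc_field.
    fields = dict(vars(x))
    fields[doc_field] = getattr(x, text_field).split()
    return SimpleNamespace(**fields)


def label(
    x: Any,
    f: Callable[..., int],
    pre: List[Preprocessor],
    resources: Optional[Mapping[str, Any]] = None,
) -> int:
    # Run user preprocessors first, then the spaCy stand-in, then the LF logic.
    for mapper in [*pre, add_doc]:
        x = mapper(x)
    return f(x, **(resources or {}))


def mentions_tag(x: Any) -> int:
    # Returns 1 if any token still contains markup, else -1 (abstain).
    return 1 if any(tok.startswith("<") for tok in x.doc) else -1


x = SimpleNamespace(text="<b>Great</b> movie")
print(label(x, mentions_tag, pre=[strip_markup]))  # -1
print(label(x, mentions_tag, pre=[]))  # 1
```

Because the pre mappers run before parsing, anything they change (here, the stripped tags) is already reflected in the doc field that f sees.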