NLPSlicingFunction(name, f, resources=None, pre=None, fault_tolerant=False, text_field='text', doc_field='doc', language='en_core_web_sm', disable=None, memoize=True)
Special labeling function type for spaCy-based LFs.
This class is a special version of LabelingFunction. It has an integrated SpacyPreprocessor that shares a cache with all other NLPLabelingFunction instances. This makes it easy to define LFs that take a text input field and have logic written over spaCy Doc objects. Examples passed into an NLPLabelingFunction gain a new field containing a spaCy Doc. By default, this field is called doc. A Doc object is a sequence of Token objects, which carry information on lemmatization, parts of speech, etc. Doc objects also contain fields like Doc.ents, the document's named entities, and Doc.noun_chunks, its noun phrases. For details of spaCy Doc objects and a full attribute listing, see https://spacy.io/api/doc.
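The field-attachment and shared-cache behavior described above can be illustrated with a small standard-library sketch. It is not Snorkel's implementation: ToyDocPreprocessor, its parse_calls counter, and the whitespace "parser" are hypothetical stand-ins for SpacyPreprocessor and a spaCy pipeline. Only the text_field and doc_field names and their defaults come from the signature above, and copying the data point rather than modifying it in place is an assumption of the sketch.

```python
from types import SimpleNamespace


class ToyDocPreprocessor:
    """Hypothetical stand-in for SpacyPreprocessor; for illustration only."""

    # Class-level cache, so every instance reuses previously parsed texts.
    _cache = {}
    parse_calls = 0  # counts real parses, to make cache hits visible

    def __init__(self, text_field="text", doc_field="doc"):
        self.text_field = text_field
        self.doc_field = doc_field

    def _parse(self, text):
        # Stand-in for running a spaCy pipeline: whitespace tokenization only.
        ToyDocPreprocessor.parse_calls += 1
        return text.split()

    def __call__(self, x):
        text = getattr(x, self.text_field)
        if text not in self._cache:
            self._cache[text] = self._parse(text)
        # Copy the data point's fields and add the parsed result (assumed behavior).
        return SimpleNamespace(**vars(x), **{self.doc_field: self._cache[text]})


x = SimpleNamespace(text="The movie was good.")
first = ToyDocPreprocessor()(x)
second = ToyDocPreprocessor()(x)  # a different instance, same shared cache
```

After both calls, first.doc and second.doc hold the same parsed result and ToyDocPreprocessor.parse_calls is 1, since the second instance finds the text already in the shared cache.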
NLPLabelingFunctions can be defined via a decorator. See
- Parameters
name (str) – Name of the LF
f (Callable[..., int]) – Function that implements the core LF logic
resources (Optional[Mapping[str, Any]]) – Labeling resources passed in to f via kwargs
pre (Optional[List[BaseMapper]]) – Preprocessors to run before SpacyPreprocessor is executed
fault_tolerant (bool) – If True, output -1 when LF execution fails
text_field (str) – Name of data point text field to input
doc_field (str) – Name of data point field to output parsed document to
language (str) – spaCy model to load. See https://spacy.io/usage/models#usage
disable (Optional[List[str]]) – List of pipeline components to disable. See https://spacy.io/usage/processing-pipelines#disabling
memoize (bool) – If True, memoize preprocessor outputs
- Raises
ValueError – Calling incorrectly defined preprocessors
>>> def f(x):
...     person_ents = [ent for ent in x.doc.ents if ent.label_ == "PERSON"]
...     return len(person_ents) > 0
>>> has_person_mention = NLPSlicingFunction(name="has_person_mention", f=f)
>>> has_person_mention
NLPSlicingFunction has_person_mention, Preprocessors: [SpacyPreprocessor...]

>>> from types import SimpleNamespace
>>> x = SimpleNamespace(text="The movie was good.")
>>> has_person_mention(x)
False
__init__(name, f, resources=None, pre=None, fault_tolerant=False, text_field='text', doc_field='doc', language='en_core_web_sm', disable=None, memoize=True)
Initialize self. See help(type(self)) for accurate signature.
- Return type
None
__call__(x)
Label data point.
Runs all preprocessors, then passes to LF. If an exception is encountered and the LF is in fault tolerant mode, the LF abstains from voting.
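The call flow described above can be sketched in plain Python. This is an illustrative sketch, not Snorkel's implementation: ToyLabelingFunction, ABSTAIN, and needs_score are hypothetical names, and the value -1 is taken from the fault_tolerant parameter description. The text does not say whether exceptions raised inside preprocessors are also caught, so the sketch assumes only the LF body is guarded.

```python
from types import SimpleNamespace

ABSTAIN = -1  # value the parameter description gives for a failed LF


class ToyLabelingFunction:
    """Hypothetical illustration of the call flow; not Snorkel's code."""

    def __init__(self, name, f, pre=None, fault_tolerant=False):
        self.name = name
        self._f = f
        self._pre = pre or []
        self.fault_tolerant = fault_tolerant

    def __call__(self, x):
        # Run each preprocessor in order, feeding its output to the next.
        for preprocessor in self._pre:
            x = preprocessor(x)
        try:
            return self._f(x)
        except Exception:
            # Assumption: only failures in the LF body lead to abstaining.
            if self.fault_tolerant:
                return ABSTAIN
            raise


def needs_score(x):
    # Fails with AttributeError when the data point has no `score` field.
    return 1 if x.score > 0.5 else 0


strict = ToyLabelingFunction("strict", needs_score)
tolerant = ToyLabelingFunction("tolerant", needs_score, fault_tolerant=True)

bad_point = SimpleNamespace(text="no score attribute here")
tolerant_label = tolerant(bad_point)  # the AttributeError is swallowed
```

With these definitions, tolerant(bad_point) abstains, while strict(bad_point) propagates the AttributeError to the caller.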
- Parameters
x (Any) – Data point to label
- Returns
Label for data point
- Return type