snorkel.slicing.sf.nlp.NLPSlicingFunction
class snorkel.slicing.sf.nlp.NLPSlicingFunction(name, f, resources=None, pre=None, text_field='text', doc_field='doc', language='en_core_web_sm', disable=None, memoize=True, memoize_key=None, gpu=False)
Bases: snorkel.labeling.lf.nlp.BaseNLPLabelingFunction
Special labeling function type for spaCy-based LFs.
This class is a special version of LabelingFunction. It has a SpacyPreprocessor integrated which shares a cache with all other NLPLabelingFunction instances. This makes it easy to define LFs that have a text input field and have logic written over spaCy Doc objects. Examples passed into an NLPLabelingFunction will have a new field, called doc by default, which contains a spaCy Doc. A Doc object is a sequence of Token objects, which contain information on lemmatization, parts-of-speech, etc. Doc objects also contain fields like Doc.ents, a list of named entities, and Doc.noun_chunks, a list of noun phrases. For details of spaCy Doc objects and a full attribute listing, see https://spacy.io/api/doc.
Simple NLPLabelingFunctions can be defined via a decorator. See nlp_labeling_function.
- Parameters
name (str) – Name of the LF
f (Callable[…, int]) – Function that implements the core LF logic
resources (Optional[Mapping[str, Any]]) – Labeling resources passed in to f via kwargs
pre (Optional[List[BaseMapper]]) – Preprocessors to run before SpacyPreprocessor is executed
text_field (str) – Name of data point text field to input
doc_field (str) – Name of data point field to output parsed document to
language (str) – spaCy model to load. See https://spacy.io/usage/models#usage
disable (Optional[List[str]]) – List of pipeline components to disable. See https://spacy.io/usage/processing-pipelines#disabling
memoize (bool) – Memoize preprocessor outputs?
memoize_key (Optional[Callable[[Any], Hashable]]) – Hashing function to handle the memoization (defaults to snorkel.map.core.get_hashable)
- Raises
ValueError – Calling incorrectly defined preprocessors
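The memoize and memoize_key parameters describe caching of preprocessor outputs under a hashable key. The following is a minimal, dependency-free sketch of that general idea, not Snorkel's implementation: make_memoized, toy_parse, and the key function are hypothetical names introduced for illustration, and whitespace splitting stands in for a spaCy parse.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, Hashable, List


def make_memoized(
    parse: Callable[[str], Any],
    key_fn: Callable[[Any], Hashable],
) -> Callable[[Any], Any]:
    """Wrap `parse` so that each distinct key is parsed only once.

    Hypothetical helper: it mirrors the role described for `memoize_key`,
    but it is not part of Snorkel.
    """
    cache: Dict[Hashable, Any] = {}

    def wrapper(x: Any) -> Any:
        key = key_fn(x)
        if key not in cache:
            # Cache miss: run the (potentially expensive) parse once.
            cache[key] = parse(x.text)
        return cache[key]

    return wrapper


calls: List[str] = []


def toy_parse(text: str) -> List[str]:
    # Stand-in for an expensive spaCy parse; records every invocation.
    calls.append(text)
    return text.split()


# Assumption: the data point's text is a suitable cache key.
parse_once = make_memoized(toy_parse, key_fn=lambda x: x.text)

a = SimpleNamespace(text="The movie was good.")
b = SimpleNamespace(text="The movie was good.")  # same text, different object

tokens_a = parse_once(a)
tokens_b = parse_once(b)  # served from the cache, toy_parse is not called again
```

In this sketch the second call reuses the cached result because both data points hash to the same key, which is the behavior a memoized preprocessor is meant to provide when the same text appears repeatedly.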
Example
>>> from snorkel.slicing.sf.nlp import NLPSlicingFunction
>>> def f(x):
...     person_ents = [ent for ent in x.doc.ents if ent.label_ == "PERSON"]
...     return len(person_ents) > 0
>>> has_person_mention = NLPSlicingFunction(name="has_person_mention", f=f)
>>> has_person_mention
NLPSlicingFunction has_person_mention, Preprocessors: [SpacyPreprocessor...]
>>> from types import SimpleNamespace
>>> x = SimpleNamespace(text="The movie was good.")
>>> has_person_mention(x)
False
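To make the data flow described above more concrete, here is a small conceptual sketch of how the pre, text_field, and doc_field parameters relate: preprocessors run first, the text field is parsed, the result is written to the doc field, and only then is f called. It is a simplification under stated assumptions rather than Snorkel's code: apply_with_doc, lowercase_text, and toy_parse are hypothetical, the parser is a whitespace split instead of spaCy, and the data point is mutated in place for brevity.

```python
from types import SimpleNamespace
from typing import Any, Callable, List, Optional


def apply_with_doc(
    x: Any,
    f: Callable[[Any], bool],
    parse: Callable[[str], Any],
    pre: Optional[List[Callable[[Any], Any]]] = None,
    text_field: str = "text",
    doc_field: str = "doc",
) -> bool:
    """Hypothetical helper: run `pre`, attach a parsed doc, then call `f`."""
    # 1. Run any user-supplied preprocessors before parsing.
    for mapper in pre or []:
        x = mapper(x)
    # 2. Parse the text field and store the result on the doc field.
    setattr(x, doc_field, parse(getattr(x, text_field)))
    # 3. The core function now sees the enriched data point.
    return f(x)


def lowercase_text(x: Any) -> Any:
    # Example preprocessor: normalizes case (mutates in place for brevity).
    x.text = x.text.lower()
    return x


def toy_parse(text: str) -> List[str]:
    # Stand-in for spaCy: a real Doc carries tokens, entities, noun chunks, etc.
    return text.split()


def mentions_movie(x: Any) -> bool:
    # Core logic written against the parsed field, not the raw text.
    return "movie" in x.doc


x = SimpleNamespace(text="The MOVIE was good.")
result = apply_with_doc(x, mentions_movie, toy_parse, pre=[lowercase_text])
```

Because the preprocessor lowercases the text before it is parsed, the core function can match on "movie" without handling case itself.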
__init__(name, f, resources=None, pre=None, text_field='text', doc_field='doc', language='en_core_web_sm', disable=None, memoize=True, memoize_key=None, gpu=False)
Initialize self. See help(type(self)) for accurate signature.
- Return type
None
Methods
__init__(name, f[, resources, pre, …])  Initialize self.