snorkel.labeling.lf.nlp_spark.spark_nlp_labeling_function
class snorkel.labeling.lf.nlp_spark.spark_nlp_labeling_function(name=None, resources=None, pre=None, text_field='text', doc_field='doc', language='en_core_web_sm', disable=None, memoize=True, memoize_key=None, gpu=False)

Bases: snorkel.labeling.lf.nlp.base_nlp_labeling_function
Decorator to define a SparkNLPLabelingFunction object from a function.
Parameters
- name (Optional[str]) – Name of the LF
- resources (Optional[Mapping[str, Any]]) – Labeling resources passed in to f via kwargs
- pre (Optional[List[BaseMapper]]) – Preprocessors to run before SpacyPreprocessor is executed
- text_field (str) – Name of data point text field to input
- doc_field (str) – Name of data point field to output parsed document to
- language (str) – SpaCy model to load. See https://spacy.io/usage/models#usage
- disable (Optional[List[str]]) – List of pipeline components to disable. See https://spacy.io/usage/processing-pipelines#disabling
- memoize (bool) – Memoize preprocessor outputs?
- memoize_key (Optional[Callable[[Any], Hashable]]) – Hashing function to handle the memoization (default to snorkel.map.core.get_hashable)
Example
>>> @spark_nlp_labeling_function()
... def has_person_mention(x):
...     person_ents = [ent for ent in x.doc.ents if ent.label_ == "PERSON"]
...     return 0 if len(person_ents) > 0 else -1
>>> has_person_mention
SparkNLPLabelingFunction has_person_mention, Preprocessors: [SpacyPreprocessor...]

>>> from pyspark.sql import Row
>>> x = Row(text="The movie was good.")
>>> has_person_mention(x)
-1
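A further sketch (not part of the documented example) of how the resources argument might deliver a keyword resource to the wrapped function; the resource name person_titles and its contents are illustrative assumptions, not part of the API.

>>> @spark_nlp_labeling_function(resources={"person_titles": {"mr.", "mrs.", "dr."}})
... def mentions_title(x, person_titles):
...     title_tokens = [tok for tok in x.doc if tok.text.lower() in person_titles]
...     return 0 if title_tokens else -1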
__init__(name=None, resources=None, pre=None, text_field='text', doc_field='doc', language='en_core_web_sm', disable=None, memoize=True, memoize_key=None, gpu=False)

Initialize self. See help(type(self)) for accurate signature.

Return type
None
Methods

__init__([name, resources, pre, text_field, …])    Initialize self.