snorkel.labeling.apply.spark.SparkLFApplier

class snorkel.labeling.apply.spark.SparkLFApplier(lfs)[source]

Bases: snorkel.labeling.apply.core.BaseLFApplier

LF applier for a Spark RDD.

Data points are stored as Rows in an RDD, and a Spark map job is submitted to execute the LFs. A common way to obtain an RDD is via a PySpark DataFrame. For an example of usage with AWS EMR instructions, see test/labeling/apply/lf_applier_spark_test_script.py.
__init__(lfs)[source]

Initialize self. See help(type(self)) for accurate signature.

Return type: None
Methods

__init__(lfs)
    Initialize self.

apply(data_points[, fault_tolerant])
    Label PySpark RDD of data points with LFs.
apply(data_points, fault_tolerant=False)[source]

Label PySpark RDD of data points with LFs.

Parameters:
    data_points (pyspark.RDD) – PySpark RDD containing data points to be labeled by LFs
    fault_tolerant (bool) – Output -1 if LF execution fails?

Returns: Matrix of labels emitted by LFs

Return type: np.ndarray
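The behavior described above can be sketched in plain Python, as an illustrative model rather than the actual Spark implementation: each LF is run over every data point in a map step, failures are mapped to -1 when fault_tolerant is set, and the per-point label rows are stacked into an (n_points, n_lfs) np.ndarray. The SimpleNamespace data points and the two toy LFs are hypothetical stand-ins, not part of the Snorkel API.

```python
import numpy as np
from types import SimpleNamespace


def apply_lfs(data_points, lfs, fault_tolerant=False):
    """Illustrative stand-in for SparkLFApplier.apply: run each LF on
    each data point (Spark would do this in a distributed map job) and
    return an (n_points, n_lfs) label matrix."""

    def label_row(x):
        row = []
        for lf in lfs:
            if fault_tolerant:
                try:
                    row.append(lf(x))
                except Exception:
                    row.append(-1)  # a failing LF outputs -1 (abstain)
            else:
                row.append(lf(x))
        return row

    # In Spark this would roughly be: data_points.map(label_row).collect()
    return np.array([label_row(x) for x in data_points])


# Hypothetical labeling functions: emit a label or abstain with -1
def lf_has_spark(x):
    return 1 if "spark" in x.text else -1


def lf_is_long(x):
    return 0 if len(x.text) > 20 else -1


points = [SimpleNamespace(text="spark rdd map job"), SimpleNamespace(text="hi")]
L = apply_lfs(points, [lf_has_spark, lf_is_long], fault_tolerant=True)
print(L.shape)  # (2, 2)
```

In the real applier the map step is distributed across the cluster by Spark; the sketch only mirrors the per-row labeling logic and the shape of the returned matrix.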