snorkel.labeling.apply.spark.SparkLFApplier
class snorkel.labeling.apply.spark.SparkLFApplier(lfs)
Bases: snorkel.labeling.apply.core.BaseLFApplier
LF applier for a Spark RDD.
Data points are stored as Rows in an RDD, and a Spark map job is submitted to execute the LFs. A common way to obtain an RDD is via a PySpark DataFrame. For an example usage with AWS EMR instructions, see test/labeling/apply/lf_applier_spark_test_script.py.
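As a rough illustration of the workflow described above, the sketch below builds a small PySpark DataFrame, takes its underlying RDD of Rows, and passes it to the applier. It assumes that pyspark and snorkel are installed and that a local Spark session is acceptable. The labeling functions, label values, and the helper name label_with_spark are hypothetical and exist only for this example.

```python
from collections import namedtuple

# Hypothetical label scheme for this sketch.
ABSTAIN = -1
HAM = 0
SPAM = 1


def contains_link(x):
    """Vote SPAM when the text contains a URL, otherwise abstain."""
    return SPAM if "http" in x.text else ABSTAIN


def is_short(x):
    """Vote HAM for very short texts, otherwise abstain."""
    return HAM if len(x.text) < 5 else ABSTAIN


# The LF bodies are plain functions, so they can be checked locally
# against any object with a .text attribute before involving Spark.
Doc = namedtuple("Doc", ["text"])
local_votes = [contains_link(Doc("see http://example.com")), is_short(Doc("hi"))]


def label_with_spark(texts):
    """Label a list of strings through SparkLFApplier (hypothetical helper)."""
    # Imported here so the module still loads where Spark is unavailable.
    from pyspark.sql import SparkSession
    from snorkel.labeling import LabelingFunction
    from snorkel.labeling.apply.spark import SparkLFApplier

    spark = SparkSession.builder.master("local[1]").appName("lf-sketch").getOrCreate()
    try:
        # One-column DataFrame; df.rdd exposes it as an RDD of Row objects.
        df = spark.createDataFrame([(t,) for t in texts], ["text"])
        lfs = [
            LabelingFunction(name="contains_link", f=contains_link),
            LabelingFunction(name="is_short", f=is_short),
        ]
        applier = SparkLFApplier(lfs)
        return applier.apply(df.rdd)
    finally:
        spark.stop()
```

Calling label_with_spark(["see http://example.com", "hi"]) should yield a label matrix with one row per input string and one column per LF, although the exact behavior depends on the installed Spark and Snorkel versions.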
__init__(lfs)
Initialize self. See help(type(self)) for accurate signature.
- Return type
None
Methods
__init__(lfs) – Initialize self.
apply(data_points[, fault_tolerant]) – Label PySpark RDD of data points with LFs.
apply(data_points, fault_tolerant=False)
Label PySpark RDD of data points with LFs.
- Parameters
data_points (pyspark.RDD) – PySpark RDD containing data points to be labeled by LFs
fault_tolerant (bool) – Whether to output -1 if an LF execution fails
- Returns
Matrix of labels emitted by LFs
- Return type
np.ndarray
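To make the shape of the returned matrix and the role of fault_tolerant more concrete, here is a minimal plain-Python stand-in. It is not Snorkel's implementation and does not use Spark. It only mimics the documented behavior under the assumption that a failing LF contributes -1 when fault_tolerant is set and raises otherwise. The Row namedtuple, the two labeling functions, and apply_lfs_locally are hypothetical names, and the result is a list of lists rather than an np.ndarray.

```python
from collections import namedtuple

ABSTAIN = -1

# Stand-in for pyspark.sql.Row: a record with attribute access.
Row = namedtuple("Row", ["text"])


def lf_has_link(x):
    """Vote 1 when the text contains a URL, otherwise abstain."""
    return 1 if "http" in x.text else ABSTAIN


def lf_is_long(x):
    """Vote 0 for texts longer than 30 characters, otherwise abstain."""
    return 0 if len(x.text) > 30 else ABSTAIN


def apply_lfs_locally(rows, lfs, fault_tolerant=False):
    """Return one list of labels per row, with one entry per LF."""
    matrix = []
    for row in rows:
        labels = []
        for lf in lfs:
            try:
                labels.append(lf(row))
            except Exception:
                if not fault_tolerant:
                    raise
                # Assumed semantics: a failed LF is recorded as -1.
                labels.append(ABSTAIN)
        matrix.append(labels)
    return matrix


rows = [
    Row("see http://example.com"),
    Row(None),  # both LFs fail on a missing text
    Row("a fairly long message without links"),
]
lfs = [lf_has_link, lf_is_long]

label_matrix = apply_lfs_locally(rows, lfs, fault_tolerant=True)
print(label_matrix)  # [[1, -1], [-1, -1], [-1, 0]]
```

Rows of the matrix correspond to data points and columns to LFs, in the order the LFs were passed in. With fault_tolerant=False, the second row in this sketch would raise a TypeError instead of producing labels.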