snorkel.labeling.apply.spark.SparkLFApplier
class snorkel.labeling.apply.spark.SparkLFApplier(lfs)

Bases: snorkel.labeling.apply.core.BaseLFApplier

LF applier for a Spark RDD.

Data points are stored as Rows in an RDD, and a Spark map job is submitted to execute the LFs. A common way to obtain an RDD is via a PySpark DataFrame. For an example usage with AWS EMR instructions, see test/labeling/apply/lf_applier_spark_test_script.py.
__init__(lfs)
Initialize self. See help(type(self)) for accurate signature.

Return type: None
Methods
__init__(lfs)         Initialize self.
apply(data_points)    Label PySpark RDD of data points with LFs.
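The flow described above (Rows in an RDD, obtained from a PySpark DataFrame, labeled by a Spark map job) might look roughly like the following. This is a minimal sketch, not the project's test script. The labeling logic `contains_link`, the `SPAM` label value, the `Example` stand-in for a Row, the `label_with_spark` helper, and the local `SparkSession` settings are all assumptions made for illustration. The Spark and Snorkel imports sit inside the helper so the labeling logic can be checked without a cluster.

```python
from collections import namedtuple

ABSTAIN = -1  # Snorkel's convention for "no vote"
SPAM = 1      # hypothetical label value for this example

# Hypothetical stand-in for a Spark Row, so the LF logic can be checked locally.
Example = namedtuple("Example", ["text"])


def contains_link(x):
    """Vote SPAM when the text field mentions a URL, otherwise abstain."""
    return SPAM if "http" in x.text.lower() else ABSTAIN


def label_with_spark(texts):
    """Apply contains_link to a list of strings through SparkLFApplier."""
    # Imported here so the module still loads where Spark is not installed.
    from pyspark.sql import SparkSession
    from snorkel.labeling import LabelingFunction
    from snorkel.labeling.apply.spark import SparkLFApplier

    # Assumed local session; a real job would use the cluster's configuration.
    spark = SparkSession.builder.master("local[1]").appName("lf-demo").getOrCreate()
    try:
        # A DataFrame's .rdd attribute is an RDD of Row objects.
        df = spark.createDataFrame([(t,) for t in texts], ["text"])
        lf = LabelingFunction(name="contains_link", f=contains_link)
        applier = SparkLFApplier([lf])
        # Returns the label matrix: one row per data point, one column per LF.
        return applier.apply(df.rdd)
    finally:
        spark.stop()
```

With PySpark and Snorkel installed, calling `label_with_spark(["see http://example.com", "hello there"])` should return a label matrix with two rows and one column. The labeling function reads fields by attribute (`x.text`), which is how Row objects expose their columns.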