Learning¶
In the Snorkel pipeline, the user writes labeling functions (LFs) and then uses the generative model to unify and denoise their labels. The marginal predictions of this model are then used as probabilistic training labels for the discriminative model. Currently we provide bindings for TensorFlow models, and two basic models: logistic regression and an LSTM. See the tutorials for a more in-depth explanation.
Base Classifier Class¶

class
snorkel.learning.classifier.
Classifier
(cardinality=2, name=None, seed=None)[source]¶ Simple abstract base class for a probabilistic classifier.

error_analysis
(session, X_test, Y_test, gold_candidate_set=None, b=0.5, set_unlabeled_as_neg=True, display=True, scorer=<class 'snorkel.learning.utils.MentionScorer'>, **kwargs)[source]¶ Prints a full score analysis using the Scorer class, and then returns a tuple of sets containing the test candidates bucketed for error analysis, i.e.:
 For binary: TP, FP, TN, FN
 For categorical: correct, incorrect
Parameters:  X_test – The input test candidates, as a list or annotation matrix
 Y_test – The input test labels, as a list or annotation matrix
 gold_candidate_set – Full set of TPs in the test set
 b – Decision boundary for binary setting only
 set_unlabeled_as_neg – Whether to map 0 labels -> -1, binary setting
 display – Print score report
 scorer – The Scorer subclass to use

predictions
(X, b=0.5)[source]¶ Return a numpy array of elements in {-1, 0, 1} based on predicted marginal probabilities.
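The thresholding rule can be sketched in plain Python (a hypothetical stand-alone illustration, not the library method itself):

```python
def predictions_from_marginals(marginals, b=0.5):
    """Map marginal probabilities to hard labels in {-1, 0, 1}:
    p > b -> 1, p < b -> -1, and p == b -> 0 (abstain)."""
    return [1 if p > b else (-1 if p < b else 0) for p in marginals]

print(predictions_from_marginals([0.9, 0.2, 0.5]))  # -> [1, -1, 0]
```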

representation
= False¶

save_marginals
(session, X, training=False)[source]¶ Save the predicted marginal probabilities for the Candidates X.

score
(X_test, Y_test, b=0.5, set_unlabeled_as_neg=True)[source]¶  Returns the summary scores:
 For binary: precision, recall, F1 score
 For categorical: accuracy
Parameters:  X_test – The input test candidates, as a list or annotation matrix
 Y_test – The input test labels, as a list or annotation matrix
 b – Decision boundary for binary setting only
 set_unlabeled_as_neg – Whether to map 0 labels -> -1, binary setting.
Note: Unlike in self.error_analysis, this method assumes X_test and Y_test are properly collated!

Generative Model¶

class
snorkel.learning.gen_learning.
GenerativeModel
(class_prior=False, lf_prior=False, lf_propensity=False, lf_class_propensity=False, seed=271828, name=None)[source]¶ A generative model for data programming for binary classification.
Supports dependencies among labeling functions.
Parameters:  class_prior – whether to include class label prior factors
 lf_prior – whether to include labeling function prior factors
 lf_propensity – whether to include labeling function propensity factors
 lf_class_propensity – whether to include class-specific labeling function propensity factors
 seed – seed for initializing state of Numbskull variables

dep_names
= ('dep_similar', 'dep_fixing', 'dep_reinforcing', 'dep_exclusive')¶

error_analysis
(session, X_test, Y_test, gold_candidate_set=None, b=0.5, set_unlabeled_as_neg=True, display=True, scorer=<class 'snorkel.learning.utils.MentionScorer'>, **kwargs)¶ Prints a full score analysis using the Scorer class, and then returns a tuple of sets containing the test candidates bucketed for error analysis, i.e.:
 For binary: TP, FP, TN, FN
 For categorical: correct, incorrect
Parameters:  X_test – The input test candidates, as a list or annotation matrix
 Y_test – The input test labels, as a list or annotation matrix
 gold_candidate_set – Full set of TPs in the test set
 b – Decision boundary for binary setting only
 set_unlabeled_as_neg – Whether to map 0 labels -> -1, binary setting
 display – Print score report
 scorer – The Scorer subclass to use

learned_lf_stats
()[source]¶ Provides a summary of what the model has learned about the labeling functions. For each labeling function, estimates of the following are provided:
 Abstain, Accuracy, Coverage
 [The following are only available for binary tasks:] True Positive (TP), False Positive (FP), True Negative (TN), False Negative (FN)
For scoped categoricals, the information provided is for the maximum observed cardinality of any single data point.
WARNING: This uses Gibbs sampling to estimate these values. This will tend to mix poorly when there are many very accurate labeling functions. In this case, this function will assume that the classes are approximately balanced.

marginals
(L)[source]¶ Given an M x N label matrix, returns marginal probabilities for each candidate, depending on classification setting:
 Binary: Returns an M-dim array representing the marginal probability of each candidate being True
 Categorical (cardinality = K): Returns an M x K dense matrix representing the marginal probabilities of each candidate being each class
 Scoped Categorical (cardinality = K, cardinality_ranges not None): Returns an M x K sparse matrix of marginals
In the categorical setting, the K values (columns in the marginals matrix) correspond to indices of the Candidate values defined.
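For intuition about what these marginals represent, an unweighted majority vote over a binary {-1, 0, 1} label matrix is a crude stand-in for the learned model (illustrative only; the real generative model weights LFs by their learned accuracies):

```python
def majority_vote_marginals(L):
    """P(candidate is True) under a naive majority vote, for a dense
    M x N label matrix with entries in {-1, 0, 1} (0 = abstain)."""
    marginals = []
    for row in L:
        pos = sum(1 for v in row if v == 1)
        neg = sum(1 for v in row if v == -1)
        # With no non-abstain votes, fall back to an uninformative 0.5
        marginals.append(pos / (pos + neg) if pos + neg else 0.5)
    return marginals

L = [[1, 1, -1], [0, 0, 0], [-1, -1, 0]]
print(majority_vote_marginals(L))
```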

optional_names
= ('lf_prior', 'lf_propensity', 'lf_class_propensity')¶

predictions
(X, b=0.5)¶ Return a numpy array of elements in {-1, 0, 1} based on predicted marginal probabilities.

representation
= False¶

save_marginals
(session, X, training=False)¶ Save the predicted marginal probabilities for the Candidates X.

score
(X_test, Y_test, b=0.5, set_unlabeled_as_neg=True)¶  Returns the summary scores:
 For binary: precision, recall, F1 score
 For categorical: accuracy
Parameters:  X_test – The input test candidates, as a list or annotation matrix
 Y_test – The input test labels, as a list or annotation matrix
 b – Decision boundary for binary setting only
 set_unlabeled_as_neg – Whether to map 0 labels -> -1, binary setting.
Note: Unlike in self.error_analysis, this method assumes X_test and Y_test are properly collated!

train
(L, deps=(), LF_acc_prior_weights=None, LF_acc_prior_weight_default=1, labels=None, label_prior_weight=5, init_deps=0.0, init_class_prior=1.0, epochs=30, step_size=None, decay=1.0, reg_param=0.1, reg_type=2, verbose=False, truncation=10, burn_in=5, cardinality=None, timer=None, candidate_ranges=None, threads=1)[source]¶ Fits the parameters of the model to a data set. By default, learns a conditionally independent model. Additional unary dependencies can be included via the constructor; additional pairwise and higher-order dependencies can be included as an argument.
Results are stored as a member named weights, an instance of snorkel.learning.gen_learning.GenerativeModelWeights.
Parameters:  L – M x N csr_AnnotationMatrixtype label matrix, where there are M candidates labeled by N labeling functions (LFs)
 deps – collection of dependencies to include in the model, each element is a tuple of the form (LF 1 index, LF 2 index, dependency type), see snorkel.learning.constants
 LF_acc_prior_weights – An Nelement list of prior weights for the LF accuracies (log scale)
 LF_acc_prior_weight_default – Default prior for the weight of each LF accuracy; if LF_acc_prior_weights is unset, each LF will have this accuracy prior weight (log scale)
 labels – Optional ground truth labels
 label_prior_weight – The prior probability that the ground truth labels (if provided) are correct (log scale)
 init_deps – initial weight for additional dependencies, except class prior (log scale)
 init_class_prior – initial class prior (log scale); note this is only used if class_prior=True in the constructor
 epochs – number of training epochs
 step_size – gradient step size, default is 1 / L.shape[0]
 decay – multiplicative decay of step size, step_size_(t+1) = step_size_(t) * decay
 reg_param – regularization strength
 reg_type – 1 = L1 regularization, 2 = L2 regularization
 verbose – whether to write debugging info to stdout
 truncation – number of iterations between truncation step for L1 regularization
 burn_in – number of burn-in samples to take before beginning learning
 cardinality – number of possible classes; by default is inferred from the label matrix L
 timer – stopwatch for profiling, must implement start() and end()
 candidate_ranges – Optionally, a list of M sets of integer values, representing the possible categorical values that each of the M candidates can take. If a label is outside of this range, an error is thrown. If None, then each candidate can take any value from 0 to cardinality.
 threads – the number of threads to use for sampling. Default is 1.
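The interaction of step_size and decay documented above can be shown directly (a sketch of the documented defaults, not library code):

```python
def step_size_schedule(num_candidates, epochs, step_size=None, decay=1.0):
    """Per-epoch SGD step sizes: step_size defaults to 1 / M
    (M = L.shape[0], the number of candidates) and is multiplied
    by `decay` after each epoch."""
    s = (1.0 / num_candidates) if step_size is None else step_size
    schedule = []
    for _ in range(epochs):
        schedule.append(s)
        s *= decay
    return schedule

print(step_size_schedule(num_candidates=1000, epochs=3, decay=0.5))
# -> [0.001, 0.0005, 0.00025]
```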
Discriminative Models¶

class
snorkel.learning.disc_learning.
TFNoiseAwareModel
(n_threads=None, **kwargs)[source]¶ Generic NoiseAwareModel class for TensorFlow models. Note that the actual network is built when train is called (to allow for model architectures which depend on the training data, e.g. vocab size).
Parameters: n_threads – Parallelism to use; single-threaded if None
error_analysis
(session, X_test, Y_test, gold_candidate_set=None, b=0.5, set_unlabeled_as_neg=True, display=True, scorer=<class 'snorkel.learning.utils.MentionScorer'>, **kwargs)¶ Prints a full score analysis using the Scorer class, and then returns a tuple of sets containing the test candidates bucketed for error analysis, i.e.:
 For binary: TP, FP, TN, FN
 For categorical: correct, incorrect
Parameters:  X_test – The input test candidates, as a list or annotation matrix
 Y_test – The input test labels, as a list or annotation matrix
 gold_candidate_set – Full set of TPs in the test set
 b – Decision boundary for binary setting only
 set_unlabeled_as_neg – Whether to map 0 labels -> -1, binary setting
 display – Print score report
 scorer – The Scorer subclass to use

load
(model_name=None, save_dir='checkpoints', verbose=True)[source]¶ Load model from file and rebuild in new graph / session.

marginals
(X, **kwargs)¶

predictions
(X, b=0.5)¶ Return a numpy array of elements in {-1, 0, 1} based on predicted marginal probabilities.

representation
= False¶

save
(model_name=None, save_dir='checkpoints', verbose=True, global_step=0)[source]¶ Save current model.

save_marginals
(session, X, training=False)¶ Save the predicted marginal probabilities for the Candidates X.

score
(X_test, Y_test, b=0.5, set_unlabeled_as_neg=True)¶  Returns the summary scores:
 For binary: precision, recall, F1 score
 For categorical: accuracy
Parameters:  X_test – The input test candidates, as a list or annotation matrix
 Y_test – The input test labels, as a list or annotation matrix
 b – Decision boundary for binary setting only
 set_unlabeled_as_neg – Whether to map 0 labels -> -1, binary setting.
Note: Unlike in self.error_analysis, this method assumes X_test and Y_test are properly collated!

train
(X_train, Y_train, n_epochs=25, lr=0.01, batch_size=256, rebalance=False, X_dev=None, Y_dev=None, print_freq=5, dev_ckpt=True, dev_ckpt_delay=0.75, save_dir='checkpoints', **kwargs)[source]¶ Generic training procedure for TF model
Parameters:  X_train – The training Candidates. If self.representation is True, then this is a list of Candidate objects; else is a csr_AnnotationMatrix with rows corresponding to training candidates and columns corresponding to features.
 Y_train – Array of marginal probabilities for each Candidate
 n_epochs – Number of training epochs
 lr – Learning rate
 batch_size – Batch size for SGD
 rebalance – Bool or fraction of positive examples for training; if True, defaults to the standard 0.5 class balance; if False, no class balancing
 X_dev – Candidates for evaluation, same format as X_train
 Y_dev – Labels for evaluation, same format as Y_train
 print_freq – number of epochs at which to print status, and if present, evaluate the dev set (X_dev, Y_dev).
 dev_ckpt – If True, save a checkpoint whenever the highest score on (X_dev, Y_dev) is reached. Note: currently only evaluates every print_freq epochs.
 dev_ckpt_delay – Start dev checkpointing after this portion of n_epochs.
 save_dir – Save dir path for checkpointing.
 kwargs – All hyperparameters that change how the graph is built must be passed through here so they can be saved and reloaded with the model. NOTE: If a parameter needed to build the network and/or at test time is not included here, the model will not be able to be reloaded!
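The rebalance option can be sketched as subsampling to a target positive fraction (a hypothetical illustration of the idea; the library's exact procedure may differ):

```python
import random

def rebalance_indices(marginals, pos_frac=0.5, seed=1234):
    """Subsample training indices so that examples with marginal > 0.5
    make up roughly `pos_frac` of the result (0 < pos_frac < 1).
    Illustrative sketch only."""
    rng = random.Random(seed)
    pos = [i for i, p in enumerate(marginals) if p > 0.5]
    neg = [i for i, p in enumerate(marginals) if p <= 0.5]
    if len(pos) < pos_frac * (len(pos) + len(neg)):
        # Too few positives: keep all of them, subsample negatives
        k = int(round(len(pos) * (1 - pos_frac) / pos_frac))
        neg = rng.sample(neg, min(k, len(neg)))
    else:
        # Too few negatives: keep all of them, subsample positives
        k = int(round(len(neg) * pos_frac / (1 - pos_frac)))
        pos = rng.sample(pos, min(k, len(pos)))
    return sorted(pos + neg)

marginals = [0.9, 0.8, 0.2, 0.3, 0.1, 0.4, 0.2, 0.1]
idxs = rebalance_indices(marginals)  # 2 positives, so 2 negatives kept
```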


class
snorkel.learning.disc_models.logistic_regression.
LogisticRegression
(n_threads=None, **kwargs)[source]¶ 
error_analysis
(session, X_test, Y_test, gold_candidate_set=None, b=0.5, set_unlabeled_as_neg=True, display=True, scorer=<class 'snorkel.learning.utils.MentionScorer'>, **kwargs)¶ Prints a full score analysis using the Scorer class, and then returns a tuple of sets containing the test candidates bucketed for error analysis, i.e.:
 For binary: TP, FP, TN, FN
 For categorical: correct, incorrect
Parameters:  X_test – The input test candidates, as a list or annotation matrix
 Y_test – The input test labels, as a list or annotation matrix
 gold_candidate_set – Full set of TPs in the test set
 b – Decision boundary for binary setting only
 set_unlabeled_as_neg – Whether to map 0 labels -> -1, binary setting
 display – Print score report
 scorer – The Scorer subclass to use

load
(model_name=None, save_dir='checkpoints', verbose=True)¶ Load model from file and rebuild in new graph / session.

predictions
(X, b=0.5)¶ Return a numpy array of elements in {-1, 0, 1} based on predicted marginal probabilities.

representation
= False¶

save
(model_name=None, save_dir='checkpoints', verbose=True, global_step=0)¶ Save current model.

save_marginals
(session, X, training=False)¶ Save the predicted marginal probabilities for the Candidates X.

score
(X_test, Y_test, b=0.5, set_unlabeled_as_neg=True)¶  Returns the summary scores:
 For binary: precision, recall, F1 score
 For categorical: accuracy
Parameters:  X_test – The input test candidates, as a list or annotation matrix
 Y_test – The input test labels, as a list or annotation matrix
 b – Decision boundary for binary setting only
 set_unlabeled_as_neg – Whether to map 0 labels -> -1, binary setting.
Note: Unlike in self.error_analysis, this method assumes X_test and Y_test are properly collated!

train
(X_train, Y_train, n_epochs=25, lr=0.01, batch_size=256, rebalance=False, X_dev=None, Y_dev=None, print_freq=5, dev_ckpt=True, dev_ckpt_delay=0.75, save_dir='checkpoints', **kwargs)¶ Generic training procedure for TF model
Parameters:  X_train – The training Candidates. If self.representation is True, then this is a list of Candidate objects; else is a csr_AnnotationMatrix with rows corresponding to training candidates and columns corresponding to features.
 Y_train – Array of marginal probabilities for each Candidate
 n_epochs – Number of training epochs
 lr – Learning rate
 batch_size – Batch size for SGD
 rebalance – Bool or fraction of positive examples for training; if True, defaults to the standard 0.5 class balance; if False, no class balancing
 X_dev – Candidates for evaluation, same format as X_train
 Y_dev – Labels for evaluation, same format as Y_train
 print_freq – number of epochs at which to print status, and if present, evaluate the dev set (X_dev, Y_dev).
 dev_ckpt – If True, save a checkpoint whenever the highest score on (X_dev, Y_dev) is reached. Note: currently only evaluates every print_freq epochs.
 dev_ckpt_delay – Start dev checkpointing after this portion of n_epochs.
 save_dir – Save dir path for checkpointing.
 kwargs – All hyperparameters that change how the graph is built must be passed through here so they can be saved and reloaded with the model. NOTE: If a parameter needed to build the network and/or at test time is not included here, the model will not be able to be reloaded!


class
snorkel.learning.disc_models.logistic_regression.
SparseLogisticRegression
(n_threads=None, **kwargs)[source]¶ 
error_analysis
(session, X_test, Y_test, gold_candidate_set=None, b=0.5, set_unlabeled_as_neg=True, display=True, scorer=<class 'snorkel.learning.utils.MentionScorer'>, **kwargs)¶ Prints a full score analysis using the Scorer class, and then returns a tuple of sets containing the test candidates bucketed for error analysis, i.e.:
 For binary: TP, FP, TN, FN
 For categorical: correct, incorrect
Parameters:  X_test – The input test candidates, as a list or annotation matrix
 Y_test – The input test labels, as a list or annotation matrix
 gold_candidate_set – Full set of TPs in the test set
 b – Decision boundary for binary setting only
 set_unlabeled_as_neg – Whether to map 0 labels -> -1, binary setting
 display – Print score report
 scorer – The Scorer subclass to use

get_weights
()¶ Get model weights and bias

load
(model_name=None, save_dir='checkpoints', verbose=True)¶ Load model from file and rebuild in new graph / session.

predictions
(X, b=0.5)¶ Return a numpy array of elements in {-1, 0, 1} based on predicted marginal probabilities.

representation
= False¶

save
(model_name=None, save_dir='checkpoints', verbose=True, global_step=0)¶ Save current model.

save_marginals
(session, X, training=False)¶ Save the predicted marginal probabilities for the Candidates X.

score
(X_test, Y_test, b=0.5, set_unlabeled_as_neg=True)¶  Returns the summary scores:
 For binary: precision, recall, F1 score
 For categorical: accuracy
Parameters:  X_test – The input test candidates, as a list or annotation matrix
 Y_test – The input test labels, as a list or annotation matrix
 b – Decision boundary for binary setting only
 set_unlabeled_as_neg – Whether to map 0 labels -> -1, binary setting.
Note: Unlike in self.error_analysis, this method assumes X_test and Y_test are properly collated!

train
(X_train, Y_train, n_epochs=25, lr=0.01, batch_size=256, rebalance=False, X_dev=None, Y_dev=None, print_freq=5, dev_ckpt=True, dev_ckpt_delay=0.75, save_dir='checkpoints', **kwargs)¶ Generic training procedure for TF model
Parameters:  X_train – The training Candidates. If self.representation is True, then this is a list of Candidate objects; else is a csr_AnnotationMatrix with rows corresponding to training candidates and columns corresponding to features.
 Y_train – Array of marginal probabilities for each Candidate
 n_epochs – Number of training epochs
 lr – Learning rate
 batch_size – Batch size for SGD
 rebalance – Bool or fraction of positive examples for training; if True, defaults to the standard 0.5 class balance; if False, no class balancing
 X_dev – Candidates for evaluation, same format as X_train
 Y_dev – Labels for evaluation, same format as Y_train
 print_freq – number of epochs at which to print status, and if present, evaluate the dev set (X_dev, Y_dev).
 dev_ckpt – If True, save a checkpoint whenever the highest score on (X_dev, Y_dev) is reached. Note: currently only evaluates every print_freq epochs.
 dev_ckpt_delay – Start dev checkpointing after this portion of n_epochs.
 save_dir – Save dir path for checkpointing.
 kwargs – All hyperparameters that change how the graph is built must be passed through here so they can be saved and reloaded with the model. NOTE: If a parameter needed to build the network and/or at test time is not included here, the model will not be able to be reloaded!


class
snorkel.learning.disc_models.rnn.rnn_base.
RNNBase
(n_threads=None, **kwargs)[source]¶ 
error_analysis
(session, X_test, Y_test, gold_candidate_set=None, b=0.5, set_unlabeled_as_neg=True, display=True, scorer=<class 'snorkel.learning.utils.MentionScorer'>, **kwargs)¶ Prints a full score analysis using the Scorer class, and then returns a tuple of sets containing the test candidates bucketed for error analysis, i.e.:
 For binary: TP, FP, TN, FN
 For categorical: correct, incorrect
Parameters:  X_test – The input test candidates, as a list or annotation matrix
 Y_test – The input test labels, as a list or annotation matrix
 gold_candidate_set – Full set of TPs in the test set
 b – Decision boundary for binary setting only
 set_unlabeled_as_neg – Whether to map 0 labels -> -1, binary setting
 display – Print score report
 scorer – The Scorer subclass to use

load
(model_name=None, save_dir='checkpoints', verbose=True)¶ Load model from file and rebuild in new graph / session.

marginals
(test_candidates)[source]¶ Get the likelihood of the tagged sequences represented by test_candidates. @test_candidates: a list of lists representing the test sentences

predictions
(X, b=0.5)¶ Return a numpy array of elements in {-1, 0, 1} based on predicted marginal probabilities.

representation
= True¶

save
(model_name=None, save_dir='checkpoints', verbose=True, global_step=0)¶ Save current model.

save_marginals
(session, X, training=False)¶ Save the predicted marginal probabilities for the Candidates X.

score
(X_test, Y_test, b=0.5, set_unlabeled_as_neg=True)¶  Returns the summary scores:
 For binary: precision, recall, F1 score
 For categorical: accuracy
Parameters:  X_test – The input test candidates, as a list or annotation matrix
 Y_test – The input test labels, as a list or annotation matrix
 b – Decision boundary for binary setting only
 set_unlabeled_as_neg – Whether to map 0 labels -> -1, binary setting.
Note: Unlike in self.error_analysis, this method assumes X_test and Y_test are properly collated!


snorkel.learning.disc_models.rnn.re_rnn.
mark
(l, h, idx)[source]¶ Produce markers based on argument positions
Parameters:  l – sentence position of first word in argument
 h – sentence position of last word in argument
 idx – argument index (1 or 2)

snorkel.learning.disc_models.rnn.re_rnn.
mark_sentence
(s, args)[source]¶ Insert markers around relation arguments in word sequence
Parameters:  s – list of tokens in sentence
 args – list of triples (l, h, idx) as per @_mark(...) corresponding to relation arguments
 Example: Then Barack married Michelle.
 -> Then ~~[[1 Barack 1]]~~ married ~~[[2 Michelle 2]]~~.
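The pair of helpers above can be sketched so as to reproduce the documented example (a sketch based on the docstrings, not necessarily the exact library source):

```python
def mark(l, h, idx):
    """Produce (position, marker) pairs for an argument spanning token
    positions l..h with argument index idx (1 or 2)."""
    return [(l, "~~[[%s" % idx), (h + 1, "%s]]~~" % idx)]

def mark_sentence(s, args):
    """Insert markers around relation arguments in a token list."""
    marks = sorted([y for m in args for y in mark(*m)], reverse=True)
    x = list(s)
    # Insert from the right so earlier positions stay valid
    for position, marker in marks:
        x.insert(position, marker)
    return x

tokens = ["Then", "Barack", "married", "Michelle", "."]
print(" ".join(mark_sentence(tokens, [(1, 1, 1), (3, 3, 2)])))
# -> Then ~~[[1 Barack 1]]~~ married ~~[[2 Michelle 2]]~~ .
```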

class
snorkel.learning.disc_models.rnn.re_rnn.
reRNN
(n_threads=None, **kwargs)[source]¶ reRNN for relation extraction

error_analysis
(session, X_test, Y_test, gold_candidate_set=None, b=0.5, set_unlabeled_as_neg=True, display=True, scorer=<class 'snorkel.learning.utils.MentionScorer'>, **kwargs)¶ Prints a full score analysis using the Scorer class, and then returns a tuple of sets containing the test candidates bucketed for error analysis, i.e.:
 For binary: TP, FP, TN, FN
 For categorical: correct, incorrect
Parameters:  X_test – The input test candidates, as a list or annotation matrix
 Y_test – The input test labels, as a list or annotation matrix
 gold_candidate_set – Full set of TPs in the test set
 b – Decision boundary for binary setting only
 set_unlabeled_as_neg – Whether to map 0 labels -> -1, binary setting
 display – Print score report
 scorer – The Scorer subclass to use

load
(model_name=None, save_dir='checkpoints', verbose=True)¶ Load model from file and rebuild in new graph / session.

marginals
(test_candidates)¶ Get the likelihood of the tagged sequences represented by test_candidates. @test_candidates: a list of lists representing the test sentences

predictions
(X, b=0.5)¶ Return a numpy array of elements in {-1, 0, 1} based on predicted marginal probabilities.

representation
= True¶

save
(model_name=None, save_dir='checkpoints', verbose=True, global_step=0)¶ Save current model.

save_marginals
(session, X, training=False)¶ Save the predicted marginal probabilities for the Candidates X.

score
(X_test, Y_test, b=0.5, set_unlabeled_as_neg=True)¶  Returns the summary scores:
 For binary: precision, recall, F1 score
 For categorical: accuracy
Parameters:  X_test – The input test candidates, as a list or annotation matrix
 Y_test – The input test labels, as a list or annotation matrix
 b – Decision boundary for binary setting only
 set_unlabeled_as_neg – Whether to map 0 labels -> -1, binary setting.
Note: Unlike in self.error_analysis, this method assumes X_test and Y_test are properly collated!

train
(X_train, Y_train, X_dev=None, max_sentence_length=None, **kwargs)¶ Perform preprocessing of the data, construct a dataset-specific model, then train.


class
snorkel.learning.disc_models.rnn.tag_rnn.
TagRNN
(n_threads=None, **kwargs)[source]¶ TagRNN for sequence tagging

CLOSE
= '~~]]~~'¶

OPEN
= '~~[[~~'¶

error_analysis
(session, X_test, Y_test, gold_candidate_set=None, b=0.5, set_unlabeled_as_neg=True, display=True, scorer=<class 'snorkel.learning.utils.MentionScorer'>, **kwargs)¶ Prints a full score analysis using the Scorer class, and then returns a tuple of sets containing the test candidates bucketed for error analysis, i.e.:
 For binary: TP, FP, TN, FN
 For categorical: correct, incorrect
Parameters:  X_test – The input test candidates, as a list or annotation matrix
 Y_test – The input test labels, as a list or annotation matrix
 gold_candidate_set – Full set of TPs in the test set
 b – Decision boundary for binary setting only
 set_unlabeled_as_neg – Whether to map 0 labels -> -1, binary setting
 display – Print score report
 scorer – The Scorer subclass to use

load
(model_name=None, save_dir='checkpoints', verbose=True)¶ Load model from file and rebuild in new graph / session.

marginals
(test_candidates)¶ Get the likelihood of the tagged sequences represented by test_candidates. @test_candidates: a list of lists representing the test sentences

predictions
(X, b=0.5)¶ Return a numpy array of elements in {-1, 0, 1} based on predicted marginal probabilities.

representation
= True¶

save
(model_name=None, save_dir='checkpoints', verbose=True, global_step=0)¶ Save current model.

save_marginals
(session, X, training=False)¶ Save the predicted marginal probabilities for the Candidates X.

score
(X_test, Y_test, b=0.5, set_unlabeled_as_neg=True)¶  Returns the summary scores:
 For binary: precision, recall, F1 score
 For categorical: accuracy
Parameters:  X_test – The input test candidates, as a list or annotation matrix
 Y_test – The input test labels, as a list or annotation matrix
 b – Decision boundary for binary setting only
 set_unlabeled_as_neg – Whether to map 0 labels -> -1, binary setting.
Note: Unlike in self.error_analysis, this method assumes X_test and Y_test are properly collated!

train
(X_train, Y_train, X_dev=None, max_sentence_length=None, **kwargs)¶ Perform preprocessing of the data, construct a dataset-specific model, then train.


class
snorkel.learning.disc_models.rnn.text_rnn.
TextRNN
(n_threads=None, **kwargs)[source]¶ TextRNN for strings of text.

error_analysis
(session, X_test, Y_test, gold_candidate_set=None, b=0.5, set_unlabeled_as_neg=True, display=True, scorer=<class 'snorkel.learning.utils.MentionScorer'>, **kwargs)¶ Prints a full score analysis using the Scorer class, and then returns a tuple of sets containing the test candidates bucketed for error analysis, i.e.:
 For binary: TP, FP, TN, FN
 For categorical: correct, incorrect
Parameters:  X_test – The input test candidates, as a list or annotation matrix
 Y_test – The input test labels, as a list or annotation matrix
 gold_candidate_set – Full set of TPs in the test set
 b – Decision boundary for binary setting only
 set_unlabeled_as_neg – Whether to map 0 labels -> -1, binary setting
 display – Print score report
 scorer – The Scorer subclass to use

load
(model_name=None, save_dir='checkpoints', verbose=True)¶ Load model from file and rebuild in new graph / session.

marginals
(test_candidates)¶ Get the likelihood of the tagged sequences represented by test_candidates. @test_candidates: a list of lists representing the test sentences

predictions
(X, b=0.5)¶ Return a numpy array of elements in {-1, 0, 1} based on predicted marginal probabilities.

representation
= True¶

save
(model_name=None, save_dir='checkpoints', verbose=True, global_step=0)¶ Save current model.

save_marginals
(session, X, training=False)¶ Save the predicted marginal probabilities for the Candidates X.

score
(X_test, Y_test, b=0.5, set_unlabeled_as_neg=True)¶  Returns the summary scores:
 For binary: precision, recall, F1 score
 For categorical: accuracy
Parameters:  X_test – The input test candidates, as a list or annotation matrix
 Y_test – The input test labels, as a list or annotation matrix
 b – Decision boundary for binary setting only
 set_unlabeled_as_neg – Whether to map 0 labels -> -1, binary setting.
Note: Unlike in self.error_analysis, this method assumes X_test and Y_test are properly collated!

train
(X_train, Y_train, X_dev=None, max_sentence_length=None, **kwargs)¶ Perform preprocessing of the data, construct a dataset-specific model, then train.

Learning Utilities¶

class
snorkel.learning.utils.
GridSearch
(model_class, parameters, X_train, Y_train=None, **model_class_params)[source]¶ Runs a hyperparameter grid search over a model object with train and score methods, training data (X), and training marginals. Selects the model that maximizes F1 score on a supplied validation set. Specify the search space with Hyperparameter arguments.
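The selection loop can be sketched generically (a minimal illustration, not the library class; train_and_score is a hypothetical stand-in for fitting a model and computing validation F1):

```python
from itertools import product

def grid_search(train_and_score, param_grid):
    """Try every combination in param_grid and keep the best F1."""
    best_params, best_f1 = None, float("-inf")
    for values in product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), values))
        f1 = train_and_score(**params)
        if f1 > best_f1:
            best_params, best_f1 = params, f1
    return best_params, best_f1

# Toy objective peaking at lr=0.01, reg_param=0.1
score = lambda lr, reg_param: 1.0 - abs(lr - 0.01) - abs(reg_param - 0.1)
best, f1 = grid_search(score, {"lr": [0.001, 0.01, 0.1], "reg_param": [0.01, 0.1]})
print(best)  # -> {'lr': 0.01, 'reg_param': 0.1}
```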

snorkel.learning.utils.
LF_accuracies
(L, labels)[source]¶ Given an N x M matrix where L_{i,j} is the label given by the jth LF to the ith candidate, and labels in {-1,1}, return the accuracy of each LF w.r.t. these labels

snorkel.learning.utils.
LF_conflicts
(L)[source]¶ Given an N x M matrix where L_{i,j} is the label given by the jth LF to the ith candidate: Return the fraction of candidates that each LF _conflicts with other LFs on_.

snorkel.learning.utils.
LF_coverage
(L)[source]¶ Given an N x M matrix where L_{i,j} is the label given by the jth LF to the ith candidate: Return the fraction of candidates that each LF labels.

snorkel.learning.utils.
LF_overlaps
(L)[source]¶ Given an N x M matrix where L_{i,j} is the label given by the jth LF to the ith candidate: Return the fraction of candidates that each LF _overlaps with other LFs on_.
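Taken together, the three per-LF statistics above can be sketched for a small dense matrix (the library operates on sparse annotation matrices; this is an illustration of the definitions only):

```python
def lf_stats(L):
    """Per-LF (coverage, overlap, conflict) fractions for a dense label
    matrix with rows = candidates, columns = LFs, entries in {-1, 0, 1}."""
    M, N = len(L), len(L[0])
    stats = []
    for j in range(N):
        cov = ov = conf = 0
        for row in L:
            if row[j] == 0:
                continue  # LF j abstained on this candidate
            cov += 1
            others = [row[k] for k in range(N) if k != j and row[k] != 0]
            if others:
                ov += 1
                if any(v != row[j] for v in others):
                    conf += 1
        stats.append((cov / M, ov / M, conf / M))
    return stats

L = [[1, 1], [1, -1], [0, 1], [0, 0]]
print(lf_stats(L))  # -> [(0.5, 0.5, 0.25), (0.75, 0.5, 0.25)]
```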

class
snorkel.learning.utils.
ListParameter
(name, parameter_list)[source]¶ List of parameter values for searching

draw_values
(n)¶


class
snorkel.learning.utils.
MentionScorer
(test_candidates, test_labels, gold_candidate_set=None)[source]¶ Scorer for mention-level assessment

score
(test_marginals, **kwargs)¶


class
snorkel.learning.utils.
ModelTester
(model_class, model_class_params, params_queue, scores_queue, X_train, X_valid, Y_valid, Y_train=None, b=0.5, set_unlabeled_as_neg=True, save_dir='checkpoints')[source]¶ 
authkey
¶

daemon
¶ Return whether process is a daemon

exitcode
¶ Return exit code of process or None if it has yet to stop

ident
¶ Return identifier (PID) of process or None if it has yet to start

is_alive
()¶ Return whether process is alive

join
(timeout=None)¶ Wait until child process terminates

name
¶

pid
¶ Return identifier (PID) of process or None if it has yet to start

start
()¶ Start child process

terminate
()¶ Terminate process; sends SIGTERM signal or uses TerminateProcess()


class
snorkel.learning.utils.
RandomSearch
(model_class, parameters, X_train, Y_train=None, n=10, **model_class_params)[source]¶ 
fit
(X_valid, Y_valid, b=0.5, set_unlabeled_as_neg=True, validation_kwargs={}, n_threads=1, save_dir='checkpoints', **model_hyperparams)¶


class
snorkel.learning.utils.
RangeParameter
(name, v1, v2, step=1, log_base=None)[source]¶ Range of parameter values for searching. min_value and max_value are the ends of the search range. If log_base is specified, the search range is scaled in that log base. step is the range step size, or the exponent step size if log_base is set.
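The two sampling regimes can be illustrated as follows (a simplified sketch of range-style sampling that ignores the step argument; not the library implementation):

```python
import random

def draw_range_values(v1, v2, n, log_base=None, seed=0):
    """Draw n values uniformly from [v1, v2]; if log_base is given,
    v1 and v2 are exponents and the values are log_base ** exponent."""
    rng = random.Random(seed)
    if log_base is None:
        return [rng.uniform(v1, v2) for _ in range(n)]
    return [log_base ** rng.uniform(v1, v2) for _ in range(n)]

# Learning rates drawn log-uniformly between 1e-4 and 1e-1
lrs = draw_range_values(-4, -1, 5, log_base=10)
```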

draw_values
(n)¶


class
snorkel.learning.utils.
Scorer
(test_candidates, test_labels, gold_candidate_set=None)[source]¶ Abstract type for scorers

snorkel.learning.utils.
binary_scores_from_counts
(ntp, nfp, ntn, nfn)[source]¶ Precision, recall, and F1 scores from counts of TP, FP, TN, FN. Example usage:
p, r, f1 = binary_scores_from_counts(*map(len, error_sets))
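The computation follows directly from the definitions (a re-implementation for illustration, named differently from the library function, guarding against empty denominators):

```python
def scores_from_counts(ntp, nfp, ntn, nfn):
    """Precision, recall, F1 from TP/FP/TN/FN counts; 0.0 when undefined."""
    prec = ntp / (ntp + nfp) if ntp + nfp else 0.0
    rec = ntp / (ntp + nfn) if ntp + nfn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

p, r, f1 = scores_from_counts(8, 2, 85, 5)  # p = 0.8, r = 8/13
```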

snorkel.learning.utils.
calibration_plots
(train_marginals, test_marginals, gold_labels=None)[source]¶ Show classification accuracy and probability histogram plots

snorkel.learning.utils.
candidate_conflict
(L)[source]¶ Given an N x M matrix where L_{i,j} is the label given by the jth LF to the ith candidate: Return the fraction of candidates which have > 1 (nonzero) labels _which are not equal_.

snorkel.learning.utils.
candidate_coverage
(L)[source]¶ Given an N x M matrix where L_{i,j} is the label given by the jth LF to the ith candidate: Return the fraction of candidates which have > 0 (nonzero) labels.

snorkel.learning.utils.
candidate_overlap
(L)[source]¶ Given an N x M matrix where L_{i,j} is the label given by the jth LF to the ith candidate: Return the fraction of candidates which have > 1 (nonzero) labels.

snorkel.learning.utils.
reshape_marginals
(marginals)[source]¶ Returns correctly shaped marginals as np array