Etc: Viewing and Annotating Data, Writing LFs

Using the Viewer to Inspect and Annotate Data

Helpers for Writing Labeling Functions

snorkel.lf_helpers.contains_token(c, tok, attrib='words', case_sensitive=False)

Checks whether any of the constituent Spans of Candidate c contains the token tok.

Parameters: attrib – The token attribute type (e.g. words, lemmas, poses)
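As a rough illustration of the check described above, the following sketch uses plain Python stand-ins rather than Snorkel objects. The names ToySpan and toy_contains_token are hypothetical and exist only for this example. They mimic the documented behavior (look up one token attribute on each Span and compare, ignoring case by default) and are not the library's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToySpan:
    """Hypothetical stand-in for a Span: maps attribute names to token lists."""
    attribs: Dict[str, List[str]] = field(default_factory=dict)


def toy_contains_token(spans, tok, attrib="words", case_sensitive=False):
    """Return True if any span lists `tok` among its `attrib` tokens."""
    target = tok if case_sensitive else tok.lower()
    for span in spans:
        tokens = span.attribs.get(attrib, [])
        if not case_sensitive:
            tokens = [t.lower() for t in tokens]
        if target in tokens:
            return True
    return False


# A toy binary candidate: two spans carrying word and lemma attributes.
candidate = [
    ToySpan({"words": ["Barack", "Obama"], "lemmas": ["barack", "obama"]}),
    ToySpan({"words": ["Michelle"], "lemmas": ["michelle"]}),
]

found_default = toy_contains_token(candidate, "obama")
found_strict = toy_contains_token(candidate, "obama", case_sensitive=True)
```

With the default case-insensitive comparison the lowercase query matches "Obama"; with case_sensitive=True it does not.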

snorkel.lf_helpers.get_between_tokens(c, attrib='words', n_max=1, case_sensitive=False)

TODO: write doc_string

snorkel.lf_helpers.get_doc_candidate_spans(c)

Get the Spans in the same document as Candidate c, where these Spans are arguments of Candidates.

snorkel.lf_helpers.get_left_tokens(c, window=3, attrib='words', n_max=1, case_sensitive=False)

Return the tokens within a window to the _left_ of the Candidate. For higher-arity Candidates, defaults to the _first_ argument.

Parameters:
window – The number of tokens to the left of the first argument to return
attrib – The token attribute type (e.g. words, lemmas, poses)

snorkel.lf_helpers.get_matches(lf, candidate_set, match_values=[1, -1])

A simple helper function to see how many matches (non-zero by default) an LF gets. Returns the matched set, which can then be directly put into the Viewer.
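The filtering idea behind this helper can be sketched without Snorkel installed. In the sketch below, toy_get_matches and lf_mentions_married are hypothetical names, and the "candidates" are plain strings rather than Candidate objects. It assumes only what the description states: an LF returns a label, and candidates whose label is in match_values are kept.

```python
def toy_get_matches(lf, candidates, match_values=(1, -1)):
    """Collect the candidates whose LF output is one of match_values."""
    matches = [c for c in candidates if lf(c) in match_values]
    print("%d matches" % len(matches))
    return matches


def lf_mentions_married(sentence):
    """Toy LF over raw strings: 1 if 'married' appears, else 0 (abstain)."""
    return 1 if "married" in sentence.lower() else 0


sentences = [
    "Alice married Bob in 2010.",
    "Alice met Bob at work.",
    "Carol is married to Dan.",
]
matched = toy_get_matches(lf_mentions_married, sentences)
```

Passing match_values=(0,) instead would return the candidates on which the LF abstains, which can be useful when checking coverage.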

snorkel.lf_helpers.get_right_tokens(c, window=3, attrib='words', n_max=1, case_sensitive=False)

Return the tokens within a window to the _right_ of the Candidate. For higher-arity Candidates, defaults to the _last_ argument.

Parameters:
window – The number of tokens to the right of the last argument to return
attrib – The token attribute type (e.g. words, lemmas, poses)
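The left and right windows can be pictured with a small sketch over a plain token list. The functions toy_left_tokens and toy_right_tokens are hypothetical and take token indices directly instead of a Candidate. The sketch covers single tokens only and does not model n_max; lowercasing when case_sensitive is False is an assumption about what that flag means here.

```python
def toy_left_tokens(tokens, span_start, window=3, case_sensitive=False):
    """Tokens in the `window` positions before index `span_start`."""
    left = tokens[max(0, span_start - window):span_start]
    return left if case_sensitive else [t.lower() for t in left]


def toy_right_tokens(tokens, span_end, window=3, case_sensitive=False):
    """Tokens in the `window` positions after index `span_end` (inclusive end)."""
    right = tokens[span_end + 1:span_end + 1 + window]
    return right if case_sensitive else [t.lower() for t in right]


words = ["Yesterday", "Dr.", "Alice", "Smith", "married",
         "Bob", "Jones", "in", "Paris", "."]

# First argument covers indices 2-3 ("Alice Smith");
# last argument covers indices 5-6 ("Bob Jones").
left = toy_left_tokens(words, span_start=2)
right = toy_right_tokens(words, span_end=6)
```

Only two tokens precede the first argument, so the left window comes back shorter than the requested three.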

snorkel.lf_helpers.get_sent_candidate_spans(c)

Get the Spans in the same Sentence as Candidate c, where these Spans are arguments of Candidates.

snorkel.lf_helpers.get_tagged_text(c)

Returns the text of c’s parent context with c’s unary spans replaced with tags {{A}}, {{B}}, etc. A convenience method for writing LFs based on e.g. regexes.
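To show why tagged text is convenient for regex-based LFs, here is a minimal sketch that works on a raw string and character offsets. The helper toy_tagged_text and the LF lf_married are hypothetical, and the end-exclusive offsets are an assumption of this example, not a statement about how Spans store their boundaries.

```python
import re


def toy_tagged_text(text, spans):
    """Replace each (start, end) span with {{A}}, {{B}}, ... by argument order."""
    tagged = text
    # Substitute from the rightmost span first so earlier offsets stay valid.
    indexed = sorted(enumerate(spans), key=lambda item: item[1][0], reverse=True)
    for arg_index, (start, end) in indexed:
        tag = "{{%s}}" % chr(ord("A") + arg_index)
        tagged = tagged[:start] + tag + tagged[end:]
    return tagged


def lf_married(tagged_sentence):
    """Toy LF: label 1 when the pattern '{{A}} married {{B}}' occurs, else 0."""
    pattern = r"\{\{A\}\} married \{\{B\}\}"
    return 1 if re.search(pattern, tagged_sentence) else 0


sentence = "Alice married Bob in 2010."
# Character offsets (end exclusive) of the two arguments "Alice" and "Bob".
tagged = toy_tagged_text(sentence, [(0, 5), (14, 17)])
label = lf_married(tagged)
```

Because the arguments are replaced by fixed tags, one pattern covers every pair of entity mentions instead of needing the entity strings themselves.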

snorkel.lf_helpers.get_text_between(c)

Returns the text between the two unary Spans of a binary-Span Candidate, where both are in the same Sentence.

snorkel.lf_helpers.get_text_splits(c)

Given a k-arity Candidate defined over k Spans, return the chunked parent context (e.g. Sentence) split around the k constituent Spans.

NOTE: Currently assumes that these Spans are in the same Context
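Splitting a context around its Spans can be sketched as follows. The function toy_text_splits is hypothetical, the offsets are end exclusive, and marking each Span's position with a letter tag is an assumption borrowed from the {{A}}, {{B}} convention above. The exact chunk format the library returns is not specified here.

```python
def toy_text_splits(text, spans):
    """Split `text` around non-overlapping (start, end) character spans."""
    # Keep each span's argument position, then order spans by offset.
    ordered = sorted((start, end, i) for i, (start, end) in enumerate(spans))
    chunks = []
    cursor = 0
    for start, end, arg_index in ordered:
        chunks.append(text[cursor:start])                    # text before the span
        chunks.append("{{%s}}" % chr(ord("A") + arg_index))  # marker for the span
        cursor = end
    chunks.append(text[cursor:])                             # trailing text
    return chunks


sentence = "Alice married Bob in 2010."
chunks = toy_text_splits(sentence, [(0, 5), (14, 17)])

# For two spans, the middle chunk is the text lying between them.
between = chunks[2]
```

In this sketch k Spans always yield 2k + 1 chunks, with empty strings where a Span touches the start or end of the text.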

snorkel.lf_helpers.is_inverted(c)

Returns True if the ordering of the candidates in the sentence is inverted.

Helpers for Loading External Annotations

class snorkel.loaders.ExternalAnnotationsLoader(session, candidate_class, candidate_set, annotation_key, expand_candidate_set=False)

Class to load external annotations.

add(temp_contexts)

Adds a candidate to a new or existing candidate_set.

Parameters: temp_contexts – This is a dictionary of TemporaryContext objects corresponding to the args of the Candidate class.

snorkel.loaders.create_or_fetch(session, set_class, instance_or_name)

Returns a named set ORM object, given either an instance or a name as a string.