Etc: Viewing and Annotating Data, Writing LFs

Using the Viewer to Inspect and Annotate Data

class snorkel.viewer.SentenceNgramViewer(candidates, session, gold=[], n_per_page=3, height=225, annotator_name=None)

Viewer for Sentence objects and candidate Spans within them

class snorkel.viewer.Viewer(candidates, session, gold=[], n_per_page=3, height=225, annotator_name=None)

Generic object for viewing and labeling Candidate objects in their rendered Contexts.

handle_label_event(_, content, buffers)

Handles label event by persisting new label

render()

Renders viewer pane

Helpers for Writing Labeling Functions

snorkel.lf_helpers.contains_token(c, tok, attrib='words', case_sensitive=False)

Checks if any of the constituent Spans contain a token.

Parameters:attrib – The token attribute type (e.g. words, lemmas, poses)
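
As a rough illustration of the check described above, the following sketch runs the same kind of membership test on plain dictionaries. The dictionaries are hypothetical stand-ins for Spans (they are not Snorkel objects), and the name contains_token_sketch is invented for this example.

```python
# Hypothetical stand-in for a Span: a dict mapping an attribute name
# (e.g. "words", "lemmas") to that span's list of tokens.
def contains_token_sketch(spans, tok, attrib="words", case_sensitive=False):
    """Return True if any span's `attrib` tokens include `tok`."""
    target = tok if case_sensitive else tok.lower()
    for span in spans:
        tokens = span.get(attrib, [])
        if not case_sensitive:
            tokens = [t.lower() for t in tokens]
        if target in tokens:
            return True
    return False


spans = [
    {"words": ["Barack", "Obama"], "lemmas": ["barack", "obama"]},
    {"words": ["Michelle"], "lemmas": ["michelle"]},
]
print(contains_token_sketch(spans, "obama"))                       # True
print(contains_token_sketch(spans, "obama", case_sensitive=True))  # False
```

With case_sensitive=False both the query token and the span tokens are lowercased before comparison, which is why the first call matches and the second does not.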

snorkel.lf_helpers.get_between_tokens(c, attrib='words', n_max=1, case_sensitive=False)

TODO: write doc_string

snorkel.lf_helpers.get_doc_candidate_spans(c)

Get the Spans in the same document as Candidate c, where these Spans are arguments of Candidates.

snorkel.lf_helpers.get_left_tokens(c, window=3, attrib='words', n_max=1, case_sensitive=False)

Return the tokens within a window to the _left_ of the Candidate. For higher-arity Candidates, defaults to the _first_ argument.

Parameters:
window – The number of tokens to the left of the first argument to return
attrib – The token attribute type (e.g. words, lemmas, poses)
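
To make the windowing concrete, here is a minimal sketch over a plain token list. It is not the library's implementation: the function name and the start index argument are invented for illustration, and treating n_max as the maximum n-gram length is an assumption, since the documentation above does not define that parameter.

```python
def left_window_ngrams(tokens, start, window=3, n_max=1, case_sensitive=False):
    """Return n-grams (n <= n_max) built from up to `window` tokens before `start`.

    `tokens` is a plain list of strings and `start` is the index of the
    argument's first token; both are stand-ins for real Span data.
    Assumption: n_max is the maximum n-gram length.
    """
    left = tokens[max(0, start - window):start]
    if not case_sensitive:
        left = [t.lower() for t in left]
    ngrams = []
    for n in range(1, n_max + 1):
        for i in range(len(left) - n + 1):
            ngrams.append(" ".join(left[i:i + n]))
    return ngrams


tokens = ["The", "patient", "was", "given", "aspirin", "daily"]
aspirin_index = 4
print(left_window_ngrams(tokens, aspirin_index))
# ['patient', 'was', 'given']
print(left_window_ngrams(tokens, aspirin_index, n_max=2))
# ['patient', 'was', 'given', 'patient was', 'was given']
```

A window to the right works the same way with the slice taken after the argument's last token instead of before its first.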

snorkel.lf_helpers.get_matches(lf, candidate_set, match_values=[1, -1])

A simple helper function to see how many matches (non-zero by default) an LF gets. Returns the matched set, which can then be directly put into the Viewer.
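
The idea of collecting the candidates on which an LF fires can be sketched as a simple filter. The example below uses plain strings as hypothetical candidates and an invented LF, so it shows only the matching logic and does not touch the Viewer.

```python
def get_matches_sketch(lf, candidates, match_values=(1, -1)):
    """Return the candidates whose LF output is one of `match_values`."""
    return [c for c in candidates if lf(c) in match_values]


def lf_mentions_married(text):
    # Hypothetical LF: vote positive if "married" appears, otherwise abstain (0).
    return 1 if "married" in text.lower() else 0


candidates = ["Ann married Bob", "Ann met Bob", "Bob is married to Cy"]
matched = get_matches_sketch(lf_mentions_married, candidates)
print(len(matched))  # 2
print(matched)       # ['Ann married Bob', 'Bob is married to Cy']
```

Restricting match_values to a single label, for example (1,), narrows the result to the candidates the LF labels positive.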

snorkel.lf_helpers.get_right_tokens(c, window=3, attrib='words', n_max=1, case_sensitive=False)

Return the tokens within a window to the _right_ of the Candidate. For higher-arity Candidates, defaults to the _last_ argument.

Parameters:
window – The number of tokens to the right of the last argument to return
attrib – The token attribute type (e.g. words, lemmas, poses)

snorkel.lf_helpers.get_sent_candidate_spans(c)

Get the Spans in the same Sentence as Candidate c, where these Spans are arguments of Candidates.

snorkel.lf_helpers.get_tagged_text(c)

Returns the text of c’s parent context with c’s unary spans replaced with tags {{A}}, {{B}}, etc. A convenience method for writing LFs based on e.g. regexes.
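
The sketch below shows why tagged text is convenient for regex-based LFs. It works on a raw string and character offsets rather than on Candidate objects. The tag_spans helper, the end-exclusive offsets, and the lf_married LF are all assumptions made for this example.

```python
import re


def tag_spans(text, spans):
    """Replace each (start, end) character span with {{A}}, {{B}}, ...

    Tags follow the order of `spans`; offsets are end-exclusive and the spans
    are assumed not to overlap.
    """
    labeled = [(start, end, chr(ord("A") + i)) for i, (start, end) in enumerate(spans)]
    # Substitute from right to left so earlier offsets stay valid.
    for start, end, letter in sorted(labeled, reverse=True):
        text = text[:start] + "{{" + letter + "}}" + text[end:]
    return text


def lf_married(tagged_text):
    # Hypothetical LF: positive if the first argument "married" the second.
    return 1 if re.search(r"\{\{A\}\} married \{\{B\}\}", tagged_text) else 0


sentence = "Ann married Bob in 1999."
tagged = tag_spans(sentence, [(0, 3), (12, 15)])
print(tagged)              # {{A}} married {{B}} in 1999.
print(lf_married(tagged))  # 1
```

Because the argument text is replaced by fixed tags, a single pattern covers every pair of names that appears in this construction.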

snorkel.lf_helpers.get_text_between(c)

Returns the text between the two unary Spans of a binary-Span Candidate, where both are in the same Sentence.

snorkel.lf_helpers.get_text_splits(c)

Given a k-arity Candidate defined over k Spans, return the chunked parent context (e.g. Sentence) split around the k constituent Spans.

NOTE: Currently assumes that these Spans are in the same Context
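
Splitting a context around its spans can be illustrated as follows. This is a simplified sketch on a raw string with end-exclusive character offsets, and the exact chunk format returned by the library function may differ from it.

```python
def split_around_spans(text, spans):
    """Split `text` into chunks around non-overlapping (start, end) spans.

    For k spans the result has 2k + 1 chunks, alternating between the
    surrounding context and the span text itself.
    """
    chunks, cursor = [], 0
    for start, end in sorted(spans):
        chunks.append(text[cursor:start])  # context before this span
        chunks.append(text[start:end])     # the span itself
        cursor = end
    chunks.append(text[cursor:])           # trailing context
    return chunks


sentence = "Ann married Bob in 1999."
chunks = split_around_spans(sentence, [(0, 3), (12, 15)])
print(chunks)  # ['', 'Ann', ' married ', 'Bob', ' in 1999.']
```

For a binary candidate, the middle chunk (index 2 here) is the text between the two arguments.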

snorkel.lf_helpers.is_inverted(c)

Returns True if the ordering of the candidates in the sentence is inverted.

snorkel.lf_helpers.test_LF(session, lf, split, annotator_name)

Gets the accuracy of a single LF on a split of the candidates, w.r.t. annotator labels, and also returns the error buckets of the candidates.
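
The kind of evaluation described above can be sketched without a database session. In this example the candidates are plain strings, the gold labels are a parallel list, and skipping both LF abstentions and unlabeled candidates is an assumption of the sketch rather than documented behavior.

```python
def score_lf(lf, candidates, gold):
    """Return (accuracy, buckets) for an LF against gold labels in {1, -1, 0}.

    Candidates where the LF abstains (0) or the gold label is missing (0)
    are skipped; this is an assumption of the sketch.
    """
    buckets = {"TP": [], "FP": [], "TN": [], "FN": []}
    for c, y in zip(candidates, gold):
        pred = lf(c)
        if pred == 0 or y == 0:
            continue
        if pred == 1:
            buckets["TP" if y == 1 else "FP"].append(c)
        else:
            buckets["TN" if y == -1 else "FN"].append(c)
    n = sum(len(b) for b in buckets.values())
    correct = len(buckets["TP"]) + len(buckets["TN"])
    accuracy = correct / n if n else None
    return accuracy, buckets


def lf_marriage(text):
    # Hypothetical LF: +1 for "married", -1 for "divorced", otherwise abstain.
    if "married" in text:
        return 1
    if "divorced" in text:
        return -1
    return 0


candidates = ["Ann married Bob", "Cy divorced Di", "Eve met Flo", "Gus married Hal"]
gold = [1, -1, 1, -1]
accuracy, buckets = score_lf(lf_marriage, candidates, gold)
print(round(accuracy, 3))  # 0.667
print(buckets["FP"])       # ['Gus married Hal']
```

Returning the buckets alongside the accuracy makes it easy to inspect the false positives and false negatives directly when refining an LF.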

Helpers for Loading External Annotations