pycognaize.model.Model

Bases: object

Inherit from this abstract class and implement the predict method.

The model inputs and outputs are available from the document attribute.
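The subclassing pattern described above can be sketched as follows. This is only an illustration under stated assumptions: `BaseModel` is a hypothetical stand-in for the abstract class (it is not pycognaize's `Model`), and the document is modelled as a plain dict with `inputs` and `outputs` keys, which is an assumed shape rather than the real `Document` interface.

```python
from abc import ABC, abstractmethod


class BaseModel(ABC):
    """Hypothetical stand-in for the abstract model class (not the real one)."""

    @abstractmethod
    def predict(self, document):
        """Fill in the outputs of `document` and return it."""


class UppercaseModel(BaseModel):
    """Toy model: copies every input value to the outputs in upper case."""

    def predict(self, document):
        # Assumed document shape: {"inputs": {...}, "outputs": {...}}
        document["outputs"] = {
            name: value.upper() for name, value in document["inputs"].items()
        }
        return document


result = UppercaseModel().predict({"inputs": {"name": "acme"}, "outputs": {}})
```

Because `predict` is abstract, instantiating the base class directly raises `TypeError`; only subclasses that implement it can be created.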

Methods

`compute_field_level_tp`	Given 2 extraction fields, actual and predicted, calculates tp, i.e. number of identical field matches.
`compute_tag_level_tp`	Given lists of actual and predicted tags calculates tp, i.e. number of identical matches.
`copy`
`detect_entity_matches`	Given actual and predicted groups/entities, finds pairs which have any match (even one tag in one of the fields). Returns a list of dicts, where each dict contains matched pairs' group keys.
`eval_field_level`	Evaluation on field level. Comparison is between all actual and predicted fields of each extraction field. Returns a dict with keys - field names, values - dict of computed metrics for that field.
`eval_group_level`	Evaluation on group level. Comparison is between all actual and predicted groups/entities.
`eval_tag_level`	Evaluation on tag level. Comparison is between all actual and predicted tags of each extraction field. Returns a dict with keys - extraction field names and values - dict of computed metrics for that field.
`evaluate`	General evaluate functionality by tag, field and group level.
`execute_based_on_match`
`execute_eval`	Execute evaluation for a given model_version
`execute_genie`	Alias for execute_genie_v2
`execute_genie_v2`	Execute genie for a given task_id
`get_field_level_conf_matrix`	Given two lists of actual and predicted fields, computes ConfusionMatrix.
`group_entities`	Group the entities by group keys.
`matches`	If tags are HTMLTag, checks that two tags have the same html_id; otherwise detects if there is a match between two extraction tags having the same page number.
`predict`

Attributes

`DEFAULT_TIMEOUT`
`RETRIES`

compute_field_level_tp(act_field, pred_field, only_content=False)[source]

Given 2 extraction fields, actual and predicted, calculates tp, i.e. the number of identical field matches.

Return type:

int

compute_tag_level_tp(act_tags, pred_tags, only_content=False)[source]

Given lists of actual and predicted tags, calculates tp, i.e. the number of identical matches.

Parameters:

act_tags (list)
pred_tags (list)

Return type:

int
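To make the idea of counting true positives concrete, here is a minimal sketch. It is not the library's implementation: the function name `count_true_positives`, the pluggable `is_match` predicate, and the greedy one-to-one pairing (each actual tag can be matched at most once) are all assumptions made for illustration.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def count_true_positives(
    act_tags: List[T],
    pred_tags: List[T],
    is_match: Callable[[T, T], bool] = lambda act, pred: act == pred,
) -> int:
    """Count predicted tags that match a not-yet-used actual tag."""
    remaining = list(act_tags)  # copy so the caller's list is untouched
    tp = 0
    for pred in pred_tags:
        for i, act in enumerate(remaining):
            if is_match(act, pred):
                tp += 1
                del remaining[i]  # assumed rule: an actual tag matches only once
                break
    return tp


tp = count_true_positives(["a", "b", "b"], ["b", "c", "b", "b"])
```

In this example the third predicted "b" finds no unused actual tag left, so it does not add to the count.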

detect_entity_matches(act_entities, pred_entities)[source]

Given actual and predicted groups/entities, finds pairs which have any match (even one tag in one of the fields). Returns matched_keys, a list of dicts where each dict contains the group keys of a matched pair.

Parameters:

act_entities (dict)
pred_entities (dict)

Return type:

list
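The "any match" rule can be illustrated with a small sketch. Everything here is an assumption for the sake of the example: entities are modelled as nested dicts of plain strings, a match means an identical string, and the `"act"` and `"pred"` keys of the returned dicts are hypothetical names, since the source does not state the actual key names.

```python
from typing import Dict, List

# Assumed shape: {group_key: {field_name: [tag values]}}
Entities = Dict[str, Dict[str, List[str]]]


def find_entity_matches(act_entities: Entities, pred_entities: Entities) -> List[dict]:
    """Pair up entities that share at least one tag in a same-named field."""
    matched_keys = []
    for act_key, act_fields in act_entities.items():
        for pred_key, pred_fields in pred_entities.items():
            shares_tag = any(
                set(act_tags) & set(pred_fields.get(name, []))
                for name, act_tags in act_fields.items()
            )
            if shares_tag:
                # Hypothetical key names for the matched pair
                matched_keys.append({"act": act_key, "pred": pred_key})
    return matched_keys


actual = {"g1": {"name": ["Acme"], "total": ["10"]}, "g2": {"name": ["Beta"]}}
predicted = {"p1": {"name": ["Acme"], "total": ["99"]}, "p2": {"name": ["Gamma"]}}
pairs = find_entity_matches(actual, predicted)
```

Here "g1" and "p1" are paired because they share one tag in the name field, even though their total fields differ.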

eval_field_level(act_document, pred_document, only_content=False)[source]

Evaluation on field level. Comparison is between all actual and predicted fields of each extraction field.

Parameters:

act_document (Document)
pred_document (Document)

Return type:

dict

Returns:

dict with keys - field names, values - dict of computed metrics for that field

eval_group_level(act_document, pred_document, only_content=False)[source]

Evaluation on group level. Comparison is between all actual and predicted groups/entities. Returns a dict with keys - metrics names, values - computed scores.

Parameters:

act_document (Document)
pred_document (Document)

Return type:

dict
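The source does not list which metrics the returned dict contains. As a hedged illustration only, the sketch below shows how a metrics dict of this general shape could be derived from a true-positive count; the choice of precision, recall and F1, the key names, and the function name `precision_recall_f1` are assumptions, not the library's documented output.

```python
def precision_recall_f1(tp: int, n_pred: int, n_act: int) -> dict:
    """Derive common scores from a true-positive count and the two totals."""
    precision = tp / n_pred if n_pred else 0.0  # share of predictions that are correct
    recall = tp / n_act if n_act else 0.0  # share of actual items that were found
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


scores = precision_recall_f1(tp=2, n_pred=4, n_act=2)
```

The zero checks avoid a division error when there are no predictions or no ground-truth items.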

eval_tag_level(act_document, pred_document, only_content=False)[source]

Evaluation on tag level. Comparison is between all actual and predicted tags of each extraction field.

Parameters:

act_document (Document)
pred_document (Document)

Return type:

dict

Returns:

dict with keys - extraction field names and values - dict of computed metrics for that field

abstract evaluate(act_document, pred_document, only_content=False)[source]

General evaluate functionality by tag, field and group level.

Parameters:

act_document (Document) – actual document (ground truth)
pred_document (Document) – predicted document
only_content – if True, evaluation ignores locations of tags and considers only content

Return type:

dict

execute_eval(token, url, model_version, ground_truth_id=None)[source]

Execute evaluation for a given model_version

Parameters:

token (str)
url (str)
model_version (str)
ground_truth_id (str)

Return type:

List[Response]

execute_genie(task_id, token, url)[source]

Alias for execute_genie_v2

Parameters:

task_id (str)
token (str)
url (str)

Return type:

Response

execute_genie_v2(task_id, token, url)[source]

Execute genie for a given task_id

Parameters:

task_id (str)
token (str)
url (str)

Return type:

Response

get_field_level_conf_matrix(act_fields, pred_fields, only_content=False)[source]

Given two lists of actual and predicted fields, computes ConfusionMatrix. Takes into account both classification and extraction fields. Ignores empty fields.

Parameters:

act_fields (list)
pred_fields (list)

Return type:

ConfusionMatrix

static group_entities(document)[source]

Group the entities by group keys. Fields having the same group key belong to the same entity. Returns dict of dicts, where keys are group keys of found entities and values are dicts mapping field names (keys) to the fields (values) belonging to that entity.

Parameters:

document (Document)

Return type:

dict

Returns:

dict of dicts
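The grouping described above can be sketched in a few lines. This is an illustration under assumptions, not the library's code: `Field` is a hypothetical stand-in with a `group_key` attribute, the input is a flat list of fields rather than a `Document`, and each field name maps to a list of fields.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Field(NamedTuple):
    """Hypothetical stand-in for a field that carries a group key."""
    name: str
    group_key: str
    value: str


def group_by_key(fields: List[Field]) -> Dict[str, Dict[str, List[Field]]]:
    """Return {group_key: {field_name: [fields]}}."""
    entities = defaultdict(lambda: defaultdict(list))
    for item in fields:
        entities[item.group_key][item.name].append(item)
    # Convert the nested defaultdicts to plain dicts before returning
    return {key: dict(by_name) for key, by_name in entities.items()}


fields = [
    Field("name", "g1", "Acme"),
    Field("total", "g1", "10"),
    Field("name", "g2", "Beta"),
]
grouped = group_by_key(fields)
```

Fields that share the group key "g1" end up in the same entity, keyed by their field names.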

static matches(act_tag, pred_tag, threshold=0.6)[source]

If tags are HTMLTag, checks that two tags have the same html_id; otherwise detects if there is a match between two extraction tags having the same page number. Returns True if intersection is greater than the threshold.

Parameters:

act_tag (Union[ExtractionTag, HTMLTag, HTMLTableTag])
pred_tag (Union[ExtractionTag, HTMLTag, HTMLTableTag])
threshold (float)

Return type:

bool
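The two-branch check described above can be sketched as follows. Several details are assumptions: `Box` is a hypothetical stand-in for the tag classes, coordinates are taken to be left/top/right/bottom values, and the "intersection" is computed as intersection over union, since the source does not say how the overlap is normalised.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Box:
    """Hypothetical stand-in for a tag; not a pycognaize class."""
    page: int
    left: float
    top: float
    right: float
    bottom: float
    html_id: Optional[str] = None


def boxes_match(act: Box, pred: Box, threshold: float = 0.6) -> bool:
    # HTML-style tags: compare identifiers only
    if act.html_id is not None and pred.html_id is not None:
        return act.html_id == pred.html_id
    # Location-based tags must be on the same page
    if act.page != pred.page:
        return False
    width = min(act.right, pred.right) - max(act.left, pred.left)
    height = min(act.bottom, pred.bottom) - max(act.top, pred.top)
    if width <= 0 or height <= 0:
        return False  # the boxes do not overlap
    inter = width * height
    act_area = (act.right - act.left) * (act.bottom - act.top)
    pred_area = (pred.right - pred.left) * (pred.bottom - pred.top)
    # Assumed overlap measure: intersection over union
    return inter / (act_area + pred_area - inter) > threshold


same_page = boxes_match(Box(1, 0, 0, 10, 10), Box(1, 0, 0, 10, 8))
other_page = boxes_match(Box(1, 0, 0, 10, 10), Box(2, 0, 0, 10, 10))
```

In the first call the overlap ratio is 80 / 100 = 0.8, which exceeds the default threshold of 0.6; the second call fails because the page numbers differ.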