pycognaize.model.Model

class Model

Bases: object

Inherit from this abstract class and implement the predict method.

The model inputs and outputs are available from the document attribute.

Methods

compute_field_level_tp

Given 2 extraction fields, actual and predicted, calculates tp, i.e. number of identical field matches.

compute_tag_level_tp

Given lists of actual and predicted tags, calculates tp, i.e. number of identical matches.

copy

detect_entity_matches

Given actual and predicted groups/entities, finds pairs which have any match (even one tag in one of the fields). Returns matched_keys, a list of dicts, where each dict contains the group keys of a matched pair.

eval_field_level

Evaluation on field level. Comparison is between all actual and predicted fields of each extraction field. Returns a dict with keys - field names, values - dict of computed metrics for that field.

eval_group_level

Evaluation on group level. Comparison is between all actual and predicted groups/entities.

eval_tag_level

Evaluation on tag level. Comparison is between all actual and predicted tags of each extraction field. Returns a dict with keys - extraction field names and values - dict of computed metrics for that field.

evaluate

General evaluate functionality by tag, field and group level. Takes the actual document (ground truth) and the predicted document, both of type Document, and returns a dict.

execute_based_on_match

execute_eval

Execute evaluation for a given model_version

execute_genie

Alias for execute_genie_v2

execute_genie_v2

Execute genie for a given task_id

get_field_level_conf_matrix

Given two lists of actual and predicted fields, computes ConfusionMatrix.

group_entities

Group the entities by group keys.

matches

If tags are HTMLTag, checks that two tags have the same html_id; otherwise detects whether two extraction tags on the same page match.

predict

Attributes

DEFAULT_TIMEOUT

RETRIES

compute_field_level_tp(act_field, pred_field, only_content=False)

Given 2 extraction fields, actual and predicted, calculates tp, i.e. number of identical field matches

Return type:

int

compute_tag_level_tp(act_tags, pred_tags, only_content=False)

Given lists of actual and predicted tags, calculates tp, i.e. number of identical matches.

Parameters:
  • act_tags (list)

  • pred_tags (list)

Return type:

int
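As a rough illustration of what counting identical matches can look like, the sketch below pairs each actual tag with at most one predicted tag and counts the pairs. It is not pycognaize's implementation: the tuple-based tags, the `count_identical_matches` name, the `is_match` argument and the one-to-one pairing rule are all assumptions made for this example.

```python
def count_identical_matches(act_tags, pred_tags, is_match=None):
    """Count true positives between two lists of tags.

    Hypothetical helper for illustration only. Each predicted tag
    can be paired with at most one actual tag.
    """
    if is_match is None:
        # Assumption: two tags are identical when they compare equal.
        is_match = lambda act, pred: act == pred

    unused = list(pred_tags)  # predicted tags not yet paired
    tp = 0
    for act in act_tags:
        for i, pred in enumerate(unused):
            if is_match(act, pred):
                tp += 1
                del unused[i]  # consume the predicted tag
                break
    return tp


# Toy tags as (page, content) tuples; real tags are richer objects.
actual = [("page1", "Total"), ("page1", "42.00"), ("page2", "EUR")]
predicted = [("page1", "42.00"), ("page2", "USD"), ("page1", "Total")]
tp = count_identical_matches(actual, predicted)
```

Consuming each predicted tag once keeps duplicates from being counted twice, so tp never exceeds the length of either list.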

detect_entity_matches(act_entities, pred_entities)

Given actual and predicted groups/entities, finds pairs which have any match (even one tag in one of the fields). Returns matched_keys, a list of dicts, where each dict contains the group keys of a matched pair.

Parameters:
  • act_entities (dict)

  • pred_entities (dict)

Return type:

list
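The "any match" rule can be sketched with plain dictionaries. This is only an illustration under stated assumptions: entities are modelled as `{group_key: {field_name: [tag, ...]}}`, tags are plain strings compared by equality, only fields with the same name are compared, and the `"act"`/`"pred"` keys of the returned dicts are invented here, since the page does not specify the exact layout of each dict.

```python
def find_entity_matches(act_entities, pred_entities):
    """Return pairs of group keys whose entities share at least one tag.

    Hypothetical helper for illustration; not the library's code.
    """
    matched_keys = []
    for act_key, act_fields in act_entities.items():
        for pred_key, pred_fields in pred_entities.items():
            # Assumption: only fields with the same name are compared.
            shared_names = act_fields.keys() & pred_fields.keys()
            if any(
                set(act_fields[name]) & set(pred_fields[name])
                for name in shared_names
            ):
                matched_keys.append({"act": act_key, "pred": pred_key})
    return matched_keys


act = {"g1": {"name": ["Acme"], "total": ["10"]}, "g2": {"name": ["Globex"]}}
pred = {"p1": {"name": ["Acme"], "total": ["99"]}, "p2": {"name": ["Initech"]}}
pairs = find_entity_matches(act, pred)
```

Here `g1` and `p1` are paired because their `name` fields share a tag, even though their `total` fields differ.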

eval_field_level(act_document, pred_document, only_content=False)

Evaluation on field level. Comparison is between all actual and predicted fields of each extraction field.

Return type:

dict

Returns:

dict with keys - field names, values - dict of computed metrics for that field

eval_group_level(act_document, pred_document, only_content=False)

Evaluation on group level. Comparison is between all actual and predicted groups/entities.

Return type:

dict

Returns:

dict with keys - metrics names, values - computed scores

eval_tag_level(act_document, pred_document, only_content=False)

Evaluation on tag level. Comparison is between all actual and predicted tags of each extraction field.

Return type:

dict

Returns:

dict with keys - extraction field names and values - dict of computed metrics for that field

abstract evaluate(act_document, pred_document, only_content=False)

General evaluate functionality by tag, field and group level.

Parameters:
  • act_document (Document) – actual document (ground truth)

  • pred_document (Document) – predicted document

  • only_content – if True, evaluation ignores locations of tags, considers only content

Return type:

dict

execute_eval(token, url, model_version, ground_truth_id=None)

Execute evaluation for a given model_version

Parameters:
  • token (str)

  • url (str)

  • model_version (str)

  • ground_truth_id (str)

Return type:

List[Response]

execute_genie(task_id, token, url)

Alias for execute_genie_v2

Parameters:
  • task_id (str)

  • token (str)

  • url (str)

Return type:

Response

execute_genie_v2(task_id, token, url)

Execute genie for a given task_id

Parameters:
  • task_id (str)

  • token (str)

  • url (str)

Return type:

Response

get_field_level_conf_matrix(act_fields, pred_fields, only_content=False)

Given two lists of actual and predicted fields, computes ConfusionMatrix. Takes into account both classification and extraction fields. Ignores empty fields.

Parameters:
  • act_fields (list)

  • pred_fields (list)

Return type:

ConfusionMatrix

static group_entities(document)
Group the entities by group keys.

Fields having the same group key belong to the same entity. Returns a dict of dicts: the outer keys are the group keys of the found entities, and each value is a dict mapping field names to the fields belonging to that entity.

Parameters:
  • document (Document)

Return type:

dict

Returns:

dict of dicts
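The dict-of-dicts shape described above can be pictured with a small stand-alone sketch. It does not use pycognaize objects: `FieldStub`, its attributes and `group_by_key` are hypothetical stand-ins, and storing a list of fields under each field name is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldStub:
    """Hypothetical stand-in for an extraction field."""
    name: str
    group_key: str
    value: str


def group_by_key(fields):
    """Group fields into {group_key: {field_name: [fields]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for field in fields:
        grouped[field.group_key][field.name].append(field)
    # Convert nested defaultdicts to plain dicts for a predictable result.
    return {key: dict(by_name) for key, by_name in grouped.items()}


fields = [
    FieldStub("item", "row1", "Widget"),
    FieldStub("amount", "row1", "10"),
    FieldStub("item", "row2", "Gadget"),
]
entities = group_by_key(fields)
```

Fields that share `row1` end up in one entity, so `entities["row1"]` holds both the `item` and `amount` fields while `entities["row2"]` holds only `item`.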

static matches(act_tag, pred_tag, threshold=0.6)
If tags are HTMLTag, checks that two tags have the same html_id; otherwise detects whether there is a match between two extraction tags having the same page number. Returns true if the intersection is greater than the threshold.

Return type:

bool
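For extraction tags, a threshold test of this kind can be sketched as follows. The page only says that the intersection must exceed the threshold, so the use of intersection over union as the overlap measure is an assumption, as are the `BoxTag` class, its coordinate attributes and the `boxes_match` name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxTag:
    """Hypothetical stand-in for an extraction tag with a bounding box."""
    page: int
    left: float
    top: float
    right: float
    bottom: float

    @property
    def area(self):
        return (self.right - self.left) * (self.bottom - self.top)


def boxes_match(act, pred, threshold=0.6):
    """Return True if two boxes on the same page overlap enough.

    Assumption: overlap is measured as intersection over union.
    """
    if act.page != pred.page:
        return False
    width = min(act.right, pred.right) - max(act.left, pred.left)
    height = min(act.bottom, pred.bottom) - max(act.top, pred.top)
    if width <= 0 or height <= 0:
        return False  # boxes do not overlap at all
    intersection = width * height
    union = act.area + pred.area - intersection
    return intersection / union > threshold


truth = BoxTag(page=1, left=0, top=0, right=10, bottom=10)
close = BoxTag(page=1, left=1, top=0, right=10, bottom=10)
shifted = BoxTag(page=1, left=5, top=0, right=15, bottom=10)
other_page = BoxTag(page=2, left=0, top=0, right=10, bottom=10)
```

With the default threshold, `close` matches `truth` (overlap ratio 0.9), `shifted` does not (ratio 1/3), and `other_page` is rejected before any geometry is compared.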