pycognaize.model.Model
- class Model[source]
Bases: object
Inherit from this abstract class and implement the predict method.
The model inputs and outputs are available from the document attribute.
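As a rough illustration of the subclassing pattern, the sketch below uses a stand-in abstract base class instead of the real pycognaize.model.Model, so it runs without the library installed. The names `BaseModel` and `UppercaseModel`, the dict-based document, and the `predict(document)` signature are all assumptions made for illustration; the actual signature should be checked against the library source.

```python
from abc import ABC, abstractmethod


class BaseModel(ABC):
    """Hypothetical stand-in for pycognaize.model.Model (not the real class)."""

    @abstractmethod
    def predict(self, document: dict) -> dict:
        """Fill in the outputs of `document` and return it (assumed signature)."""


class UppercaseModel(BaseModel):
    """Toy model: copies every input value to the outputs in upper case."""

    def predict(self, document: dict) -> dict:
        document["outputs"] = {
            name: value.upper() for name, value in document["inputs"].items()
        }
        return document


# The document here is a plain dict; the real Document class is richer.
doc = {"inputs": {"company": "acme ltd"}, "outputs": {}}
result = UppercaseModel().predict(doc)
```

Because `predict` is abstract in this sketch, the base class cannot be instantiated directly; only a subclass that implements it can.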
Methods
compute_field_level_tp: Given 2 extraction fields, actual and predicted, calculates tp, i.e. number of identical field matches.
compute_tag_level_tp: Given lists of actual and predicted tags calculates tp, i.e. number of identical matches.
copy
detect_entity_matches: Given actual and predicted groups/entities, finds pairs which have any match (even one tag in one of the fields). Returns matched_keys, a list of dicts, where each dict contains the matched pair's group keys.
eval_field_level: Evaluation on field level. Comparison is between all actual and predicted fields of each extraction field. Returns a dict with keys - field names, values - dict of computed metrics for that field.
eval_group_level: Evaluation on group level. Comparison is between all actual and predicted groups/entities.
eval_tag_level: Evaluation on tag level. Comparison is between all actual and predicted tags of each extraction field. Returns a dict with keys - extraction field names and values - dict of computed metrics for that field.
evaluate: General evaluate functionality by tag, field and group level. Takes act_document (Document, the ground truth) and pred_document (Document, the prediction) and returns a dict.
execute_based_on_match
execute_eval: Execute evaluation for a given model_version
execute_genie: Alias for execute_genie_v2
execute_genie_v2: Execute genie for a given task_id
get_field_level_conf_matrix: Given two lists of actual and predicted fields, computes ConfusionMatrix.
group_entities: Group the entities by group keys.
matches: If tags are HTMLTag checks that two tags have the same html_id.
predict
Attributes
DEFAULT_TIMEOUT
RETRIES
- compute_field_level_tp(act_field, pred_field, only_content=False)[source]
Given two extraction fields, actual and predicted, calculates the number of true positives (tp), i.e. the number of identical field matches.
- Return type:
int
- compute_tag_level_tp(act_tags, pred_tags, only_content=False)[source]
Given lists of actual and predicted tags, calculates the number of true positives (tp), i.e. the number of identical matches.
- Parameters:
act_tags (list)
pred_tags (list)
- Return type:
int
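To make the idea of counting identical matches concrete, here is a minimal sketch of one way to count true positives between two lists. It assumes one-to-one matching (each actual tag can be consumed by at most one predicted tag) and a caller-supplied match predicate; neither detail is confirmed by the documentation above, and `count_true_positives` is a hypothetical helper, not part of pycognaize.

```python
from typing import Callable, Sequence


def count_true_positives(
    act_tags: Sequence,
    pred_tags: Sequence,
    is_match: Callable[[object, object], bool],
) -> int:
    """Count predicted tags that match a not-yet-used actual tag."""
    unused = list(act_tags)
    tp = 0
    for pred in pred_tags:
        for i, act in enumerate(unused):
            if is_match(act, pred):
                tp += 1
                del unused[i]  # each actual tag can be matched only once
                break
    return tp


# Plain strings stand in for tags; equality stands in for the match rule.
act = ["Revenue", "2021", "2021"]
pred = ["2021", "Revenue", "Profit"]
tp = count_true_positives(act, pred, lambda a, p: a == p)
```

In this example two of the three predictions find a partner among the actual tags, so `tp` is 2.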
- detect_entity_matches(act_entities, pred_entities)[source]
Given actual and predicted groups/entities, finds pairs which have any match (even one tag in one of the fields).
- Parameters:
act_entities (dict)
pred_entities (dict)
- Return type:
list
- Returns:
matched_keys: list of dicts, where each dict contains the matched pair's group keys
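The pairing rule described for detect_entity_matches can be sketched as follows. This is an illustration under assumptions, not the library's implementation: entities are modelled as plain dicts of field name to list of tag values, tag equality stands in for the real tag match, and the `act_key` / `pred_key` names in the result are hypothetical.

```python
def find_entity_matches(act_entities: dict, pred_entities: dict) -> list:
    """Return the key pairs of entities that share at least one matching tag."""
    matched_keys = []
    for act_key, act_fields in act_entities.items():
        for pred_key, pred_fields in pred_entities.items():
            # Only fields present on both sides can contribute a match.
            shared_names = set(act_fields) & set(pred_fields)
            if any(
                set(act_fields[name]) & set(pred_fields[name])
                for name in shared_names
            ):
                matched_keys.append({"act_key": act_key, "pred_key": pred_key})
    return matched_keys


act_entities = {
    "g1": {"name": ["Alice"], "amount": ["100"]},
    "g2": {"name": ["Bob"], "amount": ["250"]},
}
pred_entities = {
    "p1": {"name": ["Alice"], "amount": ["999"]},
    "p2": {"name": ["Carol"], "amount": ["300"]},
}
matched = find_entity_matches(act_entities, pred_entities)
```

Here `g1` and `p1` are paired because they agree on the name tag, even though their amounts differ.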
- eval_field_level(act_document, pred_document, only_content=False)[source]
Evaluation on field level. Comparison is between all actual and predicted fields of each extraction field.
- Return type:
dict
- Returns:
dict with keys - field names, values - dict of computed metrics for that field
- eval_group_level(act_document, pred_document, only_content=False)[source]
Evaluation on group level. Comparison is between all actual and predicted groups/entities.
- Return type:
dict
- Returns:
dict with keys - metrics names, values - computed scores
- eval_tag_level(act_document, pred_document, only_content=False)[source]
Evaluation on tag level. Comparison is between all actual and predicted tags of each extraction field.
- Return type:
dict
- Returns:
dict with keys - extraction field names and values - dict of computed metrics for that field
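The shape of such a result, a dict keyed by field name whose values are dicts of metrics, might look like the sketch below. Which metrics the library actually reports is not stated here; precision, recall and F1 are assumed as typical choices, and `metrics_from_counts` is a hypothetical helper.

```python
def metrics_from_counts(tp: int, n_actual: int, n_predicted: int) -> dict:
    """Derive precision, recall and F1 from a true-positive count."""
    precision = tp / n_predicted if n_predicted else 0.0
    recall = tp / n_actual if n_actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Per-field counts as (tp, number of actual tags, number of predicted tags).
counts = {"company_name": (8, 10, 8), "total": (0, 5, 0)}
report = {field: metrics_from_counts(*c) for field, c in counts.items()}
```

The zero-count guards keep a field with no predictions from raising a division error; it simply scores 0.0 on every metric.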
- abstract evaluate(act_document, pred_document, only_content=False)[source]
General evaluate functionality by tag, field and group level.
- Parameters:
act_document (Document): actual document (ground truth)
pred_document (Document): predicted document
- Return type:
dict
- execute_eval(token, url, model_version, ground_truth_id=None)[source]
Execute evaluation for a given model_version
- Parameters:
token (str)
url (str)
model_version (str)
ground_truth_id (str)
- Return type:
List[Response]
- execute_genie(task_id, token, url)[source]
Alias for execute_genie_v2
- Parameters:
task_id (str)
token (str)
url (str)
- Return type:
Response
- execute_genie_v2(task_id, token, url)[source]
Execute genie for a given task_id
- Parameters:
task_id (str)
token (str)
url (str)
- Return type:
Response
- get_field_level_conf_matrix(act_fields, pred_fields, only_content=False)[source]
Given two lists of actual and predicted fields, computes a ConfusionMatrix. Takes into account both classification and extraction fields. Ignores empty fields.
- Parameters:
act_fields (list)
pred_fields (list)
- Return type:
- static group_entities(document)[source]
Group the entities by group keys.
Fields having the same group key belong to the same entity. Returns a dict of dicts, where the keys are the group keys of the found entities and the values are dicts mapping field names (keys) to the fields (values) belonging to that entity.
- Parameters:
document (Document)
- Return type:
dict
- Returns:
dict of dicts
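The dict-of-dicts layout described above can be illustrated with a small sketch. It assumes fields are plain dicts carrying a `name` and a `group_key`, and that each field name maps to a list of fields; both are simplifications for illustration, and `group_fields_by_key` is a hypothetical helper rather than the library's method.

```python
from collections import defaultdict


def group_fields_by_key(fields: list) -> dict:
    """Group fields into entities keyed by group key, then by field name."""
    entities = defaultdict(lambda: defaultdict(list))
    for field in fields:
        entities[field["group_key"]][field["name"]].append(field)
    # Convert nested defaultdicts to plain dicts for a predictable result.
    return {key: dict(by_name) for key, by_name in entities.items()}


fields = [
    {"name": "item", "group_key": "row-1", "value": "Laptop"},
    {"name": "price", "group_key": "row-1", "value": "1200"},
    {"name": "item", "group_key": "row-2", "value": "Mouse"},
]
entities = group_fields_by_key(fields)
```

Each outer key is one entity (here, one table row), and the inner dict collects the fields that share that group key.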
- static matches(act_tag, pred_tag, threshold=0.6)[source]
If the tags are HTMLTag, checks that the two tags have the same html_id; otherwise detects whether there is a match between two extraction tags having the same page number. Returns True if the intersection is greater than the threshold.
- Parameters:
act_tag (Union[ExtractionTag, HTMLTag, HTMLTableTag])
pred_tag (Union[ExtractionTag, HTMLTag, HTMLTableTag])
threshold (float)
- Return type:
bool
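For the non-HTML case, a thresholded overlap test between two boxes on the same page could look like the sketch below. The documentation does not say how the intersection is normalised, so intersection over union (IoU) is assumed here; the `Box` class and `boxes_match` function are hypothetical and do not mirror the real ExtractionTag API.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical stand-in for an extraction tag's page and coordinates."""
    page: int
    left: float
    top: float
    right: float
    bottom: float

    def area(self) -> float:
        return (self.right - self.left) * (self.bottom - self.top)


def boxes_match(act: Box, pred: Box, threshold: float = 0.6) -> bool:
    """Return True if both boxes are on the same page and IoU > threshold."""
    if act.page != pred.page:
        return False
    width = min(act.right, pred.right) - max(act.left, pred.left)
    height = min(act.bottom, pred.bottom) - max(act.top, pred.top)
    if width <= 0 or height <= 0:
        return False  # the boxes do not overlap at all
    intersection = width * height
    union = act.area() + pred.area() - intersection
    return intersection / union > threshold


actual = Box(page=1, left=0, top=0, right=10, bottom=10)
close = Box(page=1, left=0, top=0, right=10, bottom=8)
shifted = Box(page=1, left=5, top=5, right=15, bottom=15)
other_page = Box(page=2, left=0, top=0, right=10, bottom=10)
```

With the default threshold of 0.6, `close` matches `actual` (IoU of 0.8), while `shifted` (IoU of about 0.14) and `other_page` do not.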