pycognaize.common.table_utils.assign_indices_to_tables

assign_indices_to_tables(tables, all_tables=None, threshold=0.4)
If the document is an XBRL document, the function matches the tables based on the ordering of all tables in the document.

If it is not an XBRL document, the tables are grouped by page and, within each page, sorted from the left and ordered horizontally and vertically.

Returns a dict whose keys are indices based on the ordering described above and whose values are the corresponding tables.

Parameters:
  • tables – a list of tables that need to be indexed

  • all_tables (Optional[list]) – a list of all tables in the document. Required if the tables are from an XBRL document; defaults to None.

  • threshold (float) – intersection threshold; defaults to 0.4

Return type:

dict
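
The non-XBRL ordering described above can be illustrated with a minimal, standalone sketch. It does not call pycognaize and is not the library's implementation: the `TableBox` class, the `index_tables_by_page` helper, the zero-based indices, and the exact sort key (top coordinate first, then left) are all assumptions made for illustration, and the `threshold` parameter is not modeled here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TableBox:
    """Hypothetical stand-in for a table; not a pycognaize class."""
    name: str
    page: int
    left: float
    top: float


def index_tables_by_page(tables):
    """Return {index: table}, ordered by page, then top, then left.

    Illustrative only: the real function may order tables differently
    and also uses an intersection threshold, which is ignored here.
    """
    # Group the tables by the page they appear on.
    by_page = defaultdict(list)
    for table in tables:
        by_page[table.page].append(table)

    ordered = []
    for page in sorted(by_page):
        # Assumed reading order: top-to-bottom, with left-to-right as tie-breaker.
        ordered.extend(sorted(by_page[page], key=lambda t: (t.top, t.left)))

    # Assumed zero-based indices.
    return dict(enumerate(ordered))


tables = [
    TableBox("c", page=2, left=10, top=50),
    TableBox("b", page=1, left=300, top=40),
    TableBox("a", page=1, left=20, top=40),
]
indexed = index_tables_by_page(tables)
print({i: t.name for i, t in indexed.items()})  # {0: 'a', 1: 'b', 2: 'c'}
```

In this sketch, tables "a" and "b" share a page and a top coordinate, so the left coordinate decides their order; "c" comes last because it is on a later page.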