Leveraging Tables
Cognaize python SDK provides a wide functionality designed to handle
data in table forms of snapshot in the most efficient way. Two main
concepts for accessing and representing tables in Cognaize SDK
are TableField
and TableTag
classes.
See corresponding documentation in API reference Table Field and Table Tag.
Note
This tutorial requires understanding of the concepts of
Snapshot
Document
Field
ExtractionTag
If you are not familiar with these concepts, please refer to the Quick tutorial, API reference or Glossary first.
In the Quick tutorial we read a snapshot data using Snapshot.get()
and retrieved a TableField
object from document.x
.
>>> table_1
<TableField: table>
TableField
object can be used to extract the title of the table from the page,
or to convert the table to a JSON
format.
To get the table’s title data we use get_table_title()
method
which can be called without any additional parameters or with margin
and
n_lines_above
specified.
See the documentation of TableField.get_table_title() for more details.
>>> table_1.get_table_title()
'Sample table heading in form tutorial of Cognaize SDK'
In order to access the actual table structure, content and coordinates, we use the TableTag
class.
TableTag objects can be extracted from document
.
>>> table_1_tags = table_1.tags[0]
>>> table_1_tags
<TableTag: left: 8.6, right: 92.69999999999999, top: 12.0, bottom: 64.0998>
One of the most useful properties of TableTag
class is TableTag.df
method.
This property returns the table annotation in a pandas.DataFrame
format and afterwards we
can use the standard pandas functionality.
>>> table_1_tags.df
0 1 2
0 March 31, 2021 December 31, 2020
1 (unaudited)
2 Assets
3 Current assets:
4 Cash $ 10.9 $ —
5 Accounts receivable 11.6 8.6
6 Accounts receivable - related party 5.0 5.7
7 Prepaid expenses 0.3 0.4
8 Total current assets 27.8 14.7
9 Property, plant and equipment 377.6 371.8
10 Less: accumulated depreciation 51.1 46.5
11 Property, plant and equipment, net 326.5 325.3
12 Investment in unconsolidated affiliate 80.2 80.3
13 Other assets 0.5 0.6
14 Total assets $ 435.0 $ 420.9
15 Liabilities and members' equity
16 Current liabilities:
17 Accounts payable $ 29.2 $ 6.8
18 Accounts payable - related party 6.4 2.1
19 Accrued expenses and other liabilities 2.4 4.4
20 Accrued expenses - related party 0.1 0.3
21 Total current liabilities 38.1 13.6
22 Long-term debt 100.0 109.3
23 Deferred revenues 1.6 1.2
24 Other long-term liabilities 2.5 2.5
25 Members' equity 292.8 294.3
26 Total liabilities and members' equity $ 435.0 $ 420.9
Each TableTag
object consists of Cell. Access to Cells is provided through
properties cells
, cell_data
>>> cells = table_1_tags.cells
>>> f"{str(cells)[:400]}..."
'{(1, 1): <Cell: coords: (12.00000 , 61.80000 , 13.40000 , 8.60000 ) spans: (1 , 1 ) corner coords: (1 , 1 ) value: >, (2, 1): <Cell: coords: (12.00000 , 77.90000 , 13.40000 , 61.80000 ) spans: (1 , 1 ) corner coords: (2 , 1 ) value: ZZxCTGRLZeoIjx>, (3, 1): <Cell: coords: (12.00000 , 92.70000 , 13.40000 , 77.90000 ) spans: (1 , 1 ) corner coords: (3 , 1 ) value: kAMvVCPUBIACpWtZI>, (1...'