pycognaize.document.tag.html_tag.HTMLTableTag

class HTMLTableTag(tag_id, value, ocr_value, xpath, title, html_id, cell_data, html, source_ids, is_table=True)[source]

Bases: HTMLTagABC

Represents a table’s coordinate data in an XBRL document.

Parameters:
  • tag_id (str)

  • value (str)

  • ocr_value (str)

  • xpath (str)

  • title (str)

  • html_id (Union[str, List[str]])

  • cell_data (dict)

  • html (HTML)

  • source_ids

  • is_table (bool)

Methods

construct_from_raw

Builds an HTMLTableTag object from pycognaize raw data.

replace_nans_with_empty_html_tags

Replaces NaN values in a DataFrame with empty HTML tags.

set_class_confidence

to_dict

Converts HTMLTableTag to dict

Attributes

cell_data

cells

df

html

html_id

is_table

ocr_value

raw_df

source_ids

tag_id

title

value

xpath

classmethod construct_from_raw(raw, html)[source]

Builds an HTMLTableTag object from pycognaize raw data.

Parameters:
  • raw (dict): pycognaize field’s tag info

  • html (HTML)

Return type:

HTMLTableTag

replace_nans_with_empty_html_tags(df)[source]

Replaces NaN values in a DataFrame with empty HTML tags.

Parameters:

df (DataFrame)

Return type:

DataFrame
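
The idea of swapping NaN cells for empty placeholder objects can be sketched with plain pandas. This is a minimal illustration, not the pycognaize implementation: EmptyCellTag and fill_nans_with_empty_tags are hypothetical names introduced here, and the real method presumably inserts pycognaize's own empty HTML tag objects instead.

```python
from dataclasses import dataclass

import numpy as np
import pandas as pd


@dataclass(frozen=True)
class EmptyCellTag:
    """Hypothetical stand-in for an empty HTML tag; not a pycognaize class."""
    value: str = ""


def fill_nans_with_empty_tags(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df in which every NaN/None cell holds an EmptyCellTag."""
    # Cast to object first so columns can hold arbitrary Python objects.
    as_objects = df.astype(object)
    # Map each column cell by cell; non-missing values pass through unchanged.
    return as_objects.apply(
        lambda column: column.map(
            lambda cell: EmptyCellTag() if pd.isna(cell) else cell
        )
    )


# Small frame with one missing value per column.
example = pd.DataFrame({"a": [1.0, np.nan], "b": ["x", None]})
filled = fill_nans_with_empty_tags(example)
```

Returning a new DataFrame rather than mutating the input keeps the original table available for comparison; whether the library method does the same is not stated in this page.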

to_dict()[source]

Converts HTMLTableTag to dict

Return type:

dict
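
The pairing of a construct_from_raw classmethod with a to_dict method suggests a round trip between raw dictionaries and tag objects. The sketch below shows that general pattern on a toy class. ToyTableTag and the dictionary keys it reads are assumptions made for illustration only; they do not reflect the actual pycognaize raw schema or the HTMLTableTag constructor.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ToyTableTag:
    """Toy stand-in for a table tag; field names are illustrative only."""
    tag_id: str
    xpath: str
    cell_data: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def construct_from_raw(cls, raw: Dict[str, Any]) -> "ToyTableTag":
        # The keys read here are hypothetical, not the pycognaize raw format.
        return cls(
            tag_id=raw["tag_id"],
            xpath=raw["xpath"],
            cell_data=dict(raw.get("cell_data", {})),
        )

    def to_dict(self) -> Dict[str, Any]:
        # Copy cell_data so callers cannot mutate the tag through the result.
        return {
            "tag_id": self.tag_id,
            "xpath": self.xpath,
            "cell_data": dict(self.cell_data),
        }


raw_example = {"tag_id": "t1", "xpath": "/html/body/table[1]", "cell_data": {"1:1": "Revenue"}}
tag = ToyTableTag.construct_from_raw(raw_example)
round_tripped = tag.to_dict()
```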