pycognaize.common.utils

Functions

clean_ocr_data

Cleans the OCR data

compute_intersection_area

Given the word and coordinates of the area, computes area of intersection

:type word: dict
:param word: dictionary, containing coordinates, ocr_text and word_id_number of the word
:type left: int or float
:param left: left coordinate
:type right: int or float
:param right: right coordinate
:type top: int or float
:param top: top coordinate
:type bottom: int or float
:param bottom: bottom coordinate
:rtype: int or float
:return: area of intersection
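As a rough illustration of the computation described above, the overlap of two axis-aligned rectangles can be sketched as follows. This is not the library's implementation: the function name `intersection_area_sketch` and the coordinate keys `left`, `right`, `top` and `bottom` inside the word dictionary are assumptions made for the example.

```python
def intersection_area_sketch(word, left, right, top, bottom):
    """Area of overlap between a word box and a rectangular area.

    Hypothetical sketch: the key names used to read the word's
    coordinates are assumed, not taken from pycognaize.
    """
    # Overlap along each axis is the distance between the innermost edges.
    overlap_width = min(word["right"], right) - max(word["left"], left)
    overlap_height = min(word["bottom"], bottom) - max(word["top"], top)
    # A non-positive overlap on either axis means the boxes do not intersect.
    if overlap_width <= 0 or overlap_height <= 0:
        return 0
    return overlap_width * overlap_height


word = {"left": 10, "right": 30, "top": 5, "bottom": 15,
        "ocr_text": "total", "word_id_number": 1}
print(intersection_area_sketch(word, left=20, right=50, top=0, bottom=10))  # 50
```

The early return keeps the result at zero for boxes that only touch at an edge or do not overlap at all.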

compute_otsu_threshold

convert_coord_to_num

If the input is a string representation of a number (with or without a %), convert it into float.
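A minimal sketch of this kind of conversion is shown below. The name `coord_to_num_sketch` is hypothetical, and two behaviours are assumptions rather than documented facts: the percent sign is simply stripped (the value is not divided by 100), and anything that is not a numeric string is returned unchanged.

```python
def coord_to_num_sketch(value):
    """Convert strings such as "12.5" or "12.5%" to float (hypothetical sketch)."""
    if not isinstance(value, str):
        # Assumption: non-string inputs pass through untouched.
        return value
    stripped = value.strip().rstrip("%")
    try:
        return float(stripped)
    except ValueError:
        # Assumption: non-numeric strings are returned as they came in.
        return value


print(coord_to_num_sketch("12.5%"))  # 12.5
print(coord_to_num_sketch("7"))      # 7.0
```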

convert_tag_coords_to_percentages

detect_python_shell

Detect which Python shell is in use: Standard Interpreter, Jupyter Notebook, Interactive shell or other
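One common way to make this distinction relies on the `get_ipython` function, which IPython injects into the namespace and which is absent in the standard interpreter. The sketch below follows that idiom; the function name `detect_shell_sketch` and the returned strings are assumptions, and the library may detect the shell differently.

```python
def detect_shell_sketch():
    """Guess the running Python shell (hypothetical sketch)."""
    try:
        # get_ipython exists only when the code runs under IPython.
        shell_name = get_ipython().__class__.__name__  # noqa: F821
    except NameError:
        return "Standard Interpreter"
    if shell_name == "ZMQInteractiveShell":
        # Kernel-backed shells: Jupyter Notebook, JupyterLab, qtconsole.
        return "Jupyter Notebook"
    if shell_name == "TerminalInteractiveShell":
        # The IPython terminal REPL.
        return "Interactive shell"
    return "other"


print(detect_shell_sketch())
```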

directory_summary_hash

Computes hash of directory summary

empty_keys

filter_out_nested_lines

Filters out nested HTML text lines

find_first_word_coords

Detect the coordinates of the first occurrence

get_index_of_first_non_empty_list

group_sequence

Group a sequence of integers into lists of consecutive runs

Example:
test_list = [1, 2, 3, 8, 15, 23, 24, 25, 10, 11, 13, 15]
sequence_list = [[1, 2, 3], [23, 24, 25], [10, 11]]

:type list_of_integers: list
:param list_of_integers: list of integers
:return: list of grouped integers
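The example above suggests that runs of consecutive integers are collected in order of appearance and that isolated values (8, 15, 13) are dropped. A sketch that reproduces that example is given below; the name `group_consecutive_sketch` is hypothetical and the actual implementation may differ.

```python
def group_consecutive_sketch(list_of_integers):
    """Collect runs of two or more consecutive integers (hypothetical sketch)."""
    groups = []
    current = []
    for number in list_of_integers:
        if current and number == current[-1] + 1:
            # The number extends the current run.
            current.append(number)
        else:
            # The run is broken: keep it only if it has at least two members.
            if len(current) > 1:
                groups.append(current)
            current = [number]
    # Flush the final run.
    if len(current) > 1:
        groups.append(current)
    return groups


test_list = [1, 2, 3, 8, 15, 23, 24, 25, 10, 11, 13, 15]
print(group_consecutive_sketch(test_list))  # [[1, 2, 3], [23, 24, 25], [10, 11]]
```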

image_bytes_to_array

Convert image bytes into a NumPy array

image_string_to_array

Convert a bytestring into a NumPy array with OpenCV

img_to_black_and_white

Convert an image to a black-and-white-only image

infer_rows_from_words

Infer row coordinates by class crop (table, column, row)

intersects

Given the word and coordinates of the area, determines whether they intersect

is_float

Check if the string value is a valid number.
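A check of this kind is usually written by attempting the conversion and catching the failure. The sketch below shows that approach under the hypothetical name `is_float_sketch`; how the library treats edge cases such as "nan" or "inf" is not stated in the source, and in this sketch they count as valid because `float()` accepts them.

```python
def is_float_sketch(value):
    """Return True if float(value) succeeds (hypothetical sketch)."""
    try:
        float(value)
    except (TypeError, ValueError):
        # TypeError covers None and other non-convertible types,
        # ValueError covers strings that are not numbers.
        return False
    return True


print(is_float_sketch("3.14"))  # True
print(is_float_sketch("abc"))   # False
```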

join_path

Join multiple path parts into a single path, optionally converting it to an S3-style path
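To illustrate the idea, the sketch below joins parts with the operating system's separator by default and with forward slashes when an S3-style path is requested. The name `join_path_sketch` and the keyword argument `s3` are assumptions made for this example; the real signature of `join_path` is not shown in the source.

```python
import os


def join_path_sketch(*parts, s3=False):
    """Join path parts, optionally as an S3-style path (hypothetical sketch)."""
    if s3:
        # S3 keys use forward slashes regardless of the local OS, so strip
        # leading and trailing slashes from each part and rejoin with "/".
        return "/".join(part.strip("/") for part in parts if part)
    return os.path.join(*parts)


print(join_path_sketch("s3://bucket", "folder/", "file.json", s3=True))
# s3://bucket/folder/file.json
```

Stripping only the outer slashes leaves the double slash inside the `s3://` prefix intact.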

load_bson_by_path

preview_img

Preview the given image in a window

replace_object_ids_with_string

stick_word_boxes

Stick boxes with OpenCV

Classes

ConfusionMatrix