pycognaize.common.utils.find_first_word_coords

find_first_word_coords(text, ocr_data, case_sensitive=False, sort=False, clean=True, cleanup_regex=re.compile('[^a-zA-Z\\d)\\[\\](-.,]'))
Detect the coordinates of the first occurrence of text in ocr_data, if any. If the text is not found in ocr_data, return None.

ocr_data is a list of dictionaries. Each dictionary contains information about a single word and has the following keys: confidence, right, left, top, bottom, ocr_text, word_id_number.
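The word-dictionary keys listed above can be checked before calling the function. The following is a minimal sketch, not part of pycognaize: the helper name `missing_word_keys` and the sample values (including their types and units) are assumptions made for illustration.

```python
# Keys every word dictionary is documented to carry.
REQUIRED_KEYS = {
    "confidence", "right", "left", "top", "bottom",
    "ocr_text", "word_id_number",
}


def missing_word_keys(ocr_data):
    """Hypothetical helper: map entry index -> sorted list of missing keys."""
    problems = {}
    for index, word in enumerate(ocr_data):
        missing = REQUIRED_KEYS - set(word)
        if missing:
            problems[index] = sorted(missing)
    return problems


# Sample data; the numeric values are invented for illustration only.
sample_ocr_data = [
    {"confidence": 98.0, "left": 10, "right": 60, "top": 5, "bottom": 20,
     "ocr_text": "Total", "word_id_number": 1},
    {"confidence": 95.5, "left": 65, "right": 140, "top": 5, "bottom": 20,
     "ocr_text": "Revenue", "word_id_number": 2},
]

problems = missing_word_keys(sample_ocr_data)
```

An empty result means every entry carries all seven documented keys.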

Parameters:
  • case_sensitive (bool) – If True, the search will be case-sensitive

  • sort (bool) – If True, ocr_data will be ordered by word_id_number key before searching

  • clean (bool) – If True, disregard all non-alphanumeric characters in the search

  • cleanup_regex (re._pattern_type) – Optional. The regex to use for cleanup (takes effect only if clean=True)

  • text (str)

  • ocr_data (list)
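To make the clean and sort options more concrete, here is a rough sketch using only the standard library. It assumes that cleanup removes every character matched by cleanup_regex and that sorting orders entries by word_id_number in ascending order; the helper names `clean_text` and `sort_words` are hypothetical and do not come from pycognaize.

```python
import re

# Default pattern from the signature. It matches every character except
# letters, digits, and the punctuation ) [ ] plus the range "(" to "."
# which covers ( ) * + , - .
DEFAULT_CLEANUP = re.compile(r"[^a-zA-Z\d)\[\](-.,]")


def clean_text(text, cleanup_regex=DEFAULT_CLEANUP):
    """Assumed behaviour of clean=True: drop every match of the pattern."""
    return cleanup_regex.sub("", text)


def sort_words(ocr_data):
    """Assumed behaviour of sort=True: order entries by word_id_number."""
    return sorted(ocr_data, key=lambda word: word["word_id_number"])


cleaned = clean_text("Total: $1,234.50!")

words = [
    {"ocr_text": "Revenue", "word_id_number": 2},
    {"ocr_text": "Total", "word_id_number": 1},
]
ordered = [word["ocr_text"] for word in sort_words(words)]
```

Here `cleaned` is `"Total1,234.50"`: the colon, space, dollar sign and exclamation mark are removed, while the comma and period survive. Read literally, the default pattern therefore keeps a small set of punctuation rather than removing every non-alphanumeric character.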

Return type:

Optional[dict]

Returns:

Dictionary with word coordinates (keys: left, right, top, bottom, matched_words), where matched_words holds the original word coordinate data for the matched words.
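A call might look like the sketch below. The import path follows the module name at the top of this page; the sample ocr_data values, the `summarize_match` helper, and the assumption that matched_words is a sized collection (such as a list) are illustrative and not taken from the library.

```python
def summarize_match(result):
    """Hypothetical helper: (left, top, right, bottom, word count) or None."""
    if result is None:  # documented return value when the text is not found
        return None
    return (
        result["left"], result["top"], result["right"], result["bottom"],
        len(result["matched_words"]),  # assumes matched_words supports len()
    )


# Sample input; coordinate values are invented for illustration only.
ocr_data = [
    {"confidence": 98.0, "left": 10, "right": 60, "top": 5, "bottom": 20,
     "ocr_text": "Total", "word_id_number": 1},
    {"confidence": 95.5, "left": 65, "right": 140, "top": 5, "bottom": 20,
     "ocr_text": "Revenue", "word_id_number": 2},
]

try:
    from pycognaize.common.utils import find_first_word_coords
except ImportError:
    find_first_word_coords = None  # pycognaize is not installed

if find_first_word_coords is not None:
    result = find_first_word_coords("Revenue", ocr_data, sort=True)
    print(summarize_match(result))
```

Checking for None before reading the keys matters because the function returns None when the text is not found.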