pycognaize.common.utils.find_first_word_coords
- find_first_word_coords(text, ocr_data, case_sensitive=False, sort=False, clean=True, cleanup_regex=re.compile('[^a-zA-Z\\d)\\[\\](-.,]'))
- Detect the coordinates of the first occurrence of text in ocr_data, if any. If the text is not found in ocr_data, return None.
ocr_data is a list of dictionaries, each holding information about a single word. Each word dictionary has the following keys: confidence, right, left, top, bottom, ocr_text, word_id_number.
- Parameters:
case_sensitive (bool) – If True, the search is case-sensitive.
sort (bool) – If True, ocr_data is ordered by the word_id_number key before searching.
clean (bool) – If True, disregard all non-alphanumeric characters in the search.
cleanup_regex (re._pattern_type) – Optional. The regex to use for cleanup (has effect only if clean=True).
text (str)
ocr_data (list)
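To make the effect of clean and cleanup_regex more concrete, the following is a minimal sketch of what the default pattern removes from a string. It assumes the cleanup is applied by substituting every match with an empty string; the page does not state how the library applies the pattern internally, so treat that step (and the sample string) as an illustration only.

```python
import re

# Default pattern taken from the signature above (written here as a raw string).
cleanup_regex = re.compile(r'[^a-zA-Z\d)\[\](-.,]')

# ASSUMPTION: cleanup is done by deleting every matching character.
# The library's actual cleanup step may differ.
sample = "Total: $1,234.50!"
cleaned = cleanup_regex.sub("", sample)

print(cleaned)  # Total1,234.50
```

With this default pattern, letters, digits, and some punctuation (such as parentheses, square brackets, commas, and periods) are kept, while whitespace and symbols such as ":" or "$" are dropped.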
- Return type:
Optional[dict]
- Returns:
Dictionary with word coordinates (keys: left, right, top, bottom, matched_words). matched_words includes the original word coordinate data for the matched words.
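A possible usage sketch follows. The import path and call signature come from this page; the sample word dictionaries, their coordinate values, and the confidence scale are made up for illustration. The import is guarded so the snippet also runs where pycognaize is not installed.

```python
# Hypothetical OCR output: one dictionary per word, using the documented keys.
# Coordinate units and the confidence scale are assumptions for this example.
ocr_data = [
    {"word_id_number": 0, "ocr_text": "Total", "left": 10, "right": 50,
     "top": 100, "bottom": 112, "confidence": 0.98},
    {"word_id_number": 1, "ocr_text": "Revenue:", "left": 55, "right": 120,
     "top": 100, "bottom": 112, "confidence": 0.95},
    {"word_id_number": 2, "ocr_text": "1,234.50", "left": 125, "right": 180,
     "top": 100, "bottom": 112, "confidence": 0.91},
]

try:
    from pycognaize.common.utils import find_first_word_coords
except ImportError:
    find_first_word_coords = None  # library not available in this environment

result = None
if find_first_word_coords is not None:
    # Case-insensitive search (the default), ordering words by word_id_number.
    result = find_first_word_coords("revenue", ocr_data, sort=True)

if result is None:
    # Either the text was not found or pycognaize is not installed.
    print("no match")
else:
    print(result["left"], result["right"], result["top"], result["bottom"])
    print(len(result["matched_words"]), "matched word(s)")
```

Because the return type is Optional[dict], callers should check for None before reading the coordinate keys.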