How it works
In this template, a web page screenshot is used as the OCR image, but you could use any web image or a local image file.
Firefox is used to navigate to a website. The entire web page is captured and saved to a temporary local file (.jpg). The image is sent to the Google Vision API for OCR analysis, and a JSON response is returned. The response contains the text and the x, y coordinates of all words and characters in the screenshot.
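As a rough sketch of the same request and response handling outside Power Automate Desktop, the Python below builds an `images:annotate` request body for a local image and reads each detected word and its bounding-box vertices from the response. The helper names, the `api_key` argument, and the sample response values are illustrative assumptions and are not part of the template.

```python
import base64
import json
import urllib.request

VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_request(image_bytes, feature="TEXT_DETECTION"):
    """Build the JSON body for a single-image annotate request."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": feature}],
            }
        ]
    }


def extract_words(response):
    """Return (text, [(x, y), ...]) for each word in a TEXT_DETECTION response."""
    responses = response.get("responses") or [{}]
    annotations = responses[0].get("textAnnotations", [])
    words = []
    # The first annotation holds the full detected text; the rest are single words.
    for item in annotations[1:]:
        vertices = [
            # Zero-valued coordinates may be omitted from the JSON, so default to 0.
            (v.get("x", 0), v.get("y", 0))
            for v in item.get("boundingPoly", {}).get("vertices", [])
        ]
        words.append((item.get("description", ""), vertices))
    return words


def run_ocr(image_path, api_key):
    """Send a local image to the Vision API (needs network access and a valid key)."""
    with open(image_path, "rb") as f:
        body = json.dumps(build_request(f.read())).encode("utf-8")
    req = urllib.request.Request(
        f"{VISION_URL}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Hand-written sample in the shape of a Vision response (all values are made up).
sample = {
    "responses": [
        {
            "textAnnotations": [
                {"description": "Hello world", "boundingPoly": {"vertices": [
                    {"x": 10, "y": 5}, {"x": 120, "y": 5}, {"x": 120, "y": 30}, {"x": 10, "y": 30}]}},
                {"description": "Hello", "boundingPoly": {"vertices": [
                    {"x": 10, "y": 5}, {"x": 60, "y": 5}, {"x": 60, "y": 30}, {"x": 10, "y": 30}]}},
                {"description": "world", "boundingPoly": {"vertices": [
                    {"x": 70, "y": 5}, {"x": 120, "y": 5}, {"x": 120, "y": 30}, {"x": 70, "y": 30}]}},
            ]
        }
    ]
}

for text, box in extract_words(sample):
    print(text, box)
```

Calling `run_ocr` with the path of the saved screenshot and a valid key should return a dictionary that can be passed straight to `extract_words`.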
How to enable Google Cloud Vision API
Follow these steps to activate Google Vision AI and to use OCR with Power Automate Desktop.
- Line 1: Replace https://unsplash.com/photos/jVcha8wHtg8 with any website URL.
- Line 2: Replace C:\PowerAutomate\temporary.jpg with a temporary path and file name on your computer.
- Line 4: Replace API_KEY with your Google Cloud Vision API key.
- Line 11: Words and characters found by OCR are now stored into
- The Chrome and Edge web browsers seem to have issues with the Take screenshot of web page action. The action may fail with the following error message if the web page height is larger than the browser viewport:
Failed to capture image (error in communication with browser)
Google Vision AI includes two annotation features that support optical character recognition (OCR):
TEXT_DETECTION and DOCUMENT_TEXT_DETECTION. The built-in Google Vision action in Power Automate Desktop uses only
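To illustrate the dense-text feature: a DOCUMENT_TEXT_DETECTION response carries a `fullTextAnnotation` hierarchy of pages, blocks, paragraphs, words, and symbols. The sketch below rebuilds words from that hierarchy. The helper name and the sample response are hypothetical, written by hand only to show the shape of the data.

```python
def words_from_full_text(response):
    """Join symbols into words by walking pages > blocks > paragraphs > words."""
    responses = response.get("responses") or [{}]
    full = responses[0].get("fullTextAnnotation", {})
    words = []
    for page in full.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                for word in paragraph.get("words", []):
                    # Each word is a list of symbols (single characters).
                    text = "".join(s.get("text", "") for s in word.get("symbols", []))
                    box = [
                        (v.get("x", 0), v.get("y", 0))
                        for v in word.get("boundingBox", {}).get("vertices", [])
                    ]
                    words.append((text, box))
    return words


# Hand-written sample in the shape of a DOCUMENT_TEXT_DETECTION response (values are made up).
doc_sample = {
    "responses": [
        {
            "fullTextAnnotation": {
                "text": "OCR",
                "pages": [{"blocks": [{"paragraphs": [{"words": [{
                    "boundingBox": {"vertices": [
                        {"x": 4, "y": 8}, {"x": 40, "y": 8}, {"x": 40, "y": 20}, {"x": 4, "y": 20}]},
                    "symbols": [{"text": "O"}, {"text": "C"}, {"text": "R"}],
                }]}]}]}],
            }
        }
    ]
}

for text, box in words_from_full_text(doc_sample):
    print(text, box)
```

To request this feature over REST, the `type` field of the request's `features` entry is set to `DOCUMENT_TEXT_DETECTION` instead of `TEXT_DETECTION`.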
Do not expose your API key to anyone. Set API key restrictions to prevent unauthorized use and quota theft.
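If you call the API from a script rather than from the flow, one way to keep the key out of source files is to read it from an environment variable. This is a minimal sketch; the variable name `GOOGLE_VISION_API_KEY` and the helper name are arbitrary choices for illustration.

```python
import os


def load_api_key(env_var="GOOGLE_VISION_API_KEY"):
    """Read the API key from the environment instead of hard-coding it."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

This keeps the key out of files you might share, but it does not replace the key restrictions set in the Google Cloud console.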
- 2022-01-01: Updated flow actions to work with Power Automate Desktop version 2.15.284.21354