Find text from images using Google Vision AI and OCR

This walkthrough shows how to extract text from images with Optical Character Recognition (OCR) using Power Automate Desktop.
How it works

In this template, a web page screenshot is used as the OCR image, but you could use any web image or a local image file.

Firefox is used to navigate to a website. The entire web page is captured and saved to a temporary local file (.jpg). The image is sent to the Google Vision API for OCR analysis, and a JSON response is returned. The response contains the text and the x, y coordinates of all words and characters in the screenshot, for example:

        ...
        {
          "description": "PIZZA",
          "boundingPoly": {
            "vertices": [
              {
                "x": 673,
                "y": 395
              },
              {
                "x": 716,
                "y": 395
              },
              {
                "x": 716,
                "y": 421
              },
              {
                "x": 673,
                "y": 421
              }
            ]
          }
        },
        ...
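
Outside Power Automate Desktop, reading entries of this shape can be sketched in a few lines of Python. This is a minimal illustration, not the flow's actual logic: the function name `extract_words` and the choice of the first vertex as the word's x, y position are assumptions, and `SAMPLE` simply repeats the excerpt above wrapped in a list. In a full Vision REST response, the entries are expected under `responses[0].textAnnotations`, which the excerpt does not show.

```python
import json

# The excerpt from above, wrapped in a list so it parses as valid JSON.
SAMPLE = """
[
  {
    "description": "PIZZA",
    "boundingPoly": {
      "vertices": [
        {"x": 673, "y": 395},
        {"x": 716, "y": 395},
        {"x": 716, "y": 421},
        {"x": 673, "y": 421}
      ]
    }
  }
]
"""


def extract_words(annotations):
    """Return (word, x, y) tuples for a list of annotation entries.

    Assumption: the first vertex of each bounding polygon is used as the
    word's position. Missing coordinates default to 0 as a defensive choice.
    """
    words = []
    for entry in annotations:
        vertices = entry.get("boundingPoly", {}).get("vertices", [])
        if not vertices:
            continue  # skip entries without a bounding polygon
        first = vertices[0]
        words.append(
            (entry.get("description", ""), first.get("x", 0), first.get("y", 0))
        )
    return words


if __name__ == "__main__":
    for word, x, y in extract_words(json.loads(SAMPLE)):
        print(word, x, y)
```

Each tuple corresponds to one word with one coordinate pair, which is roughly what the flow stores per word in its own variables.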

How to enable Google Cloud Vision API

Follow these steps to activate Google Vision AI and to use OCR with Power Automate Desktop.

Instructions
  • Line 1: Replace https://unsplash.com/photos/jVcha8wHtg8 with any website URL.
  • Line 2: Replace C:\PowerAutomate\temporary.jpg with a temporary path and file name on your computer.
  • Line 4: Replace API_KEY with your Google Cloud Vision API key.
  • Line 11: Words and characters found by OCR are now stored in the Word, X and Y variables.
Notes
  • The Chrome and Edge web browsers seem to have issues with the Take screenshot of web page action. The action may fail with the following error message if the web page is taller than the browser viewport:

Failed to capture image (error in communication with browser)

  • Google Vision AI includes two annotation features that support optical character recognition (OCR): TEXT_DETECTION and DOCUMENT_TEXT_DETECTION. The built-in Google Vision action in Power Automate Desktop uses only TEXT_DETECTION.

  • Do not expose your API key to anyone. Set API key restrictions to prevent unauthorized use and quota theft.
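
If DOCUMENT_TEXT_DETECTION is needed, one option is to call the Vision REST endpoint directly instead of using the built-in action. The Python sketch below is an illustration under stated assumptions, not part of the template: the helper names `build_request_body` and `annotate` and the environment variable `VISION_API_KEY` are hypothetical. The key is read from the environment so that it is not hard-coded in the script.

```python
import base64
import json
import os
import urllib.request

# Google Cloud Vision REST endpoint for image annotation.
ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_request_body(image_bytes, feature_type="TEXT_DETECTION"):
    """Build the JSON body for one image and one OCR feature type."""
    return {
        "requests": [
            {
                # The image is sent inline as base64-encoded content.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": feature_type}],
            }
        ]
    }


def annotate(image_path, api_key, feature_type="TEXT_DETECTION"):
    """POST the image to the Vision API and return the parsed JSON response."""
    with open(image_path, "rb") as image_file:
        body = build_request_body(image_file.read(), feature_type)
    request = urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Assumption: the key is stored in an environment variable of this name.
    key = os.environ["VISION_API_KEY"]
    result = annotate(r"C:\PowerAutomate\temporary.jpg", key, "DOCUMENT_TEXT_DETECTION")
    print(json.dumps(result, indent=2)[:500])
```

Keeping the key out of the script does not replace API key restrictions in the Google Cloud console; it only avoids exposing the key when the script is shared.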

Changes
  • 2022-01-01: Updated flow actions to work with Power Automate Desktop version 2.15.284.21354
PAG Admin