Basic web scraper built with Power Automate for Desktop

A functional web scraper, complete with its flow actions, that shows how to scrape websites, traverse links, and download content.
How it works

This is a non-interactive web scraper, meaning that it does not use browser automation (Chrome, Edge, Firefox) for scraping. Instead, all web page requests are sent with the Download from web flow action.
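
For readers who think in code, the same idea can be sketched outside Power Automate. The Python below is only an illustration, not the flow itself: it requests a page with a plain HTTP GET (no browser) and collects link targets from the returned static HTML. The names `fetch_page`, `extract_links`, and the `ExampleScraper/1.0` user agent are assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags in static HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return every <a href="..."> value in document order."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def fetch_page(url, user_agent="ExampleScraper/1.0"):
    """Plain HTTP GET with a custom User-Agent (hypothetical value)."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Because no browser is involved, only the markup that the server returns is visible to `extract_links`.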

Instructions
  • Line 1: Insert the URL to be scraped.
  • Line 2: Set the folder that files should be downloaded to.
  • Line 3: (Optional) Change the file extensions that should be downloaded.
  • Line 4: (Optional) Change the custom user agent header that is sent to websites.
  • Line 5: (Optional) Change MaxPageRequests (default is 30).
  • Line 6: (Optional) Change MaxFileRequests (default is 50).
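
To show how these settings might work together, here is a rough Python sketch of a crawl loop bounded by the two request limits. It is an assumption about the general shape of such a loop, not a transcription of the flow: the breadth-first order, the `ScraperSettings` and `crawl` names, and the default extensions and user agent are all hypothetical. Only the limits of 30 and 50 come from the list above.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ScraperSettings:
    start_url: str                              # Line 1
    download_folder: str                        # Line 2
    file_extensions: tuple = (".pdf", ".jpg")   # Line 3 (hypothetical defaults)
    user_agent: str = "ExampleScraper/1.0"      # Line 4 (hypothetical value)
    max_page_requests: int = 30                 # Line 5
    max_file_requests: int = 50                 # Line 6


def crawl(settings, get_links):
    """Visit pages breadth-first until a limit is hit.

    get_links(url) must return the absolute URLs found on that page.
    Returns (pages requested, files selected for download).
    """
    queue = deque([settings.start_url])
    seen = {settings.start_url}
    pages, files = [], []
    while queue and len(pages) < settings.max_page_requests:
        url = queue.popleft()
        pages.append(url)
        for link in get_links(url):
            if link in seen:
                continue
            seen.add(link)
            if link.lower().endswith(settings.file_extensions):
                # A file link: keep it only while under the file limit.
                if len(files) < settings.max_file_requests:
                    files.append(link)
            else:
                # A page link: queue it for a later request.
                queue.append(link)
    return pages, files
```

Passing `get_links` as a parameter keeps the loop independent of how pages are actually fetched, which makes it easy to try the limits against a small in-memory site.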
Limits

The scraper has the following limits:

  • Only static HTML content is scraped, so content rendered by JavaScript is not available.
  • The target website must use absolute URLs (the scraper could be improved to understand relative URLs).
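
As an illustration of what relative URL support would involve, the Python sketch below resolves an href against the URL of the page it was found on. It uses the standard library's `urljoin` and `urldefrag`; the `to_absolute` helper name is an assumption for this example, and the flow itself does not currently do this.

```python
from urllib.parse import urldefrag, urljoin


def to_absolute(page_url, href):
    """Resolve href against page_url and drop any #fragment."""
    absolute = urljoin(page_url, href)
    return urldefrag(absolute).url
```

For a page at `https://example.com/docs/index.html`, the href `guide.pdf` resolves to `https://example.com/docs/guide.pdf`, and `/files/a.zip` resolves to `https://example.com/files/a.zip`. An href that is already absolute is returned unchanged apart from the removed fragment.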
Notes
  • This scraper does not obey robots.txt. If you plan to use it on a website you do not own, you should comply with that site's robots.txt: do not scrape Disallowed pages, and obey Crawl-Delay.
  • Do not perform malicious activities.
  • A Wait 1 flow action has been added after each webpage request, so the flow waits one second after each request. Many websites run on shared hosts with limited resources, and they deserve our understanding. One second is the minimum wait value that should be used.
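
Since the flow does not read robots.txt itself, the check has to be done separately. One possible way to do it, sketched here in Python with the standard library's `urllib.robotparser`, is to parse the site's robots.txt text, ask whether a URL may be fetched, and take the larger of the one-second minimum and any Crawl-Delay. The helper names `build_robots` and `polite_delay` and the `ExampleScraper/1.0` user agent are assumptions for this example.

```python
from urllib.robotparser import RobotFileParser


def build_robots(robots_txt):
    """Parse the text of a robots.txt file that was already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_delay(parser, user_agent, minimum=1.0):
    """Seconds to wait between requests: Crawl-Delay, but never below minimum."""
    delay = parser.crawl_delay(user_agent)
    if delay is None:
        return minimum
    return max(minimum, float(delay))


# Usage with a small inline robots.txt (hypothetical content).
robots = build_robots("User-agent: *\nDisallow: /private/\nCrawl-delay: 5\n")
allowed = robots.can_fetch("ExampleScraper/1.0", "https://example.com/public")
blocked = robots.can_fetch("ExampleScraper/1.0", "https://example.com/private/x")
wait_seconds = polite_delay(robots, "ExampleScraper/1.0")
```

A scraper following this approach would skip any URL for which `can_fetch` returns false and sleep for `wait_seconds` after each request.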
PAG Admin