How it works
This is a non-interactive web scraper, meaning that it does not use browser automation (Chrome, Edge, Firefox) for scraping. Instead, all web page requests are sent with the Download from web flow action.
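Because no browser renders the page, a scraper like this works on the raw HTML text returned by each request and presumably has to pull the links out of that markup itself. The following is a minimal Python sketch of that step, not the flow's actual implementation. The `LinkCollector` class and `extract_links` function are hypothetical names, and, like the flow, the sketch keeps only absolute `http(s)` URLs.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href/src attribute values from raw HTML (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value.strip())


def extract_links(html: str) -> list[str]:
    """Return absolute http(s) links in document order, without duplicates."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    seen: set[str] = set()
    result: list[str] = []
    for link in parser.links:
        # Relative links such as "/about" are skipped, mirroring the flow's limitation.
        if link.lower().startswith(("http://", "https://")) and link not in seen:
            seen.add(link)
            result.append(link)
    return result


html = (
    '<a href="https://example.com/a.pdf">A</a> '
    '<a href="/about">About</a> '
    '<img src="https://example.com/b.png">'
)
print(extract_links(html))  # ['https://example.com/a.pdf', 'https://example.com/b.png']
```

Using the standard-library `html.parser` instead of a regular expression copes with attribute order, quoting style, and extra whitespace without pulling in a third-party dependency.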
- Line 1: Enter the URL to be scraped.
- Line 2: Set the folder that files should be downloaded to.
- Line 3: (Optional) Change which file extensions are downloaded.
- Line 4: (Optional) Change the custom User-Agent header that is sent to websites.
- Line 5: (Optional) Change MaxPageRequests (default is
- Line 6: (Optional) Change MaxFileRequests (default is
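The six settings above can be pictured as the parameters of a breadth-first crawl loop. The sketch below is a hypothetical Python rendering, not the flow itself: `crawl`, `fetch_page`, and `download_file` are illustrative names, and the fetch and download steps are passed in as functions so the loop can be tried without network access. Two further assumptions are made here that the flow may not share: page links are followed only on the start URL's host, and links are found with a simplified regular expression. No default values are assumed for the two request limits, so the caller must supply them.

```python
import re
import time
from collections import deque
from typing import Callable
from urllib.parse import urlparse

# Absolute http(s) links in href/src attributes only (a simplified stand-in for HTML parsing).
LINK_RE = re.compile(r"""(?:href|src)\s*=\s*["'](https?://[^"']+)["']""", re.IGNORECASE)


def crawl(
    start_url: str,
    file_extensions: set[str],
    max_page_requests: int,
    max_file_requests: int,
    fetch_page: Callable[[str], str],
    download_file: Callable[[str], None],
    wait_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[list[str], list[str]]:
    """Breadth-first crawl (assumed to stay on the start URL's host); returns (pages, files)."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages: list[str] = []
    files: list[str] = []
    while queue and len(pages) < max_page_requests:
        url = queue.popleft()
        html = fetch_page(url)
        pages.append(url)
        sleep(wait_seconds)  # pause after every request, like the flow's Wait 1 action
        for link in LINK_RE.findall(html):
            if link in seen:
                continue
            seen.add(link)
            parsed = urlparse(link)
            if any(parsed.path.lower().endswith(ext) for ext in file_extensions):
                # File link: download it only while under the file request limit.
                if len(files) < max_file_requests:
                    download_file(link)
                    files.append(link)
                    sleep(wait_seconds)
            elif parsed.netloc == host:
                queue.append(link)  # page link on the same host: crawl it later
    return pages, files


# Usage with an in-memory "site" so no network access is needed.
site = {
    "https://example.com/": '<a href="https://example.com/docs">Docs</a> <a href="https://example.com/a.pdf">A</a>',
    "https://example.com/docs": '<a href="https://example.com/b.pdf">B</a> <a href="https://other.org/">Other</a>',
}
downloaded: list[str] = []
pages, files = crawl(
    "https://example.com/", {".pdf"}, max_page_requests=10, max_file_requests=1,
    fetch_page=site.__getitem__, download_file=downloaded.append, sleep=lambda s: None,
)
print(pages)  # ['https://example.com/', 'https://example.com/docs']
print(files)  # ['https://example.com/a.pdf']
```

In this example the second PDF is skipped because `max_file_requests=1` has already been reached, and `https://other.org/` is skipped because it is on a different host.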
The scraper has the following limitations:
- The target website must use absolute URLs. Relative URLs are not resolved (this could be improved).
- This scraper does not obey robots.txt. If you use it on a website you do not own, comply with that site's robots.txt: do not scrape disallowed pages, and obey its Crawl-delay.
- Do not perform malicious activities.
- A Wait 1 flow action has been added after each web page request, so the flow pauses for one second after every request. Many websites run on shared hosts with limited resources and deserve our consideration. One second is the minimum wait that should be used.
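Honoring robots.txt, stretching the one-second wait when a site asks for more, and the suggested support for relative URLs could all be handled with Python's standard library, as sketched below. This is a hypothetical illustration and not part of the flow. `allowed`, `polite_delay`, and the `MyScraper/1.0` user agent are made-up names. The robots.txt rules are parsed from a string so the example runs offline; a real scraper would call `RobotFileParser.set_url(...)` followed by `read()`.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

USER_AGENT = "MyScraper/1.0"  # hypothetical user agent string
MIN_WAIT_SECONDS = 1.0  # the flow's minimum wait between requests

# Example robots.txt content (made up for this sketch).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())


def allowed(url: str) -> bool:
    """True if robots.txt permits USER_AGENT to fetch url."""
    return robots.can_fetch(USER_AGENT, url)


def polite_delay() -> float:
    """Seconds to wait after a request: the site's Crawl-delay if longer than the minimum."""
    delay = robots.crawl_delay(USER_AGENT)
    return max(MIN_WAIT_SECONDS, float(delay)) if delay is not None else MIN_WAIT_SECONDS


# Resolve relative links against the page they were found on.
page_url = "https://example.com/docs/index.html"
print(urljoin(page_url, "guide.pdf"))  # https://example.com/docs/guide.pdf
print(urljoin(page_url, "/private/report.pdf"))  # https://example.com/private/report.pdf
print(allowed("https://example.com/private/report.pdf"))  # False
print(allowed("https://example.com/docs/guide.pdf"))  # True
print(polite_delay())  # 5.0
```

Taking the larger of the site's Crawl-delay and the one-second minimum keeps the flow's wait as a floor while still respecting sites that ask for slower crawling.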