Screaming Frog is a powerful SEO Spider that performs in-depth on-site SEO analysis. In this guide, we will look at some of its main features, which are very useful during an SEO analysis. The free version of Screaming Frog allows you to analyze up to 500 URLs.
Screaming Frog allows you to crawl a specific website, subdomain or directory.
In the paid version, the SEO Spider lets you select the “Crawl All Subdomains” option if you have more than one subdomain. If you only need to crawl one subdomain, simply add its URL in the appropriate box.
The most commonly used features include monitoring status codes on a website (4xx, 5xx, 200 and 3xx).
By default, Screaming Frog crawls a directory when you simply add its address in the bar, as shown in the image below.
If you need to perform advanced crawling, you can use a wildcard that tells the SEO Spider to crawl only the pages matching a pattern. The path to use this feature is:
Spider > Include, then add the desired syntax in the box that appears. For example, with the syntax https://www.bytekmarketing.com/about/.* the spider only crawls the sections of the website under the “About Us” branch, i.e. all the resources that follow the wildcard character. Starting the crawl will extract all the child URLs of the “About Us” section, for example: https://www.bytekmarketing.com/about/roberto-paolucci or https://www.bytekmarketing.com/about/mario-rossi.
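Since Screaming Frog’s include filters are regular expressions, the behavior of a pattern like the one above can be sketched with a quick regex check in Python (the blog URL below is a hypothetical example of a page outside the branch):

```python
import re

# Include pattern as used in the Include box (a regular expression):
# only URLs under the /about/ branch match.
pattern = re.compile(r"https://www\.bytekmarketing\.com/about/.*")

urls = [
    "https://www.bytekmarketing.com/about/roberto-paolucci",  # child of /about/ -> crawled
    "https://www.bytekmarketing.com/about/mario-rossi",       # child of /about/ -> crawled
    "https://www.bytekmarketing.com/blog/some-post",          # outside the branch -> skipped
]

crawled = [u for u in urls if pattern.match(u)]
print(crawled)
```

This is only a model of the filter logic; the SEO Spider applies the same kind of pattern matching internally while it discovers links.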
This option is particularly useful with large websites, where we may not have the resources to work on very large amounts of data. Keep in mind that the crawl data will (in most cases) have to be processed in Excel, so the starting point should be data that is easy to work with using VLOOKUP, filters and charts.
From the “Mode” tab you can select the crawling mode. If you want to crawl a set of URLs, the mode to set is “List”, which lets you import an Excel file with a column containing the list of URLs.
The other option for scanning a list of URLs is copy and paste: copy the list of URLs from an external source (Excel, CSV, TXT or an HTML page) and click “Paste”.
N.B. Each URL must contain the http or https protocol, including the www, so the correct structure of each URL is: http://www.test.it.
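Before pasting a list into List mode, it can help to normalize the URLs so that each one carries a protocol and the www prefix; a minimal sketch (the default scheme and the www requirement follow the example structure above, and the helper name is our own):

```python
from urllib.parse import urlparse

def normalize(url: str, scheme: str = "http") -> str:
    """Ensure a URL has a protocol and a www. prefix, as required for List mode."""
    if not url.startswith(("http://", "https://")):
        url = f"{scheme}://{url}"
    host = urlparse(url).netloc
    if not host.startswith("www."):
        # replace only the first occurrence, i.e. the host part
        url = url.replace(host, "www." + host, 1)
    return url

print(normalize("test.it"))          # -> http://www.test.it
print(normalize("https://test.it"))  # -> https://www.test.it
```

Running a list through a check like this before importing avoids silent failures on URLs that are missing the protocol.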
When you need to analyze a large website and it is not enough to crawl only HTML and images (from an SEO perspective it is often good to also check the status codes of CSS and JS files, to make sure that search engine spiders can render pages correctly), you can work on the settings:
1. Configuration > System > Memory and allocate more memory, for example 4GB
2. Set the storage to database instead of RAM.
If even with these two configurations it is not possible to analyze a large website, the remaining options are:
1. Crawl the website branch by branch, one or more branches at a time:
2. Exclude images, CSS, JS and other non-HTML resources from crawling.
From an SEO perspective, it is essential to perform a single complete crawl because it gives you a full view of the website, for example the URL From / URL To pairs for 301 and 404 responses, or the distribution of internal links.
N.B. Screaming Frog may time out or, in general, fail to analyze resources (or be very slow) even on small websites; in this case the problem could be related to other factors, such as hosting performance, or the fact that the IP address from which we started Screaming Frog has been blocked by the website owner (or by the dedicated IT resource).
Our IP address can be banned by a provider because Screaming Frog’s activity closely resembles a hacker attack (e.g. a DoS attack) aimed at exhausting server resources and causing 50x errors.
After the website crawl is finished, there are multiple export options:
The image below shows how to export schema.org structured data.
Screaming Frog allows you to export a configuration file that can be reused for future projects/customers. It is particularly useful if you perform SEO analysis for similar clients (similar website structure) and have configured advanced filters or special extraction options (filters, exclude/include or wildcard).
The configuration file is also useful if custom scripts have been programmed, for example in Python or from the command line to automate purely mechanical operations. For example, if we need to perform a series of purely technical SEO Audits and the output requires the same data, it would make no sense, for each website, to re-configure Screaming Frog.
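As a sketch of that kind of automation: Screaming Frog ships a headless command-line interface, so a small Python wrapper can build the same crawl command for every website in an audit batch. The flag names below follow the SEO Spider CLI; the configuration file and output folder are hypothetical examples:

```python
import subprocess

def build_crawl_command(url, config_path, output_dir):
    """Build a headless Screaming Frog crawl command for one website.

    Flag names follow the SEO Spider command-line interface; the config
    file and output directory here are hypothetical.
    """
    return [
        "screamingfrogseospider",
        "--crawl", url,
        "--headless",                     # run without the GUI
        "--config", config_path,          # reuse the exported configuration file
        "--output-folder", output_dir,
        "--export-tabs", "Internal:All",  # export the Internal tab for Excel work
    ]

cmd = build_crawl_command("https://www.example.com",
                          "audit.seospiderconfig", "exports/")
# subprocess.run(cmd, check=True)  # uncomment on a machine with Screaming Frog installed
print(" ".join(cmd))
```

Looping this over a list of client websites with a shared configuration file gives the same output structure for every audit, which is exactly the case described above.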
Screaming Frog is “Robots.txt Compliant”, so it is able to follow the directives in robots.txt exactly as Google’s crawler does. Through the configuration options it is possible to:
By default, Screaming Frog does not accept cookies, just like search engine spiders. This option is often underestimated or ignored, but for some websites it is of fundamental importance: by accepting cookies you can unlock features and load code that can provide extremely useful SEO and performance information.
One of the best ways to create a sitemap is to use an SEO tool like Screaming Frog. WordPress plugins like Yoast SEO are also fine, but there may be update and compatibility problems; for example, URLs in the sitemap may return a 404 status code.
It is recommended to generate a sitemap that contains only canonical URLs with status code 200. For large websites, it is also advisable to create a sitemap for each type of content (PDF, images and HTML pages) and a sitemap for each branch of the information architecture.
Having specific sitemaps allows the search engine to better analyze URLs and file types, gives you full control, and makes it easy to compare the URLs in Google’s index (site: operator) against the individual sitemaps.
Please note that the limit is 50,000 URLs per sitemap file. For details on the standard see: https://www.sitemaps.org/protocol.html
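To stay under the per-sitemap URL limit, the URL list of a large site can be split into multiple sitemap files programmatically; below is a minimal sketch using Python’s standard library (the domain and URL list are hypothetical, and the 50,000 limit comes from the sitemaps.org protocol):

```python
import xml.etree.ElementTree as ET

MAX_URLS = 50_000  # per-sitemap limit from the sitemaps.org protocol

def build_sitemaps(urls):
    """Split a URL list into sitemap XML trees of at most MAX_URLS entries each."""
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    sitemaps = []
    for start in range(0, len(urls), MAX_URLS):
        urlset = ET.Element("urlset", xmlns=ns)
        for url in urls[start:start + MAX_URLS]:
            loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
            loc.text = url
        sitemaps.append(ET.ElementTree(urlset))
    return sitemaps

# Hypothetical example: 100,000 canonical 200-status URLs -> 2 sitemap files
urls = [f"https://www.example.com/page-{i}" for i in range(100_000)]
trees = build_sitemaps(urls)
print(len(trees))  # -> 2
```

Each tree can then be written out with `tree.write("sitemap-1.xml", xml_declaration=True, encoding="utf-8")` and listed in a sitemap index file.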
To generate a Screaming Frog sitemap follow the steps below:
Sitemaps (top bar) > XML Sitemap or Images Sitemap
Among the Screaming Frog options you can decide which pages to include based on:
With regard to the structure of the website, and in particular the information architecture, the “Visualisations” section is useful because it gives a graphical view of the website structure as diagrams or graphs.
During an internal linking analysis this section is fundamental, but it is recommended to integrate it with mind-mapping programs such as XMind and with standard tools:
The configuration options of the SEO Spider are collected and organized in tabs. In this paragraph we will examine the macro tabs without going into detail on all the individual options.
This tab is particularly useful for analyzing very large websites but not only. From this section you can:
In the main menu at the top of the tool there is a series of buttons (tabs) that open sections; let’s see them in detail.
The Internal tab combines all the data extracted during crawling and shown in the other tabs (excluding the External, Hreflang and Custom tabs). The usefulness of this tab lies in giving an overview and the possibility to export and process the data externally, for example in Excel, with Data Studio or mind-mapping tools.
This tab shows information related to URLs outside the domain.
From this section you can see information related to HTTP and HTTPS protocols of both external and internal URLs. This tab is useful to verify, for example, the correct migration to HTTPS.
This tab provides information on response codes, both internal and external.
This tab provides information related to page titles, in particular for:
This tab provides meta description information: length (min and max from an SEO perspective) and whether it is duplicated or absent.
It provides information about the H1 heading tag, for example whether it is identical to the title, because very often (especially in e-commerce) products have an H1 equal to the title. This issue can be solved, for example, by concatenating the product variant to the current H1 to obtain an original tag.
Information on the length and originality of H2 tags.
The data provided in this tab relate to the weight of the image, the number of internal links it receives, and its Indexability Status. Remember that, from an SEO perspective, an image should be treated like an HTML page because, if well optimized, it can drive organic traffic, for example through image searches.
This tab shows the list of canonical resources.
It provides information about pagination and paginated resources, in particular the use of the rel="next" and rel="prev" tags.
This tab provides information on the use of the hreflang tag for the correct setup of a multi-language or multi-country website.
SEO audits for multi-language websites require significant effort, given the complexity and the analyses to be performed across multiple markets.
The Custom tab allows you to control the URLs obtained through the use of custom filters and extractions.
Analytics and Search Console
Through this tab, you can integrate your Google Analytics and Google Search Console accounts.
This is a basic guide to using the SEO Spider, meant to show its potential and areas of use. To date, Screaming Frog is one of the best tools for conducting technical SEO analysis. It is certainly useful to complement this guide with real case studies from our clients’ SEO audits, to make it more enjoyable to follow.