Server log analysis has become one of the most crucial components of a technical SEO audit. By analyzing your server logs, you can gain valuable insight into your SEO and a better understanding of how search engine crawlers interact with your company website. If you are proficient with Microsoft Excel, you can follow the BuiltVisible guide to conduct your analysis. This may be the most practical route, because you will need to load your collected data into Excel later on anyway in order to output it in a format that is simple to compare against other data sources like Google Analytics.

Open source tools – such as the ELK stack from Elastic – are useful if you do not have a big budget for analysis tools but do have the technical resources to configure the data. If you do not have that support, paid tools like Splunk and Loggly are probably your best choice. When you begin your analysis, make sure that you focus on actionable SEO items. Check off the seven tasks below to ensure that your analysis is complete and thorough.

7 Elements to Check in a Server Log Analysis

Check for spambots or scrapers to block.

You should check the activity of search bots and potential spambots when conducting a first-time analysis. These can cause performance issues, pollute your analytics, or scrape your content. Observe suspicious bots’ behavior: watch their activity times, the number of events they generate in the selected time period, and whether their appearance coincides with performance or spam issues. You should block any that can cause, or are presently causing, these issues with .htaccess.
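As a rough sketch of that last step, offending user-agents can be denied at the server level. This fragment assumes Apache with mod_rewrite enabled, and the two user-agent strings (BadBot, ScraperPro) are hypothetical names standing in for whatever you actually find in your logs:

```apache
# Return 403 Forbidden to two hypothetical scraper user-agents
# identified during log analysis (replace with your own findings)
<IfModule mod_rewrite.c>
  RewriteEngine On
  RewriteCond %{HTTP_USER_AGENT} (BadBot|ScraperPro) [NC]
  RewriteRule .* - [F,L]
</IfModule>
```

Blocking by user-agent is easy to bypass, so for persistent offenders you may also want to deny their IP ranges.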

Check that your targeted search engine bots are accessing your pages.

Once you identify the bots arriving at your site, you can apply a filter to focus on those you want to analyze. Your goal here is to make sure they are accessing your pages and resources successfully. Once you have applied the filter, select a graph option to visualize the activity over time. This will help you confirm that they are crawling the right pages and that those pages coincide with the searches you aim to rank for.
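If you want to do this filtering outside a log-analysis UI, the same idea can be sketched in a few lines of Python. This assumes the common Apache/Nginx “combined” log format; the sample lines and the Googlebot match are purely for illustration:

```python
import re
from collections import Counter

# Assumed: Apache/Nginx combined log format
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def daily_bot_hits(lines, bot="Googlebot"):
    """Count requests per day for entries whose user-agent mentions `bot`."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and bot in m.group("agent"):
            hits[m.group("day")] += 1
    return hits

# Hypothetical sample lines; real analysis would read the access log file
sample = [
    '66.249.66.1 - - [10/Mar/2024:06:25:24 +0000] "GET /pricing HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [11/Mar/2024:07:01:02 +0000] "GET /blog HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.5 - - [11/Mar/2024:07:02:10 +0000] "GET /blog HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]
print(daily_bot_hits(sample))
```

Plotting the per-day counts gives you the same over-time view the graph option provides.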

Check for pages with 3xx, 4xx, and 5xx HTTP statuses.

You can select the HTTP status values of the pages you wish to analyze by searching for your desired search bot and then applying the ‘status’ filter to narrow your options. Look for those with 3xx, 4xx, and 5xx status codes to see pages that have been redirected or shown as errors to the crawlers. After that, you can identify the top pages generating most of the redirects or errors, export the data, and prioritize these pages in your SEO recommendations.
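The prioritization step can be sketched as follows, assuming you have already parsed your log into (URL, status) pairs; the records shown are hypothetical:

```python
from collections import Counter

def top_error_pages(records, n=10):
    """Given (url, status) pairs, rank URLs producing 3xx/4xx/5xx responses."""
    problems = Counter(
        (url, status) for url, status in records if status >= 300
    )
    return problems.most_common(n)

# Hypothetical parsed log records: (requested URL, HTTP status)
records = [
    ("/old-page", 301), ("/old-page", 301), ("/missing", 404),
    ("/pricing", 200), ("/api/report", 500),
]
print(top_error_pages(records))
```

The output is already sorted by frequency, so the top entries are the pages worth fixing first.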

Check that the top crawled pages from each search bot coincide with your site’s most important pages.

You can select the ‘requestURl’ filter to get a list of the top web pages or resources that bots are requesting from your site. The interface will allow you to review these directly, or export them to Excel so you can check that they coincide with your high-priority pages. If your high-priority pages are not among the top crawled pages, take appropriate action to improve their visibility with internal links and an updated sitemap.
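A minimal sketch of that comparison, assuming hypothetical request paths extracted from your logs and a hand-maintained set of priority pages:

```python
from collections import Counter

def top_crawled(paths, n=5):
    """Rank the most-requested paths from bot hits."""
    return [path for path, _ in Counter(paths).most_common(n)]

# Hypothetical Googlebot request paths pulled from the logs
bot_paths = ["/blog/post-1", "/", "/blog/post-1", "/tag/misc", "/blog/post-1", "/"]
priority_pages = {"/", "/pricing", "/blog/post-1"}

crawled = top_crawled(bot_paths)
missing = priority_pages - set(crawled)   # priority pages bots are neglecting
print(crawled, missing)
```

Anything left in `missing` is a candidate for stronger internal linking or a sitemap refresh.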

Check that search bots are not crawling pages they should not be.

You should check for pages that are not supposed to be indexed or crawled. Use the same filter to see your top requested pages, then verify that the pages and directories you have blocked with robots.txt are not actually being crawled, and look for pages that are not blocked but should not be prioritized for crawling.
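Python’s standard library can check crawled paths against your robots.txt rules directly. In this sketch, the robots.txt contents and the crawled paths are hypothetical examples:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for the site being audited
robots_txt = """\
User-agent: *
Disallow: /admin/
Disallow: /cart
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Paths a bot actually requested, taken from the logs
crawled = ["/admin/login", "/blog/post-1", "/cart"]

# Any path the rules disallow but that still shows up in the logs
violations = [p for p in crawled if not parser.can_fetch("Googlebot", p)]
print(violations)
```

Blocked paths that still appear in your logs mean the crawler fetched them before the rule existed, or is ignoring it, and they are worth investigating.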

Check your Google crawl rate over time and track how it correlates with response times and serving errors.

The data obtained through Google’s ‘Crawl Stats’ report is very generic; by analyzing your own logs to identify Googlebot’s crawl rate, you can validate that information and make it actionable. Knowing when, and what type of, HTTP responses occurred will show whether errors or redirects were triggered that could generate ineffective crawling behavior on Googlebot’s part.
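One way to line up crawl rate, errors, and response times by day, assuming you have parsed Googlebot hits into (date, status, response-time) tuples; the records here are hypothetical:

```python
from collections import defaultdict

# Hypothetical parsed Googlebot hits: (date, HTTP status, response time in ms)
hits = [
    ("2024-03-10", 200, 120), ("2024-03-10", 200, 140),
    ("2024-03-11", 500, 900), ("2024-03-11", 200, 150),
    ("2024-03-11", 301, 80),
]

daily = defaultdict(lambda: {"requests": 0, "errors": 0, "total_ms": 0})
for day, status, ms in hits:
    bucket = daily[day]
    bucket["requests"] += 1
    bucket["total_ms"] += ms
    if status >= 400:
        bucket["errors"] += 1

# Print crawl rate, error count, and mean response time per day
for day in sorted(daily):
    b = daily[day]
    print(day, b["requests"], b["errors"], round(b["total_ms"] / b["requests"]))
```

A crawl-rate dip that coincides with a spike in errors or response times is exactly the correlation this check is looking for.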

Check the IPs Googlebot is using to crawl your site and that they are correctly accessing the relevant pages and resources in each case.

The last thing you want to check is whether your content is being crawled from foreign IP addresses. Oftentimes, websites will block access from these IPs or dictate which version of the site they can use. Filter your data by IP so that you can verify the site is serving the right version of each page to crawlers from other countries.
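The per-IP breakdown can be sketched like this, assuming hypothetical (client IP, path) records parsed from the logs:

```python
from collections import defaultdict

# Hypothetical log records: (client IP, requested path)
records = [
    ("66.249.66.1", "/en/home"),
    ("66.249.66.1", "/en/pricing"),
    ("66.249.64.9", "/fr/accueil"),
    ("66.249.64.9", "/en/home"),
]

# Group the distinct paths each crawler IP was served
paths_by_ip = defaultdict(set)
for ip, path in records:
    paths_by_ip[ip].add(path)

for ip, paths in sorted(paths_by_ip.items()):
    print(ip, sorted(paths))
```

With the IPs geolocated (or verified via reverse DNS for Googlebot), this view shows at a glance whether a given crawler location is being served the page versions you intend.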