The technical SEO audit checklist: How to analyze a website?
For many online marketers, SEO means building authority links, and that’s it. But what if you can’t rank because your site can’t be crawled or indexed by Google, or because your pages load so slowly that users leave before reaching your first landing page? The first step of an SEO project should therefore be a technical SEO audit.
Not everyone gets SEO right first time. So if your online performance is not quite as you wanted, don’t worry. Most of us have been there at some point. But what should you do to correct the problem?
To start with, SEO is not black magic. There’s nothing in there you cannot do yourself if you spend the time to really learn it. Because SEO is all about consistent work throughout the lifetime of a site. It’s not waving a magic wand and POOF, you’re ranked #1.
As it is ongoing work, there is a basic flow that drives how SEO works.
And first comes Analysis.
Why we do a technical SEO audit
Every time we start a new SEO project, we start with a technical audit.
Many people simply associate SEO with building backlinks. And while backlinks are one of the most important parts of SEO, helping to build your authority, there are better places to start.
A site that cannot be crawled by Google, or one that loads so slowly your users click away, is no good. It doesn’t matter how many backlinks there are if the site is broken from a technical point of view. And that’s before we get to whether your content answers your audience’s questions or is even in the right place to answer them.
Why do we do so when everybody says that SEO is about how many backlinks you can earn? Sure, backlinks are one of the most important signals for search engines about your website. You need to have links from authority sites to rank well.
But the number and quality of backlinks you have doesn’t really matter if no-one can see your page.
Your content also needs to be in the right place. It has to answer your users’ questions while entertaining them enough to keep their interest.
And it all takes time to show results. We SEOs often struggle to explain to clients why we have to work for a year before they see results. John Doherty put in his two cents on it:
I’ve always struggled to pitch the longterm value of SEO and that it takes time for many reasons:
- You need to allow time to do the audit
- You need to allow time to get the work implemented
- You need to allow time for Google to recrawl everything
- Building links takes a lot of time if you’re doing it scalably as you build the rest of your business
If you are a subject matter expert just like me, here’s a great guide on how to write the perfect piece of content.
In this guide, I’ll show you how to analyze a website from a technical perspective. For the framework below, I used WebSite Auditor’s structure as the basis.
SEO can be split into three:
- Technical SEO (crawlability and indexability)
- On-page SEO (content optimization)
- Off-page SEO (authority building)
How to analyze a website from a technical perspective?
Let’s start with a non-technical element. Simply open your browser in guest mode and search for your brand or business in Google. Or even better, use Google NCR (No Country Redirect) by searching from google.com/ncr.
Did you see what you wanted? Is your brand on top of the list? Do you have more than one page listed there?
(If your brand name is general, don’t worry if you are not ranking in first position, or even on the first page. In time you will get there.)
If your site ranks as expected, good. If it doesn’t, it might be because of technical issues, indexation issues, or even a penalty.
If you search for our brand, Intellyo, you will get the main page as the first result and connected services as following. (You won’t see the blog in top results because 1. it’s new, 2. it runs on a subdomain.)
Even if you are satisfied with the results that you have seen in the SERP (Search Engine Results Page), it’s worth going through this guide and checking all the elements. While writing this article, we discovered a couple of minor issues that we will fix on our blog.
Indexing and crawlability
I’m not going to dig into how Google Search works and why indexing and crawlability are important. Matt Cutts, former head of the webspam team at Google, has a great video about the topic.
So you get it now for sure. If you want to crawl a website’s URLs by yourself, Screaming Frog is probably the best tool out there. It also helps to fetch onsite elements to analyze on-page SEO.
But how do you know if your site or a given page has or hasn’t been crawled and indexed? There’s an easy, special search for that. For the whole site type into Google:
site:intellyo.com. (Obviously, change intellyo.com to your site’s URL.)
You will see all the pages connected to your site that are indexed by Google. Even if it is a simple page or a subdomain. All of them.
If you want to use a more specialised search, for example to check if your newest blog article is available in Google, you can use the next format:
site:blog.intellyo.com how to analyze a website. (Change your URL and topic to the one relevant for you)
Now you will only see those articles connected to your search query. This tiny trick can help you to save a lot of time using Google Search.
Resources with 4xx and 5xx status code
You read an article which links to another that attracted your interest. You click. And the link is broken. Really annoying, right?
If you are the site owner you will only see that your users left at some point, seemingly for no reason. That can easily be because of a broken link. It’s even worse when it points to one of your own pieces of content.
So whenever you bump into an issue like the following, get your head around fixing it as soon as possible.
Resources with a 4xx status code refer to client errors. The most well known is 404 - not found. The user was able to communicate with a given server, but the server could not find what was requested.
Resources with a 5xx status code refer to server errors. The server is aware that it has encountered an error or is otherwise incapable of performing the request.
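If you want to check a handful of URLs yourself, the two error classes above can be sketched in a few lines of Python. This is only a minimal sketch: the function names are my own, the user agent string is a placeholder, and some servers reject HEAD requests, so treat the result as a first hint rather than a verdict.

```python
from typing import Optional
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def classify_status(code: int) -> str:
    """Map an HTTP status code to the category used in an audit report."""
    if 200 <= code <= 299:
        return "ok"
    if 300 <= code <= 399:
        return "redirect"
    if 400 <= code <= 499:
        return "client error"
    if 500 <= code <= 599:
        return "server error"
    return "other"


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the status code after redirects, or None if the server is unreachable."""
    request = Request(url, method="HEAD", headers={"User-Agent": "audit-sketch"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # urllib raises for 4xx and 5xx answers; the code is what we want to record.
        return error.code
    except URLError:
        return None
```

Calling classify_status(fetch_status(url)) for every link a crawl finds (skipping the None results) gives you a quick list of resources to fix.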
404 page set up correctly
A 404 page has two main roles:
- user: gives information that the URL is not found,
- site owner: it can help you keep users on your site.
It’s really important to let your users know what happened and why they couldn’t reach the content they originally wanted to. 404 error pages should return 404 response code. Sounds obvious, right? I’ll only leave this quote from Search Console Help:
It's like a giraffe wearing a name tag that says "dog." Just because it says it's a dog, doesn't mean it's actually a dog. Similarly, just because a page says 404, doesn't mean it's returning a 404.
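One way to test this is to request a URL that cannot exist and look at the status code that comes back. The sketch below only builds such a probe URL and judges the answer; the helper names and the probe path are hypothetical, and you still need to fetch the URL with the HTTP client of your choice.

```python
import uuid
from urllib.parse import urljoin


def probe_url(site_root: str) -> str:
    """Build a URL that almost certainly does not exist on the site."""
    return urljoin(site_root, f"/audit-probe-{uuid.uuid4().hex}")


def returns_proper_404(status: int) -> bool:
    """True when a missing page answered with 404 (or 410 Gone)."""
    return status in (404, 410)
```

If the probe URL answers with 200, the error page only says it is a 404 without returning one.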
It’s important to give your users the information they need when they hit a 404 page:
- The information that the page they are looking for doesn’t exist,
- a navigation bar (or at least a button leading to your home or category page),
- an HTML sitemap (not an XML one! While XML sitemaps are for search engine spiders, HTML sitemaps are for your users),
- a search field.
We haven’t added the HTML sitemap or search field for now, but will as the blog grows. Before designing a 404 page that helps you to retain your users, it’s a good idea to get some inspiration.
Robots.txt is a file that robots read automatically when they arrive at your website. It contains directives for these robots, such as which pages they may or may not crawl. It must follow the robots.txt formatting rules so that search engines can read it.
You can reach every site’s robots.txt by adding /robots.txt to their URL. For example here’s the Intellyo blog’s: https://blog.intellyo.com/robots.txt.
Here you can hide pages you don’t want users to find by using search engines. Typical hidden pages are the admin logins or thank you pages (for newsletter subscription).
Also, here you can link to your sitemap so crawlers find it easier. You can read more on robots.txt in Google’s official guide.
Robots.txt can be dangerous for your website if you modify it and then forget about the change. I once had a client who had disallowed their blog’s URL in robots.txt. The blog was under construction at the time, so it made sense. But after 3 months of work by 2 employees, they still couldn’t rank in Google. That was when they asked for help. Simply deleting that one row from their robots.txt grew their number of sessions by more than 10% in a week.
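A forgotten rule like that is easy to catch with Python’s built-in robots.txt parser. The sketch below is an illustration only: the two robots.txt files and the example.com URLs are made up, not the client’s real configuration.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Check whether the given robots.txt content lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical file left over from the "under construction" phase.
BLOCKING = """User-agent: *
Disallow: /blog/
"""

# Hypothetical fixed file: only the admin area stays hidden.
FIXED = """User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
"""

print(can_crawl(BLOCKING, "https://example.com/blog/my-post"))  # False
print(can_crawl(FIXED, "https://example.com/blog/my-post"))     # True
```

Running a check like this against your most important URLs after every robots.txt change takes seconds and would have saved that client three months.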
You can restrict indexing in several ways:
- with a disallow rule in your robots.txt,
- with a noindex X-Robots-Tag HTTP header,
- with a noindex meta tag.
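The last two items on this list can be detected from a page’s response headers and HTML. Here is a minimal sketch using only Python’s standard library; the class and function names are my own, and it assumes you have already downloaded the HTML and headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of <meta name="robots"> style tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attributes.get("content") or ""
            self.directives += [part.strip().lower() for part in content.split(",")]


def is_noindexed(html: str, headers: dict) -> bool:
    """True if the page asks not to be indexed via header or meta tag."""
    header_value = ""
    for key, value in headers.items():
        if key.lower() == "x-robots-tag":
            header_value = value.lower()
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in header_value or "noindex" in parser.directives
```

A page you want to rank should make this function return False.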
If you feel like it’s all a bit overwhelming, we can help you with your site’s analysis.
A good XML sitemap contains all the pages you want to get indexed. Period. So it’s a list of pages accessible to crawlers. You need to update it every time new pages are added to your website. You can do it manually or if you are using WordPress or any other big CMS, you can use an XML sitemap generator plugin.
You can reach sitemaps by adding /sitemap.xml to the URL. Intellyo blog’s sitemap: https://blog.intellyo.com/sitemap.xml.
As you can see, you can add all your site’s pages to the sitemap and also set a frequency for when you would like crawlers to come to your site, the priority of your pages and when they were modified last. For more information follow this sitemap format guide.
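If you maintain your sitemap by hand, a small script can generate it for you. The following is a sketch under assumptions: the page list is made up and the function name is my own, while the element names (loc, lastmod, changefreq, priority) are the ones defined by the sitemap format.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for a list of dicts with 'loc' and optional fields."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        for field in ("lastmod", "changefreq", "priority"):
            if field in page:
                ET.SubElement(url, field).text = str(page[field])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages; replace them with your own URLs.
print(build_sitemap([
    {"loc": "https://example.com/", "changefreq": "weekly", "priority": 1.0},
    {"loc": "https://example.com/blog/", "lastmod": "2017-05-01", "priority": 0.8},
]))
```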
You can test your sitemap and submit it to Google in Google Search Console.
(Yes, we are aware of the errors, it’s on our bug report list.)
With redirects you can send your users (and also crawlers) to a different URL. It’s useful when you move your content to another URL. By redirects we are mainly talking about 301 and 302 (or 307) redirects and meta refresh.
www and non-www versions fixation
Websites with www are so old school, right? Still, many users type it when trying to reach a site. So it’s recommended to make both the www and non-www versions of your site reachable. Yet that means you are duplicating your site, so you need to redirect one version to the other.
Your developers can set this redirect pretty easily in your site’s .htaccess file (if you are running on Apache server) or in any type of configuration file.
HTTP and HTTPS protocol site versions
Why does HTTPS matter?
- It protects the privacy and security of your users.
- It protects the integrity of your website.
- In a test of HTTP vs HTTPS.com, the unsecure version of the page loads 334% slower than HTTPS.
HTTPS is the future of the web. Google even started to use it as a ranking factor.
However, webmasters often run into technical issues while installing SSL certificates and setting up the HTTP and HTTPS versions. If your site’s certificate is untrusted or expired, browsers can even prevent users from reaching it.
Additionally, if the HTTP and HTTPS versions are not set up correctly, your site will be indexed by Google twice, which means duplicate content. Duplicate content means a penalty sooner or later.
So it’s really a high priority to set it up correctly with 301 redirects.
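To audit this, it helps to compute the one preferred URL that every variant (http or https, www or non-www) should redirect to. Below is a minimal sketch; the function names are hypothetical, it assumes HTTPS is always the preferred protocol, and it ignores ports and other edge cases.

```python
from urllib.parse import urlsplit, urlunsplit


def preferred_url(url: str, use_www: bool = False) -> str:
    """Return the HTTPS URL on the preferred host (www or non-www)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    bare = host[4:] if host.startswith("www.") else host
    host = f"www.{bare}" if use_www else bare
    return urlunsplit(("https", host, parts.path, parts.query, parts.fragment))


def needs_redirect(url: str, use_www: bool = False) -> bool:
    """True if this URL variant should 301 to the preferred version."""
    return url != preferred_url(url, use_www)


print(preferred_url("http://www.example.com/blog?x=1"))
# https://example.com/blog?x=1
```

Every URL for which needs_redirect returns True should answer with a single 301 pointing at preferred_url(url).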
301 and 302 redirects
301 Moved permanently
302 Moved temporarily / Found
301 redirects are mainly used when you have duplicate content or your URL has changed and you would still like your users to reach the page. In short, use them when you move your content permanently.
302 redirect means that the move is only temporary. According to WebSite Auditor if you use 302s instead of 301s, search engines might continue to index the old URL, and disregard the new one as a duplicate, or they might divide the link popularity between the two versions, thus hurting search rankings.
I have heard many times that you have to use 301s to keep 80% of your link juice, and that with 302 redirects you lose all of it. According to John Mueller, however, both redirects pass PageRank.
Long redirect chains
WebSite Auditor has a great summary about this SEO factor:
“In certain cases, either due to bad .htaccess file setup or due to some deliberately taken measures, a page may end up with having two or more redirects. It is strongly recommended to avoid such redirect chains longer than 2 redirects since they may be the reason of multiple issues:
- There is a high risk that a page will not be indexed as Google bots do not follow more than 5 redirects.
- Too many redirects will slow down your page speed. Every new redirect may add up to several seconds to the page load time.
- High bounce rate: users are not willing to stay on a page that takes more than 3 seconds to load.”
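To find such chains, you can follow the redirects hop by hop and count them. The sketch below is my own illustration, not WebSite Auditor’s method: resolve is a hypothetical callback you supply that returns the status code and Location header for a URL without following the redirect itself.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_chain(start, resolve, max_hops=10):
    """Follow redirects from start and return every URL visited, in order."""
    chain = [start]
    current = start
    for _ in range(max_hops):
        status, location = resolve(current)
        if status not in REDIRECT_CODES or not location:
            break
        target = urljoin(current, location)  # Location may be relative
        chain.append(target)
        if chain.count(target) > 1:  # redirect loop, stop here
            break
        current = target
    return chain


# Simulated responses standing in for real HTTP requests.
responses = {
    "http://example.com": (301, "https://example.com"),
    "https://example.com": (301, "https://example.com/"),
    "https://example.com/": (200, None),
}
chain = redirect_chain("http://example.com", lambda url: responses[url])
print(len(chain) - 1, "redirects:", " -> ".join(chain))
```

Any start URL whose chain has more than two redirects is worth pointing directly at the final destination.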
Pages with meta refresh
As a starting point, let’s go with this:
Do not use meta refresh!
What is meta refresh and why shouldn’t you use it? Moz’s redirection guide has a great answer to those questions:
Meta refreshes are a type of redirect executed on the page level rather than the server level. They are usually slower, and not a recommended SEO technique. They are most commonly associated with a five-second countdown with the text "If you are not redirected in five seconds, click here." Meta refreshes do pass some link juice, but are not recommended as an SEO tactic due to poor usability and the loss of link juice passed.
So instead, keep using the permanent 301 redirects.
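To find pages that still use a meta refresh, you can scan their HTML for the tag. This is a minimal sketch with Python’s standard library; the class and function names are my own, and it assumes you already have the page’s HTML as a string.

```python
from html.parser import HTMLParser


class MetaRefreshParser(HTMLParser):
    """Collect the content of every <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.refreshes = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() == "refresh":
            self.refreshes.append(attributes.get("content") or "")


def find_meta_refresh(html: str) -> list:
    parser = MetaRefreshParser()
    parser.feed(html)
    return parser.refreshes
```

An empty list is what you want to see on every page.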
As Yoast describes, the rel=canonical element, often called the “canonical link” or “canonical tag”, is an HTML element that helps webmasters prevent duplicate content issues. It does this by specifying the “canonical URL”, the “preferred” version of a web page.
So if you have duplicate content (you have the same content twice or more on your site, or your content is the same as it is on another page) and would like to avoid Google penalties, make sure that you use canonical tags to show the crawlers which one is the original content.
Also, be aware of multiple canonical URLs. If you use any SEO plugin, most of them automatically add a canonical tag to the URL. So if you would like to add a canonical tag, you should rewrite this tag and not add a new one.
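A quick way to spot multiple canonical tags is to collect every one of them from a page’s HTML. The sketch below is an assumption-laden illustration: the names are my own and it only looks at link elements, not at canonical hints sent in HTTP headers.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalParser(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def canonical_problem(html: str) -> Optional[str]:
    """Describe the canonical issue on a page, or return None if it is fine."""
    parser = CanonicalParser()
    parser.feed(html)
    found = parser.canonicals
    if not found:
        return "no canonical tag"
    if len(set(found)) > 1:
        return "conflicting canonical tags"
    if len(found) > 1:
        return "duplicate canonical tags"
    return None
```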
Encoding and technical factors
HTTPS pages with mixed content issues
Mixed content occurs when initial HTML is loaded over a secure HTTPS connection, but other resources (such as images, videos, stylesheets, scripts) are loaded over an insecure HTTP connection. This is called mixed content because both HTTP and HTTPS content is being loaded to display the same page, and the initial request was secure over HTTPS. Modern browsers display warnings about this type of content to indicate to the user that this page contains insecure resources.
HTTPS is extremely important to protect the privacy and security of your users, as I mentioned above. HTTPS pages with mixed content degrade this security. To prevent that, browsers may even block that content. So it causes issues with user experience, too.
You have to make sure you don’t have mixed content on your site.
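Checking for it can be automated by scanning the HTML of an HTTPS page for resources that are still requested over plain HTTP. This is a rough sketch: the tag list is my own selection and is not exhaustive (it ignores resources loaded from CSS or JavaScript, for example).

```python
from html.parser import HTMLParser

# Tags whose src attribute makes the browser load a subresource.
RESOURCE_TAGS = {"img", "script", "iframe", "video", "audio", "source"}


class MixedContentParser(HTMLParser):
    """Collect (tag, url) pairs for resources loaded over plain HTTP."""

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        url = None
        if tag == "link":
            # Only stylesheets are loaded; a canonical link, for example, is not.
            rel = (attributes.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                url = attributes.get("href")
        elif tag in RESOURCE_TAGS:
            url = attributes.get("src")
        if url and url.lower().startswith("http://"):
            self.insecure.append((tag, url))


def find_mixed_content(html: str) -> list:
    parser = MixedContentParser()
    parser.feed(html)
    return parser.insecure
```

Ordinary links (a href) pointing to HTTP pages are deliberately ignored, because following a link is not mixed content.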
Mobile first. 2016 was a turning point in the life of Google and SEOs, when Google announced mobile-first indexing. It means they use mobile content for all search rankings.
Side-note: they added that, if you don’t have a mobile-friendly page, don’t worry, the desktop version will be crawled.
Basically, it doesn’t matter which mobile-friendly solution you use (responsive design, dynamic serving, or fully separated sites). The main point is that your site should support mobile users.
In the past (until 2016) there was also a mobile-friendly tag in Google for the results that supported it. Unfortunately it has since been dropped.
If you are not sure whether your site is optimized well for mobile, you can check it by using Google’s PageSpeed Insights tool. We have more on the topic in our article about competitor website analysis tools.
Frames have been removed from web standards. Though some browsers may still support them, they are in the process of being dropped.
<frame> is an HTML element which defines a particular area in which another HTML document can be displayed. A frame should be used within a <frameset>.
Using the <frame> element is not encouraged because of certain disadvantages such as performance problems and lack of accessibility for users with screen readers. Instead of the <frame> element, <iframe> may be preferred. So be sure you don’t have any <frame> elements on your site.
(Primary source of Frames part of the article was Mozilla Developer Network)
W3C HTML and CSS errors
W3C (World Wide Web Consortium) is the main international standards organization for the World Wide Web. Sites should be aligned with W3C’s standards.
Crawlers find it easier to crawl through semantically correct markup. This is why your site’s HTML markup should be valid and free of errors. Otherwise spiders may miss elements of your site, thus reducing the value of the page. To find the errors in your HTML, you can use W3C’s validator.
While HTML errors can cause serious issues for search spiders, W3C CSS errors can result in issues with the displayed versions of your pages. Basically, your users won’t see what you want them to. It can easily lead to a decrease in user experience. To find the errors with your CSS, you can use W3C’s validator.
Accessibility is difficult. We support projects such as The A11Y Project that help blind and visually impaired people live a full life, including online. They have checklists that help your site to be as user friendly as possible.
You can run accessibility auditors on your site to find out how friendly it is to visually impaired users.
Too large pages
Well, this is a topic where I can’t give a hard number as a KPI. Basically, the larger your page is, the slower it loads. And as mentioned, the slower your page loads, the more users abandon it.
So you have to find the balance. As a rule of thumb, I recommend keeping your page below 1-2 MB, and definitely below 3 MB. However, that’s not always feasible.
Try to keep the number of requests your page makes while loading below 50-60, too. If you would like to find out how many requests your page makes or how big it is, follow this article.
A lot depends on your URL policy.
Often forgotten, but important nonetheless, what is your URL policy? Do you use subdomains? Slugs? How can your URL help your CTR from search engines? Dynamic vs static URLs? User-friendly (readable) URLs? Your URL policy can have a big effect on your SEO.
If the content of a site is stored in a database and pulled for display on demand, dynamic URLs may be used. In that case the site serves basically as a template for the content. (Read more on dynamic URLs.)
If you don’t know which (dynamic or static) URL type is for you, Rand Fishkin has a great summary.
Google Webmaster Guidelines says, "URLs should be clean coded for best practice, and not contain dynamic characters."
Also, I recommend using readable URLs as they help your CTR from search engines.
The worst practice I’ve ever seen was a company (a webshop) that used dynamic URLs where the slugs of the URLs changed daily. The company was also paying a marketing company to build backlinks and get referral traffic for them. And obviously, every link was outdated after 1 day, so it pointed to a URL that no longer existed. When I figured out why they had so much traffic on their 404 page according to Google Analytics, I was shocked.
Put simply, make sure your URLs are easy to read for users and search engines.
Additionally, it’s better to avoid long URLs. Keep them under 115 characters and they will be readable by users and search engines.
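Both checks are simple enough to script. Here is a minimal sketch: the function name is my own, the 115-character limit is the one from this article, and treating any query string as "dynamic" is a simplifying assumption.

```python
from urllib.parse import urlsplit

MAX_URL_LENGTH = 115  # the limit recommended in this article


def url_issues(url: str) -> list:
    """Return a list of readability issues found in a URL."""
    issues = []
    if len(url) >= MAX_URL_LENGTH:
        issues.append("too long")
    if urlsplit(url).query:
        issues.append("dynamic parameters")
    return issues


print(url_issues("https://example.com/blog/technical-seo-audit"))  # []
print(url_issues("https://example.com/product.php?id=42&cat=7"))   # ['dynamic parameters']
```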
This is a big topic. How many links and where they go, anchor texts, nofollow or dofollow tags, etc. For now I’m going to concentrate on the real technical elements.
Pages with excessive numbers of links
Both too many and too few links on a page can be a red flag. Normally I recommend having more than 5 links on a page but keeping it under 100. Over 100 links can easily be overwhelming for your users.
Dofollow external links
Normally there’s no issue with dofollow links. There’s only one case when it can be an issue: if you link extensively to irrelevant or low-quality sites. Search engines may conclude your site sells links or participates in other link schemes. So you can get penalized because of it.
Mostly it’s avoidable by only pointing to quality sites and content, or by using nofollow links (you simply have to add the rel="nofollow" attribute to your links).
Broken links, whether internal or external, are a bad sign for search engines. They suggest that your site is not up-to-date. That may cause a “penalty”, a downgrade in your rankings. If you’ve missed 1-2 broken links, or even 10-12, don’t worry, though. In reality, there need to be many broken links for a penalty. Just fix them whenever you find them.
You should regularly monitor your links. If one is broken, resolve it by changing the link, deleting it, or in any other way.
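To review which external links on a page are followed, you can list every link that points to another host and lacks rel="nofollow". This is a sketch with hypothetical names; note that it treats subdomains as external, which may or may not suit your site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect (href, rel tokens) for every <a> tag with an href."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href:
            rel = (attributes.get("rel") or "").lower().split()
            self.links.append((href, rel))


def followed_external_links(html: str, page_url: str) -> list:
    """Return absolute URLs of external links without rel="nofollow"."""
    collector = LinkCollector()
    collector.feed(html)
    own_host = urlsplit(page_url).netloc.lower()
    result = []
    for href, rel in collector.links:
        absolute = urljoin(page_url, href)
        parts = urlsplit(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, tel:, javascript: and similar
        if parts.netloc.lower() != own_host and "nofollow" not in rel:
            result.append(absolute)
    return result
```

The list it returns is not a list of problems. It is simply the set of sites you vouch for, so make sure you are happy with each of them.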
Images as visual elements are important parts of your pages. From a technical perspective they have 3 main roles:
- be able to load,
- load as fast as possible,
- have an alt text for those who can’t see the image.
Broken or slow loading images have no direct effect on SEO but reduce the user experience. And the worse the user experience, the higher the bounce rate, and that is a bad signal for search engines. To avoid these issues, you can check two factors:
- broken images and
- empty alt texts.
Both can be checked easily and the results from fixing them will be great. So you should minimize the number of broken images. If you do have a broken image, or a server can’t provide the picture for a user, you should have the alt text filled. Alt texts are especially important for those who are visually impaired.
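The alt text check is easy to automate. Below is a minimal sketch using Python’s standard library; the names are my own, and keep in mind that purely decorative images may legitimately carry an empty alt text, so review the results by hand.

```python
from html.parser import HTMLParser


class ImageAltParser(HTMLParser):
    """Collect the src of every <img> whose alt text is missing or empty."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        if not (attributes.get("alt") or "").strip():
            self.missing_alt.append(attributes.get("src") or "")


def images_without_alt(html: str) -> list:
    parser = ImageAltParser()
    parser.feed(html)
    return parser.missing_alt
```

Combine the src values it returns with a status code check and you cover both factors on the list above.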
Many SEOs add on-page elements to technical SEO audits. I prefer to keep an eye on them while doing an on-page (content) audit. Still, if you would like to add them to your technical audit checklist, these are the most common items worth checking:
- empty title tags,
- duplicate titles,
- too long titles (preferably below 70 characters),
- empty meta descriptions,
- duplicate meta descriptions,
- too long meta descriptions (preferably below 165 characters).
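The per-page items on this list can be checked with a short script. This is only a sketch: the names are my own, the limits are the ones given above, and finding duplicates would additionally require comparing the results across all of your pages.

```python
from html.parser import HTMLParser

MAX_TITLE = 70          # limits taken from the checklist above
MAX_DESCRIPTION = 165


class HeadParser(HTMLParser):
    """Extract the <title> text and the meta description of a page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.description = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "description":
                self.description = (attributes.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def head_issues(html: str) -> list:
    parser = HeadParser()
    parser.feed(html)
    title = parser.title.strip()
    issues = []
    if not title:
        issues.append("empty title")
    elif len(title) >= MAX_TITLE:
        issues.append("title too long")
    if not parser.description:
        issues.append("empty meta description")
    elif len(parser.description) >= MAX_DESCRIPTION:
        issues.append("meta description too long")
    return issues
```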
I’m not going to go deep into this topic. There’s only one tool which can’t be missed: Google Search Console.
Google Search Console (GSC)
GSC, formerly Google Webmaster Tools (WMT), is great for all SEOs. One of the most important features is to see which keywords your visitors searched for before clicking through to your site. You can also see which keywords you rank for, how many impressions you got, what your click-through rate was and in which position you rank for those keywords.
GSC is much more than a keyword ranking and traffic tool, though. You get suggestions to improve your site’s search appearance, and you can see your incoming links and internal linking. You can also adjust your indexation (block and remove resources, as well as check your index status) and set everything in connection with crawling, such as crawl stats and errors. Lastly, you can request fetching and rendering, and test your robots.txt and sitemap.
You can reach GSC here, while if you would like to read more about it, you can do so here. Adding and verifying a site in your GSC account is not difficult. Keep in mind that you can reach your ranking and traffic data for a period of 90 days, so it’s worth exporting this data every 3 months so you can keep hold of everything. Additionally, your data is not in real time: there is a 3-day delay.
In connection with GSC, I usually check 2 main elements during a technical SEO audit:
- Is Search Console implemented well? (Do you get your search data?)
- Is Search Console connected to Google Analytics (GA)?
You can easily connect GSC to your GA from GA by following these steps:
- Sign in to your Analytics account.
- Click Admin, and navigate to the property in which you want to enable Search Console data sharing.
- In the PROPERTY column, click Property Settings.
- Scroll down to Search Console Settings (Adjust Search Console). You should see the URL of your website, which confirms that the website is verified in Search Console and that you have permission to make changes. If you do not see the URL, you need to add your site to Search Console.
- Under Search Console, select the reporting view(s) in which you want to see Search Console data.
- Click Save.
After you save these settings, you can see the results in Acquisition / Search Console menu.
Page load time
As mentioned before, page load time is not a direct ranking factor. However, the slower your site loads, the more users abandon it.
Kissmetrics’ infographic shows that an average user has no patience for a page that takes too long to load:
The more users abandon (the lower successful CTR your site has), the worse your ranking.
Back when Marissa Mayer was a VP at Google, they ran an interesting experiment about showing 30 results (instead of 10) in a SERP. It sounded attractive for users in theory, but the results were shocking.
“Pages that displayed 30 results each had traffic to them drop an astounding 20%. Google tested the loading difference between the 10 and 30 results pages and found that it was just half of a second.” As Kissmetrics said: Speed is a killer.
OK, I’ve learned how to analyze a website… What’s next?
For page load time and tips on how to improve your site’s code, I recommend GTMetrix and PageSpeed Insights. We are planning to write an article about page load time optimization. If you are interested, subscribe to our newsletter and be among the first to be notified.
Even if your site is perfect according to any tool, don’t forget that the most important factor for any SEO, online marketer or developer is that you have to make your users happy. Users are happier if your site loads fast, you don’t have broken links, and so on. And that is your main goal.
Here is another great source on Page Speed optimization, check it out!
Would you add any element to this list? Do you have additional thoughts? Let me know in the comments!