It’s time to get into the more technical area of SEO (called Technical SEO), because executive managers usually judge Search Engine Optimization efforts by rankings in Google Search results or by reports from Google Search Console (GSC) or Google Analytics.
But especially in the development stage, before publishing any website, an SEO specialist performs several steps to ensure the new website will be indexed properly.
Some of these steps are performed directly in the file directory where the website is hosted, some require editing website files (which usually calls for development skills), while some are completed in the admin area of the CMS (for example, the WordPress Admin area).
What is Technical SEO?
Technical SEO covers the activities performed outside content pages and mostly relates to the structure of a website.
Alongside On-page SEO and Off-page SEO, Technical SEO focuses on getting a website to rank high in Google as fast as possible, without writing a single line of editorial content.
Here’s the list of items in Volume 3 of my SEO dictionary, focused on Technical SEO:
- Preferred Domain
- URL structure
- XML sitemap
- Robots.txt
- Keyphrases
- Meta Description
- Tags
- Protocol (http, https, SSL)
- Redirections
- HTTP 1.1 protocol
- Page Status Codes
- Caching
- Accelerated Mobile Pages (AMP)
- Canonical URLs
Preferred Domain
The preferred domain is the version of your domain (with or without “www” before the domain name) that you want to be shown in the address bar of the browser and in search results.
For example, my preferred domain is https://stefanstroe.com/en/ (without the www) because I wanted to keep it shorter.
To Google, it doesn’t matter if you use a “www” or “non-www” domain (according to John Mueller, Webmaster Trends Analyst @Google).
To set your preferred domain, a redirection rule needs to be configured on the server so that only one version of the website (with or without www) is published, avoiding any duplicated-content red flags in Google. Additionally, you should set your preferred domain in Google Search Console.
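On an Apache server, such a rule is typically written in the site’s .htaccess file. A minimal sketch, assuming the non-www HTTPS version is the preferred one and mod_rewrite is enabled (domain is illustrative):

```apacheconf
# Redirect any www request to the preferred https://example.com version
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^(.*)$ https://%1/$1 [R=301,L]
```

The 301 status tells crawlers the move is permanent, so only the preferred version ends up indexed.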
URL structure
URL structure is the policy used to build the links on a website, and it needs to be clearly defined because it also shapes how the content database is organized. A technical SEO specialist will always plan the URL structure before editing any content.
URL structure can include:
- Subdomains (not recommended for SEO) – format like https://SUBDOMAIN.website.com/
- Directories – format like https://stefanstroe.com/category/NAME-OF-CATEGORY/
- Subdirectories – see an example of my multilingual URL structure: https://stefanstroe.com/LANGUAGE/CATEGORY/
- Query strings – some websites add query string parameters at the end of a page URL: https://shop.com/product-name?name=sweater&size=2
XML sitemap
A sitemap is a hierarchical, structured list of a website’s URLs that search engine crawlers use to index its content. It’s a very important area of Technical SEO.
Sitemaps can be customized to suit a website’s SEO needs. I am currently using WordPress and, in case you also use it, there are plenty of plugins to customize sitemaps, with rules for including or excluding categories of items (pages, media items) or individual items from being indexed by Google. The most popular tools, Yoast SEO and RankMath, will also generate sitemaps for you automatically.
That is why sitemap management is crucial for easing the discovery job of a search engine spider to perform a website inventory.
However, Google may discover content even if it’s not present in a site’s sitemap through internal links of a website (let’s not forget that crawlers check what’s beyond any working link, not only sitemap links).
As for the format, the most common sitemap format is XML, which Google understands best.
For example, my sitemap is here: https://stefanstroe.com/sitemap_index.xml
I have chosen what to be included in the sitemap based on the current content.
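A minimal XML sitemap follows the sitemaps.org protocol; here is a sketch with one hypothetical URL entry:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- one <url> entry per page you want crawled -->
    <loc>https://example.com/en/sample-page/</loc>
    <lastmod>2021-01-15</lastmod>
  </url>
</urlset>
```

Plugins like Yoast SEO or RankMath generate and update entries like this automatically, so you rarely have to write the file by hand.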
Robots.txt
Robots.txt is a file uploaded to a website’s root directory containing a set of rules that tell web crawlers which parts of the site they should or should not crawl. It is an important tool for SEO, but not for security: the rules are advisory, so while well-behaved crawlers respect them, malicious crawlers can simply ignore them.
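A short sketch of such a file, with illustrative paths for a WordPress site:

```txt
# robots.txt — advisory crawl rules, served at https://example.com/robots.txt
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://example.com/sitemap_index.xml
```

Listing the sitemap here helps crawlers find it without it having to be submitted manually.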
Keyphrases
Keyphrases are combinations of keywords that link user search intent (what people search for in Google Search) with the theme of a website page.
When planning your content, you should always research corresponding keyphrases with SEO tools or in Google Keyword Planner in order to increase your chances of being useful to users and of ranking higher in Google Search. Tools like Ahrefs, for example, let you perform this kind of keyword research.
Meta Description
The Meta Description is a short text string placed in a page’s code that, together with other meta tags, indicates essential elements of a page (what the page is about, its author, the last time it was updated). These descriptions are very useful to search engine users as well.
Sometimes these text fields are used by Google not only for the listing in search results but also to create rich snippets in the SERP.
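In HTML, the description lives in the page’s head section; a sketch (the text itself is illustrative):

```html
<head>
  <!-- often shown by Google under the page title in search results -->
  <meta name="description" content="A short, compelling summary of what this page is about.">
</head>
```

Google frequently truncates descriptions after roughly 155–160 characters, so front-load the important words.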
Tags
Tags are elements in a website’s pages (visible or hidden to the user) that give search engine crawlers important hints on how to index the page or how to interpret certain important elements, such as hyperlinks. Setting the right tags is critical for Technical SEO.
Here are the most important types of tags:
1. Title tag
The Title tag is an HTML element that marks, in a page’s code, the page’s headline. It is a mandatory field for SEO.
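It sits in the page’s head section; a sketch with an illustrative title:

```html
<head>
  <!-- shown as the clickable headline in search results and in the browser tab -->
  <title>Technical SEO Dictionary | Example Site</title>
</head>
```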
2. Alt tag
The Alt attribute (commonly called the Alt tag) is another important HTML attribute, mandatory in SEO. It is specified inside an IMG tag and lets the browser display an “alternate text” when an image can’t be shown; it also helps crawlers and screen readers understand the image.
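A sketch of an image tag carrying the attribute (file name and text are illustrative):

```html
<!-- the alt text describes the image for crawlers, screen readers, and failed loads -->
<img src="/images/sitemap-diagram.png" alt="Diagram of a website's sitemap structure">
```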
3. Meta robots tags (or Meta Robots Directives)
Meta robots tags, also called Meta Robots Directives, are chunks of code parameters that tell search engine spiders how to crawl elements of a page, or the entire page itself.
How can we identify a meta robots tag in a page’s code?
<meta name="robots" content="[PARAMETER]">
Unlike robots.txt instructions, meta robots tags can be overlooked by malicious web crawlers that intend to browse or copy your website’s content.
The most important parameters we usually set in a page are the following:
- Index: The default for any page or link, allowing search engines to index the page. Since it’s the default, we do not have to set the Index parameter explicitly.
- Follow: Tells search engines to follow all the links on a page and to pass ranking signals through all of them.
- NoIndex: Tells search engines not to index that specific page.
- NoFollow: Tells search engines not to follow what’s behind a hyperlink on a website (since March 2020, Google treats it as a hint rather than a strict rule).
- NoImageIndex: As it says, it tells crawlers not to index images included on the page.
- NoArchive: Tells crawlers not to cache the page and hence not to display a cached link for it in the SERP.
- NoSnippet: Tells crawlers not to display rich results of that page in SERP.
- NoOpener: Strictly speaking a link rel attribute rather than a meta robots parameter; WordPress adds it automatically, accompanied by a “NoReferrer” value, to links opening in a new tab, in order to prevent malicious websites from exploiting a specific security vulnerability (tabnabbing).
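Several parameters can be combined in a single directive. For example, to keep a page out of the index while still letting crawlers follow its links:

```html
<!-- in the page's <head>: do not index this page, but do follow its links -->
<meta name="robots" content="noindex, follow">
```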
Protocol (http, https, SSL)
The protocol is the prefix of a web address (the part before “://”). For websites it can be “HTTP” (Hypertext Transfer Protocol) or “HTTPS” (Hypertext Transfer Protocol Secure).
As you probably know, the https protocol can be used only after installing an SSL/TLS (“Secure Sockets Layer”) certificate on your server, which encrypts data between the end user’s device and your website’s server.
Redirections
A redirection is a rule written in the CMS or on the server (for example, in cPanel’s Redirects section) for when a URL has changed its location. In Technical SEO, redirection rules need to be set from the very beginning, but also continuously monitored.
There are five types of redirections:
- 301: Permanent redirect (most commonly used). It tells crawlers to drop the old URL from the index and to index the new one. Google recommends maintaining a 301 redirect in place for at least one year.
- 302: Temporary redirect. This redirect does not ask the web crawler to index the new URL, as the destination is temporary. A 302 redirect is commonly marked as a “Found” redirect.
- 303: Also a temporary redirect (“See Other”), but it instructs the client to fetch the destination with a GET request and not to cache the response.
- 307: Similar to 302 and 303, but the 307 redirect is a newer, clearer instruction to crawlers that content was temporarily moved to a new location. It was introduced with the HTTP 1.1 protocol.
- 308: Permanent redirect, like 301, but it additionally forbids changing the request method.
You can learn more about 3xx redirects in a deepcrawl.com article.
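On Apache, single-URL redirects can be declared in the .htaccess file; a sketch with illustrative paths:

```apacheconf
# permanent move: old article URL to its new location
Redirect 301 /old-article/ https://example.com/new-article/
# temporary move: generic promo URL pointing at this season's landing page
Redirect 302 /promo/ https://example.com/winter-sale/
```

Using 301 only for genuinely permanent moves matters: crawlers will drop the old URL from the index once they see it.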
HTTP 1.1 protocol
HTTP 1.1 is a revision of the original HTTP protocol that delivers web pages faster, mainly through persistent connections that reuse a single TCP connection for several requests, lowering at the same time the connection overhead of transferring data. It has since been followed by the even faster HTTP/2 and HTTP/3.
Page Status Codes
Leaving aside the rarely seen 1xx (informational) class, there are four classes of page status codes:
- 2xx status codes: A class of status codes that indicate the request for a page has succeeded.
- 3xx status codes: They are redirection status codes, I just detailed them above, in the Redirection section.
- 4xx status codes: Client errors, returned when a requested page or an element on a page (like an image or a video) could not be found or accessed (for example, 404 Not Found).
- 5xx status codes: Server errors (such as “Server timeout”) that indicate browser requests the server cannot fulfil; they usually affect the entire website. Most of the time these errors can be fixed only by web hosting specialists.
Caching
Caching is a very important User Experience (UX) tool, as it makes a website load faster by serving a pre-built static copy of its pages. A big advantage of caching is that it lowers the processing load on users’ mobile browsers, thus making pages load quicker on small devices.
In WordPress, you can use free or paid caching plugins (there are at least 5 good alternatives) as well as a CDN solution.
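Browser caching can also be enabled at server level via response headers; an Apache sketch, assuming mod_headers is enabled (lifetime is illustrative):

```apacheconf
# let browsers keep static assets for 30 days instead of re-downloading them
<FilesMatch "\.(css|js|png|jpg|webp)$">
  Header set Cache-Control "max-age=2592000, public"
</FilesMatch>
```

Page-level caching from plugins and asset-level headers like these complement each other rather than compete.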
Accelerated Mobile Pages (AMP)
Accelerated Mobile Pages is a lightweight version of a website, initially created by Google as a competitor to Facebook Instant Articles and Apple News. This open-source HTML framework, developed by the AMP Open Source Project, is designed to make the experience faster for mobile visitors.
I personally tried it on several projects but came across many issues with SEO indexing, especially on multilingual websites. For speeding up a website’s loading time I would instead choose a world-class caching and CDN solution like Cloudflare.
Canonical URLs
The Canonical URL is an HTML link element that tells search engine crawlers which version of a web page is the original and which are duplicates. This prevents a “Duplicated content” red flag in Google Search Console and allows the original page to be indexed successfully.
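It is declared in the head section of each variant of the page; a sketch with an illustrative URL:

```html
<!-- on every variant of the page, point crawlers to the original version -->
<link rel="canonical" href="https://example.com/en/original-article/">
```

The original page can carry a self-referencing canonical too, which keeps the signal unambiguous.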