A couple of weeks into my blog, I did what every new blogger does: I google-stalked myself.
Alas, my posts were so deeply buried in the Internet jungle that I could only find them by doing the ego-search: “How to Terminate a rat” site:lorenz.co.nz. And even then, the link led to a 404 page-not-found error.
I soon found out that my ranklessness was mostly caused by how I had configured WordPress. The default WordPress installation is particularly unfriendly to search engines. Fortunately, just a few modifications were required to significantly improve my search engine results.
Below are some guidelines for improving page rank and search engine optimisation (SEO).
1. Enable permalinks
In WordPress, the default URL for a post is of the form http://lorenz.co.nz/?p=123. This is not particularly informative, and search engines put weight on URLs containing keywords.
Under the WordPress administration panel, go to Options->Permalinks and select one of the pre-configured structures. I use the date-and-name-based option, e.g.: http://lorenz.co.nz/2008/03/09/ruapehu-crater-climb
It’s worth noting that search engines don’t put as much weight on pages that appear several directories deep. I nonetheless stuck with the date-and-name-based permalink, as it follows the same directory structure I would use on my own computer.
Note 1: If you decide to use a custom permalink structure, it is not possible to just use /%postname%, as this causes a configuration conflict.
Note 2: Changing the permalink structure away from the default will also require installing a search-engine migration plugin; otherwise search engines will continue to link to the old permalinks and return 404s, hurting your rank.
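As a sketch, the date-and-name permalink above is just the site base, the publish date, and the post slug joined together (the function name here is my own illustration, not a WordPress API):

```python
from datetime import date

def permalink(base, published, slug):
    # Mirrors WordPress's /%year%/%monthnum%/%day%/%postname% structure
    return f"{base}/{published:%Y/%m/%d}/{slug}"

print(permalink("http://lorenz.co.nz", date(2008, 3, 9), "ruapehu-crater-climb"))
# → http://lorenz.co.nz/2008/03/09/ruapehu-crater-climb
```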
2. Get linked to – have other sites link to your site.
Google’s search engine uses its trademarked ‘PageRank’ algorithm, which Google describes as:
PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page’s value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves “important” weigh more heavily and help to make other pages “important”.
Fundamentally, this means the more external links there are to your site, the higher the weighting the search engine will give your content.
The easiest way to get linked to is to get your site added to your friends’ blogrolls. Bonus points if you can get your site linked from a high-traffic site.
3. Eliminate duplicated content
Search engines do not like duplicated content on a website. Unfortunately, blogs are particularly prone to this.
For example, http://lorenz.co.nz/ currently contains the post ‘Caving Excitement at the Bay with the Dragon Boat Rock’.
The same post can also be reached through the archive, category, and trackback pages.
Such duplication causes the content to be penalized by search engines.
It also means that if someone searches for ‘caving excitement at the bay with the dragon boat rock’, they are likely to be taken to one of the duplicated pages, which may no longer even contain the post, further hurting your search-result rank.
Duplicated content can be reduced in a couple of ways:
- Reduce the number of posts on your front page to around 5. Go to WordPress->Options->Reading and modify the ‘Show at most x posts’ value.
- Configure the robots.txt file.
Robots.txt is a URL exclusion protocol for search engines. It uses simple path prefixes (plus the * and $ wildcard extensions supported by the major crawlers, not full regular expressions) to tell web-crawlers which files and directories to exclude from their search. My robots.txt looks something like:
Sitemap: http://lorenz.co.nz/sitemap.xml

User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-content/
Disallow: /wp-includes/
Disallow: /category
Disallow: */feed
Disallow: */atom
Disallow: */rss2
Disallow: */rss
Disallow: */trackback
Disallow: /*/feed
Disallow: /*/atom
Disallow: /*/rss2
Disallow: /*/rss
Disallow: /*/trackback
Disallow: /*?*
Disallow: /?*
Disallow: /*p=
Disallow: /*s=
Disallow: /*.gif$
Disallow: /*.jpg$
Disallow: /*.png$
Disallow: /*.swf$
Disallow: /200
Disallow: /*category
Disallow: /page
Allow: /200*/*/*/*
Allow: /old-site/
Allow: /wp-content/plugins/falbum/wp/album.php?tags=
The ‘Sitemap’ line specifies the location of the sitemap.xml file, described in the next section.
The ‘User-agent: *’ line means the rules apply to all web-crawlers. Robots.txt can also be configured on a per-crawler basis.
The ‘Disallow’ values specify which paths not to crawl. In this case search engines will not trawl any of the WordPress admin pages, default post URLs, date archives, category pages, or ‘next’ pages.
I specifically allow web-crawlers to read only pages matching /200*/*/*/*, which overrides the catch-all ‘Disallow: /200’ and thus matches http://lorenz.co.nz/2008/03/21/caving-excitement-at-the-bay-with-the-dragon-boat-rock but none of the other duplicated links mentioned earlier.
Once configured, the robots.txt file needs read permissions and must be placed in the top-level (public_html) directory of your site. To confirm that robots.txt is working correctly, check the website traffic logs to ensure that search engines are fetching it.
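You can also sanity-check the rules locally before uploading. This sketch uses Python’s standard urllib.robotparser, which implements only the original exclusion spec — plain path prefixes, not the * and $ wildcard extensions — so it can verify only the simple prefix rules:

```python
from urllib.robotparser import RobotFileParser

# Simplified prefix-only subset of the rules above
# (wildcard lines would be ignored by robotparser)
rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /category
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("*", "http://lorenz.co.nz/wp-admin/options.php"))
# → False (admin pages are excluded)
print(rp.can_fetch("*", "http://lorenz.co.nz/2008/03/09/ruapehu-crater-climb"))
# → True (posts match no Disallow rule)
```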
4. Configure a sitemap file
A sitemap file follows the Sitemap protocol. It is a URL inclusion protocol, the complement of robots.txt, telling search engines which pages to crawl.
Including a sitemap.xml file helps search engines find the relevant pages.
The sitemap.xml file is an XML file detailing all pages on your site. Each URL entry uses four elements, specifying the page location, last-modified date, change frequency, and priority. For example, it might look like:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
        xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://lorenz.co.nz/2008/03/21/caving-excitement-at-the-bay-with-the-dragon-boat-rock/</loc>
    <lastmod>2008-03-22T09:42:12+00:00</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.2</priority>
  </url>
  <url>
    <loc>http://lorenz.co.nz/2008/03/19/new-world-self-service-check-outs/</loc>
    <lastmod>2008-03-19T08:13:02+00:00</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.2</priority>
  </url>
  <url>
    <loc>http://lorenz.co.nz/2008/03/18/convenience-fees-and-the-new-internet-tax/</loc>
    <lastmod>2008-03-19T07:21:08+00:00</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.2</priority>
  </url>
</urlset>
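For a handful of pages you don’t strictly need a plugin: a file like the one above can be produced with a few lines of Python’s standard xml.etree.ElementTree. This is only an illustrative sketch with a hard-coded post list, not what the plugin below actually does:

```python
import xml.etree.ElementTree as ET

# Hard-coded (URL, last-modified) pairs for illustration
posts = [
    ("http://lorenz.co.nz/2008/03/21/caving-excitement-at-the-bay-with-the-dragon-boat-rock/",
     "2008-03-22T09:42:12+00:00"),
    ("http://lorenz.co.nz/2008/03/19/new-world-self-service-check-outs/",
     "2008-03-19T08:13:02+00:00"),
]

urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for loc, lastmod in posts:
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = loc
    ET.SubElement(url, "lastmod").text = lastmod
    ET.SubElement(url, "changefreq").text = "daily"
    ET.SubElement(url, "priority").text = "0.2"

# Write the finished sitemap with an XML declaration
ET.ElementTree(urlset).write("sitemap.xml", encoding="UTF-8", xml_declaration=True)
```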
Installing a sitemap.xml file is a two-part process, and it can be set up to update automatically with each new blog post.
- Install a sitemap-generator plugin. Configure it under WordPress->Options->XML-Sitemap, then generate and view the sitemap.xml file.
- Tell Google about the sitemap.xml file:
- Create a Google/Gmail account
- Go to Webmaster Tools and add/verify your site
- Go to Webmaster Tools -> Sitemaps -> Add sitemap, and provide the URL to your sitemap
- Repeat for other search engines
5. Set up metadata
Adding keywords to the metadata associated with each post will also bump up your search-result rank.
There are many WordPress plugins that do this:
- Head Meta Description – This extracts the first few words from each post and automatically adds them to the post’s metadata. It assumes that you don’t ramble and that you make your point in the first paragraph; possibly not as useful for my posts.
- SEO Title Tag – Search engines give considerable weight to page title tags. This plugin lets you configure and override any page’s title tag via a new custom field.
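The gist of the first plugin is simple enough to sketch: strip the markup from a post, collapse the whitespace, and keep roughly the first 155 characters as the description. This is my own illustration of the idea, not the plugin’s actual code:

```python
import re
from html import unescape

def meta_description(post_html, limit=155):
    # Drop HTML tags, decode entities, collapse whitespace
    text = unescape(re.sub(r"<[^>]+>", " ", post_html))
    text = " ".join(text.split())
    if len(text) <= limit:
        return text
    # Truncate at the last word boundary before the limit
    return text[:limit].rsplit(" ", 1)[0] + "..."

print(meta_description("<p>It was a dark and stormy night at the bay.</p>"))
# → It was a dark and stormy night at the bay.
```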
So, in summary, for the best search engine optimisation (SEO): enable permalinks, eliminate duplicated content with a robots.txt file, create a sitemap file, and install a couple of plugins to add metadata to each post. You’ll reach the elusive Google front page in no time.