You have probably run into an annoying error page after clicking a link on a website. The page may have said that the link is invalid or that the page no longer exists. Site owners rarely delete pages deliberately, because those pages are indexed by Google and other search engines; more often, a technical fault causes the error. This is why regular site maintenance is so important: it ensures that links, whether internal or external, are valid and working properly.
Normally, web applications contain a vast number of links. These links may go to a resource within the site (internal links) or outside of the current application (external links). In addition, other sites may link to a site. First, I concentrate on links within a site and how these may be located and resolved.
Finding broken links
It’s virtually impossible to test every link on a site manually, because the number of pages keeps growing and browsing through all of them quickly becomes impractical. Manual testing is feasible only for very small websites. Thankfully, a variety of tools automate this process, allowing you to concentrate on fixing the problem links. Basically, these tools crawl a site and verify all links found. They often include options to define what should be checked, which links to ignore, and more. The following list provides a selection of these tools:
- Xenu Link Sleuth: This is a preferred tool. It is fast and free. It provides great output via a detailed report of problems encountered. In addition to checking links, it verifies a variety of linked resources including images, frames, plug-ins, backgrounds, local image maps, style sheets, scripts, and Java applets.
- LinkAlarm: This commercial service allows you to check the validity of all links within a site or page. It provides a very detailed report that is color-coded to highlight problems, as well as graphs (and everybody loves a good graph).
- W3C Link Checker: This online web application allows you to validate links within a web application. It dives into a web resource and provides information on invalid links or error messages.
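To make the crawl-and-verify idea concrete, here is a minimal sketch in Python of what such tools do at their core: collect the links on a page, separate internal from external ones, and ask the server for a status code. It uses only the standard library, and the names `LinkCollector`, `extract_links`, `is_internal`, and `check_link` are made up for this illustration; it is not how any of the tools above are actually implemented.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect the href and src targets found in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the URL of the page.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link in the page as an absolute URL."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def is_internal(link, site_url):
    """Treat a link as internal when it points at the same host as the site."""
    return urlparse(link).netloc == urlparse(site_url).netloc


def check_link(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived.

    Assumptions: the server answers HEAD requests, and redirects are
    followed automatically, so the code reported is that of the final page.
    """
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        return error.code  # e.g. 404 or 500
    except OSError:
        return None  # bad host name, refused connection, or time-out
```

A real checker would also keep a queue of pages still to visit and a set of pages already seen, so that each internal page is crawled only once.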
Once a link is identified as a problem, you must decide how to address it.
Fixing broken links
The status code returned when trying to access a web resource can reveal a lot about what may be wrong. The following list describes codes that may be returned when attempting to access a resource via a link:
- 301: The target resource was permanently moved. Strictly speaking, this is a redirect rather than an error.
- 302: The target resource was temporarily moved. This is also a redirect.
- 401: This signals an authorization error while trying to access a resource, meaning the resource may require a log-on for access.
- 404: The target resource was not found, which usually means it no longer exists at that address.
- 408: The request to access the resource timed out.
- 500: A common, generic catch-all error. It signals there was a problem on the server hosting the target resource. The platform for the target resource may provide more information.
- 904: This signals a bad host name in the link. It is not a standard HTTP status code, so it comes from the checking tool rather than from a server.
A tool like Xenu Link Sleuth provides the error code returned for a broken link. A time-out error may mean the link is valid but the server was busy when tested, so you can retest it manually. The rest of the errors signal that the link should be removed or replaced (for a 301, replaced with the new address of the resource).
When dealing with internal links in an application, you may examine the target resource to identify what problems may exist within the page source code. An error code of 500 with an internal page usually signals a code error, so the error may be resolved with a code fix. The link will be fixed if the target page is fixed, but you will want to disable or remove the link until the problems with the target resource are addressed.
Unfortunately, there is not much you can do when dealing with external links to sites over which you have no control. In these instances, you will need to remove or replace the link to avoid user problems.
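The triage rules described above can be written down as a small lookup. The following Python sketch is only an illustration: the function name `triage`, the action strings, and the treatment of codes outside the list (no action for 2xx, manual review for anything else) are assumptions of mine, not the behavior of any particular tool.

```python
# Status codes discussed in this article and what they mean.
STATUS_MEANINGS = {
    301: "moved permanently",
    302: "moved temporarily",
    401: "authorization required",
    404: "not found",
    408: "request timed out",
    500: "server error",
    904: "bad host name (tool-specific, not a standard HTTP code)",
}


def triage(status, internal=False):
    """Suggest a follow-up action for a link that returned the given status."""
    if status == 408:
        # The link may be fine; the server was just busy when tested.
        return "retest"
    if status == 500 and internal:
        # For our own pages, a 500 usually points at a code error.
        return "fix target page"
    if status in STATUS_MEANINGS:
        return "remove or replace"
    if 200 <= status < 300:
        return "no action"
    # Assumption: anything not covered in the article gets a manual look.
    return "review manually"
```

For example, `triage(500, internal=True)` suggests fixing the page itself, while `triage(500)` for an external link falls back to removing or replacing the link.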
The beauty of the web is the ability to link between sites, which means other sites can link to yours. These inbound links may generate errors as well. They pose a greater threat, because potential users or customers will be quickly turned away when confronted with a broken link on another site. These broken links may be caused by a deleted or renamed page, an old entry in a search index, a bad bookmark, or an incorrect URL.
One way to approach these errors is to set up your web application to gracefully handle the errors outlined earlier. For example, you can create a custom error page for each error, so the custom page is displayed whenever that error occurs. This custom page can contain a user-friendly message, as well as valid links within the application. A good example is creating a custom 404 page to handle situations where a linked resource no longer exists. The set-up for such pages will depend on the web application platform.
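As one possible illustration of a custom 404 page, here is a minimal sketch of a Python WSGI application. The page contents, the `PAGES` table, and the `render` helper are hypothetical and exist only for this example; your own platform will have its own mechanism for registering error pages.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical site content, keyed by path.
PAGES = {
    "/": b"<h1>Home</h1>",
    "/products": b"<h1>Products</h1>",
}

# Friendly message plus valid links back into the application.
NOT_FOUND_PAGE = (
    b"<h1>Sorry, we could not find that page</h1>"
    b'<p>Try the <a href="/">home page</a> or browse our '
    b'<a href="/products">products</a>.</p>'
)

HTML_HEADERS = [("Content-Type", "text/html; charset=utf-8")]


def app(environ, start_response):
    """Serve known pages; answer everything else with the custom 404 page."""
    body = PAGES.get(environ.get("PATH_INFO", "/"))
    if body is None:
        start_response("404 Not Found", HTML_HEADERS)
        return [NOT_FOUND_PAGE]
    start_response("200 OK", HTML_HEADERS)
    return [body]


def render(path):
    """Call the app directly (no server needed) and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

Note that the custom page is still sent with a 404 status, so link checkers and search engines can tell that the resource is missing even though the visitor sees a helpful page.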
Another way to address errors is by implementing redirects that automatically send a user to another site resource when an error comes up. Again, the set-up and usage of redirects depends upon your platform.
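A redirect set-up often boils down to a table of old paths and their replacements. The Python sketch below shows one way this might look; the paths in `REDIRECTS` and the function names are invented for the example, and the choice of a permanent (301) redirect is an assumption that suits pages that have moved for good.

```python
# Hypothetical map of retired paths to their replacements.
REDIRECTS = {
    "/2009/pricing": "/old-pricing",
    "/old-pricing": "/pricing",
    "/contact.html": "/contact",
}


def final_target(path, redirects=REDIRECTS):
    """Follow the map until a path has no further redirect."""
    seen = {path}
    while path in redirects:
        path = redirects[path]
        if path in seen:
            # Two entries pointing at each other would bounce users forever.
            raise ValueError(f"redirect loop involving {path}")
        seen.add(path)
    return path


def redirect_response(path, redirects=REDIRECTS):
    """Return a (status, headers) pair for a moved path, or None otherwise."""
    if path not in redirects:
        return None
    location = final_target(path, redirects)
    return "301 Moved Permanently", [("Location", location)]
```

Resolving chains up front means a visitor arriving at `/2009/pricing` is sent straight to `/pricing` in a single hop instead of passing through `/old-pricing` first.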
Lastly, you may try to keep an eye on external sites with broken links through tools like Google Webmaster Tools (since renamed Google Search Console), which can report crawl errors along with the sites that contain the invalid links.
Pushing a site to production does not give you any time to relax, as regular maintenance must be performed to keep the site up and available. One part of regular maintenance should be link validation to make sure users don’t experience problems while using the application.
Do you or someone within your organisation regularly perform such maintenance on your web applications? If so, what tools or methods do you prefer? If you’ve got any thoughts, comments or suggestions for things we could add, leave a comment and let me hear your opinion.