Developers should learn the basics of Meta Tags : Scott Clark
Contributed on Webdeveloper.com
With all the new HTML tags that are coming out, it’s easy to overlook some of the greatest tools in our arsenal of HTML tricks. There are still a few HTML goodies lying around that’ll help you keep your pages more up to date, make them easier to find, and even stop them from becoming framed. What’s more, some of these tags have been with us since the first Web browsers were released.
META tags can be very useful for Web developers. They can be used to identify the creator of the page, what HTML specs the page follows, the keywords and description of the page, and the refresh parameter (which can be used to cause the page to reload itself, or to load another page). And these are just a few of the common uses!
First, there are two types of META tags: HTTP-EQUIV and META tags with a NAME attribute.
META HTTP-EQUIV tags are the equivalent of HTTP headers. To understand what headers are, you need to know a little about what actually goes on when you use your Web browser to request a document from a Web server. When you click on a link for a page, the Web server receives your browser’s request via HTTP. Once the Web server has made sure that the page you’ve requested is indeed there, it generates an HTTP response. The initial data in that response is called the "HTTP header block." The header tells the Web browser information which may be useful for displaying this particular document
Back to META tags. Just like normal headers, META HTTP-EQUIV tags usually control or direct the actions of Web browsers, and are used to further refine the information which is provided by the actual headers. HTTP-EQUIV tags are designed to affect the Web browser in the same manner as normal headers. Certain Web servers may translate META HTTP-EQUIV tags into actual HTTP headers automatically so that the user’s Web browser would simply see them as normal headers. Some Web servers, such as Apache and CERN httpd, use a separate text file which contains meta-data. A few Web server-generated headers, such as "Date," may not be overwritten by META tags, but most will work just fine with a standard Web server.
META tags with a NAME attribute are used for META types which do not correspond to normal HTTP headers. This is still a matter of disagreement among developers, as some search engine agents (worms and robots) interpret tags which contain the keyword attribute whether they are declared as "name" or "http-equiv," adding fuel to the fires of confusion
Using META Tags
On to more important issues, like how to actually implement META tags in your Web pages. If you’ve ever had readers tell you that they’re seeing an old version of your page when you know that you’ve updated it, you may want to make sure that their browser isn’t caching the Web pages. Using META tags, you can tell the browser not to cache files, and/or when to request a newer version of the page. In this article, we’ll cover some of the META tags, their uses, and how to implement them.
This tells the browser the date and time when the document will be considered "expired." If a user is using Netscape Navigator, a request for a document whose time has "expired" will initiate a new network request for the document. An illegal Expires date such as "0" is interpreted by the browser as "immediately." Dates must be in the RFC850 format, (GMT format):
This is another way to control browser caching. To use this tag, the value must be "no-cache". When this is included in a document, it prevents Netscape Navigator from caching a page locally.
These two tags can be used as together as shown to keep your content current—but beware. Many users have reported that Microsoft’s Internet Explorer refuses the META tag instructions, and caches the files anyway. So far, nobody has been able to supply a fix to this "bug." As of the release of MSIE 4.01, this problem still existed.
This tag specifies the time in seconds before the Web browser reloads the document automatically. Alternatively, it can specify a different URL for the browser to load.
Be sure to remember to place quotation marks around the entire CONTENT attribute’s value, or the page will not reload at all.
This is one method of setting a "cookie" in the user’s Web browser. If you use an expiration date, the cookie is considered permanent and will be saved to disk (until it expires), otherwise it will be considered valid only for the current session and will be erased upon closing the Web browser.
This one specifies the "named window" of the current page, and can be used to prevent a page from appearing inside another framed page. Usually this means that the Web browser will force the page to go the top frameset.
Although you may not have heard of PICS-Label (PICS stands for Platform for Internet Content Selection), you probably will soon. At the same time that the Communications Decency Act was struck down, the World Wide Web Consortium (W3C) was working to develop a standard for labeling online content (see www.w3.org/PICS/ ). This standard became the Platform for Internet Content Selection (PICS). The W3C’s standard left the actual creation of labels to the "labeling services." Anything which has a URL can be labeled, and labels can be assigned in two ways. First, a third party labeling service may rate the site, and the labels are stored at the actual labeling bureau which resides on the Web server of the labeling service. The second method involves the developer or Web site host contacting a rating service, filling out the proper forms, and using the HTML META tag information that the service provides on their pages. One such free service is the PICS-Label generator that Vancouver-Webpages provides. It is based on the Vancouver Webpages Canadian PICS ratings, version 1.0, and can be used as a guideline for creating your own PICS-Label META tag.
Although PICS-Label was designed as a ratings label, it also has other uses, including code signing, privacy, and intellectual property rights management. PICS uses what is called generic and specific labels. Generic labels apply to each document whose URL begins with a specific string of characters, while specific labels apply only to a given file.
Keyword and Description attributes
Chances are that if you manually code your Web pages, you’re aware of the "keyword" and "description" attributes. These allow the search engines to easily index your page using the keywords you specifically tell it, along with a description of the site that you yourself get to write. Couldn’t be simpler, right? You use the keywords attribute to tell the search engines which keywords to use, like this:
By the way, don’t think you can spike the keywords by using the same word repeated over and over, as most search engines have refined their spiders to ignore such spam. Using the META description attribute, you add your own description for your page:
Make sure that you use several of your keywords in your description. While you are at it, you may want to include the same description enclosed in comment tags, just for the spiders that do not look at META tags. To do that, just use the regular comment tags, like this:
!–// This page is about the meaning of life, the universe, mankind and plants. //–
More about search engines can be found in our special report.
ROBOTs in the mist
On the other hand, there are probably some of you who do not wish your pages to be indexed by the spiders at all. Worse yet, you may not have access to the robots.txt file. The robots META attribute was designed with this problem in mind.
meta NAME="robots" CONTENT="all | none | index | noindex | follow | nofollow">
The default for the robot attribute is "all". This would allow all of the files to be indexed. "None" would tell the spider not to index any files, and not to follow the hyperlinks on the page to other pages. "Index" indicates that this page may be indexed by the spider, while "follow" would mean that the spider is free to follow the links from this page to other pages. The inverse is also true, thus this META tag:
meta NAME="robots" CONTENT=" noindex">
would tell the spider not to index this page, but would allow it to follow subsidiary links and index those pages. "nofollow" would allow the page itself to be indexed, but the links could not be followed. As you can see, the robots attribute can be very useful for Web developers. For more information about the robot attribute, visit the W3C’s robot paper.
Placement of META tags
META tags should always be placed in the head of the HTML document between the actual tags, before the BODY tag. This is very important with framed pages, as a lot of developers tend to forget to include them on individual framed pages. Remember, if you only use META tags on the frameset pages, you’ll be missing a large number of potential hits.
Obscure META Tags
If you’re a normal person (I’m not, and I don’t know any, but I heard they do exist), then you’re wondering just what, exactly, is Dublin Core? No, it’s not an Irish porno movie, but rather, it’s a simple resource description record that has come to be known as the Dublin Core Metadata element set, or rather, Dublin Core.
Thanks to a considerate reader, we now know how it got its name. Dublin Core is the core set of metadata elements which were identified by a working group (comprised of experts drawn from the library and Internet communities) which met in Dublin, Ohio.
Dublin Core was designed with several issues in mind, namely to:
* enable search engines to filter by standard fields, i.e. date and author
* Browsers could have the ability to display metadata fields in a separate window
* enhance cross-collection, repurposing and integrating of content
* enhance site management, as old pages may be located more easily, etc.
Rating is basically the same thing as PICS-Label, and can be used for the same purpose, but PICS-Label is recommended over rating, as it is currently recognized by more software than rating, although it couldn’t hurt to use both.
Many of the obscure META tags are produced by HTML authoring software. Microsoft Word supports a number of META attributes in its HTML export option, and if you create a document with Internet Assistant, FrontPage, etc, you’ll notice that they automatically insert certain META tags, such as Generator, Content-Type, etc. into the Web page source. Other META tags are organization or search engine specific. The RDU Metadata search engine uses many such tags, including: contributor, custodian, east_bounding_coordinate, north_bounding_coordinate and others. Other obscurities are government META tags, useful only if you are within a government intranet or system.
Statistics show that only about 21% of Web pages use keyword and description META tags. If you use them and your competitor doesn’t, that’s one in your favor. If your competitor is using them and you aren’t, you may now consider yourself armed with the knowledge. META tags are something that visitors to your Web site are usually not aware of, but ironically, a lot of times it was those same META tags which enabled them to find you in the first place. So for goodness’ sake, don’t tell anyone about this….let’s just keep this our own little secret (just kidding…make sure to send this URL to everyone you know!).
Before we leave the topic of META tags, keep in mind that there are several legal issues that surround the use of these tags on your Web site. Danny Goodman, editor of SearchEngineWatch, has put together a page detailing the lawsuits brought on revolving around META tags. At the present time there have already been at least five such suits, mainly focused on sites that utilized someone else’s keywords within their META tags. The largest of these suits brought a settlement of $3 million dollars. Bottom line: use your own keywords, and definitely not words that someone else has a copyright on.