Learning HTML 3.2 by Examples
Jukka Korpela

Preface

To whom?

This document is intended for people who have an idea of what the World Wide Web is and who produce, or intend to produce, information on the Web. If HTML is something new to you, you will have to study some introductory texts (eg those mentioned in this document) before you can really take in this document. On the other hand, some people who "know HTML" may need to unlearn something, to convert from nonstandard HTML to standard.

This document tries to define the technical terms it uses or to provide links to definitions. If you find terms which are unknown to you and not defined here, please consult eg the Terms section of the HTML 2.0 specification or some of the general Internet glossaries. (The most authoritative Internet glossary is probably RFC 1983.)

About what? What's HTML 3.2?

This document discusses HTML 3.2, which is currently the most recommendable version of the document description language HTML used on the Web. Its authoritative definition is W3C Recommendation HTML 3.2 Reference Specification. It is also known under the code name Wilbur.

People who have heard about HTML 3.0 should notice that HTML 3.2 is not an extension or a variant of HTML 3.0, which has now been withdrawn. (The version numbers 3.0 and 3.2 are misleading!) More exactly, HTML 3.2 contains

For a good summary of the new features in HTML 3.2 as compared with HTML 2.0, consult the article What's New in HTML 3.2 in the World Wide Web Journal, but please notice that it contains a few mistakes.

Why should you learn HTML?

It is possible to provide information on the Web without knowing the HTML language, since HTML can be produced by various specialized editors and converters. This document, however, was written for people who write HTML directly or at least occasionally check and modify HTML code. There are several good reasons to do so. Writing HTML directly isn't difficult - possibly it's easier than learning to use an HTML editor or converter. Moreover, the HTML editors and converters are often limited in their capabilities, or buggy, or produce bad HTML code which does not work on different platforms.

But why HTML 3.2?

The HTML language exists in several variants and continues to evolve, but the HTML 3.2 constructs will most probably be useable in the future, too. By learning HTML 3.2 and by sticking to it as far as possible, you can produce documents which can be browsed by a large variety of Web software now and in the future. This does not exclude the possibility of using other features, such as enhancements provided by Netscape Navigator or Internet Explorer or some other product, if it really serves your purposes and you are willing to accept the consequences (e.g. limitations on accessibility). But it is wise to adopt the habit of producing documents in a standardized language and using extensions only when really necessary.

HTML 3.2 has been defined by the World Wide Web Consortium. It is supported by several browsers to a large extent, and it will probably become the common basis understood by almost all relevant Web software. The next version, an extension to HTML 3.2, is being developed under the code name Cougar.

An older standard, HTML 2.0, is supported to an even larger extent, since HTML 3.2 is an extension of HTML 2.0.

However, to be exact, the following HTML 2.0 features have been removed in HTML 3.2:

It might be a good idea to try to write your documents in HTML 2.0 if possible (avoiding the above-mentioned omitted features, of course). For this reason, constructs (eg tags, tag attributes, or attribute values) which are legal HTML 3.2 but not HTML 2.0 are flagged in this document as follows: (Not in HTML 2.0!) Notice that even by sticking strictly to HTML 2.0 you cannot absolutely guarantee a proper rendering of your documents, since there are deficiencies in browser implementations. The HTML test set by Osma Ahvenlampi contains a large document RFC 1866 HTML 2.0 for testing a browser against the HTML 2.0 specification.

The scope of this document

This document provides material for a systematic study of HTML 3.2 starting from the basic structural features and illustrating them with examples. In addition it

This document does not discuss general issues of Web authoring, such as overall design of documents and document collections. As regards them, see my list of suggested reading.

In addition to such issues, you need to know where to put your HTML document to make it accessible to the world; this may involve things like setting up directory and file protections suitably. Please consult your local Web support for information relevant at your site.

This document concentrates on basic HTML usage. In particular, this document does not give realistic examples about applets or image maps. (The main reason for this is that the author felt that a basic document was urgently needed, and providing good examples about such complicated and somewhat controversial issues would have taken too much time.)

On the versions of this document

This document exists both as a collection of interlinked smaller HTML files and as a single HTML file. The master (most up-to-date) copies are at

For printing on paper, you may wish to use the PostScript version (generated from the HTML version with Netscape), which also exists in a much smaller, compressed form (made with the Unix compress utility).

Best viewed on...

Of course, this document complies with the HTML 3.2 specification, to the best knowledge of the author. No attempt has been made to "optimize" the document for presentation on some particular browser.

In general, you should be able to read this document on any decent WWW browser. However, tables (TABLE elements) have been used in this document, mainly in the description of attributes, since they are essentially tabular information best presented so. Unfortunately this means that parts of this document are almost illegible when viewed with browsers which cannot present tables (eg most versions of Lynx).

Copyright notice

Copyright 1997 Jukka Korpela.

The author hereby gives general permission to copy and distribute this document or parts thereof in any medium, provided that all copies contain, in a manner appropriate for the medium, an acknowledgement of authorship and the URL of the original document, ie http://www.hut.fi/%7ejkorpela/HTML3.2/

The permission granted above does not imply permission to distribute this document in a modified form or as a translation. Please contact the author to discuss the conditions for such actions.

Explanation: The author wishes to preserve the integrity of the document. This includes specifying the context when distributing or using excerpts and informing the reader about the availability of the entire document in its most up-to-date form.

How to study HTML 3.2

Getting started with HTML in general

If you do not already know HTML in any version, you should first read some introduction to the basic concepts and ideas behind HTML. You might consider one of the following options:

Please notice that most introductory texts on HTML do not present the language exactly as defined by HTML 3.2; some of them might differ a lot from it. This is understandable, since the language HTML evolves rapidly (and even divergently).

Learning HTML 3.2 systematically

When you know the very basics of HTML in general, a suggested order of studying HTML 3.2 is the following:
  1. Read The obligatory structure of a document and The recommended structure of a document. You may wish to compare this information with The structure of an HTML 3.2 document on the Wilbur - HTML 3.2 pages at
    http://www.htmlhelp.com/reference/wilbur/
    by the Web Design Group (the basic content should be the same, but you might prefer WDG's style of describing things to mine)
  2. Practise by creating an HTML document with the recommended structure but no contents so far; store this document under a name like template.html and use it as the basis for your HTML documents in the future; create a copy of it, add some plain text into the body and check that the document is readable using a Web browser.
  3. Read Fundamental structures in HTML 3.2, with examples of this document. Concentrate on studying (and possibly enjoying) the ideas and their application, not on memorizing technical details.
  4. Study the general remarks on the syntax of HTML in this document. You will need that information when writing HTML. However, you may at this phase ignore the subsection Miscellaneous notes.
  5. Practise by creating useful HTML documents of your own, using the tags you have learned so far.
  6. Browse through the short descriptions part of this document, to get a picture of what is available in HTML 3.2, and follow the links to get more information about elements that seem potentially useful to you.
  7. Then the world is open to further practising and studying. But beware: there are false prophets and a lot of misuse of HTML around. May the Structure be with you!
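
As a rough illustration of step 2, a template might look like the sketch below. It assumes the document type declaration given in the HTML 3.2 Reference Specification; the title and the paragraph text are mere placeholders, and the recommended structure described elsewhere in this document may call for more than is shown here.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<!-- Replace the placeholder with a descriptive title -->
<TITLE>Title of the document</TITLE>
</HEAD>
<BODY>
<!-- Plain text added into the body for a first test in a browser -->
<P>Some plain text to check that the document is readable.</P>
</BODY>
</HTML>
```

Store the file as template.html, and start each new document from a copy of it rather than from the template itself.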

The official HTML 3.2 specification

When you have doubts about the exact form, meaning, and limitations of an HTML tag, you should consult the most official documents on HTML available: the World Wide Web Consortium documents at
http://www.w3.org/pub/WWW/MarkUp/Wilbur/
especially the W3C Recommendation HTML 3.2 Reference Specification

The specification is relatively short and technical, and consulting the older HTML 2.0 specification (also known as RFC 1866) can be useful, since the current HTML 3.2 specifications can sometimes be understood only by assuming HTML 2.0 as a background document.

In order to understand the HTML specifications exactly, some fluency in reading SGML (the metalanguage used to describe the syntax of HTML formally) is required. SGML as a whole is rather complicated, and the SGML standard is only available in printed form. However, for the purpose of understanding the SGML descriptions of the syntax of HTML (that is, HTML DTDs), the following material usually gives you enough information:

There are some minor internal inconsistencies in the HTML 3.2 specification.

Additional sources of information

There is a large number of good documents on HTML authoring in general. To mention a few of them:

Some sources of information on HTML 3.2 in particular:

You may encounter strange HTML tags or attributes in other people's documents, especially if you are given the task of maintaining documents written by other people. It's often difficult to find out what they are intended to do and how widely they can be expected to work (there is a lot of variation in this!). It is not possible to write a description of "all HTML tags", since the situation keeps changing all the time and many proprietary tags are poorly documented. Traditionally, the HTML Elements List by Sandia National Laboratories has been referred to as a description of various HTML elements and support for them in some popular browsers, but that document is pretty old now. There is better coverage in Oleg K.'s HTML shop.

Notice that documents on HTML (even some of the above-mentioned) very often contain information about features which do not belong to HTML 3.2.

Checking your HTML

When you have started creating and maintaining important HTML documents, you should learn to use a validator, ie a program which checks your HTML code against the HTML 3.2 (or some other) specifications.

Even if you know HTML 3.2 well, you will by mistake violate the specification; for instance, just forgetting an ending quote can cause a lot of such violations. You may not notice the error in your environment but your readers may get confused.
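
The following sketch illustrates the kind of mistake meant here. The file names intro.html and notes.html are hypothetical. In the first paragraph the ending quote of the first HREF value is missing, so a parser would typically take everything up to the next quote as part of the attribute value, and a browser might still display something that looks plausible.

```html
<!-- Invalid: the ending quote after intro.html is missing -->
<P>See the <A HREF="intro.html>introduction</A> and
the <A HREF="notes.html">notes</A>.</P>

<!-- Valid: both attribute values are properly quoted -->
<P>See the <A HREF="intro.html">introduction</A> and
the <A HREF="notes.html">notes</A>.</P>
```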

It is not sufficient to check that "it works" on your browser. Other people will use that browser in a different environment or with different settings, different versions of the browser, or even quite different browsers. Browsers very often pass invalid HTML without giving error messages, perhaps even handling it in such a way that things seem to work fine. For other people, it might be a mess. Looking at your document on a few different browsers may help to detect problems, but it would be too tedious to do that for all important browsing environments.

Therefore, validate your code. You can use eg the HTML Validation Service of WebTechs, which is easy to use.

Passing validation means that there are no violations of HTML syntax (provided that the validator does its job right). Checking the quality of the document is a different thing. There are some checkers such as WebLint which can be used to test the document for various common problems - for things which, although technically legal, are likely to provoke known browser bugs, etc. Checkers may of course perform an HTML syntax check too, but typically they are rougher than validators. They might declare a document syntactically legal when it isn't, or declare it illegal when it is legal. Nevertheless, they are useful tools, both for alerting newcomers to potential problems, and for picking up errors made by even the most experienced.

For more information, see Heikki Kantola's nice compact list of validators and checkers and WDG's (annotated) rather extensive list of validators and checkers.

General remarks on the syntax of HTML

Character set

The character repertoire available to the author of HTML documents is not fixed exactly but it should, according to specifications, contain the ISO Latin 1 set, also known as ISO 8859-1, since it belongs to the ISO 8859 set of standards. Notice that the encoding of characters may vary, although the default encoding is the one specified in ISO 8859-1. (The HTTP protocol specifies how information about encoding is to be passed along with a document.)

In addition to character repertoire and encoding (of characters by bit combinations), there is a special feature which is fixed in HTML: the interpretation of numerical character escapes of the form &#n; where n is a number. Such an escape is to be interpreted as the character corresponding to n in ISO 10646 and Unicode. In practice, browsers cannot represent all ISO 10646 characters, but the specifications imply that if a browser presents &#n; as a character, it must use the ISO 10646 character. (Unfortunately, browsers may violate this.)

In practice, you should use ISO Latin 1 characters only. Currently or in the near future you can hardly expect general support for extensions to it, although support for some national alphabets may exist nationally. Support for ISO Latin 1 should exist in all browsers, but there are problems even with this. You may of course decide to stick to the ASCII character set, which is a subset of ISO Latin 1, especially if you do not need letters with diacritic marks (or, in general, letters other than English a - z).

The printable characters of ASCII (with code values from 32 to 126 in decimal) are the following:

  ! " # $ % & ' ( ) * + , - . /
0 1 2 3 4 5 6 7 8 9 : ; < = > ?
@ A B C D E F G H I J K L M N O
P Q R S T U V W X Y Z [ \ ] ^ _
` a b c d e f g h i j k l m n o
p q r s t u v w x y z { | } ~ 
The other printable characters of ISO Latin 1 (with code values from 160 to 255 in decimal) are the following:
  ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ­ ® ¯
° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿
À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï
Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß
à á â ã ä å æ ç è é ê ë ì í î ï
ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ
Note: The presentation of some characters in the copy of this document may be defective eg due to lack of font support. Naturally, the appearance of characters varies from one font to another.

If your keyboard or text editor does not allow you to enter (ie to type directly) some ISO Latin 1 characters such as ä or ñ, you can use the character escape conventions.
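
For instance, the words below contain the characters ä and ñ written first with named escapes and then with numerical escapes. This is only a small sketch; the sample words are my own choice, and the code numbers 228 and 241 are the ISO Latin 1 positions of ä and ñ.

```html
<!-- Named escapes: &auml; for a with dieresis, &ntilde; for n with tilde -->
<P>J&auml;rvenp&auml;&auml; is a town in Finland;
Espa&ntilde;a is the Spanish name of Spain.</P>

<!-- The same characters written as numerical escapes -->
<P>J&#228;rvenp&#228;&#228;, Espa&#241;a</P>
```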

Some practical warnings to those who create HTML documents on microcomputers:

See also A. J. Flavell's Notes on ISO 8859-1 in the Web context.

HTML tags

An HTML tag consists of the following, in this order:

Examples:
<H1>
<H1 ALIGN=LEFT>

HTML elements

Most, but not all, HTML tags are paired so that an opening tag is followed by the corresponding closing tag, and there can be text or tags between them, as in
<H1>Foreword</H1>
In such cases the two tags and the part of the document enclosed by them form a unit which is called an HTML element. Some tags, eg <HR>, are HTML elements by themselves, and for them the corresponding end tag would be illegal. - In the sequel we will usually refer to tags by their name only, omitting the obligatory angle brackets.

For some elements which logically consist of a start tag, some content and an end tag, it is legal to omit the end tag, possibly even the start tag. For example, you can omit the end tag </P> and let browsers and other software imply it when necessary. The exact rules for allowable tag omission are given in the HTML specification, often only in the formal (SGML) syntax, so they can be hard to read. Moreover, some browsers are known to misbehave if you omit some end tags even when the specs allow it, and this can have drastic effects eg when nested tables are involved. Thus it is wisest to use explicit end tags always for all elements which logically have an end tag.
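
As a small sketch of the difference (the paragraph texts are placeholders), both of the first two forms are legal, but the second one follows the advice given above:

```html
<!-- Legal: the end tags of the P elements are omitted and implied -->
<P>First paragraph.
<P>Second paragraph.

<!-- Recommended: explicit end tags -->
<P>First paragraph.</P>
<P>Second paragraph.</P>

<!-- HR is an element by itself; an end tag for it would be illegal -->
<HR>
```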

Attributes

For each tag, a set of possible attributes is defined. This set can be empty or rather large, but most tags accept one or a few attributes. In almost all cases the attributes are optional. An attribute specification consists of the following, in this order:

It is always safe to enclose the attribute value in quotes, using either single quotes ('80') or double quotes ("80"), using matching quotes of course. The string in quotes must not contain the quote, so if the data contains a double quote, use single quotes for quoting, and vice versa. In general, using double quotes is preferable, since for the human eye single quotes are sometimes difficult to distinguish from other characters like accents.

You can also omit the quotes from an attribute value if the value consists of the following characters only (cf the technical concept of name):

Thus, WIDTH=80 and ALIGN=CENTER are legal shorthands for WIDTH="80" and ALIGN="CENTER". A reference to a URL like HREF=foo.html is acceptable, but in general URLs must be quoted when used in attributes, eg HREF="http://www.hut.fi/". - Some browsers are more permissive. Some browsers may even accept elements with a starting quote but without any closing quote. Such use is very bad practice.
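
The quoting rules can be sketched as follows. The file name logo.gif and the ALT text are hypothetical; the statement that a value like 50% needs quotes rests on the rule in the HTML 3.2 Reference Specification that only letters, digits, hyphens and periods may appear in an unquoted value.

```html
<!-- Unquoted values: legal, since they contain letters and digits only -->
<HR WIDTH=80 ALIGN=CENTER>

<!-- The same with quotes, which is always safe -->
<HR WIDTH="80" ALIGN="CENTER">

<!-- These values contain other characters, so the quotes are required -->
<HR WIDTH="50%">
<A HREF="http://www.hut.fi/">Helsinki University of Technology</A>

<!-- The value contains double quotes, so single quotes are used around it -->
<IMG SRC="logo.gif" ALT='The "official" logo'>
```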

Within attribute values, no HTML tags are recognized. On the other hand, escape sequences are recognized and interpreted.

There is a minimized syntax for attributes when the attribute value is the same as the attribute name. For instance, <UL COMPACT="COMPACT"> can be abbreviated as <UL COMPACT> (and it is common practice to do so). Some user agents even require minimization for some attributes (COMPACT, ISMAP, CHECKED, NOWRAP, NOSHADE, NOHREF), so perhaps it is best to use the minimized syntax when applicable.
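
A short sketch of the minimized syntax in use, with placeholder list items:

```html
<!-- Minimized form of COMPACT="COMPACT" -->
<UL COMPACT>
<LI>first item</LI>
<LI>second item</LI>
</UL>

<!-- Minimized form of NOSHADE="NOSHADE" -->
<HR NOSHADE>
```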

Successive attribute specifications must be separated with blanks (or newlines).

URLs

Several HTML elements, most notably the A element, may contain an attribute which takes a URL as value. URLs, Uniform Resource Locators, are addresses of Web documents. More generally, URLs can be used on the Web to refer to "objects" on the Web or in other information systems.

The general syntax of URLs is the following:

scheme://host:port/path/filename

where

scheme
specifies the information system (technically speaking, the protocol) to be used to access the resource; possible values include the following:
http     a Web document (to be accessed using Hypertext Transfer Protocol, HTTP)
ftp      a file in a so-called FTP server, to be retrieved using File Transfer Protocol
gopher   a file in a Gopher server
mailto   electronic mail address
news     a newsgroup or an article in Usenet news
telnet   for starting an interactive session via the Telnet protocol (which is part of TCP/IP)
host
is the Internet host name in the domain notation, eg www.hut.fi (or sometimes a numerical TCP/IP address); notice that typically, but not necessarily, Web servers have domain names starting with www
:port
is the port number part, which can usually be omitted since it has a reasonable default; that is, omit it, unless it is a part of a URL which you got somewhere (or you really know what you are doing)
path
is a directory path within the host
filename
is a file name within the directory.
Actually, this pattern is mainly for Web documents, ie http URLs. For other URLs, simplifications and special interpretations are applied. For example, a mailto URL is just of the form mailto:address where address is a normal Internet E-mail address like Jukka.Korpela@hut.fi. Please notice that appending anything to the E-mail address in a mailto URL is nonstandard and may result in lost mail without anyone noticing!
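
The following sketch shows some URLs of the kinds described above as values of the HREF attribute of the A element. The hosts ftp.example.com and www.example.com, the paths, and the port number 8080 are hypothetical and serve only to show where each part of the URL goes.

```html
<UL>
<!-- scheme and host only; the port is omitted -->
<LI><A HREF="http://www.hut.fi/">a Web document</A></LI>
<!-- scheme, host, explicit port, path and filename -->
<LI><A HREF="http://www.example.com:8080/docs/index.html">a Web document
on a server which uses a nondefault port</A></LI>
<!-- a file to be retrieved with FTP -->
<LI><A HREF="ftp://ftp.example.com/pub/readme.txt">a file in an FTP server</A></LI>
<!-- a mailto URL: just the address, with nothing appended -->
<LI><A HREF="mailto:Jukka.Korpela@hut.fi">an E-mail address</A></LI>
</UL>
```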

As explained above, it is safest to enclose URLs in quotes when writing them as attribute values in HTML.

For an overview of URLs, see W3C material on addressing.

As regards the technical specifications of the syntax of URLs, see RFC 1738 (absolute URLs) and RFC 1808 (relative URLs).

In particular, the specifications say that within a URL only a limited set of characters can be used as such:

Other characters must be encoded. (The characters ;/?:@=&# must also be encoded, if they are not used in the special meaning.) This encoding (which is defined by URL specifications, not HTML specifications) consists of using the percent sign followed by two hexadecimal digits, presenting the code position. For example, tilde (~) should be presented as %7E and space as %20. (Violating the rules is much more likely to cause problems in the latter case than in the former.)
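
As an illustration, the first link below uses the URL of this document with the tilde encoded as %7E. The second link is a sketch with a hypothetical file whose name, my notes.html, contains a space.

```html
<!-- The tilde in ~jkorpela is written as %7E -->
<A HREF="http://www.hut.fi/%7Ejkorpela/HTML3.2/">Learning HTML 3.2 by Examples</A>

<!-- The space in the file name is written as %20 -->
<A HREF="my%20notes.html">My notes</A>
```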

Case sensitivity

As regards tag and attribute names and most keyword-like attribute values, HTML is case insensitive. You can, for example, type TITLE or Title or title or even tItLE if you like. As an exception, the value of a TYPE attribute in an OL element is case sensitive.
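
A brief sketch of both the rule and the exception, with placeholder texts:

```html
<!-- These three lines mean the same thing -->
<H2 ALIGN=CENTER>Results</H2>
<h2 align=center>Results</h2>
<H2 Align=Center>Results</H2>

<!-- Here the case matters: "a" asks for lower case letters
     as item labels, "A" for upper case letters -->
<OL TYPE="a">
<LI>first item</LI>
</OL>
<OL TYPE="A">
<LI>first item</LI>
</OL>
```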

In this document, upper case letters are used for the above-mentioned constructs. This may help the reader distinguish HTML code from normal text.

However, the following constructs are (in general) case sensitive:

Division into lines and the use of blanks and tabs

With the exception of text enclosed in PRE tags (preformatted text) or TEXTAREA tags, blanks and newlines are not preserved when displaying the document. More technically, any sequence of blanks, tabs, and newlines is equivalent to a single blank in an HTML file. On the other hand, a blank in the HTML file may be rendered using any amount of empty space or replaced by newline(s).
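
The difference can be sketched as follows (the texts are placeholders). In the P element the extra blanks and the newlines have no effect, whereas in the PRE element the layout of the source is preserved.

```html
<!-- Equivalent to: These words are separated by single blanks. -->
<P>These   words
are    separated by
single    blanks.</P>

<!-- Within PRE, blanks and newlines are preserved -->
<PRE>
Name      Price
Coffee     1.50
Tea        1.20
</PRE>
```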

The term newline is used to denote an end of line designation. Theoretically SGML specifies that a line (record) should begin with a record start character (line feed, LF, ASCII code 10) and end with a record end character (carriage return, CR, ASCII code 13). In practice, HTML documents are presented and transmitted using a newline presentation convention of the computer system used. Therefore, HTML browsers are encouraged to accept any of the three common representations, namely CR LF sequence, CR only, and LF only, as line separators and to infer the missing record end and start characters.

Thus, it does not matter how you divide the text into lines, since a newline is equivalent to a blank. Notice, however, that you must not divide a word into two lines in HTML. If you eg divide the word international into two lines as follows:

inter-
national
it will be interpreted as equivalent to
inter- national
and the result is not what you want.

Thus, you must use HTML tags such as P or BR to force line breaks, if they are necessary for the logical representation of your document.

Browsers usually do not divide words into two lines, except possibly when a word contains a hyphen. The HTML 3.2 Reference Specification is not very explicit in this matter; it just says, in the discussion of tables, the following:

For some user agents it may be necessary or desirable to break text lines within words. In such cases a visual indication that this has occurred is advised.

Beware that the line length is outside your control. It depends on the browser, device, and settings used by the people who look at your document. You can force line breaks but not prevent line breaks between words, in general. (You can try to prevent line breaks by using non-breaking spaces.)
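
For example, the following sketch (with a made-up sentence) uses the escape sequence &nbsp; in an attempt to keep a number together with its unit and a title together with a name. As noted above, this is only an attempt; a browser might still break the line there.

```html
<P>The distance is about 10&nbsp;km, as measured by Dr.&nbsp;Smith.</P>
```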

As regards newlines in conjunction with HTML tags, there are special rules:

The horizontal tab character (HT) can appear in the HTML source. Within PRE elements, tabs have a special interpretation. Otherwise a tab is equivalent to a space. Thus, it does not imply tabulation of any kind. (In order to present tabular data, use the TABLE element.) It is best to avoid tabs in HTML code and to use a suitable number of spaces instead, if one wants to format the HTML source code into tabular form.

Classification of elements

The ways in which HTML tags can be combined are defined in terms of elements and their classification. It is much more convenient to define eg that an H1 element may contain (only) text elements than to give a long list of allowable elements, especially since the same list would appear in many contexts and it may change when new text elements are added to HTML in its future revisions.

Apart from the elements at the topmost levels, namely HTML, HEAD and BODY, the HTML elements are classified into three major categories:

Any text element (including plain text) can appear wherever a block element is allowed, by virtue of implicitly forming a paragraph (P element) when necessary.

A rule of thumb which may help in remembering which elements are block elements and which are text elements: block elements cause paragraph breaks, text elements do not.

Note: Often block elements can contain both text elements and other block elements, ie blocks can be nested. Text elements can be nested, too. On the other hand, text elements may not contain block elements. For example,
<CITE><H3>Origin of Species</H3></CITE>
is invalid (since CITE is a text element and H3 is a block element) and also illogical (you don't really mean that the heading as a structure is a citation, do you?) whereas
<H3><CITE>Origin of Species</CITE></H3>
would be legal, although different browsers might treat it differently (letting either H3 or CITE determine the rendering, or possibly using a mixture of the two). Similarly, don't embed headings into A NAME tags but vice versa. It is also illegal to have a paragraph break (P tag) within eg a STRONG element; although several browsers can handle it, the semantics is ambiguous and you should use separate start and end STRONG tags within each paragraph (if you really want to emphasize such large portions of text!).
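
The last two remarks can be sketched as follows. The name origin and the paragraph texts are hypothetical.

```html
<!-- Invalid: a heading inside an A element, and a paragraph break
     inside a STRONG element -->
<A NAME="origin"><H3>Origin of Species</H3></A>
<STRONG>First paragraph.<P>Second paragraph.</STRONG>

<!-- Valid: the A element inside the heading, and separate
     STRONG elements within each paragraph -->
<H3><A NAME="origin">Origin of Species</A></H3>
<P><STRONG>First paragraph.</STRONG></P>
<P><STRONG>Second paragraph.</STRONG></P>
```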

Allowed nesting of elements

This section describes how elements may be nested in HTML 3.2. It does not describe the rules for the ordering or repeatability of elements. It simply answers questions of the form may element X appear within element Y?

The same information is presented in the individual tag descriptions, in their Allowed context and Contents parts. Here it is presented in a compact form. This form does not cover all details but might be more illustrative.

Legend:

In order to simplify element descriptions, I will use the term text container to denote any element which may contain a text element directly (as opposed to containing an element which contains a text element). The following elements are text containers:

A, ADDRESS, APPLET, B, BIG, BLOCKQUOTE, BODY, CAPTION, CENTER, CITE, CODE, DD, DFN, DIV, DT, EM, FONT, FORM, H1, H2, H3, H4, H5, H6, HTML, I, KBD, LI, P, PRE (with restrictions), SAMP, SMALL, STRIKE, STRONG, SUB, SUP, TD, TH, TT, U, VAR.

The following are not text containers but may contain text elements indirectly, ie contain elements which are text containers:

DIR, DL, MENU, OL, TABLE, TR, UL.

The following may not contain text elements at all:

AREA, BASE, BASEFONT, BR, HEAD, HR, IMG, INPUT, ISINDEX, LINK, MAP, META, OPTION, PARAM, SCRIPT, SELECT, STYLE, TEXTAREA, TITLE.

Similarly I will use the term block container to denote any element which may contain a block element directly (as opposed to containing an element which contains a block element). Block containers are: BLOCKQUOTE, BODY, CENTER, DD, DIV, FORM, HTML, LI (when within UL or OL), TD, TH.

Miscellaneous notes: about escape sequences (character entities), names, colors, widths, pixels, non-breaking spaces (&nbsp;), comments

This subsection discusses some technical issues which are related to some HTML tags. Rather than presenting them in the descriptions of individual tags, they have been collected here. Please feel free to skip them in first reading, and consult them later when needed; the tag descriptions contain links to the relevant information here.

Escape sequences (character entities)

Escape sequences, more formally known as character entities, are a method of presenting special characters. For example, the escape sequence &lt; denotes the less than character (<).

Obviously, since some characters such as < are used with a very special meaning in HTML, there must be some way of expressing them as data characters, ie when they should appear eg as part of the document itself or in a URL. The convention is that the following notations are used:

character  notation  usual name(s) of the character
<          &lt;      less than character, left angle bracket
>          &gt;      greater than character, right angle bracket
&          &amp;     ampersand

There was a notation &quot; for the double quote (") in HTML 2.0, but it does not belong to HTML 3.2 (for certain technical reasons). The double quote can be typed as such within normal text, and within quoted strings as well if the single quotes are used as the outermost quotes. (In the rare cases where this does not work, you can use &#34; to represent the double quote.)

Notice that the semicolon is part of the escape sequence. In principle, it is necessary only if the following character would otherwise be recognized as part of the name. In practice, it is best to adopt the habit of always terminating an escape sequence with a semicolon.
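
The following sketch shows the three escape sequences in running text and in an attribute value. The URL in the last line is hypothetical; it only illustrates that an ampersand within a URL is written as &amp; in the HTML source.

```html
<!-- Displayed as: A start tag looks like <H1>, and an escape
     sequence like &lt; begins with an ampersand. -->
<P>A start tag looks like &lt;H1&gt;, and an escape
sequence like &amp;lt; begins with an ampersand.</P>

<!-- Displayed as: a < b & b > c -->
<P>a &lt; b &amp; b &gt; c</P>

<!-- The URL itself is search?name=a&lang=en -->
<P><A HREF="search?name=a&amp;lang=en">Search results</A></P>
```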

In escape sequences, the case of letters is significant. For example, the ampersand & may not be represented as &AMP; (this escape sequence is undefined), and the escape sequences &auml; and &Auml; denote two distinct characters, a umlaut (a dieresis, the letter a with two dots above it) in lower case and in upper case (ä and Ä); notice the principle of uppercasing only the first letter in the escape notation (&AUML; is undefined).

The need for the above-mentioned escape sequences arises from the syntax of HTML. In fact there are escape sequences for all characters in the ISO Latin 1 character set. There are

For a full list, see the appendix Character Entities for ISO Latin-1 of the HTML 3.2 Reference Specification. There is also a perhaps slightly more readable presentation of that information: Table of Character Entities for ISO Latin-1.

However, there is usually little reason to use other escape sequences than &lt; and &gt; and &amp;. Using &auml; instead of ä might seem to give some character code independence, but it does not; if a browser can display &auml; correctly, it can also display correctly a document in which the character ä is specified directly. But notice that sometimes you cannot input some special characters directly due to keyboard restrictions, and in such cases you can have use for notations like &auml;.

And please notice that "character ä" means the ISO Latin 1 character with name "small letter a with diaeresis" (diaeresis = umlaut), with code 344 in octal, 228 in decimal. It can be entered into an HTML document in various ways. It is possible that pressing a key labeled with ä or Ä is not among those ways. For instance, on a Macintosh with Scandinavian keyboard the ä key normally produces a character quite different from ä in ISO Latin 1. Various programs may or may not handle this by performing character code conversions.

Some browsers support other escape sequences than those mentioned above, for example &trade; and &cbsp;. The use of such notations is strongly discouraged. (Notation &trade; refers to a symbol which does not belong to ISO Latin 1 at all; you may wish to use the HTML 3.2 conformant notation <SUP><SMALL>TM</SMALL></SUP> instead. Notation &cbsp; stands for "conditional breaking space", not in ISO Latin 1 and possibly not intended to be a character at all.)

Names

In some contexts in the definition of HTML, the word name appears as a technical term. (Perhaps a more appropriate term would be identifier, since the concept bears resemblance to identifiers in programming languages.) A name is a sequence of characters containing only letters, digits, periods, and hyphens, and beginning with a letter.

This name concept occurs in the description of HTTP-EQUIV and NAME attributes of the META element and in the description of NAME attribute of the PARAM element.

In other contexts, a string which is used to name something may contain other characters as well but then it must be quoted.

Colors

Some HTML constructs can be used to specify colors: by using an explicit BODY element one can specify the background color, default text color, and colors of link texts; and the FONT element can be used to set text color locally.

It is of course possible that due to software or hardware limitations not all colors can be presented. On some devices, the actual rendering might be just black and white or different shades of grey.

When a color is specified as the value of an attribute, there are two possibilities:

Of course, the symbolic notations are much easier to use and more self-explanatory. On the other hand, the numerical designations give many more possibilities.

It is not necessary to know the numerical equivalents of the predefined color names in order to use them. However, the following table specifies them as well, since they might help authors who wish to define colors by slightly modifying the predefined ones.

Color names and sRGB values
Black = "#000000" Green = "#008000"
Silver = "#C0C0C0" Lime = "#00FF00"
Gray = "#808080" Olive = "#808000"
White = "#FFFFFF" Yellow = "#FFFF00"
Maroon = "#800000" Navy = "#000080"
Red = "#FF0000" Blue = "#0000FF"
Purple = "#800080" Teal = "#008080"
Fuchsia = "#FF00FF" Aqua = "#00FFFF"

These colors were originally picked as being the standard 16 colors supported with the Windows VGA palette. The HTML 3.2 Reference Specification contains a section on colors with sample images in each of the 16 colors. Notice that these colors are rather striking in their brightness. Normally you should use paler colors.
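
The following sketch shows both notations in use: numerical values in the BODY tag and in one FONT element, and a predefined name in another. The particular colors are arbitrary choices made for illustration, and "#A04040" is just one guess at a paler variant of Maroon.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE>Color sample</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#800080" ALINK="#FF0000">
<P>Normal text, then <FONT COLOR="Maroon">text in a predefined color</FONT>
and <FONT COLOR="#A04040">text in a paler variant of it</FONT>.</P>
</BODY>
</HTML>
```

If you set one of the BODY colors, it is safest to set them all, since otherwise the reader's own defaults may clash with your choices.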

See also:

Widths

The value of the WIDTH attribute in eg an HR or TABLE tag can be specified in two alternative ways: as a percentage of the available width (eg WIDTH="50%") or as a number of pixels (eg WIDTH=300). The former, relative specification is more recommendable in general, since the author of a document cannot know the pixel size of the reader's screen.
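
A sketch of the two notations, assuming that the relative form is a percentage of the available width and the absolute form is a number of pixels (the particular values are arbitrary):

```html
<HR WIDTH="50%">
<HR WIDTH=300>
<TABLE WIDTH="80%">
<TR> <TD>A cell in a table which takes 80 per cent of the available width</TD> </TR>
</TABLE>
```

Notice that a value containing the percent sign must be quoted.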

Pixels

Pixel values used in several contexts like width specifications refer to screen pixels. The physical size of a pixel depends on the user's screen.

A browser should multiply the pixel values by an appropriate factor when rendering to very high resolution devices such as laser printers. For instance if a user agent has a display with 75 pixels per inch and is rendering to a laser printer with 600 dots per inch, then it should multiply the pixel values given in HTML attributes by a factor of 8.

Non-breaking spaces (&nbsp;)

The notation &nbsp; is the escape notation for a character which is in other contexts usually called non-breaking space, or NBSP for short. According to ISO 8859, this character should be presented as a normal space (blank) but so that it is not replaced by a newline (as normal spaces often are in text processing). This means that a &nbsp; between two words causes them to be presented on the same line with some inter-word space between them. (The actual width of inter-word space may vary and need not relate to the number of spaces in an HTML file.)

The question whether &nbsp; should prevent line breaks when rendering HTML documents is ambiguous. The HTML 2.0 specification says:

Use of the non-breaking space and soft hyphen indicator characters is discouraged because support for them is not widely deployed.
The soft hyphen should really be avoided; it serves no useful purpose in HTML. But as regards the non-breaking space, you can well use it to try to prevent line breaks where you don't want them. And although the HTML 3.2 Reference Specification is not explicit about the matter in general, it suggests, in the discussion of the NOWRAP attribute of TH and TD elements, that &nbsp; should act as non-breaking space within table cells at least.

If you use non-breaking spaces, use them instead of normal spaces, not in addition to them. For instance, if you wish to prevent a line break between version and 3, type version&nbsp;3 (not version&nbsp; 3).
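
In the context of a paragraph this might look as follows (the sentence itself is only sample text):

```html
<P>This document describes HTML version&nbsp;3.2, which is also known
under the code name Wilbur.</P>
<!-- Not recommended: version&nbsp; 3.2 (a normal space after the entity) -->
```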

On the other hand, within a table in HTML 3.2, &nbsp; can have a quite different meaning, which can be described as non-empty space: when a table is presented with borders, cells with empty contents are drawn without borders, and a cell which contains only spaces counts as empty, but a cell which contains &nbsp; does not! This peculiar semantics does not prevent &nbsp; from acting as a non-breaking space as well.
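
A small sketch of this use, with invented data: the cell for the missing phone number contains &nbsp; so that a browser which behaves as described above draws a border around it.

```html
<TABLE BORDER>
<TR> <TH>Name</TH> <TH>Phone</TH> </TR>
<TR> <TD>Smith</TD> <TD>555 1234</TD> </TR>
<TR> <TD>Jones</TD> <TD>&nbsp;</TD> </TR>
</TABLE>
```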

For further confusion, some people use &nbsp; to force spaces into the visible presentation of a document, eg by putting an &nbsp; or a few of them into the beginning of a paragraph to get its first line indented. This may actually work on some browsers, but it is unwise to rely on that, and it is normally useless to try to enforce such presentation features anyway.

Comments

An HTML file can contain comments, which give explanations to human readers of the HTML code. Comments do not affect the rendering of a document in any way, ie they are ignored by a browser.

You can begin a comment with the four-character sequence <!-- (less than sign, exclamation sign, two hyphens) and terminate it with the three-character sequence --> (two hyphens, greater than sign). Don't use the character pair -- or the character > within a comment. For example:

<!-- Written by Jukka Korpela -->
(For a more thorough discussion of comment syntax, see document HTML comments by WDG.)

It is generally preferable to include metainformation about the document into HTML elements, such as META. Consider making information about purpose, author, creation and last update time etc a visible part of the document itself, too.

Thus, comments should be inserted in rare cases only, eg to comment the HTML code itself to explain things that may look odd. Remember that a comment is part of an HTML file, to be transmitted whenever the document is delivered. Therefore, to avoid wasting bandwidth, if you have a long story to tell, put it into a separate document and insert just its URL into a comment.
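
For instance, a short comment of this kind might look as follows. The remark and the URL are hypothetical; notice that the comment text contains neither a pair of hyphens nor a greater than sign.

```html
<!-- The table below is laid out by hand for text-only browsers.
     Background notes: http://www.example.org/notes.html -->
```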

HTML editors and converters often insert a few comment lines into the beginning of an HTML file. Such indications can be helpful and should not be removed.

Fundamental structures in HTML 3.2, with examples

The obligatory structure of a document

First of all, let us start with an extremely primitive HTML document: one that only contains the words Hello world as plain text. In an HTML file, the contents must be preceded by a head section which minimally consists of two constructs. Our HTML code would be as follows:

Example hello.html:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<TITLE>Hello</TITLE>
Hello world
In fact, this document implicitly has the following structure, ie it is equivalent to the following:

Example hello2.html:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE>Hello</TITLE>
</HEAD>
<BODY>
Hello world
</BODY>
</HTML>
This means that apart from the first line, the entire file is an HTML element which contains a HEAD element, with the TITLE element as contents, and a BODY element, with the plain text as contents.

Thus, in the absence of HTML, HEAD, and BODY tags a browser implicitly assumes them in suitable places. Therefore, your document always contains a head and a body.

The recommended structure of a document

In addition to the obligatory structure, there are various structural features which are highly recommendable. There are various local recommendations at different sites, and you should study the applicable documents carefully.

Here we will simply emphasize that every HTML document should contain certain basic information about its origin. The local recommendations may specify in detail the form in which that information should be provided.

The importance of providing origin information becomes evident if we think how people increasingly find documents using search engines or link lists. In such contexts the document pops up as such, in isolation, even if you may have intended that people find it by following links which you have carefully designed so that they give background information. When a user has eg found your document using AltaVista, he most probably wants to know what kind of document it is. Therefore, each HTML file should provide the very basic information (or link to information) about its origin and nature. For example, in a book-like document collection divided into small files, every file should contain at least a link to the "front page" of the "book".

At least the following origin information should be provided:

The following document presents, in the form of a skeleton sample, one way of implementing such information; please study the applicable local recommendations before adopting this or some other particular style.

Example skel.html:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE>A sample HTML document</TITLE>
<LINK REV="made" HREF="mailto:jukka.korpela@hut.fi">
</HEAD>
<BODY>
<H1>A sample HTML document</H1>

This is a sample HTML document exemplifying a suggested way
of presenting basic origin information.

<HR>
<P>
<A HREF="http://www.hut.fi/~jkorpela/">Jukka Korpela</A>,
<a href="mailto:Jukka.Korpela@hut.fi">Jukka.Korpela@hut.fi</a>
<BR>
This document belongs to the context of
<a href="index.html">Learning HTML 3.2 by Examples</a>
<BR>
The URL for this document is
<KBD>
http://www.hut.fi/~jkorpela/HTML3.2/skel.html
</KBD>
<BR>
Created: December 5, 1996
</BODY>
</HTML>

Information about the document - the HEAD section

As mentioned, there are two obligatory constructs in HTML 3.2 and they must appear in this order:

Most browsers don't complain if you omit these, but they are required by the HTML 3.2 definition. More importantly, there are good practical reasons to include them:

Formally, the TITLE element is (at least implicitly) part of a HEAD element whereas the !DOCTYPE clause precedes all HTML constructs.

Optionally, the HEAD element may contain the following elements in addition to a TITLE element:
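
As a sketch, assuming that the optional elements are those which the HTML 3.2 Reference Specification allows within HEAD (ISINDEX, BASE, SCRIPT, STYLE, META, and LINK), a head section using some of them might look as follows. The addresses and the META values are invented for illustration.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE>A sample head section</TITLE>
<BASE HREF="http://www.example.org/docs/sample.html">
<META NAME="keywords" CONTENT="HTML, sample, head">
<META HTTP-EQUIV="Expires" CONTENT="Tue, 31 Dec 1996 23:59:59 GMT">
<LINK REV="made" HREF="mailto:author@example.org">
</HEAD>
<BODY>
<P>The text of the document begins here.</P>
</BODY>
</HTML>
```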

Organizing the contents - headings, paragraphs, lists, etc

Generally, you divide your document into parts, which may in turn be divided into parts etc. In HTML, such division is expressed using headings of different level. The lowest-level parts in this hierarchy consist of one or more paragraphs. In addition to normal paragraphs and some special kinds of paragraphs like (long) quotations, HTML 3.2 supports lists and tables, which can be regarded as paragraph-like. The internal structure of paragraphs and paragraph-like elements is expressed using text level tags, to be discussed later.

The tags for expressing major structural features, so-called block level tags, are the following:

A recommendable approach, which may need adjustments to fit your local recommendations, is the following:

  1. Write a descriptive heading for the entire document and use H1 element with ALIGN=CENTER attribute for it.
  2. Divide the document into major parts (sections), write suitable titles for them, using H1 with ALIGN=LEFT. In this and further divisions, try to avoid having more than seven parts.
  3. If necessary, divide each major part into smaller parts with H2 headings, and if needed divide each of these subsections into subsubsections with H3 headings. Avoid using H4 headings and especially H5 and H6 headings, both because they are often rendered with a very small font and because more than three levels of structure tend to make the document hard to read. (If you still feel tempted to use H4, consider dividing the entire document into smaller documents.)
  4. If you have a section with, say, H2 heading and containing H3 headings, avoid inserting text between the H2 heading and the first H3 heading. Such "homeless" text can be acceptable if it only contains very short notes such as general orientation, some remarks about the section, or a motto. Long homeless texts confuse the reader who does not see your good intentions; therefore, use a subsection with a heading of the appropriate level and with text like Introductory remarks, Generalities or Summary.
  5. Divide the smallest parts of the above-mentioned structure into paragraphs or paragraph-like blocks (namely lists or tables), as described below. Notice that in HTML you must explicitly indicate paragraph division by HTML elements; leaving just an empty line does not cause a paragraph break.
  6. Within paragraphs, use text level markup, normally phrase markup, to distinguish special text segments from normal text, eg to indicate quotations of computer output or to emphasize key words.
  7. Add links and, if applicable, images or other illustrations.
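
The steps above can be sketched as the following skeleton. The headings and texts are placeholders invented for illustration, and your local recommendations may call for a different style.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE>A sample document</TITLE>
</HEAD>
<BODY>
<H1 ALIGN=CENTER>A sample document</H1>

<H1 ALIGN=LEFT>First major part</H1>
<H2>First subsection</H2>
<P>Text of the first paragraph, with a <EM>key word</EM> emphasized.</P>
<P>Text of the second paragraph.</P>
<H2>Second subsection</H2>
<H3>A subsubsection</H3>
<P>Text with a link to
<A HREF="other.html">another document</A>.</P>

<H1 ALIGN=LEFT>Second major part</H1>
<P>Text of the second major part.</P>
</BODY>
</HTML>
```
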

As regards the paragraph level, there are quite many alternatives. The following list is intended to give some practical guidelines for selecting a suitable alternative:

Lists can be nested in the sense that an item in a list, ie an LI (or DD) element, may in turn contain a list element.

Notice that the basic paragraph element P is not nestable, ie you cannot have P elements within a P element to create subparagraphs. However, the various list elements effectively provide an itemization structure which essentially corresponds to subparagraph division. Moreover, the list elements are nestable.
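
A minimal sketch of such nesting, using an unordered list whose first item contains another unordered list (the item texts are just sample text):

```html
<UL>
<LI>Block level elements
  <UL>
  <LI>headings
  <LI>paragraphs
  <LI>lists
  </UL>
<LI>Text level elements
</UL>
```

The inner list is part of the first item, so it must be closed before the next item of the outer list begins.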

Text markup - emphasis, citations, code, etc

Logical vs physical markup

There are two major classes of text markup: logical and physical. Logical markup indicates the role of a text segment, such as being more important than normal text or being a quotation. Physical markup is an instruction to present text in a particular manner, such as using a font of some specific kind or underlining.

Logical markup shall be preferred. Use physical markup only if it is really relevant that part of a text is displayed in a particular physical way (if possible). The need for physical markup may arise when referring to information in fixed presentation form, such as text in a book or in an image. Such situations occur rarely.

For instance, use the STRONG element for strong emphasis, letting the various Web browsers express the emphasis in the way which is the best in the environment where they are used. Do not use the B element (indicating bolding), except in the rare occasions where you are writing about some text appearing in boldface somewhere.

When style sheets become generally usable, both authors and readers will be able to affect the rendering (eg font, color, and background) of elements. For instance, someone might wish to have all program code extracts presented with yellow background and larger than normal font whereas someone else might prefer some quite different methods of distinguishing them from normal text. Such operations will be much easier if logical markup has been used consistently.

In addition to being more flexible with respect to various browsers and rendering environments, logical markup has the following advantage over physical markup: computer programs are increasingly used for extracting information from HTML documents for various purposes like indexing. For this to work, it is much better to have logical markup indicating eg that some text is more important than the rest or is a quotation of computer printout, rather than having designations of physical fonts.
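
The difference can be sketched with two invented sentences. In the first one the word is important, so logical markup is used; the second one is a statement about the physical appearance of a printed text, which is one of the rare cases for physical markup.

```html
<P><STRONG>Warning:</STRONG> do not switch off the computer
while the program is running.</P>
<P>In the printed manual, the word <B>Warning</B> appears in boldface.</P>
```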

Both logical and physical markup is done using HTML elements with start and end tags. It follows from the nature of HTML language that markups must not overlap. For instance, the following is in error:

  This has some <B>bold and <I></B>italic text</I>.
On the other hand, markup elements can be nested. User agents should do their best when rendering structures like the following:

Example nest.html:

This is <I>italic text which contains <U>underlined text</U>
within it</I> whereas <U>this is normal underlined text</U>.

Obviously, browsers with limited font repertoire can have difficulties in presenting text markup.

Phrase elements (logical text markup)

There are two phrase elements for emphasis: EM and STRONG, and naturally STRONG is used for stronger emphasis.

Avoid emphasizing too much, since emphasizing everything is tantamount to saying everything with the same emphasis, ie not emphasizing anything! (The proverbial student who underlines everything in his textbook has not grasped the idea of emphasizing.)

Unfortunately there is no phrase element for "de-emphasis", ie for indicating segments of text as less important. If you really need that, you may consider using the SMALL element. But especially if the less important text is relatively long, it might often be a better idea to put it "behind hyperlinks", into separate documents to which there are links in the main document. A person who follows such a link is probably interested in the text, so he probably prefers seeing it as normal text, and there is no need for any de-emphasis.

The DFN element can be regarded as a special kind of emphasis, too, but logically it indicates that a term is used in a context where it is defined. This is a very useful element in principle but unfortunately many browsers, including Netscape, do not effectively support it.

The VAR element indicates that a piece of text (typically, a word) is a variable, ie a generic notation to be replaced by different actual expressions.

The other phrase elements involve different kinds of citations or quotations:

CITE citation (title of a book or article or equivalent)
CODE program code or equivalent (eg HTML code)
SAMP sample output from programs, scripts, commands etc
KBD text to be typed from a keyboard by a user; typically used when giving instructions
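
The following sketch uses each of these phrase elements once. The sentences, the command, and the system response are invented for illustration.

```html
<P>A <DFN>name</DFN> is a sequence of characters beginning with a letter.</P>
<P>To list a file, type <KBD>cat</KBD> <VAR>filename</VAR>.
If the file does not exist, the system may respond with something like
<SAMP>No such file or directory</SAMP>.</P>
<P>In Perl, <CODE>$line++</CODE> increments the variable.</P>
<P>The elements are defined in
<CITE>HTML 3.2 Reference Specification</CITE>.</P>
```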

Please do not identify eg the concept of emphasis with its physical representation on your browser (or even its typical representation on several browsers). See below for notes and examples on rendering markup.

Font elements (physical text markup)

The available font elements - to be used very sparingly! - are:
TT "teletype" text, ie monospaced text
I italics
B bold
U underlined
STRIKE strike-through text
BIG large font
SMALL small font
SUB subscript
SUP superscript

Note: SUB and SUP might reasonably be regarded as phrase-level markup, and as mentioned above, SMALL might be used as a substitute for the missing phrase markup for de-emphasis.

The FONT (and BASEFONT) element offers more possibilities to control font sizes than BIG and SMALL. However, all use of font size control in HTML should be avoided.

Rendering of markup

You may wish to view a separate file to see the visual appearance of the different markup elements on your browser. But please do not assume that the rendering which you see is universal or the correct one.

For example, some browsers (eg Internet Explorer) render TT (and CODE) so that the font is significantly smaller than normal text font, and this disproportion is preserved when the setting for font size is changed; moreover, Internet Explorer renders VAR with monospaced font whereas most graphical browsers use (much more naturally) italics. On the other hand, in Netscape these font sizes are separately settable and by default the same font size is used for both, but "the same" is the technical size in points - in practice monospaced font looks bigger than normal proportional font!

Thus, avoid messing with font sizes; use phrase markup and other structural elements and let the users, if they dislike the font sizes, define fonts in their browser settings the best they can.

The following table is intended for giving an idea of the variation. It (verbally) presents the rendering of markup elements in Netscape Navigator, Microsoft Internet Explorer, and Lynx. Notice that there is variation even within each of these programs - depending on version, platform, and system-wide or user's own configuration, so this is just a typical situation. Thus, consider this as what different things might happen rather than as a description of what actually happens in some particular program.

element | Netscape                  | Internet Explorer            | Lynx
EM      | italics                   | italics                      | underlined
DFN     | normal text               | italics                      | normal (monospaced)
CODE    | monospaced                | monospaced small             | normal (monospaced)
SAMP    | monospaced                | monospaced small             | normal (monospaced)
KBD     | monospaced                | monospaced small             | normal (monospaced)
VAR     | italics                   | monospaced small             | normal (monospaced)
CITE    | italics                   | italics                      | underlined
TT      | monospaced                | monospaced small             | normal (monospaced)
I       | italics                   | italics                      | underlined
B       | bold                      | bold                         | underlined
U       | normal text               | underlined                   | underlined
STRIKE  | strike-through            | strike-through               | text between [DEL: and :DEL]
BIG     | larger than normal        | larger than normal           | normal text
SMALL   | smaller than normal       | slightly smaller than normal | normal text
SUB     | lowered, slightly smaller | lowered                      | normal text
SUP     | raised, slightly larger   | raised                       | normal text

These relate to unnested elements. Nesting of text elements may affect the rendering.

Presenting interaction with computer

In order to present text-based interaction between a human being and a computer, or similar situations, the following approach can be used: In all cases, the principles on division into lines and the use of blanks and tabs must be taken into account, and this may require the insertion of BR elements or the use of PRE elements. Notice that logical markup is allowed within a PRE element (although possibly not implemented in a quite satisfactory way).

The following example illustrates the approach in the context of an introduction to the Perl programming language.

Example interact.html:

<P>The following Perl script prints out its input so that each line begins with
a running line number:</P>
<PRE><CODE>
#!/usr/bin/perl
$line = 1;
while (&lt;&gt;) {
  print $line++, " ", $_; }
</CODE></PRE>
<P>The scalar variable <CODE>$line</CODE> is of course the line counter.</P>
<P>The loop construct is of the form<BR>
<CODE>while (&lt;&gt;) {</CODE><BR>
<VAR>process one line of input</VAR> <CODE>}</CODE><BR>
</P>
<P>Assuming that you have written this script (the simpler version of it) into a
file named <KBD>lines</KBD>, you could test it using a command of the form<BR>
<KBD>./lines</KBD> <VAR>datafile</VAR><BR>
In particular, using the script as input to itself, you would do as follows
(the details of system output vary from one system to another):
</P>
<PRE>
<SAMP>lk-hp-23 perl 251 % </SAMP><KBD>./lines lines</KBD>
<SAMP>1 #!/usr/bin/perl
2 $line = 1;
3 while (&lt;&gt;) {
4   print $line++, " ", $_; }
lk-hp-23 perl 252 % </SAMP>
</PRE>

Notes on the example:

Controlling the layout

First, get the structure of your document right. Then, if needed, consider making the layout better. Notice that different browsers use different layouts, and even the same browser may display the same document differently in different environments. For instance, when the user changes the size of his Netscape window, the layout may change radically.

Thus, on the Web there is no such thing as the layout of a document. As an author you cannot dictate layout, just make some efforts to affect it. The following notes, and all information related to layout-oriented features of HTML, should be read with this in mind.

Several HTML elements have optional attributes which can be used to affect the way in which the element is rendered. Consult the detailed descriptions of individual HTML tags to see the possibilities and to read notes about them.

In particular, you may wish to center parts of the text to make them more distinguishable from normal text. You can use the ALIGN=CENTER attribute in several elements like P or DIV (or the separate CENTER element).

If you wish to separate major portions of your document visually from each other, you can use the HR element. Typically it is rendered as a full width horizontal line. But please use this in addition to structuring tools like headings, not as a substitute for them.
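
A sketch of these layout features, with invented texts: a single centered paragraph, a DIV element which should center everything it contains, and an HR element used between two parts which also have headings of their own.

```html
<H2>First part</H2>
<P ALIGN=CENTER>This short notice is centered.</P>
<DIV ALIGN=CENTER>
<P>Everything inside this division,</P>
<P>including this second paragraph, is centered.</P>
</DIV>
<HR>
<H2>Second part</H2>
<P>Text of the second part.</P>
```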

As regards detailed layout issues such as forcing or preventing line breaks, see section Division into lines and the use of blanks and tabs. Font issues were discussed above.

Links

Links (often called hyperlinks) are the feature which justifies the HT in HTML (HyperText Markup Language).

Technically links are specified using A (anchor) elements, and the technical issues are discussed in the description of the A tag. Here we just present the basic idea, a very simple example, and a few pragmatic or stylistic notes.

A link is a directed connection between a particular point in a document and another particular point in the same or another document. The points are often called anchors in HTML terminology.

The two ends of a link (the anchors) are in different logical positions: the link is from one point to another. The latter, called the target of the link, is very often the beginning of a document or, perhaps more logically speaking, an entire document.
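
When the target is a particular point rather than an entire document, that point must be marked as an anchor, too. The following sketch assumes hypothetical anchor names (colors, widths) and a hypothetical file other.html: the first A element makes a heading a possible target, and the second paragraph contains links to it and to a named point in another document.

```html
<H2><A NAME="colors">Colors</A></H2>
<P>Text of the section about colors.</P>

<P>See the section <A HREF="#colors">Colors</A> in this document, or
the section <A HREF="other.html#widths">Widths</A> in another document.</P>
```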

In the simplest case, you create a link from one point of your document to another document (which could be your own or written by someone else, perhaps physically located at the other side of the globe). You have to decide which words act as a visual representation of the link, ie as the phrase which refers to the other document, and you need to know the Web address (the URL) of that document. Then you just put the pieces together into a suitable A element. For instance:

I work at <A HREF="http://www.hut.fi/english.html">HUT</a>.
This might, in one environment, be rendered as follows:

I work at HUT.

The link text, here the abbreviation HUT, acts as a link to a Web document which explains what the abbreviation means and also provides a lot of information about it. The renderings vary a lot - the link text might be underlined, colored, or otherwise distinguishable from normal text. The user (reader) is assumed to know how links are rendered in the particular environment.

Although it is technically easy to set up links, it is pragmatically often very difficult to use them the right way. Here are some practical guidelines:

Images, formulas, etc.

Basically, the image support in HTML is just an interface to the world of graphics. The creation and manipulation of images, the graphics formats and other graphics stuff is not part of HTML. In particular, the HTML specification does not pose any requirements or restrictions on the graphics formats supported by Web browsers.

Assuming that we have some graphics in some format in a file, there are two essentially different ways to use it in a Web document. You can either link to it or embed it into your document. In the first case, you use an anchor (A) element; in the latter case, an IMG element. In the first case, when a user accesses your document he sees eg a verbal phrase which acts as a link, and activating that link causes an image to be displayed, either in the same window or in another, depending on the browser and its settings. On the other hand, an embedded image is part of your document; when a user accesses your document, the image is loaded along with it and displayed as part of it.

In both cases, the user will see the image only if the browser supports the particular graphics format. The most commonly supported formats are GIF and JPEG. They are often the only formats supported for embedded images. For linked images, the support is typically wider (it might include eg PostScript, PDF, and PNG) and extensible by the user (by installing new viewers and making suitable additions to the settings of the browser). The reason is that linked images are typically implemented so that the browser knows nothing of the graphics format itself but only knows how to launch a separate program to present it.

As a special case, it is possible to combine linking and embedding in a sense: you can create a document which contains an image which acts (instead of verbal link text) as a link to another image. Typically, the embedded image is rather small, stamp-like, often a small coarse version of the image to which it points as a link.
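
Such a combination is written by putting an IMG element inside an A element. In the following sketch the file names small.gif and large.gif are hypothetical.

```html
<A HREF="large.gif"><IMG SRC="small.gif" ALT="[Small version of the
picture; follow the link to see it in full size]"></A>
```

The ALT text acts as the link text on browsers which do not display images, so it should tell where the link leads.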

Linking to an image is usually permitted without specific permission. On the other hand, embedding an image means using it in a way which requires the author's permission, and the author must be mentioned. (See Web Law FAQ.) Obviously, some images are so simple that copyright is not applicable. Moreover, there is a large number of collections of images, some of which are in the public domain.

To illustrate linking to images and embedding images, let us consider a GIF image which has been put onto a suitable place so that it is accessible using the URL http://www.hut.fi/%7elsarakon/sae.gif. Now I could refer to it in the following way:

Example sae.html:

<A HREF="http://www.hut.fi/~lsarakon/">Liisa Sarakontu</A> has drawn
<A HREF="http://www.hut.fi/~lsarakon/sae.gif">a picture of
Siamese algae eater</A>.
On the other hand, since Liisa has given me the permission to do so, I could embed the image into a document of mine as follows:

Example sae-2.html:

The Siamese algae eater (<I>Crossocheilus siamensis</I>) is often
mixed up with another algae eating fish, the "false Siamensis"
(<I>Garra taeniata</I> or <I>Epalzeorhynchus sp.</I>). Below you
can see drawings of them by
<A HREF="http://www.hut.fi/~lsarakon/">Liisa Sarakontu</A>.
<P>
<IMG SRC="http://www.hut.fi/~lsarakon/sae.gif" ALT="[Picture of Siamese
algae eater]">
<P>
<IMG SRC="http://www.hut.fi/~lsarakon/false.gif" ALT='[Picture of "false
Siamensis"]'>
The issue of good use of images is very difficult and many-faceted. No attempt to cover it will be made here. The author has written a separate treatise How to use images in communication in general and on the Web in particular.

There is no general support in HTML 3.2 for presenting mathematical formulas. Consult the W3C document on Math Markup to see what work is in progress in this respect. However, you can use some software (eg TeX) to produce the representation of a formula as an image, eg in PostScript form, and use the IMG tag to embed it into your document or the A tag to create a link to it. The latter method is often worth considering, especially for large formulas. The reader may prefer reading the text without distractions and looking at the formula (image) at the very moment he is prepared to do so. Moreover, he may prefer looking at it in a separate window (which is separately adjustable in size and positionable on the screen).

In some cases, when just a few separate symbols are needed within the text and they have reasonable textual alternatives, the following kind of approach can be suitable:

Example sigma.html:

The Greek letter <IMG SRC="http://www.ece.cmu.edu/icons/Sigma.xbm"
ALT="sigma"> is often used to denote summation.
There is a problem, however: since an image has fixed dimensions whereas the size of letters is browser-dependent, there might be an unesthetic disproportion.

Sometimes it is best to present mathematical expressions in linearized notation. For example, instead of trying to find a way of presenting the square root of 2 in the normal mathematical way, you might write just sqrt(2). It depends on the intended audience whether you need to explain such notations.
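
A sketch of linearized notation in HTML, with an explanation of the notation included for readers who may not know it. The use of SUP for exponents and VAR for the variable is a suggestion only, and the rendering varies between browsers.

```html
<P>The length of the diagonal of a unit square is sqrt(2), since
1<SUP>2</SUP> + 1<SUP>2</SUP> = 2.
Here sqrt(<VAR>x</VAR>) denotes the square root of <VAR>x</VAR>.</P>
```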

Tables (Not in HTML 2.0!)

Index: See also Dianne Gorman's excellent Introduction to Tables (part of her Introduction to HTML).

The table concept in HTML 3.2

In HTML, a table is a structure consisting of rows and columns, which can have headers (names, titles, explanations). A table is typically rendered in some natural way corresponding to the structure, with columns adjusted accordingly. The components, or cells, of a table may contain any text elements or even block elements and headings. Thus, a table cell might be a number, a word, a text paragraph, an image, or something more complicated.

Table cells are often called table elements, but it is best to avoid that in the HTML context, since it might cause confusion eg with the TABLE element, which is the HTML description of an entire table.

Tables are the most important improvement in HTML 3.2 in comparison with HTML 2.0. On the other hand, the table constructs of HTML 3.2 are only a subset of The HTML3 Table Model (RFC 1942).

Unfortunately tables are not yet supported by all browsers, and even if support exists it may be of poor quality. (Text-only browsers and speech-based user agents will always have difficulties with complicated tables, of course.) See Alan Flavell's review Tables on non-table browser for information about making tables look somewhat reasonable, if possible, also on browsers which do not support tables.

Another unfortunate situation is that people have started using table elements just to get a desired layout of pages, not to represent data which is logically matrix-like in structure.

Tags used to represent tables

Representing a table involves several kinds of HTML tags:
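
As a rough sketch of how the tags used in the examples of this section nest, consider the following skeleton. The dots stand for arbitrary content and are placeholders only; the comments describe the role of each tag as it is used in the examples below.

```html
<TABLE>                      <!-- the entire table -->
<CAPTION>...</CAPTION>       <!-- an optional caption -->
<TR>                         <!-- one table row -->
  <TH>...</TH>               <!-- a header cell -->
  <TD>...</TD>               <!-- a data cell -->
</TR>
</TABLE>
```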

The very basic table structure

Let us start with a very simple example. It consists of a 2 by 2 table of numbers (a unit matrix), with no headers whatsoever. The HTML code is as follows:

Example table1.html:

<TABLE>
<TR> <TD> 1 </TD> <TD> 0 </TD> </TR>
<TR> <TD> 0 </TD> <TD> 1 </TD> </TR>
</TABLE>
and it looks like the following on a typical browser:

1 0
0 1

Thus, the TABLE tags enclose the table rows, each of which is enclosed by TR tags and encloses table cells enclosed by TD tags. This corresponds to the logical structure of a table as a set of rows consisting of cells. You can abbreviate the table structure by omitting the TD and TR end tags (since a browser implicitly assumes them), but at the expense of losing the logical clarity to some extent:

<TABLE>
<TR> <TD> 1 <TD> 0
<TR> <TD> 0 <TD> 1
</TABLE>

Moreover, although omitting those end tags is legal HTML 3.2, it may in practice confuse some browsers (including Netscape) in some cases.

The use of blanks and newlines in the HTML code for a table is irrelevant to the visual appearance of a table when viewed with a browser, since that appearance is controlled by HTML tags. However, it is often useful to position table elements suitably in the HTML code so that items in the same column are adjusted to make the structure clear for you (or whoever has to maintain the HTML document).
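
For instance, the source of a small table might be laid out as follows. The extra blanks serve only the person reading the HTML code and should not affect the rendering; the cell contents are merely illustrative.

```html
<TABLE>
<TR> <TD> hirvi </TD> <TD> elk      </TD> </TR>
<TR> <TD> orava </TD> <TD> squirrel </TD> </TR>
<TR> <TD> susi  </TD> <TD> wolf     </TD> </TR>
</TABLE>
```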

Additional features; a typical table with text cells

There are several separate features which you will often want to add to this simple table model, such as a border, a caption, and header cells. The following, rather typical, example uses all of these features:

Example table2.html:

<P>An illustration of the use of the TABLE element in HTML.</P>
<TABLE BORDER=1>
<CAPTION>Finnish, English, and scientific names for some animals</CAPTION>
<TR><TH>Finnish name</TH><TH>English name</TH><TH>Scientific name</TH></TR>
<TR><TD>hirvi</TD><TD>elk</TD><TD><I>Alces alces</I></TD></TR>
<TR><TD>orava</TD><TD>squirrel</TD><TD><I>Sciurus vulgaris</I></TD></TR>
<TR><TD>susi</TD><TD>wolf</TD><TD><I>Canis lupus</I></TD></TR>
</TABLE>
Notice that some table cells in the example contain text markup; in this case, there is a specific reason for using the I element.

Parallel texts

If you have logically parallel texts, such as a document in several languages or several variants of the same text, the TABLE element is probably the best way of presenting them. (Using a PRE element is possible but requires tedious formatting by hand and results in the text being displayed in monospaced font.)

In the simplest case you can just write a TABLE element (with attributes defaulted) which contains a single row which contains two data cells, each of which contains a paragraph.
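
A minimal sketch of this simplest case, with all attributes defaulted, could look like the following. The two short texts are chosen only for illustration.

```html
<TABLE>
<TR>
<TD><P>In principio creavit Deus caelum et terram.</P></TD>
<TD><P>In the beginning God created the heaven and the earth.</P></TD>
</TR>
</TABLE>
```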

In a more general case, you should divide the parallel texts into logical parts, such as paragraphs, and make each part a cell of the table. This may require a lot of work (unless you have a suitable program to do the job), since you must take care of "merging" the text: after the first part of the first text, you must have the first part of the second text, etc.

The following example presents a passage from the Bible in three versions and translations:

Example table3.html:

<TABLE>
<CAPTION><STRONG>The beginning of Genesis
in three languages</STRONG></CAPTION>
<TR ALIGN=LEFT VALIGN=TOP>
<TH><TH>Latin (Vulgate)</TH><TH>English (King James version)</TH>
<TH>Finnish (1992 version)</TH>
</TR><TR ALIGN=LEFT VALIGN=TOP>
<TH>1</TH>
<TD>In principio creavit Deus caelum et terram.</TD>
<TD>In the beginning God created the heaven and the earth.</TD>
<TD>Alussa Jumala loi taivaan ja maan.</TD>
</TR><TR ALIGN=LEFT VALIGN=TOP>
<TH>2</TH>
<TD>Terra autem erat inanis et vacua et tenebrae super faciem
abyssi et spiritus Dei ferebatur super aquas.</TD>
<TD>And the earth was without form, and void;
and darkness was upon the face of the deep.
And the Spirit of God moved upon the face
of the waters.</TD>
<TD>Maa oli autio ja tyhjä, pimeys peitti syvyydet,
ja Jumalan henki liikkui vetten yllä. </TD>
</TR><TR ALIGN=LEFT VALIGN=TOP>
<TH>3</TH>
<TD>Dixitque Deus "Fiat lux" et facta est lux.</TD>
<TD>And God said, Let there be light: and there was light.</TD>
<TD>Jumala sanoi: "Tulkoon valo!" Ja valo tuli.</TD>
</TR></TABLE>
Notice that the ALIGN and VALIGN attributes can be essential for achieving good rendering. Browsers cannot know the nature of tables from their contents, so there are situations where the document author may need to control formatting issues like alignment.

Using a table to present a definition list

As mentioned in the discussion of list elements like DL, the typical rendering of "definition lists" is not very good. Moreover, there are just a few ways to affect the rendering.

Using a TABLE element for a definition list is perhaps not an intended use of that element but it is often useful, especially since the author can control things like alignment and use of borders. Consult the document Examples of various list elements in HTML for a very simple example of presenting a definition list as a table with default attribute settings. You will usually want the "definition terms" to be left-aligned, as in the following example:

Example table4.html:

<TABLE>
<CAPTION>The first three letters of the Greek alphabet</CAPTION>
<TR><TH ALIGN=LEFT>alpha</TH>
<TD> the first letter of the Greek alphabet </TD> </TR>
<TR><TH ALIGN=LEFT>beta</TH>
<TD> the second letter of the Greek alphabet </TD> </TR>
<TR><TH ALIGN=LEFT>gamma</TH>
<TD> the third letter of the Greek alphabet. </TD> </TR>
</TABLE>

Numerical tables

For many people, tables are essentially tables of numerical data. As the preceding examples show, tables have many other uses as well.

For numerical tables, proper alignment is usually crucial for easily readable rendering. (It is in a sense a structural feature, since it relates to the comparability of items of a column.)

Integer values in a column should be right aligned. This is easy to achieve in principle. There are two alternatives:
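
HTML 3.2 allows the ALIGN=RIGHT attribute both for an individual cell and for an entire row. The following sketch, with made-up values, shows both placements; which one is more convenient presumably depends on whether every column in the table is numerical.

```html
<TABLE>
<TR><TH>item</TH><TH>count</TH></TR>
<!-- ALIGN=RIGHT given for single cells; the text column keeps its default -->
<TR><TD>apples</TD><TD ALIGN=RIGHT>1200</TD></TR>
<TR><TD>pears</TD><TD ALIGN=RIGHT>35</TD></TR>
</TABLE>

<TABLE>
<!-- ALIGN=RIGHT given for whole rows; it applies to every cell in the row -->
<TR ALIGN=RIGHT><TD>1200</TD><TD>35</TD></TR>
<TR ALIGN=RIGHT><TD>7</TD><TD>48000</TD></TR>
</TABLE>
```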

Values containing a decimal point (or, in many languages, a decimal comma) should be aligned according to that separator, but unfortunately this is not possible in HTML 3.2. (There are suggested ways of expressing such requests, but currently there is little if any support for them.) One solution is to present such values so that there is the same number of digits to the right of the decimal point in every value in a column, and use ALIGN=RIGHT.

However, the rendering might be unsatisfactory if numbers are presented using a proportional font in which digits differ noticeably in width. It is possible but tedious to overcome this by putting the data in each numerical cell within a TT element. (Notice that it is not legal for a TT element to contain a TABLE element!)
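
A sketch of the TT method, using a single column of illustrative values, might look like this. Each TT element is inside a cell, never around the table.

```html
<TABLE>
<TR ALIGN=RIGHT><TD><TT>12.800</TT></TD></TR>
<TR ALIGN=RIGHT><TD><TT>1.650</TT></TD></TR>
<TR ALIGN=RIGHT><TD><TT>0.002</TT></TD></TR>
</TABLE>
```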

The following example contains first a hand-formatted table presented using the PRE element, then the same data using a TABLE element. In general, it takes more work and care to use a TABLE element but the result is often much better.

Example table5.html:

Measurement results:
<PRE>
time     temperature   pressure
12:00       26           12.8
12:15       22.5          9.8
12:30       11            1.65
12:45        3.3          0.03
13:00        0.05         0.002
</PRE>

<TABLE>
<CAPTION>Measurement results</CAPTION>
<TR><TH>time</TH><TH>temperature</TH><TH>pressure</TH></TR>
<TR ALIGN=RIGHT><TD>12:00 </TD><TD>26.00 </TD><TD>12.800 </TD></TR>
<TR ALIGN=RIGHT><TD>12:15 </TD><TD>22.50 </TD><TD> 9.800 </TD></TR>
<TR ALIGN=RIGHT><TD>12:30 </TD><TD>11.00 </TD><TD> 1.650 </TD></TR>
<TR ALIGN=RIGHT><TD>12:45 </TD><TD> 3.30 </TD><TD> 0.030 </TD></TR>
<TR ALIGN=RIGHT><TD>13:00 </TD><TD> 0.05 </TD><TD> 0.002 </TD></TR>
</TABLE>

Using tables to represent menus

Very often one needs to present a relatively large set of relatively small items. For instance, suppose that we have documents about various countries and we wish to provide a menu of country names, to be used as an index.

The index is implemented in HTML using normal links, eg
<A HREF="af.html">Afghanistan</A>
What we will discuss here is how to present the link names, or some other pieces of text, as a list, table, or some other structure.

If you only read HTML specifications, the obvious answer is to use the DIR or MENU construct. However, as mentioned and exemplified in the general discussion of lists, this is not practically feasible. Thus, if we prefer having the menu in multicolumn format, as we usually do, we must use other constructs.

One possibility is to format the menu by hand and enclose it in a PRE element. If the menu items are link texts, you should first format the menu as text only, then add the anchor (A) tags, since adding them obscures the layout. For clarity, therefore, the following example is presented without links (unlike the other alternatives):

Example menu1.html:

<PRE>
Afghanistan           Albania               Algeria
American Samoa        Andorra               Angola
Anguilla              Antarctica            Antigua and Barbuda
Arctic Ocean          Argentina             Armenia
</PRE>
Another possibility, which should be the normal one, is to present the items simply as a text paragraph, using eg a blank or a blank and a comma as separator. This means that the browser takes care of dividing the text into lines and the presentation is very compact:

Example menu2.html:

<BASE HREF="http://www.odci.gov/cia/publications/nsolo/factbook/">
<P>
<A HREF="af.htm">Afghanistan</A>,
<A HREF="al.htm">Albania</A>,
<A HREF="ag.htm">Algeria</A>,
<A HREF="aq.htm">American Samoa</A>,
<A HREF="an.htm">Andorra</A>,
<A HREF="ao.htm">Angola</A>,
<A HREF="av.htm">Anguilla</A>,
<A HREF="ay.htm">Antarctica</A>,
<A HREF="ac.htm">Antigua and Barbuda</A>,
<A HREF="ocat.htm">Arctic Ocean</A>,
<A HREF="ar.htm">Argentina</A>,
<A HREF="am.htm">Armenia</A>
</P>
Of course, it is possible to force line breaks by using a BR element (eg to make a change in the initial letter cause a new line in an example like the one above). If you think the items are not distinguishable enough in the rendering, consider prefixing each item with a special character like * (and using just spaces as separator).
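
The following sketch combines both suggestions: a BR element starts a new line where the initial letter changes, and each item is prefixed with an asterisk. The entries beginning with B and their file names (ba.htm, bg.htm) are hypothetical, added only to show the change of initial letter.

```html
<P>
* <A HREF="af.htm">Afghanistan</A>
* <A HREF="al.htm">Albania</A>
* <A HREF="ag.htm">Algeria</A>
<BR>
<!-- Hypothetical entries to illustrate a new initial letter -->
* <A HREF="ba.htm">Bahrain</A>
* <A HREF="bg.htm">Bangladesh</A>
</P>
```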

However, if for some reason the presentation must be such that all items occupy the same amount of space, then one can either use the PRE method described above or take the effort of designing a suitable TABLE element. Example:

Example menu3.html:

<BASE HREF="http://www.odci.gov/cia/publications/nsolo/factbook/">
<TABLE><TR>
<TD WIDTH=160><A HREF="af.htm">Afghanistan</A></TD>
<TD WIDTH=160><A HREF="al.htm">Albania</A></TD>
<TD WIDTH=160><A HREF="ag.htm">Algeria</A></TD>
<TD WIDTH=160><A HREF="aq.htm">American Samoa</A></TD>
</TR><TR>
<TD WIDTH=160><A HREF="an.htm">Andorra</A></TD>
<TD WIDTH=160><A HREF="ao.htm">Angola</A></TD>
<TD WIDTH=160><A HREF="av.htm">Anguilla</A></TD>
<TD WIDTH=160><A HREF="ay.htm">Antarctica</A></TD>
</TR><TR>
<TD WIDTH=160><A HREF="ac.htm">Antigua and Barbuda</A></TD>
<TD WIDTH=160><A HREF="ocat.htm">Arctic Ocean</A></TD>
<TD WIDTH=160><A HREF="ar.htm">Argentina</A></TD>
<TD WIDTH=160><A HREF="am.htm">Armenia</A></TD>
</TR></TABLE>
Alternatively, you might wish to consider the effect of using a table with borders.
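
As a sketch of that variation, here is a shortened version of the menu with only four items and a BORDER attribute added to the TABLE tag. How the border looks depends on the browser.

```html
<TABLE BORDER=1><TR>
<TD WIDTH=160><A HREF="af.htm">Afghanistan</A></TD>
<TD WIDTH=160><A HREF="al.htm">Albania</A></TD>
</TR><TR>
<TD WIDTH=160><A HREF="ag.htm">Algeria</A></TD>
<TD WIDTH=160><A HREF="aq.htm">American Samoa</A></TD>
</TR></TABLE>
```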

Notice that this solution is rather unclean. It involves a TABLE structure where the division into lines is (normally) made for layout purposes only, and adding new items usually requires complete restructuring of the table. You typically need to insert WIDTH attributes to ensure that table columns are of the same width, and the specification is inherently device-dependent since it must be given in pixels. In particular, the pr