Chapter 1

Web Publishing: A Technology Overview

by William Robert Stanek



The World Wide Web is rapidly evolving into a medium that rivals television for information content and entertainment value. Millions of people and thousands of businesses around the world are racing to get connected to the global Internet and the World Wide Web because the Web is the most powerful and least expensive medium to publish in. Whether you're an information provider or simply a creative person who wants to publish your own work, no other medium empowers the individual like the Web; it levels the playing field, allowing a one-person operation to compete head-to-head with corporate conglomerates.

To publish successfully on the Web, you don't have to be a genius, a programmer, or a computer guru with insider secrets. What you need is the practical advice, tips, and techniques you will find throughout this book. Many books on Internet and Web publishing discuss theories but rarely follow a practical approach to Web publishing. Books without practical examples and genuinely useful information can leave you wondering where to start, how to start, and what to do when you do finally manage to start. This chapter, like all the chapters in this book, is filled with useful information designed to unleash the topic of Web publishing and help you become one of the thousands of successful Web publishers offering ideas, services, and products to millions of Web users.

This chapter gives you an overview of the technologies that make Web publishing possible. You can use this information as a launching pad toward success in Web publishing.

Overview of Web Publishing's Past

The World Wide Web is an open-ended information system designed specifically with ease of use and document interchange in mind. In early 1989, Tim Berners-Lee of the European Laboratory for Particle Physics (CERN) proposed the Web as a way for scientists around the world to collaborate, using a global information system based on hypertext.

In the fall of 1990, the first text-only browsers were set up, and CERN scientists could access hypertext files and other information at CERN. However, the structure of hypertext documents and the way they were transferred to remote sites needed to be better defined. Based on proposals by Berners-Lee, the structure of hypertext documents was defined by a new language called the HyperText Markup Language (HTML). HTML was based on a subset of the Standard Generalized Markup Language (SGML), already in wide use. To transfer HTML documents to remote sites, a new protocol was devised called HTTP (hypertext transfer protocol).

Hypertext offers a means of moving from document to document and indexing within documents. The power of hypertext is its simplicity and transparency. Users can navigate through a global network of resources at the touch of a button. Hypertext documents are linked together by keywords or specified hot areas in the document, such as graphical icons or even parts of indexed maps. When a new word or idea is introduced, you can, with the help of hypertext, jump to another document containing complete information on the new topic. Readers see links as highlighted keywords or images displayed graphically and can use these links to access additional documents or resources.

In the fall of 1991, conference-goers around the world started hearing about the promise and ease of hypertext, but sparks still weren't flying. By early 1993, there were only about 50 Web sites worldwide. Then a wonderful thing happened. A browser that allowed users to take advantage of the Web's graphical capabilities was developed at the National Center for Supercomputing Applications (NCSA). NCSA called the browser Mosaic. For a time, it seemed as though the Web and Mosaic were synonymous. Interest in the Web began to grow, from a trickle of interest to a great flood of enthusiasm. Today, the Web is the hottest and fastest growing area of the Internet, and Mosaic is only one of dozens of available browsers.

You've undoubtedly used a browser before, but you might not have thought about what makes a browser work the way it does. The purpose of a browser, also called a client, is to request and display information. Clients make requests to servers, then servers process those requests based on a set of rules, called a protocol, for communicating on the network. Protocols specify how programs talk to each other and what meaning to give to the data they receive. Many protocols are in use on the Internet, and the Web can use many of them; however, the primary protocol is HTTP.

Generally, HTTP processes are transparent to users. To request information from a server, all the user has to do is activate a hypertext reference, and the user's browser takes care of interpreting the hypertext transfer commands and communicating requests. The mechanism on the receiving end, which is processing the requests, is a program called the Hypertext Transfer Protocol Daemon (HTTPD). A daemon is a UNIX term for a program that processes requests. If you've used a UNIX system, you have probably unknowingly sent requests to the Line-Printer Daemon (LPD) to print material to a printer by using the commands lp or lpr. The HTTPD resides on the Web server, which is at the heart of your connection to the Web.

Using the Web's hypertext facilities, you have the freedom to supply information to readers in powerfully innovative ways. The entrepreneurs who fostered the Web's growth started by creating small publications that used few of the Web's graphical and multimedia capabilities. This changed dramatically in a few short years, and today's Web publications use many of the Web's graphical, interactive, and multimedia features. New ways to publish on the Web are constantly being defined, and the features that tomorrow's publications will have may amaze you.

If you've browsed the Web, you've probably seen image maps, which are high-power graphical menus. There's no better way to create easy, graphic-based ways for users to browse information at your Web site. Using an image map, you can create a graphic image with multiple hot spots; each hot spot is a specific part of an image that the user can click on to access other documents and objects.

The wonderful thing about images is that you can pack the equivalent of hundreds of words into tiny symbols in your image map. Image maps are so user-friendly that you can pack a lot of information into a relatively small amount of space. Some image maps on the Web lead to dozens of pages, meaning virtually everything on the image is a doorway to something new. You'll learn all about image maps in Chapter 20, "Backgrounds, Image Maps, and Creative Layouts."
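The hot-spot idea can be sketched in a few lines of Python. This is only an illustration, assuming rectangular hot spots; the HotSpot type, the resolve_click function, the coordinates, and the file names are all hypothetical and do not come from any image map specification.

```python
from typing import List, NamedTuple, Optional


class HotSpot(NamedTuple):
    """A rectangular region of an image and the document it leads to."""
    left: int
    top: int
    right: int
    bottom: int
    target: str  # hypothetical document opened when the region is clicked


def resolve_click(hot_spots: List[HotSpot], x: int, y: int) -> Optional[str]:
    """Return the target of the first hot spot containing the click, if any."""
    for spot in hot_spots:
        if spot.left <= x <= spot.right and spot.top <= y <= spot.bottom:
            return spot.target
    return None  # the click landed outside every hot spot


# A made-up 200x50 menu image split into two buttons.
MENU = [
    HotSpot(0, 0, 99, 49, "news.html"),
    HotSpot(100, 0, 199, 49, "products.html"),
]
```

Calling resolve_click(MENU, 120, 10) selects the second button, while a click below the image matches no hot spot and returns None.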

The specification for HTML 3.2 is a recent development in HTML publishing; it's a subset of the original HTML 3.0 specification and is based on features and extensions used in Web documents before May 1996. The first draft of the HTML 3.2 specification was released in May 1996.

Note
This book features extensive coverage of HTML 3.2. In fact, you will find five chapters loaded with information on HTML 3.2 in Part II, "Web Publishing with HTML 3.2."

However, the Web isn't defined by HTML alone. Many Web publishers are going back to the standard language HTML is based on: SGML. It's an advanced markup language that, although complex, offers better control over documents' layout than HTML does. SGML is also the basis for many page-definition languages used by publishing production systems, such as Adobe Acrobat and Common Ground.

Note
SGML is featured in Chapter 44, "Should You Upgrade to SGML?" You will also find a terrific SGML reference on the CD-ROM with this book.

Some Web publishers are looking at the origins of Web publishing, but others are taking giant leaps forward, made possible by innovators like Netscape Communications Corporation, Microsoft Corporation, and Sun Microsystems, Inc. In the fall of 1994, Netscape Communications Corporation released the first browser to support unique extensions to HTML. Netscape Navigator took the Internet community by storm and quickly became the most popular browser on the Net. A developer's site for Netscape products is featured in Figure 1.1.

Figure 1.1: The Developer's Edge site from Netscape.

Tip
The Developer's Edge site (http://developer.netscape.com) is the place to find developer's information for Netscape. If you want to stay current with the cutting edge for Netscape products, visit this site often.

The browser that may replace top dog Netscape Navigator is Microsoft's Internet Explorer. Internet Explorer features extensions that enable Web publishers to add soundtracks and live video segments to their publications. When a reader accesses a publication with a soundtrack or a live video segment, the sound or video plays automatically if the reader's browser supports these extensions. Microsoft's Web site showcases Internet Explorer, and it includes a terrific reference resource for Microsoft Internet products called the Internet Center, pictured in Figure 1.2.

Figure 1.2: For the latest in Web publishing technology, add the Internet Center to your hot list.

Tip
Those developing Web pages should add the Internet Center (www.microsoft.com/internet) to their hot list. It's the place to learn about the latest innovations for Internet Explorer, FrontPage, and many other Internet products from Microsoft.

Sun Microsystems (www.sun.com) has been a leading supporter of Web innovation. Recently, Sun Microsystems released the HotJava browser, written entirely in the Java programming language developed by Sun. The Java language is similar to C and C++, but its platform independence sets it apart. Using Java, you can add programs called applets to your Web publications. Applets are self-running applications that readers can preview and play automatically. Sun has set up several Web servers to handle requests related to Java; the main server is at http://www.javasoft.com. Figure 1.3 shows a tribute to Java in the July issue of Sun Microsystems' online magazine.

Figure 1.3: Java is often the main event in Sun's online magazine.

Note
The Java programming language is a hot topic in Web publishing. This book has three chapters designed to teach you everything you need to know to use Java in your Web pages; they are Chapter 36, "Including Java Applets in Your Web Pages," Chapter 37, "Writing Java Applets," and Chapter 38, "Integrating JavaScript and Java."

Ever since Java offered a taste of what powerful interactive content is like, Internet users around the world have clamored for more. Web developers and publishers seeking to answer this demand have taken Web publishing to new heights. Breaking new ground often meant thinking in completely new ways. For example, the traditional way to handle interactions between the client browser and the server is for the server to handle all the processing, but server-side processing is very restrictive and resource intensive. Seeking to solve this problem, Web developers looked to client-side handling of interactions, and another doorway was opened for Web publishers.

Through this doorway, you will find client-pull, client-side image maps, and client-side scripting. Client-pull is a wonderfully easy way to create documents that update themselves automatically. With client-side image maps, the user's browser can process the image map coordinates locally and more efficiently. Although client-pull and client-side image maps are terrific, client-side scripting is the innovation with the biggest impact on Web publishing. With client-side scripting, you can add completely interactive programs to your HTML documents.

Note
You will find in-depth coverage of client-side technology further on in this book. Client-pull is explored in Chapter 19, "Animating Graphical Images," and client-side handling of image maps is explored in Chapter 20.

Hot new scripting languages include the following:

JavaScript is a scripting language whose syntax resembles that of the Java programming language. This powerful up-and-coming scripting language is being developed by Netscape Communications Corporation. JavaScript can recognize and respond to mouse clicks, form input, and page navigation. This means your pages can "intelligently" react to user input. Using JavaScript is the main subject of Part VIII, "JavaScript and Java."

With VBScript (Visual Basic Script), Microsoft proves once again that it understands the tools developers need. VBScript is a subset of Visual Basic and is used to create highly interactive documents on the Web. As with JavaScript, programs written in VBScript are embedded in the body of your HTML documents. VBScript is covered extensively in Part VII, "ActiveX and VBScript."

Web publishers have been waiting for a scripting language like VRMLScript ever since the VRML standard was introduced. VRMLScript is the perfect marriage of VRML and client-side scripting. With VRMLScript, you can create dynamic interactive content for the Web. You will examine VRMLScript in Chapter 41, "Adding Behaviors with VRMLScript and Java."

Innovations by Netscape, Sun, and Microsoft represent only a small portion of the changes that are revolutionizing the way information is provided to millions of people around the world. These innovations, coupled with the explosive growth and enthusiasm in the Web, make now a more exciting time than ever to be a Web publisher.

As a Web publisher, you can publish information that will be seen by people in dozens of countries around the world, but the best news is that you as an individual can compete solely on the merits of your ideas, products, and services, not the size of your bank account. In Web publishing, you can reach the same audience whether your Web site is based on a $25 basic account from a service provider or a corporate Web server with leased lines costing $1,500 a month. Web users will judge your publications based on their information content and entertainment value.

Internet Standards and Specifications

Several standards in place on the Web allow information to be transferred the way it is; many of them relate to specifications for protocols that predate the Web, such as File Transfer Protocol (FTP) and Gopher. FTP provides a way to access files on remote systems. With FTP, you can log onto an FTP server, search for a file within a directory structure, and download the file. You also can upload files to the FTP server. Searching the file structures on FTP servers is a time-consuming process, especially if you don't know the directory of the file you're looking for. The basic functions of FTP have been extended in different ways. The most popular extension is Archie, which lets you search file archives easily by using keywords.

The Gopher protocol is similar to HTTP, but not as powerful or versatile. You can use it to search and retrieve information, which is presented as a series of menus. Menu items are linked to the files containing the actual text. Gopher is most useful as the base protocol for its more powerful and recent extensions, including Gopher Jewels, Jughead, and Veronica. Gopher Jewels enables you to search catalogs of Gopher resources indexed by category. Jughead lets you search Gopher indexes according to specified information. Veronica enables you to search Gopher menus by keyword.

The major shortcoming of early Internet protocols was the inability to access information through a common interface. Generally, files available through one interface weren't available through another. To get to information on an FTP server, you used FTP; for information on a Gopher server, you used Gopher. For files that weren't available through either FTP or Gopher, you could try to initiate a remote login to a host by using telnet. Sometimes you went from host to host looking for the information you needed.

Even with this simplified scenario, you can probably imagine how time-consuming and frustrating it was to track down information. Consequently, a major design issue for the Web was how to supply a common, easy-to-use interface to get to information on the Internet. To make sure information available through previous protocols is accessible on the Web as well, the Web was built on existing standards and specifications, like those for FTP and Gopher. Using these other protocols in your Web documents is easy: you simply specify the protocol in a reference to a uniform resource locator (URL). URLs give you a uniform way to access and retrieve files. Without one single way to retrieve files, Internet publishers and users would still be pulling their hair out.

Although the specification for URLs is important for finding files on the Web, many other specifications play a major role in defining the Web. Specifications for HTTP define how hypertext documents are transferred, specifications for markup languages define the structure of Web documents, and specifications for Multipurpose Internet Mail Extensions (MIME) define the type of data being transferred and enable you to transfer any type of data on the Web. Finally, specifications for the Common Gateway Interface (CGI) make it possible for you to create dynamic documents. The following sections briefly explain each of these specifications, with emphasis on how they affect you as a Web publisher.

Transferring Files Using HTTP

HTTP is the primary protocol used to distribute information on the Web. It's a powerful, fast protocol that allows for easy exchange of files and is evolving along with other Web technologies. The original specification for HTTP is HTTP/0.9. HTTP version 0.9 has many shortcomings; for example, HTTP/0.9 doesn't allow for content typing and doesn't have provisions for supplying meta-information in requests and responses.

Content typing enables the computer receiving the data to identify the type of data being transferred. The computer can then use this information to display or process the data. Meta-information is supplemental data, such as environment variables that identify the client's computer. Being able to provide information about the type of data transferred, as well as supplemental information about the data, is important.

To address the shortcomings of HTTP/0.9, the current version of HTTP, HTTP/1.0, allows for headers with a Content-Type field and other types of meta-information. The type of data being transferred is defined in the Content-Type field. You can also use meta-information to offer additional information about the data, such as its language, encoding, and state information. (See Chapter 4, "Creating Web Documents with HTML," for a preliminary discussion on using meta-information in HTML documents.)
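To make the role of headers concrete, here is a minimal Python sketch that splits a raw HTTP/1.0 response into its status line, header fields, and body. The parse_response function and the sample response are invented for illustration; a real client would also have to cope with malformed input, which this sketch ignores.

```python
from typing import Dict, Tuple


def parse_response(raw: bytes) -> Tuple[str, int, Dict[str, str], bytes]:
    """Split a raw HTTP/1.0 response into version, status, headers, and body."""
    # A blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    version, status, _reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return version, int(status), headers, body


# A made-up response of the kind an HTTP/1.0 server might send.
SAMPLE_RESPONSE = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Language: en\r\n"
    b"\r\n"
    b"<TITLE>Hello</TITLE>"
)
```

After parsing SAMPLE_RESPONSE, the headers dictionary holds the Content-Type value text/html, which tells the client how to treat the bytes in the body.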

Most Web users and publishers want HTTP to address security; they want to be able to conduct secure transactions. The key issue in security for promoting the widespread use of electronic commerce is the ability to authenticate and encrypt transactions. Currently, there are several proposals for secure versions of HTTP. The two most popular secure protocols are Secure HTTP (S-HTTP) and Secure Sockets Layer (SSL). When one of these specifications is adopted, secure transactions using HTTP will become a reality for mainstream Web users.

HTTP is a powerful protocol because it's fast and light, yet very versatile. To achieve this speed, versatility, and robustness, HTTP is defined as a connectionless and stateless protocol, which means that generally the client and server don't maintain a connection or state information about the connection.

Connectionless Versus Connection-Oriented Protocols

HTTP is a connectionless protocol. Connectionless protocols differ from connection-oriented protocols in the way requests and responses to requests are handled. With a connectionless protocol, clients connect to the server, make a request, get a response, and then disconnect. With a connection-oriented protocol, clients connect to the server, make a request, get a response, and then maintain the connection to handle future requests.

An example of a connection-oriented protocol is FTP. When you connect to an FTP server, the connection remains open after you download a file. The maintenance of this connection requires system resources. A server with too many open connections quickly gets bogged down. Consequently, many FTP servers are configured to allow only 250 open connections at one time, so only 250 users can access the FTP server at once. Additionally, processes that aren't disconnected cleanly can cause problems on the server. The worst of these processes runs out of control, uses system resources, and eventually crashes the server. The best of these processes simply eats up system resources.

In contrast, HTTP is a connectionless protocol. When clients connect to the server, they make a request, get a response, and then disconnect. Because the connection isn't maintained, no system resources are used after the transaction is finished. Consequently, HTTP servers are limited only by active connections and can generally handle thousands of transactions with low system overhead. The drawback to connectionless protocols is that when the same client requests more data, the connection must be reestablished. To Web users, this means a delay whenever they request more information.
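The connect, request, respond, disconnect cycle can be observed with Python's standard library. The sketch below is an assumption-laden demonstration rather than a production server: it starts a throwaway server on the local machine (address 127.0.0.1, port chosen by the operating system), and the HelloHandler and fetch_twice names are made up for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    # BaseHTTPRequestHandler speaks HTTP/1.0 by default, so the server
    # closes the connection after sending each response.
    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet


def fetch_twice():
    """Make two requests, opening a fresh connection for each one."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    results = []
    try:
        for _ in range(2):
            # Connect, request, read the response, disconnect.
            conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
            conn.request("GET", "/")
            response = conn.getresponse()
            results.append((response.status, response.version, response.read()))
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return results
```

Each pass through the loop pays the cost of setting up a new connection, which is the delay described above; in exchange, the server holds nothing open between requests.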

Stateless Versus Stateful Protocols

HTTP is a stateless protocol. Stateless protocols differ from stateful protocols in the way information about requests is maintained. With a stateless protocol, no information about a transaction is maintained after a transaction has been processed. With a stateful protocol, state information is kept even after a transaction has been processed.

Servers using stateful protocols maintain information about transactions and processes, such as the status of the connection, the processes running, the status of the processes running, and so on. Generally, this state information resides in memory and uses up system resources. When a client breaks a connection with a server running a stateful protocol, the state information has to be cleaned up and is often logged as well.

Stateless protocols are light because servers using them keep no information about completed transactions and processes. When a client breaks a connection with a server running a stateless protocol, no data has to be cleaned up or logged. Because the server doesn't track state information, it has less overhead and can generally handle transactions swiftly. The drawback for Web publishers is that if you need to maintain state information for your Web documents, you must include it as meta-information in the document header.
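The consequence of statelessness is that whatever the server needs to know must travel with each request. The Python sketch below illustrates that idea under a simplifying assumption: the state (a page number) is carried in the URL's query string rather than in a document header, and the handle_request function and the /catalog path are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def handle_request(url: str) -> str:
    """Answer one request using only the information the request carries."""
    query = parse_qs(urlsplit(url).query)
    # No memory of earlier requests exists, so default to the first page.
    page = int(query.get("page", ["1"])[0])
    # The link to the next page embeds the state the server will need later.
    next_url = "/catalog?" + urlencode({"page": page + 1})
    return f"Showing page {page}; next: {next_url}"
```

Because handle_request keeps nothing between calls, asking for /catalog?page=3 twice gives the same answer both times, and any server process could have produced it.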

Determining the Structure of Web Documents

The way you can structure documents is largely determined by the language you use to lay out the document. Some languages are advanced and offer you extensive control over document layout. Other languages are basic and offer ease of use and "friendliness" instead of advanced features. The following sections take a look at commonly used languages, including SGML, VRML, HTML, and page definition languages.

SGML

Most Web documents are structured with a markup language based on SGML. SGML defines a way to share complex documents by using a generalized markup described in terms of standard text. Describing complex structures with plain text ensures the widest distribution to any type of computer and presents the formatting in a human-readable form called markup. Because the markup contains standard characters, this also means anyone can create documents in a markup language without needing special software.

SGML is an advanced language with few limitations. In SGML, you have full control over the positioning of text and images, so text and images are displayed by the user's SGML browser in the precise location you designate. Although SGML is a powerful markup language, it isn't widely used on the Web. However, this is changing as more publishers become aware of SGML's versatility.

VRML

Technology on the Web is growing at an explosive pace, and one of the most recent developments is VRML. VRML enables you to render complex models and multidimensional documents by using a standardized markup language.

The implications of virtual reality for Web publishers are far-reaching. Using VRML, you can reduce calculations and data points that would have filled 10MB of disk space to just a few hundred lines of markup code. Not only does this feature drastically reduce the download time for VRML files and save network bandwidth, it also presents complex models in a readable and (gasp) understandable format. VRML isn't widely used on the Web yet, but it's attracting tremendous interest in the Internet community and the world community, as well. Although the current version of VRML is VRML 1.0, the Moving Worlds specification for VRML 2.0 has recently been approved and is gaining widespread support.

Note
Exploring the limitless possibilities of VRML is the subject of Part IX, "Creating VRML Worlds." This part of the book contains three chapters that showcase the VRML 2.0 Moving Worlds specification.

HTML

HTML is the most commonly used markup language. HTML's popularity stems mostly from its ease of use and friendliness. With HTML, you can quickly and easily create Web documents, make them available to a wide audience, and control many of the layout aspects for text and images. You can specify the relative size of headings and text, as well as text styles, including bold, underline, and italics. Extensions to HTML enable you to specify font type, but standard HTML specifications don't give you that capability.

Although many advanced layout controls for documents aren't available with HTML, it's still the publishing language of choice on the vast majority of Web sites. Remember, the limitations are a way to drastically reduce the complexity of HTML. Currently, HTML has three specifications: HTML 1.0, HTML 2.0, and HTML 3.2. Each level of specification steadily introduces more versatility and functionality.

In addition to these specifications, several Internet developers have created extensions to HTML. The extensions are nonstandard, but many have been accepted and used by Web publishers. Some extensions, such as Netscape's and Microsoft's, are so popular that they seem to be standard HTML.

Page Definition Languages

Some Web documents are formatted by using page definition languages instead of markup languages. Page definition languages often use formats specific to a particular commercial page-layout application, such as Adobe Acrobat or Common Ground. Page-layout applications are popular because they combine fine-tuned control over document layout with user-friendly graphical interfaces. Although the formats these applications use are proprietary, most of the formats are based on the standards set forth by SGML.

Identifying Data Types with MIME

With HTTP, you can transfer full-motion video sequences, stereo soundtracks, high-resolution images, and any other type of media you can think of. The standard that makes this possible is Multipurpose Internet Mail Extensions (MIME). HTTP uses MIME to identify the type of object being transferred across the Internet. Object types are identified in a header field that comes before the actual data for the object. In HTTP, this header field is the Content-Type header field. By identifying the type of object in a header field, the client receiving the object can handle it appropriately.

For example, if the object is a GIF image, it's identified by the MIME type image/gif. When the client receiving the object of type image/gif can handle the object type directly, it displays the object. When the client can't handle the object directly, it checks a configuration table to see whether an application is configured to handle an object of this MIME type. If an application is configured for use with the client and is available, the client calls the application, which then handles the object. In this case, the application would display the GIF image.
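The decision a client makes for each object can be sketched as a simple table lookup in Python. Everything named here is an assumption made for illustration: the set of natively handled types, the helper application names, and the fallback of saving unknown types to disk are not taken from any particular browser.

```python
# Types this hypothetical client can display without outside help.
NATIVE_TYPES = {"text/html", "text/plain", "image/gif"}

# Hypothetical configuration table mapping MIME types to helper applications.
HELPER_APPS = {
    "video/mpeg": "mpeg_player",
    "audio/x-wav": "wav_player",
}


def choose_handler(content_type: str) -> str:
    """Decide what to do with an object based on its Content-Type value."""
    # Drop any parameters (such as a charset) and ignore letter case.
    mime_type = content_type.split(";", 1)[0].strip().lower()
    if mime_type in NATIVE_TYPES:
        return "display in browser"
    helper = HELPER_APPS.get(mime_type)
    if helper is not None:
        return f"launch {helper}"
    return "save to disk"  # assumed fallback when nothing is configured
```

With this table, choose_handler("image/gif") reports that the client displays the image itself, while choose_handler("video/mpeg") hands the object to the configured helper.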

Not only is MIME typing useful to HTTP, it's useful to other protocols, too. MIME typing was originally developed so that e-mail messages could have several parts, with different types of data in each part. In this way, you can attach any type of file to an e-mail message. The MIME standard is described in detail in Requests for Comments (RFCs) 1521 and 1522. (Many Internet standards and specifications are described in RFCs, which are a collection of documents about the Internet that cover both technical and nontechnical issues.) See Chapter 26, "Writing CGI Scripts," for more information on MIME types and their uses in your Web documents.
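The multipart idea is easy to see with Python's standard email package. In this sketch the subject, file name, and attachment bytes are placeholders (the bytes are only the first six characters of a GIF header, not a complete image), and the build_message and part_types helpers are invented for the example.

```python
from email.message import EmailMessage
from typing import List


def build_message() -> EmailMessage:
    """Create a message with a text part and an image attachment."""
    msg = EmailMessage()
    msg["Subject"] = "Logo attached"
    msg.set_content("The logo is attached.")  # becomes a text/plain part
    # Placeholder bytes stand in for real image data.
    msg.add_attachment(
        b"GIF89a", maintype="image", subtype="gif", filename="logo.gif"
    )
    return msg


def part_types(msg: EmailMessage) -> List[str]:
    """List the MIME type of each part in a multipart message."""
    return [part.get_content_type() for part in msg.iter_parts()]
```

The finished message has the overall type multipart/mixed, and each part carries its own type, so a mail reader can treat the text and the image differently.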

Tip
RFCs are great resources to browse. You can find RFCs at the following sites:
http://ds.internic.net/ds/dspg1intdoc.html
http://www.cis.ohio-state.edu:80/hypertext/information/rfc.html

Accessing and Retrieving Files with URLs

To retrieve a file from a server, a client must know three things: the address of the server, where on the server the file is located, and which protocol to use to access and retrieve the file. This information is specified as a URL. URLs can be used to find and retrieve files on the Internet with any valid protocol.

Although you normally use HTTP to transfer your Web documents, you can include references to other protocols in your documents. For example, you can specify the address to a file available through FTP simply by naming the protocol in a URL. Most URLs you use in your documents look something like this:

protocol://server_host:port/path_to_resource

The first part of the URL scheme names the protocol the client will use to access and transfer the file. The protocol name is generally followed by a colon and two forward slashes. The second part of the URL indicates the address of the server and ends with a single slash. The server host may be followed by a colon and a port address. The third part of the URL indicates where on the server the resource is located and may include a path structure. In a URL, double slash marks indicate that the protocol uses the format defined by the Common Internet Scheme Syntax (CISS); colons are separators. In this example, a colon separates the protocol from the rest of the URL scheme; the second colon separates the host address from the port number.
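The parts of a URL described above can be pulled apart with the urlsplit function from Python's standard urllib.parse module. The describe_url helper and the example.com addresses are illustrative assumptions.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Break a URL into the parts described in the text."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,  # text before the first colon
        "host": parts.hostname,    # None when the URL names no host
        "port": parts.port,        # None when no port is given
        "path": parts.path,        # location of the resource
    }
```

For http://www.example.com:8080/docs/index.html, the result names the protocol http, the host www.example.com, the port 8080, and the path /docs/index.html. A URL without double slashes, such as mailto:someone@example.com, yields no host at all.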

Note
CISS is a common syntax for URL schemes that involve the direct use of IP-based protocols. IP-based protocols specify a particular host on the Internet with a unique numeric identifier called an IP address or with a unique name that can be resolved to the IP address. Non-CISS URL schemes don't name a particular host computer. Therefore, the host is assumed to be the computer providing services for the client.

URLs, defined in RFC 1738, are powerful because they give you a uniform way to retrieve multiple types of data. Here are the most common protocols you can specify by using URLs:

FTP         File Transfer Protocol
Gopher      Gopher Protocol
HTTP        Hypertext Transfer Protocol
mailto      Electronic mail address
Prospero    Prospero Directory Service
news        Usenet news
NNTP        Usenet news accessed with Network News Transfer Protocol
telnet      Remote login sessions
WAIS        Wide Area Information Servers
file        Files on local host

Using these protocols in your Web documents is explored in Chapter 4.

Creating Dynamic Documents with CGI

The Web's popularity also stems from its interactivity. Web users click on hypertext links to access Web documents, images, and multimedia files, but the URLs in your hypertext links can lead to much more than static resources. URLs can also specify programs that process user input and return information to the user's browser. By specifying programs on the Web server, you can make your Web publications highly interactive and dynamic. You can also create customized documents on demand, based on the user's input and on the type of browser being used.

Programs specified in URLs are called gateway scripts; the term comes from the UNIX environment. Gateways are programs or devices that supply an interface. Here, the gateway or interface is between your browser and the server. Programs written for UNIX shells are called scripts by UNIX programmers because UNIX shells, such as Bourne, Korn, and C-shell, are command interpreters rather than full programming languages. UNIX shells are easy to use and learn, so many gateway scripts are written in them.

The CGI specification describes how Web servers and gateway scripts pass information to each other. CGI gives you the basis for creating dynamic documents, which can include interactive forms, graphical menus called image maps, and much more. The power of CGI is that it gives Web publishers a common interface to programs on Web servers. Using this common interface, Web publishers can provide dynamic documents to Web users without regard to the type of system the publisher and user are using.
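To make the idea of a gateway script concrete, here is a minimal sketch written in Python rather than a UNIX shell. It assumes the server passes the user's input in the QUERY_STRING environment variable and the browser type in HTTP_USER_AGENT, as CGI servers conventionally do; the function name build_response and the form field "name" are hypothetical.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def build_response(environ):
    """Return a complete CGI response: header, blank line, then the document."""
    # The server places the part of the URL after "?" in QUERY_STRING.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["visitor"])[0]  # "name" is a hypothetical field
    browser = environ.get("HTTP_USER_AGENT", "an unknown browser")

    # Build the document on demand from the user's input and browser type.
    body = (
        "<HTML><BODY>\n"
        f"<P>Hello, {escape(name)}. You are using {escape(browser)}.</P>\n"
        "</BODY></HTML>\n"
    )
    # The Content-Type header tells the browser how to treat the output.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # When run by a Web server, the script writes its response to standard output.
    sys.stdout.write(build_response(os.environ))
```

Keeping the response logic in a function that takes the environment as a parameter makes the script easy to try out without a Web server, for example by calling build_response({"QUERY_STRING": "name=Ann"}).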

Note
Understanding CGI is important to Web publishing. You will explore CGI in more depth in Part VI, "Adding Interactivity with CGI."

The Evolution of Standards and Specifications

The standards and specifications you read about in the previous sections are the result of coordinated efforts by standards organizations and the working groups associated with them. Generally, these organizations approve changes to existing standards and specifications and develop new ones. Three primary standards groups develop standards and specifications for the Internet and networked computing in general: the International Organization for Standardization (ISO), the Internet Engineering Task Force (IETF), and the World Wide Web Consortium (W3C).

The International Organization for Standardization

The International Organization for Standardization (ISO) is one of the most important standards-making bodies in the world. The ISO doesn't usually develop standards specifically for the Internet; rather, it develops standards for networked computing in general. One of the organization's most important developments is the internationally recognized seven-layer network model, commonly referred to as the Open Systems Interconnection (OSI) Reference Model.

Most Internet specifications and protocols incorporate standards developed by the ISO. For example, ISO standard 8859 is used by Web browsers to define the standard character set. ISO 8859-1 defines the standard character set called ISO-Latin-1. Additions to this set are called the ISO-Added-Latin-1 character set. You will refer to these character sets whenever you want to add special characters, such as &, ©, or ®, to your Web documents.
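As a rough illustration of how these special characters relate to the Latin-1 character set, the sketch below uses Python's standard html.entities table to look up the numeric code for each character and the single byte that represents it in ISO 8859-1. The helper name describe is hypothetical, and the named and numeric forms shown are the HTML entity notations for these characters.

```python
from html.entities import name2codepoint


def describe(name):
    """Return named form, numeric form, character, and Latin-1 byte for an entity."""
    code = name2codepoint[name]       # numeric position of the character
    char = chr(code)
    return (f"&{name};", f"&#{code};", char, char.encode("latin-1"))


for entity in ("amp", "copy", "reg"):
    named, numeric, char, latin1 = describe(entity)
    print(named, numeric, char, latin1)
```

Either the named form or the numeric form can be typed into a document in place of the character itself.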

The Internet Engineering Task Force

The Internet Engineering Task Force (IETF) is the primary organization developing Internet standards. All changes to existing Internet standards and proposals for new standards are approved by the IETF, which meets three times a year to set directions for the Internet.

Changes to existing specifications and proposals for new ones are approved by formal committees, called working groups. The IETF has dozens of them. Each group typically focuses on a specific topic within a development area.

Note
The process for approving and making changes to specifications is standardized. The working groups propose Internet Draft specifications, such as the current specifications for HTML and HTTP. Internet Drafts are valid for six months after they're formalized. If the Internet Draft hasn't been approved in six months, then it expires and is no longer valid. If the Internet Draft is approved, it becomes an RFC.

RFCs are permanently archived and are valid until they're superseded by a later RFC. As their name implies, RFCs are made available to the general Internet community for discussion and suggestions for improvements.

Many RFCs eventually become Internet Standards, but the process isn't a swift one. For example, URLs were introduced by the World Wide Web global information initiative in 1990. They've been in use ever since, but the URL specification didn't become an RFC until December 1994 and was only recently approved as an Internet standard.

Figure 1.4 shows the IETF's site on the Web. Here you can find information on current IETF initiatives, which include the latest standards and specifications for the Internet.

Figure 1.4 : The Internet Engineering Task Force Web site.

You can find more information on the Internet Society and membership in the Internet Society at this Web site:

http://www.isoc.org/

Membership in the IETF is open to anyone. The directors of the working group areas handle the internal management of the IETF. These directors, along with the chairperson of the IETF, form the Internet Engineering Steering Group (IESG), which handles the operational management of the IETF under the direction of the Internet Society.

The World Wide Web Consortium

The World Wide Web Consortium (W3C) is managed by the Laboratory for Computer Science at the Massachusetts Institute of Technology (MIT). The W3C develops common standards for the evolution of the World Wide Web. It is a joint initiative among MIT, CERN, and the French National Institute for Research in Computing and Automation (INRIA). The U.S. W3C center is based at and run by MIT. The European W3C center is at INRIA, and CERN and INRIA cooperate to manage it.

The W3C was formed in part to help develop common standards for Web technologies. One of the W3C's major goals is to offer a storehouse of information about the Web to Web developers and users. To do that, the W3C has sites where you can find the most current information on Web development. If you visit the page featured in Figure 1.5 and enter your e-mail address in the form at the bottom of the page, you will automatically be notified when the W3C updates the page on HTML.

Figure 1.5 : The HTML specification documents at the World Wide Web Consortium Web site.

Another goal of the W3C is to supply prototype applications that use new technologies proposed in Internet Drafts. The W3C works with its member organizations to propose specifications and standards to the IETF. Member organizations pay a fee based on their membership status. Full members pay $50,000 and affiliate members pay $5,000 for a one-year membership.

Summary

The Web was built on existing protocols and intended to provide a common interface to other protocols. Because of this design, you can use any valid protocol to transfer files. You primarily use HTTP to access your Web documents, but you can use other protocols, such as Gopher and FTP, to enhance the usefulness of your documents. The face of Web publishing is changing rapidly, and the way you specify the structure of Web documents is changing just as quickly. The most common way to structure Web documents is with HTML, but you can also use SGML, VRML, and page-layout applications to structure Web documents.

The MIME standard is what allows you to provide access to any type of document on the Web. With MIME, you can supply information about documents in the Content-Type header field. Browsers use the content type to take appropriate action on the document, such as displaying an image or calling another application. The URL standard, however, is what lets you access and retrieve files on the Web. With URLs, you can locate and retrieve files with the appropriate protocol. The final specification of interest to Web publishers is CGI; by using it, you can create dynamic documents.

To stay current with the latest developments on the Web, you should follow the Internet standards and specifications proposed by Internet standards groups, such as the IETF and the W3C.