by John Jung
Eventually, you'll probably want to put all the information in this book to use. To do so, you must have one very important item: a Web server. A Web server is a program that hands out Web pages to whoever asks for them. But Web server software is a special type of software. You can't simply create a Web site by getting the software and running it. There's much more involved in setting up and administering a Web server, as this chapter will explain.
Note
This chapter focuses on installing and configuring the NCSA and CERN Web servers on a UNIX platform. Most Web servers run either CERN or NCSA, and those that don't are typically derived from one of the two. Additionally, the majority of Web servers still run on UNIX machines.
Although you may have been surfing the Web for a while, you might not necessarily know what Web servers do. There is, of course, the obvious job of giving Web pages to users who request them. However, Web servers also have other duties that they must perform. Some of these tasks are done at the request of the Webmaster; others are done at the request of individuals.
One of the most important jobs a Web server must be able to do is user authentication. There are many times when you, the Webmaster, don't want everybody to have access to part of your site. In such cases, you use the Web server to check whether a specific user is allowed to see a Web page.
Perhaps your Web site is a commercial venture, and you're selling access to parts of your site. Obviously, you wouldn't want nonpaying users to access those particular parts. If that happened, your company would quickly go out of business. Maybe your company has multiple levels of access to your particular site. The nonpaying customers have a lot of restrictions, the low-paying customers have some, and the high-paying customers have none.
Another possible reason that you would want to restrict user access is to protect information. This protection applies not only to Internet access, but intranet access as well. Large companies often are broken up into smaller groups, each with a different focus. Most of the time, each of these groups wants to keep some of its information private. Perhaps the research and development department has information about upcoming software releases that it doesn't want anybody to know, not even sales and marketing. Perhaps an important customer and a certain group have a constant flow of information between them. Certainly that frequent information exchange should be kept private between the group and the customer.
Another task that a Web server should be able to perform is running scripts. One of the most important parts of a Web site is the CGI script (see Chapter 26, "Writing CGI Scripts"), and these scripts can perform a wide variety of functions.
One of the most common uses for a CGI script is in an image map. As you may recall from Chapter 20, "Backgrounds, Image Maps, and Creative Layouts," an image map is basically a picture with some defined regions. Each region points to a different part of the Web site, which can make navigating around your site easier. One of the principal components of a traditional image map is the CGI script. This script takes the coordinates of the user's mouse click, determines which region contains that point, and returns the corresponding URL to the Web browser.
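The lookup such a script performs can be sketched in Python. This is a minimal illustration, not the code of any particular imagemap program: it assumes the regions are rectangles already loaded into a list (the Region tuple and all function names are hypothetical), that the browser reports the click as a query string of the form x,y (as it does for an ISMAP image), and that the script answers with a CGI Location header.

```python
from typing import List, Optional, Tuple

# Hypothetical region record: (x1, y1, x2, y2, url) for an axis-aligned rectangle.
Region = Tuple[int, int, int, int, str]


def parse_ismap_query(query: str) -> Tuple[int, int]:
    """Split an ISMAP query string such as '10,60' into integer coordinates."""
    x_text, y_text = query.split(",", 1)
    return int(x_text), int(y_text)


def lookup(regions: List[Region], x: int, y: int, default: Optional[str]) -> Optional[str]:
    """Return the URL of the first region containing (x, y), else the default."""
    for x1, y1, x2, y2, url in regions:
        # Normalize the corners so either diagonal can be listed first.
        left, right = sorted((x1, x2))
        top, bottom = sorted((y1, y2))
        if left <= x <= right and top <= y <= bottom:
            return url
    return default


def redirect(url: str) -> str:
    """CGI output that tells the server to send the browser to url."""
    return f"Location: {url}\n\n"
```

For example, with one region covering the top half of a 100-by-100 image and another covering the bottom half, a click at 10,60 resolves to the second region's URL, and a click outside every region falls back to the default.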
Another use for scripts is to perform a simple form of animation on Web pages. With the server push method, a script has the Web server send out a sequence of images, all identical in dimensions and displayed at the same location. The script controls the order in which the images are sent and when the stream stops.
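The stream a server-push script writes can be sketched as follows. The sketch assumes the Netscape-style multipart/x-mixed-replace content type, in which each new part replaces the previous one in the browser; the function name, the boundary string, and the optional fixed delay are illustrative choices rather than requirements.

```python
import time
from typing import Iterable, Iterator


def server_push(frames: Iterable[bytes], content_type: str = "image/gif",
                boundary: str = "ThisRandomString", delay: float = 0.0) -> Iterator[bytes]:
    """Yield the bytes of a multipart/x-mixed-replace stream, one frame per part."""
    # The top-level header tells the browser that each part replaces the last one.
    yield f"Content-Type: multipart/x-mixed-replace;boundary={boundary}\r\n\r\n".encode("ascii")
    for frame in frames:
        yield f"--{boundary}\r\nContent-Type: {content_type}\r\n\r\n".encode("ascii")
        yield frame
        yield b"\r\n"
        if delay:
            time.sleep(delay)  # The script, not the browser, paces the animation.
    # The closing delimiter ends the stream, which stops the animation.
    yield f"--{boundary}--\r\n".encode("ascii")
```

A CGI script would write each chunk to standard output (sys.stdout.buffer) and flush it, so the browser receives every frame as it is produced.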
Another use of Web servers is to act as a proxy for some other source of information, such as a database or FTP site. For example, suppose you have a system that contains some sensitive information. To access the information, a person would first have to log in, so you could maintain tight control of the information on a per-user basis. Although this approach is workable for small organizations, it may not be viable for large companies. In such situations, the people who need access to the information may not all be in the same office, and networking all the remote locations together can be expensive.
Consequently, you may want to put the information behind a password-protected Web server. Anybody who doesn't have an account on the Web server can't access the data behind it. This method makes it far easier for everybody who should have access to the information to get to it. Additionally, the user and password information for the Web server is centralized, so it is easier to add, delete, and modify user accounts.
With the tremendous explosion of the Internet, Web sites are popping up all over the place. This could lead you to believe that there are many different types of Web servers. In fact, most Web servers run one of two basic Web server programs, both of which are free. One is the CERN server, which was created by the originators of the Web. The other is the NCSA server, which was created by the authors of Mosaic. Commercial Web server software certainly exists, but these two are still the most popular, and many other servers have their roots in one of them.
CERN (Conseil Européen pour la Recherche Nucléaire) was originally founded as a European research center for particle physics. It has since diversified and, along with physics, also does research in electronics and computing. It is here that the World Wide Web was born. Since then, the World Wide Web Consortium (W3C) has taken over the development of World Wide Web standards.
The CERN (or W3C) server has most of the tools and features that you'd expect from a Web server. The most notable feature it lacks, and one that the NCSA server provides, is support for server-side includes (SSIs). SSIs are special markers in Web pages that the server replaces with other content as it sends the page, which allows a page to show you, for example, the current time, date, and weather conditions. The last major version of the W3C server is 3.0a (subsequent updates have been released to fix security issues). All future Web server development by W3C will be implemented in the Jigsaw Web server, a Java-based Web server that is fully object-oriented and completely modular.
The other popular Web server is the NCSA Web server. NCSA (National Center for Supercomputing Applications) is where one of the first graphical Web browsers, Mosaic, was developed. The NCSA Web server has many of the same features as the CERN server, but in many respects it is noticeably faster.
Although similar to CERN, the NCSA Web server is by far the more popular of the two, and others have modified the basic NCSA server a great deal. The Apache Web server, for example, has its roots firmly planted in the NCSA program. Netscape Communications' NetSite Communications Server is also derived from the NCSA program, as is Microsoft's FrontPage Personal Web Server.
With two main Web servers, the inevitable question is, "What are the differences between them?" One of the most notable differences is the format of the image map definition file. Although the difference is minor, Webmasters and Web authors alike need to know it.
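To see the difference, compare how each server describes the same rectangular region. An NCSA map line puts the URL before the corner coordinates (rect /products.html 0,0 99,49), while a CERN map puts the parenthesized coordinates first and the URL last (rectangle (0,0) (99,49) /products.html). The following Python sketch converts the simplest NCSA lines to CERN syntax. It handles only rect and default lines, and the function name is hypothetical.

```python
def ncsa_to_cern(line: str) -> str:
    """Convert one NCSA map line (rect or default only) to CERN syntax."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        # Assumption: blank lines and comments are copied through unchanged.
        return stripped
    parts = stripped.split()
    keyword = parts[0].lower()
    if keyword == "default" and len(parts) == 2:
        return f"default {parts[1]}"
    if keyword == "rect" and len(parts) == 4:
        url, corner1, corner2 = parts[1], parts[2], parts[3]
        return f"rectangle ({corner1}) ({corner2}) {url}"
    # circle, poly, and point lines use different layouts and are left out here.
    raise ValueError(f"line not handled by this sketch: {stripped!r}")
```

Running each line of an NCSA map file through this function produces a CERN map for the same rectangles; other shapes would need their own conversion rules.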
CERN and NCSA also differ in the features they support. For example, NCSA supports server-side includes, which let the Web server enhance the pages it sends out. This enhancement typically means dynamically inserting information into a Web page, usually the local time, the local weather conditions, or similar data. The CERN server doesn't support this feature.
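To make the idea concrete, here is a rough Python sketch of what the server does with one kind of include, the NCSA-style echo directive (for example, <!--#echo var="DATE_LOCAL" -->). It is not the NCSA implementation: the variable values are passed in as a dictionary, the function name is hypothetical, and printing (none) for an unknown variable is simply this sketch's choice.

```python
import re
from typing import Mapping

# Matches NCSA-style echo directives such as <!--#echo var="DATE_LOCAL" -->
ECHO_DIRECTIVE = re.compile(r'<!--#echo\s+var="([A-Za-z_]+)"\s*-->')


def expand_echo(page: str, variables: Mapping[str, str]) -> str:
    """Replace each echo directive in page with the named variable's value."""

    def substitute(match: "re.Match[str]") -> str:
        # Unknown variables become "(none)"; this is the sketch's own choice.
        return variables.get(match.group(1), "(none)")

    return ECHO_DIRECTIVE.sub(substitute, page)
```

A server would fill the dictionary with current values, such as the local time, just before sending the page, which is why the same file can show different content on every request.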
To be fair, though, CERN has a notable feature not present in the NCSA Web server: much finer control over which files are accessible to general users. NCSA, like CERN, allows the Webmaster to easily password-protect directory structures. However, CERN also lets the Webmaster password-protect individual files.
Both of the basic Web servers are fairly easy to obtain. Because the CERN server is now maintained by the W3C, you can easily download it from the W3C's Web site. Similarly, the NCSA Web server is available through NCSA's home page.
You can get a copy of the CERN Web server by pointing your Web browser to ftp://ftp.w3.org/pub/httpd/old/. There, you'll see a list of all the available precompiled CERN Web server binaries (see Figure 45.1). Find the system configuration that most closely matches your machine, and click it. Depending on the platform you select, the binaries range in size from about 400KB to 1.5MB.
Figure 45.1: The quickest and easiest way to get the CERN binaries is to pick the one you want.
Note
Precompiled binaries are available only for CERN 3.0, not 3.0a. Because development of the CERN server has stopped, version 3.0a is available only as source code.
Caution
The precompiled binaries may not be what you want, because they are compiled with a conservative set of build options. If you want to fully customize the CERN Web server, download the source code. You may need to modify the makefile and then build the server with the make command.
After you download the binary distribution, you must extract the files. Depending on the distribution you selected, you either have to uncompress and untar the file or gunzip and untar the file. Regardless, after you unpack the distribution, you'll have a fully functional copy of the CERN Web server software.
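If you'd rather script the unpacking step, Python's standard tarfile module can read both plain and gzip-compressed tar files. This is a minimal sketch with a hypothetical function name; it does not handle UNIX compress (.Z) archives, which you would still need to uncompress first.

```python
import tarfile
from pathlib import Path
from typing import List


def unpack(archive: str, dest: str) -> List[str]:
    """Unpack a .tar or .tar.gz distribution into dest and return member names."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect gzip (and bzip2/xz) compression automatically.
    # UNIX compress (.Z) files are not supported; run uncompress on them first.
    with tarfile.open(archive, "r:*") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons can refuse absolute paths and links that escape dest.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names
```

For example, unpack("cern_httpd.tar.gz", "/home/myself") (a hypothetical archive name) would leave the server's files under /home/myself.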
You can install the CERN Web server software anywhere on your file system, and any user can run it. It doesn't have to be run as root or with any special privileges, unless you want it to listen on a port below 1024, such as the standard HTTP port 80; on UNIX, binding to those ports requires root privileges. Next, configure the Web server to suit your needs. You can then run the Web server by going to the bin directory and running the httpd program.
Note
When the CERN Web server executes CGI scripts, the scripts run with the privileges of the server's owner. Consequently, some files may not be accessible to the scripts.
The NCSA Web server software is very easy to get and install. For the most part, you can simply follow NCSA's OneStep Downloader, a Web page that behaves much like a Microsoft wizard. It asks you a series of questions about your desired configuration and then builds a custom distribution for you. To start the OneStep Downloader, point your Web browser to http://hoohoo.ncsa.uiuc.edu/docs/setup/OneStep.html. You're asked for your operating system, followed by seven questions, called directives (see Figure 45.2). The directives are important questions that directly affect the Web server software itself.
After you've answered all the questions, click the Submit Customization button. The page then keeps you updated on the status of your custom NCSA distribution as it's built. When it's finished, you're given a hypertext link to click to get the custom distribution (see Figure 45.3). Along with the binary built to your specifications, you receive some installation instructions. All you have to do is uncompress and untar the specially created distribution.
Figure 45.3: NCSA's OneStep creates custom archives for easy installation.
Because of the level of content control that the CERN server gives you, it requires a fair amount of configuration. Part of the difficulty is that there are so many different approaches to configuring it: you can store the various parts of the server configuration information in any number of files. Fortunately, most people store all of it in one file.
By default, the CERN Web server comes with a number of sample server configuration files. These files are stored in the httpd/config directory, and they all have the extension .conf. The configuration file doesn't have to have any particular name, however. To use a particular configuration file, run the Web server with the -r command-line option, followed by the path to the configuration file. Suppose you downloaded the Web server software and installed it in the /foo directory, and you want to use the httpd.conf sample configuration file that came with the server. To do so, you would type the following:
/foo/httpd/bin/httpd -r /foo/httpd/config/httpd.conf
Probably the most confusing aspect of configuring the CERN Web server is the configuration file syntax. Basic configuration files are straightforward and easy to understand, but when you want to create more sophisticated ones, the syntax starts getting tricky. Most of the general configuration options take just the single value they need, but some of the more advanced options require multiple attributes. One (Protection) needs an entire classification of users to be defined. Another (Protect) needs a directory path and a classification to apply to it. Still others (such as Pass and Exec) need two paths so that the Web server can translate URL references into files.
Some important configuration options should be set no matter how sophisticated your configuration file is:
A basic CERN configuration file would look something like the following:
ServerRoot /home/myself/httpd
Port 80
UserId myself
GroupId mygroup
Exec /cgi-bin/* /home/myself/httpd/cgi-bin/*
Pass /* /home/myself/httpd/itip/*
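The Exec and Pass lines are rules that the CERN server applies to each requested URL in the order they appear, and the first matching Exec or Pass rule decides where the request goes. The Python sketch below models that behavior in simplified form: each template may contain one *, the text matched by * in the URL is substituted for the * in the file path, and a URL that matches no rule is treated as refused. The function names and tuple layout are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical rule record: (directive, URL template, file template), in file order.
Rule = Tuple[str, str, str]


def match_template(template: str, path: str) -> Optional[str]:
    """Match a template containing at most one '*'; return what '*' matched."""
    if "*" not in template:
        return "" if template == path else None
    prefix, suffix = template.split("*", 1)
    if (path.startswith(prefix) and path.endswith(suffix)
            and len(path) >= len(prefix) + len(suffix)):
        return path[len(prefix):len(path) - len(suffix)]
    return None


def translate(rules: List[Rule], url_path: str) -> Optional[Tuple[str, str]]:
    """Return (directive, file path) for the first Exec or Pass rule that matches."""
    for directive, url_template, file_template in rules:
        matched = match_template(url_template, url_path)
        if matched is not None:
            # Substitute the wildcard text into the file template's '*'.
            return directive, file_template.replace("*", matched, 1)
    return None  # No rule matched; this sketch treats the request as refused.
```

With the example configuration, a request for /cgi-bin/search maps to /home/myself/httpd/cgi-bin/search and is run as a script. If the two lines were swapped, Pass /* would match first and the request would be looked up under the itip directory instead, which is why order matters.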
One of the most commonly used capabilities of Web servers is regulating access to Web pages. As mentioned in the "User Authentication" section, sometimes you don't want everyone to have access to a particular page. For CERN, this form of restriction is set up, as almost everything else is, in the configuration file. Unfortunately, implementing this feature with the W3C server can be somewhat confusing, especially for people who've never used the CERN server before. Password-protecting a directory on the Web server involves two elements: the first defines who has access, and the second defines the directory to be protected.
The first element needed in protecting a directory is a classification of users, which the Web server uses to determine who has access. To define a classification, you use the Protection option. This option needs two parameters: the name of the classification and its corresponding information. That information consists of several subelements: UserId, GroupId, ServerId, AuthType, PasswdFile, GroupFile, and GETMask. UserId and GroupId take the same parameters as they do in the main configuration file. ServerId takes one parameter, the name of the classification, which is passed to the Web browser. AuthType specifies the authentication scheme and should typically be set to Basic. PasswdFile and GroupFile specify the absolute paths of the password and group files. GETMask defines which users or groups can retrieve documents (with the GET method) from the protected directory. The following example shows what a typical Protection option looks like:
Protection ITIPWeb {
    UserId jjung
    GroupId users0
    ServerId ITIPWeb
    AuthType Basic
    PasswdFile /home/jjung/httpd/itippasswd
    GroupFile /home/jjung/httpd/itipgroup
    GETMask all
}
Caution
Do not use the standard UNIX password or group files for the PasswdFile or GroupFile options. Although they look similar, Web password and group information is separate from what UNIX uses.
Note
You can create passwords for the file named by the PasswdFile option with the htadm utility, which is included in the standard distribution of the W3C server.
The second element needed in protecting a directory is the Protect option, which takes two parameters. The first parameter is the URL that a user would type in to access a protected directory. The second is the classification rule to be applied to that particular directory. The following is an example of a Protect statement:
Protect /private/* ITIPWeb
To better illustrate how these two options work together, suppose that you're in charge of http://www.mycom.com/, your organization's main Web site. Further, suppose that there's a /private directory on the server. You can password-protect the /private directory by putting the two preceding examples in your CERN configuration file. When users try to access http://www.mycom.com/private/, they'll be prompted for a user name and password, and the information they enter must match an entry in the file named by the PasswdFile option.
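Behind that prompt is HTTP Basic authentication. The server answers an unauthenticated request with a 401 status and a WWW-Authenticate header naming the realm (here, presumably the ServerId value), and the browser then resends the request with an Authorization header containing the base64-encoded string user:password. The Python sketch below shows the server side of that check. For simplicity it compares against a hypothetical in-memory dictionary of plain-text passwords; a real PasswdFile created with htadm stores encrypted passwords instead. The function names are also hypothetical.

```python
import base64
import hmac
from typing import Mapping, Optional, Tuple


def challenge(realm: str) -> str:
    """Header sent with a 401 response to make the browser prompt for a login."""
    return f'WWW-Authenticate: Basic realm="{realm}"'


def parse_basic_auth(header: str) -> Optional[Tuple[str, str]]:
    """Decode an Authorization header value such as 'Basic am9objpwdw=='."""
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic" or not encoded.strip():
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("latin-1")
    except ValueError:  # binascii.Error (bad base64) is a subclass of ValueError.
        return None
    user, sep, password = decoded.partition(":")
    return (user, password) if sep else None


def is_authorized(header: str, users: Mapping[str, str]) -> bool:
    """Check decoded credentials against a hypothetical user -> password map."""
    credentials = parse_basic_auth(header)
    if credentials is None:
        return False
    user, password = credentials
    expected = users.get(user)
    # compare_digest avoids revealing through timing how many characters matched.
    return expected is not None and hmac.compare_digest(
        expected.encode("utf-8"), password.encode("utf-8"))
```

Keep in mind that base64 is an encoding, not encryption, so Basic authentication by itself doesn't hide passwords from anyone who can watch the network traffic.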
Many other Web servers, including the NCSA server, have no facility for password-protecting a single file. A common workaround is to put the file to be protected into its own directory and then password-protect that directory. Although workable, this solution is inelegant because you need a separate directory for every file you want to protect.
In the CERN Web server, you can implement this level of protection by using a file named .www_acl. This file consists of a series of lines, each with three fields separated by colons (:). The first field specifies the file or files that need protecting. The second specifies which operations can be performed on those files: GET, POST, or GET,POST. The final field specifies which individuals or groups are granted those privileges. The following is a sample .www_acl file:
marketing*.html : GET,POST : sales_marketing
tech*.html : GET,POST : research_development
*.html : GET : me, myself, I
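The way the server applies a .www_acl file can be approximated in Python. This sketch parses the three colon-separated fields of each line and checks a request against them. Two points are assumptions rather than documented CERN behavior: access is granted if any matching line allows it, and the caller supplies the set of names (the user plus that user's groups) to compare against the third field. The function names are hypothetical.

```python
import fnmatch
from typing import List, Set, Tuple

# One parsed line: (file pattern, allowed methods, allowed users/groups).
AclEntry = Tuple[str, Set[str], Set[str]]


def parse_acl(text: str) -> List[AclEntry]:
    """Parse .www_acl text with 'pattern : METHODS : names' on each line."""
    entries: List[AclEntry] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        pattern, methods, names = (field.strip() for field in line.split(":", 2))
        entries.append((
            pattern,
            {m.strip().upper() for m in methods.split(",")},
            {n.strip() for n in names.split(",")},
        ))
    return entries


def allowed(entries: List[AclEntry], filename: str, method: str,
            principals: Set[str]) -> bool:
    """True if any line matching filename grants method to one of principals."""
    return any(
        fnmatch.fnmatchcase(filename, pattern)
        and method.upper() in methods
        and bool(names & principals)
        for pattern, methods, names in entries
    )
```

With the sample file, a member of research_development can POST to tech_specs.html, while the user myself can only GET .html files that no more specific line grants to someone else.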
Note
The .www_acl file is used only when a Protect statement has been applied to that particular directory. So, if you want to protect some files in /foo, a Protect statement for /foo must exist in the configuration file. Once it does, the server will use the /foo/.www_acl file.
Where the CERN server has one configuration file (maybe two, if you use file protection), the NCSA server has three. Consequently, first-time Webmasters may find configuring an NCSA server a bit more difficult than configuring a CERN server. That's not to say that NCSA is worse than CERN, just that the learning curve is a little steeper at the beginning. NCSA server configuration is controlled by the httpd.conf, srm.conf, and access.conf files, each with its own functions, which are detailed in the following sections.
Note
The mime.types file is another configuration file. Each line simply lists a MIME type followed by the file extensions associated with it. You need to modify this file if you add more file types to your server.
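Because each mime.types line is simply a type followed by zero or more extensions, a server can turn the file into an extension lookup table. The following Python sketch does that. The function name is hypothetical, and treating everything after # as a comment is an assumption about the file's conventions.

```python
from typing import Dict


def parse_mime_types(text: str) -> Dict[str, str]:
    """Map each file extension to its MIME type from mime.types-style text."""
    ext_to_type: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # Drop comments and surrounding space.
        if not line:
            continue
        mime_type, *extensions = line.split()
        # A type with no extensions is legal; it simply adds no entries.
        for ext in extensions:
            ext_to_type[ext.lower()] = mime_type
    return ext_to_type
```

Adding a new file type to your server then amounts to adding one line, such as a type followed by its extension, and restarting the server so it rereads the file.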
Along with supporting the usual suite of Web server functions, the NCSA server has a particularly useful feature: it can be configured to work in a multihoming environment. Multihoming is the capability of a Web server to pretend to be more than one Web server. To enable this feature, you need to configure both your system and your Web server.
The httpd.conf file is the main NCSA configuration file and is probably the easiest to understand. You can specify a number of options in this file, each with its own default. If you got your NCSA distribution with the OneStep Downloader, you probably won't have to modify this file. If, however, you built the NCSA server yourself, this file will need some tweaking.
There are a number of very important configuration options that you should set in the httpd.conf file:
The following is an example of an httpd.conf file:
User myself
Group mygroup
ServerAdmin myself@mycom.com
ServerRoot /home/myself/ncsa_httpd
The srm.conf file (short for Server Resource Map) is the configuration file that specifies much of NCSA's behavior. This file controls where users' home pages are, where the default Web pages are, and similar information. As with the httpd.conf file, if you received your distribution from the OneStep Downloader, you won't need to change anything. However, there are some important configuration options that you may need to change if you built the NCSA server yourself:
The following is an example of an srm.conf file:
DocumentRoot /home/myself/ncsa_httpd/htdocs
Alias /old/ /home/myself/old_pages/
ScriptAlias /cgi-bin/ /home/myself/ncsa_httpd/cgi-bin/
DirectoryIndex index.html
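The DocumentRoot, Alias, ScriptAlias, and DirectoryIndex lines work together to turn a requested URL into a file on disk. The Python sketch below approximates that translation for the example above. It is a simplification with hypothetical names: aliases are checked in the order given, ScriptAlias matches are flagged as scripts, and security checks that a real server performs (such as rejecting .. in paths) are omitted.

```python
from typing import List, Tuple

# Each alias is (URL prefix, directory prefix), in srm.conf order.
Alias = Tuple[str, str]


def translate_url(url_path: str, document_root: str, aliases: List[Alias],
                  script_aliases: List[Alias],
                  directory_index: str = "index.html") -> Tuple[str, bool]:
    """Return (file path, is_script) for a URL path, in simplified srm.conf style."""
    for prefix, target in script_aliases:
        if url_path.startswith(prefix):
            # ScriptAlias: the file is run as a CGI program, not sent as-is.
            return target + url_path[len(prefix):], True
    for prefix, target in aliases:
        if url_path.startswith(prefix):
            path = target + url_path[len(prefix):]
            break
    else:
        # No alias matched, so the URL is resolved under DocumentRoot.
        path = document_root.rstrip("/") + url_path
    if path.endswith("/"):
        # A directory request is served by its DirectoryIndex file.
        path += directory_index
    return path, False
```

With the sample srm.conf, a request for / is served from /home/myself/ncsa_httpd/htdocs/index.html, /old/a.html comes from /home/myself/old_pages/a.html, and /cgi-bin/search runs /home/myself/ncsa_httpd/cgi-bin/search as a script.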
The final important configuration file for the NCSA Web server is the access.conf file, which controls the behavior and protection of directories within a Web site. Its syntax is a bit different from that of the other two configuration files: rather than enabling or disabling options for the whole system, you do so on a per-directory basis.
The general layout of this file is similar to HTML. There are starting and ending tags, which affect everything enclosed between them. For the access.conf file, the starting tag is the <Directory> string, which has one required attribute: the full physical path name of the directory. The ending tag is the </Directory> string. Most of the time, you can leave this file alone. However, there are directory directives that you can apply between the starting and ending tags if you want:
The following is an example of an access.conf file:
<Directory /home/myself/other_pages/>
Options Indexes
</Directory>
<Directory /home/myself/ncsa_httpd/htdocs>
Options Indexes ExecCGI
AllowOverride All
</Directory>
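Because access.conf is organized as sections, reading it amounts to collecting the directives inside each <Directory> block. The following Python sketch parses the example above and then looks up the Options for a file. Choosing the deepest enclosing directory is an assumption made for illustration, not a statement of exactly how the NCSA server merges settings, and the function names are hypothetical.

```python
import re
from typing import Dict, List, Optional

Sections = Dict[str, Dict[str, List[str]]]
DIRECTORY_OPEN = re.compile(r"<Directory\s+([^>]+)>", re.IGNORECASE)


def parse_access_conf(text: str) -> Sections:
    """Collect the directives inside each <Directory> ... </Directory> block."""
    sections: Sections = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        opening = DIRECTORY_OPEN.fullmatch(line)
        if opening:
            # Drop a trailing slash so /a/b/ and /a/b name the same directory.
            current = opening.group(1).strip().rstrip("/")
            sections[current] = {}
        elif line.lower() == "</directory>":
            current = None
        elif current is not None:
            directive, *args = line.split()
            sections[current][directive] = args
    return sections


def options_for(sections: Sections, file_path: str) -> List[str]:
    """Options of the deepest <Directory> enclosing file_path (assumed rule)."""
    best: Optional[str] = None
    for directory in sections:
        if file_path == directory or file_path.startswith(directory + "/"):
            if best is None or len(directory) > len(best):
                best = directory
    return sections[best].get("Options", []) if best is not None else []
```

In the example, files under htdocs get both Indexes and ExecCGI, while files under other_pages get only Indexes.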
Multihoming is the capability of a Web server to act as more than one separate Web server. Typically, a computer on the Internet has one IP address and one host name. It's very easy for a computer to have multiple host names for a single IP address. In addition, most UNIX systems can have more than one IP address. Typically, you can have one IP address per network interface, although some systems allow multiple IP addresses on a single interface.
The NCSA Web server can take a single computer with multiple IP addresses and make it act as separate systems. This feature is most commonly used by Internet Service Providers (ISPs) and Web Service Providers (WSPs), which often have many corporate customers, each wanting a different Web site. Consequently, multihoming is important in the business environment.
You can easily configure the NCSA Web server for multihoming by modifying the httpd.conf file. To do so, you use the <VirtualHost> option, which follows an HTML-like syntax. Specify two attributes: the IP address of the virtual host you want to set up, and either the string Required or Optional. Most of the time, you should use Optional, which allows the server to start up even if the multihoming configuration is bad. After the starting tag, you can put in as many configuration options as you want; these are the same options used in a nonmultihomed configuration. After specifying the options for the virtual host, close the multihoming configuration with the </VirtualHost> string.
The following is a sample entry in the httpd.conf file for a multihomed system:
<VirtualHost 127.0.0.1 Optional>
DocumentRoot /home/myself/ncsa_httpd/customer_htdocs
ServerName www.mycom2.com
ServerAdmin myself2@mycom2.com
ResourceConfig conf/customer_srm.conf
</VirtualHost>
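Conceptually, a multihomed server picks a configuration based on which of the machine's IP addresses a request arrived on. The Python sketch below shows that dispatch. The addresses (taken from the 192.0.2.0/24 documentation range), the dictionary layout, and the function names are hypothetical, and the NCSA server's actual internals may differ.

```python
import socket
from typing import Dict, Mapping

Config = Dict[str, str]


def select_vhost(local_ip: str, vhosts: Mapping[str, Config], default: Config) -> Config:
    """Pick the configuration for the IP address a request arrived on."""
    return vhosts.get(local_ip, default)


def config_for_connection(conn: socket.socket, vhosts: Mapping[str, Config],
                          default: Config) -> Config:
    # For an accepted IPv4 TCP connection, getsockname() returns
    # (local_ip, local_port): the address the client actually connected to.
    local_ip = conn.getsockname()[0]
    return select_vhost(local_ip, vhosts, default)
```

A request arriving on the customer's address would get the customer's DocumentRoot and ServerName, while any other address falls back to the main server configuration.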
Caution
In configuring a multihomed system, pay particular attention to the paths the options use. Some options take paths relative to the ServerRoot directory (such as ResourceConfig in the preceding example), while others require absolute path names (such as DocumentRoot).
A number of Web servers are available for your use. Two in particular, CERN and NCSA, are among the more popular, and they're both free. It's up to you as the Webmaster to decide which one is right for you. Take a look at the strengths and weaknesses of each one and determine which is best suited to your needs. Most of the time, you can simply download the precompiled binaries, and they'll work for your system.
The only tricky part about setting up your Web server is the configuration. The CERN server mostly uses just one configuration file, which has a number of configuration options. The NCSA server has three main configuration files, each with a different use and syntax.