Project 3: A Simple HTTP Client and a Simple HTTP Server
The project requires you to learn a non-trivial, existing network protocol, HTTP, and to understand it thoroughly, at least in the parts that this project requires. You will therefore need to put considerable effort into independent research and self-study before you can write a single line of code.
Once you understand the protocol, it is not too difficult (though it can be tedious) to write a client and a server that are logically sound but inefficient. It is a challenge, however, to design and implement an HTTP client and an HTTP server that both conform to the protocol specification and are efficient!
This project involves an independent study of an existing protocol, HTTP (HyperText Transfer Protocol) version 1.1. It is followed by the design and implementation of two programs. The first is a simple HTTP client that can communicate with any HTTP server. The second is a simple HTTP server that can provide simple services requested by any HTTP client.
Please note that a thorough understanding of the HTTP protocol is an important part of this project. Unless you are confident that you are able to gain a thorough understanding of the protocol, you should not select this project as your assignment.
This project has three components:
A summary, at least three pages long, explaining your understanding of the HTTP protocol. This document should provide the details of the protocol that are relevant to the project.
Design and implementation of an HTTP client program in C; see the detailed requirements below.
Design and implementation of an HTTP server program in C; see the detailed requirements below.
The HTTP Client
The client program must be able to request resources using the GET method from any HTTP server, such as Apache HTTP Server, Microsoft IIS, or your own HTTP server. It must also be able to handle the HEAD and TRACE methods. However, it is not required to render HTML or graphics, or to process HTML links. Content received from an HTTP server can be displayed on the terminal in the same format in which it is received.
The client should be named
The command takes a URL, sends an HTTP request message to the server, and receives the response from the server. Without the -a option, the program displays only the content of the response. With -a, it displays the entire response message, including both the headers and the content.
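Deciding what to print with and without -a comes down to locating the blank line that ends the header section. The following is a minimal sketch. It assumes the whole response has already been read into a buffer. `find_body` is a hypothetical helper name, not part of any library:

```c
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the first byte of the message body inside a raw
 * HTTP response of length len, or NULL if the blank line that ends the
 * header section ("\r\n\r\n") is not in the buffer yet. If header_len is
 * not NULL, it receives the length of the headers including that blank line. */
const char *find_body(const char *resp, size_t len, size_t *header_len)
{
    for (size_t i = 0; i + 3 < len; i++) {
        if (resp[i] == '\r' && resp[i + 1] == '\n' &&
            resp[i + 2] == '\r' && resp[i + 3] == '\n') {
            if (header_len != NULL)
                *header_len = i + 4;
            return resp + i + 4;
        }
    }
    return NULL;
}
```

With -a the client would write the whole buffer. Without it, the client would write only the bytes from the returned pointer onward.

A real client also has to keep reading until the body is complete. For example, it can read until EOF when it sent `Connection: close`, or until Content-Length bytes have arrived; chunked transfer coding would additionally need decoding. A response to HEAD carries no body at all.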
By default, the client uses the GET method. If the command line includes a -m option, the client should use the HEAD or TRACE method instead. With -m head, the client should use HEAD; with -m trace, it should use TRACE.
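The three methods differ only in the request line. One way to structure the client is therefore to map the -m option to a method name, parse the URL, and format the request. The sketch below assumes plain `http://host[:port][/path]` URLs, with no user info, IPv6 literals, or HTTPS. `parse_url`, `method_from_option`, and `build_request` are hypothetical helper names:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Components of an http:// URL (hypothetical helper type). */
struct url {
    char host[256];
    int  port;
    char path[1024];
};

/* Parses "http://host[:port][/path]"; returns 0 on success, -1 otherwise. */
int parse_url(const char *s, struct url *u)
{
    const char *prefix = "http://";
    size_t plen = strlen(prefix);
    if (strncmp(s, prefix, plen) != 0)
        return -1;
    s += plen;

    size_t hlen = strcspn(s, ":/");       /* host ends at ':' or '/' */
    if (hlen == 0 || hlen >= sizeof u->host)
        return -1;
    memcpy(u->host, s, hlen);
    u->host[hlen] = '\0';
    s += hlen;

    u->port = 80;                         /* default HTTP port */
    if (*s == ':') {
        char *end;
        long p = strtol(s + 1, &end, 10);
        if (end == s + 1 || p < 1 || p > 65535)
            return -1;
        u->port = (int)p;
        s = end;
    }

    if (*s == '\0')
        s = "/";                          /* empty path means the root */
    else if (*s != '/')
        return -1;
    if (strlen(s) >= sizeof u->path)
        return -1;
    strcpy(u->path, s);
    return 0;
}

/* Maps the value of -m to a method name; NULL opt means no -m was given. */
const char *method_from_option(const char *opt)
{
    if (opt == NULL)
        return "GET";
    if (strcmp(opt, "head") == 0)
        return "HEAD";
    if (strcmp(opt, "trace") == 0)
        return "TRACE";
    return NULL;                          /* unsupported value */
}

/* Writes the request message into buf; returns its length, or -1 if
 * buf is too small. */
int build_request(char *buf, size_t size, const char *method,
                  const struct url *u)
{
    int n;
    if (u->port == 80)
        n = snprintf(buf, size,
                     "%s %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n",
                     method, u->path, u->host);
    else
        n = snprintf(buf, size,
                     "%s %s HTTP/1.1\r\nHost: %s:%d\r\nConnection: close\r\n\r\n",
                     method, u->path, u->host, u->port);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

HTTP/1.1 makes the Host header mandatory, which is why it is always sent. Sending `Connection: close` is a design choice that lets the client treat EOF as the end of the response. On a persistent connection, the client would instead have to rely on Content-Length or chunked framing to find where the response ends.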
The HTTP Server
The server program must be able to perform the following tasks:
Serve regular files in response to GET requests from any HTTP client, such as Mozilla Firefox, Google Chrome, Microsoft Internet Explorer, or your own HTTP client.
Handle directory requests. If the URL points to a directory and has a trailing slash, return the index file in that directory. If the directory does not contain a recognised index file, serve the directory listing instead.
Support redirection when the URL points to a directory but has no trailing slash.
Disallow backtracking beyond the document root using
Handle the HEAD and TRACE methods from any client.
Read a MIME type file containing a list of MIME type definitions and handle the content accordingly.
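This sketch assumes that "backtracking" here means `..` path segments. One way to enforce the document-root rule is to resolve the request path lexically and reject it as soon as it climbs above the root. The trailing-slash redirect decision is a similarly small check.

`path_stays_in_root` and `needs_slash_redirect` are hypothetical names. The sketch ignores percent-encoding (`%2e%2e`) and symbolic links, which a complete server would also have to handle:

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if the request path stays inside the document root when
 * resolved lexically: each ".." pops one directory, "." and empty
 * segments (from "//") are ignored, and climbing above the root fails. */
bool path_stays_in_root(const char *path)
{
    int depth = 0;
    const char *p = path;
    while (*p) {
        while (*p == '/')
            p++;                          /* skip separators */
        const char *seg = p;
        while (*p && *p != '/')
            p++;
        size_t len = (size_t)(p - seg);
        if (len == 0 || (len == 1 && seg[0] == '.'))
            continue;
        if (len == 2 && seg[0] == '.' && seg[1] == '.') {
            if (--depth < 0)
                return false;             /* escapes the document root */
        } else {
            depth++;
        }
    }
    return true;
}

/* A directory URL without a trailing slash should be redirected. */
bool needs_slash_redirect(const char *path, bool is_directory)
{
    size_t n = strlen(path);
    return is_directory && (n == 0 || path[n - 1] != '/');
}
```

For a request to /docs, where docs is a directory, the server would answer with a redirect. For example, it could send 301 Moved Permanently with a Location header pointing to /docs/.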
In addition, your server should run as a daemon and should be able to serve multiple client requests simultaneously. The server should be named myhttpd, with the following command line:
is an optional port number for the server. If the port number is not supplied, your server should use TCP port 8000.
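This sketch covers two start-up steps. The first is choosing the port, falling back to 8000 when none is given. The second is detaching from the terminal to run as a daemon, using the classic fork/setsid/fork sequence. `parse_port` and `daemonize` are hypothetical names. How the port option is spelled on the command line is left open here, as in the text above:

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

#define DEFAULT_PORT 8000

/* Returns the port given on the command line, DEFAULT_PORT when the
 * option was not supplied (arg == NULL), or -1 if arg is not a valid port. */
int parse_port(const char *arg)
{
    if (arg == NULL)
        return DEFAULT_PORT;
    char *end;
    long p = strtol(arg, &end, 10);
    if (*arg == '\0' || *end != '\0' || p < 1 || p > 65535)
        return -1;
    return (int)p;
}

/* Detaches the process from its terminal; returns 0 in the daemon. */
int daemonize(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid > 0)
        _exit(0);                 /* original process returns to the shell */
    if (setsid() < 0)             /* new session, no controlling terminal */
        return -1;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid > 0)
        _exit(0);                 /* session leader exits; the grandchild is
                                     not a session leader, so it cannot
                                     acquire a controlling terminal */
    umask(022);
    int fd = open("/dev/null", O_RDWR);
    if (fd < 0)
        return -1;
    dup2(fd, STDIN_FILENO);       /* stdio no longer refers to the terminal */
    dup2(fd, STDOUT_FILENO);
    dup2(fd, STDERR_FILENO);
    if (fd > STDERR_FILENO)
        close(fd);
    return 0;
}
```

This version deliberately does not `chdir("/")`. That way, a relative document root and the default ./myhttpd.log still resolve against the directory the server was started from. An alternative is to convert those paths to absolute ones with realpath() before daemonizing and then change to /.

Errors such as an invalid port are best reported before daemonize() is called, while stderr still reaches the terminal.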
The default port for HTTP is TCP 80. However, on Unix only root can bind to ports below 1024. To run your server on port 80, you must log on as root, which is not possible on some machines such as ceto.murdoch.edu.au. This is why we use port 8000 instead of 80 as the default port.
When testing your server on a shared machine such as ceto, you should avoid port 8000. Clashes can occur if two students use the same port on that machine at the same time. To avoid potential conflicts, use the TCP port allocated to you when you test your server on ceto.
is an optional directory path representing the document root of the server, under which all files are stored. Without this option, your document root is the current directory of the server.
When testing your server, make sure that your document root contains both regular files and directories. Some of these directories should contain the index file index.html and others should not, so that you can test both cases.
is an optional path of the log file to which the server will write its logs. Without this option, your log file should be ./myhttpd.log, i.e., the file myhttpd.log in the server's current directory.
Your server must add one line to the log file for each client request. Each line gives the date and time of the request, the HTTP method in the request, the originating host, the resource requested, and the status code returned to the client.
is an optional file containing a list of MIME types that the server recognises. Without this option, your server should recognise only the following MIME types: text/plain, text/html, image/jpeg, and image/gif. When testing your server, you must test at least these file types plus two additional ones.
is an optional number of preforks. Without this option, the number of preforks is fixed at 5.
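The MIME default and the log requirement above can be sketched as a built-in extension table and a one-line log formatter. Several details are assumptions: the file extensions associated with each type, the application/octet-stream fallback for unknown extensions, and the layout of the log line. The project only fixes the four default types and the fields each log line must contain. `mime_type_for` and `format_log_line` are hypothetical names:

```c
#include <stdio.h>
#include <string.h>
#include <strings.h>
#include <time.h>

/* Built-in table used when no MIME type file is given (extensions assumed). */
static const struct { const char *ext; const char *type; } default_mime[] = {
    { "txt",  "text/plain" },
    { "html", "text/html"  },
    { "htm",  "text/html"  },
    { "jpg",  "image/jpeg" },
    { "jpeg", "image/jpeg" },
    { "gif",  "image/gif"  },
};

/* Looks up the extension of the last path component, case-insensitively;
 * unknown extensions fall back to an assumed generic binary type. */
const char *mime_type_for(const char *path)
{
    const char *slash = strrchr(path, '/');
    const char *dot = strrchr(path, '.');
    if (dot != NULL && (slash == NULL || dot > slash)) {
        for (size_t i = 0; i < sizeof default_mime / sizeof default_mime[0]; i++)
            if (strcasecmp(dot + 1, default_mime[i].ext) == 0)
                return default_mime[i].type;
    }
    return "application/octet-stream";
}

/* Formats "date time method host resource status\n" into buf;
 * returns the line length, or -1 on error or if buf is too small. */
int format_log_line(char *buf, size_t size, time_t when, const char *method,
                    const char *host, const char *resource, int status)
{
    char stamp[32];
    struct tm *tm = localtime(&when);   /* not thread-safe; fine per process */
    if (tm == NULL || strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", tm) == 0)
        return -1;
    int n = snprintf(buf, size, "%s %s %s %s %d\n",
                     stamp, method, host, resource, status);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

A MIME type file given on the command line could be loaded into the same kind of table. For example, files in the layout of /etc/mime.types list one type per line, followed by its extensions.

Because every child process appends to the same log file, it helps to open the file with O_APPEND and to write each line with a single write(). In practice this keeps lines from interleaving.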
Preforking means that the server (the parent server) always maintains a pool of idle child processes (child servers). When a client request arrives, the parent server can immediately hand it to one of the idle child servers. The client thus gets faster service than if it had to wait for the parent server to create a new child process first. Note that creating a new child process is relatively expensive.
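The following is a minimal prefork sketch. It uses a common variant of the scheme described above. Instead of the parent explicitly passing each connection to a child, all children block in accept() on the listening socket they inherit from the parent. The kernel gives each new connection to exactly one of them, so the parent only has to replace children that exit.

`start_pool`, `supervise_pool`, and the placeholder handler `serve_fixed` are hypothetical names. The handler reads the request with a single read() only for illustration:

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Placeholder handler: reads (part of) the request, sends a fixed reply. */
void serve_fixed(int client_fd)
{
    static const char resp[] =
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/plain\r\n"
        "Content-Length: 3\r\n"
        "Connection: close\r\n"
        "\r\n"
        "ok\n";
    char req[2048];
    ssize_t n = read(client_fd, req, sizeof req);  /* single read: sketch only */
    if (n > 0) {
        ssize_t w = write(client_fd, resp, sizeof resp - 1);
        (void)w;
    }
}

/* Child body: serve connections from the shared listening socket forever. */
static void child_loop(int listen_fd, void (*handler)(int))
{
    for (;;) {
        int client = accept(listen_fd, NULL, NULL);
        if (client < 0)
            continue;             /* e.g. EINTR; a real server would inspect errno */
        handler(client);
        close(client);
    }
}

/* Forks n children, stores their pids, and returns how many were started. */
int start_pool(int listen_fd, int n, void (*handler)(int), pid_t *pids)
{
    int started = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {
            child_loop(listen_fd, handler);
            _exit(0);             /* not reached */
        }
        pids[started++] = pid;
    }
    return started;
}

/* Parent loop: replace every child that exits, keeping the pool at n. */
void supervise_pool(int listen_fd, void (*handler)(int), pid_t *pids, int n)
{
    for (;;) {
        pid_t dead = waitpid(-1, NULL, 0);
        if (dead < 0) {
            if (errno == ECHILD)
                sleep(1);         /* no children left; avoid a busy loop */
            continue;
        }
        for (int i = 0; i < n; i++) {
            if (pids[i] == dead) {
                /* Refill the slot; if fork fails, this sketch does not retry. */
                start_pool(listen_fd, 1, handler, &pids[i]);
                break;
            }
        }
    }
}
```

In myhttpd, the value of n would come from the prefork option (5 by default), and the parent would call supervise_pool() after start_pool(). Each child here serves one connection at a time, so the pool size bounds how many requests are served simultaneously.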
It is not a trivial task to design an efficient web server that can serve many client requests simultaneously and quickly. You may want to consider using the clone system call to create child processes with lower overhead. However, if you do so, be careful with global variables. When a child shares the parent's address space (the CLONE_VM flag), global variables are shared between the parent and all child processes!
Please note that myhttpd can take any combination of the five command-line options above, not just one option at a time. For example, we may run the server with two options as shown below:
To complete this project, you will have to research topics such as web servers, the HTTP protocol, and MIME types. Numerous Internet sites provide relevant material, and Wikipedia is a good starting point.