Assignment 1 | Computer Networks SP 2022

BarreiroCollege/USI-NTW-A1

HTTP Server

HTTP server running over TCP IPv4

Assignment 1 for Computer Networking course during the Spring Semester 2022 @ USI Università della Svizzera italiana.

All code is properly documented with extensive inline and definition comments.

Usage Instructions

This code was written for Python 3.9, so it is expected to run without errors on that version. Lower Python 3 versions may also work.

Installation

No specific installation steps are required; just install Python 3.

Running

To get the server up and running, just run the following command:

python server.py

By default, the server listens on port 8080, but this can be changed with the --port flag.
All options are listed here:

PS C:\Github\NTW22-1> python server.py --help
usage: server.py [-h] [-p [PORT]]

HTTP server based on TCP IPv4 with multithreading support.

optional arguments:
  -h, --help            show this help message and exit
  -p [PORT], --port [PORT]
                        port to use to listen connections
PS C:\Github\NTW22-1>
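The help output above is consistent with an argparse parser along the following lines. This is only a minimal sketch, not the actual code in server.py: the `nargs="?"` setting is inferred from the `[PORT]` in the usage line, the default comes from the port stated in this README, and the `build_parser` name is hypothetical.

```python
import argparse

DEFAULT_PORT = 8080  # default port stated in this README


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose help text resembles the output shown above.

    Hypothetical helper; the real server.py may organize this differently.
    """
    parser = argparse.ArgumentParser(
        prog="server.py",
        description="HTTP server based on TCP IPv4 with multithreading support.",
    )
    parser.add_argument(
        "-p", "--port",
        nargs="?",             # assumption, inferred from "[PORT]" in the usage line
        type=int,
        default=DEFAULT_PORT,  # used when the flag is not given at all
        help="port to use to listen connections",
    )
    return parser
```

With such a parser, `build_parser().parse_args(["--port", "9000"]).port` would yield the integer 9000, and omitting the flag would fall back to 8080.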

Tasks

This assignment was divided into several tasks. Each member of the group worked on different tasks, as stated in the table below. As can be seen, several members worked on some of the same tasks.

Aristeidis Marina Diego
Task A
Task B
Task C
Task D
Task E
Task F
Task G
Task H
Task I
Task J
Optional Task A
Optional Task B

Thus, to properly manage the project, we used GitHub and its features to treat it as a "big" software project. We used Issues to keep track of pending tasks, bugs found, etc.; we set up a Milestone to track the deadline and the progress made; and we used Pull Requests to work independently on different branches and avoid breaking or overwriting code.

For example, the Issues page looks like this right now after filtering for the specific tasks:

Project Structure

The whole project follows an object-oriented pattern to ease operations. Some of the classes override basic Python operations (like __bytes__ or __setitem__), so that code can be written more clearly.
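As an illustration of how overriding `__setitem__` and `__bytes__` can make calling code clearer, here is a minimal sketch of a header container. The `HeaderSet` class and its behavior (case-insensitive names, CRLF-terminated lines) are assumptions made for this example and are not taken from the project code.

```python
class HeaderSet:
    """Hypothetical header container, used only for illustration."""

    def __init__(self) -> None:
        # Maps the lowercased name to the (original name, value) pair.
        self._headers: dict[str, tuple[str, str]] = {}

    def __setitem__(self, name: str, value: object) -> None:
        # Keyed by lowercase name so that lookups are case-insensitive.
        self._headers[name.lower()] = (name, str(value))

    def __getitem__(self, name: str) -> str:
        return self._headers[name.lower()][1]

    def __bytes__(self) -> bytes:
        # One "Name: value" line per header, each terminated by CRLF.
        lines = [f"{name}: {value}\r\n" for name, value in self._headers.values()]
        return "".join(lines).encode("utf-8")
```

Calling code can then write `headers["Content-Type"] = "text/html"` and `bytes(headers)` instead of calling explicit setter and serializer methods.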

As required, server.py is the entry point. It is the only executable file: the remaining Python files are just modules and will not run anything unless invoked from another file. It must therefore be run as a Python script, and it does nothing if imported as a module. This file contains the Server class, which keeps the server socket alive and parses the virtual hosts file (one Vhost object per virtual host). It listens for connections and processes each one in a new thread.
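The thread-per-connection pattern described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual Server class: the function names are hypothetical, the handler sends a fixed response instead of parsing the request, and the socket is bound to the loopback address only.

```python
import socket
import threading


def handle_client(conn: socket.socket) -> None:
    """Placeholder handler: answers any request with a fixed response."""
    with conn:
        conn.recv(4096)  # read (and here ignore) the raw request bytes
        conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok")


def make_server_socket(port: int = 8080) -> socket.socket:
    """Create a listening TCP (SOCK_STREAM) socket over IPv4 (AF_INET)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("127.0.0.1", port))  # loopback only, an assumption of this sketch
    sock.listen()
    return sock


def serve(server_sock: socket.socket) -> None:
    """Accept connections until the socket is closed, one thread per client."""
    while True:
        try:
            conn, _addr = server_sock.accept()
        except OSError:
            break  # the listening socket was closed
        threading.Thread(target=handle_client, args=(conn,), daemon=True).start()
```

Because each connection is handled in its own thread, a slow client does not block the accept loop.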

The settings.py file defines some constants for the server, like the default running port (8080), the encoding (utf-8) used to keep all requests uniform, the server name (Group AMD Server) and the virtual hosts file (vhosts.conf).

Task B websites are available in the respective folders (arisvrazitoulis.ch, marina.ch, diegobarreiro.es), as well as the virtual hosts file (vhosts.conf).

The http module takes care of creating the request and response objects. The HttpRequest class is constructed from a raw HTTP request and stores all its data in the respective attributes. HttpResponse contains all the data for the output response, which can be serialized into a raw HTTP response. To ease its usage, some HttpResponse subclasses have been defined for the error codes; they also extend Exception, so they can be raised and later caught by the Server to be sent as "valid" responses. The enums.py file contains several constants, like HttpMethod, HttpResponseCode and HttpVersion, which are used in the request and response objects. Finally, an HttpHeader class is defined as a standardized key-value header.
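The idea of error responses that are also exceptions can be sketched as below. The constructor signatures, the `HttpResponseError` base class and the `handle`/`respond` helpers are hypothetical and chosen for illustration; only the HttpResponse and HttpResponseNotFound names come from this README.

```python
class HttpResponse:
    """Hypothetical minimal response: status code, reason phrase and body."""

    def __init__(self, code: int, reason: str, body: bytes = b"") -> None:
        self.code = code
        self.reason = reason
        self.body = body


class HttpResponseError(HttpResponse, Exception):
    """Hypothetical base for responses that can be raised like exceptions."""

    code = 500
    reason = "Internal Server Error"

    def __init__(self, body: bytes = b"") -> None:
        HttpResponse.__init__(self, self.code, self.reason, body)
        Exception.__init__(self, f"{self.code} {self.reason}")


class HttpResponseNotFound(HttpResponseError):
    code, reason = 404, "Not Found"


def handle(path: str) -> HttpResponse:
    """Stand-in handler: raises an error response for unknown paths."""
    if path != "/":
        raise HttpResponseNotFound()
    return HttpResponse(200, "OK", b"hello")


def respond(path: str) -> HttpResponse:
    """Catch raised error responses and return them as regular responses."""
    try:
        return handle(path)
    except HttpResponseError as error:
        return error
```

With this design, deeply nested code can abort with `raise HttpResponseNotFound()` and the top-level caller still ends up with an object it can serialize like any other response.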

Finally, the utils module takes care of other minor tasks. The entity.py file generates the raw HTTP response from both the request and the response, and also injects some automatic headers like Content-Length or Date. (From an OOP perspective, the response could receive the request, but that would create a dependency between two objects which, in theory, are independent, as the response object "only" varies in status code, headers and body.) The mime.py file defines some custom MIME types for the GET method (this is explained later), and the vhosts.py file contains the Vhost class with the attributes of a virtual host.

NTW22-1
├── README.md
├── arisvrazitoulis.ch
│   ├── giannena.jpeg
│   ├── index.html
│   └── me.jpg
├── diegobarreiro.es
│   ├── 404.html
│   ├── about.html
│   ├── assets
│   │   ├── css
│   │   │   └── features.css
│   │   ├── img
│   │   │   ├── etse.jpg
│   │   │   ├── home.jpg
│   │   │   ├── ies1.jpg
│   │   │   ├── ies2.jpg
│   │   │   ├── ies3.jpg
│   │   │   ├── mit1.jpg
│   │   │   ├── mit2.jpg
│   │   │   ├── pc1.png
│   │   │   └── pc2.jpg
│   │   ├── js
│   │   ├── vendor
│   │   │   ├── bootstrap-5.1.3
│   │   │   └── bootstrap-icons-1.8.1
│   │   └── video
│   │       └── kodular.mp4
│   ├── contact.html
│   ├── edu.html
│   ├── favicon.ico
│   └── work.html
├── guyincognito.ch
│   ├── home.html
│   ├── images
│   │   ├── avatar.png
│   │   ├── paddlin.png
│   │   ├── pglit2.gif
│   │   └── under_construction.gif
│   └── test
├── http
│   ├── enums.py
│   ├── header.py
│   ├── request.py
│   └── response.py
├── marina.ch
│   ├── images
│   │   ├── 1.jpeg
│   │   ├── 2.jpeg
│   │   └── 3.jpeg
│   ├── index.html
│   └── style.css
├── server.py
├── settings.py
├── utils
│   ├── entity.py
│   ├── mime.py
│   └── vhosts.py
└── vhosts.conf

HTTP Implementation

The first step of the HTTP process is generating the HttpRequest object. Upon receiving a new connection, Server launches a new thread to process the request (in HTTP/1.1, this thread keeps listening on the connection as long as Connection: Close is not present). The thread calls the HttpRequest constructor, passing the raw bytes read from the socket. The constructor tries to parse the HTTP request, treating CRLF as the line break. If no errors are found, the HttpRequest object is created. Otherwise, one of the following errors is raised (in the listed priority), which interrupts the procedure and returns the HTTP response early:

| Status Code | Class | Reason |
| --- | --- | --- |
| 501 | HttpResponseNotImplemented | Specified request method is not implemented |
| 505 | HttpResponseHttpVersionNotSupported | Not HTTP/1.0 or HTTP/1.1 |
| 403 | HttpResponseForbidden | Specified path is outside of the virtual host scope |
| 404 | HttpResponseNotFound | Specified host is not available in the server |
| 400 | HttpResponseBadRequest | Error parsing the request (malformed data) |
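A rough sketch of how such prioritized checks might look during parsing is shown below. It is an assumption-laden simplification: a single `HttpError` class stands in for the response classes in the table, the host lookup (404) is omitted, a request line that cannot be split is reported as 400 before any other check, and the returned dictionary is a placeholder for the real HttpRequest object.

```python
from pathlib import Path

SUPPORTED_METHODS = {"GET", "PUT", "DELETE", "NTW22INFO"}
SUPPORTED_VERSIONS = {"HTTP/1.0", "HTTP/1.1"}


class HttpError(Exception):
    """Hypothetical stand-in for the error response classes in the table."""

    def __init__(self, code: int) -> None:
        super().__init__(f"HTTP {code}")
        self.code = code


def parse_request(raw: bytes, vhost_root: Path) -> dict:
    """Parse a raw request, raising HttpError on the first failed check."""
    head, _, body = raw.partition(b"\r\n\r\n")  # CRLF is the line break
    try:
        lines = head.decode("utf-8").split("\r\n")
    except UnicodeDecodeError:
        raise HttpError(400)
    parts = lines[0].split(" ")
    if len(parts) != 3:
        raise HttpError(400)  # cannot even identify method and version
    method, target, version = parts
    if method not in SUPPORTED_METHODS:
        raise HttpError(501)
    if version not in SUPPORTED_VERSIONS:
        raise HttpError(505)
    # Reject targets that resolve outside the virtual host root ("../" tricks).
    root = vhost_root.resolve()
    resolved = (root / target.lstrip("/")).resolve()
    if resolved != root and root not in resolved.parents:
        raise HttpError(403)
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if not sep:
            raise HttpError(400)
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "path": target, "version": version,
            "headers": headers, "body": body}
```

Checking the method and version before anything else mirrors the priority in the table: an unsupported method yields 501 even if the rest of the request is malformed.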

Once the HttpRequest object is generated, the next step is generating the appropriate response for it. Back in the Server object, the HttpResponse object for the request is generated. Different code is executed depending on the method, so this part is explained in the sections below, one per method. Keep in mind that errors can also appear when generating the response, so each method has a table similar to the one above.

And finally, now that both HttpRequest and HttpResponse objects are created, they are ready to be "merged" into the raw output for the socket to be sent to the client. Thus, entity.py will receive both objects and start generating this string, and will also inject some "auto" headers into the response object. These headers include Date or Server, for example. The return data is the actual string encoded in bytes that has to be sent back to the client as response.
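The serialization step with automatically injected headers could look roughly like this. It is a sketch under assumptions: the function name and signature are hypothetical, headers are held in a plain dictionary, and the server name and encoding are the values this README attributes to settings.py.

```python
from email.utils import formatdate

SERVER_NAME = "Group AMD Server"  # server name listed for settings.py in this README
ENCODING = "utf-8"                # encoding listed for settings.py in this README


def build_raw_response(version: str, code: int, reason: str,
                       headers: dict, body: bytes = b"") -> bytes:
    """Serialize a response, injecting Date, Server and Content-Length."""
    merged = dict(headers)  # copy, so the caller's dictionary is not modified
    merged.setdefault("Date", formatdate(usegmt=True))  # e.g. "... GMT"
    merged.setdefault("Server", SERVER_NAME)
    merged["Content-Length"] = str(len(body))  # always derived from the body
    head = f"{version} {code} {reason}\r\n"
    head += "".join(f"{name}: {value}\r\n" for name, value in merged.items())
    # An empty line separates the headers from the body.
    return head.encode(ENCODING) + b"\r\n" + body
```

Deriving Content-Length from the body at serialization time avoids mismatches if the body is changed after the response object is created, for instance by a custom error page.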

Additionally, a custom error page feature has been implemented. In the root of a virtual host folder, files named CODE.html can be created, where CODE is an HTTP error code. When entity.py detects that the response is an error response, no content is specified and the GET method was requested, it searches for that file and uses it as the response content. This is useful for designing custom "not found" pages or other error pages.
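The lookup described above can be sketched as follows. The function names are hypothetical, and treating every status code of 400 or above as an "error response" is an assumption of this example.

```python
from pathlib import Path
from typing import Optional


def find_error_page(vhost_root: Path, code: int) -> Optional[bytes]:
    """Return the contents of CODE.html in the vhost root, if it exists."""
    candidate = vhost_root / f"{code}.html"
    if candidate.is_file():
        return candidate.read_bytes()
    return None


def error_body(vhost_root: Path, code: int, method: str, body: bytes) -> bytes:
    """Pick the body for a response, preferring a custom error page.

    The custom page is only used for error codes (assumed: 400 and above),
    for GET requests, and when no content was already set.
    """
    if code >= 400 and method == "GET" and not body:
        page = find_error_page(vhost_root, code)
        if page is not None:
            return page
    return body
```

For example, with a 404.html file in the virtual host root (as in the diegobarreiro.es folder of the project tree), a failed GET would return that page instead of an empty body.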

GET

The first step is to check whether the provided path is a folder. As a Vhost specifies an index file, a user who tries to access a folder is in fact trying to access the index file of that folder, so the index file name has to be appended to the path.

The next step is to check whether the specified file exists in the filesystem. Vhost already provides a method to get the Path of the root for its files, so the request path is appended to this path. The resulting path can then be checked for existence and, if it exists, checked to confirm that it is not a folder.

Finally, we can try to open the file (assuming we have permission to do so) and get its contents in bytes. The HttpResponse object is then constructed with this content. However, before it becomes a valid response, the MIME type of the file has to be determined. Server tries to guess the type using the standard mimetypes library and, if it cannot be resolved with either the library or the custom types, an error is raised. It is worth mentioning that hundreds of file types are supported, from several image formats to video and other types.

The error responses that this method can return are the following (in the given priority):

| Status Code | Class | Reason |
| --- | --- | --- |
| 404 | HttpResponseNotFound | Specified file (or folder) does not exist |
| 405 | HttpResponseMethodNotAllowed | Specified "file" path is a folder in the filesystem |
| 403 | HttpResponseForbidden | Cannot read file contents (missing filesystem permissions) |
| 415 | HttpResponseUnsupportedMediaType | File exists but its MIME type cannot be guessed |

If no error appears, HttpResponse will have code 200 OK, with the contents of the file as its body.
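The GET steps above can be sketched as a single function. Several details are assumptions made for illustration: a folder request is detected by a trailing slash, the index file defaults to index.html, `HttpError` stands in for the response classes in the table, and the single custom MIME entry is hypothetical.

```python
import mimetypes
from pathlib import Path


class HttpError(Exception):
    """Hypothetical stand-in for the error response classes in the table."""

    def __init__(self, code: int) -> None:
        super().__init__(f"HTTP {code}")
        self.code = code


# Hypothetical custom entry; the real mime.py contents are not shown here.
CUSTOM_MIME_TYPES = {".md": "text/markdown"}


def handle_get(root: Path, request_path: str,
               index_file: str = "index.html") -> tuple:
    """Return (content, mime_type) or raise HttpError in the table's order."""
    if request_path.endswith("/"):
        request_path += index_file  # folder access means its index file
    target = root / request_path.lstrip("/")
    if not target.exists():
        raise HttpError(404)
    if target.is_dir():
        raise HttpError(405)
    try:
        content = target.read_bytes()
    except PermissionError:
        raise HttpError(403)
    # Try the standard library first, then fall back to the custom types.
    mime, _encoding = mimetypes.guess_type(target.name)
    mime = mime or CUSTOM_MIME_TYPES.get(target.suffix.lower())
    if mime is None:
        raise HttpError(415)
    return content, mime
```

The returned MIME type would then be placed in the Content-Type header of the 200 OK response.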

PUT

The first step in this method is to check whether the specified request path is a folder (that is, whether it ends with /). This has to be checked up front because the specified path is not required to exist and, consequently, data cannot be written into a non-file node. Then, if any of the parent folders do not exist, all of them are created (being aware of possible permission errors).

Finally, the request body is written into the specified file, and the Content-Location header is added (it matches the request path attribute, as we are strict regarding the file to write).

The error responses that this method can return are the following (in the given priority):

| Status Code | Class | Reason |
| --- | --- | --- |
| 405 | HttpResponseMethodNotAllowed | Specified "file" path is a folder in the filesystem |
| 403 | HttpResponseForbidden | Cannot create either parent folders or file node |

If no error appears, HttpResponse will have code 201 CREATED and empty body.
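A minimal sketch of the PUT steps follows. The function name, the `HttpError` stand-in class and the returned `(status, headers)` tuple are assumptions made for this example, not the project's actual interfaces.

```python
from pathlib import Path


class HttpError(Exception):
    """Hypothetical stand-in for the error response classes in the table."""

    def __init__(self, code: int) -> None:
        super().__init__(f"HTTP {code}")
        self.code = code


def handle_put(root: Path, request_path: str, body: bytes) -> tuple:
    """Write the body to the requested file and return (status, headers)."""
    if request_path.endswith("/"):
        raise HttpError(405)  # a folder path cannot receive file data
    target = root / request_path.lstrip("/")
    if target.is_dir():
        raise HttpError(405)  # path exists in the filesystem as a folder
    try:
        # Create any missing parent folders, then write the file itself.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(body)
    except PermissionError:
        raise HttpError(403)
    # Content-Location mirrors the request path exactly.
    return 201, {"Content-Location": request_path}
```

For instance, a PUT to /notes/2022/todo.txt would create the notes and 2022 folders if needed before writing the file.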

DELETE

This method is very strict, unlike GET, which is more lax about which file is being accessed. It is implemented this way to prevent unintended changes in the filesystem. In other words, it will not use index files to guess which file the user is trying to delete.

The first step is to check whether the specified file exists in the filesystem. If it does not exist, there is no need to proceed further. If it does, it must be confirmed that the "file" is actually a file and not a folder.

Finally, the server tries to delete the file (and, recursively, its parent folders if they are empty). Note that permissions are only checked for deleting the file, not for the parent folders.

The error responses that this method can return are the following (in the given priority):

| Status Code | Class | Reason |
| --- | --- | --- |
| 404 | HttpResponseNotFound | Specified file (or folder) does not exist |
| 405 | HttpResponseMethodNotAllowed | Specified "file" path is a folder in the filesystem |
| 403 | HttpResponseForbidden | Cannot delete file (missing filesystem permissions) |
| 415 | HttpResponseUnsupportedMediaType | File exists but its MIME type cannot be guessed |

If no error appears, HttpResponse will have code 200 OK and empty body.
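The DELETE steps, including the cleanup of empty parent folders, might be sketched like this. The function name and the `HttpError` stand-in class are hypothetical, and stopping the cleanup at the virtual host root is an assumption of this example.

```python
from pathlib import Path


class HttpError(Exception):
    """Hypothetical stand-in for the error response classes in the table."""

    def __init__(self, code: int) -> None:
        super().__init__(f"HTTP {code}")
        self.code = code


def handle_delete(root: Path, request_path: str) -> int:
    """Delete the exact file requested and prune empty parent folders."""
    target = root / request_path.lstrip("/")  # no index-file guessing here
    if not target.exists():
        raise HttpError(404)
    if target.is_dir():
        raise HttpError(405)
    try:
        target.unlink()
    except PermissionError:
        raise HttpError(403)
    # Remove now-empty parent folders, never going above the vhost root.
    parent = target.parent
    while parent != root:
        try:
            if any(parent.iterdir()):
                break  # folder still has content, stop pruning
            parent.rmdir()
        except OSError:
            break  # failures while pruning do not affect the response
        parent = parent.parent
    return 200
```

Errors while pruning folders are deliberately swallowed, matching the note above that permissions are only checked for the file itself.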

NTW22INFO

This method generates a static response. It does not really depend on the request: output is always a constant. Thus, for all paths, a request would look like the following one:

NTW22INFO / HTTP/1.0
Host: guyincognito.ch

And the response will look like this:

HTTP/1.0 200 OK
Date: Wed, 24 Mar 2021 09:30:00 GMT
Server: Guy incognito's Server
Content-Length: 98
Content-Type: text/plain

The administator of guyncognito.ch is Guy incognito.
You can contact him at guy.incognito@usi.ch.

The only variables are in the content of the response, which depends on the virtual host the user is trying to access. It is not possible to get an error in this method.
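Building this response body from the virtual host data can be sketched as follows. The function name and parameters are hypothetical, and the wording is modeled on the sample response above with normalized spelling; the exact bytes produced by the real server may differ.

```python
def ntw22info_body(host: str, admin_name: str, admin_email: str) -> bytes:
    """Build the plain-text NTW22INFO body for one virtual host.

    Hypothetical helper: the three values are assumed to come from the
    Vhost object that matches the request's Host header.
    """
    text = (
        f"The administrator of {host} is {admin_name}.\n"
        f"You can contact him at {admin_email}.\n"
    )
    return text.encode("utf-8")
```

The response would then carry this body with Content-Type: text/plain, regardless of the request path.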

Acknowledgments
