Minishell

Overview

Our Minishell project represents a collaborative effort to build a custom shell program in C, inspired by the functionalities of bash, the Bourne Again SHell. As part of the 42 School curriculum, this project served as a unique opportunity for us to deepen our collective understanding of system processes, file descriptors, and the complexities of command-line interfaces. Through the development of our own shell, we engaged in a comprehensive exploration of Unix-based systems, gaining hands-on experience in process control, command interpretation, and environmental variable management.

Project Challenges

Parsing User Input: Accurately parsing user input, including handling spaces, quotes, and special characters, while distinguishing between command arguments and options.
Executing Commands: Implementing logic to search for and execute the right executable based on the PATH environment variable or a specified path, and managing execution of built-in commands versus external commands.
Signal Handling: Correctly handling Unix signals such as SIGINT (ctrl-C), SIGQUIT (ctrl-\), and EOF (ctrl-D), and ensuring the shell behaves similarly to bash in response to these signals.
Input/Output Redirection and Pipes: Implementing input and output redirection (<, >, >>, <<) and pipes (|) to allow for command chaining and data redirection, which involves managing file descriptors and process communication.
Environment Variable Expansion: Managing environment variables and supporting their expansion within commands, including the special case of $? to represent the exit status of the most recently executed command.
Memory Management: Ensuring efficient memory management throughout the shell, including preventing memory leaks especially in the context of the readline function and dynamically allocated resources.
Built-in Commands Implementation: Creating internal implementations of several built-in commands (echo, cd, pwd, export, unset, env, exit) that behave consistently with their bash counterparts.
Concurrency and Process Management: Handling concurrency through process creation and management, using system calls like fork, execve, wait, and pipe, and ensuring robust process control and signal handling.
Error Handling: Developing comprehensive error handling strategies to deal with invalid commands, permissions issues, nonexistent files, and other runtime errors.

Team Development Steps for Minishell

Step 1: Initial Planning and Setup

Repository Setup: Collaboratively set up the GitHub repository, ensuring a clear directory structure and branch strategy.
Makefile Creation: Makefile that includes rules for all, clean, fclean, re
Set up libft libray

Step 2: Research and Design Phase

This phase was about understanding the shell's operations, researching the external functions allowed, and dividing them among ourselves to research and explain their usage to the team.

Shell Operations

What Happens When You Type a Command in Your Terminal 📹
Shell Code Explained 📹
Shell Command Language 📹
Unix terminals and shells 📹
Terminal vs. Bash vs. Command line vs. Prompt 📹

Tutorial - Write a Shell in C 📝

External Functions

Reviewed the external functions allowed, dividing them among ourselves to research and explain their usage to the team.

Readline Functions:

Function	Description
`readline`	Reads a line from the standard input and returns it.
`rl_clear_history`	Clears the readline history list.
`rl_on_new_line`	Prepares readline for reading input on a new line.
`rl_replace_line`	Replaces the content of the readline current line buffer.
`rl_redisplay`	Updates the display to reflect changes to the input line.
`add_history`	Adds the most recent input to the readline history list.

Standard I/O Functions:

Function	Description
`printf`	Outputs formatted data to stdout.

Memory Allocation Functions:

Function	Description
`malloc`	Allocates specified bytes of heap memory.
`free`	Deallocates previously allocated memory.

File I/O Functions:

Function	Description
`write`	Writes data to a file descriptor.
`access`	Checks calling process's permissions for a file or directory.
`open`	Opens a file or device, returning a file descriptor.
`read`	Reads data from a file descriptor into a buffer.
`close`	Closes a previously opened file descriptor.

Process Control Functions:

Function	Description
`fork`	Creates a new process by duplicating the calling process.
`wait`	Suspends execution of the calling process until one of its children terminates.
`waitpid`	Waits for a specific child process to change state.
`wait3`	Waits for any child process to change state.
`wait4`	Waits for a specific child process to change state.
`signal`	Handles or ignores signals sent to the process.
`sigaction`	Handles or ignores signals sent to the process.
`sigemptyset`	Initializes and adds signals to a signal set.
`sigaddset`	Initializes and adds signals to a signal set.
`kill`	Sends a signal to a process or a group of processes.
`exit`	Terminates the calling process.

Directory Functions:

Function	Description
`getcwd`	Gets the current working directory.
`chdir`	Changes the current working directory.
`stat`	Returns information about a file or a file descriptor.
`lstat`	Returns information about a file or a file descriptor.
`fstat`	Returns information about a file or a file descriptor.
`unlink`	Removes a link to a file.
`execve`	Replaces the current process image with a new process image.

File Descriptor Functions:

Function	Description
`dup`	Duplicates a file descriptor.
`dup2`	Duplicates a file descriptor.
`pipe`	Creates a pipe for inter-process communication.

Directory Functions:

Function	Description
`opendir`	Manages directory streams.
`readdir`	Manages directory streams.
`closedir`	Manages directory streams.

Error Handling Functions:

Function	Description
`strerror`	Returns a pointer to the textual representation of an error code.
`perror`	Returns a pointer to the textual representation of an error code.

Terminal Functions:

Function	Description
`isatty`	Tests whether a file descriptor refers to a terminal.
`ttyname`	Returns the name of the terminal associated with a file descriptor.
`ttyslot`	Returns the name of the terminal associated with a file descriptor.
`ioctl`	Controls device-specific input/output operations.
`getenv`	Returns the value of an environment variable.
`tcsetattr`	Sets and gets terminal attributes.
`tcgetattr`	Sets and gets terminal attributes.
`tgetent`	Terminal handling functions from the termcap library.
`tgetflag`	Terminal handling functions from the termcap library.
`tgetnum`	Terminal handling functions from the termcap library.
`tgetstr`	Terminal handling functions from the termcap library.
`tgoto`	Terminal handling functions from the termcap library.
`tput`	Terminal handling functions from the termcap library.

Step 3: Parsing and Input Management

Parsing - Computerphile
Learn About Parsing - What is it and Why Do You Need It?

Command Reading

Readline Library: Implemented readline and integrated add_history ( GNU Readline)

brew install readline

readline(3) - Linux manual page

Add the following to the Makefile:

READLINE_INCLUDE = $(shell brew --prefix readline)/include
READLINE_LIB = $(shell brew --prefix readline)/lib
INCLUDES = -I./includes -I./lib/libft -I$(READLINE_INCLUDE)

#include <stdio.h>
#include <readline/readline.h>
#include <readline/history.h>

int main(void)
{
    char *input;

    while (1)
    {
        input = readline("minishell$ ");
        if (!input)
            break;
        if (*input)
            add_history(input);
        printf("Input: %s\n", input);
        free(input);
    }
    return 0;
}

Steps for building the parser and AST (Abstract Syntax Tree)

What Is An Abstract Syntax Tree

Syntax Error Checking: This involves verifying whether the input string adheres to the shell's syntax rules. It checks for unclosed quotes, and misuse of redirection or pipe symbols. Syntax error checking ensures that the input can be correctly interpreted and executed.
Tokenization: This step breaks the input string into meaningful pieces, known as tokens. Tokens can be commands, arguments, redirection operators (<, >, >>, <<), pipe symbols (|), and environment variable identifiers. Tokenization simplifies the parsing process by converting the input string into a format that's easier to analyze.
Parsing: During parsing, tokens are analyzed to understand their syntactical relationship. This step involves constructing a representation of the input that reflects the user's intention. Depending on the complexity of the shell, this could mean building an abstract syntax tree (AST) or a simpler structure.
AST Construction:
- Commands: Nodes in the AST represent commands along with their arguments. These nodes are fundamental to understanding what actions the shell needs to perform.
- Redirections: Redirection nodes are created to represent input and output redirections. These nodes are attached to command nodes to modify how the commands read their input or write their output.
- Pipelines: When a pipe is encountered, a pipeline node links command nodes together, indicating that the output of one command serves as the input to another.
- Environment Variable Expansion: This can be handled as part of command parsing or immediately before command execution. It involves replacing environment variable identifiers with their corresponding values.
Execution AST: After the AST is built, it is traversed to execute the commands it represents. This involves:
- Executing built-in commands directly within the shell.
- Launching external commands by creating new processes.
- Setting up redirections as specified by the redirection nodes.
- Managing pipelines by connecting the stdout of one command to the stdin of the next.

a. Designing a Syntax Error Checker

the syntax error checker will be responsible for identifying and reporting syntax errors in the user input.

Unclosed Quotes: verify that all quotes are properly closed.
Misplaced Operators: Detect if pipes | are used incorrectly, such as being at the start or end of the input, or if multiple pipes are used consecutively.
Logical Operators: Detect logical operators such as && and || and report them as not supported.
Invalid Redirections: Detect invalid redirections, such as multiple consecutive redirections or redirections at the start or end of the input.

b. Syntax Error Checker Function

syntax_checker.c:

syntax_error_checker Function: Iterates through the input string, checking for syntax errors and reporting them if found.
has_unclosed_quotes Function: Checks for unclosed quotes in the input string.
has_invalid_redirections Function: Detects invalid redirections, such as multiple consecutive redirections or redirections at the start or end of the input.
has_misplaced_operators Function: Detects misplaced pipes and redirections.
has_logical_operators Function: Detects logical operators such as && and || and reports them as not supported.

c. Tokenization

The goal of the tokenization process is to break down the input string into a series of tokens that the parser can easily understand. These tokens represent commands, arguments, pipes, redirections, and other elements of the shell syntax.

Quotations: Distinguishing between single (') and double (") quotes.
Redirections: Recognizing input (<), output (>), append (>>), and here-documen(<<) redirections.
Pipes (|): Splitting commands to be executed in a pipeline.
Environment variables: Expanding variables starting with $.
Command separation: Identifying commands and their arguments.

d. Token Structure

// Token type enumeration
typedef enum e_token_type
{
    TOKEN_WORD,      // For commands and arguments
    TOKEN_PIPE,      // For '|'
    TOKEN_REDIR_IN,  // For '<'
    TOKEN_REDIR_OUT, // For '>'
    TOKEN_REDIR_APPEND, // For '>>'
    TOKEN_REDIR_HEREDOC, // For '<<'
    TOKEN_ENV_VAR, // For environment variables
}   t_token_type;

// Token structure
typedef struct s_token
{
    t_token_type type;
    char        *value;
    struct s_token *next;
}   t_token;

e. Tokenization Strategy

Whitespace Handling: Skip whitespace outside quotes to separate commands and arguments.
Quoting: Correctly handle single (') and double quotes ("), preserving the text exactly as is within single quotes and allowing for variable expansion and escaped characters within double quotes.
Redirections and Pipes: Detect >, >>, <, <<, and |, treating them as separate tokens while managing any adjacent whitespace.

f. Tokenization Function

The tokenization function will iterate through the input string, identifying and categorizing segments according to the shell syntax.

tokenization.c:

tokenize_input Function: Iterates through the input, creating tokens for words separated by spaces.
handle_special_chars Function: Handles the special characters in the input string.
handle_word Function: Handles the words in the input string.
print_tokens Function: Prints the tokens to verify the tokenization process.

Example:

> ls -l | wc -l > output.txt | ls > output2.txt
Token:  ls                    | Type:  WORD                
--------------------------------------------------
Token:  -l                    | Type:  WORD                
--------------------------------------------------
Token:  |                     | Type:  PIPE                
--------------------------------------------------
Token:  wc                    | Type:  WORD                
--------------------------------------------------
Token:  -l                    | Type:  WORD                
--------------------------------------------------
Token:  >                     | Type:  REDIRECT_OUT        
--------------------------------------------------
Token:  output.txt            | Type:  WORD                
--------------------------------------------------
Token:  |                     | Type:  PIPE                
--------------------------------------------------
Token:  ls                    | Type:  WORD                
--------------------------------------------------
Token:  >                     | Type:  REDIRECT_OUT        
--------------------------------------------------
Token:  output2.txt           | Type:  WORD                
--------------------------------------------------

g. Parsing

The parsing process involves analyzing the tokens to understand their syntactical relationship. This step constructs a representation of the input that reflects the user's intention.

Command Parsing: Parsing commands and their arguments, creating command nodes in the AST.
Pipeline Parsing: Parsing pipeline tokens, creating pipeline nodes in the AST.
Redirection Parsing: Parsing redirection tokens, creating redirection nodes in the AST.
File Node Creation: Creating a file node for redirections in the AST.

h .AST Node Structure

The AST node structure will represent the input string in a way that reflects the user's intention. The AST will be composed of nodes that represent commands, arguments, redirections, and pipelines.

typedef struct s_ast_node
{
    t_node type type;
    char *args;
    struct s_ast_node *left;
    struct s_ast_node *right;
}   t_ast_node;

i. Parsing Function

The parsing function will iterate through the tokens, building an abstract syntax tree (AST) that represents the input string.

parse.c:

parse_tokens Function: Iterates through the tokens, building an abstract syntax tree (AST) that represents the input string.
parse_command Function: Parses a command and its arguments, creating a command node in the AST.
parse_pipeline Function: Parses pipeline tokens, creating pipeline nodes in the AST.
parse_redirection Function: Parses redirection tokens, creating redirection nodes in the AST.
create_file_node Function: Creates a file node for redirections in the AST.

j. AST Visualization

generate_ast_diagram/generate_ast_diagram.c:

generate_ast_diagram Function: Generates a visual representation of the AST, showing the structure of the input string.

Example:

ls -l | wc -l > output.txt | ls > output2.txt

GraphvizOnline

Step 4: Command Execution

Built-in Commands: Implementing internal versions of several built-in commands (echo, cd, pwd, export, unset, env, exit) that behave consistently with their bash counterparts.
External Commands: Implementing logic to search for and execute the right executable based on the PATH environment variable or a specified path.
Process Creation: Using system calls like fork, execve, wait, and pipe to manage process creation and execution.
Redirection and Pipes: Implementing input and output redirection (<, >, >>, <<) and pipes (|) to allow for command chaining and data redirection.

Step 5: Built-in Commands Implementation

echo: Outputs the arguments passed to it.
cd: Changes the current working directory.
pwd: Prints the current working directory.
export: Sets environment variables.
unset: Unsets environment variables
env: Prints the environment variables.
exit: Exits the shell.

Step 6: Signal Handling

SIGINT: Handling the ctrl-C signal to interrupt the shell's execution.
SIGQUIT: Handling the ctrl-\ signal to quit the shell.
EOF: Handling the ctrl-D signal to exit the shell.

Step 7: Environment Variable Expansion

Environment Variable Expansion: Managing environment variables and supporting their expansion within commands, including the special case of $? to represent the exit status of the most recently executed command.

Step 8: Testing and Debugging

reda ghouzraf : The most skilled human tester in the world
Minishell Tests
https://github.com/Pyr-0/Minishell-42/wiki/edge-(or-non-edge)-cases-to-test-out
https://github.com/nicolasgasco/42_minishell/blob/main/test_cases.txt

Project Structure

├── includes
│   └── minishell.h
├── lib
│   └── libft
├── Makefile
└── src
    ├── 01_input_validation
    ├── 02_tokenization
    ├── 03_parsing
    ├── 04_execution
    ├── 05_builtin
    ├── 06_signals
    ├── 07_env_var_expansion
    ├── main.c
    └── utils

Team Members

Bilal Edd
Zakaria Elhajoui

Resources

https://www.gnu.org/software/bash/manual/html_node/Definitions.html#index-name
https://www.youtube.com/watch?v=S2W3SXGPVyU&ab_channel=theroadmap
https://tomassetti.me/guide-parsing-algorithms-terminology/
https://github.com/os-moussao/Recursive-Descent-Parser/tree/master
https://minishell.simple.ink/
https://github.com/iciamyplant/Minishell

Acknowledgements

Thanks to reda ghouzraf, MTRX, Nasreddine hanafi, Khalid zerri for their help and support during the project.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Minishell

Table of Contents

Overview

Project Challenges

Team Development Steps for Minishell

Step 1: Initial Planning and Setup

Step 2: Research and Design Phase

Shell Operations

External Functions

Step 3: Parsing and Input Management

Command Reading

Steps for building the parser and AST (Abstract Syntax Tree)

a. Designing a Syntax Error Checker

b. Syntax Error Checker Function

c. Tokenization

d. Token Structure

e. Tokenization Strategy

f. Tokenization Function

g. Parsing

h .AST Node Structure

i. Parsing Function

j. AST Visualization

Step 4: Command Execution

Step 5: Built-in Commands Implementation

Step 6: Signal Handling

Step 7: Environment Variable Expansion

Step 8: Testing and Debugging

Project Structure

Team Members

Resources

Acknowledgements

Files

README.md

Latest commit

History

README.md

File metadata and controls

Minishell

Table of Contents

Overview

Project Challenges

Team Development Steps for Minishell

Step 1: Initial Planning and Setup

Step 2: Research and Design Phase

Shell Operations

External Functions

Step 3: Parsing and Input Management

Command Reading

Steps for building the parser and AST (Abstract Syntax Tree)

a. Designing a Syntax Error Checker

b. Syntax Error Checker Function

c. Tokenization

d. Token Structure

e. Tokenization Strategy

f. Tokenization Function

g. Parsing

h .AST Node Structure

i. Parsing Function

j. AST Visualization

Step 4: Command Execution

Step 5: Built-in Commands Implementation

Step 6: Signal Handling

Step 7: Environment Variable Expansion

Step 8: Testing and Debugging

Project Structure

Team Members

Resources

Acknowledgements