This repository contains the first practical assignment of the Fundamentals of Informatics course (2022-23), which involves cleaning a dataset.
The goal of this assignment is to learn how to handle and clean datasets using shell scripts. Several CSV files with movie and show data are provided, and the scripts filter and clean this data to generate final files that are smaller and easier to use for further analysis.
- Movies.csv: Original file with movie data.
- Movies_columna12.csv to Movies_columna16.csv: Files with specific columns extracted from the original dataset.
- Movies_f.csv and Movies_net.csv: Files with filtered and cleaned movie data.
- Shows.csv: Original file with show data.
- Shows_columna12.csv to Shows_columna15.csv: Files with specific columns extracted from the original dataset.
- Shows_f.csv and Shows_net.csv: Files with filtered and cleaned show data.
- practica1.sh: Script used for data cleaning and processing.
- prova.txt and prova_script_pas4: Test files used during the development of the practice.
- titles.cvs: File with titles of movies and shows.
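
The kind of column extraction and filtering described above can be sketched in shell as follows. This is a minimal illustration, not the contents of practica1.sh: the function names `extract_column` and `drop_rows_with_empty_field` are hypothetical, and the sketch assumes a plain comma-separated file with no quoted fields that contain commas (real movie data may well need a proper CSV parser).

```shell
#!/bin/sh
# Hypothetical helpers; they are not taken from practica1.sh.
# Both assume plain comma-separated input without quoted commas.

# Write column number $2 of the CSV file $1 to the file $3.
extract_column() {
    cut -d',' -f"$2" "$1" > "$3"
}

# Copy $1 to $3, keeping only the rows whose column $2 is non-empty.
# The header row is kept as long as its field in that column is non-empty.
drop_rows_with_empty_field() {
    awk -F',' -v col="$2" '$col != ""' "$1" > "$3"
}

# Example calls (the output names follow the pattern in the file list,
# but how those files were actually produced is an assumption):
#   extract_column Movies.csv 12 Movies_columna12.csv
#   drop_rows_with_empty_field Movies.csv 12 Movies_f.csv
```

`cut` and `awk` split on every comma, so a title such as "Crouching Tiger, Hidden Dragon" inside quotes would be split across two fields; that limitation is the main reason to treat this as a sketch only.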
- Clone the repository and enter its directory:

      git clone https://github.com/luciarevaliente/fon_info_practica1.git
      cd fon_info_practica1
- Run the cleaning script:

      ./practica1.sh
This project is part of an academic course and does not accept external contributions.
This project does not have a specific license and is for educational purposes only.