
Cleaning the Datasets

veerleprins edited this page Nov 13, 2020 · 3 revisions

Every raw dataset contains values that cannot be used as they are: missing values, values of the wrong data type, or answers that were misinterpreted. For this reason it is important to 'clean up' the data before it is visualized.


Survey dataset

.map()

To clean the survey dataset I first used the .map() method to create an array of all the eye colors in the dataset, as shown below. From the total dataset (fullData) I return every value in the column 'oogKleur' (Dutch for eye color) and save the result in the variable allEyeColours:

let allEyeColours = fullData.map((value) => {
  return value.oogKleur;
});
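As a minimal sketch of what this step produces, the snippet below runs the same .map() call on a tiny stand-in for fullData. The sample rows and the names sampleData and sampleEyeColours are made up for illustration; they are not the real survey answers.

```javascript
// Hypothetical sample rows; the real fullData comes from the survey.
const sampleData = [
  { oogKleur: "#1C5D99" },
  { oogKleur: "rgb(139,69,19)" },
  { oogKleur: "Bruin" },
];

// .map() returns a new array with exactly one entry per row.
const sampleEyeColours = sampleData.map((value) => value.oogKleur);

console.log(sampleEyeColours);
```

The result always has the same length as the input, which is why .map() suits "take one column from every row".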

.filter()

After this I wanted to remove all hash marks from the values. My first attempt combined the .filter() and .replace() methods: for each value in the array with all eye colors (allEyeColours) I tried to replace the '#' with "".

let newEyeColours = allEyeColours.filter(colour => colour.replace("#", ""));

I then found out that .filter() is not the right method for this. The reason is that .filter() only decides whether an element stays in the array: it keeps the original, unchanged element whenever the callback returns a truthy value (Source). The correct method is .map(), which loops through each value and returns an adjusted value (here with "#" changed to ""). For this reason I changed .filter() to .map().
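The difference between the two methods can be sketched side by side. The sample values and the names colours, filtered and mapped are hypothetical, chosen only to make the behaviour visible.

```javascript
// Hypothetical sample values, not the real survey answers.
const colours = ["#1C5D99", "#", "rgb(139,69,19)"];

// .filter() only uses the callback result as a keep/drop test:
// "#" becomes "" (falsy) and is dropped; the others are kept unchanged.
const filtered = colours.filter((colour) => colour.replace("#", ""));

// .map() stores the callback result, so the hash marks are actually removed.
const mapped = colours.map((colour) => colour.replace("#", ""));

console.log(filtered); // [ '#1C5D99', 'rgb(139,69,19)' ]
console.log(mapped);   // [ '1C5D99', '', 'rgb(139,69,19)' ]
```

Note that .replace() with a string pattern only replaces the first '#' in each value, which is enough for hex colors like these.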

After this I did use .filter() in the right way: I filtered only the rgb values out of the eye color array using .match(). I also wrote this as a reusable function, so that I can use it again to filter on other values.

const regx = new RegExp(/rgb/g);

let rgbArray = filterData(lowData, regx);

// Returns only the elements that match the given regular expression.
function filterData (dataArray, regex) {
  return dataArray.filter(element => element.match(regex));
}
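To illustrate the reuse described above, here is a small sketch that assumes filterData receives the pattern as its second parameter. The sample values, the hex pattern and the names lowSample, rgbOnly and hexOnly are illustrative assumptions, not part of the original project.

```javascript
// Keeps only the elements that match the given regular expression.
function filterData(dataArray, regex) {
  return dataArray.filter((element) => element.match(regex));
}

// Hypothetical, already lowercased sample values.
const lowSample = ["rgb(139,69,19)", "1c5d99", "bruin", "rgb(0,0,0)"];

// The same function serves different patterns.
const rgbOnly = filterData(lowSample, /rgb/);
const hexOnly = filterData(lowSample, /^[0-9a-f]{6}$/);

console.log(rgbOnly); // [ 'rgb(139,69,19)', 'rgb(0,0,0)' ]
console.log(hexOnly); // [ '1c5d99' ]
```

Here .filter() is the right tool: .match() returns null (falsy) for values that do not match, so only matching values are kept and none are altered.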

RDW dataset

To clean up the data from the RDW datasets I started by writing a function that converts string values to numbers (integers or floats). I need this function because all values in the RDW dataset are strings, and the numbers inside those strings cannot be used for calculations until they are converted to a number type. Below is an example of one of the objects in the RDW data array, which shows that, for example, the number of parking spaces is a string. This is before I used the functions:

[Image: Screenshot 2020-11-13 at 14 41 42]

First I wrote this function below:

export function cleanData(dataset) {
  return dataset.map(column => {
    column.chargingCapacity = +column.chargingCapacity;
    column.parkingCapacity = +column.parkingCapacity;
    column.location.latitude = +column.location.latitude;
    column.location.longitude = +column.location.longitude;
  });
}

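When I wrote this code I immediately realized that this function was not functional: the column names are hard-coded, so it only works for this one dataset, and the .map() callback returns nothing, so the function returns an array of undefined values. For this reason I rewrote the function, as shown below: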

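export function toNumbers(dataArray, columnArr) {
  // Convert the listed columns of every item from strings to numbers (in place).
  dataArray.forEach(arrItem => {
    columnArr.forEach(c => {arrItem[c] = +arrItem[c];});
  });
  // .forEach() itself returns undefined, so return the updated array explicitly.
  return dataArray;
}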

I pass the toNumbers function the total data array together with an array of the column names that need to be converted. For each item in the data array, the function then converts the values in those columns from strings to numbers (integers or floats).
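A short usage sketch, assuming rows shaped like the RDW data. The sample values are made up, and the local copy of toNumbers is written to return the array for convenience, which is an assumption of this sketch.

```javascript
// Local copy of the helper so the sketch runs on its own.
function toNumbers(dataArray, columnArr) {
  dataArray.forEach((arrItem) => {
    columnArr.forEach((c) => { arrItem[c] = +arrItem[c]; });
  });
  return dataArray; // assumption: return the array for convenience
}

// Hypothetical rows shaped like the RDW data.
const rows = [
  { chargingCapacity: "4", parkingCapacity: "250" },
  { chargingCapacity: "", parkingCapacity: "n/a" },
];

toNumbers(rows, ["chargingCapacity", "parkingCapacity"]);

console.log(rows[0]); // { chargingCapacity: 4, parkingCapacity: 250 }
console.log(rows[1]); // { chargingCapacity: 0, parkingCapacity: NaN }
```

The second row shows a caveat of the unary plus operator: an empty string becomes 0 and a non-numeric string becomes NaN, so such values may still need separate handling.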

Because I had to convert strings not only at the top level of each item, but also inside an object nested within each item, I wrote a separate function for this. This code is shown below:

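export function toIntegersInObj (dataArray, objName, columnArr) {
  // Convert the listed columns inside the nested object objName (in place).
  dataArray.forEach(arrItem => {
    columnArr.forEach(c => {arrItem[objName][c] = +arrItem[objName][c];});
  });
  // .forEach() itself returns undefined, so return the updated array explicitly.
  return dataArray;
}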

The function toIntegersInObj works the same way as toNumbers. The only difference is that it changes the values in an object nested within each item. Below is an object from the RDW data array after applying my functions:

[Image: Screenshot 2020-11-13 at 14 42 03]
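The nested conversion can be sketched on a single made-up item. The coordinates and the name spots are hypothetical, and the local copy of toIntegersInObj returns the array as an assumption of this sketch.

```javascript
// Local copy of the helper so the sketch runs on its own.
function toIntegersInObj(dataArray, objName, columnArr) {
  dataArray.forEach((arrItem) => {
    columnArr.forEach((c) => { arrItem[objName][c] = +arrItem[objName][c]; });
  });
  return dataArray; // assumption: return the array for convenience
}

// Hypothetical item with a nested location object.
const spots = [
  { parkingCapacity: "250", location: { latitude: "52.37", longitude: "4.89" } },
];

// Only the listed columns inside "location" are converted.
toIntegersInObj(spots, "location", ["latitude", "longitude"]);

console.log(spots[0].location);        // { latitude: 52.37, longitude: 4.89 }
console.log(spots[0].parkingCapacity); // '250' (top-level value is untouched)
```

Despite its name, the helper also handles decimal values, because the unary plus converts to a JavaScript number rather than truncating to an integer.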