Skip to content

achmad-irfan/PA1-NHTSA

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

71 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Back

NHTSA

URL Dashoboard project (Bonus) : NHTSA

1. Background Project :

NHTSA is one such department government in the United States focused on reducing the number traffic accident on the highway. Currently NHTSA is brewing a new regulation that will be implemented next year. I was asked to analyze the data collected during 2021. This data is a complete information about accidents that occurred during 2021.

2. Purpose:

The main purpose is to provide a number of recommendations about how to reduce the number of accidents on the highway. To do this, first I need to identify the following data:

• Top 10 states where the most accidents occur

• The average number of accidents per day by hour accident

• Percentage of accidents caused by drunk drivers

• Percentage of accidents in rural and urban areas

• Number of accidents by day

3. Dataset:

Data can be access in the following link : Click here

4. Data Preparation

4.1 Data Validation

All the data must be checked whetever there is a abnormal data. The queery for data checking and validating :

count(consecutive_number) from crash ;-- to count consecutive number (unique code in every traffic accident)

select count(distinct consecutive_number) from crash ; -- data is equivalent with previous queery

select distinct state_name from crash ; --no abnormal data

select max(number_of_vehicle_forms_submitted_all), min(number_of_vehicle_forms_submitted_all) from crash ; --no abnormal data

select max(number_of_motor_vehicles_in_transport_mvit), min(number_of_motor_vehicles_in_transport_mvit) from crash; --no abnormal data

select max(number_of_parked_working_vehicles), min(number_of_parked_working_vehicles) from crash; --no abnormal data

select max(number_of_forms_submitted_for_persons_not_in_motor_vehicles), min(number_of_forms_submitted_for_persons_not_in_motor_vehicles) from crash --no abnormal data ;

select max(number_of_persons_in_motor_vehicles_in_transport_mvit), min(number_of_persons_in_motor_vehicles_in_transport_mvit) from crash ; --no abnormal data

select max(number_of_persons_not_in_motor_vehicles_in_transport_mvit), min(number_of_persons_not_in_motor_vehicles_in_transport_mvit) from crash ; --no abnormal data

select distinct land_use_name from crash ; --no abnormal data

select distinct functional_system_name from crash ; --no abnormal data

select min(milepoint),max(milepoint) from crash ; --no abnormal data

select distinct manner_of_collision_name from crash ;--no abnormal data

select distinct type_of_intersection_name from crash ;--no abnormal data

select distinct light_condition_name from crash ;--no abnormal data

select distinct atmospheric_conditions_1_name from crash ;--no abnormal data

select distinct number_of_fatalities from crash ; --no abnormal data

select distinct number_of_drunk_drivers from crash ; --no abnormal data

select min(timestamp_of_crash),max(timestamp_of_crash) from crash ; -- found that time is not in local state of USA, so it must to convert to local time

4.2 Data Cleansing

From data validation, found that data time in column timestamp_of_crash isn't displayed in local time, so it must be converted to local time in every states in USA, the step to convert time is shown in this following item:

• Import table local time of states USA to the same server as main table

This table is contain the code of local time is every states in USA, new table can be access in the following link : Click here

output:

• Add new coloumn in main table

Querry for add new coloumn in table crash:

alter table crash

add column timezone_code text,

add column local_time timestamp

output:

• Insert data in new coloumn

Querry for add data in column timezone_code in table crash based on data in table us_timezone:

update crash

set timezone_code =

(select code

from timezone

where crash.state_name = timezone.state_name)

output:

Querry for add data in column local_time in table crash :

update crash

set local_time = timestamp_of_crash at time zone timezone_code

output:

• Remove data before and after 2022

Querry for Remove data before and after 2022 table crash :

delete from crash

where local_time not between '2021-01-01 00:00:00' and '2021-12-31 23:59:59'

5. Data Analyze:

The querry for identify the following item :

• Top 10 states where the most accidents occur

Querry

select state_name, count(consecutive_number) as jumlah_kecelakaan

from crash

group by 1

order by 2 desc

limit 10

Output

• The average number of accidents per day by hour accident

Querry

select to_char(local_time,'HH24') as jam,

count(consecutive_number)/(select extract(day from (max(local_time)-min(local_time)))+1 from crash)rata2_kecelakaan

from crash

group by jam

order by rata2_kecelakaan desc

Output

• Percentage of accidents caused by drunk drivers

Querry

select mabuk.kondisi, count(mabuk.kondisi) as jumlah

from

(select consecutive_number,

case

when number_of_drunk_drivers = 0

then 'tidak mabuk'

else 'mabuk'

end kondisi

from crash) as mabuk

group by 1

Output

• Percentage of accidents in rural and urban areas

Querry

select land, sum(count) from

(select land_use_name,

count(consecutive_number),

case

when land_use_name ='Rural'

then 'Rural'

when land_use_name ='Urban'

then 'Urban'

else 'Others'

end land

from crash

group by land_use_name) as x

group by land

Output

• Number of accidents by day

Querry

select hari, jumlah from

(select to_char(local_time,'D') harii, to_char(local_time,'Day') hari, count(consecutive_number) jumlah

from crash

group by hari,harii) as x

order by harii

Output

6. Insight and Recomendation

• Based on data, texas and california is states that have most traffic accident in 2022

• The interval time between 12-15 is the most prone to accident

• 25% percent of traffic accident in usa is because people drunk

• The probability of traffic accident is 25% higher in weekend than weekday

• Based on data, to reduce the number of traffic accident in 2022, We recommend increasing the number of sheriffs on duty during lunch hours, especially on weekends, as this is an accident-prone time in the Texas and California areas.

Releases

No releases published

Packages

No packages published