Texas beaches are a great place to relax and have fun, but the water can pose hazards. Bacteria counts can exceed safe levels, which can force the state to close beaches, cost locals revenue, and, most seriously, make swimmers sick.
The state currently tests and reports bacteria levels for several beaches and displays bacteria counts at TexasBeachWatch.com.
The problem is that testing occurs only weekly or biweekly, and there is a delay of at least three days between when a sample is taken and when the results are finalized and published. Bacteria levels may rise to unsafe levels during that delay.
Using historical bacteria samples and weather records, we propose to train a regression model that estimates bacteria counts from weather information. If successful, the model would greatly reduce the delay between testing and reporting to the public.
- Beach Advisory and Closing On-line Notification historical bacteria levels example. (CSV download)
- Historical Weather Data example. (CSV download)
- The team will meet weekly via Zoom at 9:00 AM on Mondays to map out a work plan and duties for the coming week.
- Slack is the primary channel for real-time communications between team members and instructional staff.
- The team will comment on pull requests and issues to create a record of work specific to changes in the repo.
ETL was performed on the CSV files listed in the data sources above. All of the ETL work was done in Python using pandas, and the connection to the database was established using SQLAlchemy.
Details of the ETL process can be found in this Jupyter Notebook.
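As a rough illustration of the extract, transform, and load steps, here is a minimal sketch. The column names, sample rows, and table name are hypothetical and are not taken from the project's CSV files. The sketch loads into an in-memory SQLite database through the standard-library `sqlite3` module so that it runs on its own. The project itself connected to PostgreSQL through SQLAlchemy.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical stand-in for one downloaded CSV file; the real column names
# and values may differ.
RAW_CSV = io.StringIO(
    "Beach ID,Sample Date,Result\n"
    "TX001,06/03/2019,35\n"
    "TX001,06/10/2019,\n"
    "TX002,06/03/2019,120\n"
)


def transform(raw: pd.DataFrame) -> pd.DataFrame:
    """Rename columns, normalize dates, and drop rows with no result."""
    clean = raw.rename(
        columns={
            "Beach ID": "beach_id",
            "Sample Date": "sample_date",
            "Result": "bacteria_count",
        }
    )
    # Parse the dates, then store them as ISO strings for portability.
    parsed = pd.to_datetime(clean["sample_date"], format="%m/%d/%Y")
    clean["sample_date"] = parsed.dt.strftime("%Y-%m-%d")
    clean = clean.dropna(subset=["bacteria_count"])
    return clean.reset_index(drop=True)


# Extract: read the raw CSV into a DataFrame.
raw_samples = pd.read_csv(RAW_CSV)

# Transform: clean the raw records.
samples = transform(raw_samples)

# Load: SQLite is an assumption made to keep the sketch self-contained.
conn = sqlite3.connect(":memory:")
samples.to_sql("bacteria_samples", conn, index=False, if_exists="replace")
loaded = pd.read_sql_query("SELECT * FROM bacteria_samples", conn)
```

Keeping the transform step in its own function makes it easy to reuse for each CSV file and to test without a database connection.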
A PostgreSQL instance on AWS holds the transformed data. The database contains five tables covering beach properties, bacteria sample records, and historical data from three different weather stations. The tables were joined to create a view in PostgreSQL that the machine learning model can access via SQLAlchemy.
Detailed information on the database can be found in the database folder.
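The idea of joining tables into a single view for the model can be sketched as follows. This is a simplified illustration, not the project's schema: every table, column, and view name is hypothetical, only one weather table is shown instead of three, and the standard-library `sqlite3` module stands in for PostgreSQL so the example runs without a server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, simplified tables (the real database has five).
    CREATE TABLE beaches (beach_id TEXT PRIMARY KEY, station_id TEXT);
    CREATE TABLE bacteria_samples (
        beach_id TEXT, sample_date TEXT, bacteria_count REAL
    );
    CREATE TABLE weather (station_id TEXT, obs_date TEXT, precip_in REAL);

    -- Made-up rows for illustration only.
    INSERT INTO beaches VALUES ('TX001', 'ST1');
    INSERT INTO bacteria_samples VALUES ('TX001', '2019-06-03', 35.0);
    INSERT INTO weather VALUES ('ST1', '2019-06-03', 0.4);
    INSERT INTO weather VALUES ('ST1', '2019-06-04', 0.0);

    -- The view pairs each sample with same-day weather at the beach's station.
    CREATE VIEW model_input AS
    SELECT s.beach_id, s.sample_date, s.bacteria_count, w.precip_in
    FROM bacteria_samples AS s
    JOIN beaches AS b ON b.beach_id = s.beach_id
    JOIN weather AS w
      ON w.station_id = b.station_id AND w.obs_date = s.sample_date;
    """
)

# The model code only needs to query the view, not the underlying tables.
rows = conn.execute("SELECT * FROM model_input").fetchall()
```

Exposing a view keeps the join logic in the database, so the modeling code can read one flat table of features and targets.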
Exploratory analysis was done using Tableau. The findings were used to enhance and narrow the machine learning options.
The Tableau story can be found here.
The machine learning model evolved from a regression model to a classifier, based on feedback from model performance and findings from the exploratory analysis.
A detailed write-up of the machine learning model evolution can be found here.
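One common way to move from regression to classification is to replace the numeric target with a label that says whether a count exceeds a threshold. The sketch below assumes that approach. The write-up, not this example, describes how the project actually defined its classes. The threshold value and the function name are assumptions made for illustration.

```python
# Assumed advisory threshold in cfu/100 mL; illustrative only and not taken
# from the project.
ADVISORY_THRESHOLD = 104.0


def to_label(bacteria_count: float, threshold: float = ADVISORY_THRESHOLD) -> int:
    """Return 1 when the count exceeds the threshold, otherwise 0."""
    return int(bacteria_count > threshold)


# Made-up counts showing how regression targets become class labels.
counts = [12.0, 104.0, 250.0]
labels = [to_label(count) for count in counts]
```

With labels like these, a classifier predicts whether a beach is likely to be over the threshold rather than estimating the exact count.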
The slide presentation can be found here.
The draft dashboard is a website that allows users to interactively explore the dataset.
Information can be found in the website folder.