Fix analysis script issues (#244)
* fix script

* update readme

* remove comments

* update

* Update README.md

* Update README.md

* add next line

* update

* remove branch specific instruction

* address comments

* address comments

* update readme

* update readme
dan-du-car authored Aug 15, 2024
1 parent 5f60f3d commit 3a699c3
Showing 7 changed files with 255 additions and 90 deletions.
136 changes: 136 additions & 0 deletions telematic_system/scripts/log_analysis/README.md
@@ -0,0 +1,136 @@
# Prerequisites
- Preferred operating system: Ubuntu 20 or newer
- Python environment setup
1. Install Python
```
sudo apt update
sudo apt install python3
```
2. Check the Python version
```
python3 --version
```
The recommended version is `3.10`.
3. Create a virtual environment. Navigate to the `cda-telematics/telematic_system/scripts/log_analysis` directory and run the command below:
```
python3 -m venv .venv
```
4. Activate the virtual environment.
```
source .venv/bin/activate
```
Note: You need to run this command to activate the virtual environment every time you open a new terminal.
- Install dependencies:
- Install Debian packages
```
sudo apt install libcairo2-dev libxt-dev libgirepository1.0-dev
```
- Install Python packages
```
pip install -r requirements.txt
```
- Clone the repository:
- Clone the cda-telematics GitHub repository
```
git clone https://github.com/usdot-fhwa-stol/cda-telematics.git
cd cda-telematics
```
- Download `log_timesheet.csv`
Most of the Python analysis scripts refer to `log_timesheet.csv` for the test runs and their durations. Since `log_timesheet.csv` is generated during verification/validation testing, make sure it is downloaded into this `log_analysis` folder before executing any Python scripts. A quick way to confirm the file is readable is sketched below.
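A minimal sketch of such a check, assuming the timesheet has `Run`, `Start Time`, and `End Time` columns as the parsing scripts in this folder suggest (the column names are an assumption, not documented here):
```
import pandas as pd

# Load the test-run timesheet used by the analysis scripts.
# Column names (Run, Start Time, End Time) are assumed from the parsing scripts.
test_df = pd.read_csv("log_timesheet.csv")
print(test_df.columns.tolist())

# List each run with its start/end time to confirm the file covers your runs.
for _, row in test_df.iterrows():
    print(row["Run"], row["Start Time"], row["End Time"])
```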


# Process V2xHub bridge log
1. Navigate to `cda-telematics/telematic_system/scripts/log_analysis` directory
2. Download v2xhub logs to the current folder.
3. Run the command below to generate data publishing metrics.
```
python3 parse_v2xhub_telematic_plugin_logs.py --log_file_path <input-file-name>
e.g:
python3 parse_v2xhub_telematic_plugin_logs.py --log_file_path T20_R6-13_V2XHub.log
```
This generates the parsed bridge log as CSV files; a quick way to inspect one is sketched below.
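A minimal sketch for spot-checking a parsed CSV before the metric analysis steps; the file name is only an example, and the `Unit Id`, `Topic`, and `Payload Timestamp` column names are assumed from how `get_message_drop.py` reads these files:
```
import pandas as pd

# Example file name; substitute the parsed CSV generated in the previous step.
parsed = pd.read_csv("T20_R6-13_V2XHub_parsed.csv")

# Column names below are assumptions based on get_message_drop.py.
print(parsed[["Unit Id", "Topic", "Payload Timestamp"]].head())
print(parsed.groupby("Topic").size().sort_values(ascending=False))
```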

# Process Streets bridge log
1. Navigate to `cda-telematics/telematic_system/scripts/log_analysis` directory
2. Download streets bridge logs to the current folder.
3. Run the command below to generate data publishing metrics.
```
python3 parse_streets_bridge_logs.py <path-to-log-file>
```
This generates the parsed bridge log as CSV files.

# Process Cloud bridge log
1. Navigate to `cda-telematics/telematic_system/scripts/log_analysis` directory
2. Download cloud bridge logs to the current folder.
3. Run the command below to generate data publishing metrics.
```
python3 parse_cloud_bridge_logs.py <path-to-log-file>
e.g:
python3 parse_cloud_bridge_logs.py T20_R6-9_carma_cloud.log
python3 parse_cloud_bridge_logs.py T20_R10-13_carma_cloud.log
```
This generates the parsed bridge log as CSV files.

# Process Vehicle bridge log
1. Navigate to `cda-telematics/telematic_system/scripts/log_analysis` directory
2. Download vehicle bridge logs to the current folder.
3. Run the command below to generate data publishing metrics.
```
python3 parse_vehicle_bridge_logs.py <path-to-log-file>
e.g:
python3 parse_vehicle_bridge_logs.py T20_R6_R13_fusion/T20_R6_fusion.log
```
This generates the parsed bridge log as CSV files.

# Process Messaging Server log
1. Navigate to `cda-telematics/telematic_system/scripts/log_analysis` directory
2. Download messaging server logs to the current folder.
3. Run the command below to generate data publishing metrics.
```
python3 parse_messaging_server_logs.py <path-to-log-file>
e.g:
python3 parse_messaging_server_logs.py T20_R6-13_messaging_server.log
```
This generates the parsed messaging server delay and message drop logs as CSV files; a quick per-unit check of a delay CSV is sketched below.
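A minimal sketch for sanity-checking an individual delay CSV before the metric analysis; the file name is an example, and the `Unit Id` / `Delay(s)` column names are assumed from how `latencyPlotter.py` reads these files:
```
import pandas as pd

# Example file name; substitute one of the generated *_delay_parsed.csv files.
delay_df = pd.read_csv("T20_R6-13_messaging_server_delay_parsed.csv")

# Per-unit delay summary; column names assumed from latencyPlotter.py.
print(delay_df.groupby("Unit Id")["Delay(s)"].describe())
```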

# Metric analysis
## Latency
1. Create a folder with the test case name in the current `log_analysis` folder.
For example, for test case 20:
```
mkdir T20
```
2. Copy all the generated `T20_*_messaging_server_*_delay_parsed.csv` files to this new folder `T20`.
3. Run the latency plotting script to generate plots for the CSV files with delay metrics in the `T20` folder.
```
python3 latencyPlotter.py <folder-name or test case name>
e.g:
python3 latencyPlotter.py T20
```
The generated plots are saved into the `output` folder.
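If only the summary statistics are needed, they can be recomputed from the combined CSV that `latencyPlotter.py` writes. This is a sketch assuming the `output/<test case>_allruns.csv` file and its `Delay(s)` column; note that the script reports its own statistics from a trimmed dataframe, so the numbers may differ slightly:
```
import pandas as pd

# Recompute latency statistics from the combined file written by latencyPlotter.py.
# The path and the Delay(s) column name follow that script; T20 is an example test case.
data = pd.read_csv("output/T20_allruns.csv")

print("Mean latency (s):", data["Delay(s)"].mean())
print("95th percentile latency (s):", data["Delay(s)"].quantile(0.95))
```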
## Message loss
1. Create a folder named after the test case with a `_message_drop` suffix in the current `log_analysis` folder.
For example, for test case 20:
```
mkdir T20_message_drop
```
2. Copy all the generated `<test case name>_*_messaging_server_*_message_drop_parsed.csv` files to this new folder `<test case name>_message_drop`.
3. Copy all the generated bridge CSV files into the same folder.
4. Run the message drop analysis script to analyze all files in the `<test case name>_message_drop` folder.
```
python3 get_message_drop.py <folder-name or test case name>_message_drop
e.g:
python3 get_message_drop.py T20_message_drop
```
The generated result looks similar to the example below:
<br>
![Message_loss_result](https://github.com/user-attachments/assets/15fefacb-e929-4340-a0e3-6d7f6441ba8e)
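For reference, the percentage reported above follows the calculation in `get_message_drop.py`: a row whose `Log_Timestamp(s)` value is empty in the combined CSV is counted as a dropped message. A minimal sketch of the same arithmetic (the combined file name is an example):
```
import pandas as pd

# One of the output/<folder>_<unit>_combined.csv files written by get_message_drop.py
# (the exact file name here is an example).
combined = pd.read_csv("output/T20_message_drop_streets_id_combined.csv")

# Rows with no Log_Timestamp(s) were never seen by the messaging server.
missing = combined["Log_Timestamp(s)"].isnull().sum()
total = len(combined["Payload Timestamp"])

print("Missing count:", missing)
print("Total count:", total)
print("Percentage of messages received:", (1 - missing / total) * 100)
```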

50 changes: 11 additions & 39 deletions telematic_system/scripts/log_analysis/get_message_drop.py
@@ -9,7 +9,7 @@

import matplotlib.dates as mdates
import matplotlib.pyplot as plt

import os
warnings.filterwarnings("ignore")

'''
@@ -38,12 +38,12 @@ def combineFiles(log_dir):
bridges_csv = []

messaging_server_csv_exist = False
messaging_server_csv = ""
messaging_server_csv = []

for filename in filenames:
if "Messaging" in filename:
if "messaging" in filename.lower():
messaging_server_csv_exist = True
messaging_server_csv = log_dir + "/" + filename
messaging_server_csv.append(log_dir + "/" + filename)

matched = re.match(bridge_csv_regex, filename, re.IGNORECASE)
if matched:
@@ -55,9 +55,8 @@

if not messaging_server_csv_exist:
sys.exit("Did not find any Messaging server csv logs in directory: "+log_dir+ "")


messaging_server_df = pd.read_csv(messaging_server_csv)
messaging_server_df = pd.concat(map(pd.read_csv, messaging_server_csv), ignore_index=True)
infrastructure_units = ['streets_id', 'cloud_id']

############# Load messaging server logs and get a list of dataframes for all unit ids
@@ -71,25 +70,21 @@
# value = value.drop('Metadata',axis =1)


#Get dataframes from bridge logs
bridge_dfs = dict()
for bridge_csv in bridges_csv:
bridge_df = pd.read_csv(bridge_csv)
bridge_dfs.update(dict(tuple(bridge_df.groupby('Unit Id'))))

print(bridge_dfs.keys())
bridge_df = pd.concat(map(pd.read_csv, bridges_csv), ignore_index=True)
bridge_dfs = dict(tuple(bridge_df.groupby('Unit Id')))


# Create combined dataframes from
for key in bridge_dfs:
if key in messaging_server_dfs:

bridge_df_combined = pd.merge(bridge_dfs[key], messaging_server_dfs[key], how='left', left_on=['Topic','Payload Timestamp'], right_on = ['Topic','Message Time'])
bridge_df_combined.to_csv(log_dir + key + "_combined.csv")
if not os.path.exists("output"):
os.mkdir("output")
bridge_df_combined.to_csv("output/"+log_dir+"_"+ key + "_combined.csv")

bridge_missing_message_count = bridge_df_combined['Log_Timestamp(s)'].isnull().sum()
bridge_total_message_count = len(bridge_df_combined['Payload Timestamp'])
print("Message drop for unit: ", key)
print("\nMessage drop for unit: ", key)
print("Missing count: ", bridge_missing_message_count)
print("Total count: ", bridge_total_message_count)
print("Percentage of messages received",(1 - (bridge_missing_message_count/bridge_total_message_count))*100)
@@ -101,29 +96,6 @@ def combineFiles(log_dir):
print("{} missed messages: ".format(key))
print(topics_with_empty_count)

# Plot vehicle data
bridge_df_combined = bridge_df_combined[bridge_df_combined['Message Time'].isnull()]
bridge_df_combined['Payload Timestamp'] = pd.to_datetime(bridge_df_combined['Payload Timestamp'], infer_datetime_format=True)
bridge_df_combined['Message Time'] = pd.to_datetime(bridge_df_combined['Message Time'], infer_datetime_format=True)


ax1 = plt.plot(bridge_df_combined['Topic'], bridge_df_combined['Payload Timestamp'], '|')

#Plot start and end lines
start_time = pd.to_datetime(messaging_server_dfs[key]['Log_Timestamp(s)'].iloc[0])
end_time = pd.to_datetime(messaging_server_dfs[key]['Log_Timestamp(s)'].iloc[-1])

plt.axhline(y = start_time, color = 'r', linestyle = '-', label = 'Test Start Time')
plt.axhline(y = end_time, color = 'r', linestyle = '-', label = 'Test End Time')

plt.title('{} : Topics against time of dropped message'.format(key))
plt.xlabel('Topics with dropped messages hours:mins:seconds')
plt.ylabel('Time of message drop')
xfmt = mdates.DateFormatter('%H:%M:%S')
plt.gcf().autofmt_xdate()
plt.show()
# plt.savefig('{}_Message_drop.png'.format(key))




50 changes: 11 additions & 39 deletions telematic_system/scripts/log_analysis/latencyPlotter.py
@@ -23,10 +23,12 @@ def concatRuns(folderName):
allFiles.append(df)

concatOutput = pd.concat(allFiles, axis=0, ignore_index=True)
concatOutput.to_csv(f'{folderName}_allruns.csv', index=False)
if not os.path.exists("output"):
os.mkdir("output")
concatOutput.to_csv(f'output/{folderName}_allruns.csv', index=False)

def plotter(folderName):
allRuns = folderName + "_allruns.csv"
allRuns = "output/"+str(folderName) + "_allruns.csv"

#read in the combined csv data
data = pd.read_csv(allRuns)
@@ -47,51 +49,21 @@ def plotter(folderName):
print("95th Latency: " + str(trimmed_data["Delay(s)"].quantile(0.95)))

#plot vehicle, streets, and cloud data histograms if they were part of the test
streets_data = trimmed_data[trimmed_data['Unit Id'] == "streets_id"]

if len(streets_data) > 0:
fig, ax1 = plt.subplots()
fig.set_size_inches(10, 10)
sns.histplot(streets_data['Delay(s)'], kde=False)
plt.xlim(0, 0.75)
plt.xlabel('Latency(s)', fontsize=18)
plt.ylabel('Count', fontsize=18)
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.title(folderName + " Streets Bridge Latency Histogram", fontsize=18)
plt.savefig(f'{folderName}_streets_latency_hist.png')


cloud_data = trimmed_data[trimmed_data['Unit Id'] == "cloud_id"]

if len(cloud_data) > 0:
fig, ax1 = plt.subplots()
fig.set_size_inches(10, 10)
sns.histplot(cloud_data['Delay(s)'], kde=False)
plt.xlim(0, 0.75)
plt.xlabel('Latency(s)', fontsize=18)
plt.ylabel('Count', fontsize=18)
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.title(folderName + " Cloud Bridge Latency Histogram", fontsize=18)
plt.savefig(f'{folderName}_cloud_latency_hist.png')

vehicles = ["DOT-45244", "DOT-45254"]
for vehicle in vehicles:
vehicle_data = trimmed_data[trimmed_data['Unit Id'] == vehicle]
units = ["DOT-45244", "DOT-45254","DOT_45254","vehicle_id","rsu_1234","streets_id","cloud_id"]
for unit in units:
unit_data = trimmed_data[trimmed_data['Unit Id'] == unit]

if len(vehicle_data) > 0:
if len(unit_data) > 0:
fig, ax1 = plt.subplots()
fig.set_size_inches(10, 10)
sns.histplot(vehicle_data['Delay(s)'], kde=False)
sns.histplot(unit_data['Delay(s)'], kde=False)
plt.xlim(0, 0.75)
plt.xlabel('Latency(s)', fontsize=18)
plt.ylabel('Count', fontsize=18)
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.title(folderName + " " + vehicle + " Vehicle Bridge Latency Histogram", fontsize=18)
plt.savefig(f'{folderName}_{vehicle}_latency_hist.png')

plt.title(folderName + " " + unit + " Latency Histogram", fontsize=18)
plt.savefig(f'output/{folderName}_{unit}_latency_hist.png')

def main():
if len(sys.argv) < 2:
@@ -69,7 +69,7 @@ def parseInfluxfile(logname, start_time_epoch, end_time_epoch, run_num):
# Get write time
write_time_split = log_line.split("INFO")
write_time_string = write_time_split[0][:len(write_time_split[0]) - 2]
log_time_in_datetime = datetime.datetime.strptime(write_time_string, '%Y-%m-%d %H:%M:%S.%f')
log_time_in_datetime = datetime.datetime.strptime(write_time_string, '%Y-%m-%dT%H:%M:%S.%fZ')


payload_index = log_line.index(search_string) + 1
@@ -197,14 +197,20 @@ def main():
for index in range(0, len(test_df)):
start_time_epoch = test_df['Start Time'].values[index]
end_time_epoch = test_df['End Time'].values[index]



local = pytz.timezone("America/New_York")


run_num = test_df['Run'].values[index].split('R')[1]

if int(run_num) in runs_range:

print("start time epoch: " + str(start_time_epoch))
print("end time epoch: " + str(end_time_epoch))
print("test case: "+ test_case)
print("runs_string: "+ runs_string)
print(runs_range)
print("Run num: ", run_num)
parseInfluxfile(logname, start_time_epoch, end_time_epoch, run_num)
