Skip to content

SMAPPNYU/internal-dashboard

Repository files navigation

CSMaP Internal Dashboard

A modern dashboard to explore and visualize tweets related to a topic of interest.

Why use this dashboard | What questions can be answered | Installation and Configuration | Example MySQL queries | Resources |

Sample dashboard layout

Why use this dashboard?

If you're studying social networks, chances are you might want to collect and visualize tweets related to a certain topic. Fortunately, CSMaP has access to Decahose, a 10% sample of all tweets!

With minimal configuration, this repository:

  • Loads tweets related to a given keyword(s) from Decahose to a local MySQL database
  • Sets up a Superset dashboard
  • Schedules a background job to load new tweets every 24 hours
  • Users can then explore tweets and create dashboard

System architecture

What questions can be answered?

  • How many tweets are related to a topic (e.g.: BLM, vaccine, election...)?
  • Who are sharing those tweets; when and where do they post?
  • What hashtags/urls are co-shared?
  • Is the topic gaining or losing popularity? What is the trend?
  • ...

Since new tweets are added every day, the system also schedules a background job that loads new tweets every 24 hours. The overall system architecture is shown below.

Installation and Configuration

Step 1: Initiate MySQL database and Superset
  1. Log in HPC, then cd /home/$USER
  2. git clone git@github.com:SMAPPNYU/internal-dashboard.git
  3. cd /home/$USER/internal-dashboard (do not rename the directory)
  4. Modify config.json, change KEYWORD or TABLE_NAME. Leave other configurations as they are. Note: KEYWORD needs at least 2 words.
  5. ./init.sh config.json (do not forget to type the first dot)
  6. We are ready to create our first dashboard (go to Step 2-a)

Tip: if you want to track other keywords, please create another config file (e.g.: config-vaccine.json, and issue ./init.sh config-vaccine.json)

Step 2-a: Connect to a new Superset dashboard
  1. Inside HPC log-in node, run command: source $HOME/.bash_profile; cd /home/$USER/internal-dashboard && ./daily_update.sh config.json
  2. Wait until the script finishes, follow instructions printed on the console (stdout)
  3. After re-connecting to HPC, proceed to Step 3
Step 2-b: Connect to an existing Superset dashboard
  1. ./init.sh config.json schedule a background job that loads new tweets and hosts a new Dashboard on a new IP address every 24 hours.
  2. To connect to a dashboard from a running job, find the hostname YOUR_HOSTNAME via cat /home/$USER/decahose_visualization_setup/latest_hostname.txt
  3. Log out HPC exit
  4. Log in HPC again with port forwarding, ssh -L 8088:YOUR_HOSTNAME:8088 YOUR_NETID@log-1.nyu.cluster (change YOUR_HOSTNAME and YOUR_NETID)
  5. Proceed to Step 3

Tip: Inside the same HPC log-in node as Step 1, run crontab -e. You can delete the scheduled job if you prefer.

Step 3: Create your first dashboard
  1. Open browser, visit http://localhost:8088/
  2. Enter default username (admin) and password (admin). Update the password for enhanced security.
  3. Go to http://localhost:8088/databaseview/list/, click "+DATABASE", connect with the following string

mysql+pymysql://csmap_user:csmap@localhost:3306/tweet?read_default_file=~/.my.cnf

  1. Once we connect to the database, add a dataset
  2. Navigate to Chart page to create 20+ types of visualization: http://localhost:8088/chart/list/

Example MySQL queries

Following are some MySQL queries to get started

  • count number of tweets per day
SELECT COUNT(*), yymmdd
FROM covid_tweet
GROUP BY yymmdd 
ORDER BY yymmdd 
  • top 10 tweets from users with the highest number of followers
SELECT text, user__followers_count, user__screen_name , user__name
FROM covid_tweet
WHERE user__followers_count > 10000
ORDER BY user__followers_count desc
LIMIT 10

Resources

Note: If you encounter any problem during the set up, please contact Zhouhan, email: zc1245@nyu.edu

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published