Analysis Road Map #12

Open
dhruvalb opened this issue May 3, 2019 · 5 comments

@dhruvalb
Collaborator

dhruvalb commented May 3, 2019

User Story:

@liu431
Owner

liu431 commented May 9, 2019

Question idea: does the sentiment of comments correlate with the popularity of the language?

Hypothesis: As a language's user base and question volume grow, commenters and answerers become meaner and more arrogant or impatient.

Sentiment of comments: this simple project has interesting results, but we could use better metrics and tools.

Popularity of the language: trends. Many articles and visualizations already cover this.
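One way to make the question above testable is to compute a correlation coefficient between a popularity series and a sentiment series for the same language. The sketch below is only an illustration under assumptions: the function name `pearson`, the variable names, and the numbers are placeholders, not results from the dataset, and Pearson correlation is just one candidate metric.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Placeholder values for illustration only; these are NOT measurements.
yearly_question_count = [100, 200, 300, 400]   # popularity proxy per year
yearly_mean_sentiment = [0.4, 0.3, 0.2, 0.1]   # mean sentiment score per year

r = pearson(yearly_question_count, yearly_mean_sentiment)
print(f"correlation: {r:.2f}")
```

A value of `r` near -1 would be consistent with the hypothesis, a value near 0 would not support it. With only a handful of yearly points per language, the coefficient should be read cautiously.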

@dhruvalb
Collaborator Author

dhruvalb commented May 9, 2019

Question:
Evolution of a user from one who asks questions to one who answers questions

  • Do questioners use language similar to that of their answerers?

  • As a user matures as an answerer, does their language become stern, mean, or short? [addresses a typical complaint about Stack Overflow]

  • Differentiate between users who progress from questioner to answerer and users who are there purely to answer

  • Explore the heterogeneous effects of gender, occupation, and region/country

  • If an answerer is mean, do the people they have influenced to answer also answer meanly?

@dhruvalb
Collaborator Author

dhruvalb commented May 17, 2019

Selecting Variables for Question 1:

Readme for Data: https://ia800107.us.archive.org/27/items/stackexchange/readme.txt

• Find Top 10 languages from tags.csv

  • Example from the 500-line subset:
     javascript
     java
     c#
     php
     android
     python
     jquery
     html
     c++

• Identify users that have gold-standard badges in answering from badges.csv
For a description of badges: https://stackoverflow.com/help/badges
• Identify all posts made by a given user in Posts.csv and keep those that are answers (PostTypeId = 2) and carry the tag for the language under consideration. Analyze the text of Body and track CreationDate.

| File | Variable |
| --- | --- |
| Badges | UserId |
| Badges | Name |
| Badges | Class |
| Tags | TagName |
| Tags | Count |
| Posts | PostTypeId |
| Posts | OwnerUserId |
| Posts | Tags |
| Posts | Body |
| Posts | CreationDate |
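The selection steps above could be sketched roughly as follows, using only the standard library. Several things here are assumptions rather than facts from this thread: that `Class` value `1` marks gold badges, that tags are stored as `<python><pandas>`, that an answer's tags may have to be looked up on its parent question through a `ParentId` column (not listed in the variables above), and that `CreationDate` is an ISO 8601 string. The inline CSV rows are toy data, and the function names are hypothetical.

```python
import csv
import io
import re

GOLD_CLASS = "1"   # assumption: Class 1 = gold badge
ANSWER_TYPE = "2"  # PostTypeId 2 = answer


def gold_badge_users(badge_rows):
    """Return the set of UserIds that hold at least one gold badge."""
    return {row["UserId"] for row in badge_rows if row["Class"] == GOLD_CLASS}


def parse_tags(raw):
    """Split a tag string such as '<python><pandas>' into a set of tag names."""
    return set(re.findall(r"<([^<>]+)>", raw or ""))


def answers_for_tag(post_rows, user_ids, tag):
    """Return answers by the given users for one tag, ordered by CreationDate."""
    posts = list(post_rows)
    tags_by_id = {p["Id"]: parse_tags(p["Tags"]) for p in posts}
    selected = []
    for p in posts:
        if p["PostTypeId"] != ANSWER_TYPE or p["OwnerUserId"] not in user_ids:
            continue
        # Use the answer's own tags if present, else fall back to the parent question's.
        tags = tags_by_id[p["Id"]] or tags_by_id.get(p.get("ParentId", ""), set())
        if tag in tags:
            selected.append(p)
    # ISO 8601 timestamps sort chronologically as plain strings.
    return sorted(selected, key=lambda p: p["CreationDate"])


# Toy rows for illustration only; not taken from the real dump.
BADGES_CSV = "UserId,Name,Class\n1,python,1\n2,Teacher,3\n"
POSTS_CSV = (
    "Id,PostTypeId,ParentId,OwnerUserId,Tags,Body,CreationDate\n"
    "10,1,,5,<python><pandas>,<p>How?</p>,2019-01-01T00:00:00\n"
    "11,2,10,1,,<p>Like this.</p>,2019-01-02T00:00:00\n"
    "12,2,10,2,,<p>Or this.</p>,2019-01-03T00:00:00\n"
    "13,1,,6,<java>,<p>Why?</p>,2019-01-04T00:00:00\n"
    "14,2,13,1,,<p>Because.</p>,2019-01-05T00:00:00\n"
)

gold_users = gold_badge_users(csv.DictReader(io.StringIO(BADGES_CSV)))
python_answers = answers_for_tag(csv.DictReader(io.StringIO(POSTS_CSV)), gold_users, "python")
for post in python_answers:
    print(post["Id"], post["CreationDate"], post["Body"])
```

Restricting to badges that are specifically about answering would need an extra filter on `Name`; which badge names qualify is left open here.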

@dhruvalb
Collaborator Author

dhruvalb commented May 17, 2019

Sentiment Analysis of User Responses Over Time for the Top 10 Languages

| File Name | Use? | Variables |
| --- | --- | --- |
| Badges.csv | | |
| Comments.csv | | |
| sample_PostLinks.csv | | |
| Posts.csv | X | Body; CreationDate; Tags; PostTypeId |
| sample_PostHistory.csv | | |
| Tags.csv | X | TagName; Count |
| sample_users.csv | | |
| sample_votes.csv | | |
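To give a feel for how sentiment over time might be computed from `Body` and `CreationDate`, here is a minimal sketch. It uses a tiny hand-written word list as a stand-in for a real sentiment tool, so the scores themselves are not meaningful. The word lists, function names, and toy posts are all hypothetical, and it assumes `Body` is HTML and `CreationDate` starts with a four-digit year.

```python
import html
import re
from collections import defaultdict

# Tiny illustrative lexicons; a real analysis would use a proper sentiment tool.
POSITIVE = {"thanks", "great", "good", "helpful"}
NEGATIVE = {"wrong", "bad", "stupid", "useless"}


def strip_html(body):
    """Remove HTML tags from a post body and unescape entities."""
    return html.unescape(re.sub(r"<[^>]+>", " ", body))


def sentiment_score(text):
    """Return (positive words - negative words) / total words, or 0.0 if empty."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)


def mean_sentiment_by_year(posts):
    """Average the sentiment score of posts grouped by the year of CreationDate."""
    scores = defaultdict(list)
    for post in posts:
        year = post["CreationDate"][:4]  # assumes an ISO-like 'YYYY-...' timestamp
        scores[year].append(sentiment_score(strip_html(post["Body"])))
    return {year: sum(vals) / len(vals) for year, vals in sorted(scores.items())}


# Toy posts for illustration only.
toy_posts = [
    {"CreationDate": "2018-03-01T10:00:00", "Body": "<p>Great question, thanks</p>"},
    {"CreationDate": "2019-03-01T10:00:00", "Body": "<p>This is wrong</p>"},
]
trend = mean_sentiment_by_year(toy_posts)
for year, score in trend.items():
    print(year, round(score, 3))
```

Running this once per top-language tag would produce one yearly series per language, which can then be plotted or compared.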

@dhruvalb
Collaborator Author

Top 15 tags, computed on the whole dataset

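| | TagName | Count |
| --- | --- | --- |
| 2 | javascript | 1769208 |
| 11 | java | 1519552 |
| 6 | c# | 1289429 |
| 4 | php | 1265522 |
| 683 | android | 1176225 |
| 10 | python | 1120872 |
| 411 | jquery | 945635 |
| 1 | html | 806983 |
| 7 | c++ | 606864 |
| 18939 | ios | 591950 |
| 3 | css | 575590 |
| 14 | mysql | 551664 |
| 15 | sql | 480831 |
| 54 | asp.net | 343360 |
| 2279 | ruby-on-rails | 303555 |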
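A ranking like the one above can be produced in a few lines. This is a sketch, not necessarily how the table was generated: it assumes Tags.csv has `TagName` and `Count` columns, the function name `top_tags` is hypothetical, and the inline sample reuses four counts from the table purely as test input.

```python
import csv
import heapq
import io


def top_tags(tag_rows, n):
    """Return the n rows with the largest Count, in descending order."""
    return heapq.nlargest(n, tag_rows, key=lambda row: int(row["Count"]))


# Four rows copied from the table above, deliberately out of order.
TAGS_CSV = (
    "TagName,Count\n"
    "html,806983\n"
    "javascript,1769208\n"
    "css,575590\n"
    "java,1519552\n"
)

top_two = top_tags(csv.DictReader(io.StringIO(TAGS_CSV)), 2)
for row in top_two:
    print(row["TagName"], row["Count"])
```

`heapq.nlargest` avoids sorting the whole file when only a short top list is needed. For the real file, the `io.StringIO` sample would be replaced by an opened Tags.csv.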
