This repository has been archived by the owner on Nov 7, 2022. It is now read-only.

Baseball data and Mongo...now that's a walk-off

corbtastik/mongoball

mongoDB + baseball = mongoball

The Gist

  1. Get yo-self a MongoDB instance
  2. Clone this repo
  3. Edit mongoball.var and add your config (mongodb host, mongoball db name)
  4. Create a .creds.var file and add your deets (user, password, authdb)
  5. Run ./mongoball.sh

This script clones the Chadwick Baseball Bureau's Baseball Databank and uses mongoimport to load each file into a MongoDB instance.

It needs a MongoDB user with at least the readWrite role; it won't run against a wide-open (no-auth) instance. See the MongoDB docs on users and roles.
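
As a rough idea of what one import looks like, here is a minimal sketch that only prints a mongoimport command for a single CSV file. It is not the actual mongoball.sh: the import_cmd helper, the --drop flag and the core/Batting.csv path are assumptions made for illustration, and the variable values come from the sample config in this README.

```shell
# Hypothetical sketch, not the real mongoball.sh.
# The data would first be cloned with something like:
#   git clone https://github.com/chadwickbureau/baseballdatabank.git

# Print (dry run, nothing is executed) the mongoimport command for one CSV.
# Arguments: collection name, path to the CSV file.
import_cmd() {
  printf 'mongoimport --host %s --username %s --password %s --authenticationDatabase %s --db %s --collection %s --type csv --headerline --drop --file %s\n' \
    "$MONGODB_HOST" "$MONGODB_USER" "$MONGODB_PASSWORD" "$MONGODB_AUTHDB" \
    "$MONGOBALL_DB" "$1" "$2"
}

# Sample values copied from the README's example config.
MONGODB_HOST=localhost:27017
MONGODB_USER=harrycaray
MONGODB_PASSWORD=holycow
MONGODB_AUTHDB=admin
MONGOBALL_DB=mongoball

# Assumed file location inside the cloned Baseball Databank repo.
import_cmd batting baseballdatabank/core/Batting.csv
```

--headerline tells mongoimport to take field names from the first CSV row, and --drop (an assumption here) would replace the collection on each run so re-running doesn't duplicate documents.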

Creds

.creds.var - create and place next to mongoball.sh (this file is in .gitignore).

MONGODB_USER=harrycaray
MONGODB_PASSWORD=holycow
MONGODB_AUTHDB=admin
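
One way to create and sanity-check that file is sketched below. The write_creds and load_creds helpers are hypothetical (they are not part of this repo), and the sketch assumes the values contain no whitespace or shell metacharacters, since the file is sourced as plain shell.

```shell
# Hypothetical helper: write a creds file readable only by the current user.
# Arguments: file path, user, password, auth database.
write_creds() {
  ( umask 077
    printf 'MONGODB_USER=%s\nMONGODB_PASSWORD=%s\nMONGODB_AUTHDB=%s\n' \
      "$2" "$3" "$4" > "$1" )
}

# Hypothetical helper: source the file and fail if a variable is missing or empty.
load_creds() {
  . "$1"
  if [ -z "${MONGODB_USER:-}" ] || [ -z "${MONGODB_PASSWORD:-}" ] || [ -z "${MONGODB_AUTHDB:-}" ]; then
    echo "missing MONGODB_USER, MONGODB_PASSWORD or MONGODB_AUTHDB in $1" >&2
    return 1
  fi
}

# Demo in a throwaway directory so no real .creds.var is touched.
tmp="$(mktemp -d)"
write_creds "$tmp/.creds.var" harrycaray holycow admin
load_creds "$tmp/.creds.var" && echo "loaded creds for $MONGODB_USER"
rm -rf "$tmp"
```

The umask keeps the password out of reach of other local users, which fits the file already being listed in .gitignore.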

Mongoball vars

mongoball.var - add your config (this file is read by mongoball.sh).

MONGODB_HOST=cluster0-shard-00-00-negae.gcp.mongodb.net:27017 #atlas cluster
# MONGODB_HOST=localhost:27017 #local
MONGOBALL_DB=mongoball
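
Together with the creds, these values are the pieces of a standard MongoDB connection string, which is handy for connecting a shell to the loaded database. The sketch below is an illustration only: mongo_uri is a hypothetical helper, mongoball.sh may well pass the values as separate flags instead, and the user and password are assumed to need no percent-encoding. Atlas clusters generally also require TLS, which is left out here.

```shell
# Hypothetical helper: build a connection string from the README's variables.
# Assumes user and password contain no characters that need percent-encoding.
mongo_uri() {
  printf 'mongodb://%s:%s@%s/%s?authSource=%s\n' \
    "$MONGODB_USER" "$MONGODB_PASSWORD" "$MONGODB_HOST" "$MONGOBALL_DB" "$MONGODB_AUTHDB"
}

# Sample values copied from the README's example config (local instance).
MONGODB_HOST=localhost:27017
MONGOBALL_DB=mongoball
MONGODB_USER=harrycaray
MONGODB_PASSWORD=holycow
MONGODB_AUTHDB=admin

mongo_uri
# prints mongodb://harrycaray:holycow@localhost:27017/mongoball?authSource=admin
```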

Collections

Each baseballdatabank file is loaded as a Collection in the database $MONGOBALL_DB.

| Collection          | Documents | Collection      | Documents | Collection      | Documents |
| ------------------- | --------- | --------------- | --------- | --------------- | --------- |
| allStar             | 5291      | fielding        | 143046    | people          | 19878     |
| appearances         | 107357    | fieldingOF      | 12028     | pitching        | 47628     |
| awardsManagers      | 179       | fieldingOFsplit | 33279     | pitchingPost    | 5798      |
| awardsPlayers       | 6236      | fieldingPost    | 13938     | salaries        | 26428     |
| awardsShareManagers | 425       | hallOfFame      | 4191      | schools         | 1207      |
| awardsSharePlayers  | 6879      | homeGames       | 3108      | seriesPost      | 343       |
| batting             | 107429    | managers        | 3536      | teams           | 2925      |
| battingPost         | 14750     | managersHalf    | 93        | teamsFranchises | 120       |
| collegePlaying      | 17350     | parks           | 252       | teamsHalf       | 52        |

Queries

  1. Find Jackie Robinson and Babe Ruth
db.people.find({nameFirst: "Jackie", nameLast: "Robinson"}).pretty()
db.people.find({nameFirst: "Babe", nameLast: "Ruth"}).pretty()
  2. Find the player with the most homers in a single season; output only playerId, yearId and homeRuns.
// sort by homeRuns descending and keep the top document; _id: 0 suppresses the default _id field
db.batting.find({}, {_id: 0, playerId: 1, yearId: 1, homeRuns: 1}).sort({homeRuns: -1}).limit(1).pretty()

Data Access Patterns

{
	"_id" : ObjectId("5e63af3bff5e503ecc0a4782"),
	"playerId" : "ruthba01",
	"birthYear" : 1895,
	"birthMonth" : 2,
	"birthDay" : 6,
	"birthCountry" : "USA",
	"birthState" : "MD",
	"birthCity" : "Baltimore",
	"deathYear" : 1948,
	"deathMonth" : 8,
	"deathDay" : 16,
	"deathCountry" : "USA",
	"deathState" : "NY",
	"deathCity" : "New York",
	"nameFirst" : "Babe",
	"nameLast" : "Ruth",
	"nameGiven" : "George Herman",
	"weight" : 215,
	"height" : 74,
	"bats" : "L",
	"throws" : "L",
	"debut" : ISODate("1914-07-11T00:00:00Z"),
	"finalGame" : ISODate("1935-05-30T00:00:00Z"),
	"retroId" : "ruthb101",
	"bbrefId" : "ruthba01"
}
{
	"_id" : ObjectId("5e63af39b225d551b6748bc2"),
	"playerId" : "ruthba01",
	"yearId" : 1936,
	"votedBy" : "BBWAA",
	"ballots" : 226,
	"needed" : 170,
	"votes" : 215,
	"inducted" : "Y",
	"category" : "Player"
}
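
The two documents above share playerId "ruthba01"; the first looks like a people document and the second like a hallOfFame document. Assuming that is right, a $lookup can pull both together in one query. The sketch below only prints the mongo shell JavaScript; hof_lookup_js and MONGO_URI are hypothetical names, and the commented-out line shows how it might be run with mongosh (or the legacy mongo shell).

```shell
# Hypothetical helper: emit an aggregation as mongo shell JavaScript.
# Dry run only: this prints text and does not contact a database.
hof_lookup_js() {
  cat <<'EOF'
db.people.aggregate([
  { $match: { playerId: "ruthba01" } },
  { $lookup: {
      from: "hallOfFame",
      localField: "playerId",
      foreignField: "playerId",
      as: "hallOfFame"
  } },
  { $project: { _id: 0, nameFirst: 1, nameLast: 1,
                "hallOfFame.yearId": 1, "hallOfFame.inducted": 1 } }
]).forEach(printjson)
EOF
}

hof_lookup_js

# To run it for real (assumes mongosh is installed and MONGO_URI is set):
# mongosh "$MONGO_URI" --quiet --eval "$(hof_lookup_js)"
```

Since every lookup here goes through playerId, an index on playerId in both collections would likely be worth adding before running joins like this across all players.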

References

  1. MongoDB
  2. Atlas - MongoDB as a service across the globe
  3. Mongoimport - used by mongoball.sh
  4. Baseball Databank - source of data
  5. Chadwick Baseball Bureau - check them out!
  6. Sean Lahman Baseball Archive