- Get yo-self a MongoDB instance
- Clone this repo
- Edit `mongoball.var` and add your config (mongodb host, mongoball db name)
- Create a `.creds.var` file and add your deets (user, password, authdb)
- Run `./mongoball.sh`
This script clones the Chadwick Baseball Bureau's Baseball Databank and uses `mongoimport` to populate a MongoDB instance.
It requires a MongoDB user with at least the `readWrite` role and won't run against a wide-open instance (one without authentication). See the MongoDB docs on users and roles.
`.creds.var` - create and place next to `mongoball.sh` (this file is in `.gitignore`).
MONGODB_USER=harrycaray
MONGODB_PASSWORD=holycow
MONGODB_AUTHDB=admin
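Since the script needs all three credentials, a pre-flight check along these lines could catch a missing or incomplete `.creds.var` before any import starts. This is a minimal sketch, not taken from `mongoball.sh`; the function names `require_vars` and `load_creds` are hypothetical.

```sh
# Hypothetical pre-flight check; not part of mongoball.sh as published.

# Return 0 only if every named variable is set and non-empty.
require_vars() {
  rv_missing=0
  for rv_name in "$@"; do
    # Look up the variable whose name is stored in rv_name.
    eval "rv_value=\${$rv_name:-}"
    if [ -z "$rv_value" ]; then
      echo "missing required setting: $rv_name" >&2
      rv_missing=1
    fi
  done
  return "$rv_missing"
}

# Source a credentials file (default: ./.creds.var) and verify its contents.
load_creds() {
  lc_file="${1:-./.creds.var}"
  if [ ! -f "$lc_file" ]; then
    echo "credentials file not found: $lc_file" >&2
    return 1
  fi
  . "$lc_file"
  require_vars MONGODB_USER MONGODB_PASSWORD MONGODB_AUTHDB
}
```

Calling `load_creds || exit 1` near the top of a script would stop it early with a message naming whichever setting is missing.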
`mongoball.var` - add your config (this file is read by `mongoball.sh`).
MONGODB_HOST=cluster0-shard-00-00-negae.gcp.mongodb.net:27017 #atlas cluster
# MONGODB_HOST=localhost:27017 #local
MONGOBALL_DB=mongoball
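To show how these settings might feed `mongoimport`, here is a rough sketch of a single-file import. It is an assumption about the general shape of such a call, not a copy of what `mongoball.sh` does; the function name `import_csv`, the `MONGOIMPORT` override, and the example file name are all hypothetical.

```sh
# Hypothetical sketch; assumes the variables from .creds.var and mongoball.var
# have already been loaded into the shell.

# Import one CSV file into one collection.
#   $1 = path to the CSV file
#   $2 = target collection name
# Set MONGOIMPORT=echo to print the arguments instead of running the import.
import_csv() {
  "${MONGOIMPORT:-mongoimport}" \
    --host "$MONGODB_HOST" \
    --db "$MONGOBALL_DB" \
    --collection "$2" \
    --type csv --headerline \
    --username "$MONGODB_USER" \
    --password "$MONGODB_PASSWORD" \
    --authenticationDatabase "$MONGODB_AUTHDB" \
    --file "$1"
}
```

For a dry run, `MONGOIMPORT=echo import_csv People.csv people` prints the arguments without contacting the server (and shows the password, so use it with care). An Atlas host will typically also need TLS enabled on the command line.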
Each Baseball Databank file is loaded as a collection in the database `$MONGOBALL_DB`.
| Collection          | Count  | Collection      | Count  | Collection      | Count |
| ------------------- | ------ | --------------- | ------ | --------------- | ----- |
| allStar             | 5291   | fielding        | 143046 | people          | 19878 |
| appearances         | 107357 | fieldingOF      | 12028  | pitching        | 47628 |
| awardsManagers      | 179    | fieldingOFsplit | 33279  | pitchingPost    | 5798  |
| awardsPlayers       | 6236   | fieldingPost    | 13938  | salaries        | 26428 |
| awardsShareManagers | 425    | hallOfFame      | 4191   | schools         | 1207  |
| awardsSharePlayers  | 6879   | homeGames       | 3108   | seriesPost      | 343   |
| batting             | 107429 | managers        | 3536   | teams           | 2925  |
| battingPost         | 14750  | managersHalf    | 93     | teamsFranchises | 120   |
| collegePlaying      | 17350  | parks           | 252    | teamsHalf       | 52    |
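One way to sanity-check a collection's count is to compare it with the number of data rows in the corresponding source CSV. The helper below is a hypothetical sketch (the name `csv_row_count` is not from this repo). It assumes one record per line, so a file with line breaks inside quoted fields would be over-counted, and the numbers may differ if your copy of the Databank is newer than the one used for the table.

```sh
# Hypothetical helper: count the data rows in a CSV file, skipping the header line.
csv_row_count() {
  # NR > 1 skips the header; "n + 0" prints 0 for a header-only file.
  awk 'NR > 1 { n++ } END { print n + 0 }' "$1"
}
```

The result can then be compared with the output of `db.<collection>.countDocuments({})` in the mongo shell.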
- Find Jackie Robinson and Babe Ruth
db.people.find({ nameFirst: "Jackie", nameLast: "Robinson"}).pretty()
db.people.find({nameFirst: "Babe", nameLast: "Ruth"}).pretty()
- Find the player with the most homers in a single season, output only playerId, yearId and homeRuns.
// using sort and limit; _id is returned by default, so it is excluded explicitly
db.batting.find({}, {"_id":0, "playerId":1, "yearId":1, "homeRuns":1}).sort({homeRuns:-1}).limit(1).pretty()
{
"_id" : ObjectId("5e63af3bff5e503ecc0a4782"),
"playerId" : "ruthba01",
"birthYear" : 1895,
"birthMonth" : 2,
"birthDay" : 6,
"birthCountry" : "USA",
"birthState" : "MD",
"birthCity" : "Baltimore",
"deathYear" : 1948,
"deathMonth" : 8,
"deathDay" : 16,
"deathCountry" : "USA",
"deathState" : "NY",
"deathCity" : "New York",
"nameFirst" : "Babe",
"nameLast" : "Ruth",
"nameGiven" : "George Herman",
"weight" : 215,
"height" : 74,
"bats" : "L",
"throws" : "L",
"debut" : ISODate("1914-07-11T00:00:00Z"),
"finalGame" : ISODate("1935-05-30T00:00:00Z"),
"retroId" : "ruthb101",
"bbrefId" : "ruthba01"
}
{
"_id" : ObjectId("5e63af39b225d551b6748bc2"),
"playerId" : "ruthba01",
"yearId" : 1936,
"votedBy" : "BBWAA",
"ballots" : 226,
"needed" : 170,
"votes" : 215,
"inducted" : "Y",
"category" : "Player"
}
- MongoDB
- Atlas - MongoDB as a service across the globe
- Mongoimport - used by `mongoball.sh`
- Baseball Databank - source of data
- Chadwick Baseball Bureau - check them out!
- Sean Lahman Baseball Archive