startrek

The goal of startrek is to provide Star Trek transcripts as data frames for easy analysis. All transcripts have been parsed from text files into a tidy data format.

Installation

Keep in mind that this is a data package: the transcripts are stored locally with the package, and there are no functions that scrape them from an online source. As of now, the size of this package is ~17.7 MB.

If the size isn’t a concern, you can install the development version from GitHub:

devtools::install_github("tylurp/startrek")

Or, download the data to disk from the data folder in this repository.
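If you take the download route, a minimal sketch of reading such a file might look like the following. It assumes the files in the data folder are .rda files (the usual format for R package data, not confirmed here). The object `toy` and the file name `toy.rda` are made up so the snippet runs on its own; in practice `path` would point at the file you downloaded.

```r
# Assumption: the downloaded file is an .rda file. A toy file is written
# first so this snippet is self-contained; `toy` and "toy.rda" are hypothetical.
toy <- list("Episode A" = data.frame(character = "PICARD", line = "Engage."))
path <- file.path(tempdir(), "toy.rda")
save(toy, file = path)

# load() returns the names of the restored objects. Loading into a fresh
# environment keeps them from overwriting anything in the workspace.
env <- new.env()
loaded <- load(path, envir = env)
episodes <- get(loaded[1], envir = env)

# Episode titles are the names of the list
names(episodes)
```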

Example

To access an episode transcript from The Next Generation series, see the tng list:

library(startrek)
library(tibble)
library(dplyr)
library(tidyr)

tng$`The Inner Light`
#> # A tibble: 410 x 6
#>       id perspective    setting         character  description line        
#>    <int> <chr>          <chr>           <chr>      <chr>       <chr>       
#>  1    83 3 EXT. SPACE … at warp.        PICARD (V… <NA>        Captain's l…
#>  2    94 4 INT. BRIDGE  PICARD, RIKER,… PICARD     <NA>        The last ti…
#>  3    99 4 INT. BRIDGE  PICARD, RIKER,… GEORDI     <NA>        Nine hours.…
#>  4   101 4 INT. BRIDGE  PICARD, RIKER,… PICARD     <NA>        "The entire…
#>  5   104 4 INT. BRIDGE  PICARD, RIKER,… RIKER      <NA>        That's a li…
#>  6   107 4 INT. BRIDGE  PICARD, RIKER,… PICARD     <NA>        And for me.…
#>  7   115 4 CONTINUED:   PICARD, RIKER,… WORF       <NA>        Sir, sensor…
#>  8   120 4 CONTINUED:   PICARD, RIKER,… PICARD     <NA>        On screen.  
#>  9   126 5 ANGLE - VIE… An alien objec… PICARD     <NA>        Magnify.    
#> 10   130 5 ANGLE - VIE… The object spr… PICARD     <NA>        Mister Data?
#> # … with 400 more rows

Or access the entire series and play with the data in creative ways. For example, we might infer character-specific episodes by counting the number of lines each character has in each episode:

tng %>% 
  bind_rows(.id = "episode") %>% 
  select(episode, everything()) %>% 
  group_by(episode) %>% 
  count(character, sort = TRUE)
#> # A tibble: 4,227 x 3
#> # Groups:   episode [176]
#>    episode               character     n
#>    <chr>                 <chr>     <int>
#>  1 All Good Things...    PICARD      348
#>  2 Encounter at Farpoint PICARD      224
#>  3 Interface             GEORDI      197
#>  4 Future Imperfect      RIKER       183
#>  5 Frame of Mind         RIKER       177
#>  6 The Outcast           RIKER       173
#>  7 Suspicions            BEVERLY     172
#>  8 Captain's Holiday     PICARD      171
#>  9 Bloodlines            PICARD      168
#> 10 Remember Me           BEVERLY     165
#> # … with 4,217 more rows
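
Building on the counts above, one possible next step is to keep only the character with the most lines in each episode. This is a sketch rather than part of the package: `toy` is a made-up stand-in with the same shape as tng (a named list of tibbles with a character column), so the snippet runs without the package installed. Swap in tng for real results.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for `tng`: a named list of tibbles, one per episode
toy <- list(
  "Episode A" = tibble(character = c("PICARD", "PICARD", "RIKER")),
  "Episode B" = tibble(character = c("DATA", "GEORDI", "DATA", "DATA"))
)

top_speakers <- toy %>%
  bind_rows(.id = "episode") %>%            # list names become the episode column
  count(episode, character, name = "n") %>% # lines per character per episode
  group_by(episode) %>%
  slice_max(n, n = 1, with_ties = FALSE) %>% # keep one top speaker per episode
  ungroup()

top_speakers
```

With `with_ties = FALSE`, an episode where two characters share the top count returns only one of them; set it to `TRUE` to keep both.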

The Deep Space Nine series is also available:

ds9$Chimera
#> # A tibble: 415 x 6
#>       id perspective  setting          character description   line        
#>    <int> <chr>        <chr>            <chr>     <chr>         <chr>       
#>  1    79 2 INT. RUNA… ODO is in the c… O'BRIEN   (moving to t… How long wa…
#>  2    81 2 INT. RUNA… ODO is in the c… ODO       <NA>          Almost two …
#>  3    86 2 INT. RUNA… O'Brien's surpr… O'BRIEN   (noticing)    You dropped…
#>  4    90 2 INT. RUNA… O'Brien's surpr… ODO       (nods)        We entered …
#>  5    94 2 INT. RUNA… O'Brien's surpr… O'BRIEN   (taking a se… What's that?
#>  6    96 2 INT. RUNA… O'Brien's surpr… ODO       <NA>          The shopkee…
#>  7    99 2 INT. RUNA… O'Brien's surpr… O'BRIEN   <NA>          I didn't kn…
#>  8   104 2 CONTINUED: O'Brien's surpr… ODO       <NA>          It's a pres…
#>  9   108 2 CONTINUED: O'Brien's featu… ODO       (misundersta… You don't t…
#> 10   110 2 CONTINUED: O'Brien's featu… O'BRIEN   <NA>          I'm sure sh…
#> # … with 405 more rows

If you want both datasets together, one approach might be to create a nested data frame:

all_episodes <- function(.data, series_name) {
  .data %>% 
    bind_rows(.id = "episode") %>% 
    mutate(series = series_name) %>% 
    select(series, everything())
}

tng_all <- all_episodes(tng, "TNG")
ds9_all <- all_episodes(ds9, "DS9")

bind_rows(tng_all, ds9_all) %>% 
  group_by(series, episode) %>% 
  nest() 
#> # A tibble: 349 x 3
#>    series episode                     data              
#>    <chr>  <chr>                       <list>            
#>  1 TNG    Encounter at Farpoint       <tibble [805 × 6]>
#>  2 TNG    The Naked Now               <tibble [405 × 6]>
#>  3 TNG    Code of Honor               <tibble [438 × 6]>
#>  4 TNG    Haven                       <tibble [421 × 6]>
#>  5 TNG    Where None Have Gone Before <tibble [409 × 6]>
#>  6 TNG    The Last Outpost            <tibble [493 × 6]>
#>  7 TNG    Lonely Among Us             <tibble [450 × 6]>
#>  8 TNG    Justice                     <tibble [452 × 6]>
#>  9 TNG    The Battle                  <tibble [523 × 6]>
#> 10 TNG    Hide And Q                  <tibble [363 × 6]>
#> # … with 339 more rows
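
Once the data is nested, each element of the data column is an ordinary tibble, so per-episode summaries can be computed by iterating over that list column. The sketch below counts rows per episode. `toy_all` and its contents are invented for illustration and only mimic the shape of the combined tng_all and ds9_all data.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical stand-in for bind_rows(tng_all, ds9_all); all values are made up
toy_all <- tibble(
  series    = c("TNG", "TNG", "TNG", "DS9", "DS9"),
  episode   = c("Episode A", "Episode A", "Episode B", "Episode C", "Episode C"),
  character = c("PICARD", "RIKER", "DATA", "SISKO", "ODO"),
  line      = c("Engage.", "Aye, sir.", "Intriguing.", "Report.", "Nothing yet.")
)

nested <- toy_all %>%
  group_by(series, episode) %>%
  nest() %>%      # one row per episode, remaining columns packed into `data`
  ungroup() %>%
  mutate(n_lines = vapply(data, nrow, integer(1))) # rows in each nested tibble

nested
```

Base R's `vapply()` is used here to avoid an extra dependency; `purrr::map_int()` would be an equivalent choice.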

The columns have been arranged in a specific order so that each row reads naturally from left to right or, when using glimpse(), from top to bottom. For example:

ds9$Chimera %>% 
  .[5, ] %>% 
  glimpse()
#> Observations: 1
#> Variables: 6
#> $ id          <int> 94
#> $ perspective <chr> "2 INT. RUNABOUT"
#> $ setting     <chr> "O'Brien's surprised to hear he was asleep that long…
#> $ character   <chr> "O'BRIEN"
#> $ description <chr> "(taking a seat)"
#> $ line        <chr> "What's that?"

The raw text files were parsed using the scripts found in the data-raw folder of this repository. The row below shows how one piece of dialogue from a script maps onto the six columns:

ds9$Emissary %>% 
  .[26, ] %>% 
  glimpse()
#> Observations: 1
#> Variables: 6
#> $ id          <int> 289
#> $ perspective <chr> "10 INT. SISKO'S QUARTERS (OPTICAL)"
#> $ setting     <chr> "Destroyed... an explosion has ripped a hole in the …
#> $ character   <chr> "SISKO"
#> $ description <chr> "(calm, controlled)"
#> $ line        <chr> "It's gonna be okay... I'll get you  out of there...…

Acknowledgements

Transcripts were taken from Star Trek Minutiae.
