WebSummit 2019 Transcripts

At WebSummit 2019, 46 talks (mainly from the Central Stage) were automatically transcribed in real time using the Otter.ai platform. This repo provides the transcribed text, as well as the code to re-download and preprocess it.

With this dataset, you can run statistical analysis on the text of the transcripts, or even train a neural network model to produce your very own WebSummit speech.
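As a starting point for statistical analysis, here is a minimal sketch of a word-frequency count over one transcript's text. The helpers `wordFrequencies` and `topWords` are hypothetical names used for illustration and are not part of this repo. The tokenization rule (lowercase runs of ASCII letters, digits and apostrophes) is also an assumption that you may want to adapt.

```typescript
// Hypothetical helper (not part of this repo): count how often each word appears.
function wordFrequencies(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  // Assumption: a "word" is a run of ASCII letters, digits or apostrophes.
  const words: string[] = text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
  for (const word of words) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return counts;
}

// Hypothetical helper: the n most frequent words, most frequent first.
// Ties are broken alphabetically so the result is deterministic.
function topWords(text: string, n: number): Array<[string, number]> {
  return Array.from(wordFrequencies(text).entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, n);
}
```

Calling `topWords(text, 20)` on the contents of a single transcript returns its twenty most common words together with their counts.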

How to use it

Each speech is an individual .txt file inside the plain-texts folder (e.g. 224STLFR2BIGPLOD.txt). All the speeches are named after their ID on the otter.ai platform. If you have a different naming scheme to propose, I'm all ears! :D

The plain-texts folder also contains a data.json file that includes all the raw data from otter.ai.
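To work with the speeches programmatically, you could load every .txt file into memory keyed by its otter.ai ID. The following is a minimal sketch for Node.js. `loadTranscripts` is a hypothetical helper that is not part of this repo, and it assumes the files are UTF-8 encoded.

```typescript
import * as fs from "fs";
import * as path from "path";

// Hypothetical helper (not part of this repo): read every .txt file in a
// folder and return a map from transcript ID (the file name without the
// extension) to the transcript text.
function loadTranscripts(folder: string): Map<string, string> {
  const transcripts = new Map<string, string>();
  for (const file of fs.readdirSync(folder)) {
    // Skip anything that is not a transcript, such as data.json.
    if (path.extname(file) !== ".txt") continue;
    const id = path.basename(file, ".txt");
    // Assumption: the transcripts are stored as UTF-8 text.
    transcripts.set(id, fs.readFileSync(path.join(folder, file), "utf8"));
  }
  return transcripts;
}
```

When run from the root of the repo, `loadTranscripts("plain-texts")` should return one entry per speech, for example under the key `224STLFR2BIGPLOD`.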

How to generate the data again:

Prepare the code:

npm install
npm run compile

Run the scripts:

npm run datagen (downloads the transcripts from otter.ai and produces the data.json file)
npm run preprocess (reads the data.json file and creates one .txt file per speech)

Contributions

Please fork, copy, share and contribute!

chrispanag/websummit19-transcripts
