Code samples and materials for tech talks and meetups hosted by TrustYou
Example combining spaCy and Keras for a simple machine learning exercise. See README.md.
In this meetup I showed how a pipeline similar to TrustYou's (crawl, analyze, serve) can be built from popular Python libraries, and then how it can be scaled up.
$ cd pydata
$ pip install -r requirements.txt # e.g. in a virtualenv
$ ./run_example.sh
The above example will crawl meetup.com to discover all meetups (this takes a few hours!), and then build a Word2Vec model from their descriptions.
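Word2Vec implementations such as gensim's expect an iterable of tokenized sentences as training input. A minimal sketch of that preprocessing step, using made-up descriptions and a naive regex tokenizer as stand-ins for the crawled data and the repo's actual code:

```python
import re

# Hypothetical meetup descriptions standing in for the crawled data
descriptions = [
    "A meetup about Python and data science.",
    "Monthly talks on big data, Hadoop and Python.",
]

def tokenize(text):
    # Lowercase and split on non-word characters; a real pipeline
    # might use spaCy or NLTK instead
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]

sentences = [tokenize(d) for d in descriptions]

# gensim's Word2Vec could then be trained on these token lists, e.g.:
# from gensim.models import Word2Vec
# model = Word2Vec(sentences, min_count=1)
```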
In this meetup we shared some insights into the TrustYou big data tech stack, and gave introductions to two tools we've found useful: Apache Pig and Luigi. The examples from the slides are contained in this repo.
Install Apache Pig, e.g. from their website. No Hadoop necessary! Alternatively, give the Hortonworks sandbox a try if you're planning to try out other Hadoop-related technologies as well. Then, run this:
$ cd big-data/pig
$ ./run_examples.sh
Look in the *.tsv subfolders for the output. When run locally, Apache Pig mimics the folder structure of job output in HDFS, so the data will be split across part files.
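A quick way to collect such output is to concatenate all part files in the job's output folder. A small sketch (the folder and file contents below are just an illustration, not the repo's actual output):

```python
import glob
import os
import tempfile

def read_pig_output(output_dir):
    """Concatenate the lines of all part-* files in a Pig/Hadoop output folder."""
    lines = []
    for path in sorted(glob.glob(os.path.join(output_dir, "part-*"))):
        with open(path) as f:
            lines.extend(line.rstrip("\n") for line in f)
    return lines

# Simulate a local Pig output folder containing two part files
out = tempfile.mkdtemp()
with open(os.path.join(out, "part-r-00000"), "w") as f:
    f.write("hotel\t42\n")
with open(os.path.join(out, "part-r-00001"), "w") as f:
    f.write("review\t7\n")

rows = read_pig_output(out)
```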
Install dependencies by running pip install -r requirements.txt from the luigi folder. Then, run:
$ cd big-data/luigi
$ ./run_example.sh
We had a look behind the scenes of CPython, the reference implementation of Python, and its C API, which allows you to extend the Python language in C. Finally, we checked out Cython, which seems to be the sanest way of writing C extensions for Python.
In the examples we focused on benchmarking different implementations of QuickSort in Python, C and Cython. Before trying them, run pip install -r requirements.txt
from the python-c folder. I suggest running the following inside a virtualenv:
$ virtualenv venv
$ . venv/bin/activate
(venv) $ cd python-c
(venv) $ ./run_examples.sh
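The pure-Python baseline in such a benchmark can be as simple as the following generic textbook quicksort (not the repo's exact code), timed with the standard library's timeit:

```python
import random
import timeit

def quicksort(items):
    # Simple functional quicksort; a C or Cython version would
    # typically sort in place instead of building new lists
    if len(items) <= 1:
        return items
    pivot = items[len(items) // 2]
    left = [x for x in items if x < pivot]
    mid = [x for x in items if x == pivot]
    right = [x for x in items if x > pivot]
    return quicksort(left) + mid + quicksort(right)

data = [random.randint(0, 10000) for _ in range(1000)]
elapsed = timeit.timeit(lambda: quicksort(data), number=10)
print(f"10 runs of quicksort on 1000 ints: {elapsed:.3f}s")
```

Comparing this number against the C and Cython variants on the same input is the core of the benchmark.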