This is an examples repository to supplement codeart. The repository was getting large, so I decided to move image and data files over here.
The Dockerfiles dinosaur dataset has about 100K Dockerfiles, and this small example shows how to create visualizations for them. The interactive version isn't very useful here because we only have one file type, and the colors are limited to a range from greens to blues. This project helped to develop the sorted interactive version of the colormap.
The parse_repo folder shows how to parse a repository (spack) and then generate a color gradient lookup. The first attempt at the color grid can be seen here, and it was later updated to be better organized, seen here. In the second version, because the Word2Vec model gives similar embeddings to terms used in similar contexts and those embeddings are rendered as colors, similar colors correspond to similar terms. You can explore the visualization with this in mind.
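To make "similar colors mean similar terms" concrete, here is a minimal sketch, assuming the Word2Vec model is trained with three dimensions (or its embeddings are reduced to three). Each dimension is min-max scaled to 0-255 and read as an RGB channel, so nearby vectors get nearby colors. `vectors_to_rgb` is a hypothetical helper and the vectors are made up; this is not the codeart API.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]


def vectors_to_rgb(vectors: Dict[str, List[float]]) -> Dict[str, RGB]:
    """Min-max scale each of three embedding dimensions to a 0-255 channel.

    Hypothetical helper: assumes every vector has exactly three components.
    """
    if not vectors:
        return {}
    # Per-dimension minimum and maximum across all terms.
    mins = [min(v[i] for v in vectors.values()) for i in range(3)]
    maxs = [max(v[i] for v in vectors.values()) for i in range(3)]
    colors: Dict[str, RGB] = {}
    for term, vec in vectors.items():
        channels = []
        for i in range(3):
            span = maxs[i] - mins[i]
            # A constant dimension maps to mid-range to avoid dividing by zero.
            scaled = (vec[i] - mins[i]) / span if span else 0.5
            channels.append(round(scaled * 255))
        colors[term] = (channels[0], channels[1], channels[2])
    return colors


if __name__ == "__main__":
    # Toy vectors: "import" and "from" are close together, "return" is not.
    toy = {
        "import": [0.9, 0.1, 0.2],
        "from": [0.85, 0.15, 0.25],
        "return": [-0.7, 0.8, -0.5],
    }
    for term, rgb in vectors_to_rgb(toy).items():
        print(term, rgb)
```

Running it shows "import" and "from" landing on neighboring purples while "return" is pure green, which is the property the color grid relies on.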
The library can also generate a custom codeart image with text, as in the example here, shown below.
And of course it would be more appropriate to write the name of the software as the text instead!
You have to zoom in to see that the pixels are actually code, each colored by its context thanks to Word2Vec.
Finally, to generate a d3 tree that shows code images in a folder on mouseover, see this example.
It's not a great visualization and could be a lot better.
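A d3 tree layout (for example one built with `d3.hierarchy`) reads nested objects that list their descendants under `children`. As a sketch under that assumption, the standard-library function below turns a folder on disk into such a structure. The `image` field, which would point each file at its rendered code image, is a hypothetical name and not necessarily what the example uses.

```python
import json
from pathlib import Path
from typing import Any, Dict


def folder_to_tree(root: Path, image_suffix: str = ".png") -> Dict[str, Any]:
    """Build a nested {"name", "children"} dict that d3.hierarchy can read.

    Directories become inner nodes and files become leaves. Each leaf gets a
    hypothetical "image" filename where a code image for it could be stored.
    """
    node: Dict[str, Any] = {"name": root.name, "children": []}
    # Sort by name so the JSON output is deterministic.
    for child in sorted(root.iterdir(), key=lambda p: p.name):
        if child.is_dir():
            node["children"].append(folder_to_tree(child, image_suffix))
        else:
            node["children"].append(
                {"name": child.name, "image": child.name + image_suffix}
            )
    return node


def tree_json(root: Path) -> str:
    """Serialize the tree for a d3 page to load."""
    return json.dumps(folder_to_tree(root.resolve()), indent=2)
```

Writing `tree_json(Path("some/folder"))` to a `.json` file gives the page something to pass to `d3.hierarchy`; the mouseover behavior would then live entirely on the JavaScript side.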
The interactive version shows the color grid, where groups (in this case, extensions) are colored based on salience. Click an extension to see relevant terms in the embeddings model.
The abstract versions, including those for:
are graphics generated with pictures of the code itself. By far, the version generated from all files is the most interesting and abstract!
The interactive version is cooler, as you can click on any of the files to see them in slightly more detail.
This is an example project that uses codeart to parse a large Python code base, determine the year of creation using the GitHub API, and then break the code into groups based on the year. This small project helped me develop the interactive colormap example, which you can view here, or the previous (unsorted) version here.
The example also generates static images, along with a gif (animation) to show the change in data over time.
Of course, this was impossible to explore, which is why I made the interactive version.
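Once a creation timestamp is known for each item, the grouping step is small. The sketch below assumes timestamps in the ISO 8601 form the GitHub REST API uses for a repository's `created_at` field (for example `2015-03-02T18:24:11Z`). Fetching those timestamps, which needs network access and possibly a token, is left out, and the names in the usage block are made up.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List


def year_of(timestamp: str) -> int:
    """Return the year of an ISO 8601 timestamp like "2015-03-02T18:24:11Z"."""
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11 on,
    # so rewrite it as an explicit UTC offset for older interpreters.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00")).year


def group_by_year(created: Dict[str, str]) -> Dict[int, List[str]]:
    """Group names by creation year, with years and names in sorted order."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for name, timestamp in created.items():
        groups[year_of(timestamp)].append(name)
    return {year: sorted(names) for year, names in sorted(groups.items())}


if __name__ == "__main__":
    # Made-up repository names and timestamps, for illustration only.
    example = {
        "tool-b": "2016-05-01T08:00:00Z",
        "tool-a": "2016-01-09T12:00:00Z",
        "script-c": "2014-11-30T23:59:59Z",
    }
    for year, names in group_by_year(example).items():
        print(year, names)
```

Each year's group can then be rendered as its own static image, which is what makes a year-by-year animation possible.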
The derive_colormap folder is an example of work on deriving a colormap from a set of embeddings. We start with a 3D map, project it to 2D, and then use Voronoi to fill border cells. Here is the original 3D map:
which we then project to 2D.
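The projection method used in the notebook isn't stated here. One simple option for color-like 3D points, shown purely as a sketch, is an orthogonal projection onto the plane perpendicular to the gray diagonal (1, 1, 1), using two orthonormal basis vectors of that plane. A trade-off of this choice is that points differing only in overall brightness collapse onto the same 2D position.

```python
import math
from typing import List, Tuple

# Orthonormal basis of the plane perpendicular to (1, 1, 1):
# both vectors are unit length, orthogonal to (1, 1, 1), and to each other.
U = (1 / math.sqrt(2), -1 / math.sqrt(2), 0.0)
V = (1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6))


def project_to_plane(
    points: List[Tuple[float, float, float]]
) -> List[Tuple[float, float]]:
    """Return the 2D coordinates (p . U, p . V) for each 3D point p."""
    projected = []
    for x, y, z in points:
        a = x * U[0] + y * U[1] + z * U[2]
        b = x * V[0] + y * V[1] + z * V[2]
        projected.append((a, b))
    return projected
```

Because the basis is orthonormal, distances between points that differ within the plane are preserved exactly, while the gray axis itself maps to the origin.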
The Voronoi exercise then didn't work out as intended, so I pursued other methods instead.
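For reference, the nearest-seed idea behind a Voronoi fill is easy to sketch on a discrete grid: every cell takes the color of the seed point closest to its center, which partitions the grid into Voronoi regions under Euclidean distance. The seed format, grid size, and brute-force search below are illustrative assumptions, not the notebook's code.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
RGB = Tuple[int, int, int]


def voronoi_fill(seeds: Dict[Point, RGB], width: int, height: int) -> List[List[RGB]]:
    """Color each grid cell with the color of its nearest seed.

    Seeds are in grid coordinates and cell (col, row) is sampled at its
    center. Brute force: O(width * height * len(seeds)).
    """
    if not seeds:
        raise ValueError("need at least one seed")
    grid: List[List[RGB]] = []
    for row in range(height):
        line: List[RGB] = []
        for col in range(width):
            center = (col + 0.5, row + 0.5)
            # min() keeps the first seed on ties, so the output is deterministic.
            nearest = min(seeds, key=lambda s: math.dist(s, center))
            line.append(seeds[nearest])
        grid.append(line)
    return grid
```

With thousands of seeds, a spatial index such as a k-d tree would replace the brute-force `min`, but the assignment rule stays the same.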
I ultimately wound up developing an interactive colormap (too large to add to the repo here!) that is better sorted by rows and columns and generates in a reasonable time. Note that since this covers all of my Python code, the interface is a bit clunky; something based on canvas would probably work better for this number of points. The notebook is useful for getting started with a similar project.
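The interactive colormap's layout code isn't reproduced here. As a hedged sketch of one way to get a grid "sorted by rows and columns", the snippet orders terms by hue and then brightness using the standard library's `colorsys`, and wraps them into rows of a fixed width so neighboring cells have similar colors. The sort key and column count are assumptions.

```python
import colorsys
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]


def hue_key(rgb: RGB) -> Tuple[float, float]:
    """Sort key: hue first, then value (brightness), both in [0, 1]."""
    h, _s, v = colorsys.rgb_to_hsv(*(c / 255 for c in rgb))
    return (h, v)


def grid_layout(colors: Dict[str, RGB], columns: int) -> List[List[str]]:
    """Sort terms by color, then wrap them into rows of `columns` terms."""
    if columns < 1:
        raise ValueError("columns must be positive")
    # The term itself breaks ties so identical colors keep a stable order.
    ordered = sorted(colors, key=lambda term: (hue_key(colors[term]), term))
    return [ordered[i:i + columns] for i in range(0, len(ordered), columns)]
```

Sorting once and slicing into rows keeps generation time close to a single sort, which matters when every term in a large code base gets a cell.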
The parse_folders example is similar to "Parse Repo" above, but instead parses folders on the local file system and generates a color gradient. This is the color gradient across all extensions for the model:
Or instead, you can generate an interactive version that allows for mousing over colors to see terms, and clicking on the list of extensions to see relevant terms. The opacity corresponds to the relative count of the term for the extension. For spack, most terms will be derived from Python files.
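The exact parsing and opacity formula aren't spelled out above, so here is a standard-library sketch under two assumptions: tokens are identifier-like words found with a simple regular expression, and "relative count" means a term's count within an extension divided by the count of that extension's most frequent term (dividing by the extension's total token count would be an equally plausible reading). Both function names are hypothetical.

```python
import re
from collections import Counter, defaultdict
from pathlib import Path
from typing import Dict, List

# Hypothetical tokenizer: identifier-like words only, punctuation dropped.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokens_by_extension(root: Path) -> Dict[str, List[str]]:
    """Walk `root` and collect every token, grouped by file extension."""
    corpus: Dict[str, List[str]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Files without a suffix (e.g. "Dockerfile") are grouped by name.
        ext = path.suffix or path.name
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        corpus[ext].extend(TOKEN.findall(text))
    return dict(corpus)


def term_opacity(tokens: List[str]) -> Dict[str, float]:
    """Opacity per term: its count divided by the top term's count."""
    counts = Counter(tokens)
    if not counts:
        return {}
    top = max(counts.values())
    return {term: n / top for term, n in counts.items()}
```

Calling `term_opacity(tokens_by_extension(Path("spack"))[".py"])` would give the per-term opacities for Python files, where the most frequent term is fully opaque and rarer terms fade proportionally.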
A GitHub workflow is provided to create a gallery! The workflow opens a pull request and deploys the files to the docs folder. You can see the live version here.