Search your Jupyter notebook files (.ipynb) from bash.

gitjeff05/kapitsa

KAPITSA (v0.1.9)

Search your Jupyter notebooks.

Motivation

As the number of projects grows, it becomes difficult to keep notebooks organized and searchable on your local machine.

See the blog post for how this project got started and the tools behind it.

Solution

Kapitsa is a simple, minimally configured command-line program that provides a centralized way to search and keep track of your notebooks. You configure the paths where you keep your notebooks, and Kapitsa provides convenience commands to do the following:

  1. Search Code - Query your notebooks' source.
  2. Search Tags - Query your notebooks' cell tags.
  3. List Recent - List notebooks you have worked on recently.
  4. List Directories - View all directories on your system that contain notebooks.

Benefits & Use Cases

Being able to search every notebook on your local machine, including notebooks you pulled from GitHub, makes finding code examples easier.

Use Cases:

  1. Have you ever solved some complex problem and thought bookmarking the solution for later would be beneficial? With Kapitsa, you just tag the cell with a relevant "keyword" and find it later with kapitsa tags "keyword".

  2. Having trouble remembering how to use some API in [insert library here]? Just run kapitsa search "function" to search your notebooks for all cells containing function in their source.

  3. Where was that notebook you worked on 2 weeks ago? Running kapitsa recent will jog your memory.
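Cell tags are stored in each cell's metadata.tags array inside the notebook JSON. As a rough sketch of what a tag search looks like with jq (the function name cells_with_tag and the trimmed sample notebook are made up for illustration; this is not Kapitsa's actual code):

```shell
# Trimmed sample notebook for illustration; only the first cell carries tags.
nb="$(mktemp)"
cat > "$nb" <<'EOF'
{
  "cells": [
    {"cell_type": "code", "metadata": {"tags": ["pandas", "join"]},
     "source": ["df = left.join(right)"]},
    {"cell_type": "code", "metadata": {},
     "source": ["print('untagged')"]}
  ],
  "metadata": {}, "nbformat": 4, "nbformat_minor": 4
}
EOF

# Hypothetical helper: print cells where any tag matches the regex in $1.
# Cells without a tags array are skipped by the `[]?` iteration.
cells_with_tag() {
  jq -c --arg re "$1" \
    '[.cells[] | select(any(.metadata.tags[]?; test($re))) | {tags: .metadata.tags, source}]' \
    "$2"
}

# Usage: cells_with_tag "join" "$nb"
```

Tagging a cell in Jupyter adds the keyword to that array, which is why a tag search can find the cell later without looking at its source.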

API

Command                           Description
kapitsa search regex [-p path]    Search notebook source.
kapitsa tags regex [-v]           Search notebook cell tags.
kapitsa list [path]               List all paths containing .ipynb files.
kapitsa recent [num_days]         List recently modified notebooks (default last 60 days)
kapitsa list-tags [path]          List all defined tags.
kapitsa clear notebook            Remove cell outputs and execution_count from code cells in notebook
kapitsa [help|h]                  Print help info.
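
For a sense of what kapitsa clear involves, the jq filter below resets outputs and execution_count on every code cell and prints the result to stdout. It is a sketch under assumptions: clear_outputs is a hypothetical name, and whether Kapitsa empties these fields or removes them is not stated here.

```shell
# Hypothetical helper, not Kapitsa's implementation.
# Prints a copy of the notebook in $1 with code-cell outputs emptied and
# execution_count set to null; other cell types pass through unchanged.
clear_outputs() {
  jq '.cells |= map(
        if .cell_type == "code"
        then .outputs = [] | .execution_count = null
        else .
        end)' "$1"
}

# Usage: clear_outputs analysis.ipynb > analysis.clean.ipynb
```

Writing to stdout keeps the original file untouched; redirect to a new file if you want to keep the cleaned copy.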

Quick Examples

Command                                Description
kapitsa list .                         List all paths in the current directory containing .ipynb files
kapitsa search "join"                  Print cells matching "join"
kapitsa search "(join|concat)"         Print cells matching "join" or "concat"
kapitsa tags "(pandas|where)"          Print cells with tags "pandas" or "where"
kapitsa tags "(?=.*where)(?=.*loc)"    Print cells with tags "where" and "loc"

Verbose Examples

Find notebook cells matching "join".

> kapitsa search "join"
Found 2 matching cells in /Users/Me/File.ipynb
[
  {
    "source": [
      "df = df.join(zip_codes.loc[:, ['Zip', 'Latitude', 'Longitude']].set_index('Zip'), on='ZIPCODE')"
    ]
  },
  {
    "source": [
      "# join two dataframes on the original dataframes' column. Index must be set appropriately on second df.\n",
      "us = us.join(zips_to_coords.loc[:, ['Zip', 'Lat', 'Long']].set_index('Zip'), on='ZIPCODE')"
    ]
  }
]
...

Find cells matching "series" OR "dataframe".

> kapitsa search "series|dataframe"

Find cells matching "pandas" AND "numpy".

> kapitsa search "(?=.*pandas)(?=.*numpy)"

Note: The above command is admittedly complex for a simple AND condition, and there may be a better way. For now, I wanted users to have complete control over the regular expression that is passed to jq for the search.
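
To see what the lookahead pattern does, here is the same AND condition run directly through jq on three made-up, single-line cell sources, next to an equivalent written with jq's own `and`. The second form only applies when you call jq yourself, since kapitsa search takes a single regex. The function names and sample data are illustrative, not part of Kapitsa.

```shell
# Made-up cell sources for illustration.
samples='["import pandas as pd; import numpy as np", "import pandas as pd", "import numpy as np"]'

# AND as a single regex with two lookaheads (the form shown above).
and_lookahead() {
  printf '%s' "$1" | jq -c '[.[] | select(test("(?=.*pandas)(?=.*numpy)"))]'
}

# The same condition as two test calls joined by jq's `and`.
and_explicit() {
  printf '%s' "$1" | jq -c '[.[] | select(test("pandas") and test("numpy"))]'
}

# Usage: and_lookahead "$samples"; and_explicit "$samples"
# Both print only the first sample, which mentions pandas and numpy.
```

Each lookahead checks for one word without consuming any input, so both must succeed from the same starting position for the pattern to match.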

Have a quick glance to see what notebooks you have worked on recently:

> kapitsa recent
Thu, 05 Mar 2020 17:05:29 -0500 /Users/Me/Github/States/train/Population.ipynb
Wed, 04 Mar 2020 11:42:24 -0500 /Users/Me/Github/Class/examples/Intro-To-Pandas.ipynb
Mon, 24 Feb 2020 22:58:51 -0500 /Users/Me/Projects/BigTime/train/Marketing-Plan.ipynb
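
A comparable listing can be produced with find alone. The sketch below is an assumption about the general approach rather than Kapitsa's own script: recent_notebooks is a hypothetical name, it prints paths without the timestamps shown above, and skipping .ipynb_checkpoints is a choice made for this example.

```shell
# Hypothetical helper: list notebooks modified within the last N days.
# $1: directory to search, $2: look-back window in days (default 60).
recent_notebooks() {
  find "$1" -type f -name '*.ipynb' \
    ! -path '*/.ipynb_checkpoints/*' \
    -mtime "-${2:-60}"
}

# Usage: recent_notebooks "$HOME/Projects" 14
```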

Print out a list of all paths containing notebook files:

> kapitsa list
/Users/Me/Github/kapitsa/examples/
/Users/Me/Projects/Classifiers/train/awards-grants/
/Users/Me/Projects/Classifiers/train/noaa/
/Users/Me/Tutorials/DeepLearning/Fish
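
The same kind of directory listing can be approximated with find and sort. This is a minimal sketch, assuming a plain recursive search; notebook_dirs is a hypothetical name and the checkpoint exclusion is this example's choice, not documented Kapitsa behavior.

```shell
# Hypothetical helper: print each directory under $1 (default: current
# directory) that directly contains at least one .ipynb file.
notebook_dirs() {
  find "${1:-.}" -type f -name '*.ipynb' \
    ! -path '*/.ipynb_checkpoints/*' \
    -exec dirname {} \; | sort -u
}

# Usage: notebook_dirs "$HOME/Github"
```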

Setup and install

Dependencies

Kapitsa requires jq 1.6 or later, which it uses to parse notebook documents (.ipynb files are JSON).
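
If you want to check the requirement from a script, something like the following may help. It is a sketch that assumes `jq --version` prints a string of the form "jq-1.6"; jq_version_ok is a hypothetical helper, and unusual version strings from custom builds are simply treated as too old.

```shell
# Hypothetical helper: succeed if the given version string is 1.6 or later.
# $1: a version string as printed by `jq --version`, e.g. "jq-1.6".
# Note: uses plain (global) shell variables to stay POSIX-compatible.
jq_version_ok() {
  v="${1#jq-}"                 # strip the "jq-" prefix
  major="${v%%.*}"             # text before the first dot
  rest="${v#*.}"               # text after the first dot
  minor="${rest%%[!0-9]*}"     # leading digits of the remainder
  [ "$major" -gt 1 ] 2>/dev/null ||
    { [ "$major" -eq 1 ] 2>/dev/null && [ "${minor:-0}" -ge 6 ] 2>/dev/null; }
}

# Usage:
# jq_version_ok "$(jq --version)" || echo "Kapitsa needs jq 1.6 or later" >&2
```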

Clone the repo.

> git clone https://github.com/gitjeff05/kapitsa
> cd kapitsa && chmod u+rx kapitsa # Make `kapitsa` executable by the user.

Create .kapitsa configuration file.

> echo '{"path":"$HOME/Github/kapitsa"}' > ~/.kapitsa # separate paths by ":"

Create an environment variable, KAPITSA, that points to your config file.

> export KAPITSA=$HOME/.kapitsa # add to your .bash_profile

Source kapitsa

> . ./kapitsa # add to your .bash_profile

Test that everything is working:

> kapitsa list

If you get a list of paths then you are good to go. Add more paths (separated by ':') to the path key in .kapitsa config file.
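
To inspect which paths a config file defines, you can split the path key on ":" with jq. This is a sketch, not Kapitsa's own parsing: config_paths is a hypothetical name, and entries are printed verbatim, so a literal $HOME stored in the file is not expanded here.

```shell
# Hypothetical helper: print one configured search path per line.
# $1: config file (defaults to the file named by $KAPITSA).
config_paths() {
  jq -r '.path | split(":") | .[]' "${1:-$KAPITSA}"
}

# Usage: config_paths ~/.kapitsa
```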

Get Help

> kapitsa help

Open a ticket for bug reports or feature requests.

Testing and compatibility:

Kapitsa has been tested on the following platforms.

  • GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin19)
  • zsh 5.7.1 (x86_64-apple-darwin18.2.0)
  • GNU bash, version 4.4.20(1)-release (x86_64-pc-linux-gnu)

If Kapitsa is not working for you, please open an issue with as much information as possible so we can try to get it working.

Advanced queries

Under the hood, Kapitsa passes your pattern to jq's test function, which accepts PCRE-style regular expressions (jq relies on the Oniguruma library). Anything you can pass to jq's test should also be valid for Kapitsa. Please open a ticket if you find any functionality missing.
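
When calling jq directly, test also accepts a second argument with flags, such as "i" for case-insensitive matching. The sketch below shows a source search built on that form. It is illustrative only: search_source is a hypothetical name, and whether Kapitsa exposes flags this way is not covered here.

```shell
# Hypothetical helper: print cells whose source matches a regex.
# $1: regex, $2: jq regex flags (e.g. "i"; may be empty), $3: notebook path.
# A cell's source may be a string or a list of strings, so lists are
# joined into one string before matching.
search_source() {
  jq -c --arg re "$1" --arg flags "$2" \
    '[.cells[]
      | select((.source | if type == "array" then join("") else . end)
               | test($re; $flags))
      | {source}]' "$3"
}

# Usage: search_source "JOIN" "i" notebook.ipynb
```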

File Hierarchy

> tree -L 2
.
├── LICENSE
├── README.md
├── examples
│   ├── dataframe_nih_2019.ipynb
│   ├── nih_2019.gzip
│   └── zip_geo.gzip
├── kapitsa
└── lib
    ├── find_in_notebook_source.sh
    ├── find_in_notebook_tags.sh
    ├── find_notebook_directories.sh
    └── find_recent_notebooks.sh

2 directories, 10 files

Note that each script in lib can be run on its own or through kapitsa.

Notes on implementation

This program is meant to be minimal and portable. Most of the functionality comes from find and jq. I have not included comprehensive install scripts that would do much of the setup for you (e.g., editing .bash_profile or adding files to your $HOME directory). Users should be at least somewhat familiar with the command line (i.e., .bash_profile or similar, alias, and chmod). Anyone can read the contents of kapitsa and see that it simply reads some files and produces output. Please open an issue to file a bug or request a feature.

Security

I have done my best to ensure that this code can do no harm. The primary use of this script is to read files and output the results. It does not write to directories or publish anything it finds. It does not track your usage. It makes no network requests at all. I encourage you to check the source code and open an issue to alert the author of any security vulnerabilities.

Future Ideas

  • A default set of examples for some languages/frameworks (e.g., python, pandas, numpy, scikit)
  • Caching cell metadata to make searches faster
  • Fetching metadata and source from remote locations. Users could essentially "subscribe" to other authors' preferred examples the same way bash users borrow shell configurations (.dotfiles) from authors.
  • A JupyterLab extension to facilitate the search right inside of JupyterLab.
  • Ability to build cheat sheets based on tags.

License

Licensed under the Apache License, Version 2.0