traytable provides methods for
- storing all information about a crystallization screen in dictionaries
- extracting and tabulating all data about "hits" into a pandas dataframe.
The goal of traytable is for all crystallization data to be entered once and only once, and then conveniently looked up and reused whenever needed.
You can find a jupyter notebook with a brief demonstration of package functionality here.
Full documentation via Read the Docs can be found here
pip install traytable
A super brief example:
import traytable as tt
myscreen = tt.screen(row='protein', col='PEG', maxwell='H6') # Each row is a different [protein], and each column is a different %PEG
tray1 = tt.tray(myscreen, rows=[1,8], cols=[10,20]) # Rows vary from 1 to 8, columns vary from 10 to 20
results = tt.well(tray1, 'A3', 'good') # there is a good crystal in well A3 of tray 1
results = tt.well(tray1, ['E4', 'E5'], 'needle', old_df=results) # there are needle-y crystals in wells E4 and E5 of tray 1
The value returned by tt.well(), here results, is a pandas dataframe in which every crystal you've logged gets its own row and every parameter you've indicated gets its own column. This makes it easy to keep track of the best conditions for your crystals across many trays with slightly different conditions. Note that when logging your "hits", there's no need to input [protein] or %PEG; that information is already encoded by the tray and well you specified!
Arguments to tt.screen():
- row: a string indicating the parameter that is encoded by each row in a tray
- col: a string indicating the parameter that is encoded by each column in a tray
- maxwell: a string indicating the name of the well in the bottom right corner of each tray. Any size tray is supported; however, currently, rows must be named with letters, and columns must be named with numbers
Arguments to tt.tray():
- screen: The screen, as created by tt.screen(), that this tray should inherit parameters from. You can't create a tray without a screen.
- rows: Specify the values to assign to each row with either a single number (to assign to all rows), a list of two numbers (to evenly space among the rows), or a list of numbers explicitly specifying a value for each row. With 8 rows, you might say rows=5, rows=[1,8], or rows=[1,2,3,4,6,8,10,12].
- cols: Specify values for columns, with the same format as for rows.
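The three accepted forms of rows and cols can be sketched in plain Python. This is only an illustration of the behaviour described above, not traytable's actual implementation; the helper name expand_values is hypothetical.

```python
def expand_values(spec, n):
    """Hypothetical helper (not part of traytable): expand a rows/cols
    specification into one value per row or column."""
    # A single number is assigned to every row/column.
    if isinstance(spec, (int, float)):
        return [spec] * n
    values = list(spec)
    # A list with one entry per row/column is used as given.
    if len(values) == n:
        return values
    # A list of two numbers is treated as endpoints and spaced evenly.
    if len(values) == 2 and n > 1:
        start, stop = values
        step = (stop - start) / (n - 1)
        return [start + i * step for i in range(n)]
    raise ValueError("expected a number, two endpoints, or one value per row/column")


constant = expand_values(5, 8)          # same value for all 8 rows
spaced = expand_values([1, 8], 8)       # evenly spaced between 1 and 8
explicit = expand_values([1, 2, 3, 4, 6, 8, 10, 12], 8)  # used as given
```

Under this sketch, a two-number list is read as explicit values only when the tray has exactly two rows or columns; whether traytable resolves that case the same way is an assumption.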
Arguments to tt.well():
- tray: The tray
- well: The well; must be a string of format '[letter][number]', and must fall into the range specified by the screen's maxwell
- quality: Any type, though I recommend either a short categorical string (e.g. 'good', 'bad', 'needles', or 'multilattice') or a numerical score, in order to best utilize the tools of pandas to manipulate and summarize your results.
- old_df: Not strictly required, but to append to previous results, pass the previous return value from tt.well() to the next call as the old_df argument.
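To make the well-name constraint concrete, here is a minimal sketch of the check it implies: one letter followed by a number, falling at or before maxwell in both directions. The function well_in_range is hypothetical and is not taken from traytable's source.

```python
import re

# One letter followed by one or more digits, e.g. 'A3' or 'H12'.
WELL_PATTERN = re.compile(r"([A-Za-z])(\d+)")


def well_in_range(well, maxwell):
    """Hypothetical check (not part of traytable): is `well` a valid
    '[letter][number]' name inside a tray whose last well is `maxwell`?"""
    w = WELL_PATTERN.fullmatch(well)
    m = WELL_PATTERN.fullmatch(maxwell)
    if w is None or m is None:
        raise ValueError("well names must look like '[letter][number]'")
    row_ok = w.group(1).upper() <= m.group(1).upper()
    col_ok = 1 <= int(w.group(2)) <= int(m.group(2))
    return row_ok and col_ok


inside = well_in_range("A3", "H6")    # row A <= H and column 3 <= 6
outside = well_in_range("E7", "H6")   # column 7 exceeds the tray width
```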
All three of these methods (tt.screen(), tt.tray(), and tt.well()) will accept any additional named arguments, and include them as columns in the final dataframe. As you would expect, arguments passed to tt.screen() will apply to all wells in all trays in the screen, and arguments passed to tt.tray() will apply to all wells in that tray. For example:
detailedscreen = tt.screen(row='protein', col='PEG', maxwell='H6', construct='HEWL', buffer='imidazole', bufferconc=20, salt='MnCl2', saltconc=125)
tray1 = tt.tray(detailedscreen, rows=[1,8], cols=[10,20], date='2021-01-01', setby='robot', weathernotes='very humid day')
results = tt.well(tray1, 'A1', 'good', appxnum=3, notes='rod-shaped')
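One way to picture how these extra arguments end up in a single row is as three dictionaries merged from the most general level to the most specific. The dictionary layout below is an assumption made for illustration; traytable's internal structure may differ.

```python
# Assumed, simplified stand-ins for the extra arguments given at each level.
screen_params = {"construct": "HEWL", "buffer": "imidazole", "bufferconc": 20}
tray_params = {"setby": "robot", "weathernotes": "very humid day"}
well_params = {"quality": "good", "appxnum": 3, "notes": "rod-shaped"}

# Merging in this order means every well carries its screen's and tray's values.
row = {**screen_params, **tray_params, **well_params}
```

Each key of row would correspond to one column of the final dataframe.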
To save some typing, you can create trays with tt.clonetray(). Usage is newtray = tt.clonetray(oldtray, **kwargs). Any additional arguments passed to clonetray() will supersede the associated parameter from the parent tray. For example:
# assume screen already exists
tray1 = tt.tray(screen, rows=[1,8], cols=[5,10], date='2021-01-02')
tray2 = tt.clonetray(tray1, date='2021-01-03')
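The override behaviour can be mimicked with ordinary dictionaries: copy the parent, then let any new keyword arguments replace the matching entries. The function clone_with_overrides and the keys used here are hypothetical and only illustrate the idea.

```python
def clone_with_overrides(parent, **overrides):
    """Hypothetical stand-in for the cloning idea: return a new dict in which
    `overrides` supersede the parent's entries, leaving the parent unchanged."""
    return {**parent, **overrides}


parent_tray = {"rows": [1, 8], "cols": [5, 10], "date": "2021-01-02"}
child_tray = clone_with_overrides(parent_tray, date="2021-01-03")
```

The child keeps the parent's rows and cols, and only the date differs.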
A crystal will frequently have two dates associated with it: when the tray was set, and when the crystal is being logged. Two things happen to address this:
- Arguments named 'date' passed to tt.tray() and tt.well() automatically become columns named 'date_set' and 'date_logged', respectively.
- If both 'date's are present and in ISO format (YYYY-MM-DD), they are subtracted (via the datetime module) to compute a new column, days_elapsed. This is an especially important data point in crystallization, so it makes sense to give it special treatment. This also avoids the redundant input of date set, date logged, and days elapsed, when the last is of course determined by the first two.
As mentioned above, tt.well() returns a pandas dataframe. This means that you can use pandas methods and features as desired. One frequent usage might be printing out only select columns with bracket notation, or accessing a certain column with dot notation, e.g.
concise_results = results[['protein', 'PEG', 'quality']]
or
import numpy as np
number_of_crystals = np.sum(results.appxnum)  # appxnum is a column, so it is not called
You can also use the built-in plotting backend of pandas, which can be handy for visualizing which conditions are working best.
results.plot.scatter('protein', 'PEG')
A slightly fancier plot:
import numpy as np
results['proteinplot'] = results.protein + np.random.normal(scale=0.15, size=len(results))
results['PEGplot'] = results.PEG + np.random.normal(scale=0.15, size=len(results))
colordict= {'good':'green',
'needles':'red'}
results.plot.scatter('proteinplot', 'PEGplot', alpha=0.5, c=results.quality.map(colordict))
Results from subsequent calls to tt.well() are appended via an "outer_join", meaning that columns present in one dataframe but not the other will give NaN values where appropriate, but no errors. This gives flexibility to vary the kinds of details you include across different trays and wells, while still keeping the "core" data common to all crystals in one place.
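The same outer-join behaviour can be reproduced directly with pandas, which may help when combining results by hand. This sketch uses pd.concat with join='outer'; whether traytable uses this exact call internally is an assumption, and the values are made up for illustration.

```python
import pandas as pd

# Two single-row results; only the second includes a 'notes' column.
first = pd.DataFrame([{"protein": 1.0, "PEG": 14.0, "quality": "good"}])
second = pd.DataFrame(
    [{"protein": 5.0, "PEG": 16.0, "quality": "needle", "notes": "rod-shaped"}]
)

# join='outer' keeps the union of columns; missing entries become NaN.
combined = pd.concat([first, second], join="outer", ignore_index=True)
```

The first row of combined has NaN under notes, while the columns shared by both dataframes are fully populated.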