Add some notes on initial evaluation tooling
awwaiid committed Nov 23, 2024
1 parent badbccd commit f7ee1d8
README.md: 9 additions, 0 deletions
@@ -68,6 +68,15 @@ Draw some stuff on your screen, and then trigger the assistant by *touching/tapp
* It is able to use an almost identical tool-use setup, so I should be able to merge the two
* So far it seems to like drawing a bit more, but it is not great at drawing and not much better at spatial awareness
* Maybe next in the queue will be augmenting spatial awareness through some image pre-processing and result positioning. Like detect bounding boxes, segments, etc., feed those into the model, and have the model return an array of SVGs and where they should be positioned. Maybe.
* **2024-11-22** - Manual Evaluations
* Starting to sketch out how an evaluation might work
* First I've added a bunch of parameters for recording input/output
* Then I use that to record a sample input and output on the device
* Then I added support to run ghostwriter on my laptop using the pre-captured input
* Next I will build some tooling around iterating on examples given different prompts or pre-processing
* And then if I can get enough examples maybe I'll have to make an AI judge to scale :)
* To help with that, one idea is to overlay the original input with the output, rendering the output in a different color so the judge can tell them apart
* So far this technique is looking good for SVG output, but it'd be nice to somehow render keyboard output locally too. That is trickier since the keyboard input rendering is done by the reMarkable app
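* The record/replay steps above could be captured by a small harness. A minimal sketch, assuming each captured example is saved as one JSON file; the file layout, field names, and `record_example` helper are illustrative assumptions, not ghostwriter's actual format:

```python
import base64
import datetime
import json
import pathlib


def record_example(input_png: bytes, prompt: str, output_svg: str,
                   out_dir: str = "evals") -> pathlib.Path:
    """Save one captured input/output pair so it can be replayed off-device.

    The input screenshot is base64-encoded so everything fits in one
    JSON file that a laptop-side runner can load later.
    """
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    path = pathlib.Path(out_dir) / f"{stamp}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({
        "input_png_b64": base64.b64encode(input_png).decode(),
        "prompt": prompt,
        "output_svg": output_svg,
    }, indent=2))
    return path
```

  A replay runner would then iterate over the files in `evals/`, decode the screenshot, and re-run the model with a different prompt or pre-processing step for comparison.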
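* The overlay-for-the-judge idea could be sketched like this, assuming both the captured input and the model output are available as SVG: merge the two documents and force the output's strokes to a highlight color so the judge can tell them apart. The `overlay_svgs` helper and its attribute handling are assumptions for illustration:

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)


def overlay_svgs(input_svg: str, output_svg: str,
                 output_color: str = "red") -> str:
    """Merge two SVG documents, recoloring the output strokes so a judge
    (human or AI) can distinguish the model's drawing from the original."""
    base = ET.fromstring(input_svg)
    overlay = ET.fromstring(output_svg)
    for el in overlay.iter():
        # Recolor anything stroked (or any path) in the model's output
        if el.get("stroke") is not None or el.tag.endswith("}path"):
            el.set("stroke", output_color)
    # Append the recolored output elements as a group inside the input
    group = ET.SubElement(base, f"{{{SVG_NS}}}g")
    for child in list(overlay):
        group.append(child)
    return ET.tostring(base, encoding="unicode")
```

  Since the handwriting input may actually be a raster screenshot rather than SVG, a variant of this would rasterize both layers and composite them instead.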

## Ideas
* [DONE] Matt showed me his iOS super calc that just came out, take inspiration from that!
