Troubleshooting, testing and live coding
TODO: This page is a draft and shall be expanded.
You can use Cascading traps with Cascalog to capture tuples whose processing fails. To store those tuples in a sink tap (for example a local file or an hfs-textline tap), use the :trap keyword with an error sink:
(def errors (lfs-textline "file:///tmp/people.bad_records" :sinkmode :replace))
;; or (stdout) or (hfs-textline "hdfs:///tmp/...") if running on Hadoop
(<- [?name ?age]
    (people ?name ?age)
    (:trap errors)
    (< ?age 40))
You can use the functions and macros from the cascalog.testing namespace together with clojure.test to test your queries. See Cascalog's own tests for examples.
The best way to get started with testing is, however, to check out the documentation and examples of the midje-cascalog library. It provides, for example, fact?- to execute a query and compare its output with the expected tuples, or a form such as (facts query => (produces [[3 10] [1 5] [5 11]])), where query is defined as (def query (<- ...)). Read Sam Ritchie's blog post Cascalog Testing 2.0 for more details and examples of midje-cascalog 0.4.0.
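As a rough sketch of the clojure.test route mentioned above, the following test checks the people query against an in-memory source. It assumes a Cascalog 1.x style setup where test?<- lives in cascalog.testing (newer versions may place it in a different namespace); the namespace name example.people-test and the sample data are made up for illustration.

```clojure
(ns example.people-test            ; hypothetical namespace
  (:use clojure.test
        cascalog.api
        cascalog.testing))         ; assumption: 1.x namespace layout

;; An in-memory collection stands in for the real source tap.
(def people [["ben" 21] ["jim" 42]])

(deftest people-under-40
  ;; test?<- takes the expected tuples first, then the usual
  ;; output variables and predicates of a query.
  (test?<- [["ben" 21]]
           [?name ?age]
           (people ?name ?age)
           (< ?age 40)))
```

Running the test needs Cascalog and its Hadoop dependencies on the classpath.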
There are certain features that support live, interactive coding:
- Use simple Clojure collections as data sources, for example (def people [["ben" 21] ["jim" 42]]).
- During development you can easily change parts of your Cascalog code into standard Clojure functions and call them from the REPL, for example turn a custom operator into a plain function by replacing (defaggregateop with (defn.
- Queries can, of course, be executed from the REPL.
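The defaggregateop-to-defn swap can be tried out in plain Clojure, as in the minimal sketch below. The aggregator sum-ages and the helper run-aggregator are hypothetical names introduced here; the helper only approximates how the three arities (initial state, step, finish) are used, which is an assumption about the operator's shape rather than Cascalog's actual execution.

```clojure
;; Hypothetical aggregator written as a plain function for REPL use.
;; Inside a query it would start with (defaggregateop sum-ages
;; instead of (defn sum-ages.
(defn sum-ages
  ([] 0)                         ; initial state
  ([state age] (+ state age))    ; fold one value into the state
  ([state] [state]))             ; turn the final state into a result tuple

;; Simple collection used as the data source.
(def people [["ben" 21] ["jim" 42]])

;; Hypothetical helper: drives the three arities by hand.
(defn run-aggregator [agg values]
  (agg (reduce agg (agg) values)))

(run-aggregator sum-ages (map second people))
;; => [63]
```

Once the function behaves as expected at the REPL, switching defn back to defaggregateop makes it usable in a query again.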