tuulos edited this page Sep 12, 2010 · 14 revisions

RulesOfThumb for M/R programming

This page lists open bugs, issues and wishes about Disco. If you want to add a new item, please add your contact info after the item so we know who to ping for more information.

bugs

  • “Kill job” doesn’t always work (tuulos)
  • Tracebacks are formatted incorrectly on the status page (tuulos)
  • Filter doesn’t work correctly (tuulos)

wishlist

  • For each job, create a process that encapsulates the job’s information. This way job info can be queried from various modules in the system without carrying a large record around. Once this is done, use the mechanism to parse the Python client’s version from the request so that a corresponding interpreter can be used on the nodes. This should solve the problem of mismatched Python versions. (tuulos)
  • Disco.job() implementation for other languages besides Python, using the external interface (tuulos)
  • General speed-ups: Replace urllib with pycurl, rewrite netstr_reader (tuulos)
  • Support for streaming data between maps and reduces: If sorting is disabled, we could stream map outputs to reduces directly, without writing any intermediate files, and without the reduces needing to wait for maps to finish. (tuulos)
  • A way to stop map / reduce before all data has been consumed (tuulos)
  • Separate users / groups: A personal joblist etc. (tuulos)
  • Distribute params files to multiple servers: this would fix the issue of all tasks trying to retrieve the same params file from the master simultaneously when they start (tuulos)
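
As a rough illustration of the streaming wish above, here is a minimal sketch in plain Python generators. It is not Disco code and does not use the Disco API: `map_words`, `reduce_counts` and the `events` log are hypothetical names made up for this example. It only shows the idea that, when no sorting is needed, a reduce can consume each map output as soon as it is produced, without an intermediate file.

```python
from collections import Counter

# Hypothetical event log, used only to make the interleaving visible.
events = []

def map_words(lines):
    """Hypothetical map: lazily yield (word, 1) pairs, one at a time."""
    for line in lines:
        for word in line.split():
            events.append(("map", word))
            yield word, 1

def reduce_counts(pairs):
    """Hypothetical reduce: consume pairs as they arrive, in any order."""
    counts = Counter()
    for word, n in pairs:
        events.append(("reduce", word))
        counts[word] += n
    return dict(counts)

lines = ["a b a", "b c"]
# No intermediate list or file: the reduce pulls directly from the map.
result = reduce_counts(map_words(lines))
```

Because the map is a generator, each pair is reduced right after it is emitted, so the `events` log alternates between map and reduce entries. The same laziness means a reduce that stops iterating early also stops the map from producing further output, which is related to the "stop before all data has been consumed" item.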