Home
Rules of Thumb for Map/Reduce programming
This page lists open bugs, issues and wishes about Disco. If you want to add a new item, please add your contact info after the item so we know who to ping for more information.
- “Kill job” doesn’t always work (tuulos)
- Tracebacks are formatted incorrectly on the status page (tuulos)
- Filter doesn’t work correctly (tuulos)
- For each job, create a process that encapsulates the job’s information. This way job info can be queried from various modules in the system without carrying a large record around. Once this is done, use the mechanism to parse the Python client’s version from the request so that a matching interpreter can be used on the nodes. This should solve the problem of mismatching Python versions. (tuulos)
- Disco.job() implementation for other languages besides Python, using the external interface (tuulos)
- General speed-ups: Replace urllib with pycurl, rewrite netstr_reader (see the netstring reader sketch after this list) (tuulos)
- Support for streaming data between maps and reduces: If sorting is disabled, map outputs could be streamed to reduces directly, without writing any intermediate files and without the reduces having to wait for the maps to finish (see the streaming sketch after this list). (tuulos)
- A way to stop map / reduce before all data has been consumed (tuulos)
- Separate users / groups: a personal job list etc. (tuulos)
- Distribute params files to multiple servers: fix the issue where all tasks try to retrieve the same params file from the master simultaneously when they start (see the jittered fetch sketch after this list) (tuulos)
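
For the netstr_reader rewrite mentioned above, here is a minimal sketch of a pure-Python netstring reader. It assumes the classic `<length>:<payload>,` framing; the exact format handled by Disco's netstr_reader may differ, so treat this only as an illustration of the parsing loop that needs to be fast.

```python
def read_netstrings(stream, chunk_size=8192):
    """Yield payloads from a binary file-like object containing netstrings."""
    buf = b""
    while True:
        colon = buf.find(b":")
        if colon == -1:
            chunk = stream.read(chunk_size)
            if not chunk:
                if buf.strip():
                    raise ValueError("truncated netstring")
                return
            buf += chunk
            continue
        length = int(buf[:colon])
        # Buffer the whole payload plus the trailing comma before slicing.
        while len(buf) < colon + 1 + length + 1:
            chunk = stream.read(chunk_size)
            if not chunk:
                raise ValueError("truncated netstring")
            buf += chunk
        payload = buf[colon + 1:colon + 1 + length]
        if buf[colon + 1 + length:colon + 2 + length] != b",":
            raise ValueError("missing ',' terminator")
        yield payload
        buf = buf[colon + 2 + length:]

if __name__ == "__main__":
    import io
    print(list(read_netstrings(io.BytesIO(b"5:hello,5:world,"))))  # [b'hello', b'world']
```

Buffered reads in large chunks are usually where the time goes in a reader like this, which is also why switching the HTTP layer to pycurl is listed alongside it.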
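
The streaming idea can be illustrated with plain Python generators. This is a toy, single-process sketch, not Disco's actual pipeline (where maps and reduces run on separate nodes): it only shows that when no sort is required, the reduce can consume map output lazily, as it is produced, with no intermediate files.

```python
def map_stream(lines, map_fn):
    # Lazily apply the map function; nothing is written to disk.
    for line in lines:
        for key, value in map_fn(line):
            yield key, value

def reduce_stream(pairs, reduce_fn, state=None):
    # Consume map output as it is produced; reduce starts before map finishes.
    for key, value in pairs:
        state = reduce_fn(key, value, state)
    return state

# Example: streaming word count into a dict.
def wc_map(line):
    for word in line.split():
        yield word, 1

def wc_reduce(key, value, counts):
    counts = counts if counts is not None else {}
    counts[key] = counts.get(key, 0) + value
    return counts

if __name__ == "__main__":
    lines = ["disco is a map reduce framework", "streaming map reduce with disco"]
    print(reduce_stream(map_stream(lines, wc_map), wc_reduce))
```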
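
Until the params file is mirrored across several servers, one task-side mitigation is to spread out the requests. The sketch below is hypothetical (the `fetch_params` helper and its URL argument are not part of Disco's API): it simply adds an initial random delay and jittered retries so that hundreds of tasks starting at once do not all hit the master at exactly the same moment.

```python
import random
import time
import urllib.request  # or pycurl, as suggested by the speed-up item above

def fetch_params(url, max_jitter=5.0, retries=4):
    """Fetch the params file with an initial random delay and jittered backoff."""
    time.sleep(random.uniform(0.0, max_jitter))   # spread out the first request
    delay = 1.0
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except OSError:
            if attempt == retries - 1:
                raise
            time.sleep(delay + random.uniform(0.0, delay))  # jittered backoff
            delay *= 2
```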