This repository has been archived by the owner on Jun 16, 2023. It is now read-only.

Release 2.1.0

@wuchong wuchong released this 12 Nov 10:02
· 241 commits to master since this release

This release was prepared for the Alibaba Global Shopping Festival on November 11th, 2015.

New features

  1. Totally redesign the Web UI
    1. Make the UI more beautiful.
    2. Greatly improve Web UI speed.
    3. Add cluster/topology-level summarized metrics for the most recent 30 minutes.
    4. Add a DAG to the Web UI that supports user interaction to get key information such as emit, tuple lifecycle, and TPS.
  2. Redesign the Metrics/Monitor System
    1. New metrics core: supports sampling with more metrics, avoids noise, and merges metrics automatically for the user.
    2. No metrics are stored in ZK.
    3. Support metrics HA.
    4. Add more useful metrics, such as tuple lifecycle, netty metrics, and disk space; get worker memory accurately.
    5. Support external storage plugins to store metrics.
  3. Implement Smart Backpressure
    1. With Smart Backpressure the dataflow is more stable, and backpressure is not triggered by noise.
    2. Backpressure is easy to control manually.
  4. Implement TopologyMaster
    1. Redesign the heartbeat mechanism to easily support 6000+ tasks.
    2. Collect all tasks' metrics and merge them, relieving pressure on Nimbus.
    3. Act as the central control coordinator that issues control commands.
  5. Redesign ZK usage so that one set of ZK supports 2000+ hardware nodes.
    1. No dynamic data in ZK, such as heartbeats, metrics, or monitor status.
    2. Nimbus visits ZK less frequently when serving the thrift API.
    3. Reduce ZK visit frequency by merging some task-level ZK nodes.
    4. Reduce ZK visit frequency by removing useless ZK nodes, such as empty taskerror nodes.
    5. Tune the ZK cache.
    6. Optimize the ZK reconnect mechanism.
  6. Tune Executor Batch performance
    1. Add a smart batch size setting.
    2. Remove memory copy.
    3. Issue tuples directly, without batching, for internal channels.
    4. Set the default serialization/deserialization method to Kryo.
  7. Set the default serialization/deserialization method to Kryo to improve performance.
  8. Support dynamic reload of binary/configuration.
  9. Tune LocalShuffle performance: set 3 priority levels (local worker, local node, other node) and add dynamic checks of queue status and connection status.
  10. Optimize Nimbus HA: only the highest-priority Nimbus instances can be promoted to master.
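
The three-level LocalShuffle priority in item 9 can be pictured with a minimal sketch. This is not JStorm's implementation: the `Target` class, its fields, and the `candidates` method are hypothetical names introduced only for illustration, and the availability rule (connected and queue not full) is an assumed reading of "queue status, connection status".

```java
import java.util.ArrayList;
import java.util.List;

public class LocalShuffleSketch {
    /** Hypothetical view of a downstream task; not a JStorm class. */
    public static final class Target {
        final int taskId;
        final String workerId;
        final String nodeId;
        final boolean connected; // assumed stand-in for "connection status"
        final boolean queueFull; // assumed stand-in for "queue status"

        public Target(int taskId, String workerId, String nodeId,
                      boolean connected, boolean queueFull) {
            this.taskId = taskId;
            this.workerId = workerId;
            this.nodeId = nodeId;
            this.connected = connected;
            this.queueFull = queueFull;
        }

        boolean available() {
            return connected && !queueFull;
        }
    }

    /**
     * Returns the task ids of the best non-empty priority level:
     * local worker first, then local node, then any other node.
     * Unavailable targets are skipped before the levels are compared.
     */
    public static List<Integer> candidates(List<Target> targets,
                                           String localWorker,
                                           String localNode) {
        List<Integer> sameWorker = new ArrayList<>();
        List<Integer> sameNode = new ArrayList<>();
        List<Integer> otherNode = new ArrayList<>();
        for (Target t : targets) {
            if (!t.available()) {
                continue;
            }
            if (t.workerId.equals(localWorker)) {
                sameWorker.add(t.taskId);
            } else if (t.nodeId.equals(localNode)) {
                sameNode.add(t.taskId);
            } else {
                otherNode.add(t.taskId);
            }
        }
        if (!sameWorker.isEmpty()) {
            return sameWorker;
        }
        if (!sameNode.isEmpty()) {
            return sameNode;
        }
        return otherNode;
    }
}
```

In this sketch a tuple falls back to a lower level only when every target at the higher level is disconnected or has a full queue, which is one plausible way to combine the priority levels with the dynamic status checks.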

Improvement

  1. Supervisor automatically dumps the worker's jstack/jmap when the worker's status is invalid.
  2. Supervisor can generate more ports according to memory.
  3. Supervisor can attempt to download the binary more times.
  4. Support setting logdir in the configuration.
  5. Add configuration "nimbus.host.start.supervisor".
  6. Add GC logs for supervisor/nimbus/drpc.
  7. Adjust JVM parameters:
    1. Set -Xmn to 1/2 of heap memory.
    2. Set PermSize to 1/32 and MaxPermSize to 1/16 of heap memory.
    3. Set -Xms by "worker.memory.min.size".
  8. Refine the ZK error schema; when a worker is dead, the UI reports an error.
  9. Add functions to the zktool utility: support removing all topology znodes and support list.
  10. Optimize the netty client.
  11. Dynamically update connected task status by network connection, not by ZK znode.
  12. Add configuration "topology.enable.metrics".
  13. Classify all topology logs into one directory by topologyName.
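
The JVM sizing ratios in item 7 can be worked through with a small sketch. The class and method names below are hypothetical, the inclusion of `-Xmx` is an assumption (the notes only mention -Xmn, PermSize, MaxPermSize, and -Xms), and how JStorm actually assembles the worker command line is not shown in these notes.

```java
public class WorkerJvmArgsSketch {
    /**
     * Builds an illustrative JVM argument string from a heap size in MB.
     * minHeapMb stands in for the value of "worker.memory.min.size",
     * assumed here to be already converted to megabytes.
     */
    public static String jvmArgs(long heapMb, long minHeapMb) {
        long youngGenMb = heapMb / 2;   // -Xmn is 1/2 of heap memory
        long permMb = heapMb / 32;      // PermSize is 1/32 of heap memory
        long maxPermMb = heapMb / 16;   // MaxPermSize is 1/16 of heap memory
        return String.format(
            "-Xms%dm -Xmx%dm -Xmn%dm -XX:PermSize=%dm -XX:MaxPermSize=%dm",
            minHeapMb, heapMb, youngGenMb, permMb, maxPermMb);
    }

    public static void main(String[] args) {
        // A 2048 MB heap with a 1024 MB minimum, chosen only as an example.
        System.out.println(jvmArgs(2048, 1024));
    }
}
```

For a 2048 MB heap this yields a 1024 MB young generation, a 64 MB PermSize, and a 128 MB MaxPermSize. The PermSize flags apply only to JVMs before Java 8, which matches the 2015 release date.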

Bug fix

  1. Skip downloading the same binary when the assignment has changed.
  2. Skip starting the worker when the binary is invalid.
  3. Use the correct configuration map in many worker threads.
  4. Nimbus checks the topologyName as the first step when a topology is submitted.
  5. Support fieldGrouping for Object[].
  6. For drpc single instance under one configuration
  7. In the client topologyNameExists interface, use the thrift API directly.
  8. Fix restart failure caused by contention in the topology cleanup thread.

Deploy and scripts

  1. Optimize cleandisk.sh to avoid deleting useful worker logs.