Release 2.1.0
This version is for the Alibaba Global Shopping Festival, November 11th, 2015.
New features
- Completely redesign the Web UI
  - Make the UI more beautiful.
  - Greatly improve Web UI speed.
  - Add cluster/topology-level summarized metrics for the most recent 30 minutes.
  - Add a DAG to the Web UI, with user interaction to get key information such as emit, tuple lifecycle, and TPS.
- Redesign the Metrics/Monitor system
  - New metrics core: supports sampling with more metrics, avoids noise, and merges metrics automatically for the user.
  - No metrics are stored in ZK.
  - Support metrics HA.
  - Add more useful metrics, such as tuple lifecycle, Netty metrics, and disk space; get worker memory accurately.
  - Support an external storage plugin to store metrics.
- Implement Smart Backpressure
  - With Smart Backpressure the dataflow is more stable, and noise is prevented from triggering it.
  - Backpressure is easy to control manually.
- Implement TopologyMaster
  - Redesign the heartbeat mechanism to easily support 6000+ tasks.
  - Collect all tasks' metrics and merge them, relieving pressure on Nimbus.
  - Act as the central control coordinator that issues control commands.
- Redesign ZK usage so that one set of ZK supports more than 2000 hardware nodes
  - No dynamic data in ZK, such as heartbeats, metrics, or monitor status.
  - Nimbus visits ZK less frequently when serving the Thrift API.
  - Reduce ZK visit frequency by merging some task-level ZK nodes.
  - Reduce ZK visit frequency by removing useless ZK nodes, such as empty taskerror nodes.
  - Tune the ZK cache.
  - Optimize the ZK reconnect mechanism.
- Tune Executor batch performance
  - Add a smart batch size setting.
  - Remove memory copies.
  - Issue tuples directly, without batching, on internal channels.
- Set the default serialization/deserialization method to Kryo to improve performance.
- Support dynamic reload of binaries/configuration.
- Tune LocalShuffle performance: set three priority levels (local worker, local node, other node) and add dynamic checks of queue status and connection status.
- Optimize Nimbus HA: only the highest-priority Nimbus instances can be promoted to master.
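
The three-level LocalShuffle priority mentioned above can be pictured with a small sketch. This is not JStorm's implementation: the class `LocalShuffleSketch`, the `Tier` enum, the `Target` type, and the two boolean health flags are all hypothetical names introduced here, and the rule "fall through to the next tier only when the current one has no usable target" is an assumption about how the priority and the dynamic queue/connection checks might combine.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these types and names are illustrative, not JStorm classes.
final class LocalShuffleSketch {
    // Declaration order doubles as priority order: earlier tiers are preferred.
    enum Tier { LOCAL_WORKER, LOCAL_NODE, OTHER_NODE }

    static final class Target {
        final int taskId;
        final Tier tier;
        final boolean queueHasRoom; // assumed result of the dynamic queue status check
        final boolean connected;    // assumed result of the connection status check

        Target(int taskId, Tier tier, boolean queueHasRoom, boolean connected) {
            this.taskId = taskId;
            this.tier = tier;
            this.queueHasRoom = queueHasRoom;
            this.connected = connected;
        }

        boolean usable() {
            return queueHasRoom && connected;
        }
    }

    // Returns the usable task ids from the highest-priority tier that has any;
    // an empty list means no target is currently usable.
    static List<Integer> candidates(List<Target> targets) {
        for (Tier tier : Tier.values()) {
            List<Integer> ids = new ArrayList<>();
            for (Target t : targets) {
                if (t.tier == tier && t.usable()) {
                    ids.add(t.taskId);
                }
            }
            if (!ids.isEmpty()) {
                return ids;
            }
        }
        return new ArrayList<>();
    }
}
```

In this sketch a caller would then shuffle tuples among the returned task ids, so traffic stays inside the local worker when possible and only spills to the local node or other nodes when closer targets are full or disconnected.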
Improvement
- Supervisor automatically dumps worker jstack/jmap when a worker's status is invalid.
- Supervisor can generate more ports according to memory.
- Supervisor can attempt to download the binary more times.
- Support setting logdir in the configuration.
- Add configuration "nimbus.host.start.supervisor"
- Add GC logs for supervisor/nimbus/drpc.
- Adjust JVM parameters: 1. set -Xmn to 1/2 of heap memory; 2. set PermSize to 1/32 and MaxPermSize to 1/16 of heap memory; 3. set -Xms from "worker.memory.min.size".
- Refine the ZK error schema; when a worker is dead, the UI will report an error.
- Add functions to the zktool utility: support removing all topology znodes, support list.
- Optimize netty client.
- Dynamically update connected task status based on the network connection, not on ZK znodes.
- Add configuration "topology.enable.metrics".
- Classify all topology logs into one directory by topologyName.
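
The JVM sizing ratios listed above (-Xmn at 1/2 of heap, PermSize at 1/32, MaxPermSize at 1/16) work out as in the following sketch. It is illustrative arithmetic only, not JStorm code: the class and method names are hypothetical, sizes are assumed to be in megabytes, "heap memory" is assumed to mean the value passed as -Xmx, and the second argument stands in for a "worker.memory.min.size" value already converted to megabytes.

```java
import java.util.Locale;

// Hypothetical helper: illustrates the ratios from the release notes only.
final class WorkerJvmOptsSketch {
    // heapMb: assumed to be the worker's maximum heap (-Xmx), in MB.
    // minHeapMb: assumed to be "worker.memory.min.size" expressed in MB.
    static String jvmOpts(long heapMb, long minHeapMb) {
        long youngGenMb = heapMb / 2;   // -Xmn is 1/2 of heap memory
        long permMb = heapMb / 32;      // PermSize is 1/32 of heap memory
        long maxPermMb = heapMb / 16;   // MaxPermSize is 1/16 of heap memory
        return String.format(Locale.ROOT,
                "-Xms%dm -Xmx%dm -Xmn%dm -XX:PermSize=%dm -XX:MaxPermSize=%dm",
                minHeapMb, heapMb, youngGenMb, permMb, maxPermMb);
    }

    public static void main(String[] args) {
        // A 4096 MB heap with a 1024 MB minimum size.
        System.out.println(jvmOpts(4096, 1024));
    }
}
```

For a 4096 MB heap this yields a 2048 MB young generation, a 128 MB initial permanent generation, and a 256 MB maximum permanent generation.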
Bug fix
- Skip downloading the same binary when the assignment has changed.
- Skip starting the worker when the binary is invalid.
- Use the correct configuration map in many worker threads.
- When a topology is submitted, Nimbus checks the topologyName as the first step.
- Support fieldGrouping for Object[]
- Ensure drpc runs as a single instance under one configuration.
- In the client topologyNameExists interface, use the Thrift API directly.
- Fix failure to restart caused by a race in the topology cleanup thread.
Deploy and scripts
- Optimize cleandisk.sh to avoid deleting useful worker logs.