# Barbarian big data system

Barbarian is the world's best cloud-first, cloud-agnostic in-memory big data system founded on Apache Hadoop for enterprise-ready parallel distributed data processing at scale.

Read more at:
[https://barbarians.io/](https://barbarians.io)

Docs at:
[http://docs.barbarians.io/](http://docs.barbarians.io)

### About Barbarian
The Barbarian Data System is an in-memory, distributed, massively parallel processing (MPP) data warehousing engine designed to be deployed to Kubernetes clusters, offering Apache Hive for powerful and flexible SQL-based analytics. Barbarian includes an integrated in-memory filesystem and can run in three modes of operation:
* As an in-memory, standalone data warehousing system
* As a data warehousing system backed by an external storage system such as Amazon S3
* In a hybrid mode, where the external storage system is the primary store and common paths are mounted to the in-memory filesystem
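
The sketch below illustrates the first two modes. It creates one managed table in the default warehouse location and one external table whose data lives in Amazon S3. It makes several assumptions: the warehouse directory sits on the integrated in-memory filesystem, S3A credentials are already configured, and `beeline` can reach HiveServer2. The host name `hiveserver2`, the bucket `my-bucket` and both table names are hypothetical. The hybrid mode depends on how paths are mounted and is not shown. The script only prints the command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only: one table per storage mode (all names are hypothetical).
set -eu

# Echo a command; execute it only when DRY_RUN=0 (the default is a dry run).
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Hypothetical HiveServer2 service name; 10000 is HiveServer2's default port.
JDBC_URL="jdbc:hive2://${HS2_HOST:-hiveserver2}:10000/default"

# Managed table: data goes to the default warehouse location
# (assumed here to be on the in-memory filesystem).
# External table: data stays in S3 under an s3a:// location.
SQL="CREATE TABLE IF NOT EXISTS hot_events (id BIGINT, payload STRING) STORED AS ORC;
CREATE EXTERNAL TABLE IF NOT EXISTS archived_events (id BIGINT, payload STRING)
  STORED AS ORC LOCATION 's3a://my-bucket/warehouse/archived_events';"

run beeline -u "$JDBC_URL" -e "$SQL"
```

The table's `LOCATION` decides where Hive reads and writes the data. Dropping the external table should therefore leave the S3 objects in place.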

Barbarian includes Apache Hive LLAP and Apache Tez, with transactional tables enabled by default.
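
Here is a minimal sketch of working with a transactional table through `beeline`. It declares `'transactional'='true'` explicitly rather than relying on the cluster-wide default. In Hive 3, full ACID tables must be managed tables stored as ORC. The host name `hiveserver2` and the table name `acid_demo` are hypothetical. The script only prints the command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only: insert, update and delete rows in a transactional Hive table.
set -eu

# Echo a command; execute it only when DRY_RUN=0 (the default is a dry run).
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Hypothetical HiveServer2 service name; 10000 is HiveServer2's default port.
JDBC_URL="jdbc:hive2://${HS2_HOST:-hiveserver2}:10000/default"

# Full ACID requires a managed ORC table; UPDATE and DELETE only work on it.
SQL="CREATE TABLE IF NOT EXISTS acid_demo (id INT, name STRING)
  STORED AS ORC TBLPROPERTIES ('transactional'='true');
INSERT INTO acid_demo VALUES (1, 'a'), (2, 'b');
UPDATE acid_demo SET name = 'z' WHERE id = 1;
DELETE FROM acid_demo WHERE id = 2;
SELECT * FROM acid_demo;"

run beeline -u "$JDBC_URL" -e "$SQL"
```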

Barbarian's integrated in-memory filesystem, built on Apache Ignite, is distributed and parallel, and is resilient to node failure, with replication enabled by default.

Barbarian has no single point of failure.

Barbarian is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).

### Installing Barbarian

Barbarian can be deployed to your Kubernetes cluster with just two commands:
```shell
helm repo add barbarians http://charts.barbarians.io/barbarian
helm install barbarians/barbarian
```
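
The two commands above use Helm 2 syntax, in which the release name is optional. With Helm 3, `helm install` needs a release name (or `--generate-name`). The sketch below assumes Helm 3 and uses the hypothetical release name `barbarian`. It only prints each command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only: install the Barbarian chart with Helm 3.
# Assumes helm (v3) is on PATH and the current kube context targets your cluster.
set -eu

# Echo a command; execute it only when DRY_RUN=0 (the default is a dry run).
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Add the chart repository, refresh the index, install, then report status.
install_barbarian() {
  run helm repo add barbarians http://charts.barbarians.io/barbarian
  run helm repo update
  # Helm 3 takes the release name as the first argument to 'helm install'.
  run helm install "$1" barbarians/barbarian
  run helm status "$1"
}

# Hypothetical default release name; override with RELEASE=...
install_barbarian "${RELEASE:-barbarian}"
```

For example, saving the script as `install.sh` (an arbitrary name) and running `DRY_RUN=0 RELEASE=analytics ./install.sh` would install the chart under the release name `analytics`.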
## Hive container image
This repo contains the configuration files and build scripts for the Barbarian Hadoop Distribution **Hive container image**.

The latest release of the Hive container image is based on the following Apache Software Foundation release:
- Apache Hive 3.1

## Releases
| Release | Notes |
| -- | -- |
| 0.1 | Pre-release 1 |
| 0.2 | Barbarian Data System r2 |

## Building
See the Codefresh build configuration in `.codefresh`.
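
To build the image outside Codefresh, a local build might look like the sketch below. It assumes the Dockerfile sits at the repository root and that no build arguments are needed. Both are assumptions, so check the pipeline definition in `.codefresh` for the actual build context, Dockerfile path and arguments. The tag `barbarian-hive:local` is hypothetical.

```shell
#!/bin/sh
# Sketch only: build the Hive image locally instead of through Codefresh.
# Assumes a Dockerfile at the repository root; the tag is hypothetical.
set -eu

# Echo a command; execute it only when DRY_RUN=0 (the default is a dry run).
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

IMAGE_TAG="${IMAGE_TAG:-barbarian-hive:local}"
run docker build -t "$IMAGE_TAG" .
```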

## Running

This image is designed to be run as part of the Barbarian Hadoop distribution, a Kubernetes-based platform for data processing at scale, founded on free software developed by the [Apache Software Foundation](https://www.apache.org/).

## Features

The image includes support for the following features:
- Hive
- Tez
- LLAP
- LLAP on YARN Services deployment model |
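
To check that queries actually run in LLAP, a session can pin the execution engine and mode and then inspect the query plan. The sketch assumes that HiveServer2 is reachable as `hiveserver2` on the default port and that a table named `my_table` exists; both names are hypothetical. It also assumes the cluster lets these properties be overridden per session. `hive.execution.engine`, `hive.execution.mode` and `hive.llap.execution.mode` are standard Hive configuration properties. With `DRY_RUN=0`, vertices that run in LLAP should be marked with an `llap` execution mode in the `EXPLAIN` output.

```shell
#!/bin/sh
# Sketch only: ask HiveServer2 to plan a query on Tez with LLAP daemons.
set -eu

# Echo a command; execute it only when DRY_RUN=0 (the default is a dry run).
run() {
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Hypothetical HiveServer2 service name; 10000 is HiveServer2's default port.
JDBC_URL="jdbc:hive2://${HS2_HOST:-hiveserver2}:10000/default"

# Run EXPLAIN for the given query with Tez as the engine and LLAP for all work.
explain_in_llap() {
  run beeline -u "$JDBC_URL" \
    --hiveconf hive.execution.engine=tez \
    --hiveconf hive.execution.mode=llap \
    --hiveconf hive.llap.execution.mode=all \
    -e "EXPLAIN $1"
}

# 'my_table' is a hypothetical table name.
explain_in_llap "SELECT COUNT(*) FROM my_table;"
```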