Update readme.md
grobbie authored Oct 17, 2018
1 parent 4aedce1 commit 1d81253
Showing 1 changed file with 39 additions and 40 deletions.
79 changes: 39 additions & 40 deletions readme.md
@@ -1,63 +1,62 @@
# Barbarian big data system
Barbarian is the world's best cloud-first, cloud-agnostic big data system founded on Apache Hadoop for enterprise-ready parallel distributed data processing at scale.

Barbarian is the world's best cloud-first, cloud-agnostic in-memory big data system founded on Apache Hadoop for enterprise-ready parallel distributed data processing at scale.

Read more at:
[https://barbarians.io/](https://barbarians.io)

## Hive Docker image
Docs at:
[http://docs.barbarians.io/](http://docs.barbarians.io)

This repo contains the configuration files and build scripts for the Barbarian Hadoop **Hive Docker image**.
### About Barbarian

The latest release of the Hive Docker image is based on the following Apache Foundation software releases:
- Apache Hadoop 2.8.4
- Apache Hive 3.1.0
- Apache Ignite 2.6 (patched)
- Apache Tez 0.9.2
- Apache Slider 0.6
The Barbarian Data System is an in-memory, parallel, distributed (MPP) data warehousing engine designed to be deployed to Kubernetes clusters, offering Apache Hive for powerful and flexible SQL-based analytics. Barbarian includes an integrated in-memory filesystem and can run in three modes of operation:
* As an in-memory, standalone data warehousing system
* As a data warehousing system backed by an external storage system like Amazon S3
* In a hybrid mode, where primary storage is the external storage system, with common paths mounted to the in-memory filesystem

## Releases
Barbarian's features include Apache Hive LLAP and Tez, with transactional tables enabled by default.

| Release | Notes |
| -- | -- |
| 0.1-INTERNAL | Prerelease PoC for demo |
Barbarian's integrated Ignite in-memory distributed parallel filesystem is resilient to node failure with replication enabled by default.

## Building
Barbarian has no single points of failure.

Run the build script at ```$WORKING_DIR/build-image.sh```
Barbarian is offered with the [Apache v2.0](https://www.apache.org/licenses/LICENSE-2.0) software license.

## Running
### Installing Barbarian

This image is designed to be run as part of the Barbarian Hadoop distribution - a Kubernetes-based platform for data processing at scale, founded on free software developed by the [Apache Software Foundation](https://www.apache.org/).
Barbarian can be deployed to your Kubernetes cluster with just two commands:

Launch this image using [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) and the provided [yaml](http://yaml.org) configuration files (see the ./yaml directory). Successful deployment depends on:
- A running Kubernetes cluster with sufficient resources to deploy the full platform
- A running ZooKeeper ensemble
- A running and configured MySQL database server (automation, a move to PostgreSQL, and automated initialization/upgrade are planned)
- A running Ignite IGFS cluster
- An AWS S3 bucket and associated (AWS IAM restricted) access keys
- An (internal, firewalled) webserver that hosts the access keys
- A running YARN ResourceManager
- A running cluster of YARN NodeManagers
```helm repo add barbarians http://charts.barbarians.io/barbarian```
```helm install barbarians/barbarian```
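The two commands above can be wrapped in a small script so they can be reviewed before anything touches the cluster. This is only a sketch: `DRY_RUN`, `run`, and `install_barbarian` are hypothetical names introduced for illustration, and the Helm commands are copied as written in the README. They use the older Helm 2 form; Helm 3 additionally expects a release name or `--generate-name` on `helm install`.

```shell
#!/bin/sh
# Sketch only: wraps the two install commands from the README.
# DRY_RUN is a hypothetical variable introduced for this example.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Register the chart repository, then install the chart from it.
install_barbarian() {
  run helm repo add barbarians http://charts.barbarians.io/barbarian || return 1
  run helm install barbarians/barbarian
}

install_barbarian
```

Running the script with `DRY_RUN=0` would execute the commands against whatever cluster the current kubeconfig context points to.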

Launch a 3-node Hive metastore cluster with ```kubectl create -f yaml/metastore.yaml```
## Hive container image

Launch a 4-node LLAP cluster with ```kubectl create -f yaml/hiveserver2.yaml```
This repo contains the configuration files and build scripts for the Barbarian Hadoop Distribution **Hive container image**.

*** Please ensure that you set up the necessary secrets files and run the secrets setup script in the [barbarian-tooling](https://github.com/go-barbarians/barbarian-tooling) repo, or the Metastore's database will not be accessible ***
*** Please ensure that you initialize an RDS instance using ```hive/bin/schematool -init```, or the Metastore's database will not be usable ***
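The launch steps above can be sketched as a single script that applies the manifests in the order the README lists them (metastore first, then HiveServer2/LLAP). `YAML_DIR`, `DRY_RUN`, and `launch_hive` are hypothetical names introduced for illustration; the manifest file names are the ones given above, and the script assumes the prerequisites and secrets are already in place.

```shell
#!/bin/sh
# Sketch only: create the Hive manifests in the order listed in the README.
# YAML_DIR and DRY_RUN are hypothetical variables introduced for this example.
YAML_DIR="${YAML_DIR:-yaml}"
DRY_RUN="${DRY_RUN:-1}"

launch_hive() {
  for manifest in metastore.yaml hiveserver2.yaml; do
    if [ "$DRY_RUN" = "1" ]; then
      # Dry run: show what would be executed.
      echo "kubectl create -f $YAML_DIR/$manifest"
    else
      # Stop at the first failure so later manifests are not created.
      kubectl create -f "$YAML_DIR/$manifest" || return 1
    fi
  done
}

launch_hive
```

Stopping at the first failed `kubectl create` keeps HiveServer2 from being launched when the metastore manifest could not be created.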
The latest release of the Hive container image is based on the following Apache Foundation software releases:
- Apache Hive 3.1

## Features
## Releases

The image includes support for the following Hive features:
- An initpod to launch Hive LLAP daemons (Low Latency Analytical Processing, also known as Live Long and Process) on the cluster
- An initpod to wait for the Hive LLAPd service to come up
- Apache Tez execution engine
- Standalone Metastore in HA
- Hiveserver2
| Release | Notes |
| -- | -- |
| 0.1 | Prerelease 1 |
| 0.2 | Barbarian Data System r2 |

## What's still to do
## Building

- Support for LDAP, Kerberos & Apache Ranger will follow
See .codefresh

## Running

This image is designed to be run as part of the Barbarian Hadoop distribution - a Kubernetes-based platform for data processing at scale, founded on free software developed by the [Apache Software Foundation](https://www.apache.org/).

## Features

The image includes support for the following features:
- Hive
- Tez
- LLAP
- LLAP on YARN Services deployment model
