hbase-hive-solr-Integrated

Requirements

Docker version 20 and above
Linux System (preferably Ubuntu 18 and above, with over 12 GB of RAM and sufficient disk space, say 20+ GB)

Instructions

Running Solr using Compose

After cloning the repository, in one terminal instance, run the following

sudo docker-compose up

Or

sudo docker compose up -d

Before running the following commands download the data.csv file from here

Running Hbase and Hive Integrations :

In another terminal instance,

Build the image from the repository,
```
sudo docker build -t <image-name> .
```

Create and Run the container instance,

sudo docker run -it --name <container-name> --hostname bigdata -p 50070:50070 -p 8088:8088 -p 10020:10020 -p 9042:9042 -p 10000:10000 -p 10001:10001 -p 10002:10002 <image-name>

When the Hive Shell pops up, enter the following Hive Queries,

CREATE TABLE data (id STRING, step BIGINT, type STRING, amount FLOAT, nameOrig STRING, oldbalanceOrg FLOAT, newbalanceOrig FLOAT, nameDest STRING, oldbalanceDest FLOAT, newbalanceDest FLOAT,isFruad  TINYINT, isFlaggedFruad TINYINT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:step,cf1:type,cf1:amount,cf1:nameOrig,cf1:oldbalanceOrg,cf1:newbalanceOrig,cf1:nameDest,cf1:oldbalanceDest,cf1:newbalanceDest,cf1:isFruad,cf1: isFlaggedFruad")
TBLPROPERTIES ("hbase.table.name" = "data");
CREATE TABLE temp_data (id STRING, step BIGINT, type STRING, amount FLOAT, nameOrig STRING, oldbalanceOrg FLOAT, newbalanceOrig FLOAT, nameDest STRING, oldbalanceDest FLOAT, newbalanceDest FLOAT,isFruad  TINYINT, isFlaggedFruad TINYINT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA LOCAL INPATH 'data.csv' OVERWRITE INTO TABLE temp_data;
INSERT OVERWRITE TABLE data SELECT b.* FROM temp_data b;
select count(*) from data;
CREATE EXTERNAL TABLE solr_2 (id STRING, step_l BIGINT, type_s STRING, amount_f FLOAT, nameOrig_s STRING, oldbalanceOrg_f FLOAT, newbalanceOrig_f FLOAT, nameDest_s STRING, oldbalanceDest_f FLOAT, newbalanceDest_f FLOAT,isFruad_i  TINYINT, isFlaggedFruad_i TINYINT)
STORED BY 'com.lucidworks.hadoop.hive.LWStorageHandler'
TBLPROPERTIES('solr.server.url' = 'http://172.17.0.3:8983/solr',
'solr.collection' = 'solr_analysis',
'solr.query' = '*:*');
describe solr_2;
INSERT OVERWRITE TABLE solr_2 SELECT b.* FROM data b;

Note that " 'solr.server.url' = 'http://172.17.0.3:8983/solr' " is the Docker IP allocated to the before said Solr container, which can be viewed by

sudo docker network inspect bridge

Now the data in 'data.csv' file is indexed and mapped to Solr collection using Hbase-Hive bridge.

Note

You can make changes in the Dockerfile provided if you don't want to pop up the hive shell (Details are provided inside Dockerfile comments)
The Solr instance is using local file system for storing indexes, not HDFS. (ToDo)

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
Dockerfile		Dockerfile
README.md		README.md
commands.hql		commands.hql
docker-compose.yml		docker-compose.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

hbase-hive-solr-Integrated

Contents :

Here is the apropriate Docker Images for setting up Apache Hbase + Hive in one container and Apache Solr in another. The steps to run these are as follows

Requirements

Instructions

Running Solr using Compose

Running Hbase and Hive Integrations :

Note

About

Releases

Packages

Languages

Shru-04/hbase-hive-solr-Integrated

Folders and files

Latest commit

History

Repository files navigation

hbase-hive-solr-Integrated

Contents :

Here is the apropriate Docker Images for setting up Apache Hbase + Hive in one container and Apache Solr in another. The steps to run these are as follows

Requirements

Instructions

Running Solr using Compose

Running Hbase and Hive Integrations :

Note

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages