This tutorial is an introduction to loading Cloudant data into Apache Spark and saving the data into Db2. In this tutorial you will:
- Create a Python notebook to load the Cloudant data in Watson Studio.
- Save the Apache Spark DataFrame into Db2 Warehouse on Cloud.
- View the data in the Db2 Warehouse on Cloud table.
These are the IBM Watson Studio services required to run this tutorial:
Watch the Getting Started on IBM Cloud video to create an account. You can download the Python notebook referenced in this tutorial or create your own notebook by copying and pasting the code into a new notebook.
Note: For the Db2 Warehouse on Cloud service, you'll need to locate and copy the service credentials. These will be required for saving the Spark data into a Db2 Warehouse table.
- Run the statement below, which imports and initializes a SparkSession.
from pyspark.sql import SparkSession
# Reuse the notebook's existing SparkSession, or create one if none exists
spark = SparkSession.builder.getOrCreate()
- Paste the following statement into the second cell, and then click Run. This command reads the animaldb database from the Cloudant examples account and assigns it to the cloudantdata variable.
# Read the animaldb database through the Apache Bahir Cloudant connector
cloudantdata = spark.read.format("org.apache.bahir.cloudant") \
    .option("cloudant.host", "examples.cloudant.com") \
    .load("animaldb")
- Paste the following statement into the third cell, and then click Run. This command prints the schema of the cloudantdata DataFrame.
cloudantdata.printSchema()
- Paste the following statement into the fourth cell, and then click Run. This code persists the DataFrame into a Db2 Warehouse table, writing 10 documents into a table named animaldb. Replace the db2_jdbc_url, user, and password values with the jdbcurl, username, and password fields from your Db2 Warehouse service credentials.
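# JDBC connection properties: use the username and password
# from your Db2 Warehouse service credentials
properties = {
    'user': 'username',
    'password': 'password',
    'driver': 'com.ibm.db2.jcc.DB2Driver'
}
# Use the jdbcurl value from your service credentials
db2_jdbc_url = 'jdbc:db2://***:50000/BLUDB'
# Save Spark DataFrame to Db2 Warehouse
cloudantdata.write.jdbc(db2_jdbc_url, 'animaldb', properties=properties)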
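You can build the URL and properties that cloudantdata.write.jdbc expects directly from the credentials, instead of pasting each value by hand. The sketch below assumes the service credentials are available as a flat JSON string containing the jdbcurl, username, and password fields named above. The helper name db2_jdbc_settings is hypothetical and is not part of any IBM library.

```python
import json
from typing import Dict, Tuple


def db2_jdbc_settings(credentials_json: str) -> Tuple[str, Dict[str, str]]:
    """Return (jdbc_url, properties) for DataFrameWriter.jdbc.

    Assumption: credentials_json is a flat JSON object that contains the
    jdbcurl, username, and password fields of the Db2 Warehouse service
    credentials. db2_jdbc_settings is a hypothetical helper name.
    """
    creds = json.loads(credentials_json)
    # Fail early with a clear message if a required field is absent.
    missing = [k for k in ("jdbcurl", "username", "password") if k not in creds]
    if missing:
        raise KeyError("credentials missing fields: " + ", ".join(missing))
    properties = {
        "user": creds["username"],
        "password": creds["password"],
        "driver": "com.ibm.db2.jcc.DB2Driver",
    }
    return creds["jdbcurl"], properties
```

With this helper, you unpack db2_jdbc_url, properties = db2_jdbc_settings(credentials_text) and then call cloudantdata.write.jdbc(db2_jdbc_url, 'animaldb', mode='overwrite', properties=properties). Here credentials_text is a placeholder for wherever you keep the credentials. Spark's default save mode raises an error if the table already exists. Passing mode='overwrite' replaces the table, so you can rerun the cell.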
- In the Bluemix dashboard, go to your Db2 Warehouse on Cloud service.
- On the Manage tab, click Open.
- In the Db2 console, click the Explore tab and select the schema that matches your username.
- Select the ANIMALDB table under the selected schema and click View Data.
- You should now see a list of ten documents, each with a unique animal name.
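If you would rather check the result with SQL than by browsing, you can query the row count directly. Db2 folds unquoted identifiers to upper case, which is consistent with the table appearing as ANIMALDB. The sketch below assumes that the schema is your username, also folded to upper case. The helper name verification_sql and the sample username are hypothetical.

```python
def verification_sql(username: str, table: str = "animaldb") -> str:
    """Build a row-count query for the table written in this tutorial.

    Assumption: the schema is your Db2 username and neither identifier was
    quoted when it was created, so Db2 stored both in upper case.
    """
    schema = username.strip().upper()
    return f"SELECT COUNT(*) FROM {schema}.{table.strip().upper()}"


if __name__ == "__main__":
    # "my_db2_user" is a hypothetical username; substitute your own.
    print(verification_sql("my_db2_user"))
```

You can run the resulting statement from any SQL tool connected to the database. For the animaldb data, it should return 10.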
To learn more about Cloudant and work with your own Cloudant database, check out the Cloudant NoSQL DB IBM Watson Studio service.
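To read your own database instead of the public animaldb sample, the reader needs your account host and, for a non-public database, credentials. The sketch below builds the option map for the same org.apache.bahir.cloudant reader. The cloudant.host option is the one used earlier in this tutorial. The cloudant.username and cloudant.password options come from the Apache Bahir sql-cloudant documentation, so verify them against the connector version in your environment. The helper name is hypothetical.

```python
from typing import Dict, Optional


def cloudant_read_options(
    host: str,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> Dict[str, str]:
    """Return options for spark.read.format("org.apache.bahir.cloudant").

    Assumption: the connector accepts cloudant.username and cloudant.password
    alongside cloudant.host. cloudant_read_options is a hypothetical helper.
    """
    if not host:
        raise ValueError("host is required, e.g. ACCOUNT.cloudant.com")
    # Credentials only make sense as a pair.
    if (username is None) != (password is None):
        raise ValueError("supply both username and password, or neither")
    options = {"cloudant.host": host}
    if username is not None and password is not None:
        options["cloudant.username"] = username
        options["cloudant.password"] = password
    return options
```

You would pass the result to the reader with spark.read.format("org.apache.bahir.cloudant").options(**cloudant_read_options("ACCOUNT.cloudant.com", "USERNAME", "PASSWORD")).load("DATABASE"). The uppercase values in that call are placeholders.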