Introduction to NoSQL Databases (Peer Graded Assignment)

Course 10 from 13 (Part of IBM Data Engineering Professional Certificate by Coursera)

Having hard time on up and running IBM Skill Network Lab, I decided doing the assignment fully using WSL Ubuntu 20.04 on my local Windows 10 Pro x64. There are some preparation step to do it:

Install npm
Install openjdk 8 jre
Using npm, install couchimport
Get mongodb-database-tools for ubuntu
Install mongodb (make sure you can access mongodb cli)
Install cassandra (make sure you can access the cqlsh)
Set up the IBM Cloudant connection (use credential from IBM Cloudant instance), using curl - make sure you can access your Cloudant DB from local ubuntu terminal

Here is my complete syntax on each step, for the whole assignment (screenshoot provided):


== 2 create index on table movies
curl -X POST $CLOUDANTURL/movies/_index \
-H"Content-Type: application/json" \
-d'{
    "index": {
        "fields": ["Director"]
    }
}'
== 3 create query on specific director
curl -X POST $CLOUDANTURL/movies/_find 

-H"Content-Type: application/json" 

-d'{
"selector":
{
"Director":"Richard Gage"
}
}'
== 4 create index title
curl -X POST $CLOUDANTURL/movies/_index 

-H"Content-Type: application/json" 

-d'{
"index": {
"fields": ["Title"]
}
}'
==5 query to list only the “year” and “Director” keys for the ‘Top Dog’
curl -X POST $CLOUDANTURL/movies/_find 

-H"Content-Type: application/json" 

-d'{ "selector":
{
"title":"Top Dog"

},
"fields":["title","year","Director"]
}'
==6 export from cloudant to json local
couchexport --url $CLOUDANTURL --db movies --type jsonl > movies.json
==7 import json to mongodb
mongoimport --authenticationDatabase admin --db entertainment --collection movies --file movies.json
==8 query to find the year in which most number of movies were released
db.movies.aggregate([
{"$group" : {_id:"$year", count:{$sum:1}}},
{$sort:{count:-1}},
{"$limit":1}
])
db.movies.aggregate([
{"$group" : {_id:"$year", count:{$sum:1}}},
{$sort:{count:-1}}
])
==9 query to find the count of movies released after the year 1999
db.movies.aggregate([
{"$group" : {_id:"$year", count:{$sum:1}}},
{$match: {_id: {"$gt":1999}}},
{$sort:{_id:1}}
])
==10 query to find out the average votes for movies released in 2007
db.movies.aggregate([
{$match: {year: 2007}},
{$group: {_id: "$title",averageVotes: {$avg: "$Votes"}}},
{$sort:{avg_vote:-1}}
]);
==11 Export the fields _id, “title”, “year”, “rating” and “director” from the ‘movies’ collection into a file named partial_data.csv
mongoexport --authenticationDatabase admin --db entertainment --collection movies --out partial_data.csv --type=csv --fields _id,title,year,rating,Director
==12 Import partial_data.csv into cassandra server into a
==keyspace named entertainment and table named movies
CREATE KEYSPACE entertainment WITH replication = {'class':'SimpleStrategy', 'replication_factor' : 3};
use entertainment;
CREATE TABLE movies(id int PRIMARY KEY,title text,year int,rating int,Director text);
==13 Write a cql query to count the number of rows in the movies table
select count(*) from entertainment.movies;
==14 Create an index for the column rating in the movies table using cql
create index rating_index on movies(rating);
==15 Write a cql query to count the number of in the movies that are rated 'G'
select count(*) from entertainment.movies where rating='G';

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
1 replication.jpg		1 replication.jpg
10 vote 2007.jpg		10 vote 2007.jpg
11 partial export.jpg		11 partial export.jpg
12 import cassandra.jpg		12 import cassandra.jpg
13 query movies.jpg		13 query movies.jpg
14 index movies.jpg		14 index movies.jpg
15 rating G.jpg		15 rating G.jpg
2 index movies.jpg		2 index movies.jpg
3 query director.jpg		3 query director.jpg
4 index title.jpg		4 index title.jpg
5a query title.jpg		5a query title.jpg
6 export.jpg		6 export.jpg
7 import mongodb.jpg		7 import mongodb.jpg
8a year count.jpg		8a year count.jpg
9 movies after 1999.jpg		9 movies after 1999.jpg
README.md		README.md
movie.json		movie.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Introduction to NoSQL Databases (Peer Graded Assignment)

Course 10 from 13 (Part of IBM Data Engineering Professional Certificate by Coursera)

About

Releases

Packages

andreiramani/Introduction-to-NoSQL-Databases

Folders and files

Latest commit

History

Repository files navigation

Introduction to NoSQL Databases (Peer Graded Assignment)

Course 10 from 13 (Part of IBM Data Engineering Professional Certificate by Coursera)

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages