DDF - Distributed DataFrame

DDF aims to make Big Data easy yet powerful, by bringing together the best ideas from R Data Science, RDBMS/SQL, and Big Data distributed processing.

It exposes high-level abstractions such as RDBMS tables, SQL queries, data cleansing and transformations, and machine-learning algorithms, as well as collaboration and authentication, while hiding the complexities of parallel distributed processing and data handling.

DDF is a general abstraction that can be implemented on multiple execution and data engines. We provide a native implementation on Apache Spark, as it is today the most expressive in its DAG parallelization and the most powerful in its in-memory distributed dataset abstraction (RDD). With this release, DDF provides native Spark support for R, Python, Java, and Scala.
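To give a flavor of what API-user code looks like, here is a short Scala sketch of loading data via SQL and computing summary statistics. The package, class, and method names here (DDFManager.get, sql2ddf, getSummary) are illustrative assumptions rather than confirmed signatures; see the examples directory for actual API-user code.

// Illustrative sketch only: package, class, and method names below are
// assumptions about the DDF API, not confirmed signatures.
import com.adatao.ddf.DDFManager                    // assumed package layout

val manager = DDFManager.get("spark")               // pick the Spark engine implementation
val ddf = manager.sql2ddf("SELECT * FROM airline")  // run SQL, get back a DDF handle
val summary = ddf.getSummary                        // per-column summary statistics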

One aim of the DDF project is to refocus Big Data conversations on top-down, user-focused simplicity and power, where "users" include business analysts, data scientists, and high-level Big Data engineers.


Directory Structure

Directory   Description
bin         useful helper scripts
exe         DDF execution/launch scripts and executables
conf        DDF configuration files
clients     DDF client code, e.g., R, Python, etc.
contrib     Contributed DDF code that does not (yet) fit into the core API
core        DDF core API
spark       DDF Spark implementation
examples    DDF example API-user code
project     Scala build config files

Getting Started

First clone or fork a copy of DDF, e.g.:

$ git clone http://git.adatao.com/DDF 

Now prepare the build: this sets up the required libraries and generates pom.xml files in the various sub-project directories, along with Eclipse .project and .classpath files.

$ cd DDF
$ bin/run-once.sh

If you ever need to regenerate the pom.xml files:

$ bin/make-poms.sh

The following regenerates Eclipse .project and .classpath files:

$ bin/make-eclipse-projects.sh

Building DDF_core or DDF_spark

$ (cd core ; mvn clean package)
$ (cd spark ; mvn clean package)

Running tests

$ bin/sbt test

or

$ (cd core ; mvn test)
$ (cd spark ; mvn test)
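
To run a single test class with Maven, you can use the standard Surefire -Dtest filter (this assumes the tests run through the Surefire plugin; SomeTestClass below is just a placeholder):

$ (cd core ; mvn test -Dtest=SomeTestClass)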