Hadoop MapReduce InputFormat/OutputFormat for TFRecords

This directory contains an Apache Hadoop MapReduce InputFormat/OutputFormat implementation for TensorFlow's TFRecords format. It can also be used with Apache Spark.
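A TFRecords file is a sequence of length-prefixed, checksummed records. The OutputFormat writes records in this framing, and the InputFormat splits them back into payloads. The sketch below shows that framing using only the JDK. It assumes the standard TFRecord layout: a little-endian uint64 length, the masked CRC32C of those 8 length bytes, the payload, and then the masked CRC32C of the payload. The class name `TFRecordFrame` is hypothetical, and this is not the code in this package. `java.util.zip.CRC32C` requires Java 9 or later.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.CRC32C;

/**
 * Hypothetical sketch of the TFRecord framing (not this package's code).
 * Layout of one record, all integers little-endian:
 *   uint64 length
 *   uint32 masked CRC32C of the 8 length bytes
 *   byte   data[length]
 *   uint32 masked CRC32C of data
 * Requires Java 9+ for java.util.zip.CRC32C.
 */
final class TFRecordFrame {
    // Constant added when masking a CRC, as in TensorFlow's record format.
    private static final int MASK_DELTA = 0xa282ead8;

    private TFRecordFrame() {}

    /** Plain CRC32C (Castagnoli) over bytes[off, off + len). */
    static int crc32c(byte[] bytes, int off, int len) {
        CRC32C crc = new CRC32C();
        crc.update(bytes, off, len);
        return (int) crc.getValue();
    }

    /** Masked CRC: rotate right by 15 bits, then add MASK_DELTA (wrapping mod 2^32). */
    static int mask(int crc) {
        return ((crc >>> 15) | (crc << 17)) + MASK_DELTA;
    }

    /** Encodes one payload as a complete, self-delimiting TFRecord frame. */
    static byte[] encode(byte[] data) {
        byte[] lengthBytes = ByteBuffer.allocate(8)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(data.length)
                .array();
        ByteBuffer frame = ByteBuffer.allocate(8 + 4 + data.length + 4)
                .order(ByteOrder.LITTLE_ENDIAN);
        frame.put(lengthBytes);
        frame.putInt(mask(crc32c(lengthBytes, 0, lengthBytes.length)));
        frame.put(data);
        frame.putInt(mask(crc32c(data, 0, data.length)));
        return frame.array();
    }
}
```

For example, encoding a 5-byte payload produces a 21-byte frame (8 + 4 + 5 + 4). The payload is usually a serialized tf.train.Example, but the framing itself does not depend on what the bytes contain.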

Prerequisites

  1. Apache Maven

  2. Tested with Hadoop 2.6.0. Patches are welcome if there are incompatibilities with your Hadoop version.

Breaking changes

  • 08/20/2018 - Reverted the artifactId to org.tensorflow.tensorflow-hadoop
  • 05/29/2018 - Changed the artifactId from org.tensorflow.tensorflow-hadoop to org.tensorflow.hadoop

Build and install

  1. Compile the code

    mvn clean package

    Alternatively, if you would like to build jars for a different version of TensorFlow, e.g., 1.5.0:

    mvn versions:set -DnewVersion=1.5.0
    mvn clean package
  2. Optionally install (or deploy) the jars

    mvn install

    After installation (or deployment), the package can be used with the following dependency:

    <dependency>
      <groupId>org.tensorflow</groupId>
      <artifactId>tensorflow-hadoop</artifactId>
      <version>1.10.0</version>
    </dependency>

Use with MapReduce

The Hadoop MapReduce example can be found here.
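In a MapReduce job, each record that the InputFormat hands to a mapper is the raw payload of one TFRecord, which is typically a serialized tf.train.Example. The decoding underneath can be sketched without Hadoop, assuming the standard TFRecord framing. `TFRecordFrameReader` and `readAll` are hypothetical names. A real RecordReader must also handle Hadoop's file-system and input-split machinery, which this sketch leaves out.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32C;

/**
 * Hypothetical sketch (not this package's RecordReader): splits a stream of
 * TFRecord frames into raw payloads and verifies both masked CRC32C checksums.
 * Requires Java 9+ for java.util.zip.CRC32C.
 */
final class TFRecordFrameReader {
    private static final int MASK_DELTA = 0xa282ead8;

    private TFRecordFrameReader() {}

    /** Masked CRC32C over bytes[off, off + len): rotate right 15 bits, add MASK_DELTA. */
    static int maskedCrc32c(byte[] bytes, int off, int len) {
        CRC32C crc = new CRC32C();
        crc.update(bytes, off, len);
        int c = (int) crc.getValue();
        return ((c >>> 15) | (c << 17)) + MASK_DELTA;
    }

    /** Reads a little-endian uint32 (as a Java int); throws EOFException if truncated. */
    private static int readIntLE(DataInputStream in) throws IOException {
        byte[] b = new byte[4];
        in.readFully(b);
        return ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    /** Returns every payload in the stream; throws IOException on truncation or a bad checksum. */
    static List<byte[]> readAll(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        List<byte[]> records = new ArrayList<>();
        byte[] lengthBytes = new byte[8];
        while (true) {
            int first = in.read();
            if (first == -1) {
                return records; // clean end of stream between records
            }
            lengthBytes[0] = (byte) first;
            in.readFully(lengthBytes, 1, 7);
            if (readIntLE(in) != maskedCrc32c(lengthBytes, 0, 8)) {
                throw new IOException("corrupt length checksum");
            }
            long length = ByteBuffer.wrap(lengthBytes).order(ByteOrder.LITTLE_ENDIAN).getLong();
            if (length < 0 || length > Integer.MAX_VALUE) {
                throw new IOException("record too large for a Java array: " + length);
            }
            byte[] data = new byte[(int) length];
            in.readFully(data);
            if (readIntLE(in) != maskedCrc32c(data, 0, data.length)) {
                throw new IOException("corrupt payload checksum");
            }
            records.add(data);
        }
    }
}
```

You can feed it the bytes of a .tfrecord file, for example via `Files.newInputStream`, and it returns one `byte[]` per record. A truncated file surfaces as an `EOFException`, and a checksum mismatch surfaces as an `IOException`.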

Use with Apache Spark

The Spark-TensorFlow-Connector uses TensorFlow Hadoop to load and save data in TensorFlow's TFRecords format as Spark DataFrames.