
Persistence Framework

regunathb edited this page Nov 6, 2012 · 5 revisions

The Trooper persistence framework and libraries abstract common read and write operations on data stores into a set of well-defined actions and interfaces. In the process, they also add some useful features for deploying and using a Data-Access-Layer in production environments.

Feature List

  • Provide a consistent interface for common persistence actions - Create, Read, Update & Delete (CRUD). This set of actions meets many, if not most, application persistence needs. The design is loosely based on JPA (http://en.wikipedia.org/wiki/Java_Persistence_API) but extends to non-relational stores as well.
  • Support multiple implementations of the Persistence Provider interfaces in order to enable persistence to SQL and NoSQL data stores.
  • Provide a number of useful features related to persistence:
    • Sharding i.e. distributing data across a number of data nodes
    • Performance metrics logging - ability to capture metrics for data reads and writes in production environments, on demand.
    • Transaction support in persistence calls to specific data stores. Currently the framework supports single-schema transactions for the RDBMS persistence provider.

Maven artifacts

GroupID/Org    ArtifactID/Name    Description
org.trpr       platform-model     XML schema project that defines common data types/structures
org.trpr       platform-core      Core API for persistence and default implementations where relevant. Also includes features like sharding and mapping persistence entities to providers.

API Design

The persistence framework is designed using the following entities:

  • Persistence manager – Primary interface to the persistence framework for all client calls
  • Persistence provider – Registered with the Persistence Manager and normally maps to one data store type – RDBMS, DFS etc.
  • Persistence handler – Registered with the Persistence Provider and implements the persistence calls.
  • Persistent entity – Any business entity that requires persistence
  • Criteria – Metadata container for the persistence call. Includes custom queries for data loading
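
The relationship between these entities can be sketched roughly as follows. This is a minimal illustration and not the Trooper API: every type name here (SketchEntity, SketchHandler, SketchProvider, SketchManager, SketchEarthling, InMemoryHandler) and every signature is a hypothetical stand-in, and Criteria is left out for brevity. The only details taken from this page are the makePersistent call and the idea of a per-entity provider mapping.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins for the entities listed above.
// These are NOT the Trooper types; every name and signature is an assumption.

/** Persistent entity: any business object that needs to be stored. */
interface SketchEntity {
}

/** Persistence handler: performs the actual call against one data store. */
interface SketchHandler {
    void makePersistent(SketchEntity entity);
}

/** Persistence provider: fronts one data store type and delegates to its handler. */
class SketchProvider {
    private final SketchHandler handler;

    SketchProvider(SketchHandler handler) {
        this.handler = handler;
    }

    void makePersistent(SketchEntity entity) {
        handler.makePersistent(entity);
    }
}

/** Persistence manager: single entry point that routes each entity to its provider. */
class SketchManager {
    // Keyed by fully qualified entity class name.
    private final Map<String, SketchProvider> providersForEntity = new HashMap<>();

    void register(Class<? extends SketchEntity> entityType, SketchProvider provider) {
        providersForEntity.put(entityType.getName(), provider);
    }

    void makePersistent(SketchEntity entity) {
        String entityName = entity.getClass().getName();
        SketchProvider provider = providersForEntity.get(entityName);
        if (provider == null) {
            throw new IllegalStateException("No provider registered for " + entityName);
        }
        provider.makePersistent(entity);
    }
}

/** Example entity, used only to exercise the sketch. */
class SketchEarthling implements SketchEntity {
    final String name;

    SketchEarthling(String name) {
        this.name = name;
    }
}

/** In-memory handler that stands in for a real data store. */
class InMemoryHandler implements SketchHandler {
    final List<SketchEntity> stored = new ArrayList<>();

    @Override
    public void makePersistent(SketchEntity entity) {
        stored.add(entity);
    }
}
```

In this sketch a client only ever talks to the manager: it registers a provider for SketchEarthling once, then calls makePersistent on the manager, which looks up the provider by entity class name and lets the handler do the store-specific work.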

Code sample

The most common usage of the persistence framework is via Spring dependency injection, where data sources, Persistence Providers, Handlers and their mappings to Persistent entities are configured. Code for invoking persistence operations is straightforward, as shown below:

// create the PersistentEntity and populate attributes
HBaseEarthling testEntity = new HBaseEarthling();
testEntity.setName("Jone Doe");
testEntity.setXXX(yyyy); // placeholder for any other attribute setter
// ... populate remaining attributes
// persist to the configured data store via the PersistenceManager
persistenceManager.makePersistent(testEntity);

A sample Spring bean XML configuration is shown below:

<!-- Mapping between PersistenceProvider and the PersistentEntity-->
<bean id="hbasePersistenceManager" class="org.trpr.platform.core.impl.persistence.PersistenceManagerProvider">
    <property name="providersForEntity">
        <map>
            <entry key="org.trpr.example.batch.hbase.test.entity.HBaseEarthling" value-ref="hbaseProvider" />
        </map>
    </property>
    <property name="persistenceDelegate" ref="hbaseDelegate" />
</bean>

<!-- The PersistenceProvider bean declaration (HBase in this case) -->
<bean id="hbaseProvider" class="org.trpr.dataaccess.hbase.persistence.HBaseProvider">
    <property name="handler" ref="hbaseHandler" />
</bean>
<!-- The PersistenceDelegate that does most of the provider's work -->
<bean id="hbaseDelegate" class="org.trpr.platform.core.impl.persistence.PersistenceDelegate" />

<!-- The PersistenceHandler that has configuration and behavior specific to the data store (HBase in this case)-->
<bean id="hbaseHandler" class="org.trpr.dataaccess.hbase.persistence.HBaseHandler">
    <!-- HBase configuration properties (org.apache.hadoop.conf.Configuration) -->
    <property name="hbaseConfigProps">
        <props>
            <prop key="hbase.rootdir">hdfs://stage-pf4.nm.flipkart.com:8020/hbase</prop>
            <prop key="hbase.master.port">60000</prop>
            <prop key="hbase.zookeeper.quorum">stage-pf4.nm.flipkart.com:2181</prop>
            <prop key="hbase.client.write.buffer">2097152</prop>
        </props>
    </property>
    <!-- HBaseMappings information -->
    <property name="hbaseMappings">
        <list>
            <value>hbase-earthling.hbase.xml</value>
        </list>
    </property>
</bean>

Working example

Complete configuration and usage in a sample Trooper batch job is available here:

Trooper/examples/example-hbase/src/main/resources/external/hbaseTestTaskletJob/spring-batch-config.xml

Persistence Performance Metrics

All Persistence Provider implementations in Trooper expose on-demand performance metrics logging for the persistence API methods. This is exposed via JMX: an elapsed-time threshold can be specified, and the handler logs metrics only for calls that exceed it. The JMX bean available with the working example above is:

spring.application--> Trooper--> Performance-Metrics--> HBaseMetrics-Hbase_Test_Runtime--> TestHBaseOps-HbaseHandler
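
The threshold behavior described above can be illustrated with a small sketch. It is not the Trooper implementation and uses neither JMX nor Perf4J: the class ThresholdTimer, its methods, and the log message format are all hypothetical, and the threshold is set through a plain setter where Trooper would expose it as a JMX attribute.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of threshold-based metrics logging.
// Not the Trooper implementation; names and message format are assumptions.
class ThresholdTimer {
    // Calls taking longer than this many milliseconds are logged.
    private volatile long thresholdMillis;

    // Stand-in for a log file, so that the behavior is easy to inspect.
    final List<String> logged = new ArrayList<>();

    ThresholdTimer(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    // In Trooper the threshold is changed at runtime via JMX; here it is a plain setter.
    void setThresholdMillis(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    // Times the given call and records it under the given tag.
    <T> T time(String tag, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            record(tag, elapsedMillis);
        }
    }

    // Logs only those calls whose elapsed time exceeds the threshold.
    void record(String tag, long elapsedMillis) {
        if (elapsedMillis > thresholdMillis) {
            logged.add(tag + " took " + elapsedMillis + " ms");
        }
    }
}
```

With a threshold of 100 ms, a 63 ms call recorded under HBaseHandler.findEntity would be skipped and a 3213 ms call would be logged, which keeps log volume low in production while still surfacing slow calls.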

Performance metrics logging is implemented using Perf4J (http://perf4j.codehaus.org/). When parsed from logs with the Perf4J log parser, the Trooper metrics produce output like this:

$ java -jar perf4j-0.9.15.jar hbase-perf.log 

Performance Statistics   2012-11-06 10:52:30 - 2012-11-06 10:53:00
Tag                                                  Avg(ms)         Min         Max     Std Dev       Count
HBaseHandler.findEntity                               1113.0          63        3213      1484.9           3

Performance Statistics   2012-11-06 11:52:00 - 2012-11-06 11:52:30
Tag                                                  Avg(ms)         Min         Max     Std Dev       Count
HBaseHandler.findEntity                                731.3          37        2116       979.1           3
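
As a rough check on how to read these columns, the sketch below recomputes them from raw timings. The individual timings are not in the log output above; they are inferred from the reported count, min, max and average, assuming whole-millisecond values (63, 63, 3213 for the first window and 37, 41, 2116 for the second). The reported Std Dev figures are consistent with a population standard deviation (dividing by the count), which is what the sketch uses. The class TimingStats is hypothetical and is not part of Perf4J or Trooper.

```java
import java.util.Locale;

// Hypothetical helper that recomputes the summary columns from raw timings.
// Not part of Perf4J or Trooper; the input timings used with it are inferred, not logged values.
class TimingStats {
    final int count;
    final long min;
    final long max;
    final double avg;
    final double stdDev;

    TimingStats(long[] timesMs) {
        if (timesMs.length == 0) {
            throw new IllegalArgumentException("at least one timing is required");
        }
        long lo = timesMs[0];
        long hi = timesMs[0];
        long sum = 0;
        for (long t : timesMs) {
            lo = Math.min(lo, t);
            hi = Math.max(hi, t);
            sum += t;
        }
        double mean = (double) sum / timesMs.length;
        double squaredDeviations = 0;
        for (long t : timesMs) {
            squaredDeviations += (t - mean) * (t - mean);
        }
        this.count = timesMs.length;
        this.min = lo;
        this.max = hi;
        this.avg = mean;
        // Population standard deviation: divide by the count, not count - 1.
        this.stdDev = Math.sqrt(squaredDeviations / timesMs.length);
    }

    // One summary line per tag, rounded to one decimal place.
    String format(String tag) {
        return String.format(Locale.ROOT, "%s avg=%.1f min=%d max=%d stdDev=%.1f count=%d",
                tag, avg, min, max, stdDev, count);
    }
}
```

Calling new TimingStats(new long[] {63, 63, 3213}).format("HBaseHandler.findEntity") gives avg=1113.0 and stdDev=1484.9, matching the first window. The large standard deviation relative to the average reflects a single slow call among otherwise fast ones.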

Also See

Persistence Sharding

Persistence Providers