Persistence Framework
The Trooper persistence framework and libraries abstract the common read and write functionality for data stores into a set of well-defined actions and interfaces. In the process, the framework also adds some useful features for deploying and using a Data Access Layer in production environments.
- Provide a consistent interface for the common persistence actions: Create, Read, Update & Delete (CRUD). This set of actions meets many, if not most, application persistence needs. The design is loosely based on JPA (http://en.wikipedia.org/wiki/Java_Persistence_API) but extends to non-relational stores as well.
- Support multiple implementations of the Persistence Provider interfaces, enabling persistence to both SQL and NoSQL data stores.
- Provide a number of useful features related to persistence:
  - Sharding, i.e. distributing data across a number of data nodes.
  - Performance metrics logging, i.e. the ability to capture metrics for data reads and writes in production environments, as and when required.
  - Transaction support for persistence calls to specific data stores. Currently the framework supports single-schema transactions for the RDBMS persistence provider.
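To make the sharding idea concrete, here is a minimal sketch of routing entities to data nodes by hashing a shard key. This is not Trooper's sharding implementation: the `ShardRouter` class, the string shard key, the node names and the modulo-hash placement strategy are all hypothetical choices made for illustration.

```java
import java.util.List;

/** Hypothetical router that maps a shard key to one of N data nodes (not Trooper's API). */
final class ShardRouter {
    private final List<String> dataNodes;

    ShardRouter(List<String> dataNodes) {
        if (dataNodes.isEmpty()) {
            throw new IllegalArgumentException("at least one data node is required");
        }
        this.dataNodes = List.copyOf(dataNodes);
    }

    /** Picks a node by hashing the key; Math.floorMod keeps the index non-negative. */
    String nodeFor(String shardKey) {
        int index = Math.floorMod(shardKey.hashCode(), dataNodes.size());
        return dataNodes.get(index);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(List.of("node-a", "node-b", "node-c"));
        // The same key always routes to the same node, so reads find what writes stored.
        System.out.println(router.nodeFor("earthling-42"));
    }
}
```

Modulo placement is the simplest possible strategy, but adding or removing a node remaps most keys; consistent hashing is a common alternative when the set of data nodes changes often.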
The framework is packaged as the following modules:

GroupID/Org | ArtifactID/Name | Description |
---|---|---|
org.trpr | platform-model | XML schema project that defines common data types/structures |
org.trpr | platform-core | Core API for persistence and default implementations where relevant. Also includes features like sharding and mapping persistence entities to providers. |
The persistence framework is designed using the following entities:
- Persistence manager – Primary interface to the persistence framework for all client calls
- Persistence provider – Registered with the Persistence Manager and normally maps to one data store type – RDBMS, DFS etc.
- Persistence handler – Registered with the Persistence Provider and implements the persistence calls.
- Persistent entity – Any business entity that requires persistence
- Criteria – Metadata container for the persistence call, including custom queries for data loading
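The relationship between these entities can be sketched in plain Java. The type names below (`Entity`, `Handler`, `InMemoryHandler`, `Provider`, `Manager`, `Earthling`) and their single `save`/`find` methods are hypothetical simplifications for illustration only; Trooper's real interfaces are richer, and only the `makePersistent` name is borrowed from the usage example below.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical persistent entity: anything with an identifier. */
interface Entity {
    String id();
}

/** Hypothetical handler: performs the actual reads and writes for one data store. */
interface Handler {
    void save(Entity entity);
    Optional<Entity> find(String id);
}

/** In-memory handler standing in for a store-specific (RDBMS, HBase, ...) handler. */
final class InMemoryHandler implements Handler {
    private final Map<String, Entity> rows = new HashMap<>();

    public void save(Entity entity) { rows.put(entity.id(), entity); }

    public Optional<Entity> find(String id) { return Optional.ofNullable(rows.get(id)); }
}

/** Hypothetical provider: one per data store type, delegating to its registered handler. */
final class Provider {
    private final Handler handler;

    Provider(Handler handler) { this.handler = handler; }

    void save(Entity entity) { handler.save(entity); }

    Optional<Entity> find(String id) { return handler.find(id); }
}

/** Hypothetical manager: the single entry point that routes each entity type to its provider. */
final class Manager {
    private final Map<Class<? extends Entity>, Provider> providersForEntity = new HashMap<>();

    void register(Class<? extends Entity> type, Provider provider) {
        providersForEntity.put(type, provider);
    }

    void makePersistent(Entity entity) {
        providerFor(entity.getClass()).save(entity);
    }

    Optional<Entity> find(Class<? extends Entity> type, String id) {
        return providerFor(type).find(id);
    }

    private Provider providerFor(Class<?> type) {
        Provider provider = providersForEntity.get(type);
        if (provider == null) {
            throw new IllegalStateException("No provider registered for " + type.getName());
        }
        return provider;
    }
}

/** Example entity; the record's id() accessor satisfies Entity. */
record Earthling(String id, String name) implements Entity {}

final class ArchitectureDemo {
    public static void main(String[] args) {
        Manager manager = new Manager();
        manager.register(Earthling.class, new Provider(new InMemoryHandler()));
        manager.makePersistent(new Earthling("1", "John Doe"));
        System.out.println(manager.find(Earthling.class, "1")); // Optional[Earthling[id=1, name=John Doe]]
    }
}
```

Client code only talks to the manager; replacing `InMemoryHandler` with a store-specific handler changes where data lives without touching the calling code, which is the point of the provider/handler split.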
The most common way to use the persistence framework is via Spring dependency injection, where data sources, Persistence Providers, Handlers and their mappings to Persistent entities are configured. The code for invoking persistence operations is then straightforward, as shown below:
```java
// create the PersistentEntity and populate its attributes
HBaseEarthling testEntity = new HBaseEarthling();
testEntity.setName("John Doe");
// testEntity.setXXX(yyyy); ......... set any other attributes the entity needs

// persist to the configured data store via the PersistenceManager
persistenceManager.makePersistent(testEntity);
```
A sample Spring bean XML configuration is shown below:
```xml
<!-- Mapping between PersistenceProvider and the PersistentEntity -->
<bean id="hbasePersistenceManager" class="org.trpr.platform.core.impl.persistence.PersistenceManagerProvider">
    <property name="providersForEntity">
        <map>
            <entry key="org.trpr.example.batch.hbase.test.entity.HBaseEarthling" value-ref="hbaseProvider" />
        </map>
    </property>
    <property name="persistenceDelegate" ref="hbaseDelegate" />
</bean>

<!-- The PersistenceProvider bean declaration (HBase in this case) -->
<bean id="hbaseProvider" class="org.trpr.dataaccess.hbase.persistence.HBaseProvider">
    <property name="handler" ref="hbaseHandler" />
</bean>

<!-- The PersistenceDelegate that does most of the provider's work -->
<bean id="hbaseDelegate" class="org.trpr.platform.core.impl.persistence.PersistenceDelegate" />

<!-- The PersistenceHandler that has configuration and behavior specific to the data store (HBase in this case) -->
<bean id="hbaseHandler" class="org.trpr.dataaccess.hbase.persistence.HBaseHandler">
    <!-- HBase configuration properties (org.apache.hadoop.conf.Configuration) -->
    <property name="hbaseConfigProps">
        <props>
            <prop key="hbase.rootdir">hdfs://stage-pf4.nm.flipkart.com:8020/hbase</prop>
            <prop key="hbase.master.port">60000</prop>
            <prop key="hbase.zookeeper.quorum">stage-pf4.nm.flipkart.com:2181</prop>
            <prop key="hbase.client.write.buffer">2097152</prop>
        </props>
    </property>
    <!-- HBaseMappings information -->
    <property name="hbaseMappings">
        <list>
            <value>hbase-earthling.hbase.xml</value>
        </list>
    </property>
</bean>
```
The complete configuration, and its usage in a sample Trooper batch job, is available at:
Trooper/examples/example-hbase/src/main/resources/external/hbaseTestTaskletJob/spring-batch-config.xml
All Persistence Provider implementations in Trooper support on-demand performance metrics logging for the persistence API methods. This is exposed via JMX: an elapsed-time threshold can be set, and the handler then logs metrics only for calls that exceed it. The JMX bean exposed by the working example mentioned above is:
spring.application --> Trooper --> Performance-Metrics --> HBaseMetrics-Hbase_Test_Runtime --> TestHBaseOps-HbaseHandler
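The threshold behavior described above can be sketched as a timer that records a call only when it runs longer than a configurable limit. This is an illustration under assumptions, not Trooper's code: the `ThresholdTimer` class, its `thresholdMs` getter/setter pair (the kind of attribute one would expose through a JMX MBean) and the in-memory list standing in for the Perf4J log are all hypothetical, and JMX registration is omitted to keep the sketch self-contained. The clock is injectable so the behavior can be demonstrated without real waiting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical threshold-gated timer (not Trooper's API). */
final class ThresholdTimer {
    private volatile long thresholdMs;            // would be a read-write JMX attribute
    private final LongSupplier nanoClock;         // injectable time source, in nanoseconds
    private final List<String> logged = new ArrayList<>();

    ThresholdTimer(long thresholdMs, LongSupplier nanoClock) {
        this.thresholdMs = thresholdMs;
        this.nanoClock = nanoClock;
    }

    ThresholdTimer(long thresholdMs) {
        this(thresholdMs, System::nanoTime);
    }

    long getThresholdMs() { return thresholdMs; }

    void setThresholdMs(long thresholdMs) { this.thresholdMs = thresholdMs; }

    /** Runs the call and records "tag elapsedMs" only if it took longer than the threshold. */
    <T> T time(String tag, Supplier<T> call) {
        long start = nanoClock.getAsLong();
        try {
            return call.get();
        } finally {
            long elapsedMs = (nanoClock.getAsLong() - start) / 1_000_000;
            if (elapsedMs > thresholdMs) {
                synchronized (logged) {
                    logged.add(tag + " " + elapsedMs);
                }
            }
        }
    }

    List<String> loggedCalls() {
        synchronized (logged) {
            return List.copyOf(logged);
        }
    }
}

final class ThresholdTimerDemo {
    public static void main(String[] args) {
        long[] now = {0L};                                       // fake clock, in nanoseconds
        ThresholdTimer timer = new ThresholdTimer(100, () -> now[0]);

        timer.time("findEntity", () -> { now[0] += 20_000_000L; return "fast"; });  // 20 ms: not logged
        timer.time("findEntity", () -> { now[0] += 250_000_000L; return "slow"; }); // 250 ms: logged
        System.out.println(timer.loggedCalls()); // [findEntity 250]
    }
}
```

Because only slow calls are recorded, the threshold can stay low while investigating a problem and be raised afterwards, keeping logging overhead negligible in normal operation.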
Performance metrics logging is implemented using Perf4J (http://perf4j.codehaus.org/). When the Trooper metrics are parsed from the logs with the Perf4J log parser, the output looks like this:
```
$ java -jar perf4j-0.9.15.jar hbase-perf.log
Performance Statistics   2012-11-06 10:52:30 - 2012-11-06 10:53:00
Tag                       Avg(ms)   Min   Max    Std Dev   Count
HBaseHandler.findEntity   1113.0    63    3213   1484.9    3

Performance Statistics   2012-11-06 11:52:00 - 2012-11-06 11:52:30
Tag                       Avg(ms)   Min   Max    Std Dev   Count
HBaseHandler.findEntity   731.3     37    2116   979.1     3
```
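As a check on how to read these columns, the first interval's row can be reproduced from raw timings. With Count = 3, Min = 63, Max = 3213 and Avg = 1113.0, the three calls must total 3 × 1113.0 = 3339 ms, so the remaining call took 3339 − 63 − 3213 = 63 ms. The sketch below recomputes the row from those inferred timings. Its Std Dev matches 1484.9 only with the population formula (dividing by n); dividing by n − 1 would give about 1818.6. This suggests the parser reports population standard deviation. The `TimingStats` class is hypothetical and not part of Perf4J.

```java
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical helper that recomputes a Perf4J-style statistics row from raw timings in ms. */
final class TimingStats {
    final double avg;
    final long min;
    final long max;
    final double stdDev;   // population standard deviation (divides by n, not n - 1)
    final int count;

    TimingStats(long... elapsedMs) {
        if (elapsedMs.length == 0) {
            throw new IllegalArgumentException("no timings");
        }
        count = elapsedMs.length;
        min = Arrays.stream(elapsedMs).min().getAsLong();
        max = Arrays.stream(elapsedMs).max().getAsLong();
        avg = Arrays.stream(elapsedMs).average().getAsDouble();
        double sumOfSquares = 0;
        for (long t : elapsedMs) {
            sumOfSquares += (t - avg) * (t - avg);
        }
        stdDev = Math.sqrt(sumOfSquares / count);
    }

    /** Formats the row in the column order Avg(ms) Min Max Std Dev Count. */
    @Override
    public String toString() {
        return String.format(Locale.ROOT, "%.1f %d %d %.1f %d", avg, min, max, stdDev, count);
    }

    public static void main(String[] args) {
        // Timings inferred from the first interval above: 63 + 63 + 3213 = 3 * 1113.0
        System.out.println(new TimingStats(63, 63, 3213)); // 1113.0 63 3213 1484.9 3
    }
}
```

The same reasoning applied to the second interval, assuming whole-millisecond timings, gives calls of 37, 41 and 2116 ms, and `new TimingStats(37, 41, 2116)` prints `731.3 37 2116 979.1 3`. In both intervals a single slow call dominates the average, which is why Std Dev exceeds Avg.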