Skip to content

Persistence Sharding

regunathb edited this page Nov 7, 2012 · 8 revisions

Sharding is a useful design approach for accommodating large data sets in persistence stores. Support for sharding is not a commonly supported feature, especially with RDBMSs. The Trooper persistence framework supports implicit and application influenced sharding that extends equally across all Persistence Providers.

Hibernate Shards library was considered as an implementation choice but was soon discarded for two reasons - Heavy, Support limited to only RDBMSs.

Design Approach

Sharding support in Trooper is built on the following assumptions:

  • Data distribution is influenced by data contained in Persistent Entities i.e. domain objects. Each Sharded Persistent Entity is capable of returning a "shard hint" from the data it holds.
  • Persistence Providers have the logic to validate shard hints, demarcate transactions(if supported) and perform the persistence operation. The providers also manage results aggregation from shards, where required.
  • Data Sources (or) wrappers interpret the mapping between logical shards (identified by the shard hints) to physical data-store/nodes and provide relevant handle(say connection) for executing the persistence operation.

Using Sharding

Using sharding in Trooper is intended to be made easy, especially when you use it via the Persistence Framework In order to use sharding, an application will need to do the following:

  • Implement sharding API call-back methods and(or) extend/use existing classes
  • Configure data sources (or) wrappers that map to application shard hints
  • Configure Persistence Providers and their mapping to data sources, persistent entities
  • Invoke the standard persistence API

The following sections explain the above using examples:

Implementing sharding API call-back methods

Any PersistentEntity that requires sharding support needs to implement:

public interface ShardedEntity {
    // Return the shard hint derived from data contained in the domain object
    public String getShardHint();
}

A sample implementation of this interface:

public class ShardedEarthling extends PersistentEntity implements ShardedEntity {
    private Earthling earthing;
    public ShardedEarthling(Earthling earthling) {
        this.earthling = earthling;
    }
    public String getShardHint() {
        // return the string value of month from the Earthing's Date of Birth 
        // implying we can have a max of 12 logical shards
        return String.valueOf(this.earthling.getDateOfBirth().get(Calendar.MONTH));
    }
}

Configuring Data sources

The Spring beans configuration for data sources that map to logical shards is shown below:

<!-- Define the shard routing data source -->
<bean id="shardedDataSource" class="org.trpr.dataaccess.orm.ShardRoutingDataSource">  
    <property name="targetDataSources">
        <map>
            <!-- One entry for each possible shard hint, 12 in total i.e. one per month -->
            <entry key="00" value-ref="shardDataSource_00_proxy"/>
            <entry key="01" value-ref="shardDataSource_01_proxy"/>
            . . . . . . . 
            <entry key="11" value-ref="shardDataSource_11_proxy"/>
        </map>
    </property>
    <property name="defaultTargetDataSource" ref="shardDataSource_00_proxy"/>
</bean>

<!-- Define the parent data source, minus the URL param -->
<bean id="parentDataSource" class="org com.mchange.v2.c3p0.ComboPooledDataSource" destroy-method="close">
    <property name="driverClass" value="com.mysql.jdbc.Driver" />
    <property name="user" value="dbuser" />
    <property name="password" value="secret" />
    ......... <!-- other data source params except "url"-->
</bean> 

<!-- Define the various physical shard data sources -->
<bean id="shardDataSource_00" parent="parentDataSource"><property name="url" value="jdbc:mysql://host0:3306/test"/></bean>
<bean id="shardDataSource_01" parent="parentDataSource"><property name="url" value="jdbc:mysql://host1:3306/test"/></bean>
. . . . . . . . . 
<bean id="shardDataSource_11" parent="parentDataSource"><property name="url" value="jdbc:mysql://host11:3306/test"/></bean>

<!-- Define transactional proxies around shard data sources -->
<bean id="shardDataSource_00_proxy" class=" org.springframework.jdbc.datasource.TransactionAwareDataSourceProxy">
    <constructor-arg ref=”shardDataSource_00”/>
</bean>
<bean id="shardDataSource_01_proxy" class=" org.springframework.jdbc.datasource.TransactionAwareDataSourceProxy">
    <constructor-arg ref=”shardDataSource_01”/>
</bean>
. . . . . . . . . . . 
<bean id="shardDataSource_11_proxy" class=" org.springframework.jdbc.datasource.TransactionAwareDataSourceProxy">
    <constructor-arg ref=”shardDataSource_11”/>
</bean>               

Configuring Persistence providers & mapping to sharded data source, persistent entities

This example uses the Trooper ORM (Object-Relational) Persistence Provider that is based on Hibernate. The Persistence Provider configuration that uses the above defined "shardedDataSource" bean is as follows:

<bean id="ormPersistenceManager" class="org.trpr.platform.core.impl.persistence.PersistenceManagerProvider">
    <property name="providersForEntity">
        <map>
            <entry key="org.trpr.example...ShardedEarthling" value-ref="ormProvider" />
        </map>
    </property>
    <property name="persistenceDelegate" ref="persistenceDelegate" />
</bean>
<!-- The PersistenceProvider bean declaration (RDBMS in this case) -->
<bean id="ormProvider" class="org.trpr.dataaccess.RDBMSProvider">
    <property name="handler" ref="ormHandler" />
</bean>
<!-- The PersistenceDelegate that does most of the provider's work -->
<bean id="persistenceDelegate" class="org.trpr.platform.core.impl.persistence.PersistenceDelegate" />
<!-- The Hibernate ORM persistence handler -->
<bean id="ormHandler" class="org.trpr.dataaccess.orm.handler.HibernateHandler">
    <property name="template" ref="hibernateTemplate" />
</bean>
<!-- The Hibernate template declaration -->
<bean id="hibernateTemplate" class="org.springframework.orm.hibernate3.HibernateTemplate">
    <property name="sessionFactory" ref="sessionFactory"></property>
</bean>	
<!-- The Hibernate session factory bean -->
<bean id="sessionFactory" class="org.springframework.orm.hibernate3.LocalSessionFactoryBean">
    <property name="dataSource" ref="shardedDataSource" />
    <property name="mappingResources">
        <list>
            <value>packaged/hibernate/sample/Earthling.hbm.xml</value>
	</list>
    </property>
    ...... <!-- Define other properties for the session factory -->
</bean>
	
<!-- TX related beans for the RDBMS -->    
<bean id="ormTransactionManager" class="org.springframework.orm.hibernate3.HibernateTransactionManager">
    <property name="sessionFactory" ref="sessionFactory"/>
</bean>
<bean id="ormTransactionInterceptor" class='org.springframework.transaction.interceptor.TransactionInterceptor'>
    <property name="transactionManager">
        <ref bean="transactionManager" />
    </property>
    <property name="transactionAttributeSource">
        <ref bean="transactionAttributeSource" />
    </property>
</bean>
<bean id="transactionAttributeSource"
        class="org.springframework.transaction.annotation.AnnotationTransactionAttributeSource"/>    
<!-- The auto proxy creator for transaction-enabling specified beans -->
<bean class="org.springframework.aop.framework.autoproxy.BeanNameAutoProxyCreator">
    <property name="beanNames">
        <list>
            <value>persistenceDelegate</value>
        </list>
    </property>
    <property name="interceptorNames">
        <list>
            <value>transactionInterceptor</value>
        </list>
    </property>
    <property name='proxyTargetClass' value="true" />
</bean> 

Invoking persistence API for storing data in shards

The Persistence API methods for storing either sharded or non-sharded entities are the same. Sample below:

// create the PersistentEntity and populate attributes
Earthling testEntity = new Earthling();
testEntity.setName("Jone Doe");
testEntity.setXXX(yyyy);
.........
// persist to the configured data store via the PersistenceManager. Notice use of the sharded ShardedEarthling type that was created above
persistenceManager.makePersistent(new ShardedEarthling(testEntity));

Maven artifacts

The following Maven artifacts are used in example configurations described earlier on this page. The RDBMS(ORM) provider is used for persisting data into RDBMS shards.

GroupID/Org ArtifactID/Name Description
org.trpr platform-model XML schema project that defines common data types/structures
org.trpr platform-core Core API for persistence and default implementations where relevant. Also includes features like sharding and mapping persistence entities to providers.
org.trpr dataaccess-orm RDBMS persistence provider built over Hibernate ORM library