Skip to content

Persistence Sharding

regunathb edited this page Nov 7, 2012 · 8 revisions

Sharding is a useful design approach for accommodating large data sets in persistence stores. Support for sharding is not a commonly supported feature, especially with RDBMSs. The Trooper persistence framework supports implicit and application influenced sharding that extends equally across all Persistence Providers.

Hibernate Shards library was considered as an implementation choice but was soon discarded for two reasons - Heavy, Support limited to only RDBMSs.

Design Approach

Sharding support in Trooper is built on the following assumptions:

  • Data distribution is influenced by data contained in Persistent Entities i.e. domain objects. Each Sharded Persistent Entity is capable of returning a "shard hint" from the data it holds.
  • Persistence Providers have the logic to validate shard hints, demarcate transactions(if supported) and perform the persistence operation. The providers also manage results aggregation from shards, where required.
  • Data Sources (or) wrappers interpret the mapping between logical shards (identified by the shard hints) to physical data-store/nodes and provide relevant handle(say connection) for executing the persistence operation.

Using Sharding

Using sharding in Trooper is intended to be made easy, especially when you use it via the Persistence Framework In order to use sharding, an application will need to do the following:

  • Implement sharding API call-back methods and(or) extend/use existing classes
  • Configure data sources (or) wrappers that map to application shard hints

The following sections explain the above using examples:

Implementing sharding API call-back methods

Any PersistentEntity that requires sharding support needs to implement:

public interface ShardedEntity {
    // Return the shard hint derived from data contained in the domain object
    public String getShardHint();
}

A sample implementation of this interface:

public class ShardedEarthling extends PersistentEntity implements ShardedEntity {
    private Earthling earthing;
    public ShardedEarthling(Earthling earthling) {
        this.earthling = earthling;
    }
    public String getShardHint() {
        // return the string value of month from the Earthing's Date of Birth 
        // implying we can have a max of 12 logical shards
        return String.valueOf(this.earthling.getDateOfBirth().get(Calendar.MONTH));
    }
}

Configuring Data sources

The Spring beans configuration for data sources that map to logical shards is shown below:

<bean id="shardedDataSource" class="org.trpr.dataaccess.orm.ShardRoutingDataSource">  
    <property name="targetDataSources">
        <map>
            <!-- One entry for each possible shard hint, 12 in total i.e. one per month -->
            <entry key="00" value-ref="shardDataSource_00_proxy"/>
            <entry key="01" value-ref="shardDataSource_01_proxy"/>
            . . . . . . . 
            <entry key="11" value-ref="shardDataSource_11_proxy"/>
        </map>
    </property>
    <property name="defaultTargetDataSource" ref="shardDataSource_00_proxy"/>
</bean>
<bean id="parentDataSource" class="org com.mchange.v2.c3p0.ComboPooledDataSource" destroy-method="close">
    <property name="driverClass" value="com.mysql.jdbc.Driver" />
    <property name="user" value="dbuser" />
    <property name="password" value="secret" />
    ......... <!-- other data source params except "url"-->
</bean> 
<bean id="shardDataSource_00" parent="parentDataSource"><property name="url" value="jdbc:mysql://host0:3306/test"/></bean>
<bean id="shardDataSource_01" parent="parentDataSource"><property name="url" value="jdbc:mysql://host1:3306/test"/></bean>
. . . . . . . . . 
<bean id="shardDataSource_11" parent="parentDataSource"><property name="url" value="jdbc:mysql://host11:3306/test"/></bean>