-
Notifications
You must be signed in to change notification settings - Fork 35
Persistence Sharding
Sharding is a useful design approach for accommodating large data sets in persistence stores. Support for sharding is not a commonly supported feature, especially with RDBMSs. The Trooper persistence framework supports implicit and application influenced sharding that extends equally across all Persistence Providers.
Hibernate Shards library was considered as an implementation choice but was soon discarded for two reasons - Heavy, Support limited to only RDBMSs.
Sharding support in Trooper is built on the following assumptions:
- Data distribution is influenced by data contained in Persistent Entities i.e. domain objects. Each Sharded Persistent Entity is capable of returning a "shard hint" from the data it holds.
- Persistence Providers have the logic to validate shard hints, demarcate transactions(if supported) and perform the persistence operation. The providers also manage results aggregation from shards, where required.
- Data Sources (or) wrappers interpret the mapping between logical shards (identified by the shard hints) to physical data-store/nodes and provide relevant handle(say connection) for executing the persistence operation.
Using sharding in Trooper is intended to be made easy, especially when you use it via the Persistence Framework In order to use sharding, an application will need to do the following:
- Implement sharding API call-back methods and(or) extend/use existing classes
- Configure data sources (or) wrappers that map to application shard hints
- Configure Persistence Providers and their mapping to data sources, persistent entities
The following sections explain the above using examples:
Any PersistentEntity that requires sharding support needs to implement:
public interface ShardedEntity {
// Return the shard hint derived from data contained in the domain object
public String getShardHint();
}
A sample implementation of this interface:
public class ShardedEarthling extends PersistentEntity implements ShardedEntity {
private Earthling earthing;
public ShardedEarthling(Earthling earthling) {
this.earthling = earthling;
}
public String getShardHint() {
// return the string value of month from the Earthing's Date of Birth
// implying we can have a max of 12 logical shards
return String.valueOf(this.earthling.getDateOfBirth().get(Calendar.MONTH));
}
}
The Spring beans configuration for data sources that map to logical shards is shown below:
<!-- Define the shard routing data source -->
<bean id="shardedDataSource" class="org.trpr.dataaccess.orm.ShardRoutingDataSource">
<property name="targetDataSources">
<map>
<!-- One entry for each possible shard hint, 12 in total i.e. one per month -->
<entry key="00" value-ref="shardDataSource_00_proxy"/>
<entry key="01" value-ref="shardDataSource_01_proxy"/>
. . . . . . .
<entry key="11" value-ref="shardDataSource_11_proxy"/>
</map>
</property>
<property name="defaultTargetDataSource" ref="shardDataSource_00_proxy"/>
</bean>
<!-- Define the parent data source, minus the URL param -->
<bean id="parentDataSource" class="org com.mchange.v2.c3p0.ComboPooledDataSource" destroy-method="close">
<property name="driverClass" value="com.mysql.jdbc.Driver" />
<property name="user" value="dbuser" />
<property name="password" value="secret" />
......... <!-- other data source params except "url"-->
</bean>
<!-- Define the various physical shard data sources -->
<bean id="shardDataSource_00" parent="parentDataSource"><property name="url" value="jdbc:mysql://host0:3306/test"/></bean>
<bean id="shardDataSource_01" parent="parentDataSource"><property name="url" value="jdbc:mysql://host1:3306/test"/></bean>
. . . . . . . . .
<bean id="shardDataSource_11" parent="parentDataSource"><property name="url" value="jdbc:mysql://host11:3306/test"/></bean>
<!-- Define transactional proxies around shard data sources -->
<bean id="shardDataSource_00_proxy" class=" org.springframework.jdbc.datasource.TransactionAwareDataSourceProxy">
<constructor-arg ref=”shardDataSource_00”/>
</bean>
<bean id="shardDataSource_01_proxy" class=" org.springframework.jdbc.datasource.TransactionAwareDataSourceProxy">
<constructor-arg ref=”shardDataSource_01”/>
</bean>
. . . . . . . . . . .
<bean id="shardDataSource_11_proxy" class=" org.springframework.jdbc.datasource.TransactionAwareDataSourceProxy">
<constructor-arg ref=”shardDataSource_11”/>
</bean>
This example uses the Trooper ORM (Object-Relational) Persistence Provider that is based on Hibernate. The Persistence Provider configuration that uses the above defined "shardedDataSource" bean is as follows:
<bean id="ormPersistenceManager" class="org.trpr.platform.core.impl.persistence.PersistenceManagerProvider">
<property name="providersForEntity">
<map>
<entry key="org.trpr.example...ShardedEarthling" value-ref="ormProvider" />
</map>
</property>
<property name="persistenceDelegate" ref="persistenceDelegate" />
</bean>
<!-- The PersistenceProvider bean declaration (RDBMS in this case) -->
<bean id="ormProvider" class="org.trpr.dataaccess.RDBMSProvider">
<property name="handler" ref="ormHandler" />
</bean>
<!-- The PersistenceDelegate that does most of the provider's work -->
<bean id="persistenceDelegate" class="org.trpr.platform.core.impl.persistence.PersistenceDelegate" />
<!-- The Hibernate ORM persistence handler -->
<bean id="ormHandler" class="org.trpr.dataaccess.orm.handler.HibernateHandler">
<property name="template" ref="hibernateTemplate" />
</bean>
<!-- The Hibernate template declaration -->
<bean id="hibernateTemplate" class="org.springframework.orm.hibernate3.HibernateTemplate">
<property name="sessionFactory" ref="sessionFactory"></property>
</bean>
<!-- The Hibernate session factory bean -->
<bean id="sessionFactory" class="org.springframework.orm.hibernate3.LocalSessionFactoryBean">
<property name="dataSource" ref="shardedDataSource" />
<property name="mappingResources">
<list>
<value>packaged/hibernate/sample/Earthling.hbm.xml</value>
</list>
</property>
...... <!-- Define other properties for the session factory -->
</bean>