- Design patterns: Repeatable best practice.
- Microsoft Patterns & Practices
- Created & maintained by patterns & practices team at Microsoft
- Available on GitHub (github.com/mspnp)
- Azure Architecture Center
- Modularization makes it easier to design both current and future iterations of an application.
- Existing modules can be extended, revised or replaced to iterate changes to full application.
- Modules can also be tested, distributed and otherwise verified in isolation.
- A stateful application saves data about each client session and uses that data the next time the client makes a request.
- A stateless app is an application program that does not save client data generated in one session for use in the next session with that client.
- Advantages of stateless applications
- Easy to scale horizontally
- Can be cached easily
- Less storage
- No need to bind the client to a specific server, because the client provides all necessary information in each request
- Problem
- Dependency on a storage mechanism is an overhead
- Requires the client to have access to the security credentials for the store
- After the client has a connection to the data store for direct access, the application can't act as the gatekeeper.
- It's no longer in control of the process and can't prevent subsequent uploads or downloads from the data store.
- Granular access: Lower performance, higher data transfer costs and the requirement to scale out.
- Solution
- The Valet Key pattern
- Mechanism where your application can validate the user's request without having to manage the actual download of the media.
- Still keeps the media private and locked away from anonymous access.
- E.g. SAS tokens in Azure Storage / Event Hubs...
- Flow
- Client sends a request to the application to access a media asset
- Application then validates the request
- If the request is valid:
- Application goes to the external storage mechanism
- Generates a temporary access token
- Provides the following to the client:
- The URI of the media asset
- The temporary access token
- Result
- Up to the client application or browser to use the URI and temporary access token to download the media asset.
- Storage service is responsible for scaling up to manage all the requests for media
- Application focuses solely on other functionality.
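The flow above can be sketched with a signed, expiring token. This is a minimal illustration of the idea only, not how Azure SAS tokens are actually built: `SECRET_KEY`, `issue_token` and `validate_token` are hypothetical names, and the token format is an assumption made for this example.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical secret known only to the application and the storage service.
SECRET_KEY = b"demo-secret-not-for-production"


def issue_token(resource_uri: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Application side: sign the resource URI together with an expiry time."""
    issued_at = time.time() if now is None else now
    expires = int(issued_at) + ttl_seconds
    payload = f"{resource_uri}|{expires}".encode()
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return f"{expires}.{signature}"


def validate_token(resource_uri: str, token: str, now: Optional[float] = None) -> bool:
    """Storage side: accept only an unexpired token whose signature matches this URI."""
    current = time.time() if now is None else now
    try:
        expires_text, signature = token.split(".", 1)
        expires = int(expires_text)
    except ValueError:
        return False  # malformed token
    payload = f"{resource_uri}|{expires}".encode()
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature, expected) and current < expires
```

The client never sees `SECRET_KEY`; it only receives the URI and the token, and the token stops working once the expiry time passes.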
- Segregate read & update operations to data using separate interfaces.
- Maximizes performance, scalability, and security.
- Supports the evolution of the system over time through higher flexibility
- Prevents update commands from causing merge conflicts at the domain level.
- Traditional CRUD designs
- Same entity, DTO (data transfer object) and DAL (data access layer) repository for read and write.
- Complex domain may introduce:
- Mismatch between the read and write representations of the data
- E.g. additional columns or properties that must be updated correctly even though they aren't required as part of an operation.
- Update conflicts caused by concurrent updates when optimistic locking is used
- Security and permissions more complex because each entity is subject to both read and write operations, which might expose data in the wrong context.
- CQRS
- Data models used for querying and updates are different.
- The read model of a CQRS-based system provides materialized views of the data, typically as highly denormalized views, e.g. domain models.
- Separation of the read and write allows each to be scaled appropriately to match the load.
- E.g. reads encounter more load than writes.
- When the query/read model contains denormalized data (see Materialized View pattern), performance is maximized when reading data for each of the views in an application or when querying the data in the system.
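A small sketch of separate write and read models may help here. All class and method names are hypothetical, and the projection is done synchronously for simplicity; real systems often update the read model asynchronously.

```python
from dataclasses import dataclass, field


@dataclass
class OrderWriteModel:
    """Write side: accepts commands and stores normalized order lines."""
    lines: dict = field(default_factory=dict)  # order_id -> list of (item, price)

    def add_line(self, order_id: str, item: str, price: float) -> None:
        self.lines.setdefault(order_id, []).append((item, price))


@dataclass
class OrderReadModel:
    """Read side: a denormalized view with a precomputed total per order."""
    totals: dict = field(default_factory=dict)  # order_id -> total price

    def project(self, order_id: str, price: float) -> None:
        self.totals[order_id] = self.totals.get(order_id, 0.0) + price

    def get_total(self, order_id: str) -> float:
        return self.totals.get(order_id, 0.0)


def handle_add_line(write: OrderWriteModel, read: OrderReadModel,
                    order_id: str, item: str, price: float) -> None:
    """Command handler: update the write model, then refresh the read view."""
    write.add_line(order_id, item, price)
    read.project(order_id, price)
```

Queries call `get_total` and never touch the write model, so the two sides can be stored and scaled independently.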
- An alternative strategy to autoscaling is to allow applications to use resources only up to a limit, and then throttle them when this limit is reached.
- Enables the system to continue functioning and meet any SLAs.
- The system should monitor how it's using resources so that, when usage exceeds the threshold, it can throttle requests from one or more users.
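One simple way to throttle per user is a fixed-window counter, sketched below. `FixedWindowThrottle` is a hypothetical name, and the fixed-window approach is only one of several possible techniques (token bucket and sliding window are common alternatives).

```python
from collections import defaultdict


class FixedWindowThrottle:
    """Allows at most `limit` requests per user in each time window."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        # (user, window index) -> requests seen; old windows are never purged
        # in this sketch, so a real implementation would need cleanup.
        self._counts = defaultdict(int)

    def allow(self, user: str, now: float) -> bool:
        """Return True if the request may proceed, False if it is throttled."""
        window = int(now // self.window_seconds)
        key = (user, window)
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True
```

A caller would typically answer a throttled request with an error such as HTTP 429 so that the client can back off.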
- Errors that occur due to temporary interruptions in the service or due to excess latency.
- Many of these temporary issues are self-healing and can be resolved by exercising a retry policy.
- Handling transient errors: Big difference between on-premises and cloud applications.
- A break in the circuit must eventually be defined so that the retries are aborted if the error is determined to be of a serious nature and not just a temporary issue.
- Managing connections and implementing a retry policy
- Implemented in many .NET libraries such as Entity Framework and Azure SDK.
- If everyone is retrying due to a service failure, there could be so many requests queued up that the service gets flooded when it starts to recover.
- If the error is due to throttling and there's a window of time the service uses for throttling, continued retries could move that window out and cause the throttling to continue.
- At a certain retry threshold your app stops retrying and takes some other action, such as
- Custom fallback
- Fail silent (e.g. return null)
- Fail fast (e.g. exception with try again later message).
- When the initial connection fails, the failure reason is analyzed first to determine if the fault is transient or not.
- If the failure reason or the error code indicates that this request is unlikely to succeed even after multiple retries, then retries are not performed at all.
- Retries are performed until either the connection succeeds or a retry limit is reached.
- Implemented in many libraries such as Entity Framework and Enterprise Library.
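The retry behaviour described above can be sketched as follows. `TransientError` and `call_with_retry` are hypothetical names; the exponential backoff is an assumption added for this example, chosen because spacing out retries helps avoid flooding a recovering service.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for faults that are worth retrying."""


def call_with_retry(operation, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Retry `operation` on transient errors with exponential backoff.

    Any other exception is treated as non-transient and propagates at once.
    When the retry limit is reached, the last transient error is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # retry limit reached: fail fast
            # Wait 0.5s, 1s, 2s, ... (with the default base delay) before retrying.
            sleep(base_delay * 2 ** (attempt - 1))
```

The caller can catch the final `TransientError` to apply a custom fallback or fail silently instead of failing fast.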
- You can use a third-party queue to persist the requests beyond a temporary failure.
- Requests can also be audited independent of the primary application as they are stored in the queue mechanism.
- Use a queue that acts as a buffer between a task and a service it invokes in order to smooth intermittent heavy loads that can cause the service to fail or the task to time out.
- Help you design solutions that handle transient errors.
- Benefits
- Control costs
- The number of service instances deployed only has to be adequate to meet average load rather than the peak load.
- Maximize scalability
- Both the number of queues and the number of services can be varied to meet demand.
- Maximize availability
- Delays arising in services won't have an immediate and direct impact on the application, which can continue to post messages to the queue even when the service isn't available or isn't currently processing messages.
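The load-leveling effect can be illustrated with a small simulation. The function name and the tick-based model are assumptions made for this sketch: requests arrive in bursts, while the service drains the queue at a fixed rate.

```python
from collections import deque


def simulate_load_leveling(arrivals_per_tick, service_rate):
    """Return (processed count, queue length after each tick).

    arrivals_per_tick: number of requests posted to the queue in each tick.
    service_rate: maximum number of requests the service handles per tick.
    """
    queue = deque()
    backlog = []
    processed = 0
    for arrivals in arrivals_per_tick:
        queue.extend(range(arrivals))  # the task posts messages and moves on
        for _ in range(min(service_rate, len(queue))):
            queue.popleft()  # the service consumes at its own pace
            processed += 1
        backlog.append(len(queue))
    return processed, backlog
```

With a burst of five requests and a service rate of two per tick, no request is lost; the queue absorbs the burst and the backlog drains over the following ticks.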
- Messaging supports asynchronous operations, enabling you to decouple a process that consumes a service from the process that implements the service.
- Asynchronous Messaging with Variable Quantities of Message Producers and Consumers
- Problem: handling variable quantities of requests
- The application passes data through a messaging system to another service (a consumer service) that handles it asynchronously
- Business logic in the application is not blocked while the requests are being processed.
- In Azure: Storage Queues or Service Bus Queues
- Most message queues support three fundamental operations:
- A sender can post a message to the queue.
- A receiver can retrieve a message from the queue (the message is removed from the queue).
- A receiver can examine (or peek) the next available message in the queue (the message is not removed from the queue).
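The three operations can be shown with an in-memory stand-in. `InMemoryQueue` is a hypothetical class for illustration only; it does not model the APIs of Storage Queues or Service Bus.

```python
from collections import deque
from typing import Any, Optional


class InMemoryQueue:
    """First-in, first-out queue exposing post, receive and peek."""

    def __init__(self) -> None:
        self._messages = deque()

    def post(self, message: Any) -> None:
        """Sender: add a message to the end of the queue."""
        self._messages.append(message)

    def receive(self) -> Optional[Any]:
        """Receiver: remove and return the next message, or None if empty."""
        return self._messages.popleft() if self._messages else None

    def peek(self) -> Optional[Any]:
        """Receiver: return the next message without removing it."""
        return self._messages[0] if self._messages else None
```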
- Impractical to expect that cached data will always be completely consistent with the data in the data store.
- Consider a strategy that helps to ensure that the data in the cache is up to date as far as possible, but can also detect and handle situations that arise when the data in the cache has become stale.
- If the data is not in the cache, it is transparently retrieved from the data store and added to the cache. Any modifications to data held in the cache are automatically written back to the data store as well.
- Cache-aside strategy: This strategy effectively loads data into the cache on demand if it's not already available in the cache.
- Example implementation in Azure using Redis Cache
- When the web application loads for the first time it makes a request (GET) to the Redis Cache instance for the name of the featured player using the key: player:featured.
- The cache will return a nil value indicating that there is not a corresponding value stored in the cache for this key.
- The web application then goes to the Cosmos DB data store and gets the name of the featured player.
- The name of the featured player will be stored in the Redis Cache using a request (SET) and the key player:featured.
- The web application returns the name of the featured player and displays it on the homepage.
- Subsequent requests for the home page will use the cached version of the value as Redis Cache will successfully return a value for the name of the featured player.
- If an administrator updates the featured player, the web application will replace the value in both Redis Cache and Cosmos DB to maintain consistency.
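The steps above can be sketched with plain dictionaries standing in for Redis Cache and Cosmos DB. The function names are hypothetical, and a real implementation would use the respective client libraries and usually set an expiry on cached values.

```python
FEATURED_KEY = "player:featured"


def get_featured_player(cache: dict, store: dict) -> str:
    """Cache-aside read: try the cache first, fall back to the data store."""
    name = cache.get(FEATURED_KEY)  # GET; None plays the role of Redis nil
    if name is None:
        name = store[FEATURED_KEY]  # cache miss: read from the data store
        cache[FEATURED_KEY] = name  # SET, so later requests hit the cache
    return name


def update_featured_player(cache: dict, store: dict, name: str) -> None:
    """Write to the data store and the cache to keep them consistent."""
    store[FEATURED_KEY] = name
    cache[FEATURED_KEY] = name
```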
- The application traffic or load is distributed among various endpoints by using algorithms.
- Flexibility to grow or shrink the number of instances in your application without changing the expected behavior.
- 📝 Load Balancing Strategy
- Decide whether you wish to use a physical or a virtual load balancer
- 💡 Deploy virtual load balancer (hosted in VMs) if a company requires a very specific load balancer configuration.
- Select a load balancing algorithm e.g. round robin or random choice.
- Round-robin: Instances are kept in an ordered list, and each request is sent to the next instance in the list.
- No standard for deciding which address will be used by the requesting application.
- E.g. geographically closer
- (optional) Configure state (affinity or stickiness)
- Stickiness allows you to determine whether a subsequent request from the same client machine should be routed to the same service instance
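A minimal sketch of round-robin routing with optional stickiness follows. `RoundRobinBalancer` and its parameters are hypothetical, and keying affinity on a client identifier is an assumption made for this example.

```python
import itertools


class RoundRobinBalancer:
    """Routes requests to instances in turn, optionally with client affinity."""

    def __init__(self, instances, sticky=False):
        self._cycle = itertools.cycle(list(instances))
        self._sticky = sticky
        self._affinity = {}  # client_id -> instance chosen on first request

    def route(self, client_id):
        """Return the instance that should handle this client's request."""
        if self._sticky and client_id in self._affinity:
            return self._affinity[client_id]
        instance = next(self._cycle)
        if self._sticky:
            self._affinity[client_id] = instance
        return instance
```

Without stickiness, repeated requests from one client rotate through all instances; with `sticky=True`, a client stays on the instance it was first assigned.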
- Open source cache and message broker that you can deploy to support high availability.
- Key-value pair NoSQL storage.
- ❗ Azure Cache: Deprecated, use Redis Cache.
- SKUs
- Basic: Single node. Memory-sizes: 250 MB - 53 GB
- Standard: Two nodes in master/replica configuration, and SLA.
- Premium: On more powerful hardware, geo replication, VNet integration, and more.
- The physical separation of data for scale.
- Any data access operation will only occur on a smaller subset (volume) of data which in turn ensures that the operation will be more efficient when compared to a query over the entire superset of data for your application.
- Partitioning also spreads your data across multiple nodes which can individually be replicated or scaled.
- E.g. sharding for Azure SQL Database
- Manually create shards
- Use Elastic Scale feature
- Each partition is an instance of Azure SQL Database.
- If load to one shard gets high:
- Uses "Split" feature to create new shards from original
- If load to multiple shards gets low:
- Uses "Merge" feature to create new single shard from multiple shards.
- Problem: Hosting Large Volumes of Data in a Traditional Single-Instance Store
- Limitations:
- Finite storage space
- Finite computing resources
- Finite network bandwidth
- Geography: Single region, hard with more.
- Scaling vertically is a temporary solution that can only postpone the effects of these limitations.
- A cloud application should be able to scale almost infinitely.
- Solution: Partitioning Data Horizontally Across Many Nodes
- Divide the data store into horizontal partitions or shards.
- Each shard has the same schema, but holds its own distinct subset of the data.
- 💡 Abstract the physical location of the data in the sharding logic
- Provides a high level of control over which shards contain which data
- Enables data to migrate between shards without reworking the business logic of an application.
- The tradeoff
- The additional data access overhead required in determining the location of each data item as it is retrieved.
- To handle it, implement a sharding strategy with a shard key that supports the most commonly performed queries.
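The idea of hiding the physical location behind sharding logic can be sketched as below. `ShardedStore` is a hypothetical class, dictionaries stand in for database instances, and hashing the shard key is only one possible strategy (lookup tables and range-based mapping are others).

```python
import hashlib


class ShardedStore:
    """Maps each shard key to one of several shards with identical schema."""

    def __init__(self, shard_names):
        self._names = list(shard_names)
        self.shards = {name: {} for name in self._names}  # name -> stand-in store

    def shard_for(self, shard_key: str) -> str:
        """Pick a shard deterministically from a stable hash of the key."""
        digest = hashlib.sha256(shard_key.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._names)
        return self._names[index]

    def put(self, shard_key: str, value) -> None:
        self.shards[self.shard_for(shard_key)][shard_key] = value

    def get(self, shard_key: str):
        return self.shards[self.shard_for(shard_key)].get(shard_key)
```

Business logic only calls `put` and `get`, so the mapping can change without touching callers. Note that with this simple modulo scheme, changing the number of shards remaps most keys, which is one reason other strategies are often preferred.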