Table of Contents: System Design in general, System Design In Practice, System design for Azure

Data presented here is gathered from different courses: Stanford University, Carnegie Mellon University, Different Books, Udemy-DesignGuru-Alex Xu-Oreilly courses, etc.

Table of Contents: System Design in general, System Design In Practice, System design for Azure

AWS

AWS: Solution Architecture in Practice

Azure

Azure: Solution Architecture in Practice
a. Azure: Logging & Monitoring
b. Azure: Service Principal, Managed Identity, and Service Connection
c. Azure: Data Materials
d. Azure: Design Relational Data Storage
e. Azure: Active Geo-replication
f. Azure: Networking
g. Azure: High Availability and Traffic Manager
h. Azure: Architecture examples by Microsoft
i. Azure Migration Strategy by Microsoft
g. Azure Additional Links to Different Materials
k. Azure: Articles which covers everything in Azure. Third-Party repo I cloned
l. API Management and VNet + APIM with Gateway for On-Premise
Azure B2B, B2C, Entitlement, Guest Users. App roles & delegated permissions
a. Azure: B2B & B2C, Azure AD DS Domain Services
b. Azure Entitlement
c. Azure Delegate Permissions (User to Azure App)
d. Azure Assign App roles to another App. Api Permissions
Azure: High Availability. Disaster Recovery for Azure Functions. Strategies
a. Azure Service Bus: Performance Improvements Best Practices
b. Azure App Service Disaster Recovery. Active-Active, Active-Passive, Active-Cold
Azure: Skylines Academies materials
Data & Data Storage Materials
a. Design for Data Storage
b. Azure SQL. Data Retention
c. Azure SQL DB Data Retention for more than 35 days. Tip
Traffic Manager
Azure Data Factory
Azure Proximity Placement Groups. Configuring Low-latency VMs in one DataCenter
Azure App Configuration. Feature Flags and Configurations

Enterprise Architecture. ToGAF

Architecture. ATAM. Principles

Architecture: How Much of Architecture is needed
a. Architecture lifecycle
b. Architecture patterns
c. Architecture style vs pattern
d. Architecture Influence Cycle
Architecture: Jeffrey Richter's Course materials
Architecture: Attribute-Driven Design
Architecture: Quality Attributes and Tactics to achieve them
a. Functional vs Non-Functional. Test Process
b. Tactics to Achieve Quality Attributes
Architecture: Architecture Style
a. Module Styles
b. Component & Connectors Styles
c. Allocation Styles
Architecture Views. Documenting Software Architecture. Properties to document in your Architecture Document
Documenting Architecture. General Structure. General Principles. Mapping Requirements
a. Documenting: Combining Views. Hybrid View
b. Documenting: Interfaces, Behavior, Context
c. View Packets. Alternatives
d. Example: Architecture Decision Records (ADR\MADR). How to Document your architecture decisions and their consequences
Views and Beyond. Alternatives: DoDAF, ISO42010 \ IEEE1471-2000
Reviewing Architecture Documentation
Architecture Evaluation

General Architecture Principles. Data. Caching. Messaging. Distributed Systems

Data & Data Storage Materials
a. Data Replication. Leader-Follower & Quorum patterns in SQL
b. Blob Storage vs File Share vs Managed Disks
Cache. Read-Through, Cache-Aside
a. Cache Consistency Models
b. Cache Challenges
c. Cache Replacement Policies
d. Cache Performance Metrics
Governance and Compliance materials
Kafka & Messaging patterns
a. Kafka. Basics. Consumer Group. Compression & Batching. Load Balancing
b. Messaging Patterns suitable for Kafka and for other services. Q&A
Darp. The Distributed Application Runtime
CAP Theorem. PACELC
Consistent Hashing for Data Replication and Data Partitioning

Architecture in Practice. System Design Interview Questions. Main Problems

Architecture in Practice. System Design Interview. Q&A. Main Problems
a. System Design Interview In practice
b. URL Shortener
c. Pastebin
CAP Theorem. PACELC
Consistent Hashing for Data Replication and Data Partitioning

System Design Interview. Low-Level System Design Interview. High-Level System Design Interview

Table of Contents

System Design Interview. Low-Level SDI, High-level SDI
Step-by-step guide by DesignGurus

How Much of Architecture is needed?

Architecture lifecycle

QAW (workshop) => ADD (Attribute-Driven Design) => V&B (Views and Beyond) => ATAM (Architecture Tradeoffs Analysis)

Architecture patterns

Architecture style vs pattern

Pattern consists of 3 parts - Problem, Context, and Solution.
Patterns tells more about context in which Solution appeared.
On the other hand style is about Solution, it well describes what exactly was selected, without detailed explanation "why". (Leonard Bass)
Style is higher level of abstraction, style can demonstrate elements and their relations
Pattern shows exact way of "how to achieve that using exact way". Details: https://www.geeksforgeeks.org/difference-between-architectural-style-architectural-patterns-and-design-patterns/

Architecture Influence Cycle

Architecture Materials. General Materials

Materials

DDD, Hexagonal, Onion, Clean, CQRS: https://herbertograca.com/2017/11/16/explicit-architecture-01-ddd-hexagonal-onion-clean-cqrs-how-i-put-it-all-together/
Anemic model (Fowler): https://martinfowler.com/bliki/AnemicDomainModel.html
Anemic model vs Full Domain Model (Fowler): https://martinfowler.com/bliki/AnemicDomainModel.html

Jeffrey Richter's Course. Slides

?aaS Cloud course from Jeffrey Richter
Jeffry Richter Presentation, Why cloud apps, Embracing Failures, Orchestrators, Virtualization.pptx
Jeffrey Richter Presentation, Regions and Microservices.pptx
Jeffrey Richter, Scaling, 12-Factor, Containers.pptx
Jeffrey Richter, Docker, Hyper-V containers, Containerd runtime, CI and CD.pptx
Jeffrey Richter, API Versioning, Troubleshooting, Steps towards Microservices(what need to take into account).pptx
Jeffrey Richter, Idempotency, Retry Policy, Exactly Once Strategy.pptx

8 Fallacies of Distributed Computing Explained

8_Fallacies_of_Distributed_Computing_Explained.pdf

4 Reasons to split monolith onto microservices. Richter's slide

twelve factor application (12 factor explained)

The twelve factor application (12-factor)

Forward & Reverse Proxies

When people talk about proxy they usually mean forward proxy.
Forward proxy - kind of proxy when you have a several services which reach one resource in the internet. And you may create a service which adds some information (headers) to such request and\or transform them somehow before forwarding (change the address of destination).
Good forward proxy example: Fiddler;

Good reverse proxy examples: Nginx, IIS, API Gateway, Load Balancer, etc.
You may need them for:

Manage calls coming to your services (Being a facade of your services)
Check whether coming calls which have basic authentication - are able to interact with the service
Load Balancing (on OSI level4 for TCP and UDP traffic and for OSI level7 for HTTP\HTTPS traffic)
SSL termination - request coming to reverse proxy (Nginx for example) may be HTTPS, but forwarded request will be http
Caching mechanisms
Throttling - to control the input, e.g. amount of requests per seconds
Billing - to control the amount of requests and to help making a bill for them
DDos mitigation
Retry Policy. It may automatically make a retry to another service instance behind it if the certain one is unreachable

Reverse proxy example. RP-I is Reverse proxy and it plays a load balancer role here.

NOTE: It's impossible to keep endpoints in sync as service instances come\go. Client code must be robust against this

Another example of reverse proxy with load balancer + healthcheck role

Carnegie Mellon Univercity

https://sites.google.com/site/softwarearchitectureinpractice/home

Attribute-Driven Design (ADD)

Attribute-Driven Design in details

Quality Attributes. Tactics to achieve Quality Attributes.

Functional vs Non-Func Attributes. Testing Process

Functional vs Non-Functional attributes. Non-Functional Testing Process. Testing Cycle

Quality attributes. How better bescribe them. Concrete examples.

Instead of just saying that performance is important for you - you need to create scenarios.
Each scenario should have 6 points starting from what's happening, from where this stimulus comes, in which circumstances, and until how to react

Scenario for Availability. First slide - all spectre what's covered. The second and third - what we exactly handle and what we expect

Tactics to achieve Quality Attributes

Tactics: Availability. 3 types: fault detection, fault recovery, fault prevention

as example: for Availability you need to care about
- Passive redundancy - copy your data\state to extra instances and make hot replacement when needed
- Condition Monitoring - control when failure occured in your instance and react on this event - restart instance, isolate, etc.

Example of Tactics. Detect Faults

Voting is used in airplanes where several processes make same calculations in parallel and majority wins.
- Usually system element as Monitor is looking at the output of each process. If one of processes shows wrong results - it's deemed to be faulty.

Recovery from Faults

Prevent Faults

Tactics: Performance

Control Resource Demand

One way to increase performance is to carefully manage the demand for resources. This can be done by reducing the number of events processed or by limiting the rate at which the system responds to events. In addition, a number of techniques can be applied to ensure that the resources that you do have are applied judiciously

Manage work requests.
One way to reduce work is to reduce the number of requests coming into the system to do work. Ways to do that include the following:
Manage event arrival.
A common way to manage event arrivals from an external system is to put in place a service level agreement (SLA) that specifies the maximum event arrival rate that you are willing to support
Manage sampling rate.
In cases where the system cannot maintain adequate response levels, you can reduce the sampling frequency of the stimuli—for example, the rate at which data is received from a sensor or the number of video frames per second that you process. Of course, the price paid here is the fidelity of the video stream or the information you gather from the sensor data. Nevertheless, this is a viable strategy if the result is “good enough.”
Limit event response.
When discrete events arrive at the system (or component) too rapidly to be processed, then the events must be queued until they can be processed, or they are simply discarded. You may choose to process events only up to a set maximum rate, thereby ensuring predictable processing for the events that are actually processed.

Reduce computational overhead

Reduce indirection.
The use of intermediaries (so important for modifiability, as we saw in Chapter 8) increases the computational overhead in processing an event stream, so removing them improves latency. This is a classic modifiability/performance tradeoff. Separation of concerns—another linchpin of modifiability—can also increase the processing overhead necessary to service an event if it leads to an event being serviced by a chain of components rather than a single component
Co-locate communicating resources.
Context switching and intercomponent communication costs add up, especially when the components are on different nodes on a network. One strategy for reducing computational overhead is to co-locate resources.z
Periodic cleaning.
A special case when reducing computational overhead is to perform a periodic cleanup of resources that have become inefficient. For example, hash tables and virtual memory maps may require recalculation and reinitialization.
Increase efficiency of resource usage.
Improving the efficiency of algorithms used in critical areas can decrease latency and improve throughput and resource consumption.

Manage Resources

Even if the demand for resources is not controllable, the management of these resources can be. Sometimes one resource can be traded for another.

Increase resources. Faster processors, additional processors, additional memory
Introduce concurrency. If requests can be processed in parallel, the blocked time can be reduced
Maintain multiple copies of computations. This tactic reduces the contention that would occur if all requests for service were allocated to a single instance.
Maintain multiple copies of data. Two common examples of maintaining multiple copies of data are data replication and caching.
Bound queue sizes. This tactic controls the maximum number of queued arrivals and consequently the resources used to process the arrivals.
Schedule resources. Whenever contention for a resource occurs, the resource must be scheduled. Processors are scheduled, buffers are scheduled, and networks are scheduled

Patterns

Service Mesh
The service mesh pattern is used in microservice architectures. The main feature of the mesh is a sidecar—a kind of proxy that accompanies each microservice, and which provides broadly useful capabilities to address application-independent concerns such as interservice communications, monitoring, and security. A sidecar executes alongside each microservice and handles all interservice communication and coordination.
Load Balancer
A load balancer is a kind of intermediary that handles messages originating from some set of clients and determines which instance of a service should respond to those messages. The key to this pattern is that the load balancer serves as a single point of contact for incoming messages—for example, a single IP address
Throttling
The throttling pattern is a packaging of the manage work requests tactic. It is used to limit access to some important resource or service. In this pattern, there is typically an intermediary—a throttler—that monitors (requests to) the service and determines whether an incoming request can be serviced.
Map-Reduce The map-reduce pattern efficiently performs a distributed and parallel sort of a large data set and provides a simple means for the programmer to specify the analysis to be done. Unlike our other patterns for performance, which are independent of any application, the map-reduce pattern is specifically designed to bring high performance to a specific kind of recurring problem: sort and analyze a large data set. This problem is experienced by any organization dealing with massive data

Tactics: Deployability. Patterns

Sample of Deployability scenario
Tactics

Manage Deployment Pipeline

Scale rollouts.

Rather than deploying to the entire user base, scaled rollouts deploy a new version of a service gradually, to controlled subsets of the user population, often with no explicit notification to those users. (The remainder of the user base continues to use the previous version of the service.) By gradually releasing, the effects of new deployments can be monitored and measured and, if necessary, rolled back. This tactic minimizes the potential negative impact of deploying a flawed service. It requires an architectural mechanism (not part of the service being deployed) to route a request from a user to either the new or old service, depending on that user’s identity.

Roll back.

If it is discovered that a deployment has defects or does not meet user expectations, then it can be “rolled back” to its prior state. Since deployments may involve multiple coordinated updates of multiple services and their data, the rollback mechanism must be able to keep track of all of these, or must be able to reverse the consequences of any update made by a deployment, ideally in a fully automated fashion.

Script deployment commands.

Deployments are often complex and require many steps to be carried out and orchestrated precisely. For this reason, deployment is often scripted. These deployment scripts should be treated like code—documented, reviewed, tested, and version controlled. A scripting engine executes the deployment script automatically, saving time and minimizing opportunities for human error.

Manage Deployed System

Manage service interactions.

This tactic accommodates simultaneous deployment and execution of multiple versions of system services. Multiple requests from a client could be directed to either version in any sequence. Having multiple versions of the same service in operation, however, may introduce version incompatibilities. In such cases, the interactions between services need to be mediated so that version incompatibilities are proactively avoided. This tactic is a resource management strategy, obviating the need to completely replicate the resources so as to separately deploy the old and new versions.

Package dependencies.

This tactic packages an element together with its dependencies so that they get deployed together and so that the versions of the dependencies are consistent as the element moves from development into production. The dependencies may include libraries, OS versions, and utility containers (e.g., sidecar, service mesh), which we will discuss in Chapter 9. Three means of packaging dependencies are using containers, pods, or virtual machines; these are discussed in more detail in Chapter 16.

Feature toggle.

Even when your code is fully tested, you might encounter issues after deploying new features. For that reason, it is convenient to be able to integrate a “kill switch” (or feature toggle) for new features. The kill switch automatically disables a feature in your system at runtime, without forcing you to initiate a new deployment. This provides the ability to control deployed features without the cost and risk of actually redeploying services.

Patterns for Deployability

### Code organization point of view

Benefits
Tradeoffs

Deployment changes point of view

Benefits and Tradeoffs

Quality Attribute Utility Tree. Example

Architecture Styles and their Documentation. Module Styles, C&C Styles, Allocation Styles.

Module Styles. General Information

Module Styles: Decomposition, Uses, Generalization, Layered, Data Model, Aspects styles

Module styles are closer to code. Used for:

code construction,
analysis (impact of changes, planning, or for budgeting concerns).
education (onboarding new team-members).
Build vs Buy. You can use Decomposition in order to understand what is better for you - build on your own or buy this product from 3rd party vendor.

Module Styles:

Decomposition Style
Uses Style
Generalization Style
Layered Style
Data Model Style
Aspects Style

Notations and Usage. Informal or semi-formal notation: Box and Lines or UML

Module Styles in details.

Module Styles: Decomposition Style. Decomposition Refinement

Decomposition Style. Notations

Decomposition Refinement

Need to understand if inner structure is needed

Decomposition refinement in UML

Module Styles: Uses Style

Useful for planning an incremental development when you have several dependencies and you need to know where all of them will be available for you;
Useful in debugging and testing because you can stub and mock your dependencies, or you want to isolate from where the issue comes;
Useful to validate dependencies and avoid circular dependencies which not letting you make incremental deployment and delivery
Tracing changes when you want to guarantee that other dependencies will not suffer

Example:

Module Styles: Generalization Style

Useful for code analyzing
Useful for Architecture Representation Creation.
Useful for Application Skeleton creation

Example:

Module Styles: Layered Style

Useful for Modificability
Concentric diagram could not be equivalent to stack diagram because there is ambiguity and it's not clear if B1-B2-B3 can use each other (especially B1-B3 because they touch each other).

Informal notation in Layered style

Module Styles: Aspects Style (Also known as Multi-dimentional separation of concerns)

Style to depict relations between classes or set of classes (their aspect modules)
Could be used to understand the strategy for error handling. When Set of modules want to use one protocol (handling policy) for handling error if it occurs.
Could be useful for application decomposition
Could be useful to understand the scalability of solution

Example

Colored bar represents the aspect in module
Could be used only when code is already exist
To understand the scalability the best strategy is to create UML Class diagram
For very large apps you need to use non-graphical diagram because in very large apps with dozens of classes and their aspects your UML diagram will be a hell

Good example of diagram which uses UML without crosscut arrows

Instead of crosscut arrows all aspects are bundled into the package
Annotation is attached to each aspect

Module Styles: Data Model Style. ERD & UML Notation

Useful to depict Data Entities and relations between them
Useful for impact analysis
Helps to implement modules which have access to the data
Useful in Domain Analysis
Useful for performance optimizations

Example

In ERD Notation
The same diagram in UML Notation

Data Model Evolution over time

Components and Connectors Styles

Component and Connector Styles: Pipe and Filter, Client-Server, Service-Oriented (SOA), Publish-Subscribe, Shared-Data styles

Used for:

Performance, Security, Availability (Runtime attributes) analysis
Education
Construction. C&C Style could describe behavior that elements must demonstrate when they work together
Could help to describe how very specific part of the system works

Components and Connectors style:

Pipe-and-filter Style
Client-Server Style
Service-Oriented Style
Publish-Subscribe Style
Shared-Data Style
Repository Style
Others

DataFlow, Call-Return, Event-Based, Repository are Sub-families of C&C Styles

** Rules of C&C Style:**

In C&C style Component could be a runtime of interaction or Data Store. Component has ports to the outside world;
Connector connects components. Connector has roles which could be called;
Ports and Roles are just special interfaces of Components and Connectors respectively;
The Only one relation between elements - Attachment; It describes Attachment between Components and Connectors
In Architecture Connector is not just a procedure call, but could be very sophisticated computation;
C&C Diagram could have Quality attributes which help with analysis;
For different Quality Attributes could be useful different C&C Styles

C&C Style: Pipe and Filter. Pipes and Filters in UML. Yahoo! Pipes

Good for cases when data is transformed serially
Series of Filters or Series of Pipes one after another is prohibited by Style
Good for Functional composition Data Analysis (what the output could be knowing the function and input)

In UML

Dependency arrow is not recommended to use, but it's possible. Better to use straight line as connector

Yahoo! pipes.

Introduced by Yahoo.
7 Filters and Blue Pipes from top to bottom. On the right there are support modules for mainline "components"(in Yahoo terminology).

C&C Style: Client-Server

Useful for performance, availability, dependability, security analysis
Useful when you need to depict amount of clients at the same time
Useful when you need to create a strategy "what to do if one server becomes compromised"

Example

C&C Style: Service-Oriented (SOA). Web Services

Useful for property analysis which could be associated with service or with service client;
Services could be not discoverable and not dynamically bound
ESB - Enterprise Service Bus, special component which takes routing of messages

Use cases: - services made by different languages, for different platforms, or by different teams\organizations - services which have different styles - helps with integration of external components - for repackaging legacy systems. you can rehabilitate pieces of legacy system one by one

Web Services

Web Services is not a synonym of SOA: SOA is architecture style but Web Services is one of many technologies you may choose to implement SOA
Micro-service Architecture, WebServices, Micro-Macro Service Architecture, COBRA - are part of SOA Style

SOA In practice

C&C Style: Publish-Subscribe

Pure event-based style
Useful if you dont know all of your subscribers or you dont know their amount.
Useful for sending information for unknown recipients
Event distributor can be depicted as a bus or a component

Constraints

Need to understand which components can listen to which events. Some events could be not public
Components can listen their own events (sometimes answer yes, sometimes no)

Example

this diagram shows more than just publish-subscribe and it's okay

C&C Style: Shared-Data

If you see big fat database in the middle of the diagram - it's definitely Shared-Data Style
Useful for enterprise management system

Example

C&C Style: Combination of styles

same diagram depicted in another style, it loses stylistic richness
doesnt let you draw your conclusion in any certainty

C&C Style: Crosscutting issues in different C&C Styles. Grouping in Tiers. Tiers vs Layers. Multi-tier Notation

Tiers vs Layers

Layers are used more in Model View. Tiers in C&C view.
They can be mapped one-to-one, but it's also possible to map several layers to one tier
Tiers are used to check cohesion of components in C&C view
Tiers also could be transformed into Packages in UML Notation
Cross-communication processes issues (when different processes share resources)

Tiers. Commonly in Client-Server style

Useful to declare connections between tiers (only allowed to communicate with these exact tiers and not with others)
Tiers could be pass-through, but should be exclicitly declared
Grouping components in tiers, for example Tier execution
Client Tier also could be described what clients it has: thin or fat: Thin clients are generally embedded within a web browser. Fat must be installed on the client's machine.
Tiers could be depicted as packages in UML notation
We can declare how tiers are communicating

Multi-tier notation

Tiers could be depicted as packages in UML notation

Dynamic Creation and Destruction

Could be done in State-Machine diagram

Allocation Styles

Allocation Styles: Deployment, Install, Work Assignment styles

Allocation Style:

Deployment Style
Install Style
Work Assignment Style

Allocation Styles: Deployment

Puts Software to Hardware mapping
Good for performance, availability, durability, disaster recovery analysis

Examples

Boxes depict hardware boxes or processors

Allocation Styles: Install

Useful for creation build-deploy procedures

Example

Same in UML with demonstration internal components
EAR - Enterprice Archive. JAR - Java Archive files

Allocation Styles: Work Assignment. "Who will do the job". Specializations: Platform-style, Competence-center, Open-Source

Example

Specialization. Platform-style vs Competence-center vs Open-Source

Architecture Views. Documenting Software Architecture. Properties to document in your Architecture Document

General. Important Concepts of documentation. View-Based Documentation. Types of Views

General. Select Properties to document. Examples in terms of Quality Attribute

Example:

Performance Attribute: you need to document best and worst response time properties. Or maximum number of event that element can service per time unit (per secord or per minute).
Security: perhaps you need to document the level of encryption and authorization rules for different elements and relations.

General. Apply Views. Information on Views. Structural vs Behavioral. Traced-Oriented vs Comprehensive Language

Documenting Behavior. 99% of time in practice you work with traced-oriented language

View Type: Model View

View Type: (C&C) Component And Connector View

Shortly: how your code maps on resources. Where it executes. What major parameters it has:

Processor cores vs processes
Sockets
REST APIs (dependency relationships between information carriers)
Where your code is executed: Client Machine vs Server, PC or Mobile.

** Attachments:**

Output from one port to another Input port

** Quality of service information**

Amount of requests per hour
Latency

For what?

View Type: Allocation View: Deployment, Implementation views. Their Refinements

Details: https://sites.google.com/site/softwarearchitectureinpractice/9-documenting-software-architecture/d-allocation-views

Shortly: About requirements where and how application runs

Memory requirements
Processor requirements
Execution in processes and threads

Allocation View consist of:

Deployment view (how and where you deploy your app)
Implementation View

Implementation Refinement

On the right we reveal that our connector is event dispatcher
It might help to understand how teams need to change their interfaces to interact with dispatcher appropriately
New questions to architect may appear and potentially you need to put your attention to how exactly event bus will work and what limitations now you have
diagram on the left could be interested to people who are not so interested in tech details. The right - people who are more concerned about design

Heterogenerity Architecture Style & View: Demonstrate a mix of approaches on one diagram. Examples

Notations for Architecture View. Model View. Uses View. Generalization View

In General

Notations for Model View. Decomposition View

Notations for Uses View

Notation for Generalization View

Notation for Layered View. Usually used for portability and usability

What View You need to Document. General pieces of advice

Check who is your stakeholders
What diagrams each of them need to understand and sell your product
Consolidate views (if their number is too high)
Rationale of your design decisions
Functional, Non-functional attributes and constraints of your system
Legend on all diagrams

Put the legent to each diagram:

Documenting Architecture. General Structure. General Principles. Mapping Requirements

General Document Structure. General Principles

General Structure

Principles

Mapping requirements to design Decisions. Microsoft template

Where to record such mapping

Allocation view could be used to depict mapping between requirements and design decisions.

if could be a part of software design explanation

Summary. Microsoft Reference Template

Combine Views. Hydrid View

What Views to choose. 3 step principle: Build Stakeholders table, Combine Views (rule of thumb), Prioritize documentation. Part 1

Stakeholders table

PS: It's also important to include at least one Model, one C&C diagrams to your documentation.
Build Stakeholders table
Dont accept all their wishes, you need to document architectures which help them in their work, not "just nice to have"

Which View to apply?

Example using real system

D - detailed, S - some information, O - Overview

Combine Views. use Rule of Thumb

Example. View set

Instead of 12 diagrams we selected 8 and decide to make combine view for 2. It means we need 7 diagrams
It still could make our documentation redundant, so we need to prioritize them and combine-simplify them if they are not super important

Prioritize and Stage documentation

Hints
Go with high overview (not detailed architecture). It lets developers start their job and will help forming budgets
High overview helps to begin the analysis

Merging different views together

Having 7 views after prioritization we receive 3 primary and 4 minor views.

Summary

Combine Views. Hybrid Views. Overlay Part 2

Overlay approach. Examples

Case. Example

we had 2 small diagrams and decided to merge them together
such overlaying might recognize the component which is not well defined and understand how we gonna built it

Tips and Tricks

in case of aspects and generalization: aspects are special kind of classes and in generalization view they overlay very easily

Summary

Documenting Interfaces, Behavior, Context

Documenting: Interfaces

General idea

Examples

UML Example with explanation

Documenting: Behavior. Dynamic properties vs static properties. time to response (TTR), Throughput

Behavior documentation is needed to declare dynamic properties of built system: time to response, throughput, etc.
It supports system analysis as it executes.
Answers on "in what order components interact".
Depicts transition from State to State to State
What the system status under certain circumstances
How the system startup
Can guarantee that the system works as indended under variety of conditions

Documenting: Behavior. When and why. Trace-oriented vs Comprehensive language

useful in documenting interfaces and templates
Both Comprehensive and Trace-oriented may represent the whole system and any part of sub-systems

Documenting: Behavior. Trace-oriented Language & Comprehensive Language. BPMN Notation. Diagrams: Collaboration, Sequence, Activity

Trace-oriented Language

Trace-oriented language answers on the question "what happens when particular stimulus arrives or in specific state"
It does not help us to capture all possible behaviors unless you are collecting them

Comprehensive Language

Shows complete behavior of that system
usually Statemachine or StateChart diagrams

Activity diagram

Activity diagram example

BPMN

BPMN Example (Same diagram as Activity diagram, but with another focus)

Sequence (rely on Trace-oriented Language)

Sequence diagram shows only one trace, not all possible traces

Sequence diagram example

Sequence diagram in UML

State Machine diagram

Not good for showing interactions among components becaue states are not software modules, but state of subsystem or system
Tend to be well understood by developers and programmers, but not managers without giving extra explanation

State Machine example

It's possible that two different transactions could become eligible
State machine diagram could depict parallel processes
One state machine could depend on another state machine

Documenting: Context Diagrams. Their Notations. System, environment, Relations. in C&C and Layered view.

Example and explanation

General idea

apply Context diagram only to one view in order to not repeat yourself each time
you may show on context diagram the system itself and its runtime interactions with environment
Runtime suggests C&C View for that

Context in layered View

In other views

Notations. Boxes and Lines, UML.

Documenting: Decisions. Capturing Complex Architecture Decisions. 12 steps-parts document

General rule. When and What

If decision took 5 minutes - probably it's not worth to document.
If 5 days - so, yes.
Document the future steps especially if you have concerns regarding decisions you have made. Good start point in the next communication with your manager.

Complex Architecture Decisions

Consists of 12 steps. All are on slides
in section 9 you may declare what temporarly decisions you have made. It's useful because you can declare unfortunate cases when you need to move back and reconsider because you hit the dead end.
Under affected Artifacts you can but what's affected - budgets and\or schedules.

Documenting: Evaluation of Alternatives and Objectives

View Packets. Alternatives. How to build and document

Documenting: View Packets

View packets is the way to split complex diagrams on parts.
Each view has parent, child, and\or siblings.
In case you have view packets the overall document will lot a bit another
View packets could also be very useful in ADD because inside View packet you can solve specific attribute questions.
Create view and assign it particular responsibilities
It also useful to save chronological predecessors

Examples when view packets could be helpful

Alternatives

Make huge unwieldy diagram but use the tool to present and use it with zoon-in, zoom-out, and fly-through abilities
Series of diagrams

Summary

Views and Beyond. Alternatives: ISO 42010 (ISO\IEC 42010), DoDAF, Documentation in Agile

Views and Beyond. Template for Beyong view and rationale

General structure

Mapping

Background

ISO\IEC 42010 (also known as IEEE 1471-2000). ISO42010 vs "Views and Beyond".Alternative to "Views and Beyond"

ISO\IEC 42010. Also known as IEEE 1471-2000. Enterprise standard.

Structure

42010 Also defines view as representation of the whole system from perspective of related concerns

![image](https://user-images.githubusercontent.com/4239376/224509958-7a6c3bf4-936d-496d-9ad0-ca90d831ae67.png)

42010. Content Requirements

ISO 42010 vs Views and Beyond. Differences and Similarities

ISO42010 vs Beyong Views

42010 requires explicit identification of stakeholders and their concerns
42010 requires that you define the viewpoints. In Views and Beyond it's close to "architecture styles"

DoDAF. Alternatives to "Views and Beyond"

Summary. NOT A GREAT CHOICE.

General Structure

DoDAF Pros

DoDAF Cons

Operational View. All-view

Documentation in Agile. Alternatives to "Views and Beyond"

Summary

ISO42010 works well in Agile environment, so better to go straight with this format
Views and Beyond works well in Agile as well. Views and Beyond could be made compliant to ISO42010 in Agile environment as well

General

Points

BDUF and SAFe

Reviewing Architecture Documentation. 6 Steps. Active Reviewing

6 steps of Documentation Review Process

Overall

Avoid the type of review process when you gather people two weeks before deadline.

Step 1. Identify Stakeholders

Step 2. Identify Artifacts needed to be on hand for the review

Also it's important to understand what else stakeholders need along with architecture diagrams

Step 3. Build question set

Worst scenario if you ask reviewers "let us know what you think!"

Step 4. Plan the Review. Manage date, participants, assign questions

Options: group workshop, all in one room or separate meetings in parallel. One-one interviews are also applicable

New option: Active design review

Avoid the type of review process when you gather people two weeks before deadline.

Step 5. Perform the Review session

Gather questions and answers
Gather strenght and weaknesses of your document

Step 6. Summarize results

Aggregate questions from reviewers
Find problems in document. It's not simple "Pass\Fail", it's about finding weaknesses in documentaion

Architecture Evaluation. Measure and Output. ATAM

Architecture Evaluation. Approaches and Techniques. Evaluation Output

Measuring Techniques:

Evaluation Output:

ATAM. Risk identification method

ATAM Conceptual flow

The point of ATAM is only to find risks, not to mitigate them You can do that through elisit the right questions to architects, senior designers and key developers So, its risk identification method, not risk resolution method We do not provide precise analysis

When to use:

After creating architecture, but not so much code is in place;
Check existing system architecture, evaluate it;
Decide whether we will build this system or buy it from 3rd party vendor;

ATAM Phases

Phase 0: gather a small group of architects and evaluators and discuss what you are going to evaluate, what you have, etc.

Phase 1: explain what ATAM is about, what the process, who needs to be there, expectations

Business goals from Project Owners;

Architect presents an architecture;

Utility Tree; L,M,H - how important in terms of business (1st value, Highly important, Low importance) and how risky (2nd, High risks, Low risks).

H,H scenarios - our main business scenarios and drivers we must focus on

Scenarios:

Risks and Tradeoffs. Non-risks:

Non-risks may become risks if situation changed

Phase 2. More like QAW, workshop, scenario brainstorming. we do more architecture analysis, gather more scenarios from broader group and map on architecture approaches:

Phase 3. Formal meeting\presentation about findings

Architecture Rule of Thumb. Rules to follow in architecture

https://www.linkedin.com/pulse/architecture-10-rules-thumb-matthew-golzari/
https://medium.com/@i.gorton/six-rules-of-thumb-for-scaling-software-architectures-a831960414f9

Carnegie Mellon Univercity Slides. Solution Architecture Principles in Practice

Solution Architecture Principles in Practice.pdf
Solution Architecture Principles in Practice,_Student_Workbook_2020.pdf

SEI Exams

SEI Software Architecture Professional Exam: https://www.sei.cmu.edu/education-outreach/courses/course.cfm?courseCode=V19 Service-based Architecture Professional Cert: https://www.sei.cmu.edu/education-outreach/credentials/credential.cfm?customel_datapageid_14047=15189

Architecture Materials. AZ-304 SA Azure Architecture Design Exam materials

Powershell intro

powershell materials coming from Skylines Academy

SLA, SLO, SLI

Difference: https://www.atlassian.com/incident-management/kpis/sla-vs-slo-vs-sli * SLI Explained in details: http://cs.brown.edu/courses/csci2952-f/slides/Class9.pdf SLI Of The Platform - Critical Replica Threshold = CRT - Available replicas = min(total available pods, CRT) - Replica availability = (available replicas / CRT) * 100 - Critical Replica Availability = mean(replica availability of each service)

AZ-300 AZ-301 AZ-304 AZ-305 Exam tips

AZ+301 SKYLINES ACADEMY Slides_Student_Version.pdf

SKYLINES ACADEMY MATERIALS:

SECTION-1 Workload requirements
SECTION-2 Identity and Security
SECTION-3 Data Platform Solutions
SECTION-4 Business Continuity
SECTION-5 Deployment, Migration, Integration
SECTION-6 Infrastructure Strategy

Logging. Monitoring. Design solutions for them

Insights and everything related to that.

All information related to logging and monitoring are grouped here: Logging and Monitoring Information

Cost Optimizations materials

Security materials

Design Authorization & Authentication, Azure AD, Key Vault, Managed Identities, SAS

Azure AD, Azure AD Connect, Azure AD Connect sync, Azure AD B2B, Azure AD B2C, Azure AD Conditional Policies
Managed Identities, User-defined, System-defined
Azure Key Vault, when to use
SAS Token, when to use

All information related to Authorization and Authentication is here: Authentication, Authorization, Azure AD and features

Managed Identities, Authentication Vault

Design Security

RBAC:

Policy:

Consider how Azure policy is different from role-based access control (RBAC).
It’s important not to confuse Azure Policy and Azure RBAC. Azure RBAC and Azure Policy should be used together to achieve full scope control.

You use Azure Policy to ensure that the resource state is compliant to your organization’s business rules. Compliance doesn’t depend on who made the change or who has permission to make changes. Azure Policy will evaluate the state of a resource, and act to ensure the resource stays compliant.
You use Azure RBAC to focus on user actions at different scopes. Azure RBAC manages who has access to Azure resources, what they can do with those resources, and what areas they can access. If actions need to be controlled, then use Azure RBAC. If an individual has access to complete an action, but the result is a non-compliant resource, Azure Policy still blocks the action.

Area	Azure Policy	Role-based Access Control
Description	Ensure resources are compliant with a set of rules.	Authorization system to provide fine-grained access controls.
Focus	Focused on the properties of resources.	Focused on what resources the users can access.
Implementation	Specify a set of rules.	Assign roles and scopes.
Default access	By default, rules are set to allow.	By default, all access is denied.

Trust Center, Compliance Manager, Data Protection, Azure Security and Compliance, Blueprints

Microsoft Trust Center

In-depth information access tp FedRAMP, ISO, SOC audit reports, data protection white papers and different assessment reports
Centralized resources around security, compliance and privacy

Compliance Manager

Manage compliance from a central location
Proactive risk assessment
Insights and recommended actions
Prepare compliance reports for audit

Data Protection Resources

Trust documents, GDPR, Compliance guides, Pen test and Security Assessment tests

Azure Security and Compliance, Blueprints

Industry-specific overview and guidance
Customer responsibilities matrix
References architectures with threat models

Cache

Cache. Patterns. Challenges. Info

Cache consistency models

Cache Challenges

Cache. Challenges

Cache Replacement Policies

Cache. Cache Replacement Policies. Performance Metrics

Cache Replacement Policies

Cache Performance Metrics

Governance and Compliance materials

Design Governance

Lab. Microsoft Docs. Azure Blueprints. Additional Materials

Lab:
https://docs.microsoft.com/en-us/learn/paths/design-identity-governance-monitor-solutions/

Microsoft docs:
https://docs.microsoft.com/en-us/azure/cloud-adoption-framework/govern/guides/
https://docs.microsoft.com/en-us/azure/cloud-adoption-framework/govern/guides/standard/

Azure blueprints

With Azure Blueprints, the relationship between the blueprint definition (what should be deployed) and the blueprint assignment (what was deployed) is preserved.

In other words, Azure creates a record that associates a resource with the blueprint that defines it. This connection helps you track and audit your deployments. Azure Blueprints orchestrates the deployment of various resource templates and other artifacts.

How are Azure Blueprints different from Azure Policy

A policy is a default allow and explicit deny system focused on resource properties during deployment and for already existing resources. It supports cloud governance by validating those resources within a subscription adhere to requirements and standards.
A policy can be included as one of many artifacts in a blueprint definition. Including a policy in a blueprint enables the creation of the right pattern or design during assignment of the blueprint. The policy inclusion makes sure that only approved or expected changes can be made to the environment to protect ongoing compliance to the intent of the blueprint.

Additional Materials

Build a cloud governance strategy on Azure
Describe core Azure architectural components
Microsoft Cloud Adoption Framework for Azure
Intro to Azure blueprints

Data materials

Design for Data Storage

Azure SQL Server vs Azure SQL managed instance. Difference

* Azure SQL Server vs Azure SQL Managed Instance, difference: [CHECK THIS LINK](https://medium.com/awesome-azure/azure-difference-between-azure-sql-database-and-azure-sql-managed-instance-sql-mi-2e61e4485a65)

AKS. Persistent Volumes - Types of replication

In Case of AKS:

Infrastructure-based asynchronous replication

Your apps might require persistent storage. In Kubernetes, you can use persistent volumes to persist data storage. These persistent volumes are mounted to a node VM and then exposed to the pods. Typically, you provide a common storage point where apps write their data. This data is then replicated across regions and accessed locally, as displayed in the following graphic.

Application-based asynchronous replication

Kubernetes currently provides no native implementation for application-based asynchronous replication. However, because containers and Kubernetes are loosely coupled, you should be able to use any traditional app or language approach to replicate storage.

Consider Azure Backup or Velero
As with any app, it's important you back up the data related to your AKS clusters and their apps. When your apps consume and store data which is persisted on disks or in files, you should schedule frequent backups or take regular snapshots of that data. You can use several tools for these backup operations, including:

Azure Disks: Azure Disks can use built-in snapshot technologies. However, your apps might need to flush writes-to-disk before the snapshot operation.
Velero: Velero can back up persistent volumes along with additional cluster resources and configurations.

Design for Relational Data Storage

Business Critical Tier. General Purpose

General Purpose:
Business Critical Tier:
The next service tier to consider is Business Critical, which can generally achieve the highest performance and availability of all Azure SQL service tiers (General Purpose, Hyperscale, Business Critical). Business Critical is meant for mission-critical applications that need low latency and minimal downtime.

Azure SQL Types. Difference

Azure SQL Data Replication types and strategies. Quorum and Leader-Follower patterns: Pros & Cons

Patterns used in Databases

Hyperscale The Hyperscale service tier is currently available for Azure SQL Database, and not Azure SQL Managed Instance. This service tier has a unique architecture because it uses a tiered layer of caches and page servers to expand the ability to quickly access database pages without having to access the data file directly.

Data Replication

Leader and Follower Pattern

Quorum in Databases

Active Geo-Replication

Active geo-replication is available for:

Azure SQL Database: You can configure active geo-replication for any database in any elastic database pool.
You can use active geo-replication to:
Create a readable secondary replica in a different region.
Fail over to a secondary database if your primary database fails or needs to be taken offline.

Data Retention in Azure SQL DBs (Managed Instance and SQL Server)

How to manage situations when you need data retention for more than 35 days

Design for Storage Accounts: File Share, Managed Disks (used by Azure VMs), Blob Storage

Materials are taken from this site: https://rajanieshkaushikk.com/2023/04/08/azure-blob-storage-vs-file-storage-vs-disk-storage-which-is-right-for-you/#:~:text=Azure%20File%20storage%20is%20not,low%20latency%20and%20high%20IOPS.

Azure File Share vs Azure Managed Disks vs Blob Storage

Design for Backups and Recovery

Design Networking

Traffic Manager

Traffic manager. General Info

Failover scenarios:

Manually, by using Azure DNS, this failover solution uses the standard DNS mechanism to fail over to your backup site. This option works best when used in conjunction with the cold standby or the pilot light approaches.
Automatically, by using Traffic Manager, with more complex architectures and multiple sets of resources capable of performing the same function, you can configure Azure Traffic Manager (based on DNS). Traffic Manager checks the health of your resources and routes the traffic from the non-healthy resource to the healthy resource automatically.

High Availability with Traffic Manager:

Approach	Description
Active/Passive with cold standby	Your VMs (and other appliances) that are running in the standby region aren't active until needed. However, your production environment is replicated to a different region. This approach is cost-effective but takes longer to undertake a complete failover.
Active/Passive with pilot light	You establish the standby environment with a minimal configuration; it has only the necessary services running to support a minimal and critical set of apps. In its default form, this approach can only execute minimal functionality. However, it can scale up and spawn more services, as needed, to take more of the production load during a failover.
Active/Passive with warm standby	Your standby region is pre-warmed and is ready to take the base load. Auto scaling is on, and all the instances are up and running. This approach isn't scaled to take the full production load but is functional, and all services are up and running.

Messaging Systems. Messaging Patterns. Kafka

Kafka. Kafka Basics. Kafka Cluster

Kafka Basics. Record. Topics. Consumers. Consumer Groups. Load Balancing. Compression and Batching

Consumers

Consumers are the applications that subscribe to (read and process) data from Kafka topics. Consumers subscribe **to one or more topics** and consume published messages by pulling data from the brokers.

Records

A record is a message or an event that gets stored in Kafka. Essentially, it is the data that travels from producer to consumer through Kafka. A record contains a key, a value, a timestamp, and optional metadata headers.

Consumer Groups. Consumer group can have multiple consumers that subscribe to the same topic, allowing the system to process messages in parallel.

Consumer groups are a way to manage multiple consumers of a messaging system that work together to process messages from one or more topics.  
Each consumer group ensures that all messages in the topic are processed, and each message is processed by only one consumer within the group.  
This approach allows for parallel processing and load balancing among consumers.  
For example, in Apache Kafka, a consumer group can have multiple consumers that subscribe to the same topic, allowing the system to process messages in parallel and distribute the workload evenly.

Load Balancing in Kafka (balancing messages in partitions)

Kafka uses a referred to as "sticky partitioning" to provide load balancing functionality.  
This algorithm locks each message to a specific partition, and then distributes new messages to the next available partition in a round-robin fashion. This ensures that load is spread evenly across partitions and that remain balanced.

Compression and Batching in Kafka

Message batching is the process of combining multiple messages into a single batch before processing or transmitting them.  
This approach can improve throughput and reduce the overhead of processing individual messages.  
  
Compression, on the other hand, reduces the size of the messages, leading to less network bandwidth usage and faster transmission.  
For example, Apache Kafka supports both batching and compression:  
Producers can batch messages together, and the system can compress these batches using various compression algorithms like Snappy or Gzip, reducing the amount of data transmitted and improving overall performance.

Kafka Cluster. Zoo Keeper

Kafka cluster

Kafka is deployed as a cluster of one or more servers, where each server is responsible for running one Kafka broker.

ZooKeeper

ZooKeeper is a distributed key-value store and is used for coordination and storing configurations. It is highly optimized for reads.  
Kafka uses ZooKeeper to coordinate between Kafka brokers; ZooKeeper maintains metadata information about the Kafka cluster.

Kafka vs RabbitMQ vs ActiveMQ vs Azure ServiceBus vs AWS SQS

Kafka vs ServiceBus

Kafka is ideal for streaming data in an at-least-once manner and provides powerful features such as transmission of data over partitions, replication, and high-availability across multiple data centers.  
It is optimized for large-scale data streaming and has built-in support for high throughput, low latency and scalability.   
  
Azure Service Bus is used for messaging and not for streaming, and generally provides higher throughput and lower latency for scenarios where at-most-once messaging is required.    
It supports mobile devices, and provides integration with other Azure services such as Azure Storage and Service Fabric.

Kafka vs AWS SQS

Kafka is ideal for streaming data in an at-least-once manner and provides powerful features such as transmission of data over partitions, replication, and high-availability across multiple data centers.  
It can be used to ingest data from multiple sources to multiple destinations and is optimized for large-scale streaming.  

AWS SQS is not suitable for streaming and is used for message queuing scenarios where at-most-once delivery is required.  
It supports mobile devices, and provides integration with other AWS services.

Patterns with Kafka and Other Services. Q&A

Messaging Patterns with Kafka. Point to Point. Pub-Sub. Request-Response. Fan-Out/Fan-In (Scatter-Gather). Dead Letter Queue

Patterns are not covered in images:

Competing consumers
Guaranteed delivery
Content-based routing
Routing slip
Correlation identifier
Routing by header
Receiver-initiated workflow
Routing using selectors
Sagas

Q&A:

Which messaging pattern fits better for data stream processing?
The best messaging pattern for data stream processing is Publish/Subscribe. This pattern is typically used for passing data between applications, decoupling producers and consumers, and ensuring that messages are distributed to all interested parties in the system.

System Design Interview. Low-level System Design Interview. High-level System Design Interview

CAP Theorem. PACELC Theorem. Examples

CAP Theorem

Consistency ( C ): All nodes see the same data at the same time. This means users can read or write from/to any node in the system and will receive the same data. It is equivalent to having a single up-to-date copy of the data.
Availability ( A ): Availability means every request received by a non-failing node in the system must result in a response. Even when severe network failures occur, every request must terminate. In simple terms, availability refers to a system's ability to remain accessible even if one or more nodes in the system go down.
Partition tolerance ( P ): A partition is a communication break (or a network failure) between any two nodes in the system, i.e., both nodes are up but cannot communicate with each other.
A partition-tolerant system continues to operate even if there are partitions in the system. Such a system can sustain any network failure that does not result in the failure of the entire network.
Data is sufficiently replicated across combinations of nodes and networks to keep the system up through intermittent outages.

PACELC Theorem. General Info: ACID vs BASE. PACELC Theorem Examples

General Info.

We cannot avoid partition in a distributed system, therefore, according to the CAP theorem, a distributed system should choose between consistency or availability.
ACID (Atomicity, Consistency, Isolation, Durability) databases, such as RDBMSs like MySQL, Oracle, and Microsoft SQL Server, chose consistency (refuse response if it cannot check with peers), while BASE (Basically Available, Soft-state, Eventually consistent) databases, such as NoSQL databases like MongoDB, Cassandra, and Redis, chose availability (respond with local data without ensuring it is the latest with its peers).

Examples

Dynamo and Cassandra are PA/EL systems: They choose availability over consistency when a partition occurs; otherwise, they choose lower latency.
BigTable and HBase are PC/EC systems: They will always choose consistency, giving up availability and lower latency.
MongoDB can be considered PA/EC (default configuration): MongoDB works in a primary/secondaries configuration. In the default configuration, all writes and reads are performed on the primary.
As all replication is done asynchronously (from primary to secondaries), when there is a network partition in which primary is lost or becomes isolated on the minority side, there is a chance of losing data that is unreplicated to secondaries, hence there is a loss of consistency during partitions.
Therefore it can be concluded that in the case of a network partition, MongoDB chooses availability, but otherwise guarantees consistency. Alternately, when MongoDB is configured to write on majority replicas and read from the primary, it could be categorized as PC/EC.

Consistent Hashing. Data Partitioning. Data Replication.

Q: Where Consistent Hashing is used for Data Partitioning?
A: Amazon's Dynamo and Apache Cassandra use Consistent Hashing to distribute and replicate data across nodes
Q: In what other scenarios we may use Consistent Hashing for Data Servers?
A: In the following scenarios:
Any system working with a set of storage (or database) servers and needs to scale up or down based on the usage, e.g., the system could need more storage during Christmas because of high traffic.
Any distributed system that needs dynamic adjustment of its cache usage by adding or removing cache servers based on the traffic load.
Any system that wants to replicate its data shards to achieve high availability.

Data Partitioning. Data Replication. Naive approach

Basics

Data partitioning: It is the process of distributing data across a set of servers. It improves the scalability and performance of the system.
Data replication: It is the process of making multiple copies of data and storing them on different servers. It improves the availability and durability of the data across the system.
Data partition and replication strategies lie at the core of any distributed system. A carefully designed scheme for partitioning and replicating the data enhances the performance, availability, and reliability of the system and also defines how efficiently the system will be scaled and managed.

Naive Approach

How do we know on which node a particular piece of data will be stored?
When we add or remove nodes, how do we know what data will be moved from existing nodes to the new nodes? Additionally, how can we minimize data movement when nodes join or leave?

PROS:

Easy to create and understand

CONS:

Hard to add or delete node

Consistent Hashing for Data Partitioning. Algorithm: MD5

Problem statement

Distributed systems can use Consistent Hashing to distribute data across nodes. Consistent Hashing maps data to physical nodes and ensures that only a small set of keys move when servers are added or removed.
Consistent Hashing stores the data managed by a distributed system in a ring. Each node in the ring is assigned a range of data.

How consistent hashing may help us

Whenever the system needs to read or write data, the first step it performs is to apply the MD5 hashing algorithm to the key. The output of this hashing algorithm determines within which range the data lies and hence, on which node the data will be stored.
hus, the hash generated from the key tells us the node where the data will be stored.

PROS:

When node added or deleted only limited amount of data is affected
When node deleted the next node starts being responsible for all operations of removed node

CONS:

Each node in Consistent Hashing represents a real server. Therefore, it shows not great load distribution
Works well only in homogenious systems. If you have different servers you cant balance them well
High chance of hotspot issue (when one server uses more often than others)

Consistent Hashing. Virtual Nodes

PROS:

The load spreads more evenly across the physical nodes on the cluster by dividing the hash ranges into smaller subranges, this speeds up the rebalancing process after adding or removing nodes
When a new node is added, it receives many Vnodes from the existing nodes to maintain a balanced cluster
Many nodes participate in the rebuild process when a node needs to be rebuilt
It's easier to maintain the data cluster if it consists of different machines (heterogenious servers). More powerful machines may have more Vnodes than others

Consistent Hashing. Data Replication using Consistent Hashing

System Design Interview: General Rules. Step by Step guide

Step by Step guide by Design Gurus. 3 Steps: Clarify Requirements, Expectations & Estimations, System interface definition

Clarify Requirements

Ask questions about problems you are trying to solve; Design questions on interview are open-ended
a) It has not ONE correct answer. You must clarify ambiguities
b) Need to know on which aspects you must focus

Question examples:

Will users of our service be able to post tweets and follow other people?
Should we also design to create and display the user's timeline?
Will tweets contain photos and videos?
Are we focusing on the backend only, or are we developing the front-end too?
Will users be able to search tweets?
Do we need to display hot trending topics?
Will there be any push notification for new (or important) tweets?

Define API Interface

Define what APIs are expected from the system. This will establish the exact contract expected from the system and ensure if we haven't gotten any requirements wrong. Some examples of APIs for our Twitter-like service will be:
postTweet(user_id, tweet_data, tweet_location, user_location, timestamp, …) generateTimeline(user_id, current_time, user_location, …) markTweetFavorite(user_id, tweet_id, timestamp, …)

Step by Step guide by Design Gurus. Next 4 Steps: Define data model, Degine Database Type, High-level design

Defining Data Model

Defining the data model in the early part of the interview will clarify how data will flow between different system components. Later, it will guide for data partitioning and management.
The candidate should identify various system entities, how they will interact with each other, and different aspects of data management like storage, transportation, encryption, etc. Here are some entities for our Twitter-like service:

User: UserID, Name, Email, DoB, CreationDate, LastLogin, etc.
Tweet: TweetID, Content, TweetLocation, NumberOfLikes, TimeStamp, etc.
UserFollow: UserID1, UserID2
FavoriteTweets: UserID, TweetID, TimeStamp

Defining Database Type

Which database system should we use? Will NoSQL like Cassandra best fit our needs, or should we use a SQL-like solution? What kind of block storage should we use to store photos and videos?

Questions here:
1) Do we need to be ACID-compliant?
2) Do we need to support Strong Data Consistency and Transactions?
3) What data structure do we have?
4) Amount of Read\Write operations?
In case of Tweeter the answer is possible NoSQL, we can sacrifice Strong Consistency over Availability (Low Latency).
In terms of type we need to estimate the balance between read and write operations. Potentially, we read more often. So, it's a NO for Cassandra and probably it's better to go with MongoDB.

PS: If we are choosing among native NoSQL solutions, for DynamoDB there is only one good DB pattern named "One Big Table". Lots of patterns eventually say you to go with One Big Table: https://www.alexdebrie.com/posts/dynamodb-single-table/

High-Level Design

Draw a block diagram with 5-6 boxes representing the core components of our system. We should identify enough components that are needed to solve the actual problem from end to end.
For Twitter, at a high level, we will need multiple application servers to serve all the read/write requests with load balancers in front of them for traffic distributions.
If we're assuming that we will have a lot more read traffic (compared to write), we can decide to have separate servers to handle these scenarios. On the back-end, we need an efficient database that can store all the tweets and support a large number of reads. We will also need a distributed file storage system for storing photos and videos.

Step by Step guide by Design Gurus. Last Step: Low-Level Design with details. Identify Bottlenecks

Name		Name	Last commit message	Last commit date
Latest commit History 433 Commits
service-principal-example/react-serviceprincipal-test		service-principal-example/react-serviceprincipal-test
togaf		togaf
webapi-service-principal-test/WebApi-Service-Principal-test		webapi-service-principal-test/WebApi-Service-Principal-test
Azure B2B, B2C, Azure AD DS.md		Azure B2B, B2C, Azure AD DS.md
Darp.md		Darp.md
Entitlement: Azure AD B2B Access to third party users.md		Entitlement: Azure AD B2B Access to third party users.md
High Level system design Interview.md		High Level system design Interview.md
Logging,Monitoring.md		Logging,Monitoring.md
Pastebin in practice.md		Pastebin in practice.md
README.md		README.md
Url Shortener.md		Url Shortener.md
aad-application-proxy.md		aad-application-proxy.md
azure-additional-links-resources.md		azure-additional-links-resources.md
azure-datafactory.md		azure-datafactory.md
azure-landing-zones.md		azure-landing-zones.md
azure-proximity-placement-groups.md		azure-proximity-placement-groups.md
high-availability-strategies-in-azure.md		high-availability-strategies-in-azure.md
service principals,managed Identities,vault authentication.md		service principals,managed Identities,vault authentication.md
togaf.md		togaf.md

Glareone/AZ-304-305-SA-And-Architecture-Design-In-Depth

Folders and files

Latest commit

History

Repository files navigation

Table of Contents: System Design in general, System Design In Practice, System design for Azure

AWS

Azure

Enterprise Architecture. ToGAF

Architecture. ATAM. Principles

General Architecture Principles. Data. Caching. Messaging. Distributed Systems

Architecture in Practice. System Design Interview Questions. Main Problems

System Design Interview. Low-Level System Design Interview. High-Level System Design Interview

Other Good Articles & Must Read Books

How Much of Architecture is needed?

Architecture lifecycle

QAW (workshop) => ADD (Attribute-Driven Design) => V&B (Views and Beyond) => ATAM (Architecture Tradeoffs Analysis)

Architecture patterns

Architecture style vs pattern

Architecture Influence Cycle

Architecture Materials. General Materials

Materials

Jeffrey Richter's Course. Slides

Carnegie Mellon Univercity

Attribute-Driven Design (ADD)

Quality Attributes. Tactics to achieve Quality Attributes.

Functional vs Non-Func Attributes. Testing Process

Scenario for Availability. First slide - all spectre what's covered. The second and third - what we exactly handle and what we expect

Tactics to achieve Quality Attributes

Example of Tactics. Detect Faults

Recovery from Faults

Prevent Faults

Control Resource Demand

Reduce computational overhead

Manage Resources

Patterns

Manage Deployment Pipeline

Scale rollouts.

Roll back.

Script deployment commands.

Manage Deployed System

Manage service interactions.

Package dependencies.

Feature toggle.

Patterns for Deployability

Deployment changes point of view

Architecture Styles and their Documentation. Module Styles, C&C Styles, Allocation Styles.

Module Styles. General Information

Notations and Usage. Informal or semi-formal notation: Box and Lines or UML

Module Styles in details.

Decomposition Style. Notations

Decomposition Refinement

Decomposition refinement in UML

Example:

Example:

Informal notation in Layered style

Example

Good example of diagram which uses UML without crosscut arrows

Example

Data Model Evolution over time

Components and Connectors Styles

In UML

Yahoo! pipes.

Example

Web Services

SOA In practice

Constraints

Example

Example

Tiers vs Layers

Tiers. Commonly in Client-Server style

Multi-tier notation

Dynamic Creation and Destruction

Allocation Styles

Examples

Example

Example

Specialization. Platform-style vs Competence-center vs Open-Source

Architecture Views. Documenting Software Architecture. Properties to document in your Architecture Document

Documenting Behavior. 99% of time in practice you work with traced-oriented language