review (#168)
sr-remsha authored Aug 13, 2024
1 parent 1c78ff1 commit c0c3159
Showing 2 changed files with 6 additions and 8 deletions.
10 changes: 5 additions & 5 deletions docs/architecture.md
@@ -16,7 +16,7 @@ Modular architecture of AI DIAL allows implementing scalable and customized solu

The [Unified API](https://epam-rail.com/dial_api) enables universal connectivity between models (including models of different modalities) and unified access to different embedding models, and it facilitates communication with AI DIAL Core for both external and internal clients.
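
For illustration only, the sketch below reaches a chat model and an embedding model through the same unified route. The host, deployment names, the OpenAI-compatible path shape, and the `Api-Key` header are assumptions made for this example, not values taken from this page; see the Unified API reference for the exact contract.

```python
# Minimal sketch of two calls through the Unified API: a chat model and an
# embedding model are reached through the same route shape. Host, deployment
# names, and the Api-Key header are assumptions, not quoted from this page.
import requests

DIAL_CORE = "https://dial-core.example.com"   # hypothetical DIAL Core endpoint
HEADERS = {"Api-Key": "my-dial-key"}          # assumed per-request API key header

# Chat completion against a chat-capable deployment.
chat = requests.post(
    f"{DIAL_CORE}/openai/deployments/chat-model/chat/completions",
    headers=HEADERS,
    json={"messages": [{"role": "user", "content": "Hello!"}]},
    timeout=30,
)

# Embeddings against an embedding deployment, via the same unified API.
emb = requests.post(
    f"{DIAL_CORE}/openai/deployments/embedding-model/embeddings",
    headers=HEADERS,
    json={"input": ["Hello!"]},
    timeout=30,
)

print(chat.json(), emb.json())
```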

- Applications can form an **ecosystem and interact with each other** through the Unified API, with access to all DIAL Core features, including connectivity to models, file storage, access control, per-request API keys, and more - see the following illustration. To enhance performance and fault tolerance, AI DIAL Core employs a proprietary load balancer and a retry mechanism, which significantly reduces delays and boosts throughput, especially during peak demand. Refer to the [document](./tutorials/high-load-performance) with an overview of the performance tests to learn more.
+ Applications can form an **ecosystem and interact with each other** through the Unified API, with access to all DIAL Core features, including connectivity to models, file storage, access control, per-request API keys, and more - see the following illustration. To enhance performance and fault tolerance, AI DIAL Core employs a proprietary load balancer and a retry mechanism, which significantly reduces delays and boosts throughput, especially during peak demand. Refer to the [document](/tutorials/high-load-performance) with an overview of the performance tests to learn more.

![](./img/arch-intro.svg)

@@ -75,23 +75,23 @@ AI DIAL Core is headless and is the **only mandatory component**. It includes al

#### Authentication and Authorization

- AI DIAL provides native support for [OpenID Connect](https://openid.net/developers/how-connect-works/) and [OAuth2](https://oauth.net/2/) and offers [integration with various Identity Providers (IDPs)](./Auth/Web/overview), such as Azure AD, Auth0, Okta, Microsoft Entra, Google OAuth2, and AWS Cognito, where you can define user roles and attributes to support your custom permissions model. Additionally, you can leverage Keycloak to work with an even wider range of IDPs.
+ AI DIAL provides native support for [OpenID Connect](https://openid.net/developers/how-connect-works/) and [OAuth2](https://oauth.net/2/) and offers [integration with various Identity Providers (IDPs)](/Auth/Web/overview), such as Azure AD, Auth0, Okta, Microsoft Entra, Google OAuth2, and AWS Cognito, where you can define user roles and attributes to support your custom permissions model. Additionally, you can leverage Keycloak to work with an even wider range of IDPs.

Two methods of Core API call authorization are supported: JWT token and API key. Both options provide granular permission management, allowing you to control access to specific functionalities or resources. Additionally, these authorization methods enable rate and cost control, giving you the ability to manage the frequency of API calls.
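
As a minimal sketch of the two options, the snippet below sends the same request once with a per-request API key and once with a JWT issued by the configured identity provider. The URL and header names are assumptions based on common API-key and OAuth2 bearer-token conventions, not taken from this page.

```python
# Sketch of the two supported authorization methods for the same call.
# The URL and header names are assumptions, not quoted from this page.
import requests

url = "https://dial-core.example.com/openai/deployments/chat-model/chat/completions"
body = {"messages": [{"role": "user", "content": "ping"}]}

# Option 1: a per-request API key, subject to rate and cost control.
with_key = requests.post(url, headers={"Api-Key": "my-dial-key"}, json=body, timeout=30)

# Option 2: a JWT obtained from the configured identity provider (OIDC/OAuth2).
with_jwt = requests.post(
    url, headers={"Authorization": "Bearer <jwt-from-your-idp>"}, json=body, timeout=30
)
```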

- > Refer to [Auth](./Auth/overview) to learn how to authenticate API keys and chat users and to [Roles & Access Control](/Roles%20and%20Access%20Control/overview) to learn how to implement a custom role-based access policy.
+ > Refer to [Auth](/Auth/overview) to learn how to authenticate API keys and chat users and to [Roles & Access Control](/Roles%20and%20Access%20Control/overview) to learn how to implement a custom role-based access policy.
#### Load Balancer

For self-hosted models, you can use the standard load balancer (LB) capabilities provided by the target cloud platform. As for cloud-deployed models like Azure OpenAI and others, we typically rely on our custom-developed load balancing solution.

- In this approach, a configuration file includes multiple upstream endpoints for a model. When a request is received, it is forwarded to one of the endpoints using the round-robin method. If an upstream returns an overload error such as a 429 (Too Many Requests) or a 504 (Gateway Timeout), the system attempts another upstream and temporarily excludes the one that returned the error. This strategy ensures efficient load distribution and fault tolerance for optimal performance and reliability. Refer to the [document](./tutorials/high-load-performance) with an overview of the performance tests to learn more.
+ In this approach, a configuration file includes multiple upstream endpoints for a model. When a request is received, it is forwarded to one of the endpoints using the round-robin method. If an upstream returns an overload error such as a 429 (Too Many Requests) or a 504 (Gateway Timeout), the system attempts another upstream and temporarily excludes the one that returned the error. This strategy ensures efficient load distribution and fault tolerance for optimal performance and reliability. Refer to the [document](/tutorials/high-load-performance) with an overview of the performance tests to learn more.
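
A minimal sketch of that strategy is shown below, assuming hypothetical upstream URLs and a fixed cool-down period; the actual implementation and its configuration live in AI DIAL Core.

```python
# Sketch of the load-balancing behaviour described above: round-robin over
# configured upstreams, with temporary exclusion of an upstream that returns
# 429 or 504. Upstream URLs and the cool-down period are assumptions.
import itertools
import time
import requests

UPSTREAMS = [
    "https://region-a.example.com/chat/completions",
    "https://region-b.example.com/chat/completions",
]
COOL_DOWN_SECONDS = 60          # assumed exclusion period after an overload error
_excluded_until = {}            # upstream -> timestamp until which it is skipped
_rotation = itertools.cycle(UPSTREAMS)

def forward(payload):
    for _ in range(len(UPSTREAMS)):
        upstream = next(_rotation)
        if _excluded_until.get(upstream, 0) > time.time():
            continue  # upstream is still cooling down, skip it
        response = requests.post(upstream, json=payload, timeout=30)
        if response.status_code in (429, 504):
            _excluded_until[upstream] = time.time() + COOL_DOWN_SECONDS
            continue  # temporarily exclude this upstream and try the next one
        return response
    raise RuntimeError("All upstreams are overloaded or excluded")
```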

#### Rate Limits & Cost Control

A well-distributed rate-limiting mechanism ensures control over the total number of tokens that can be sent to a model (typically within a one-minute or 24-hour window) by any Application, Addon, or Assistant.
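
As a rough sketch of such a mechanism, the snippet below enforces a per-consumer token budget over a one-minute window. The limit value and the consumer identifier are assumptions; in AI DIAL, the actual limits are defined per role in the configuration referenced in the note below.

```python
# Sketch of a per-consumer token limit over a fixed one-minute window.
# The limit value and consumer identifier are assumptions for illustration.
import time
from collections import defaultdict

MINUTE_TOKEN_LIMIT = 100_000            # assumed per-minute token budget
_window_start = defaultdict(float)      # consumer -> start of its current window
_tokens_used = defaultdict(int)         # consumer -> tokens spent in that window

def allow_request(consumer: str, prompt_tokens: int) -> bool:
    now = time.time()
    if now - _window_start[consumer] >= 60:
        _window_start[consumer] = now   # open a new one-minute window
        _tokens_used[consumer] = 0
    if _tokens_used[consumer] + prompt_tokens > MINUTE_TOKEN_LIMIT:
        return False                    # over budget: reject or defer the call
    _tokens_used[consumer] += prompt_tokens
    return True
```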

> Refer to [AI DIAL Configuration](./Deployment/configuration#dynamic-settings) to learn more about roles and rate limits.
> Refer to [AI DIAL Configuration](/Deployment/configuration#dynamic-settings) to learn more about roles and rate limits.
#### Extension Framework

4 changes: 1 addition & 3 deletions docs/user-guide.md
@@ -334,9 +334,7 @@ To add a file into a folder it is necessary to upload it.

**To Unshare a File**:

- Using an application that supports file attachments, you can create a conversation and share it with someone. When a conversation is shared, all files attached to the conversation are shared as well. In the attachments manager, such shared files are tagged with a blue arrow:
-
- ![](./img/unshare.png)
+ Using an application that supports file attachments, you can create a conversation and share it with someone. When a conversation is shared, all files attached to the conversation are shared as well. In the attachments manager, such shared files are tagged with a blue arrow.

In the file menu, you can choose to unshare it:

