Enable caching for LLM requests with configurable cache names #677

Merged · 3 commits · Aug 29, 2024
6 changes: 3 additions & 3 deletions docs/genaisrc/genaiscript.d.ts

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

37 changes: 14 additions & 23 deletions docs/src/content/docs/reference/scripts/cache.mdx
@@ -7,13 +7,21 @@
---

import { FileTree } from "@astrojs/starlight/components"

Check warning on line 10 in docs/src/content/docs/reference/scripts/cache.mdx (GitHub Actions / build)

The statement about LLM requests caching has been changed. It was previously stated that LLM requests are cached by default, but the updated content states that they are not cached by default. This is a significant change and should be reviewed for accuracy.

generated by pr-docs-review-commit content_change
LLM requests are cached by default. This means that if a script generates the same prompt for the same model, the cache may be used.
LLM requests are **NOT** cached by default. However, you can turn on LLM request caching from `script` metadata or the CLI arguments.

- the `temperature` is less than 0.5
- the `top_p` is less than 0.5
- no [functions](./functions.md) are used as they introduce randomness
- `seed` is not used
```js "cache: true"
script({
...,
cache: true
})
```

or

```sh "--cache"
npx genaiscript run ... --cache
```

Check notice on line 24 in docs/src/content/docs/reference/scripts/cache.mdx (GitHub Actions / build)

New content has been added to explain how to enable LLM request caching. This includes a JavaScript code snippet and a shell command. Ensure that these instructions are correct and clear for users.

generated by pr-docs-review-commit content_addition

New code examples have been added to illustrate how to enable LLM request caching. Ensure these examples are correct and clear to the reader.

generated by pr-docs-review-commit code_example_added


The cache is stored in the `.genaiscript/cache/chat.jsonl` file. You can delete this file to clear the cache.
This file is excluded from git by default.
@@ -26,32 +34,15 @@

</FileTree>
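
For illustration, a minimal sketch of clearing the default cache programmatically, assuming a Node.js (ESM) environment and the default `.genaiscript` layout described above; the helper name is made up for this example:

```ts
// Sketch only: removes the default chat cache file described above.
// Assumes a Node.js (ESM) environment and the default .genaiscript layout.
import { rm } from "node:fs/promises"

async function clearChatCache(dir = ".genaiscript/cache"): Promise<void> {
    // force: true makes this a no-op if the file does not exist
    await rm(`${dir}/chat.jsonl`, { force: true })
}

await clearChatCache()
```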

The section on disabling the cache has been removed. If this information is still relevant and useful, consider adding it back to the documentation.

generated by pr-docs-review-commit content_removal

## Disabling

You can always disable the cache using the `cache` option in `script`.

```js
script({
...,
cache: false // always off
})
```

Or using the `--no-cache` flag in the CLI.

```sh
npx genaiscript run .... --no-cache
```

## Custom cache file

Use the `cacheName` option to specify a custom cache file name.
The name will be used to create a file in the `.genaiscript/cache` directory.

```js
script({
...,

The property name in the JavaScript code snippet has been changed from 'cacheName' to 'cache'. This could potentially confuse users if not properly explained in the surrounding text.

generated by pr-docs-review-commit content_change

cacheName: "summary"
cache: "summary"

Check warning on line 45 in docs/src/content/docs/reference/scripts/cache.mdx (GitHub Actions / build)

The property name for specifying a custom cache file name has been changed from 'cacheName' to 'cache'. This change should be reviewed for accuracy and consistency with the rest of the codebase.
})

Check failure on line 46 in docs/src/content/docs/reference/scripts/cache.mdx (GitHub Actions / build)

The section on disabling the cache has been removed. This information might be important for users who want to disable caching. Consider adding it back or providing an alternative way to disable caching.

generated by pr-docs-review-commit content_removal

```

6 changes: 3 additions & 3 deletions genaisrc/genaiscript.d.ts

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

21 changes: 11 additions & 10 deletions packages/cli/src/run.ts
@@ -101,13 +101,15 @@
exitCode === SUCCESS_ERROR_CODE ||
UNRECOVERABLE_ERROR_CODES.includes(exitCode)
)
break

const delayMs = 2000 * Math.pow(2, r)
console.error(
`error: run failed with ${exitCode}, retry #${r + 1}/${runRetry} in ${delayMs}ms`
)
await delay(delayMs)
if (runRetry > 1) {
console.error(
`error: run failed with ${exitCode}, retry #${r + 1}/${runRetry} in ${delayMs}ms`
)
await delay(delayMs)
}

Check failure on line 112 in packages/cli/src/run.ts (GitHub Actions / build)

The retry logic has been changed to only retry if `runRetry` is greater than 1. This could potentially skip necessary retries if `runRetry` is 1, which might lead to unexpected behavior. Please ensure this is the intended behavior.

generated by pr-review-commit retry_logic

}
process.exit(exitCode)
}
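
For reference, a self-contained sketch of the retry-with-exponential-backoff pattern this hunk implements; the exit codes, retry count, and `runOnce` callback below are placeholders, not the actual CLI internals:

```ts
// Sketch only: exponential backoff with the retry guard discussed above.
// SUCCESS / UNRECOVERABLE exit codes and runOnce are placeholders, not the CLI's real values.
const SUCCESS = 0
const UNRECOVERABLE = [64, 70] // illustrative "do not retry" exit codes

const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

async function runWithRetries(
    runOnce: () => Promise<number>,
    runRetry = 1
): Promise<number> {
    let exitCode = -1
    for (let r = 0; r < runRetry; r++) {
        exitCode = await runOnce()
        // stop on success or on failures that retrying cannot fix
        if (exitCode === SUCCESS || UNRECOVERABLE.includes(exitCode)) break
        // only log and wait when more than one attempt was requested
        if (runRetry > 1) {
            const delayMs = 2000 * Math.pow(2, r) // 2s, 4s, 8s, ...
            console.error(
                `error: run failed with ${exitCode}, retry #${r + 1}/${runRetry} in ${delayMs}ms`
            )
            await delay(delayMs)
        }
    }
    return exitCode
}
```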
@@ -156,7 +158,7 @@
const jsSource = options.jsSource

const fail = (msg: string, exitCode: number) => {
logVerbose(msg)
logError(msg)
return { exitCode, result }
}

@@ -303,9 +305,6 @@
return fail("runtime error", RUNTIME_ERROR_CODE)
}
if (!isQuiet) logVerbose("") // force new line
if (result.status !== "success" && result.status !== "cancelled")
logVerbose(result.statusText ?? result.status)

if (outAnnotations && result.annotations?.length) {
if (isJSONLFilename(outAnnotations))
await appendJSONL(outAnnotations, result.annotations)
@@ -485,8 +484,10 @@
}
}
// final fail
if (result.error && !isCancelError(result.error))
return fail(errorMessage(result.error), RUNTIME_ERROR_CODE)
if (result.status !== "success" && result.status !== "cancelled") {
const msg = errorMessage(result.error) ?? result.statusText
return fail(msg, RUNTIME_ERROR_CODE)
}

if (failOnErrors && result.annotations?.some((a) => a.severity === "error"))
return fail("error annotations found", ANNOTATION_ERROR_CODE)

2 changes: 1 addition & 1 deletion packages/core/src/chattypes.ts
@@ -86,7 +86,7 @@ export interface ChatCompletionsOptions {
requestOptions?: Partial<Omit<RequestInit, "signal">>
maxCachedTemperature?: number
maxCachedTopP?: number
cache?: boolean
cache?: boolean | string
cacheName?: string
retry?: number
retryDelay?: number
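Since `cache` now accepts either a boolean or a string, downstream code has to resolve a cache name from it, in the spirit of the `openai.ts` change later in this PR. A rough sketch; the helper and constant names are illustrative only, not the library's API:

```ts
// Sketch: resolving a cache name from the new `boolean | string` option.
// DEFAULT_CACHE and resolveCacheName are illustrative names, not part of the codebase.
const DEFAULT_CACHE = "chat"

function resolveCacheName(
    cache: boolean | string | undefined,
    cacheName?: string
): string | undefined {
    // caching is off by default and when explicitly disabled
    if (cache === undefined || cache === false) return undefined
    // a string selects a custom cache; `true` falls back to the legacy cacheName or the default
    return typeof cache === "string" ? cache : cacheName ?? DEFAULT_CACHE
}

resolveCacheName(true) // "chat"
resolveCacheName("summary") // "summary"
resolveCacheName(false) // undefined, no caching
```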
4 changes: 1 addition & 3 deletions packages/core/src/constants.ts
@@ -2,8 +2,6 @@ export const CHANGE = "change"
export const TRACE_CHUNK = "traceChunk"
export const RECONNECT = "reconnect"
export const OPEN = "open"
export const MAX_CACHED_TEMPERATURE = 0.5
export const MAX_CACHED_TOP_P = 0.5
export const MAX_TOOL_CALLS = 10000

// https://learn.microsoft.com/en-us/azure/ai-services/openai/reference
@@ -211,7 +209,7 @@ export const GITHUB_API_VERSION = "2022-11-28"
export const GITHUB_TOKEN = "GITHUB_TOKEN"

export const AI_REQUESTS_CACHE = "airequests"
export const CHAT_CACHE = "chatv2"
export const CHAT_CACHE = "chat"
export const GITHUB_PULL_REQUEST_REVIEWS_CACHE = "prr"
export const GITHUB_PULLREQUEST_REVIEW_COMMENT_LINE_DISTANCE = 5

6 changes: 3 additions & 3 deletions packages/core/src/genaisrc/genaiscript.d.ts

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

35 changes: 14 additions & 21 deletions packages/core/src/openai.ts
@@ -2,8 +2,6 @@
import { LanguageModelConfiguration, host } from "./host"
import {
AZURE_OPENAI_API_VERSION,
MAX_CACHED_TEMPERATURE,
MAX_CACHED_TOP_P,
MODEL_PROVIDER_OPENAI,
TOOL_ID,
} from "./constants"
@@ -50,13 +48,10 @@
options,
trace
) => {
const { temperature, top_p, seed, tools } = req
const {
requestOptions,
partialCb,
maxCachedTemperature = MAX_CACHED_TEMPERATURE,
maxCachedTopP = MAX_CACHED_TOP_P,
cache: useCache,
cache: cacheOrName,
cacheName,
retry,
retryDelay,
@@ -69,18 +64,12 @@
const { model } = parseModelIdentifier(req.model)
const encoder = await resolveTokenEncoder(model)

const cache = getChatCompletionCache(cacheName)
const caching =
useCache === true || // always use cache
(useCache !== false && // never use cache
seed === undefined && // seed is not cacheable (let the LLM make the run deterministic)
!tools?.length && // assume tools are non-deterministic by default
(isNaN(temperature) ||
isNaN(maxCachedTemperature) ||
temperature < maxCachedTemperature) && // high temperature is not cacheable (it's too random)
(isNaN(top_p) || isNaN(maxCachedTopP) || top_p < maxCachedTopP))
trace.itemValue(`caching`, caching)
const cachedKey = caching
const cache = getChatCompletionCache(
typeof cacheOrName === "string" ? cacheOrName : cacheName
)
trace.itemValue(`caching`, !!cache)
trace.itemValue(`cache`, cache?.name)
const cachedKey = !!cacheOrName
? <ChatCompletionRequestCacheKey>{
...req,
...cfgNoToken,
@@ -160,7 +149,11 @@
try {
body = await r.text()
} catch (e) {}
const { error } = JSON5TryParse(body, {}) as { error: any }
const { error, message } = JSON5TryParse(body, {}) as {
error: any
message: string
}
if (message) trace.error(message)
if (error)
trace.error(undefined, <SerializedError>{
name: error.code,
@@ -169,7 +162,7 @@
})
throw new RequestError(
r.status,
r.statusText,
message ?? error?.message ?? r.statusText,
error,
body,
normalizeInt(r.headers.get("retry-after"))
@@ -263,11 +256,11 @@
responseSoFar: chatResp,
tokensSoFar: numTokens,
responseChunk: progress,
inner
inner,
})
}
pref = chunk
}

Check failure on line 263 in packages/core/src/openai.ts (GitHub Actions / build)

The error handling logic has been changed to include a `message` field. However, the `message` field is not always guaranteed to be present in the response. This could potentially lead to undefined values being used in error messages. Please ensure to handle the case where `message` might be undefined.
}

async function listModels(
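Regarding the annotation above about a possibly missing `message`: a standalone sketch of the fallback chain used when building the error text, assuming an arbitrary error payload shape (the names below are illustrative, not the provider's actual response type):

```ts
// Sketch: choosing the most specific error text available when fields may be missing.
// ErrorPayload is an illustrative shape, not the provider's actual response type.
interface ErrorPayload {
    message?: string
    error?: { code?: string; message?: string }
}

function errorText(payload: ErrorPayload, statusText: string): string {
    // fall back left to right: top-level message, nested error message, HTTP status text
    return payload.message ?? payload.error?.message ?? statusText
}

errorText({}, "Bad Request") // "Bad Request"
errorText({ error: { message: "rate limited" } }, "Too Many Requests") // "rate limited"
```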
2 changes: 1 addition & 1 deletion packages/core/src/server/messages.ts
@@ -67,10 +67,10 @@
maxTokens: string
maxToolCalls: string
maxDataRepairs: string
model: string

Check failure on line 70 in packages/core/src/server/messages.ts (GitHub Actions / build)

The type of `cache` has been changed from `boolean` to `boolean | string`. This could potentially break existing code that expects `cache` to be a boolean. Please ensure that all usage of `cache` has been updated to handle the new type.

generated by pr-review-commit cache_type_change

embeddingsModel: string
csvSeparator: string
cache: boolean
cache: boolean | string
cacheName: string
applyEdits: boolean
failOnErrors: boolean
6 changes: 3 additions & 3 deletions packages/core/src/types/prompt_template.d.ts
@@ -176,13 +176,13 @@ interface ModelOptions extends ModelConnectionOptions {
seed?: number

/**
* If true, the prompt will be cached. If false, the LLM chat is never cached.
* Leave empty to use the default behavior.
* By default, LLM queries are not cached. If true, the LLM request will be cached. Use a string to override the default cache name
*/
cache?: boolean
cache?: boolean | string

/**
* Custom cache name. If not set, the default cache is used.
* @deprecated Use `cache` instead with a string
*/
cacheName?: string
}
3 changes: 1 addition & 2 deletions packages/sample/genaisrc/cache.genai.mts
@@ -1,7 +1,6 @@
script({
model: "openai:gpt-3.5-turbo",
cache: true,
cacheName: "gpt-cache",
cache: "gpt-cache",
tests: [{}, {}], // run twice to trigger caching
})

6 changes: 3 additions & 3 deletions packages/sample/genaisrc/genaiscript.d.ts

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

6 changes: 3 additions & 3 deletions packages/sample/genaisrc/node/genaiscript.d.ts

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

6 changes: 3 additions & 3 deletions packages/sample/genaisrc/python/genaiscript.d.ts

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

6 changes: 3 additions & 3 deletions packages/sample/genaisrc/style/genaiscript.d.ts

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion packages/sample/genaisrc/summary-of-summary-gpt35.genai.js
@@ -15,7 +15,7 @@ for (const file of env.files) {
_.def("FILE", file)
_.$`Summarize FILE. Be concise.`
},
{ model: "gpt-3.5-turbo", cacheName: "summary_gpt35" }
{ model: "gpt-3.5-turbo", cache: "summary_gpt35" }
)
// save the summary in the main prompt
def("FILE", { filename: file.filename, content: text })
4 changes: 2 additions & 2 deletions packages/sample/genaisrc/summary-of-summary-phi3.genai.js
@@ -5,7 +5,7 @@ script({
tests: {
files: ["src/rag/*.md"],
keywords: ["markdown", "lorem", "microsoft"],
}
},
})

// summarize each files individually
@@ -15,7 +15,7 @@ for (const file of env.files) {
_.def("FILE", file)
_.$`Extract keywords for the contents of FILE.`
},
{ model: "ollama:phi3", cacheName: "summary_phi3" }
{ model: "ollama:phi3", cache: "summary_phi3" }
)
def("FILE", { ...file, content: text })
}