transformers.js sample #581

Merged
merged 7 commits into from
Jul 17, 2024
Changes from 5 commits
80 changes: 80 additions & 0 deletions docs/src/content/docs/guides/transformers-js.mdx
@@ -0,0 +1,80 @@
---
title: Transformer.js

The filename Transformer.js should be transformers.js to match the library's actual name.

generated by pr-docs-review-commit filename_incorrect

The title field in the metadata section should be 'Transformers.js' instead of 'Transformer.js'.

generated by pr-docs-review-commit metadata_field_incorrect

sidebar:
order: 20
---
import { Code } from "@astrojs/starlight/components"
import sampleSrc from "../../../../../packages/sample/genaisrc/summary-with-transformers.genai?raw"


Hugging Face [Transformers.js](https://huggingface.co/docs/transformers.js/index) is a JavaScript library

The URL provided for Transformers.js documentation is incorrect; it should be https://huggingface.co/transformers/.

generated by pr-docs-review-commit incorrect_url

that lets you run pretrained models locally on your machine. The library uses [onnxruntime](https://onnxruntime.ai/)
to leverage the CPU/GPU capabilities of your hardware.

In this guide, we will show how to create [summaries](https://huggingface.co/tasks/summarization) using the [Transformers.js](https://huggingface.co/docs/transformers.js/api/pipelines#module_pipelines.SummarizationPipeline) library.

The URL provided for Transformers.js summarization pipeline is incorrect; it should be https://huggingface.co/transformers/main_classes/pipelines.html#transformers.SummarizationPipeline.

generated by pr-docs-review-commit incorrect_url


:::tip

Transformers.js has an extensive list of tasks available. This guide covers only one; check out their [documentation](https://huggingface.co/docs/transformers.js/pipelines#tasks)

The URL provided for Transformers.js tasks documentation is incorrect; it should be https://huggingface.co/transformers/task_summary.html.

generated by pr-docs-review-commit incorrect_url

for more.

:::

## Installation

Following the [installation instructions](https://huggingface.co/docs/transformers.js/installation),

The URL provided for Transformers.js installation instructions is incorrect; it should be https://huggingface.co/transformers/installation.html.

generated by pr-docs-review-commit incorrect_url

we add the [@xenova/transformers](https://www.npmjs.com/package/@xenova/transformers) package to the current project.

The package name @xenova/transformers is incorrect; it should be @huggingface/transformers.

generated by pr-docs-review-commit incorrect_package_name


```bash
npm install @xenova/transformers

The package name @xenova/transformers is incorrect in the installation command; it should be npm install @huggingface/transformers.

generated by pr-docs-review-commit incorrect_package_name

```

You can also install the library globally to use it in any project:

```bash "-g"
npm install -g @xenova/transformers

The package name @xenova/transformers is incorrect in the global installation command; it should be npm install -g @huggingface/transformers.

generated by pr-docs-review-commit incorrect_package_name

```

## Import the pipeline

The snippet below imports the Transformers.js library and loads the summarizer pipeline and model.
You can specify a model name or let the library pick the default model for the task.

```js
import { pipeline } from "@xenova/transformers"

The import statement uses an incorrect package name @xenova/transformers; it should be @huggingface/transformers.

generated by pr-docs-review-commit incorrect_package_import

The import statement should use '@huggingface/transformers' instead of '@xenova/transformers'.

generated by pr-docs-review-commit incorrect_import

const summarizer = await pipeline("summarization")
```

Allocating and loading the model can take some time,
so it's best to do this at the beginning of your script
and only once.
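
One way to make sure the load happens only once is to memoize the loader. The sketch below is a minimal illustration and not part of this PR: `once` is a hypothetical helper, and the stub loader stands in for the real `pipeline("summarization")` call so that the snippet runs without downloading a model.

```js
// Hypothetical helper (not part of Transformers.js or GenAIScript):
// wraps an async loader so the expensive work starts at most once.
function once(loader) {
    let cached // the in-flight or resolved promise, shared by all callers
    return () => {
        if (!cached) cached = loader()
        return cached
    }
}

// Stub loader standing in for `() => pipeline("summarization")`.
let loads = 0
const getSummarizer = once(async () => {
    loads++
    // Fake summarizer: returns data in the [{ summary_text }] shape.
    return async (text) => [{ summary_text: text.slice(0, 20) }]
})
```

Every caller then awaits `getSummarizer()` and shares the same instance. Note that this sketch also caches a rejected promise, so a failed load is not retried.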

:::note[Migrate your script to `.mjs`]

To use the `Transformers.js` library, you need to use the `.mjs` extension for your script (or `.mts` for TypeScript support).

The advice to migrate scripts to .mjs is incorrect and not necessary for using the Transformers.js library.

generated by pr-docs-review-commit incorrect_extension_advice

If your script name ends in `.genai.js`, rename it to `.genai.mjs`.

:::
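
As a small illustration of the rename described in the note, the sketch below computes the new file name. `toModuleScriptName` is a hypothetical helper, not something GenAIScript provides, and it only handles the `.genai.js` case mentioned above.

```js
// Hypothetical helper: map a `.genai.js` script name to `.genai.mjs`.
// Names that do not end in `.genai.js` are returned unchanged.
function toModuleScriptName(filename) {
    return filename.replace(/\.genai\.js$/, ".genai.mjs")
}

console.log(toModuleScriptName("summary.genai.js")) // prints "summary.genai.mjs"
```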

## Invoke the pipeline

The summarizer pipeline takes a single argument, the content to summarize. It returns an array of summaries,
which we unpack to access the final summary text. In the snippet below, `summary_text` contains the summary text.

```js
const [summary] = await summarizer(content)
// @ts-ignore
const { summary_text } = summary
```
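
The destructuring above assumes the pipeline returned at least one result. A slightly more defensive variant is sketched below. `firstSummaryText` is a hypothetical helper, and the `[{ summary_text }]` shape is taken from the snippet above rather than checked against the library's type definitions.

```js
// Hypothetical helper: extract the text of the first summary, or throw.
function firstSummaryText(results) {
    // Accept either an array of results or a single result object.
    const [first] = Array.isArray(results) ? results : [results]
    if (!first || typeof first.summary_text !== "string")
        throw new Error("unexpected summarization output")
    return first.summary_text
}

// Hand-written sample in the shape used above (not real model output).
const text = firstSummaryText([{ summary_text: "A short summary." }])
console.log(text) // prints "A short summary."
```

Throwing on an unexpected shape keeps an `undefined` summary from silently reaching the prompt.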

## Final code

The example below generates a summary of each input file
before letting the model generate a full summary.

<Code

The file reference transformers.genai.mjs is incorrect; it should match the actual file name used in the project.

generated by pr-docs-review-commit incorrect_file_reference

title="transformers.genai.mjs"
code={sampleSrc}
wrap={true}
lang="js"
/>
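
Stripped of the GenAIScript globals (`script`, `env`, `def`, `$`), the per-file step amounts to a loop that maps each file to its summary. The sketch below is illustrative only: `summarizeFiles` is a hypothetical name, and the summarizer is passed in as a parameter so that a stub can replace the real pipeline.

```js
// Hypothetical sketch of the per-file step: one summary per input file.
// `summarizer` is any async function returning [{ summary_text }].
async function summarizeFiles(files, summarizer) {
    const summaries = []
    for (const file of files) {
        if (!file.content) continue // skip files with nothing to summarize
        const [summary] = await summarizer(file.content)
        summaries.push({
            filename: file.filename,
            content: summary.summary_text,
        })
    }
    return summaries
}

// Stub summarizer for demonstration: keeps only the first sentence.
const stubSummarizer = async (text) => [
    { summary_text: text.split(". ")[0] },
]
```

In the sample script, each resulting entry is what gets passed to `def("FILE", ...)` before the final prompt asks the model to summarize all of them.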
3 changes: 2 additions & 1 deletion packages/cli/src/nodehost.ts
@@ -45,7 +45,7 @@ import {
} from "../../core/src/models"
import { createBundledParsers } from "../../core/src/pdf"
import { AbortSignalOptions, TraceOptions } from "../../core/src/trace"
import { unique } from "../../core/src/util"
import { logVerbose, unique } from "../../core/src/util"

class NodeServerManager implements ServerManager {
async start(): Promise<void> {
@@ -73,6 +73,7 @@ class ModelManager implements ModelService {
if (provider === MODEL_PROVIDER_OLLAMA) {
if (this.pulled.includes(modelid)) return { ok: true }

logVerbose(`ollama: pulling ${modelid}...`)

The logVerbose function is called without any context. It might be better to provide more information about the current state of the application. 🤔

generated by pr-review-commit log_without_context

const conn = await this.getModelToken(modelid)
const res = await fetch(`${conn.base}/api/pull`, {
method: "POST",
6 changes: 3 additions & 3 deletions packages/core/src/importprompt.ts
@@ -2,7 +2,7 @@ import { assert } from "console"
import { host } from "./host"
import { logError } from "./util"
import { TraceOptions } from "./trace"
import { pathToFileURL } from "url"
import { fileURLToPath, pathToFileURL } from "url"

function resolveGlobal(): any {
if (typeof window !== "undefined")
@@ -46,10 +46,10 @@ export async function importPrompt(
import.meta.url ??
pathToFileURL(__filename ?? host.projectFolder()).toString()

trace?.itemValue(`import`, `${modulePath}, parent: ${parentURL}`)
const onImport = (file: string) => {
trace?.itemValue("📦 import", file)
// trace?.itemValue("📦 import", fileURLToPath(file))
}

You have commented out the trace function call. This might lead to loss of important debugging information. Please ensure this is intentional. 🕵️‍♀️

generated by pr-review-commit commented_code

onImport(modulePath)

The onImport(modulePath) function call seems unnecessary as it is called before the register function where it should be used. This might lead to unexpected behavior. 🧐

generated by pr-review-commit unnecessary_code

The onImport function is called with modulePath but the result is not used or stored. This could lead to unexpected behavior. 😕

generated by pr-review-commit unused_function_call

const { tsImport, register } = await import("tsx/esm/api")
unregister = register({ onImport })
const module = await tsImport(modulePath, {
6 changes: 5 additions & 1 deletion packages/core/src/runpromptcontext.ts
@@ -19,6 +19,7 @@ import { isJSONSchema } from "./schema"
import { consoleLogFormat } from "./logging"
import { resolveFileDataUri } from "./file"
import { isGlobMatch } from "./glob"
import { logVerbose } from "./util"

export function createChatTurnGenerationContext(
options: GenerationOptions,
@@ -28,7 +29,10 @@ export function createChatTurnGenerationContext(

const log = (...args: any[]) => {
const line = consoleLogFormat(...args)
if (line) trace.log(line)
if (line) {
trace.log(line)
logVerbose(line)

The logVerbose function is called with the same argument as trace.log. This could lead to duplicate log entries. 🙃

generated by pr-review-commit duplicate_logging

The logVerbose function is called inside the trace.log condition. This could lead to excessive logging and performance issues. Consider moving it outside the condition or adding additional checks. 😮

generated by pr-review-commit log_in_trace

The logVerbose function is called right after trace.log with the same argument. This could lead to duplicate logs. Consider removing one of them to avoid unnecessary duplication. 🔄

generated by pr-review-commit log_duplication

}
}
const console = Object.freeze<PromptGenerationConsole>({
log,
28 changes: 28 additions & 0 deletions packages/sample/genaisrc/summary-with-transformers.genai.mjs
@@ -0,0 +1,28 @@
script({
title: "summary of summary - transformers.js",
model: "ollama:phi3",
files: ["src/rag/markdown.md"],
tests: {
files: ["src/rag/markdown.md"],
keywords: ["markdown"],
},
})

console.log("loading summarizer transformer")
import { pipeline } from "@xenova/transformers"
const summarizer = await pipeline("summarization")

for (const file of env.files) {
console.log(`summarizing ${file.filename}`)
const [summary] = await summarizer(file.content)
// @ts-ignore
const { summary_text } = summary
def("FILE", {
filename: file.filename,
// @ts-ignore
content: summary_text,
})
}

console.log(`summarize all summaries`)
$`Summarize all the contents in FILE.`
3 changes: 2 additions & 1 deletion packages/sample/package.json
@@ -15,6 +15,7 @@
"test:scripts:view": "node ../cli/built/genaiscript.cjs test view"
},
"devDependencies": {
"@tidyjs/tidy": "^2.5.2"
"@tidyjs/tidy": "^2.5.2",
"@xenova/transformers": "^2.17.2"
}
}