Add tabletojson dependency and implement HTMLTablesToJSON function #672

Merged
merged 2 commits into from
Aug 29, 2024
2 changes: 2 additions & 0 deletions .vscode/settings.json
@@ -19,6 +19,7 @@
"gptool",
"gptools",
"gptoolsjs",
"limitrows",
"llmify",
"llmrequest",
"localai",
@@ -34,6 +35,7 @@
"promptfoo",
"stringifying",
"sysr",
"tabletojson",
"treesitter",
"typecheck",
"unfence",
28 changes: 27 additions & 1 deletion docs/genaisrc/genaiscript.d.ts

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

@@ -1,4 +1,5 @@
script({
model: "openai:gpt-4",
title: "generating tests from samples",
system: ["system"],
parameters: {
28 changes: 27 additions & 1 deletion genaisrc/genaiscript.d.ts

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 0 additions & 2 deletions packages/cli/src/playwright.ts
@@ -86,7 +86,6 @@ export class BrowserManager {

logVerbose(`browsing ${ellipseUri(url)}`)
const browser = await this.launchBrowser(options)
logVerbose(`navigating...`)
let page: Page
if (incognito) {
const context = await browser.newContext(rest)
@@ -96,7 +95,6 @@ export class BrowserManager {
}
if (timeout !== undefined) page.setDefaultTimeout(timeout)
if (url) await page.goto(url)
logVerbose(`page ready`)
return page
}
}
3 changes: 2 additions & 1 deletion packages/core/package.json
@@ -57,6 +57,7 @@
"sanitize-html": "^2.13.0",
"semver": "^7.6.3",
"serialize-error": "^11.0.3",
"tabletojson": "^4.1.4",
"toml": "^3.0.0",
"tree-sitter-wasms": "^0.1.11",
"ts-dedent": "^2.2.0",
@@ -76,4 +77,4 @@
"@types/turndown": "^5.0.5",
"turndown": "^7.2.0"
}
}
}
28 changes: 27 additions & 1 deletion packages/core/src/genaisrc/genaiscript.d.ts

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 2 additions & 1 deletion packages/core/src/globals.ts
@@ -8,7 +8,7 @@ import {
updateFrontmatter,
} from "./frontmatter"
import { JSONLStringify, JSONLTryParse } from "./jsonl"
import { HTMLToMarkdown, HTMLToText } from "./html"
import { HTMLTablesToJSON, HTMLToMarkdown, HTMLToText } from "./html"

export function resolveGlobal(): any {
if (typeof window !== "undefined")
@@ -58,6 +58,7 @@ export function installGlobals() {
},
})
glb.HTML = Object.freeze<HTML>({
convertTablesToJSON: HTMLTablesToJSON,
convertToMarkdown: HTMLToMarkdown,
convertToText: HTMLToText,
})
20 changes: 19 additions & 1 deletion packages/core/src/html.test.ts
@@ -1,8 +1,26 @@
import test, { describe } from "node:test"
import { HTMLToMarkdown, HTMLToText } from "./html"
import { HTMLTablesToJSON, HTMLToMarkdown, HTMLToText } from "./html"
import assert from "node:assert/strict"

describe("html", () => {
test("convert HTML table to JSON", () => {
const html = `
<table>
<tr>
<th>Header 1</th>
<th>Header 2</th>
</tr>
<tr>
<td>Value 1</td>
<td>Value 2</td>
</tr>
</table>
`
const expected = [{ "Header 1": "Value 1", "Header 2": "Value 2" }]
const result = HTMLTablesToJSON(html)[0]
console.log(JSON.stringify(result, null, 2))
assert.deepStrictEqual(result, expected)
})
test("converts HTML to text", () => {
const html = "<p>Hello, world!</p>"
const expected = "Hello, world!"
10 changes: 8 additions & 2 deletions packages/core/src/html.ts
@@ -1,6 +1,12 @@
import { convert } from "html-to-text"
import { convert as convertToText } from "html-to-text"
import { TraceOptions } from "./trace"
import Turndown from "turndown"
import { tabletojson } from "tabletojson"

export function HTMLTablesToJSON(html: string, options?: {}): object[][] {
const res = tabletojson.convert(html, options)
return res
}

Check failure on line 9 in packages/core/src/html.ts

GitHub Actions / build

The function `HTMLTablesToJSON` does not have error handling. Consider adding a try-catch block to handle potential exceptions. 🛠️
pelikhan marked this conversation as resolved.
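
One way the annotation above could be addressed, sketched under assumptions: the wrapper below mirrors the try/catch pattern that `HTMLToText` uses in the same file. The names `tablesToJSONSafe`, the injected `convert` parameter (standing in for `tabletojson.convert`), and the `onError` callback (standing in for `trace?.error`) are hypothetical and are not code from this PR. Returning `undefined` on failure is a choice made for this sketch only.

```typescript
// Hypothetical sketch, not the merged implementation.
type TableRows = object[][]

function tablesToJSONSafe(
    html: string,
    convert: (html: string) => TableRows, // stands in for tabletojson.convert
    onError?: (message: string, error: unknown) => void // stands in for trace?.error
): TableRows | undefined {
    try {
        return convert(html)
    } catch (e) {
        // Report the failure instead of letting the exception escape
        onError?.("HTML conversion failed", e)
        return undefined
    }
}

// Usage with stub converters: one succeeds, one throws
const ok = tablesToJSONSafe("<table></table>", () => [[{ a: "1" }]])
const errors: string[] = []
const failed = tablesToJSONSafe(
    "<table>",
    () => {
        throw new Error("parse error")
    },
    (message) => errors.push(message)
)
```

Injecting the converter keeps the sketch independent of the `tabletojson` package, so the error path can be exercised with a stub.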

export function HTMLToText(
html: string,
@@ -11,7 +17,7 @@
const { trace } = options || {}

try {
const text = convert(html, options)
const text = convertToText(html, options)
return text
} catch (e) {
trace?.error("HTML conversion failed", e)
2 changes: 1 addition & 1 deletion packages/core/src/promptdom.ts
@@ -255,7 +255,7 @@ export function createFileOutput(output: FileOutput): FileOutputNode {
return { type: "fileOutput", output }
}

export function createDefDataNode(
export function createDefData(
name: string,
data: object | object[],
options?: DefDataOptions
2 changes: 1 addition & 1 deletion packages/core/src/promptrunner.ts
@@ -85,7 +85,7 @@
assert(model !== undefined)

try {
trace.itemValue("🧠 model", model ?? "??")
trace.heading(3, `🧠 running ${template.id} with model ${model ?? ""}`)

Check failure on line 88 in packages/core/src/promptrunner.ts

GitHub Actions / build

The logging message `🧠 running ${template.id} with model ${model ?? ""}` could be ambiguous if the model is undefined or an empty string. Consider improving the message for better clarity. 📝
pelikhan marked this conversation as resolved.
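
A minimal sketch of what a less ambiguous heading might look like. `runHeading` is a hypothetical helper, not part of this PR, and the wording of the fallback label is an assumption. Since the surrounding code already asserts that `model` is defined, the empty string is the case most likely to produce a dangling "with model" in the trace.

```typescript
// Hypothetical helper, not part of the PR: builds the trace heading text.
function runHeading(templateId: string, model?: string): string {
    // Treat undefined and "" the same way so the heading never ends in "with model "
    const modelLabel = model ? `model "${model}"` : "no model specified"
    return `running ${templateId} with ${modelLabel}`
}

const withModel = runHeading("poem", "openai:gpt-4")
const withoutModel = runHeading("poem", "")
```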
if (cliInfo) traceCliArgs(trace, template, options)

const vars = await resolveExpansionVars(
4 changes: 2 additions & 2 deletions packages/core/src/runpromptcontext.ts
@@ -3,7 +3,7 @@
appendChild,
createAssistantNode,
createChatParticipant,
createDefDataNode,
createDefData,
createDefNode,
createFileOutput,
createFunctionNode,
@@ -114,7 +114,7 @@
return name
},
defData: (name, data, defOptions) => {
appendChild(node, createDefDataNode(name, data, defOptions))
appendChild(node, createDefData(name, data, defOptions))

Check failure on line 117 in packages/core/src/runpromptcontext.ts

GitHub Actions / build

The method `createDefDataNode` was renamed to `createDefData`. Ensure that this change does not break any existing references to the old method name. 🔄

generated by pr-review-commit method_renaming

return name
},
fence(body, options?: DefOptions) {
28 changes: 27 additions & 1 deletion packages/core/src/types/prompt_template.d.ts
@@ -1156,14 +1156,40 @@ interface XML {
parse(text: string, options?: XMLParseOptions): any
}

interface HTMLTableToJSONOptions {
useFirstRowForHeadings?: boolean
headers?: HeaderRows
stripHtmlFromHeadings?: boolean
stripHtmlFromCells?: boolean
stripHtml?: boolean | null
forceIndexAsNumber?: boolean
countDuplicateHeadings?: boolean
ignoreColumns?: number[] | null
onlyColumns?: number[] | null
ignoreHiddenRows?: boolean
id?: string[] | null
headings?: string[] | null
containsClasses?: string[] | null
limitrows?: number | null
}

interface HTML {
/**
* Converts all HTML tables to JSON.
* @param html
* @param options
*/
convertTablesToJSON(
html: string,
options?: HTMLTableToJSONOptions
): object[][]
/**
* Converts HTML markup to plain text
* @param html
*/
convertToText(html: string): string
/**
* Converts HMTL markup to markdown
* Converts HTML markup to markdown
* @param html
*/
convertToMarkdown(html: string): string
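
To illustrate the `object[][]` return type of `convertTablesToJSON`, here is a small sketch of the result shape, using the value asserted in the unit test in `html.test.ts`: one array per table, one object per data row, keyed by column heading. The `TableRow` alias and the `tableHeadings` helper are hypothetical, and typing cells as strings is an assumption (the declared type is only `object`).

```typescript
// Assumed result shape, based on the unit test in html.test.ts:
// one array per <table>, one object per data row, keyed by column heading.
type TableRow = Record<string, string>

const tables: TableRow[][] = [
    [{ "Header 1": "Value 1", "Header 2": "Value 2" }],
]

// Hypothetical helper: list the column headings of each table,
// read from the keys of its first row (empty if the table has no rows).
function tableHeadings(tables: TableRow[][]): string[][] {
    return tables.map((rows) => (rows.length ? Object.keys(rows[0]) : []))
}

const headings = tableHeadings(tables)
```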
9 changes: 4 additions & 5 deletions packages/sample/genaisrc/browse-text.genai.mts
@@ -7,8 +7,7 @@ const page = await host.browse(
)
const table = page.locator('table[data-testid="csv-table"]')
const html = await table.innerHTML()
console.log(`HTML:` + html)
const csv = HTML.convertToText(html)
console.log(`TEXT: ` + csv)
def("DATA", csv)
$`Analyze DATA.`
const csv = HTML.convertTablesToJSON("<table>" + html + "</table>")[0]
csv.forEach((row) => delete row[Object.keys(row)[0]]) // remove the first column
defData("DATA", csv, { format: "csv" })
$`Analyze DATA and provide a statistical summary.`
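
The sample above removes the first column by deleting a key from each row in place. A non-mutating variant could look like the sketch below; `Row` and `dropFirstColumn` are hypothetical names and the sample rows are made up for illustration. Like the original one-liner, it relies on property order, which for non-numeric keys is insertion order.

```typescript
type Row = Record<string, string>

// Hypothetical helper: returns copies of the rows without their first column.
function dropFirstColumn(rows: Row[]): Row[] {
    return rows.map((row) => {
        const copy: Row = {}
        // Skip the first key and copy the remaining columns
        Object.keys(row)
            .slice(1)
            .forEach((key) => {
                copy[key] = row[key]
            })
        return copy
    })
}

// Illustrative data only
const rows: Row[] = [
    { index: "0", name: "alpha", value: "10" },
    { index: "1", name: "beta", value: "20" },
]
const trimmed = dropFirstColumn(rows)
```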
28 changes: 27 additions & 1 deletion packages/sample/genaisrc/genaiscript.d.ts

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
