Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tabletojson dependency and implement HTMLTablesToJSON function #672

Merged
merged 2 commits into from
Aug 29, 2024

Conversation

pelikhan
Copy link
Member

@pelikhan pelikhan commented Aug 29, 2024

  • The codebase has seen a big change where a functionality to convert HTML tables to JSON is implemented. πŸ”€πŸ—οΈ

    • The implementation uses tabletojson library that is added in packages/core/package.json.
    • The function HTMLTablesToJSON doing the conversion is added into html.ts.
    • The tests for this function are added in html.test.ts.
    • The newly implemented function was also added to global HTML object in globals.ts.
    • As such, the HTML-JSON converter functionality is user-facing as it's part of the public API listed in prompt_template.d.ts.
  • There are modifications around tracing and debugging, which improves coverage and quality of logs. πŸ“πŸ”

    • In promptrunner.ts, the model is now part of the heading used in tracing.
  • The way the data is defined and stored is changed.

    • The function createDefDataNode is renamed to createDefData in promptdom.ts and runpromptcontext.ts.
  • A minor change is made to the sample code in browse-text.genai.mts. Now, the code converts an HTML table to JSON using the newly implemented converter and removes the first column from the result before defining "DATA".

  • A part of the code where verbose logs are made is removed reducing noise in the output. This happens in playwright.ts where certain log statements are removed providing a cleaner output for users. πŸ—‘οΈ 🎯

  • Minor formatting and typo corrections are carried out, for better code readability and understanding. πŸ“˜πŸ”€

  • Note that import lines and added comments are ignored as per the task instruction.

generated by pr-describe

…pdate HTML table JSON conversion and prompt generation, add limitrows to settings
@@ -114,7 +114,7 @@ export function createChatTurnGenerationContext(
return name
},
defData: (name, data, defOptions) => {
appendChild(node, createDefDataNode(name, data, defOptions))
appendChild(node, createDefData(name, data, defOptions))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method createDefDataNode was renamed to createDefData. Ensure that this change does not break any existing references to the old method name. πŸ”„

generated by pr-review-commit method_renaming

Copy link

The changes can be summarized as follows:

  1. Deletion of two log verbose lines from the playwright.ts file which probably were meant for debugging.
  2. Addition of a new function HTMLTablesToJSON in globals.ts and html.ts that converts HTML tables to JSON format.
  3. Corresponding unit test for HTMLTablesToJSON function has been added in html.test.ts.
  4. Renamed createDefDataNode to createDefData in promptdom.ts and modified all its references in other files.
  5. Modified trace output in promptrunner.ts to give more descriptive information about the model and template id.

Concerns:

  1. The convertTablesToJSON function uses external library tabletojson.convert but there is no error handling for it. The function could break if the library function doesn't work as expected.

Suggestion:

+export function HTMLTablesToJSON(html: string, options?: {}): object[][] {
+    try {
+        const res = tabletojson.convert(html, options)
+        return res
+    } catch(e) {
+        console.error("Error converting HTML Tables to JSON", e)
+    }
+    return []
+}
  1. It would be more efficient to handle logging levels centrally, enabling and disabling debugging logs based on environment variables rather than manually commenting and uncommenting such logs.

LGTM with the above suggestions. πŸš€

generated by pr-review

@pelikhan pelikhan merged commit e3a5aa2 into main Aug 29, 2024
11 checks passed
@pelikhan pelikhan deleted the cheerio branch August 29, 2024 15:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant