xml2kvp: convert child nodes to text from target node #392

Open
ghukill opened this issue Apr 12, 2019 · 0 comments

ghukill commented Apr 12, 2019

While using Combine to analyze ~26k XML files of a relatively unknown structure, I got the following from a naive field mapping:

[Screenshot: Screen Shot 2019-04-12 at 8 36 43 AM]

Unfortunately, this XML contains elements that serve only a presentational function, e.g. <italic>, and carry no semantic meaning.

It would be nice if the field mapping configuration for xml2kvp accepted some kind of option to ignore child elements of a targeted element. Or, better yet, an option to take all text and child elements of a target node and convert them to a single string.

In this example, it would be beneficial to stop at:

book_body_book-part_body_book-part_book-part-meta_abstract

and produce only raw text for all child text and elements.
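
To illustrate the requested behavior, here is a rough sketch using only Python's standard library. It is not xml2kvp's code or API: the function `flatten_xml` and its `stop_paths` parameter are hypothetical names made up for this example, and the sample XML is a shortened stand-in for the real documents. The idea is that when the walker reaches a configured path, it stops descending and joins all nested text into one string.

```python
import xml.etree.ElementTree as ET


def flatten_xml(xml_string, stop_paths, delim="_"):
    """Map an XML document to {path: [text, ...]}.

    Hypothetical helper, not part of xml2kvp. Any element whose
    delimiter-joined path is in `stop_paths` is collapsed to plain text
    instead of being walked further.
    """
    root = ET.fromstring(xml_string)
    out = {}

    def walk(elem, path):
        if path in stop_paths:
            # itertext() yields the element's own text plus the text and
            # tails of all descendants, in document order.
            text = " ".join("".join(elem.itertext()).split())
            if text:
                out.setdefault(path, []).append(text)
            return
        # Naive mode: record only the leading text of this element
        # (text following a child element is ignored in this sketch).
        if elem.text and elem.text.strip():
            out.setdefault(path, []).append(elem.text.strip())
        for child in elem:
            walk(child, path + delim + child.tag)

    walk(root, root.tag)
    return out


SAMPLE = (
    "<book><abstract>A study of <italic>Homo sapiens</italic> "
    "in context.</abstract></book>"
)

# Naive walk: <italic> becomes its own field and splits the abstract.
naive = flatten_xml(SAMPLE, stop_paths=set())

# Stopping at the abstract: one field holding the full raw text.
collapsed = flatten_xml(SAMPLE, stop_paths={"book_abstract"})
print(collapsed)
```

With the real data, the stop path would be `book_body_book-part_body_book-part_book-part-meta_abstract`, assuming the same underscore-delimited path convention shown above.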
