Untidy tags produce invalid Markdown #99

Open
fauno opened this issue Jul 16, 2021 · 1 comment

fauno commented Jul 16, 2021

I may work on a patch when I have the time, but I'll leave this here for discussion :)

I'm converting a large site whose HTML content was edited by users through a WYSIWYG editor, so I'm finding many cases where something like <em>word</em> is actually stored as <em>wo</em><em>rd</em>, which this gem converts into _wo__rd_. I think it'd be good to add an option to sanitize the HTML beforehand by removing empty tags and tidying it up a bit, but I'm not sure whether that should be a task for reverse markdown. Maybe it could accept something that does the cleanup, like Loofah?

fauno changed the title from "Empty tags produce invalid Markdown" to "Untidy tags produce invalid Markdown" on Jul 16, 2021

fauno commented Jul 30, 2021

FWIW, I ended up cleaning the HTML with this:

html_string.gsub(%r{</(strong|em|i|b)>(\s*)<\1>}, '\\2')
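For anyone wanting to cover the "empty tags" case from the original report as well, here is a minimal sketch that builds on the one-liner above. The helper name `tidy_inline_tags` and the `INLINE_TAGS` constant are made up for illustration and are not part of this gem. It assumes the tags carry no attributes, and it is plain regex cleanup, not a real HTML parser.

```ruby
# Hypothetical pre-processing helper (not part of the gem): tidy inline
# emphasis tags before handing the HTML to a Markdown converter.
# Assumption: tags are bare, e.g. <em>, never <em class="x">.
INLINE_TAGS = 'strong|em|i|b'

def tidy_inline_tags(html)
  result = html.dup
  previous = nil
  # Repeat until the string stops changing, so that a tag left empty or
  # adjacent by an earlier pass is picked up by a later one.
  until result == previous
    previous = result
    # Drop tags with no content, keeping any whitespace they wrapped.
    result = result.gsub(%r{<(#{INLINE_TAGS})>(\s*)</\1>}, '\\2')
    # Merge a closing tag directly followed by the same opening tag.
    result = result.gsub(%r{</(#{INLINE_TAGS})>(\s*)<\1>}, '\\2')
  end
  result
end
```

With this, `tidy_inline_tags('<em>wo</em><em>rd</em>')` returns `<em>word</em>`, and the result can then be passed to the converter as usual. Looping until nothing changes is a deliberate choice: a single pass would miss input such as `<em>a</em><em></em><em>b</em>`, where removing the empty tag is what makes the remaining pair adjacent.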
