Display (sanitized) HTML when viewing documents #1020

j-f1 · 2018-09-25T18:40:03Z


Some documents use HTML formatting to make themselves easier to read (example). However, this HTML is currently displayed either as raw source (on the document viewing page) or with the tags stripped (on the review page). This makes long documents with multiple paragraphs or lists difficult to read. Would it be possible to render some (perhaps sanitized) HTML in these views so that documents can be read more easily?

Vinnl · 2018-09-25T19:27:36Z

Maintainer

Hi @j-f1, this is something we have looked at and very much would have liked to implement. However, to be able to track whether selected excerpts are still present when the terms change, we need to include the original HTML when we're saving the excerpt. (You don't see the tags, but they're actually present in your selection when you annotate an excerpt.)

I think we can leave this open, since there could theoretically be a way to do this (perhaps by always stripping all tags and normalising whitespace when checking whether an excerpt is present in newer versions of the terms?). At this point, though, I'm afraid we don't have the capacity to implement it, and it might not even be possible at all.
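For anyone who wants to explore the "strip all tags and normalise whitespace" idea, here is a minimal sketch in Python. It is not the project's code: the names `normalise` and `excerpt_still_present` are hypothetical, and it assumes that treating every tag as a word boundary is acceptable for matching purposes.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Assumption: every tag counts as a word boundary, so <li>a</li><li>b</li>
        # becomes "a b" rather than "ab".
        self.parts.append(" ")

    def handle_endtag(self, tag):
        self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def normalise(fragment: str) -> str:
    """Drop all tags and collapse every run of whitespace to one space."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    return re.sub(r"\s+", " ", "".join(parser.parts)).strip()


def excerpt_still_present(excerpt_html: str, new_document_html: str) -> bool:
    """Compare the excerpt and the new version on normalised text only."""
    return normalise(excerpt_html) in normalise(new_document_html)
```

With this approach a saved excerpt such as `<p>Some terms</p>` would still match a newer version where only the markup or spacing changed, at the cost of no longer noticing markup-only changes.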

That said, if anyone who can code would like to take a shot at this, I'd be happy to guide them and answer questions - just let me know :)


poperigby · 2019-09-08T02:25:50Z


Why can't you render the HTML normally?


Vinnl · 2019-09-08T08:35:26Z

Maintainer

Hi @poperigby. As an example, let's say the document contains the following HTML:

<p>Some terms</p>

Now, if we were to render the HTML, it would look like this:

Some terms

Then let's say you want to highlight those words. When you select them, the browser tells us you have selected characters 0-10 - but in the source document, it would be characters 3-13!
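To make the mismatch concrete, the following Python snippet reproduces the numbers from the example. The regex-based tag stripping is a naive stand-in for what a browser does when rendering, used here for illustration only.

```python
import re

source = "<p>Some terms</p>"

# Naive tag stripping, standing in for the rendered text of the example.
rendered = re.sub(r"<[^>]+>", "", source)

# The browser reports the selection relative to the rendered text.
selection = (0, 10)
selected_text = rendered[selection[0]:selection[1]]

# Where that same text actually sits in the source document.
start_in_source = source.find(selected_text)
source_range = (start_in_source, start_in_source + len(selected_text))

# What we would get if the rendered offsets were applied to the source as-is.
naive_slice = source[selection[0]:selection[1]]

print(rendered)      # Some terms
print(source_range)  # (3, 13)
print(naive_slice)   # <p>Some te
```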

We've tried working around this a bit by counting how many characters were stripped out, and calculating the offset, but there were a lot of edge cases there: for example, white space does not get rendered, it's hard to detect how much of the HTML we'd have to strip out to have it sanitised, etc.
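The offset idea can be sketched roughly as follows, in Python. This is an illustration of the general technique, not the code that was actually tried: `build_offset_map` and `to_source_range` are hypothetical names, and the sketch deliberately ignores whitespace collapsing, entities, comments, and sanitisation, which are exactly the edge cases mentioned above.

```python
def build_offset_map(source: str) -> list[int]:
    """For each character outside of a tag, record its index in `source`."""
    offsets = []
    in_tag = False
    for index, char in enumerate(source):
        if char == "<":
            in_tag = True
        elif char == ">" and in_tag:
            in_tag = False
        elif not in_tag:
            offsets.append(index)
    return offsets


def to_source_range(source: str, start: int, end: int) -> tuple[int, int]:
    """Map a half-open [start, end) selection in the stripped text to the source."""
    offsets = build_offset_map(source)
    if not 0 <= start < end <= len(offsets):
        raise ValueError("selection is outside the visible text")
    return offsets[start], offsets[end - 1] + 1
```

For `<p>Some terms</p>`, `to_source_range(source, 0, 10)` gives `(3, 13)`, as in the example. The whitespace problem shows up as soon as the source is `<p>Some   terms</p>`: the browser renders that as the ten characters `Some terms`, but the map has twelve entries, so the same selection maps to `Some   ter` in the source.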

It might be possible, but we didn't manage to complete it in a reasonable timeframe without bugs. But as said, if someone wants to give it a stab, happy to answer questions :)


poperigby · 2019-09-08T16:16:06Z


I get it. Thanks for the explanation.


JCBerger · 2019-12-28T03:56:06Z


So maybe I'm missing something (highly likely), but instead of trying to count the characters (with white space removed or otherwise stripped out), why not keep all of the characters, but just present the pages (to the contributors) with the HTML tags active?


Vinnl · 2019-12-28T10:43:21Z

Maintainer

@JCBerger If I understand you correctly, your suggestion is to produce something like this:

Some <b>&lt;b&gt;bolded&lt;/b&gt;</b> text

Which would then show up as

Some <b>bolded</b> text

Right?
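As a rough illustration of that reading, the Python sketch below escapes the whole document so that every source character stays visible, and additionally wraps a few inline elements in a real copy of their own tag so they are styled. The function name `show_tags_but_style` and the tag whitelist are assumptions made for this example, and nested or attribute-carrying tags are not handled.

```python
import html
import re

# Assumption: only these attribute-free inline tags are made "active".
INLINE_TAGS = ("b", "i", "em", "strong")
PAIR = re.compile(r"<(%s)>(.*?)</\1>" % "|".join(INLINE_TAGS), re.DOTALL)


def show_tags_but_style(source: str) -> str:
    """Escape everything, but wrap whitelisted elements in a live copy of their tag."""
    out = []
    pos = 0
    for match in PAIR.finditer(source):
        tag = match.group(1)
        # Text before the element is escaped so it is shown literally.
        out.append(html.escape(source[pos:match.start()]))
        # The element itself is escaped too, then wrapped in the real tag.
        out.append(f"<{tag}>{html.escape(match.group(0))}</{tag}>")
        pos = match.end()
    out.append(html.escape(source[pos:]))
    return "".join(out)


print(show_tags_but_style("Some <b>bolded</b> text"))
# Some <b>&lt;b&gt;bolded&lt;/b&gt;</b> text
```

The property this relies on is that the visible text of the output is character-for-character the original source, so selection offsets would not need translating, as long as the browser does not alter whitespace.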

I think that might be possible as well, although one potential problem is that runs of whitespace in HTML get collapsed into a single space character, so inserting extra tags might introduce extra whitespace. This is especially clear with e.g. bullet lists, where I think the browser adds additional, unpredictable whitespace to a selection.

Again, the main problem with that solution would be that we don't have the resources to implement that, but we'd be happy with anyone giving it a shot :)


JCBerger · 2019-12-30T20:16:48Z


Oh. I think I probably over-simplified it by thinking that it could just be rendered as if the tags were active (making the view for the end user more "pretty"), leaving the backend to view the whole document with the tags being inert. But I guess the problem with that would be converting the selections that a user marks (in order to associate the different categories of legal clauses) into positions in the document with the "inert" tags, in order to be able to track changes, correct?

I think that explains why my idea wouldn't work, correct?


Vinnl · 2019-12-30T20:27:11Z

Maintainer

Yes, I think that covers it exactly. (With the reservation that I don't have the specifics of this issue top of mind, so there is some likelihood that I've forgotten something.)

