Words in labels of dot and GraphML output are being merged #81

mweidling · 2023-01-10T19:39:11Z

Dear CollateX team,

for my collation I use a tokenized JSON input as described in the documentation. A very simplified version of it looks like this:

{"witnesses":
[
    {
        "id": "file_1",
        "tokens": [
            {
                "t": "Different",
                "id": "asdf1"
            },
            {
                "t": "beginning.",
                "id": "asdf2"
            }
        ]
    },
    {
        "id": "file_2",
        "tokens": [
            {
                "t": "An",
                "id": "asdfb1"
            },
            {
                "t": "example",
                "id": "asdfb2"
            },
            {
                "t": "text",
                "id": "asdfb3"
            }
        ]
    },
    {
        "id": "file_3",
        "tokens": [
            {
                "t": "Example",
                "id": "asdfbc2"
            },
            {
                "t": "text",
                "id": "asdfbc3"
            }
        ]
    }
]
}

When using CollateX to create a dot or a GraphML file based on this input, the labels lack whitespace between the tokens, as can be seen in the following SVG:

I would have expected the words to be separated. AFAICT this issue doesn't occur for plain text or untokenized JSON input, though. Would it be possible to fix it?

Many thanks in advance!
Best,
Michelle

djbpitt · 2023-01-10T22:53:45Z

@mweidling The simplest fix might be to append a space character after each token in the t property values when you create the tokenized JSON input. I think CollateX just string-joins the t properties when it renders a node that contains multiple tokens, so if the space character is in the input, it should emerge in the output. You'll have a spurious space character at the end of the merged string, but I suspect that the effect won't be noticeable. Does this sound like a useful way forward?
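
In case it helps, here is a minimal sketch of that workaround as a preprocessing step in Python. The function name `pad_tokens` is made up for this illustration and is not part of CollateX; the sketch only assumes the input has the `witnesses` / `tokens` / `t` shape shown in the example above, and it leaves witnesses without a `tokens` list untouched.

```python
import copy
import json


def pad_tokens(collation_input):
    """Return a copy of the input with a trailing space on every "t" value.

    Hypothetical helper for illustration; it does not call CollateX itself.
    """
    padded = copy.deepcopy(collation_input)  # leave the caller's data unchanged
    for witness in padded["witnesses"]:
        # Witnesses given as plain "content" have no "tokens" key; skip them.
        for token in witness.get("tokens", []):
            # Avoid doubling the space if the token is already padded.
            if not token["t"].endswith(" "):
                token["t"] += " "
    return padded


if __name__ == "__main__":
    example = {
        "witnesses": [
            {
                "id": "file_3",
                "tokens": [
                    {"t": "Example", "id": "asdfbc2"},
                    {"t": "text", "id": "asdfbc3"},
                ],
            }
        ]
    }
    # The padded structure can then be serialized and passed to CollateX as usual.
    print(json.dumps(pad_tokens(example), indent=2))
```

Because every token gets the same trailing space, the last token in a merged node will still end with the spurious space mentioned above.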

mweidling · 2023-01-11T07:16:20Z

@djbpitt Thank you for your swift response! I think your assumption is right, and your suggestion yields the output I expected. Thank you for this workaround!

(I wonder, however, whether any updates to CollateX are planned, since the last push to master was a while ago?)

rhdekker · 2023-01-11T15:59:33Z

@mweidling A new version of CollateX is in development, with a new collation algorithm. It will be a while before it is made public. It has to be able to do what the current release does and surpass it first.

mweidling · 2023-01-11T19:18:31Z

@rhdekker I'm glad to hear that and will eagerly await the new release!

rerouj mentioned this issue Mar 3, 2023

dependencies update #82

Open
