Words in labels of dot and GraphML output are being merged #81

Open
mweidling opened this issue Jan 10, 2023 · 4 comments

Comments

@mweidling

Dear CollateX team,

for my collation I use a tokenized JSON input as described in the documentation. A very simplified version of it looks like this:

{"witnesses":
[
    {
        "id": "file_1",
        "tokens": [
            {
                "t": "Different",
                "id": "asdf1"
            },
            {
                "t": "beginning.",
                "id": "asdf2"
            }
        ]
    },
    {
        "id": "file_2",
        "tokens": [
            {
                "t": "An",
                "id": "asdfb1"
            },
            {
                "t": "example",
                "id": "asdfb2"
            },
            {
                "t": "text",
                "id": "asdfb3"
            }
        ]
    },
    {
        "id": "file_3",
        "tokens": [
            {
                "t": "Example",
                "id": "asdfbc2"
            },
            {
                "t": "text",
                "id": "asdfbc3"
            }
        ]
    }
]
}

When using CollateX to create a dot or a GraphML file based on this input, the labels lack white spaces between the tokens as can be seen in the following SVG:

[Image "example": SVG rendering of the output, in which the token labels run together]

I would have expected the words to be separated. AFAICT this issue doesn't occur for plain text or untokenized JSON input, though. Would it be possible to fix it?

Many thanks in advance!
Best,
Michelle

@djbpitt
Collaborator

djbpitt commented Jan 10, 2023

@mweidling The simplest fix might be to append a space character to each token's t property value when you create the tokenized JSON input. I think CollateX just string-joins the t properties when it renders a node that contains multiple tokens, so if the space character is in the input, it should emerge in the output. You'll have a spurious space character at the end of the merged string, but I suspect that the effect won't be noticeable. Does this sound like a useful way forward?
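
A minimal sketch of this workaround in Python, using only the standard library. The function name add_trailing_spaces is hypothetical, and the final join only emulates the string-joining that CollateX is presumed to do when it renders a multi-token node; it does not call CollateX itself.

```python
import copy
import json


def add_trailing_spaces(collation_input):
    """Return a copy of the tokenized input with a space appended to each "t" value.

    Hypothetical helper for illustration; not part of CollateX.
    """
    padded = copy.deepcopy(collation_input)
    for witness in padded["witnesses"]:
        for token in witness.get("tokens", []):
            # Skip tokens that already end in a space so re-running is harmless.
            if not token["t"].endswith(" "):
                token["t"] += " "
    return padded


# One witness taken from the example input in the issue.
sample = {
    "witnesses": [
        {
            "id": "file_2",
            "tokens": [
                {"t": "An", "id": "asdfb1"},
                {"t": "example", "id": "asdfb2"},
                {"t": "text", "id": "asdfb3"},
            ],
        }
    ]
}

padded = add_trailing_spaces(sample)

# Assumption: a merged node label is the plain concatenation of the "t" values.
label = "".join(token["t"] for token in padded["witnesses"][0]["tokens"])
print(repr(label))  # 'An example text '

# The padded structure can then be serialized and passed to CollateX as usual.
print(json.dumps(padded, indent=4))
```

The helper works on a deep copy, so the original token data stays available unchanged. The trailing space after the last token is the spurious character mentioned above.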

@mweidling
Author

@djbpitt Thank you for your swift response! I think your assumption is right, and your suggestion yields the output I expected. Thank you for this workaround!

(I wonder, however, whether any updates to CollateX are planned, since the last push to master was a while ago.)

@rhdekker
Member

@mweidling A new version of CollateX is in development, with a new collation algorithm. It will be a while before it is made public. It has to be able to do what the current release does and surpass it first.

@mweidling
Author

@rhdekker I'm glad to hear that and will eagerly await the new release!
