
Validate that this module still works and make sure we can replicate the process #55

Closed
maxime-rainville opened this issue Aug 9, 2023 · 6 comments

Comments

@maxime-rainville

maxime-rainville commented Aug 9, 2023

Turns out we don't necessarily have a great process for knowing that this module works well. It looks like it doesn't quite work on SCPS.

Acceptance criteria

  • The module is functional on SCPS.
  • We have clear regression test scenarios for the module, and those tests are documented in our test suite.
  • We've validated that the regression test scenarios are passing right now.
  • We have instructions for getting a dev environment working with some sort of mock ServiceConnector, for people who don't have docvert-python3 running locally.

Notes

  • We may end up implementing an alternative ServiceConnector with phpoffice/phpword. If so, that could become the mock ServiceConnector.
  • Please keep @StephenMakrogianni apprised of this.
@emteknetnz
Member

emteknetnz commented Aug 31, 2023

First impressions aren't great. Rather than testing directly on SCPS, I tried getting docvert running locally in an Ubuntu 22 Docker container with Python 3.10. If I can't get it running locally, the long-term future of using docvert doesn't look great, since it would rely on running very old versions of Python and its dependencies.

docvert was developed 9 years ago, when Python 3.3 was the current version, rather than 3.10.

It lists python3-imaging as a requirement, which no longer seems to exist: it can't be installed via apt or pip. I think it was superseded by Pillow.
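If Pillow is the replacement, note that it is imported as `from PIL import Image` rather than the old top-level `import Image`. A minimal sketch of a fallback import, assuming docvert only needs the `Image` module (its actual imports weren't checked here); `load_image_module` is a hypothetical helper name:

```python
import importlib
import importlib.util
from types import ModuleType
from typing import Optional


def load_image_module() -> Optional[ModuleType]:
    """Hypothetical helper: return an Image module, or None if none is installed.

    Pillow (the PIL fork) only provides the namespaced ``PIL.Image``; legacy
    PIL installs usually also allowed a top-level ``import Image``.
    """
    # Prefer Pillow's namespaced module.
    if importlib.util.find_spec("PIL") is not None:
        return importlib.import_module("PIL.Image")
    # Fall back to a legacy top-level Image module, if one is on sys.path.
    if importlib.util.find_spec("Image") is not None:
        return importlib.import_module("Image")
    return None
```

Usage would be `Image = load_image_module()`, checking for `None` before any image conversion is attempted.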

I tried the following inside my Docker container to get the requirements sorted:

SSH in as root:

apt update
apt install -y librsvg2-2
apt install -y pdf2svg
apt install -y libreoffice
apt install -y python3-pip

SSH in as the regular www-root user:

python3 -m pip install --upgrade pip
python3 -m pip install uno
python3 -m pip install lxml
python3 -m pip install pillow
/usr/bin/soffice --headless --norestore --nologo --nofirststartwizard --accept="socket,port=2002;urp;"
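Before starting docvert, it may be worth confirming that the headless soffice listener above is actually accepting connections on port 2002. A minimal Python check, assuming a successful TCP connect is a good-enough signal (it doesn't verify that the listener speaks UNO's urp protocol); `uno_listener_up` is a hypothetical name:

```python
import socket


def uno_listener_up(host: str = "localhost", port: int = 2002, timeout: float = 2.0) -> bool:
    """Hypothetical helper: True if something accepts TCP connections on host:port.

    This only proves a socket is listening; it does not check the UNO protocol.
    """
    try:
        # create_connection raises OSError (refused, unreachable, timeout) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `uno_listener_up()` with the defaults checks localhost:2002, matching the `--accept` string above.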

In another terminal, SSH in as www-root:

cd docvert-python3
python3 ./docvert-web.py

I get the following error:

Error: Unable to find Bottle libraries in ['/var/www/docvert-python3/lib/bottle', '/var/www/docvert-python3', '/usr/lib/python310.zip', '/usr/lib/python3.10', '/usr/lib/python3.10/lib-dynload', '/home/www-data/.local/lib/python3.10/site-packages', '/usr/local/lib/python3.10/dist-packages', '/usr/lib/python3/dist-packages']. Exiting...

Bottle is bundled with docvert as a third-party library in lib/bottle, but it isn't importing for some reason.
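The generic "Unable to find Bottle libraries" message hides why the import failed, even though lib/bottle is in the searched path. A small probe like this would surface the underlying exception (for example a SyntaxError in an old bundled file, or a missing file). It's a sketch: `try_import` is a hypothetical helper, and whether docvert's bundled copy fails for either of those reasons wasn't established here.

```python
import importlib
import sys
from types import ModuleType
from typing import Iterable, Optional, Tuple


def try_import(
    name: str, extra_paths: Iterable[str] = ()
) -> Tuple[Optional[ModuleType], Optional[BaseException]]:
    """Hypothetical helper: import `name` with `extra_paths` prepended to sys.path.

    Returns (module, None) on success, or (None, exc) with the real exception
    (ImportError, SyntaxError, ...) instead of a generic "not found" message.
    A module already in sys.modules is returned from that cache as-is.
    """
    saved_path = list(sys.path)
    sys.path[:0] = list(extra_paths)
    try:
        return importlib.import_module(name), None
    except Exception as exc:
        return None, exc
    finally:
        # Restore sys.path so the probe doesn't affect later imports.
        sys.path[:] = saved_path
```

For example, `mod, err = try_import("bottle", ["/var/www/docvert-python3/lib/bottle"])` followed by printing `err`. Run it in a fresh interpreter, since a copy that was already imported (such as the pip one) would be returned from `sys.modules`.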

Trying to include it via pip:

python3 -m pip install bottle
python3 ./docvert-web.py

now yields:

/var/www/docvert-python3/core/docvert.py:126: DeprecationWarning: invalid escape sequence '\w'
  pretty_xml = re.sub("&?\w+;", to_ncr, pretty_xml)
/var/www/docvert-python3/core/docvert.py:127: DeprecationWarning: invalid escape sequence '\w'
  pretty_xml = re.sub('&(\w+);', '&\\1', pretty_xml)
<frozen importlib._bootstrap>:914: ImportWarning: _ImportRedirect.find_spec() not found; falling back to find_module()
/var/www/docvert-python3/core/docvert_xml.py:51: DeprecationWarning: invalid escape sequence '\?'
  return re.sub('<\?.*?\?>','<?xml version="1.0"?>', xml_string)
Error: Unable to find Python UNO libraries in ['/var/www/docvert-python3', '/usr/lib/python310.zip', '/usr/lib/python3.10', '/usr/lib/python3.10/lib-dynload', '/home/www-data/.local/lib/python3.10/site-packages', '/usr/local/lib/python3.10/dist-packages', '/usr/lib/python3/dist-packages', '/opt/libreoffice/program/', '/usr/lib/libreoffice/program/', '/usr/share/libreoffice/program/', '/usr/lib/openoffice.org/program/', '/usr/lib/openoffice.org2.0/program/'].
Are Python UNO libraries somewhere else?
Alternatively, Docvert is currently running Python 3.10.12. Maybe the libraries are available under a different version of Python?)
Exiting...
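The DeprecationWarnings above are easy to fix: the patterns are ordinary string literals, so Python complains about `\w` and `\?` as string escapes. Raw strings keep the regex semantics identical and silence the warning (Python 3.12+ reports the same problem as a SyntaxWarning). A sketch of the two patterns that don't depend on docvert's `to_ncr` callback; the function names are hypothetical, and the pattern at docvert.py:126 would similarly become `r"&?\w+;"`:

```python
import re

# Raw-string versions of the patterns flagged above. Regex behaviour is
# unchanged; only the Python string-literal escaping differs.
ENTITY_NAME = re.compile(r"&(\w+);")        # was '&(\w+);'
XML_DECLARATION = re.compile(r"<\?.*?\?>")  # was '<\?.*?\?>'


def strip_entity_semicolons(pretty_xml: str) -> str:
    # Same replacement as '&\\1' in the original: keep "&name", drop the ";".
    return ENTITY_NAME.sub(r"&\1", pretty_xml)


def normalise_xml_declaration(xml_string: str) -> str:
    # Replace any <?...?> processing instruction with a plain XML declaration.
    return XML_DECLARATION.sub('<?xml version="1.0"?>', xml_string)
```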

I think docvert is probably past its use-by date...

@maxime-rainville
Author

maxime-rainville commented Sep 13, 2023

Can we at least validate what the thing does on SCPS? I think @StephenMakrogianni managed to get it working after a bit of screaming.

I'm all for killing the dependency on docvert and replacing it with phpoffice/phpword. But I would like to make sure we understand what docvert is meant to be doing:

  • Does it just extract the text?
  • Does it import styles or links as well?
  • What about images?

Once we know what the expected behaviour is, that will give us a better understanding of how difficult or easy it will be to replicate the feature with a different lib.

@emteknetnz
Member

emteknetnz commented Sep 13, 2023

Well yeah it works, sort of. It's pretty rough.

  • After uploading a doc, it hangs with the progress bar at 100% in Firefox. Refreshing the page shows the loaded content.
  • Images don't work; they just render as empty boxes. They're also displayed inline, with text wrapping around them in a really ugly way, whereas in my .doc file they're more like "blocks".
  • Inline styles didn't work
  • Headings worked
  • Tables worked

So "a handful of .doc features work and lots don't" is probably a fair assessment. Presumably switching to a different library would get more of them working.

I'd say just start again with the new lib
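To turn these manual checks (headings, tables, images, inline styles) into a repeatable regression scenario, it might help to record what a test document actually contains, so converted output can be compared against it. A minimal sketch, assuming a .docx input (the .doc used here would first need converting, e.g. with LibreOffice's `--headless --convert-to docx`). `inventory_docx` is a hypothetical helper; it only reads word/document.xml, and it treats style IDs starting with "Heading" as headings, which holds for English Word defaults but not necessarily for localized or custom styles:

```python
import xml.etree.ElementTree as ET
import zipfile
from typing import IO, Dict, Union

# Namespace of the main WordprocessingML part (word/document.xml).
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def inventory_docx(path: Union[str, IO[bytes]]) -> Dict[str, int]:
    """Hypothetical helper: count a few features in a .docx body.

    Only word/document.xml is read, so headers, footers and footnotes are ignored.
    """
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    # iter() walks all descendants, so paragraphs inside table cells are included.
    paragraphs = list(root.iter(W + "p"))
    headings = 0
    for para in paragraphs:
        style = para.find(f"{W}pPr/{W}pStyle")
        # Assumes English default style IDs such as "Heading1", "Heading2".
        if style is not None and style.get(W + "val", "").startswith("Heading"):
            headings += 1

    def has_run_prop(run: ET.Element, prop: str) -> bool:
        # Rough proxy: presence of <w:b/> or <w:i/>; ignores w:val="0" and styles.
        return run.find(f"{W}rPr/{W}{prop}") is not None

    runs = list(root.iter(W + "r"))
    return {
        "paragraphs": len(paragraphs),
        "headings": headings,
        "tables": sum(1 for _ in root.iter(W + "tbl")),
        # DrawingML images only; legacy VML (<w:pict>) images are not counted.
        "images": sum(1 for _ in root.iter(W + "drawing")),
        "hyperlinks": sum(1 for _ in root.iter(W + "hyperlink")),
        "styled_runs": sum(1 for r in runs if has_run_prop(r, "b") or has_run_prop(r, "i")),
    }
```

Running this on the source document and comparing it with what ends up in the CMS page would give each regression scenario a concrete expected result.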

@maxime-rainville
Author

I created two other cards:

The last two ACs still seem relevant:

  • We have clear regression test scenarios for the module, and those tests are documented in our test suite.
  • We've validated that the regression test scenarios are passing right now.

We can either put this card back into the ready column and pick it back once #58 is done, or we can close this card and create a new one with the ACs to update Cucumber Studio.

@emteknetnz
Copy link
Member

We can either put this card back into the ready column and pick it back once #58 is done

This makes sense

@emteknetnz emteknetnz removed their assignment Sep 14, 2023
@maxime-rainville
Author

We concluded that getting this working off CWP is too low a priority to justify working on this any further.


2 participants