Convert .DOC to .DOCX in NextCloud

NC 28.0.4 on Ubuntu 22.04 LTS - OnlyOffice-documentserver-ee 8.0.1-31 on Ubuntu 22.04 LTS (separate server).

The system is working fine on the local network.

I’m doing a personal project for an NPO that includes archiving ~400 .DOC documents created back in the early 2000s.

Since OnlyOffice treats .DOC documents differently than .DOCX, I’d like to convert them all to .DOCX format. The documents use a specific Cyrillic TTF font that OO handles properly (Collabora does not, for some reason).
All that said, I’d like to batch process the existing files and set up a workflow to handle new ones.

Can someone give me a starting point if I were to do this “in” the NextCloud/OnlyOffice environment?
Or, let me know if I should pursue this offline, as it were.
Thanks

I would do this offline; it would be the more efficient option, tbh. Using pandoc and the find command in a bash script would probably be your best bet. It should be able to ingest .doc files and output .docx or .odt.

That said, you could use the workflow_script app to have Nextcloud automagically convert .doc files when you tag/upload/whatever you want, using pandoc installed on your server.

Pandoc does not accept .DOC format for the input.

That’s why I’m looking to use OnlyOffice. It seems to handle .DOC better than others.

I could have sworn pandoc had .doc support, but apparently not, sorry about that.

In that case, you can use LibreOffice via the CLI to convert the files headlessly with the soffice command. Also, I’ve read that you may need to add the --headless option, which isn’t really documented on this page:

https://help.libreoffice.org/latest/en-US/text/shared/guide/convertfilters.html
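
For the batch part, something along these lines might be a starting point. It is only a sketch: the function name convert_tree, the SOFFICE variable, and the directory arguments are made up for this example, and it assumes LibreOffice is installed with soffice on the PATH. It walks a source folder with find, picks up .doc and .DOC files, and writes the .docx output into a mirrored folder tree so the originals stay untouched.

```shell
#!/usr/bin/env bash
# Sketch: batch-convert .doc files to .docx with headless LibreOffice.
# convert_tree and SOFFICE are hypothetical names chosen for this example.

# Path to the LibreOffice binary; override it if soffice is not on the PATH.
SOFFICE="${SOFFICE:-soffice}"

convert_tree() {
    # $1 = source directory, $2 = output directory (trailing slash on $1 is dropped)
    src_dir="${1%/}"
    out_dir="$2"
    mkdir -p "$out_dir"

    # -iname matches both .doc and .DOC.
    # Note: file names that contain newlines are not handled by this loop.
    find "$src_dir" -type f -iname '*.doc' | while IFS= read -r doc; do
        # Recreate the file's sub-directory under the output directory.
        rel_dir="$(dirname "${doc#"$src_dir"/}")"
        mkdir -p "$out_dir/$rel_dir"

        # </dev/null keeps the converter from consuming the file list on stdin.
        "$SOFFICE" --headless --convert-to docx \
            --outdir "$out_dir/$rel_dir" "$doc" </dev/null \
            || echo "FAILED: $doc" >&2
    done
}
```

After sourcing the script, it could be called as, for example, `convert_tree /path/to/doc-archive /path/to/docx-archive` (both paths are placeholders). Since the thread does not confirm how LibreOffice treats the Cyrillic font, it is probably worth opening a few converted files in OnlyOffice before running the whole set.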