Using Workflow External script to generate searchable PDF

Hi all,

I am using Nextcloud to store scanned PDF invoices. Each paper invoice is scanned to PDF with a multifunction printer and stored in a Nextcloud directory that is shared with the printer via Samba. A cronjob runs an occ files:scan command on that directory every minute.

To make the PDFs searchable, I would like to set up a Workflow with an external script.

The flow is

When File created
And File name matches /.*\.pdf$/i

Run script

/usr/bin/ocrmypdf -l deu %n %n

that adds a text layer to the PDF. Unfortunately the script is never executed.

Even a flow

When File created
And File name matches /.*/i

Run script

/usr/bin/touch /tmp/test.txt

is never executed.

Any hints on how flows can be debugged? I see there is a flow.log file in the /var/www/nextcloud/data directory, but it is always empty.

Thanks for your help

  1. Create a helper shell script to execute the desired commands.

  2. Add the following lines at the beginning of the script to trace all output:

    #!/bin/sh
    exec 2> /tmp/nc-flow-trace$$.log
    set -x
    ...
    
  3. Please keep in mind that the “%n” variable only contains a relative file path from the Nextcloud data directory. You need to prefix it with the absolute Nextcloud data path to get a full file path.

  4. Passing the source file name (%n) to the command twice shouldn’t be necessary, and is most likely not the right way to go anyway. It is better to rename the original file first, e.g. to “.pdf-tmp-$$”, and then run the command on that file, so that you can use the original file name as the destination file. Make sure that you delete the no longer needed “.pdf-tmp-$$” file afterwards.

BTW, scanning the directory every minute isn’t necessary at all if you run occ files:scan ... from the helper script on the file itself, right after the OCR command has finished :wink:
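
To make the steps above more concrete, here is a rough sketch of what such a helper script could look like. It is not tested against a real Nextcloud instance. The data directory, the occ location and the function name ocr_in_place are assumptions for illustration, and I am assuming that occ files:scan --path accepts the path of a single file in the form /user/files/... . The flow would then call the script with %n as its only argument.

```shell
#!/bin/sh
# Hypothetical helper script; in the flow it would be called with %n as $1.

# Assumptions: adjust these to your installation.
NC_DATA_DIR="${NC_DATA_DIR:-/var/www/nextcloud/data}"
OCR_CMD="${OCR_CMD:-/usr/bin/ocrmypdf}"
OCC="${OCC:-php /var/www/nextcloud/occ}"

ocr_in_place() {
    rel="$1"                    # %n: path relative to the data directory
    src="$NC_DATA_DIR/$rel"     # absolute path of the uploaded PDF
    tmp="$src-tmp-$$"           # temporary name for the original file

    [ -f "$src" ] || { echo "not found: $src" >&2; return 1; }

    # Move the original aside so the OCR output can take the original name.
    mv "$src" "$tmp" || return 1

    if "$OCR_CMD" -l deu "$tmp" "$src"; then
        rm -f "$tmp"            # OCR succeeded, drop the temporary copy
    else
        mv -f "$tmp" "$src"     # OCR failed, restore the original file
        return 1
    fi

    # Let Nextcloud pick up the changed file (assumed to accept a file path).
    $OCC files:scan --path="/$rel"
}

if [ "$#" -gt 0 ]; then
    # Trace everything to a per-process log file, as described in step 2.
    exec 2> "/tmp/nc-flow-trace$$.log"
    set -x
    ocr_in_place "$1"
fi
```

If OCR fails, the original file is moved back, so a broken run should not lose the scan. The trace log in /tmp shows which command failed.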


Thanks for the hint. With the approach you proposed, I finally found out that the script is not triggered if a file is created outside of Nextcloud.

When the PDF file is created through the Nextcloud UI, e.g. by copying a PDF, the script is triggered. When it is created by copying a PDF directly into a Nextcloud directory from the command line, the script is not triggered.

Even forcing it with sudo -u www-data php -f /var/www/nextcloud/cron.php does not help. There is a GitHub issue for that: https://github.com/nextcloud/workflow_script/issues/65

This issue somewhat limits the usefulness of workflows.


I think it works as designed at the moment. If files are copied into a Nextcloud-controlled directory without going through Nextcloud itself, no event is triggered, so there is nothing for the system to react to.
Nevertheless, the feature request you mentioned might be the right way to get additional events added to the system/app, so that newly scanned files are processed too.
