Hm, this does not help me (or I do not understand it). Yes, I already saw that the language can be passed as a parameter on the command line. But how does this help me? I am not starting the OCR directly; the app or Solr is calling it.
All answers so far only refer to the installation and availability of the different language files. But just being available does not mean that the different languages are actually used by the OCR engine.
The OCR engine only uses the language that is specified when the OCR is called. It does not automatically select the correct one.
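To illustrate the point: whoever calls Tesseract (here Tika, inside Solr) decides the language once per invocation via the `-l` option, and Tesseract does not detect it on its own. The sketch below only builds such a command line. The function names and file names are hypothetical, and it assumes the standard `tesseract <image> <outputbase> -l <langs>` form, where several installed languages can be combined with `+`.

```python
import subprocess
from typing import List, Sequence


def build_tesseract_cmd(image_path: str, output_base: str, languages: Sequence[str]) -> List[str]:
    """Build a tesseract command line (hypothetical helper).

    The -l value is fixed for this one call; tesseract uses exactly
    these languages and does not pick one automatically.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    # Several trained languages are combined with '+', e.g. "eng+deu".
    return ["tesseract", image_path, output_base, "-l", "+".join(languages)]


def run_ocr(image_path: str, output_base: str, languages: Sequence[str]) -> None:
    # Runs tesseract; it writes the recognized text to output_base + ".txt".
    subprocess.run(build_tesseract_cmd(image_path, output_base, languages), check=True)


if __name__ == "__main__":
    print(build_tesseract_cmd("scan.png", "scan", ["eng", "deu"]))
```

Because the language is only an argument of the call, the caller's configuration is what matters, not which language files happen to be installed.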
I now understand the explanation on GitHub regarding the *.properties file:
I have to extract (unzip) the original JAR file (/opt/solr/contrib/extraction/lib/tika-parsers-1.13.jar), modify the “TesseractOCRConfig.properties” file, re-zip the contents into a JAR and then replace the original with the modified JAR.
That file specifies the language used for OCR. This is not a dynamic process: every document processed by Nextant is run through OCR with the language set in the properties file inside the JAR.
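The unzip/edit/re-zip steps could be scripted roughly as follows. This is a minimal sketch, not an official procedure. It assumes the file sits at `org/apache/tika/parser/ocr/TesseractOCRConfig.properties` inside the JAR (check with `unzip -l` first), that the key is `language`, and that this Tika version passes the value straight to Tesseract's `-l` option. The helper names are hypothetical.

```python
import shutil
import zipfile
from pathlib import Path

# ASSUMPTION: location of the properties file inside tika-parsers-1.13.jar.
PROPS_ENTRY = "org/apache/tika/parser/ocr/TesseractOCRConfig.properties"


def set_property(text: str, key: str, value: str) -> str:
    """Set 'key=value' in .properties text, appending it if the key is missing.

    Simplified: only handles 'key=value' lines, not ':' or whitespace separators.
    """
    lines = text.splitlines()
    found = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#") or "=" not in stripped:
            continue  # skip comments and lines without '='
        if stripped.split("=", 1)[0].strip() == key:
            lines[i] = f"{key}={value}"
            found = True
    if not found:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


def patch_jar_language(jar_path: str, language: str) -> None:
    """Rewrite the JAR with a new OCR language, keeping a '.orig' backup."""
    jar = Path(jar_path)
    backup = jar.with_name(jar.name + ".orig")
    tmp = jar.with_name(jar.name + ".tmp")
    shutil.copy2(jar, backup)  # keep the original JAR next to it
    with zipfile.ZipFile(backup) as src:
        if PROPS_ENTRY not in src.namelist():
            raise KeyError(f"{PROPS_ENTRY} not found in {jar}")
        with zipfile.ZipFile(tmp, "w") as dst:
            for item in src.infolist():
                data = src.read(item.filename)
                if item.filename == PROPS_ENTRY:
                    # Java .properties files are ISO-8859-1 encoded.
                    text = data.decode("iso-8859-1")
                    data = set_property(text, "language", language).encode("iso-8859-1")
                dst.writestr(item, data)  # copy every entry, patched or not
    tmp.replace(jar)  # swap the rebuilt JAR into place
```

For example, `patch_jar_language("/opt/solr/contrib/extraction/lib/tika-parsers-1.13.jar", "eng+deu")`. Solr presumably has to be restarted afterwards so the JVM loads the modified JAR, and the setting still applies to all documents alike.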