Re: About Tesseract OCR.


Javi Domínguez
 

Hello again.

I have run wia-cmd-scanner.exe directly from the command line, in the Windows CMD, and it has worked perfectly.

However, from the addon it doesn't work. stderr shows:

b'The system cannot find the specified path.\r\n'
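
One possible cause, not confirmed in this thread, is that the addon launches the program with a relative path or from a working directory different from the one used in CMD. A minimal sketch of a launcher that rules this out is shown here. The helper name run_tool and its parameters are hypothetical and are not taken from the addon's code; no command-line options of wia-cmd-scanner.exe are assumed.

```python
import os
import subprocess


def run_tool(exe_path, args=(), timeout=60):
    """Run an external program by absolute path.

    Returns (returncode, stdout, stderr) with the streams as bytes.
    Hypothetical helper, not part of the addon.
    """
    exe_path = os.path.abspath(exe_path)
    if not os.path.isfile(exe_path):
        # Fail early with a clear message instead of a shell error.
        raise FileNotFoundError("Executable not found: " + exe_path)
    result = subprocess.run(
        [exe_path, *args],              # list form: no shell involved
        cwd=os.path.dirname(exe_path),  # run from the program's own folder
        capture_output=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout, result.stderr
```

Logging the absolute path and the working directory right before the call would show whether they match what was typed in CMD.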

I hope this helps. It's late and I can't do any more research. Tomorrow dawns very early.

Good night

Javi

On 05/06/2022 at 22:24, Javi Domínguez via groups.io wrote:
Hello.


* It is a limitation of the routine to get the complete path of the file...

If you can help improve the routine, I will be glad!

OK. I'll take a look at that.


* The scanner is recognized as WIA compatible?

What do you mean by "is recognized as WIA compatible"? Yes, the scanner is WIA compliant and other apps recognize it. Whether the addon recognizes it I don't know; it just doesn't do anything.


* What Windows version?
Windows 10 21H2 (x64) build 19044.1706


* So, I should name each thread differently...

Yes, it would be convenient.


* And, before starting another thread, verify if it is active, right?

I think so. I would wait for the current thread to finish before starting another.


Note that if you assign the new thread to self._thread, the old thread will continue to run until it finishes but you will no longer have a reference to it. You will only be able to access it via threading.enumerate().
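
A minimal sketch of both points, assuming the addon keeps its worker in self._thread as described above. The class Recognizer and the helper threads_named are hypothetical names used only for illustration.

```python
import threading


class Recognizer:
    """Hypothetical stand-in for the addon object that owns self._thread."""

    def __init__(self):
        self._thread = None

    def start(self, work, name="tesseractOCR"):
        """Start work in a named thread unless the previous one is still running."""
        if self._thread is not None and self._thread.is_alive():
            return False  # refuse to launch a second worker
        self._thread = threading.Thread(target=work, name=name, daemon=True)
        self._thread.start()
        return True


def threads_named(name):
    """Return all live threads with the given name, even unreferenced ones."""
    return [t for t in threading.enumerate() if t.name == name]
```

When start() returns False, the script could speak a message such as "recognition already in progress" instead of silently launching another thread.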


You may need a method to kill threads that are stuck or taking too long.
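
Python's threading module offers no way to kill a thread from outside. If the thread is stuck waiting on an external program (an assumption here, since the scanner is driven through wia-cmd-scanner.exe), one option is to give the child process a timeout and kill the process, which lets the thread return on its own. A minimal sketch, with the hypothetical helper name run_with_timeout:

```python
import subprocess


def run_with_timeout(cmd, timeout):
    """Run cmd and wait at most timeout seconds.

    Returns (returncode, stderr), or (None, b"") if the process
    had to be killed. Hypothetical helper, not part of the addon.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE
    )
    try:
        _, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()         # stop the stuck child process
        proc.communicate()  # reap it so the worker thread can return
        return None, b""
    return proc.returncode, err
```

A return code of None can then be reported to the user as "the scanner did not respond".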


Greetings


Javi


On 05/06/2022 at 20:34, Rui Fontes wrote:
Hello Javi!


Comments in the middle of your message are marked with *...


At 18:47 on 05/06/2022, Javi Domínguez wrote:
When I try to recognize a file on the desktop two things happen:

1. If this is the first time a file is recognized, it says "file not supported" (tested with PDF and BMP file types). The same file recognized from a folder in Windows explorer works fine.

* It is a limitation of the routine to get the complete path of the file...

If you can help improve the routine, I will be glad!



2. If another file has been recognized before, it processes any file, even one that is not of a supported type. In either case, supported type or not, it always shows the result of the previous recognition, not of the requested file. Even after manually deleting the oc.txt and ocr-xxx.png files from the addon's images folder, it re-processes the previously requested file.

* It was an error in the code... The path of the last document was not cleared, so the list of ocr-xxx.png files was created again...



On the other hand, recognition from the scanner does not work for me. My scanner is an HP Scanjet G2410. It may not be supported, but the addon does not speak any message about it. The thread that processes the scan remains active and never terminates. If the script is executed again, another thread is launched that also remains active, and so on forever. It is normal for a user who receives no response to try running it again, so you can end up with a lot of active threads.
* The scanner is recognized as WIA compatible?

* What Windows version?


I have the habit of naming the threads that I use so that I can debug better with threading.enumerate(). I have added this line to __init__.py
self._thread.name = "tesseractOCR"
before starting the thread to do these tests.
* So, I should name each thread differently...

* And, before starting another thread, verify if it is active, right?


Finally, in terms of user experience, I think you need to give more information about what is happening. Sometimes, if the recognition takes time, the user does not know whether it is working correctly or not.
* It is scheduled for the next version...


Thanks!


Rui Fontes







