Re: About Tesseract OCR.


Rui Fontes
 

Hello Javi!


First of all, thanks for your PR!


However, I would like to improve it a little more...

The situation is this:

When I set up this new laptop and installed Dropbox, I chose to back up the Desktop folder, so the full path of my desktop is:

c:\Users\userName\dropbox\ambiente de trabalho

"Ambiente de trabalho" is the Portuguese expression for "Desktop"...


So I tried to use shlobj.getKnownFolderId to get the correct desktop path in all cases...

Despite modifying shlobj.ini and importing it into the add-on, I could not get it to work...


In shlobj.py, inside

class FolderId(str, Enum):

I added the following lines:

    # Desktop folder
    # The typical path is "C:\Users\Utilizador\appData\desktop"
    APP_DATA_DESKTOP = "{B2C5E279-7ADD-439F-B28C-C41FE1BBF672}"

The string in quotes was taken from:

https://docs.microsoft.com/en-us/windows/win32/shell/knownfolderid
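
A note on the GUID above: as far as the KNOWNFOLDERID page indicates, {B2C5E279-7ADD-439F-B28C-C41FE1BBF672} is listed there as FOLDERID_AppDataDesktop, while FOLDERID_Desktop is {B4BFCC3A-DB2C-424C-B029-7FE99A87C641}. What follows is only a minimal sketch, assuming Dropbox relocated the Desktop known folder itself. It calls the Windows SHGetKnownFolderPath function directly through ctypes rather than through NVDA's shlobj module, and the helper names guid_from_string and get_known_folder_path are hypothetical.

```python
import ctypes
import sys
import uuid

# GUID documented by Microsoft for FOLDERID_Desktop.
FOLDERID_DESKTOP = "{B4BFCC3A-DB2C-424C-B029-7FE99A87C641}"


class GUID(ctypes.Structure):
    """Memory layout of the Windows GUID structure."""
    _fields_ = [
        ("Data1", ctypes.c_uint32),
        ("Data2", ctypes.c_uint16),
        ("Data3", ctypes.c_uint16),
        ("Data4", ctypes.c_ubyte * 8),
    ]


def guid_from_string(text):
    """Hypothetical helper: build a GUID struct from a '{...}' string."""
    # bytes_le stores the first three fields little-endian, as Windows does.
    return GUID.from_buffer_copy(uuid.UUID(text).bytes_le)


def get_known_folder_path(folder_id):
    """Hypothetical helper: ask Windows where a known folder currently is."""
    if sys.platform != "win32":
        raise OSError("Known folders are only available on Windows")
    shell32 = ctypes.windll.shell32
    shell32.SHGetKnownFolderPath.argtypes = [
        ctypes.POINTER(GUID), ctypes.c_uint32,
        ctypes.c_void_p, ctypes.POINTER(ctypes.c_wchar_p),
    ]
    shell32.SHGetKnownFolderPath.restype = ctypes.c_long  # HRESULT
    free = ctypes.windll.ole32.CoTaskMemFree
    free.argtypes = [ctypes.c_void_p]
    free.restype = None

    guid = guid_from_string(folder_id)
    path_ptr = ctypes.c_wchar_p()
    hr = shell32.SHGetKnownFolderPath(
        ctypes.byref(guid), 0, None, ctypes.byref(path_ptr)
    )
    if hr != 0:  # anything other than S_OK
        raise ctypes.WinError(hr)
    try:
        return path_ptr.value
    finally:
        # The caller owns the returned buffer and must free it.
        free(path_ptr)
```

On Windows, get_known_folder_path(FOLDERID_DESKTOP) should return the current desktop location, including a redirected one, provided the assumption about how Dropbox moved the folder holds.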


Can someone help with this?


Best regards,

Rui Fontes
NVDA Portuguese team



At 17:59 on 11/06/2022, Javi Domínguez wrote:

Hello.


You have a pull request that fixes both issues.


https://github.com/ruifontes/tesseractOCR/pull/2


Greetings


Javi Dominguez



On 05/06/2022 at 23:01, Javi Domínguez via groups.io wrote:
Hello again.

I ran wia-cmd-scanner.exe directly from the command line, in the Windows CMD, and it worked perfectly.

However, it doesn't work from the addon. stderr shows:

b'The system cannot find the specified path.\r\n'
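
One possible cause, offered only as a guess, is that the add-on launches the tool with a relative path or from a different working directory than the CMD session. The sketch below shows a way to rule that out: it resolves the executable to an absolute path, checks that it exists, runs it from its own folder and returns stderr for logging. The function name run_tool is hypothetical and is not taken from the add-on.

```python
import subprocess
from pathlib import Path


def run_tool(executable, args=()):
    """Run an external tool by absolute path and capture its output.

    Hypothetical helper: returns (returncode, stdout, stderr) as bytes.
    """
    exe = Path(executable).absolute()
    if not exe.is_file():
        # Fail early with the full path so the log shows what was looked up.
        raise FileNotFoundError(f"Executable not found: {exe}")
    result = subprocess.run(
        [str(exe), *args],
        cwd=str(exe.parent),   # run from the tool's own folder
        capture_output=True,   # collect stdout and stderr
    )
    return result.returncode, result.stdout, result.stderr
```

Logging the absolute path together with the returned stderr should make it clear which path the system could not find.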

I hope this helps. It's late and I can't do any more research. Tomorrow dawns very early.

Good night

Javi

On 05/06/2022 at 22:24, Javi Domínguez via groups.io wrote:
Hello.


* It is a limitation of the routine to get the complete path of the file...

If you can help improve the routine, I will be glad!

OK. I'll take a look at that.


* Is the scanner recognized as WIA compatible?

What do you mean by "is recognized as WIA compatible"? Yes, the scanner is WIA compliant and other apps recognize it, but I don't know whether the addon does; it just doesn't do anything.


* What Windows version?
Windows 10 21H2 (x64) build 19044.1706


* So, I should name each thread differently...

Yes, it would be convenient.


* And, before starting another thread, verify if it is active, right?

I think so. I would wait for the current thread to finish before starting another.


Note that if you assign the new thread to self._thread, the old thread will continue to run until it finishes but you will no longer have a reference to it. You will only be able to access it via threading.enumerate().
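
A minimal sketch of that guard, assuming the add-on keeps its worker in self._thread as described above: a new thread is started only when the previous one has finished, so the reference is never overwritten while the old thread is still running. The class name Recognizer and the method name start are hypothetical.

```python
import threading


class Recognizer:
    """Hypothetical holder for a single named worker thread."""

    def __init__(self):
        self._thread = None

    def start(self, target, name="tesseractOCR"):
        """Start target in a named thread unless one is still running.

        Returns True if a new thread was started, False otherwise.
        """
        if self._thread is not None and self._thread.is_alive():
            # Keep the reference to the running thread; do not replace it.
            return False
        self._thread = threading.Thread(target=target, name=name, daemon=True)
        self._thread.start()
        return True
```

When start returns False, the script can tell the user that recognition is still in progress instead of launching another thread.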


You may need a method to kill threads that are stuck or taking too long.
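
Python has no supported way to kill a thread from outside, so one option, sketched here under the assumption that the stuck work is an external process such as the scanner tool, is to put a timeout on that process. subprocess.run kills the child when the timeout expires, which lets the waiting thread finish on its own. The function name run_with_timeout is hypothetical.

```python
import subprocess


def run_with_timeout(cmd, timeout):
    """Run cmd and return its exit code, or None if it timed out.

    Hypothetical helper: on timeout, subprocess.run kills the child
    process before raising TimeoutExpired.
    """
    try:
        done = subprocess.run(cmd, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    return done.returncode
```

A None result can then be reported to the user as "the scanner did not respond" rather than leaving the thread waiting forever.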


Greetings


Javi


On 05/06/2022 at 20:34, Rui Fontes wrote:
Hello Javi!


My comments are inline in your message, marked with *...


At 18:47 on 05/06/2022, Javi Domínguez wrote:
When I try to recognize a file on the desktop two things happen:

1. If this is the first time a file is recognized, it says "file not supported" (tested with PDF and BMP file types). The same file recognized from a folder in Windows explorer works fine.

* It is a limitation of the routine to get the complete path of the file...

If you can help improve the routine, I will be glad!



2. If another file has been recognized before, it processes any file even if it is not of a supported type. In any case, supported type or not, it always shows the result of the previous recognition, not of the requested file. Even after manually deleting the oc.txt and ocr-xxx.png files from the addon's images folder, it re-processes the previously requested file.

* It was an error in the code... The path of the last document was not cleared, so the list of ocr-xxx.png files was created again...



On the other hand, recognition from the scanner does not work for me. My scanner is an HP Scanjet G2410. It may not be supported, but the addon does not announce any message about it. The thread that processes this remains active and never terminates. If the script is executed again, another thread is launched that also remains active, and so on forever. It is normal for a user who receives no response to try running it again, so you can end up with a lot of active threads.
* Is the scanner recognized as WIA compatible?

* What Windows version?


I have the habit of naming the threads that I use so that I can debug better with threading.enumerate(). I have added the line to __init__.py
self._thread.name = "tesseractOCR"
before starting the thread to do these tests.
* So, I should name each thread differently...

* And, before starting another thread, verify if it is active, right?


Finally, in terms of user experience, I think you need to give more information about what is happening. Sometimes, if the recognition takes time, the user does not know whether it is working correctly or not.
* It is scheduled for the next version...


Thanks!


Rui Fontes













