Image Captioning Add-on!

Shubham Jain

Hello Everyone!

I am very excited to announce a pre-release version of my Image Captioning add-on! You can download it here:
This add-on lets users caption image elements on their screen and get a description of the image in English. The result is announced to the user and also presented in a virtual window, where it can be read character-by-character, word-by-word or as a whole, and even copied.
Detection can be triggered by pressing Alt+NVDA+C once, or by pressing C more than once while holding Alt+NVDA (Alt+NVDA+C+C+...).
A single press only performs detection if the navigator object currently in focus has the role ROLE_GRAPHIC. This prevents non-visual users from waiting for bad results after mistakenly starting a captioning process on a non-image element. Low-vision users, or anyone else who wants to, can press Alt+NVDA+C+C+... to perform captioning on any element, without filtering out roles other than ROLE_GRAPHIC.
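The filtering rule described above can be sketched in a few lines of plain Python. This is only an illustration of the decision, not the add-on's actual code: the function name `should_caption`, the `press_count` parameter and the string stand-in for the role constant are all assumptions made for the example.

```python
# Stand-in for NVDA's role constant; the real add-on would compare against
# the role of the navigator object instead of a plain string (assumption).
ROLE_GRAPHIC = "ROLE_GRAPHIC"


def should_caption(role: str, press_count: int) -> bool:
    """Decide whether captioning should start (hypothetical helper).

    press_count is how many times C was pressed while Alt+NVDA was held.
    """
    if press_count < 1:
        # The gesture was not pressed at all.
        return False
    if press_count == 1:
        # Single press: only caption elements that are graphics.
        return role == ROLE_GRAPHIC
    # Repeated press: caption any element, skipping the role filter.
    return True


print(should_caption("ROLE_GRAPHIC", 1))
print(should_caption("ROLE_BUTTON", 1))
print(should_caption("ROLE_BUTTON", 2))
```

With this rule, a single press on a button does nothing, while a repeated press on the same button still starts captioning.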
The result is announced as soon as it is available. This announcement is then followed by RESULT_DOCUMENT which indicates that focus has been shifted to a "virtual result window". Users can then use arrow-key navigation to access the result character-by-character, word-by-word or as a whole. Pressing ESC or changing focus to another element on the screen escapes the "virtual result window".
The result can be re-accessed by pressing Alt+NVDA+R.
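To make the idea of a stored, re-accessible result more concrete, here is a minimal sketch of a buffer that keeps the last caption and exposes it as characters, words or a whole string. The class `ResultBuffer` and its methods are hypothetical and exist only for this example; in the add-on itself, navigation happens in the virtual result window rather than through a class like this.

```python
from typing import List, Optional


class ResultBuffer:
    """Keeps the most recent caption so it can be read again later.

    Hypothetical helper for illustration only.
    """

    def __init__(self) -> None:
        self._last: Optional[str] = None

    def store(self, caption: str) -> None:
        # Remember the newest caption, replacing any earlier one.
        self._last = caption

    def recall(self) -> Optional[str]:
        # Whole result, or None if nothing has been captioned yet.
        return self._last

    def characters(self) -> List[str]:
        # Character-by-character view of the stored caption.
        return list(self._last or "")

    def words(self) -> List[str]:
        # Word-by-word view, split on whitespace.
        return (self._last or "").split()


buf = ResultBuffer()
buf.store("a dog on grass")
print(buf.recall())
print(buf.words())
```

Keeping only the latest caption matches the behaviour described above, where one shortcut brings back the most recent result.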

As is the case with most open-source image captioning models available, the results produced can be wrong at times. The model can also produce different results for the same image at different sizes. For images in which objects could not be easily identified, the model takes quite some time to produce any results. In some cases, it may be slow the first time it is triggered.

I would be very grateful if you could test my add-on and share your feedback with me. If you have any issues with the add-on or would like to request any changes, feel free to reach out to me or create an issue at the GitHub repository:

