Re: Image Captioning Add-on!

Noelia Ruiz
 

Hi, as always, many thanks for this interesting project. I may create
issues on GitHub at a more advanced stage of the add-on, once easy
problems like conflicts with known commands are fixed. For now I prefer
to provide feedback here, since issues may be more useful for reporting
problems which require more investigation, unless you prefer a
different approach for receiving feedback:

- The link provided gave me a version where NVDA+alt+d is used,
not NVDA+alt+c. Maybe it is an earlier build; I'm not sure.
- I suggest not using NVDA+alt+c, since it is used for
reporting comments in Excel and Word (see the NVDA Commands Quick
Reference or the User Guide).
- I have cloned your repo and built the add-on myself, and now NVDA+alt+c
is working as described. However, so far I haven't been able to get any
image recognized.
- Sometimes, when pressing g in browse mode and then NVDA+alt+c, the
add-on announces that this is not an image. But if I use the object
navigator to move inside it (to the first child), NVDA identifies the
graphic as such, though recognition still fails.
- Is it possible to enlarge images using the add-on? Sometimes
it announces that the image is too small.
I have tried it at, among other places,
https://www.freepik.com/free-photos-vectors/graphics

Kind regards

2020-08-01 18:39 GMT+02:00, Shubham Jain <@ShubhamJain>:

Hello Everyone!

I am very excited to announce a pre-release version of my *Image Captioning*
add-on! You can download it here:
https://github.com/ShubhamJain7/imageCaptioning-NVDA-Addon/releases/tag/v0.1-alpha
This add-on allows users to perform image captioning on image elements
present on their screen and get a caption that describes the image in
English. The result is announced to the user and also presented in a virtual
window that allows users to access the result character-by-character,
word-by-word, as a whole and even copy the result.
Captioning can be triggered by pressing Alt+NVDA+C once, or by pressing it
two or more times in quick succession (Alt+NVDA+C+C+...).
A single press only performs captioning if the current navigator object
has the role ROLE_GRAPHIC. This prevents non-visual users from waiting
for bad results after mistakenly starting a captioning process on non-image
elements. Low-vision users, or anyone else, can press Alt+NVDA+C repeatedly
to perform captioning on any element, without filtering out objects whose
role is not ROLE_GRAPHIC.
The result is announced as soon as it is available. This announcement is
then followed by RESULT_DOCUMENT which indicates that focus has been shifted
to a "virtual result window". Users can then use arrow-key navigation to
access the result character-by-character, word-by-word or as a whole.
Pressing ESC or moving focus to another element on the screen closes the
"virtual result window".
The result can be re-accessed by pressing Alt+NVDA+R.
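For anyone curious about the gesture logic, the single-press versus repeated-press behaviour described above can be modelled in plain Python. This is a hypothetical sketch, not the add-on's actual code: `NavigatorObject`, `CaptionDispatcher` and the string value of `ROLE_GRAPHIC` are stand-ins. Inside NVDA the role would come from `controlTypes` and the press count from `scriptHandler.getLastScriptRepeatCount()`. Keeping the last result in memory for Alt+NVDA+R is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stand-in for NVDA's graphic role constant
# (in 2020-era NVDA this would be controlTypes.ROLE_GRAPHIC).
ROLE_GRAPHIC = "graphic"


@dataclass
class NavigatorObject:
    """Hypothetical minimal model of NVDA's navigator object."""
    name: str
    role: str


class CaptionDispatcher:
    """Sketch of the Alt+NVDA+C (single vs repeated press) behaviour."""

    def __init__(self, caption: Callable[[NavigatorObject], str]):
        # Hypothetical captioning backend (the real add-on runs a model).
        self._caption = caption
        # Assumed: the last result is stored so Alt+NVDA+R can re-read it.
        self.last_result: Optional[str] = None

    def on_gesture(self, obj: NavigatorObject, repeat_count: int) -> str:
        # repeat_count == 0 means a single press: only caption graphics.
        if repeat_count == 0 and obj.role != ROLE_GRAPHIC:
            return "Not an image"
        # Repeated presses (repeat_count >= 1) skip the role filter.
        self.last_result = self._caption(obj)
        return self.last_result

    def reaccess(self) -> str:
        # Models Alt+NVDA+R: repeat the most recent caption, if any.
        if self.last_result is None:
            return "No previous result"
        return self.last_result
```

Note that in NVDA a double press runs the script twice (repeat counts 0 and then 1), so pressing twice on a non-graphic first reports "Not an image" and then captions the element anyway.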

As is the case with most available open-source image captioning models, the
results produced can be wrong at times. The model can also produce different
results for the same image at different sizes. For images in which objects
cannot be easily identified, the model can take quite some time to produce
a result. It may also be slow the first time it is triggered.

I would be very grateful if you could test my add-on and share your feedback
with me. If you have any issues with the add-on or would like to request any
changes, feel free to reach out to me or create an issue at the GitHub
repository: https://github.com/ShubhamJain7/imageCaptioning-NVDA-Addon/



Join nvda-addons@nvda-addons.groups.io to automatically receive all group messages.