I don't have a Vocalizer licence to test it, but I had a similar
issue with my driver, IBMTTS.
I don't know if this helps, but in my driver this happens when I try
to send accurate indexes. My driver sends audio data in chunks of a
specific size, e.g. 3000 samples per chunk. But when the driver receives
an index notification from the synth, I need to send a smaller chunk,
then the index notification to NVDA, and then the rest of the received
audio. Sending a very small piece of audio to player.feed causes strange
glitches on specific words, depending on the language used. It
usually happens at the end of the strings being read.
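A minimal Python sketch of the accurate approach, only to illustrate where the tiny pieces come from. It is not the IBMTTS driver's code: the function name, the ("audio", ...)/("index", ...) event tuples, the (sample_offset, index) mark format and the 16-bit mono sample size are all assumptions. It does not call player.feed; it only models which pieces would be passed to it.

```python
from typing import List, Tuple, Union

BYTES_PER_SAMPLE = 2   # assumption: 16-bit mono PCM
CHUNK_SAMPLES = 3000   # fixed chunk size from the example above

# Hypothetical event stream: ("audio", bytes) would go to player.feed,
# ("index", int) would be reported to NVDA.
Event = Tuple[str, Union[bytes, int]]


def accurate_chunks(audio: bytes, marks: List[Tuple[int, int]]) -> List[Event]:
    """Cut the audio at fixed chunk boundaries AND at every index position.

    marks is a hypothetical list of (sample_offset, index) pairs. Cutting
    exactly at each mark keeps indexes accurate but can leave very small
    audio pieces.
    """
    chunk = CHUNK_SAMPLES * BYTES_PER_SAMPLE
    cuts = set(range(chunk, len(audio), chunk))
    cuts.update(
        off * BYTES_PER_SAMPLE
        for off, _ in marks
        if 0 < off * BYTES_PER_SAMPLE < len(audio)
    )
    pending = sorted(marks)
    events: List[Event] = []
    start, i = 0, 0
    for end in sorted(cuts) + [len(audio)]:
        # Report every index whose position has been reached before feeding more audio.
        while i < len(pending) and pending[i][0] * BYTES_PER_SAMPLE <= start:
            events.append(("index", pending[i][1]))
            i += 1
        if end > start:
            events.append(("audio", audio[start:end]))
        start = end
    # Marks at or beyond the end of the audio are reported last.
    for _, index in pending[i:]:
        events.append(("index", index))
    return events


if __name__ == "__main__":
    audio = bytes(16000)            # 8000 samples of silence
    marks = [(3010, 1), (7990, 2)]  # two index marks close to chunk edges
    events = accurate_chunks(audio, marks)
    print([len(p) if k == "audio" else f"index {p}" for k, p in events])
    # prints [6000, 20, 'index 1', 5980, 3980, 'index 2', 20]
```

With marks that land just past a chunk boundary, the player receives 20-byte (10-sample) pieces, which is the kind of very small feed described above.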
I was unable to fix those glitches, so I decided to lose a little
accuracy and send bigger chunks of data to the player.feed method.
This doesn't affect NVDA's functionality, but it makes compatibility
harder with some add-ons that require accurate indexes.
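A minimal Python sketch of that compromise, again only an illustration and not the IBMTTS driver's code. The function name, the ("audio", ...)/("index", ...) event tuples, the (sample_offset, index) mark format and the 16-bit mono sample size are assumptions. Audio is only cut at fixed chunk boundaries, and each index is reported after the chunk that contains it.

```python
from typing import List, Tuple, Union

BYTES_PER_SAMPLE = 2   # assumption: 16-bit mono PCM
CHUNK_SAMPLES = 3000   # fixed chunk size, e.g. 3000 samples per chunk

# Hypothetical event stream: ("audio", bytes) would go to player.feed,
# ("index", int) would be reported to NVDA.
Event = Tuple[str, Union[bytes, int]]


def coarse_chunks(audio: bytes, marks: List[Tuple[int, int]]) -> List[Event]:
    """Cut only at fixed chunk boundaries (the last chunk may be shorter).

    Each index from the hypothetical (sample_offset, index) marks is reported
    after the chunk that contains it, so it can arrive up to one chunk late,
    but no tiny audio pieces are produced.
    """
    chunk = CHUNK_SAMPLES * BYTES_PER_SAMPLE
    pending = sorted(marks)
    events: List[Event] = []
    i = 0
    for start in range(0, len(audio), chunk):
        end = min(start + chunk, len(audio))
        events.append(("audio", audio[start:end]))
        # Report, possibly late, every index that falls inside this chunk.
        while i < len(pending) and pending[i][0] * BYTES_PER_SAMPLE <= end:
            events.append(("index", pending[i][1]))
            i += 1
    # Marks beyond the end of the audio are reported last.
    for _, index in pending[i:]:
        events.append(("index", index))
    return events


if __name__ == "__main__":
    events = coarse_chunks(bytes(16000), [(3010, 1), (7990, 2)])
    print([len(p) if k == "audio" else f"index {p}" for k, p in events])
    # prints [6000, 6000, 'index 1', 4000, 'index 2']
```

Here index 1 (real position: sample 3010) is only reported once sample 6000 has been fed, which is the lost accuracy that add-ons relying on exact indexes would notice.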