Speech recognition

Does anyone have some Delphi code, ideas, etc. to handle speech recognition? I went down this path about 6 months ago with no luck, and I am having another go. This time I used ChatGPT to write some code. I obtained 5 or 6 lots of code, which either hung, did nothing, or wouldn't even compile. I have tried downloading different versions of SpeechLib_TLB and also reimporting it with various sapi.dll files.

Any help appreciated
Ken

I snagged a CrossTalk licence at the recent ADUG Symposium … so thanks for giving me an idea for testing it out. 🙂

I asked Google about SR in Windows apps … this was its AI-generated reply (for the .NET case):

To use Windows Speech Recognition programmatically, you can utilize the SpeechRecognitionEngine class in .NET. This class allows you to create an in-process speech recognizer, configure its input (audio device, file, stream), define grammars for recognition, and perform recognition operations.

Here’s a breakdown of the process:

  1. Create a SpeechRecognitionEngine:
  • Instantiate a SpeechRecognitionEngine object.
  2. Configure Input:
  • Use methods like SetInputToDefaultAudioDevice, SetInputToWaveFile, or SetInputToAudioStream to specify the source of the speech input.
  3. Define Grammars:
  • Load grammars using the LoadGrammar method. Grammars define the rules for what the speech recognizer should recognize; at least one grammar (a DictationGrammar for free dictation will do) must be loaded before recognition starts.
  4. Perform Recognition:
  • Use the Recognize or RecognizeAsync method to start the recognition process.
  5. Handle Results:
  • The recognizer returns results in a RecognitionResult object. You can access the recognized text, confidence score, and other information from this object.

Example (Conceptual):


using System;
using System.Speech.Recognition;

public class Example
{
    public static void Main(string[] args)
    {
        // Create a SpeechRecognitionEngine
        using (SpeechRecognitionEngine engine = new SpeechRecognitionEngine())
        {
            // Set input to the default audio device
            engine.SetInputToDefaultAudioDevice();

            // At least one grammar must be loaded; DictationGrammar gives free dictation.
            // For command recognition, build your own instead, e.g.:
            // engine.LoadGrammar(new Grammar(new GrammarBuilder("example command")));
            engine.LoadGrammar(new DictationGrammar());

            // Handle recognition results (attach before starting recognition)
            engine.SpeechRecognized += (sender, e) =>
            {
                Console.WriteLine("Recognized: " + e.Result.Text);
            };

            // Start asynchronous, continuous recognition
            engine.RecognizeAsync(RecognizeMode.Multiple);

            // Keep the application running until the user presses a key
            Console.ReadKey();
        }
    }
}

You’d think maybe the “C++” example on this page might be useful … but the “C++” there looks a bit dubious.
e.g. it seems to have var^ (like Pascal) instead of var* (?!)
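
For the Delphi/SAPI route Ken mentions, a bare-bones dictation sketch against the SpeechLib_TLB import would be roughly the following. Treat it as a sketch only: I haven't compiled it, and the exact OnRecognition signature (and whether Result arrives typed as ISpeechRecoResult or as an OleVariant) depends on which sapi.dll the type library was imported from, which may well be part of the trouble.

uses
  Vcl.Forms, Vcl.Dialogs,
  SpeechLib_TLB; // generated by importing the Microsoft Speech Object Library (sapi.dll)

type
  TForm1 = class(TForm)
  private
    FRecoContext: TSpSharedRecoContext;  // shared (system-wide) recogniser
    FGrammar: ISpeechRecoGrammar;
    procedure RecoContextRecognition(ASender: TObject; StreamNumber: Integer;
      StreamPosition: OleVariant; RecognitionType: SpeechRecognitionType;
      const Result: ISpeechRecoResult);
  public
    procedure StartDictation;
  end;

procedure TForm1.StartDictation;
begin
  FRecoContext := TSpSharedRecoContext.Create(Self);
  FRecoContext.OnRecognition := RecoContextRecognition;
  // Create a grammar on the context and switch on free dictation
  FGrammar := FRecoContext.CreateGrammar(0);
  FGrammar.DictationLoad('', SLOStatic);
  FGrammar.DictationSetState(SGDSActive);
end;

procedure TForm1.RecoContextRecognition(ASender: TObject; StreamNumber: Integer;
  StreamPosition: OleVariant; RecognitionType: SpeechRecognitionType;
  const Result: ISpeechRecoResult);
begin
  // Full recognised text for the phrase
  ShowMessage(Result.PhraseInfo.GetText(0, -1, True));
end;

TSpInProcRecoContext is the in-process alternative if the shared Windows recogniser gets in the way, though it needs its recogniser and audio input set up explicitly.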

Looks like TMS have (paid) components for SR

This is for a not-for-profit, so I would say that's too expensive.

Looks familiar; I think I had looked at it before, but I will check.

Thanks
Ken

It depends on what you want to do with it. In my application I just needed the vet to be able to dictate notes, so I used Dragon. I only had to make minor tweaks to the code.

Quite basic … walk up to the computer, say your name, and my program identifies the person from a known list.

Ken
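
If it only has to match a spoken name against a known list, a SAPI command-and-control grammar is usually a better fit than free dictation, since the recogniser then only fires for the phrases you list. A rough sketch, assuming the same SpeechLib_TLB setup as the dictation example above, with "names.xml", the rule name, and the names all placeholders:

// Illustrative only. names.xml is a placeholder SAPI 5 XML grammar along these lines:
//   <GRAMMAR LANGID="409">
//     <RULE NAME="Names" TOPLEVEL="ACTIVE">
//       <L>
//         <P>alice example</P>
//         <P>bob example</P>
//       </L>
//     </RULE>
//   </GRAMMAR>
procedure TForm1.LoadNameGrammar;
begin
  // FGrammar is the ISpeechRecoGrammar created from the reco context,
  // exactly as in the dictation sketch
  FGrammar.CmdLoadFromFile('names.xml', SLOStatic);
  FGrammar.CmdSetRuleState('Names', SGDSActive);
end;

In the OnRecognition handler, Result.PhraseInfo.GetText(0, -1, True) should then come back as one of the listed names, which can be compared directly against the known list.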

@Ken. Ha ha, I can see that working with my IDENTICAL twin brother and me. In the old days we used to swap the phone between us mid-conversation (even with family members) and nobody was any the wiser.

IMHO not a strong solution. But then it’s not my program.

Malcolm Cheyne


Yes, identical twins exist purely to make speech and face recognition products fail miserably.

I used to write access control software; the only biometric solution that works with twins is fingerprint recognition. With the false acceptance rate set to a sensible level, it even thwarts twins trying to sign in for each other.

Iris scanning can also work, but the devices are pretty expensive, and I always had some reservations about our ability to convince people to put their eye up against a device (people are very fussy about their eyes). My thought was that if someone using it later said they had a headache or eyesight issues, the machine would be the default thing to blame for the luddites who were trying to get out of being monitored.