Strangely, I am sat in front of a mic retrying the streaming mode, as I never checked out the streaming branch before, which is probably why the results were bad.
I haven’t got a Pi4 anymore, so I'm running on a Rock5b (RK3588).
Streaming has had an update and seems really good, but I'm not sure streaming mode is worthwhile for short command sentences when latency is so short and non-streaming accuracy is really, really good.
I will post a bench here again just as a check on the non-streaming perf, since with streaming mode you can only really check load: you are feeding a stream, which is a bit hard to measure.
Still, the old rules apply: you need to get a good signal from your mic in terms of volume, which is often pretty poor without putting on an AGC and setting the volume to max.
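If you want a quick way to check whether a recording is too quiet before blaming the model, something like the following rough sketch could help. It assumes a 16-bit PCM WAV (the format of samples/jfk.wav); the function names are made up for illustration and are not part of whisper.cpp.

```python
import array
import math
import sys
import wave

FULL_SCALE = 32768.0  # magnitude of the most negative 16-bit sample


def read_wav_int16(path):
    """Return the samples of a 16-bit PCM WAV file as a list of ints."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return list(samples)


def to_dbfs(value):
    """Convert a linear sample magnitude to dB relative to full scale."""
    if value <= 0:
        return float("-inf")
    return 20 * math.log10(value / FULL_SCALE)


def level_dbfs(samples):
    """Return (peak, rms) of the samples in dBFS; silence gives -inf."""
    if not samples:
        return float("-inf"), float("-inf")
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return to_dbfs(peak), to_dbfs(rms)


# Example use (hypothetical recording name):
#   peak_db, rms_db = level_dbfs(read_wav_int16("my_recording.wav"))
#   print(f"peak {peak_db:.1f} dBFS, rms {rms_db:.1f} dBFS")
```

A peak that sits far below 0 dBFS suggests the mic gain is too low and that AGC or a higher capture volume is worth trying. What counts as "too low" depends on your mic and room, so treat the numbers as a guide only.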
I can show you the load, though I have no Pi4 to try it on.
No BLAS
./main -m models/ggml-tiny.en.bin -f samples/jfk.wav
whisper_model_load: loading model from 'models/ggml-tiny.en.bin'
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 384
whisper_model_load: n_text_head = 6
whisper_model_load: n_text_layer = 4
whisper_model_load: n_mels = 80
whisper_model_load: f16 = 1
whisper_model_load: type = 1
whisper_model_load: mem_required = 390.00 MB
whisper_model_load: adding 1607 extra tokens
whisper_model_load: ggml ctx size = 73.58 MB
whisper_model_load: memory size = 11.41 MB
whisper_model_load: model size = 73.54 MB
system_info: n_threads = 4 / 8 | AVX2 = 0 | AVX512 = 0 | NEON = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 |
main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...
[00:00:00.000 --> 00:00:07.540] And so my fellow Americans ask not what your country can do for you
[00:00:07.540 --> 00:00:10.160] ask what you can do for your country.
[00:00:10.160 --> 00:00:30.000] You can do for your country
whisper_print_timings: load time = 305.88 ms
whisper_print_timings: mel time = 134.55 ms
whisper_print_timings: sample time = 11.85 ms
whisper_print_timings: encode time = 802.06 ms / 200.51 ms per layer
whisper_print_timings: decode time = 321.21 ms / 80.30 ms per layer
whisper_print_timings: total time = 1576.53 ms
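To put those timings in context, here is a small sketch that turns them into a real-time factor (processing time divided by audio length) and a per-stage share of the total. The numbers are copied from the log above; the variable and function names are just for illustration.

```python
AUDIO_SECONDS = 11.0  # from the "main: processing" line
TOTAL_MS = 1576.53    # from "total time"

# Per-stage timings in milliseconds, from whisper_print_timings.
timings_ms = {
    "load": 305.88,
    "mel": 134.55,
    "sample": 11.85,
    "encode": 802.06,
    "decode": 321.21,
}


def real_time_factor(processing_ms, audio_seconds):
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return (processing_ms / 1000.0) / audio_seconds


rtf = real_time_factor(TOTAL_MS, AUDIO_SECONDS)
print(f"real-time factor: {rtf:.3f} ({1 / rtf:.1f}x faster than real time)")

for stage, ms in timings_ms.items():
    print(f"{stage:>6}: {ms:8.2f} ms ({100 * ms / TOTAL_MS:4.1f}% of total)")
```

On this run the real-time factor works out to roughly 0.14, so the 11 second clip is transcribed about 7x faster than real time, with the encoder taking around half of the total.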
