Benchmark - base model
mycroft@OpenVoiceOS-e3830c:~ $ ./bench -m models/ggml-base.bin -t 4
whisper_model_load: loading model from 'models/ggml-base.bin'
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: f16 = 1
whisper_model_load: type = 2
whisper_model_load: mem_required = 506.00 MB
whisper_model_load: adding 1608 extra tokens
whisper_model_load: ggml ctx size = 140.60 MB
whisper_model_load: memory size = 22.83 MB
whisper_model_load: model size = 140.54 MB
system_info: n_threads = 4 / 4 | AVX2 = 0 | AVX512 = 0 | NEON = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 |
whisper_print_timings: load time = 2274.95 ms
whisper_print_timings: mel time = 0.00 ms
whisper_print_timings: sample time = 0.00 ms
whisper_print_timings: encode time = 56033.60 ms / 9338.93 ms per layer
whisper_print_timings: decode time = 0.00 ms / 0.00 ms per layer
whisper_print_timings: total time = 58308.86 ms
If you wish, you can submit these results here:
https://github.com/ggerganov/whisper.cpp/issues/89
Please include the following information:
- CPU model
- Operating system
- Compiler
mycroft@OpenVoiceOS-e3830c:~ $ ./main -m models/ggml-base.bin -f samples/jfk.wav -t 4
whisper_model_load: loading model from 'models/ggml-base.bin'
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: f16 = 1
whisper_model_load: type = 2
whisper_model_load: mem_required = 506.00 MB
whisper_model_load: adding 1608 extra tokens
whisper_model_load: ggml ctx size = 140.60 MB
whisper_model_load: memory size = 22.83 MB
whisper_model_load: model size = 140.54 MB
system_info: n_threads = 4 / 4 | AVX2 = 0 | AVX512 = 0 | NEON = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 |
main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...
[00:00:00.000 --> 00:00:07.600] And so my fellow Americans ask not what your country can do for you,
[00:00:07.600 --> 00:00:10.600] ask what you can do for your country.
whisper_print_timings: load time = 1338.00 ms
whisper_print_timings: mel time = 287.82 ms
whisper_print_timings: sample time = 41.32 ms
whisper_print_timings: encode time = 55883.51 ms / 9313.92 ms per layer
whisper_print_timings: decode time = 3145.44 ms / 524.24 ms per layer
whisper_print_timings: total time = 60706.79 ms
Tiny and base are the only two models I can test on my Raspberry Pi 4 with 2 GB of memory. The small model already runs out of memory.
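
To put the numbers above in perspective, here is a rough sketch that turns the `whisper_print_timings` lines from the `./main` run into a real-time factor (processing time divided by audio length) and the share of time spent in the encoder. The helper names `parse_timings` and `real_time_factor` are made up for this illustration and are not part of whisper.cpp; the timing values and the 11.0 sec audio length are copied from the log.

```python
import re

# Timing lines copied from the ./main run on samples/jfk.wav above.
LOG = """\
whisper_print_timings: load time = 1338.00 ms
whisper_print_timings: mel time = 287.82 ms
whisper_print_timings: sample time = 41.32 ms
whisper_print_timings: encode time = 55883.51 ms / 9313.92 ms per layer
whisper_print_timings: decode time = 3145.44 ms / 524.24 ms per layer
whisper_print_timings: total time = 60706.79 ms
"""

# Audio length reported by main: "(176000 samples, 11.0 sec)".
AUDIO_SECONDS = 11.0


def parse_timings(log):
    """Return {stage: milliseconds}, using the first value on each timing line."""
    pattern = re.compile(r"whisper_print_timings:\s+(\w+) time\s*=\s*([\d.]+) ms")
    return {m.group(1): float(m.group(2)) for m in pattern.finditer(log)}


def real_time_factor(timings, audio_seconds):
    """Processing time divided by audio duration (hypothetical helper)."""
    return timings["total"] / 1000.0 / audio_seconds


timings = parse_timings(LOG)
rtf = real_time_factor(timings, AUDIO_SECONDS)
encode_share = timings["encode"] / timings["total"]

print(f"real-time factor: {rtf:.2f}x")              # real-time factor: 5.52x
print(f"encode share of total: {encode_share:.1%}")  # encode share of total: 92.1%
```

So on this board the base model with 4 threads takes about 5.5 times as long as the audio itself, and almost all of that time goes into the encoder.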