Kamailio Raspberry Pi benchmarks


Kamailio benchmark results on the Raspberry Pi platform

Kamailio on the Raspberry Pi

I always wanted to experiment a bit more with the Raspberry Pi ARM platform. The pre-release tests currently running for the upcoming Kamailio 5.3.0 were a good opportunity to finally do it.

Hardware and setup details

I used a somewhat older Raspberry Pi 3 Model B. This model has 4 × Cortex-A53 cores at 1.2 GHz and 1 GB of RAM. As Linux distribution I chose the standard Debian-based Raspbian stable, which (still) provides a 32-bit environment.

The Raspberry Pi was connected by Ethernet cable through a standard switch to the test machine. The tested Kamailio version was 5.3.0-pre1 from the git master branch, git version ccc0eb6d. The compiler was the standard gcc v8.3 on Raspbian. The operating system ran without a desktop environment, but no other optimizations (such as disabling logging services) were done.

The test runs were repeated several times to see whether the results could be reproduced. That said, the performance of a small embedded system like the Raspberry Pi depends a lot on the ambient temperature. I monitored the CPU temperature during the tests, and several times it rose above 80 °C. This means that with e.g. active cooling for the Raspberry Pi CPU you could achieve better results, especially for longer tests. To detect the thermal effects I ran shorter (120 s) and longer (300 s) tests. The shorter tests were much less affected by the thermal effect than the longer tests.
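The post does not say how the CPU temperature was read. One common way on Linux is the thermal sysfs interface; the following minimal sketch assumes that the value is exposed in millidegrees Celsius under thermal zone 0, which is typical for a Raspberry Pi but may differ on other boards. The function names are only illustrative.

```python
from pathlib import Path

# Assumption: the SoC temperature is exposed by the Linux thermal sysfs
# interface in millidegrees Celsius. Zone 0 is typical on a Raspberry Pi,
# other boards may use a different zone index.
THERMAL_FILE = Path("/sys/class/thermal/thermal_zone0/temp")


def millidegrees_to_celsius(raw: str) -> float:
    """Convert a raw sysfs reading such as '80123\\n' to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_cpu_temperature(path: Path = THERMAL_FILE) -> float:
    """Read the current CPU temperature in degrees Celsius."""
    return millidegrees_to_celsius(path.read_text())


if __name__ == "__main__":
    if THERMAL_FILE.exists():
        print(f"CPU temperature: {read_cpu_temperature():.1f} C")
    else:
        print("thermal sysfs file not found on this system")
```

Calling read_cpu_temperature() in a loop while sipp is running would make it possible to relate a drop in throughput to the moment the CPU approaches 80 °C.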

For the different call tests the standard sipp tool was used. I used an existing sipp configuration by Stefan Mititelu from GitHub (link). The test rate was increased until retransmissions were observed, and then slightly decreased again to reach a stable setup with only a few retransmissions. In the Kamailio configuration only the number of children was reduced to 4, the number of cores.
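The rate search described above was done by hand. As a rough sketch, it could be automated like this: increase the rate step by step and keep the last rate that stayed within a retransmission budget. The callback run_test is hypothetical. It stands for "run one sipp test at this rate and return the number of retransmissions" and is not part of sipp or Kamailio.

```python
from typing import Callable


def find_stable_rate(
    run_test: Callable[[int], int],
    start_rate: int,
    step: int,
    max_retransmissions: int = 0,
    max_rate: int = 100_000,
) -> int:
    """Return the highest tested rate whose retransmission count stays
    within max_retransmissions, or 0 if even start_rate fails.

    run_test(rate) is a hypothetical callback that runs one load test at
    `rate` calls per second and returns the number of retransmissions.
    """
    stable = 0
    rate = start_rate
    while rate <= max_rate:
        if run_test(rate) > max_retransmissions:
            break  # overload reached, keep the last stable rate
        stable = rate
        rate += step
    return stable


if __name__ == "__main__":
    # Fake system under test: retransmissions start above 3000 calls/s.
    def fake_test(rate: int) -> int:
        return 0 if rate <= 3000 else rate - 3000

    print(find_stable_rate(fake_test, start_rate=1000, step=500))  # 3000
```

A small non-zero max_retransmissions would correspond to the "only a few retransmissions" that were accepted in the tests.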

ARM support improvements in Kamailio

During the first compilation of Kamailio on the test platform I noticed many swp{b} deprecation warnings from the compiler. They were caused by deprecated assembler instructions in the Kamailio core, used e.g. for low-level locking primitives. These primitives also did not provide support for multi-core compilation and lacked ARM v7 support. Furthermore I noticed that the existing Kamailio build files were not prepared for newer (> v6) ARM architectures.

After several fixes and extensions all these topics should now be much improved. The makefiles were extended to properly detect the ARM v6 to ARM v8 architectures. The current build system will compile for the ARM v5, ARM v6 and ARM v7 architectures. For now it will fall back to ARM v7 for ARM v8.

By default the current Raspbian gcc will compile for ARM v6. If you want to override this, you can add -march=native in the following section of the Makefile.defs file. Then you will get ARM v7 (have a look at the output of make cfg).

# to build with native architecture, e.g. for rasberry pi: add -march=native
# armv8 not supported yet, fallback to armv7
predef_macros:=$(shell $(CC) -dM -E -x c $(CC_EXTRA_OPTS) $(extra_defs) \
                                        $(CFLAGS) /dev/null)

You can now also get the architecture for which Kamailio was compiled with kamailio -I.
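The Makefile fragment above asks the compiler for its predefined macros (-dM -E on an empty input). To see what a given set of flags selects, the same dump can be inspected by hand. The sketch below looks for __ARM_ARCH, a macro that gcc defines on ARM targets. Whether Kamailio's build files check this macro or other ones is not stated in this post, so treat the choice of macro and the helper names as assumptions.

```python
import re
import subprocess
from typing import Optional, Sequence


def compiler_macros(cc: str = "cc", extra_flags: Sequence[str] = ()) -> str:
    """Return the predefined macros of the compiler, like the Makefile does."""
    result = subprocess.run(
        [cc, "-dM", "-E", "-x", "c", *extra_flags, "/dev/null"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def arm_arch_from_macros(macro_dump: str) -> Optional[int]:
    """Extract the value of __ARM_ARCH from a macro dump, if present."""
    match = re.search(
        r"^#define[ \t]+__ARM_ARCH[ \t]+(\d+)[ \t]*$", macro_dump, re.MULTILINE
    )
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Hand-written sample lines in the format of a macro dump.
    sample = "#define __ARM_ARCH_6__ 1\n#define __ARM_ARCH 6\n"
    print(arm_arch_from_macros(sample))  # 6
```

On the Raspberry Pi itself, comparing arm_arch_from_macros(compiler_macros()) with arm_arch_from_macros(compiler_macros(extra_flags=["-march=native"])) should show the effect of the flag. On a non-ARM machine the function returns None.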

Optimization tests

I tested several different build optimizations to see whether they had a noticeable effect on my benchmarks:

  • ARM v6, old locking, standard optimization
  • ARM v6, new locking, standard optimization
  • ARM v7, new locking, standard optimization
  • ARM v7, new locking, -O2 optimization

Higher optimization levels or more iterations were not tested, as compilation with gcc on the Raspberry Pi is much slower at higher optimization levels.

I observed slightly better performance with the new locking code for ARM v6, and also somewhat better performance in some test cases for the ARM v7 code. But as the differences were only minor, I chose the default architecture for my tests: ARM v6, new locking, standard optimization.

Test case 1 - registration server

In this test case the standard Kamailio configuration (link) was used to provide a registration server. Kamailio was configured to use only in-memory storage for the registrations, and no authentication was done. This test was used to get an indication of a best-case scenario for Kamailio network throughput.

The results were pretty impressive. I was able to get about 5000 REGISTER requests per second for a short test, and 3000 REGISTER requests per second for the longer test. The output from the longer sipp test run is shown below.

  Call-rate(length)   Port   Total-time  Total-calls  Remote-host
  6.0(3000 ms)/0.002s   2222     300.78 s       902352  192.168.188.26:5060(UDP)

  978 new calls during 0.326 s period    0 ms scheduler resolution
  9 calls (limit 27000)                  Peak was 45 calls, after 266 s
  0 Running, 99004 Paused, 1141 Woken up
  0 dead call msg (discarded)            0 out-of-call msg (discarded)        
  3 open sockets                        

                                 Messages  Retrans   Timeout   Unexpected-Msg
              [ NOP ]              
    REGISTER ---------->         902347    0         0                  
         200 <----------         902343    0         0         0        

              [ NOP ]              
------------------------------ Test Terminated --------------------------------

----------------------------- Statistics Screen ------- [1-9]: Change Screen --
  Start Time             | 2019-09-25   17:19:49:838    1569424789.838010            
  Last Reset Time        | 2019-09-25   17:24:50:298    1569425090.298358            
  Current Time           | 2019-09-25   17:24:50:624    1569425090.624483            
-------------------------+---------------------------+--------------------------
  Counter Name           | Periodic value            | Cumulative value
-------------------------+---------------------------+--------------------------
  Elapsed Time           | 00:00:00:326              | 00:05:00:786             
  Call Rate              | 3000.000 cps              | 2999.980 cps             
-------------------------+---------------------------+--------------------------
  Incoming call created  |        0                  |        0                 
  OutGoing call created  |      978                  |   902352                 
  Total Call created     |                           |   902352                 
  Current Call           |        9                  |                          
-------------------------+---------------------------+--------------------------
  Successful call        |      977                  |   902343                 
  Failed call            |        0                  |        0                 
-------------------------+---------------------------+--------------------------
  Call Length            | 00:00:00:001              | 00:00:00:001             
------------------------------ Test Terminated --------------------------------
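As a quick plausibility check, the cumulative call rate in the statistics screen above can be recomputed from the total number of calls and the elapsed time. The sketch assumes that the elapsed time is printed as hours:minutes:seconds:milliseconds, which matches the total time of 300.78 s shown at the top of the output.

```python
def parse_elapsed(text: str) -> float:
    """Convert an 'hh:mm:ss:mmm' elapsed time to seconds (assumed format)."""
    hours, minutes, seconds, millis = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds + millis / 1000.0


def average_rate(total_calls: int, elapsed_seconds: float) -> float:
    """Average number of calls per second over the whole test run."""
    return total_calls / elapsed_seconds


if __name__ == "__main__":
    elapsed = parse_elapsed("00:05:00:786")  # cumulative elapsed time
    rate = average_rate(902352, elapsed)     # outgoing calls created
    print(f"{rate:.2f} calls per second")    # 2999.98 calls per second
```

The result agrees with the 2999.980 cps reported by sipp.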

At the peak the Raspberry Pi was handling approx. 18 MBit/s incoming and 16 MBit/s outgoing SIP traffic, which is pretty amazing.
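To get a feeling for these traffic numbers, they can be converted into an average size per message. This is only a back-of-the-envelope estimate under two assumptions that the post does not confirm: that the traffic peak coincides with the short-test rate of about 5000 requests per second, and that 1 MBit/s means 10^6 bits per second.

```python
def average_message_bytes(mbit_per_second: float, messages_per_second: float) -> float:
    """Average on-wire size per message in bytes.

    Assumes 1 MBit/s = 1_000_000 bits/s and one packet per message.
    """
    bits_per_message = mbit_per_second * 1_000_000 / messages_per_second
    return bits_per_message / 8


if __name__ == "__main__":
    # Assumed peak rate: about 5000 REGISTER requests per second.
    print(average_message_bytes(18, 5000))  # incoming: 450.0 bytes
    print(average_message_bytes(16, 5000))  # outgoing: 400.0 bytes
```

A few hundred bytes per REGISTER request and per 200 reply is a plausible size for SIP messages, so the traffic figures are consistent with the measured request rates under these assumptions.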

Registration server traffic

Test case 2 - proxy server

In this test case the standard Kamailio configuration (link) was used to provide a proxy server. The UAS and UAC registered once, and then calls were executed.

This test shows a lower calls-per-second result, as the logic executed in the Kamailio configuration is more complicated. Furthermore each call needs to be tracked by the tm module, and the transactions are stored in memory. But 300 calls per second is still an impressive number, and something that many commercial Kamailio setups don't often see during a day.

I tested two different scenarios, short (3 s call duration) and longer (30 s call duration). The results for the short call scenario were about 7% lower than for the longer call scenario. The output of the longer sipp test run is shown below.

  Call-rate(length)   Port   Total-time  Total-calls  Remote-host
  300.0(30000 ms)/1.000s   1111     301.05 s        90316  192.168.188.26:5060(UDP)

  137 new calls during 0.456 s period    0 ms scheduler resolution
  9003 calls (limit 27000)               Peak was 9012 calls, after 131 s
  1 Running, 18904 Paused, 274 Woken up
  0 dead call msg (discarded)            0 out-of-call msg (discarded)        
  3 open sockets         

                                 Messages  Retrans   Timeout   Unexpected-Msg
              [ NOP ]              

      INVITE ---------->         90316     0         0                  
         100 <----------         90315     0         0         0        
         180 <----------         64135     0         0         2        
         200 <----------         90315     3         0         0        

         ACK ---------->         90315     3                            

       Pause [    30.0s]         90315                         409      

         BYE ---------->         81315     0         0                  
         200 <----------         81313     0         0         0        

         ACK ---------->         0         0                            

----------------------------- Statistics Screen ------- [1-9]: Change Screen --
  Start Time             | 2019-09-25   18:20:02:666    1569428402.666952            
  Last Reset Time        | 2019-09-25   18:25:03:271    1569428703.271712            
  Current Time           | 2019-09-25   18:25:03:727    1569428703.727778            
-------------------------+---------------------------+--------------------------
  Counter Name           | Periodic value            | Cumulative value
-------------------------+---------------------------+--------------------------
  Elapsed Time           | 00:00:00:456              | 00:05:01:060             
  Call Rate              |  300.439 cps              |  299.993 cps             
-------------------------+---------------------------+--------------------------
  Incoming call created  |        0                  |        0                 
  OutGoing call created  |      137                  |    90316                 
  Total Call created     |                           |    90316                 
  Current Call           |     9003                  |                          
-------------------------+---------------------------+--------------------------
  Successful call        |      136                  |    81313                 
  Failed call            |        0                  |        0                 
-------------------------+---------------------------+--------------------------
  Call Length            | 00:00:30:007              | 00:00:30:008             
------------------------------ Test Terminated --------------------------------
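Two numbers in the sipp output above can be cross-checked with simple arithmetic. In a steady state the number of concurrent calls is roughly the call rate multiplied by the call duration, and the calls that were still running when the test stopped explain the gap between created and successful calls. The function names below are only illustrative.

```python
def expected_concurrent_calls(rate_cps: int, call_seconds: int) -> int:
    """Steady-state number of calls in progress: rate times duration."""
    return rate_cps * call_seconds


def calls_not_finished(created: int, successful: int) -> int:
    """Calls that were created but did not complete before the test ended."""
    return created - successful


if __name__ == "__main__":
    # 300 calls per second, each held for 30 seconds.
    print(expected_concurrent_calls(300, 30))  # 9000
    # Values taken from the statistics screen.
    print(calls_not_finished(90316, 81313))    # 9003
```

The expected 9000 concurrent calls are close to the 9003 current calls reported by sipp, and the 9003 calls that did not finish are exactly those still in progress at the end of the run, which fits the reported 0 failed calls.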

At the peak the Raspberry Pi was handling approx. 9 MBit/s incoming and 10 MBit/s outgoing SIP traffic, which is still impressive.

Proxy server traffic

Conclusion

Overall I was impressed with the performance of Kamailio on this small and cheap hardware platform. Kamailio ran stably and processed millions of calls without any issues. I noticed only one crash (probably related to CPU overheating) and about 10 error log messages (probably caused by the overload situation).

If you use Kamailio on another embedded ARM (>= v6) system, I am interested in your feedback. Once I get a new Raspberry Pi 4 I will probably run these benchmarks again.

If you are interested in Kamailio performance optimization for your platform, please contact me here.
