Testing FreeSWITCH for memory leaks using Valgrind and SIPp

In this tutorial we will assume you have already setup SIPp on another server to stress test FreeSWITCH. We covered this in our previous article

We will need to install valgrind on the same machine running FreeSWITCH. I like using Debian for FreeSWITCH, which conveniently already has valgrind in the repository.

apt install valgrind

Now that valgrind is installed, we will need to make sure FreeSWITCH is not already running.

systemctl stop freeswitch
ps -ef | grep freeswitch

Once we stop FreeSWITCH, we want to start up FreeSWITCH using valgrind. Below is the command to do so. This will log all memory issues found by Valgrind to vg.log in the directory you run the command from.

/usr/bin/valgrind.bin --tool=memcheck --error-limit=no --log-file=vg.log --leak-check=full --leak-resolution=high --show-reachable=yes /usr/bin/freeswitch -vg -ncwait -nonat

Running the FreeSWITCH from the stable repository will not result in many errors. Below is my the end of my vg.log while running FreeSWITCH 1.6.12-20-b91a0a6~64bit

==20986== LEAK SUMMARY:
==20986==    definitely lost: 0 bytes in 0 blocks
==20986==    indirectly lost: 0 bytes in 0 blocks
==20986==      possibly lost: 16,384 bytes in 2 blocks
==20986==    still reachable: 192 bytes in 1 blocks
==20986==         suppressed: 0 bytes in 0 blocks
==20986==
==20986== For counts of detected and suppressed errors, rerun with: -v
==20986== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

There are no new errors after the leak summary in this log.

When the FreeSWITCH is having memory management problems, valgrind will continue to log errors after the LEAK SUMMARY. Below is an example of a memory leak when accessing the libcrypto.so library for WebRTC.

143768 ==32354== Use of uninitialised value of size 8
143769 ==32354==    at 0x67C8B80: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
143770 ==32354==    by 0xF684D2C1E662C6C4: ???
143771 ==32354==    by 0xE16640B5E123D1DD: ???
143772 ==32354==    by 0xD68DF16DAF4386F1: ???
143773 ==32354==    by 0xB2B603F64080C703: ???
143774 ==32354==    by 0x398A1FCC7DC2FF3B: ???
143775 ==32354==    by 0xBE60B2707D98773F: ???
143776 ==32354==    by 0x375392F355F20B45: ???
143777 ==32354==    by 0x124A183B188AA4EA: ???
143778 ==32354==    by 0x5BFE669681E01F8: ???
143779 ==32354==    by 0x6A4C9E9A4AA5A4C0: ???
143780 ==32354==    by 0x29B07C509D28C49F: ???
143781 ==32354==

In this case, valgrind was logging this message to the log many times per second. You can view the memory mismanagement manifesting itself in a memory leak by viewing the allocated memory with ps. We will want to monitor the change in RSS, or Resident Set Size, which represents the ammount of bytes are allocated for a process in RAM.

ps -aux | grep '[v]algrind.bin'

On Debian 8, ps -aux will list the information in the following order:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND

The output of your ps -aux command should look something like this

root     20987 36.7 13.1 631032 271356 ?       S<sl 13:33  30:47 /usr/bin/valgrind.bin --tool=memcheck --error-limit=no --log-file=vg.log --leak-check=full --leak-resolution=high --show-reachable=yes /usr/bin/freeswitch -vg -ncwait -nonat

Knowing the current total allocated memory for FreeSWITCH is cool, but not very useful for seeing the memory leak in action. If FreeSWITCH has a memory leak, it will fail to release memory when it is done with it, causing the allocated space to continue to grow. A useful way to observe this is by monitoring the RSS of the FreeSWITCH process over time, then running SIPp to create an artificial load on FreeSWTICH I created a simple bash script to log FreeSWITCH’s RSS, the current time, and the number of sip channels to a file.

#!/bin/sh


while true; do

INT=60
FILE=/root/mem.log
DATE=$(date +"%F-%R")
MEM=$(ps -aux | grep '[v]algrind.bin' | awk -F ' ' '{print $6}')
CHANNELS=$(/usr/local/freeswitch/bin/fs_cli -x "show channels" | grep total | awk -F ' ' '{print $1}')

echo ${DATE},${MEM},${CHANNELS} >> ${FILE}
sleep ${INT}s
done

This script will log the time, RSS, and SIP channels to /root/mem.log. You may need to adjust the CHANNELS variable to point to wherever the fs_cli binary is stored.

Save the script to a file, such as memtest.sh, then run it with

./memtest.sh &

You can confirm it is running in the background with jobs.

Now, with FreeSWITCH running under valgrind, we will want to start up the SIPp stress test. On the SIPp server, run the following.

sipp -sf uac.xml -s 9999999  <freeswitch-ip>:5080 -trace_msg -l 15 -d 60000

This test is best done over a long period of time. I would let SIPp generate traffic on the server for a couple hours while our monitor script is running. After the system has ran under load for some time, the mem.log should look soemthing like this.

2016-10-24-15:29,271212,15
2016-10-24-15:29,270800,15
2016-10-24-15:30,270672,15
2016-10-24-15:31,270828,15
2016-10-24-15:32,270708,15
2016-10-24-15:33,270604,15
2016-10-24-15:34,270812,15
2016-10-24-15:35,270628,15
2016-10-24-15:36,270816,15

If FreeSWITCH has been running for multiple hours and the RSS has seemed to plateau, like above, FreeSWITCH is likely not showing any critical memory mismanagements. Here is an example of the RSS continuing to grow due to a memory leak.

2016-08-11-04:09,379192,60
2016-08-11-04:10,381584,60
2016-08-11-04:11,383104,60
2016-08-11-04:12,385096,60
2016-08-11-04:13,385356,60
2016-08-11-04:14,387580,60
2016-08-11-04:16,387432,60
2016-08-11-04:17,389316,60
2016-08-11-04:18,391144,60
2016-08-11-04:19,393076,60
2016-08-11-04:20,394484,61
2016-08-11-04:21,395776,60
2016-08-11-04:22,397072,60
2016-08-11-04:23,398140,59
2016-08-11-04:24,400112,60
2016-08-11-04:25,401868,60
2016-08-11-04:26,403348,60

FreeSWITCH had been running for many hours at this point, and the RSS had never stopped growing. This is the same system which was logging the libcrypto error in the valgrind log.

When you notice a memory issue with your version of FreeSWTICH, check the FreeSWTICH JIRA for known bugs. There may already be a patch out for the issue you are experiencing.