Dwarfguard Performance
Dwarfguard is designed to operate as a high-performance single-node application. For 0.x versions, the device namespace of a single deployment is limited to roughly 65000 devices; more devices than that cannot be added. With the first public release, 0.6.0 Early 1, we started performance testing focused on both stability and benchmarking purposes. For every minor and major release, we will publish a performance testing protocol.
The first table captures how many devices each released version was tested for, together with a link to the protocol with notes.
Release | Tested for (devices) | Protocol |
---|---|---|
0.6.0 Early 1 | 10000 | Link |
0.7.0 Early 2 | 30000 | Link |
0.8.0 Early 3 | 40000 | Link |
The application itself consists of several elements, of which the performance-critical one is dwarfgd, the service daemon. For HTTPS-enabled deployments, another performance-determining element is the speed of encryption. Encryption can be handled either by the Apache Web Server directly in the deployment environment (communicating further with dwarfgd via a localhost socket) or on a different box / VM.
Running an encryption-terminating reverse proxy is especially effective if you have multiple Dwarfguard deployments, as the significant CPU load caused by encryption can be offloaded to either a better-scaled VM or a dedicated box. This enables you to share resources that would otherwise be left idling. The approach is particularly effective for handling peak loads because the peaks of different deployments usually happen at different moments. A system administrator does, however, need to set up the proxy and access permissions carefully so as not to compromise security.
Back to dwarfgd: the daemon consists of the main controlling thread, request-handling (RH) threads that process data coming from devices, and additional threads that take care of everything else. The following table captures the possible thread settings for Dwarfguard releases, together with the socket backlog size, which can be another performance-influencing piece of the puzzle.
Release | Type | RH threads (min) | RH threads (max) | Other threads (min) | Other threads (max) | Socket backlog | Notes |
---|---|---|---|---|---|---|---|
0.8.0 Early 3 | Static | 2 | 8 | 6 | 8 | 64 | Increasing RH threads requires a service restart; 2 are enough for 40000 devices |
0.7.0 Early 2 | Static | 2 | 2 | 6 | 8 | 64 | 2 RH threads confirmed enough for 30000 devices |
0.6.0 Early 1 | Static | 2 | 2 | 6 | 8 | 64 | 2 RH threads determined enough for up to 10000 devices |
0.5.x 'beta4' | Dynamic | 1 | 10 | 5 | 7 | 256 | Dynamic, to measure performance for the first release |
0.4.x 'beta3' | Dynamic | 1 | 10 | 4 | 6 | - | |
0.3.x 'beta2' | Static | 4 | 4 | 0 | 0 | - | |
While CPU power can limit how many requests a request-handling thread can process, it is not a hard limit on the total number of devices in the system. Requests that cannot be processed in time simply time out, and the affected devices return after a while to try again. The only hard limit is the amount of available memory: if the RAM footprint of dwarfgd outgrew the available memory, Dwarfguard would cease working (as would any other application). Memory must therefore be considered when scaling a deployment for a given number of devices.
The following table captures the dwarfgd startup RAM and the average and maximum memory consumed per attached device.
Release | Startup RAM (dwarfgd) | Startup memory (VM) | Avg RAM per device | Max RAM per device | Notes |
---|---|---|---|---|---|
0.8.0 Early 3 | < 50 MiB | 215 MiB | 13 KiB | | RAM usage is 60-70% of 0.7.0. See comparison sheet in perf. protocol |
0.7.0 Early 2 | < 40 MiB | | 20 KiB | | See perf. protocol for details on RSS / memory |
0.6.0 Early 1 | < 40 MiB | 150 MiB | 12 KiB | 60 KiB | |
0.5.0 beta4 | < 40 MiB | 220 MiB | 40 KiB | 150 KiB | |
0.4.0 beta3 | 100 MiB | < 400 MiB | 150 KiB | 300 KiB | |
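For a rough back-of-the-envelope check, the per-device figures above can be turned into a simple estimate. The sketch below is only an illustration, not a sizing formula: it uses the 0.8.0 Early 3 startup and average figures from the table and, because the 0.8.0 maximum is not listed, borrows the 60 KiB worst-case value measured for 0.6.0. As explained below, the real requirements depend on more than the device count.

```python
# Back-of-the-envelope dwarfgd RAM estimate (illustrative only; figures taken
# from the memory table above, worst case borrowed from the 0.6.0 row).
STARTUP_RAM_KIB = 50 * 1024    # dwarfgd startup RAM, < 50 MiB for 0.8.0
AVG_PER_DEVICE_KIB = 13        # average RAM per attached device (0.8.0)
MAX_PER_DEVICE_KIB = 60        # assumed worst case (0.6.0 figure)

def estimate_dwarfgd_ram_mib(devices: int, per_device_kib: int = AVG_PER_DEVICE_KIB) -> float:
    """Estimate dwarfgd RAM usage in MiB for a given number of attached devices."""
    return (STARTUP_RAM_KIB + devices * per_device_kib) / 1024

for n in (1000, 10000, 40000):
    print(f"{n:>6} devices: ~{estimate_dwarfgd_ram_mib(n):.0f} MiB avg, "
          f"~{estimate_dwarfgd_ram_mib(n, MAX_PER_DEVICE_KIB):.0f} MiB worst case")
```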
Unfortunately, the numbers above are not enough to compute the memory requirements of your deployment precisely, as the requirements scale with multiple factors. What we can offer as a guide are:
- recommended maximum numbers of devices per container size, listed in the "recommended maximum" row of the table below
- sample data from our stability and benchmark testing, showing e.g. total system memory utilization after a test (emulating a particular number of devices) was concluded
Based on that, you can find a suitable container (RAM) size by selecting the testing container with the closest higher number of devices than your Dwarfguard deployment is intended to handle (a small selection sketch follows the table). The table describes results for Dwarfguard 0.8.0.
Container | CPUs / threads | CPU arch. | RAM (MiB) | Devices | RAM used | Interval | % load per CPU (load / # of CPUs) |
---|---|---|---|---|---|---|---|
C1 | 1 / 1 | Xeon E-2250 | 512 | 1000 | 48 % | 200 s | 7 % |
C2 | 2 / 2 | Xeon E-2250 | 1024 | 3000 | 33 % | 200 s | 8 % |
C3 | 4 / 4 | Xeon E-2250 | 2048 | 10000 | 11 % | 200 s | 13 % |
C4 | 8 / 8 | Xeon E-2250 | 4096 | 30000 | 5 % | 200 s | 15 % |
H1 | 8 / 16 | Core i7 | 32768 | 30000 | 2 % | 200 s | 10 % |
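If you want to automate the "closest higher number of devices" selection described above, a minimal sketch could look like the following. The container names, RAM sizes, and device counts simply mirror the 0.8.0 table; treat them as tested data points, not guarantees.

```python
# Pick the smallest tested container whose device count covers the planned
# deployment (data mirrors the Dwarfguard 0.8.0 sizing table above).
TESTED_CONTAINERS = [
    # (name, CPUs, RAM in MiB, devices emulated in testing)
    ("C1", 1, 512, 1000),
    ("C2", 2, 1024, 3000),
    ("C3", 4, 2048, 10000),
    ("C4", 8, 4096, 30000),
]

def pick_container(planned_devices: int):
    """Return the first tested container covering at least `planned_devices`."""
    for name, cpus, ram_mib, tested_devices in TESTED_CONTAINERS:
        if tested_devices >= planned_devices:
            return name, cpus, ram_mib
    raise ValueError("planned device count exceeds the tested container sizes")

print(pick_container(7500))   # -> ('C3', 4, 2048)
```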
Just a remark: if you are using a VM or container with limited memory, pay close attention to systemd. The systemd journal service can eat gigabytes of memory if not configured properly. Especially in a memory-constrained VM or container, this may lead to the Linux kernel killing seemingly random processes, such as the database, Apache, dwarfgd, or even systemd itself, resulting in possible service interruptions and data loss. (Please note this applies to default Debian GNU/Linux installations, at least for versions 10 and 11.)
The following table gives a rough recommended maximum number of devices per memory size assigned to a VM. The calculation assumes that the OS installation is dedicated to operating Dwarfguard and that the VM provides no other service. You can find more performance test results in the performance testing document for the particular Dwarfguard release. Please note that the table also factors in other aspects, such as the stability test parameters used when running against that particular memory limit. While Dwarfguard was able to handle a much larger number of devices during our tests than the conservative recommended maximum, please stick to the 'better safe than sorry' strategy and scale according to the first table row (recommended maximum).
Also, while the RAM criterion may seem independent of the other criteria (traffic, CPU), it is not. The same machine configuration may be fine for a deployment with a higher number of devices at the default data-push interval, yet fail to handle a lower number of devices with a considerably shortened interval, where failing can mean service interruptions and VM restarts. That is because the memory requirements scale not only with the number of devices but with the number of requests per second as well.
The following recommendations apply to the default data-push interval of 260 seconds; the testing actually uses a 200-second interval to stay on the safe side of that.
Release 0.8.0 Early 3 | C1/512 MiB | C2/1024 MiB | C3/2048 MiB | C4/4096 MiB | Notes |
---|---|---|---|---|---|
recommended maximum | 1000 | 3000 | 10 k | 40 k | Release 0.8 is recommended up to 40 k devices |
max RAM-wise (stability test) | 2081 | 9143 | 90 k | 566 k | Computed maximum, not recommended |
max CPU-wise (benchmark) | 20085 | 37992 | 45443 | 51829 | Buffer included (can take more in default setting) |
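The interplay between device count and data-push interval described above can be made concrete with a quick calculation. The sketch below is illustrative only; the 260-second default comes from the text, while the shortened 60-second interval is an arbitrary example.

```python
# Request rate as a function of device count and data-push interval.
def requests_per_second(devices: int, interval_s: float) -> float:
    return devices / interval_s

# 30000 devices at the default 260 s interval...
print(requests_per_second(30000, 260))   # ~115 requests/s
# ...produce roughly the same request rate as ~7000 devices at a 60 s interval.
print(requests_per_second(7000, 60))     # ~117 requests/s
```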
The last consideration is network traffic. To give an idea of the amount of data and the number of requests dwarfgd needs to handle, the following table shows calculations for selected numbers of devices, based on real measurements. Please note that in reality, requests from devices are not evenly spread out in time. On the contrary, peaks do occur, so the actual load during a particular fraction of a minute or second can be much higher or lower than illustrated.
Devices | Contacts/minute | 1 contact every | Relative CPU units (devices / total) | Regular daily traffic (in) | Traffic/sec (in) |
---|---|---|---|---|---|
1 | 0.23 | 260 seconds | 1 / 201 | 1.3 MiB | 15 B/s |
10 | 2.3 | 26 seconds | 10 / 211 | 13 MiB | 0.2 KiB/s |
100 | 23 | 2.6 secs | 100 / 310 | 133 MiB | 1.5 KiB/s |
300 | 69 | ~1 sec | 300 / 530 | 400 MiB | 5 KiB/s |
1000 | 230 | 0.26 secs | 1000 / 1300 | 1.3 GiB | 15 KiB/s |
3000 | 692 | 86 millisec | 3000 / 3500 | 4 GiB | 46 KiB/s |
10000 | 2307 | 26 millisec | 10000 / 11200 | 13.3 GiB | 153 KiB/s |
60000 | 13846 | 4 millisec | 60000 / 66200 | 80 GiB | 1 MiB/s |
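The table rows can be approximately reproduced from the contact interval and a per-contact payload size. Note that the roughly 4 KiB inbound payload per contact used below is inferred from the single-device row (about 1.3 MiB per day); it is an assumption for illustration, not a documented constant.

```python
# Approximate reconstruction of the traffic table (illustrative; the per-contact
# payload is an inferred assumption, not a documented figure).
INTERVAL_S = 260        # default data-push interval
PAYLOAD_KIB = 4.0       # assumed inbound data per device contact

def traffic_row(devices: int):
    contacts_per_minute = devices * 60 / INTERVAL_S
    seconds_between_contacts = INTERVAL_S / devices
    kib_per_second_in = devices * PAYLOAD_KIB / INTERVAL_S
    mib_per_day_in = kib_per_second_in * 86400 / 1024
    return contacts_per_minute, seconds_between_contacts, kib_per_second_in, mib_per_day_in

for n in (1, 100, 10000):
    cpm, gap, kibps, daily = traffic_row(n)
    print(f"{n:>6} devices: {cpm:8.2f} contacts/min, one every {gap:8.3f} s, "
          f"{kibps:7.1f} KiB/s in, {daily:9.1f} MiB/day in")
```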
Now, while the traffic and RAM figures are self-explanatory, what about the CPU units?
CPU comparison is hard to make because there are vast differences between physical CPUs, virtual machines, and containers. The numbers in the table are useful only for roughly understanding the ratio between different deployments; e.g. if your deployment runs at about 50% CPU with x devices, you can estimate how many devices it could handle at 100% CPU utilization.
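As a concrete example of such an estimate, and under the optimistic assumption that CPU load grows roughly linearly with the device count:

```python
# Rough linear extrapolation of device capacity from observed CPU load
# (assumes load grows linearly with devices and nothing else saturates first).
def estimated_capacity(current_devices: int, current_cpu_load_pct: float) -> int:
    return int(current_devices * 100 / current_cpu_load_pct)

print(estimated_capacity(10000, 50))   # ~20000 devices at 100% CPU
```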
For every release, Dwarf Technologies runs performance testing. You can inspect the results and see for yourself, but our hardware is not identical to yours and we use artificial testing methods to emulate large numbers of devices, so take the measured results with a grain of salt and stick to the recommendations.
Should you need more information when considering how to scale your deployment, or if you are curious about the results of the stability or benchmark tests performed, download the performance test protocols linked from the first table.
If you are considering where and how to deploy, please look into Deploy options, especially if you are inexperienced with deploying Linux server applications. Dwarf Technologies offers to manage your Dwarfguard deployment in a safe and controlled environment, or to provide advice and assistance with setting up your own deployment.