Dwarfguard Performance
Dwarfguard is designed to operate as a high-performance single-node application. For 0.x versions, the device namespace of a single deployment is limited to roughly 65000 devices; more devices than that cannot be added. With the first public release, 0.6.0 Early 1, we started performance testing focused on both stability and benchmarking purposes. For every minor and major release, we will publish a performance testing protocol.
The first table captures how many devices each released version was tested for, together with a link to the protocol with notes.
Release | Tested for (devices) | Protocol |
---|---|---|
0.6.0 Early 1 | 10000 | Link |
0.7.0 Early 2 | 30000 | Link |
0.8.0 Early 3 | 40000 | Link |
The application itself consists of several elements, of which the performance-critical one is dwarfgd, the service daemon. For HTTPS-enabled deployments, another performance-determining element is the speed of encryption. Encryption can be handled either by the Apache Web Server directly in the deployment environment (communicating further with dwarfgd via a localhost socket) or on a different box / VM.
Running an encryption-terminating reverse proxy is especially effective if you have multiple Dwarfguard deployments, as the significant CPU load caused by encryption can be offloaded to either a better-scaled VM or a dedicated box. This enables you to share resources that would otherwise be left idling. The approach is particularly effective for handling peak loads because the peaks of different deployments usually happen at different moments. A system administrator does, however, need to set up the proxy and access permissions carefully so as not to compromise security.
Back to dwarfgd: the daemon consists of the main controlling thread, request-handling (RH) threads that process data coming from devices, and additional threads that take care of everything else. The following table captures the possible thread settings for Dwarfguard releases, together with the socket backlog size, which can be another performance-influencing piece of the puzzle.
Release | Type | RH threads (min) | RH threads (max) | Other threads (min) | Other threads (max) | Socket backlog | Notes |
---|---|---|---|---|---|---|---|
0.8.0 Early 3 | Static | 2 | 8 | 6 | 8 | 64 | Increasing RH threads requires a service restart; 2 are enough for 40000 devices |
0.7.0 Early 2 | Static | 2 | 2 | 6 | 8 | 64 | 2 RH threads confirmed enough for 30000 devices |
0.6.0 Early 1 | Static | 2 | 2 | 6 | 8 | 64 | 2 RH threads determined enough for up to 10000 devices |
0.5.x 'beta4' | Dynamic | 1 | 10 | 5 | 7 | 256 | Dynamic, to measure performance for the first release |
0.4.x 'beta3' | Dynamic | 1 | 10 | 4 | 6 | - | |
0.3.x 'beta2' | Static | 4 | 4 | 0 | 0 | - | |
While CPU power can limit how many requests a request-handling thread can process, it is not a hard limit on the total number of devices in the system. Requests that cannot be processed in time simply time out, and the affected devices return after a while to try again. The only hard limit is the amount of available memory: if the RAM footprint of dwarfgd outgrew the available memory, Dwarfguard would cease working (as would any other application). Memory must therefore be considered when scaling a deployment for a given number of devices.
The following table captures the dwarfgd startup RAM and the average and maximum memory consumed per attached device.
Release | Startup RAM (dwarfgd) | Startup memory (VM) | Avg RAM per device | Max RAM per device | Notes |
---|---|---|---|---|---|
0.8.0 Early 3 | < 50 MiB | 215 MiB | 13 KiB | | RAM usage is 60-70% of 0.7.0. See comparison sheet in perf. protocol |
0.7.0 Early 2 | < 40 MiB | | 20 KiB | | See perf. protocol for details on RSS / memory |
0.6.0 Early 1 | < 40 MiB | 150 MiB | 12 KiB | 60 KiB | |
0.5.0 beta4 | < 40 MiB | 220 MiB | 40 KiB | 150 KiB | |
0.4.0 beta3 | 100 MiB | < 400 MiB | 150 KiB | 300 KiB | |
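For a rough back-of-the-envelope check, the per-device figures above can be turned into a simple estimate. The sketch below is only an illustration, not a sizing formula: it uses the 0.8.0 Early 3 startup and average figures from the table and, because the 0.8.0 maximum is not listed, borrows the 60 KiB worst-case value measured for 0.6.0. As explained below, the real requirements depend on more than the device count.

```python
# Back-of-the-envelope dwarfgd RAM estimate (illustrative only; figures taken
# from the memory table above, worst case borrowed from the 0.6.0 row).
STARTUP_RAM_KIB = 50 * 1024    # dwarfgd startup RAM, < 50 MiB for 0.8.0
AVG_PER_DEVICE_KIB = 13        # average RAM per attached device (0.8.0)
MAX_PER_DEVICE_KIB = 60        # assumed worst case (0.6.0 figure)

def estimate_dwarfgd_ram_mib(devices: int, per_device_kib: int = AVG_PER_DEVICE_KIB) -> float:
    """Estimate dwarfgd RAM usage in MiB for a given number of attached devices."""
    return (STARTUP_RAM_KIB + devices * per_device_kib) / 1024

for n in (1000, 10000, 40000):
    print(f"{n:>6} devices: ~{estimate_dwarfgd_ram_mib(n):.0f} MiB avg, "
          f"~{estimate_dwarfgd_ram_mib(n, MAX_PER_DEVICE_KIB):.0f} MiB worst case")
```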
Unfortunately, the numbers above are not enough to compute the memory requirements of your deployment precisely, as the requirements scale with multiple factors. What we can offer as a guide are:
- recommended maximum numbers of devices per container size, listed in the "recommended maximum" row of the table below
- sample data from our stability and benchmark testing, showing e.g. total system memory utilization after a test (emulating a particular number of devices) was concluded
Based on that, you can find a suitable container (RAM) size by selecting the testing container with the closest higher number of devices than your Dwarfguard deployment is intended to handle (a small selection sketch follows the table). The table describes results for Dwarfguard 0.8.0.
Container | CPUs / threads | CPU arch. | RAM (MiB) | Devices | RAM used | Interval | % load per CPU (load / # of CPUs) |
---|---|---|---|---|---|---|---|
C1 | 1 / 1 | Xeon E-2250 | 512 | 1000 | 48 % | 200 s | 7 % |
C2 | 2 / 2 | Xeon E-2250 | 1024 | 3000 | 33 % | 200 s | 8 % |
C3 | 4 / 4 | Xeon E-2250 | 2048 | 10000 | 11 % | 200 s | 13 % |
C4 | 8 / 8 | Xeon E-2250 | 4096 | 30000 | 5 % | 200 s | 15 % |
H1 | 8 / 16 | Core i7 | 32768 | 30000 | 2 % | 200 s | 10 % |
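If you want to automate the "closest higher number of devices" selection described above, a minimal sketch could look like the following. The container names, RAM sizes, and device counts simply mirror the 0.8.0 table; treat them as tested data points, not guarantees.

```python
# Pick the smallest tested container whose device count covers the planned
# deployment (data mirrors the Dwarfguard 0.8.0 sizing table above).
TESTED_CONTAINERS = [
    # (name, CPUs, RAM in MiB, devices emulated in testing)
    ("C1", 1, 512, 1000),
    ("C2", 2, 1024, 3000),
    ("C3", 4, 2048, 10000),
    ("C4", 8, 4096, 30000),
]

def pick_container(planned_devices: int):
    """Return the first tested container covering at least `planned_devices`."""
    for name, cpus, ram_mib, tested_devices in TESTED_CONTAINERS:
        if tested_devices >= planned_devices:
            return name, cpus, ram_mib
    raise ValueError("planned device count exceeds the tested container sizes")

print(pick_container(7500))   # -> ('C3', 4, 2048)
```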
Just a remark: if you are using a VM or container with limited memory, pay close attention to systemd. The systemd journal service can eat gigabytes of memory if not configured properly. Especially in a memory-constrained VM or container, this may lead to the Linux kernel killing seemingly random processes, such as the database, Apache, dwarfgd, or even systemd itself, resulting in possible service interruptions and data loss. (Please note this applies to default Debian GNU/Linux installations, at least for versions 10 and 11.)
The following table gives a rough recommended maximum number of devices per memory size assigned to a VM. The calculation assumes that the OS installation is dedicated to operating Dwarfguard and that the VM provides no other service. You can find more performance test results in the performance testing document for the particular Dwarfguard release. Please note that the table also factors in other aspects, such as the stability test parameters used when running against that particular memory limit. While Dwarfguard was able to handle a much larger number of devices during our tests than the conservative recommended maximum, please stick to the 'better safe than sorry' strategy and scale according to the first table row (recommended maximum).
Also, while the RAM criterion may seem independent of the other criteria (traffic, CPU), it is not. The same machine configuration may be fine for a deployment with a higher number of devices at the default data-push interval, yet fail to handle a lower number of devices with a considerably shortened interval, where failing can mean service interruptions and VM restarts. That is because the memory requirements scale not only with the number of devices but with the number of requests per second as well.
The following recommendations apply to the default data-push interval of 260 seconds; the testing actually uses a 200-second interval to stay on the safe side of that.
Release 0.8.0 Early 3 | C1/512 MiB | C2/1024 MiB | C3/2048 MiB | C4/4096 MiB | Notes |
---|---|---|---|---|---|
recommended maximum | 1000 | 3000 | 10 k | 40 k | Release 0.8 is recommended up to 40 k devices |
max RAM-wise (stability test) | 2081 | 9143 | 90 k | 566 k | Computed maximum, not recommended |
max CPU-wise (benchmark) | 20085 | 37992 | 45443 | 51829 | Buffer included (can take more in default setting) |
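The interplay between device count and data-push interval described above can be made concrete with a quick calculation. The sketch below is illustrative only; the 260-second default comes from the text, while the shortened 60-second interval is an arbitrary example.

```python
# Request rate as a function of device count and data-push interval.
def requests_per_second(devices: int, interval_s: float) -> float:
    return devices / interval_s

# 30000 devices at the default 260 s interval...
print(requests_per_second(30000, 260))   # ~115 requests/s
# ...produce roughly the same request rate as ~7000 devices at a 60 s interval.
print(requests_per_second(7000, 60))     # ~117 requests/s
```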
The last consideration is network traffic. To give an idea of the amount of data and the number of requests dwarfgd needs to handle, the following table shows calculations for selected numbers of devices, based on real measurements. Please note that in reality, requests from devices are not evenly spread out in time. On the contrary, peaks do occur, so the actual load during a particular fraction of a minute or second can be much higher or lower than illustrated.
Devices | Contacts/minute | 1 contact every | Relative CPU units (devices / total) | Regular daily traffic (in) | Traffic/sec (in) |
---|---|---|---|---|---|
1 | 0.23 | 260 seconds | 1 / 201 | 1.3 MiB | 15 B/s |
10 | 2.3 | 26 seconds | 10 / 211 | 13 MiB | 0.2 KiB/s |
100 | 23 | 2.6 secs | 100 / 310 | 133 MiB | 1.5 KiB/s |
300 | 69 | ~1 sec | 300 / 530 | 400 MiB | 5 KiB/s |
1000 | 230 | 0.26 secs | 1000 / 1300 | 1.3 GiB | 15 KiB/s |
3000 | 692 | 86 millisec | 3000 / 3500 | 4 GiB | 46 KiB/s |
10000 | 2307 | 26 millisec | 10000 / 11200 | 13.3 GiB | 153 KiB/s |
60000 | 13846 | 4 millisec | 60000 / 66200 | 80 GiB | 1 MiB/s |
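The table rows can be approximately reproduced from the contact interval and a per-contact payload size. Note that the roughly 4 KiB inbound payload per contact used below is inferred from the single-device row (about 1.3 MiB per day); it is an assumption for illustration, not a documented constant.

```python
# Approximate reconstruction of the traffic table (illustrative; the per-contact
# payload is an inferred assumption, not a documented figure).
INTERVAL_S = 260        # default data-push interval
PAYLOAD_KIB = 4.0       # assumed inbound data per device contact

def traffic_row(devices: int):
    contacts_per_minute = devices * 60 / INTERVAL_S
    seconds_between_contacts = INTERVAL_S / devices
    kib_per_second_in = devices * PAYLOAD_KIB / INTERVAL_S
    mib_per_day_in = kib_per_second_in * 86400 / 1024
    return contacts_per_minute, seconds_between_contacts, kib_per_second_in, mib_per_day_in

for n in (1, 100, 10000):
    cpm, gap, kibps, daily = traffic_row(n)
    print(f"{n:>6} devices: {cpm:8.2f} contacts/min, one every {gap:8.3f} s, "
          f"{kibps:7.1f} KiB/s in, {daily:9.1f} MiB/day in")
```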
Now, while the traffic and RAM figures are self-explanatory, what about the CPU units?
CPU comparison is hard to make because there are vast differences between physical CPUs, virtual machines, and containers. The numbers in the table are useful only for roughly understanding the ratio between different deployments; e.g. if your deployment runs at about 50% CPU with x devices, you can estimate how many devices it could handle at 100% CPU utilization.
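As a concrete example of such an estimate, and under the optimistic assumption that CPU load grows roughly linearly with the device count:

```python
# Rough linear extrapolation of device capacity from observed CPU load
# (assumes load grows linearly with devices and nothing else saturates first).
def estimated_capacity(current_devices: int, current_cpu_load_pct: float) -> int:
    return int(current_devices * 100 / current_cpu_load_pct)

print(estimated_capacity(10000, 50))   # ~20000 devices at 100% CPU
```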
For every release, Dwarf Technologies runs performance testing. You can inspect the results and see for yourself, but our hardware is not identical to yours and we use artificial testing methods to emulate large numbers of devices, so take the measured results with a grain of salt and stick to the recommendations.
Should you need more information when considering how to scale your deployment, or if you are curious about the results of the stability or benchmark tests performed, download the performance test protocols linked from the first table.
If you are considering where and how to deploy, please look into Deploy options, especially if you are inexperienced with deploying Linux server applications. Dwarf Technologies offers to manage your Dwarfguard deployment in a safe and controlled environment, or to provide advice and assistance with setting up your own deployment.