How we got several hundred MiniPCs under control

In ShipMonk warehouses we have plenty of hardware devices. Among the most important are the devices we call “Packing Stations”. A Packing Station consists of:

The most interesting part is the USB MiniPC running Debian. We have several hundred of them across 3 geographical locations. If you ask how we install a MiniPC, the answer is “we do not install them!”. We regularly send our OS image to the vendor, who delivers pre-installed MiniPCs to the warehouses, where staff just turn them on.

On the MiniPC we run our Packing application, which is based on TypeScript, Node.js, React and GraphQL. The application has the ability to upgrade itself. The upgrade is a simple pull-based mechanism: a DB table in the cloud stores the current and required version for each station, and based on that the application can download a newer version from an S3 bucket. But here all the beauty ends… this was the only mechanism we could use to control several hundred Linux MiniPCs in our warehouses in California, Pennsylvania and Florida. I won’t lie: the upgrade mechanism does not work 100% of the time… Sometimes it fails due to connectivity issues, sometimes something else fails. In those cases we had to either ssh into the MiniPC from Prague or instruct somebody in the warehouse to intervene manually (have you ever guided someone over a WhatsApp video call to ssh into Linux?).
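The pull-based check described above can be sketched roughly as follows. This is only an illustration, not our actual code: the row shape `StationVersionRow`, the helper names, the dotted numeric version format and the S3 key layout are all assumptions made for the example.

```typescript
// Hypothetical shape of one row in the station-versions table.
interface StationVersionRow {
  stationId: string;
  currentVersion: string;  // version the station reports it is running
  requiredVersion: string; // version the station is expected to run
}

// Compare dotted numeric versions such as "1.4.2".
// Returns a negative number, zero, or a positive number.
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Decide what the station should download, if anything.
function planUpgrade(row: StationVersionRow, bucket: string): string | null {
  if (compareVersions(row.requiredVersion, row.currentVersion) <= 0) {
    return null; // nothing newer is required
  }
  // Assumed key layout; the real bucket structure is not described in this post.
  return `s3://${bucket}/packing-app/${row.requiredVersion}/app.tar.gz`;
}

const row: StationVersionRow = {
  stationId: "station-042",
  currentVersion: "1.4.2",
  requiredVersion: "1.5.0",
};
console.log(planUpgrade(row, "example-bucket"));
```

Keeping the decision in a pure function makes it easy to test without touching the database or S3; the download itself, and what happens when it fails halfway, is exactly the part that caused us trouble.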

We had little, and sometimes no, visibility into what was going on…

  • We had no operational metrics to answer questions like: was the CPU overloaded? Do we have enough RAM? Is there any free disk space?
  • We had logs only from our own app… we had no logs from the OS
  • We could not change the configuration of all stations at once, e.g. to install an OS package
  • We could not verify whether our WiFi was stable enough (all MiniPCs were connected via WiFi)

Each MiniPC was just a standalone unit with no central management. And we already had more than 200 of them across 3 geographical locations, with more to come!

QBee Proof of Concept

One night I started playing with a tool called QBee, which promised Linux device-fleet management. QBee is based on an agent that is installed on each device and periodically pulls its configuration from the server. I installed the QBee agent manually by ssh-ing into a couple of MiniPCs and running those 3 commands there:

dpkg -i qbee-agent_1.2.1_amd64.deb
/opt/qbee/bin/qbee-bootstrap -k <API_KEY>

Within a couple of minutes I had several MiniPCs under central management, with features such as:

  • Full visibility into performance (CPU, memory, disk, network, etc.)
  • Remote access via SSH (over VPN… so even MiniPCs behind NAT are accessible)
  • Security audit with a list of CVEs for our installed packages (ouch!)
  • And last but not least, fleet configuration such as:
      • Linux user management
      • Network watch, which can restart the device if a remote server is not reachable for more than x minutes
      • File distribution
      • Package management
      • And many more

So much fun after running 3 commands!

QBee had a huge “wow” effect on my team. The next day I had to create QBee user accounts for our CTO and also for our CEO! That was amazing feedback! But we could not stop here… we had only a couple of MiniPCs under central management.

QBee rollout to all MiniPCs

The most difficult part was “How do we install the QBee agent on each MiniPC?”. It would take me ages to install the agent on each device manually. And also… our MiniPCs have quite a shaky life: one day a MiniPC is on, the next day it is turned off and returned to our armory for future use. In other words, we have plenty of MiniPCs in our warehouses and not all of them are online at any given time. To deal with that, we extended the upgrade mechanism of the Packing application running on the MiniPCs. At each app upgrade, the application checked whether QBee was present and installed it if needed. We were finally able to install QBee on each MiniPC automatically! Hooray!
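The “check for QBee and install it if needed” step can be sketched like this. It is a simplified illustration rather than our production code: the function name `planQbeeInstall`, the options type and the idea of treating the bootstrap binary's presence as the “already installed” marker are assumptions for the example. The two commands are the ones from the manual install above; the `.deb` location and API key are placeholders.

```typescript
// Placeholders supplied by the caller; not real values.
interface QbeeInstallOptions {
  debPath: string; // where the qbee-agent .deb was downloaded to
  apiKey: string;  // bootstrap key for the QBee account
}

// Bootstrap tool shipped by the agent package (path as used in the manual install).
const QBEE_BOOTSTRAP = "/opt/qbee/bin/qbee-bootstrap";

// Returns the commands (as argv arrays) needed to bring QBee onto this device,
// or an empty list when the agent already appears to be present.
// Assumption: the bootstrap binary existing means the agent was installed before.
function planQbeeInstall(
  fileExists: (path: string) => boolean,
  opts: QbeeInstallOptions,
): string[][] {
  if (fileExists(QBEE_BOOTSTRAP)) return [];
  return [
    ["dpkg", "-i", opts.debPath],
    [QBEE_BOOTSTRAP, "-k", opts.apiKey],
  ];
}

// Dry run on a device without QBee: prints the two commands that would be executed.
const plan = planQbeeInstall(() => false, {
  debPath: "/tmp/qbee-agent_1.2.1_amd64.deb",
  apiKey: "<API_KEY>",
});
for (const argv of plan) console.log(argv.join(" "));
```

In a Node.js application the check could be wired to `fs.existsSync` and each command run with `child_process.execFileSync`. Because the step is idempotent, it is safe to run at every upgrade, which matters when devices come and go from the armory.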

What we have achieved so far with QBee

After the first month with QBee we had made huge progress:

  • We discovered that some MiniPCs had Power Management enabled on WiFi -> we turned this feature off globally for all devices
  • For easier troubleshooting we installed FluentBit and configured it to forward syslog to DataDog
  • We set up simple network latency monitoring (ping, with the results sent to DataDog)
  • We enabled “Automatic device restart” when the remote server is not reachable (i.e. the device has most probably lost its connection to the local network)
  • Some MiniPCs had unsynchronized clocks -> we configured NTP globally on all devices
  • In two warehouses we could not ssh into the MiniPCs due to the network configuration (WiFi set up as an isolated network). Thanks to QBee we could connect to such devices remotely
  • And many more
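To give an idea of what the latency monitoring involves, here is a minimal sketch that extracts packet loss and average round-trip time from the summary that the Linux (iputils) `ping` command prints. The function and type names are made up for this example, and forwarding the numbers to DataDog is deliberately left out.

```typescript
interface PingStats {
  lossPercent: number;
  avgMs: number | null; // null when no reply came back at all
}

// Parse the two summary lines printed by iputils ping, e.g.
//   3 packets transmitted, 3 received, 0% packet loss, time 2003ms
//   rtt min/avg/max/mdev = 10.123/12.456/15.789/1.234 ms
function parsePingSummary(output: string): PingStats | null {
  const loss = /([\d.]+)% packet loss/.exec(output);
  if (!loss) return null; // not a ping summary we recognise
  const rtt = /rtt min\/avg\/max\/mdev = [\d.]+\/([\d.]+)\//.exec(output);
  return {
    lossPercent: Number(loss[1]),
    avgMs: rtt ? Number(rtt[1]) : null,
  };
}

const sample = [
  "3 packets transmitted, 3 received, 0% packet loss, time 2003ms",
  "rtt min/avg/max/mdev = 10.123/12.456/15.789/1.234 ms",
].join("\n");
console.log(parsePingSummary(sample));
```

When every packet is lost, `ping` prints no `rtt` line, so the sketch reports the loss percentage and leaves the latency as `null` instead of failing.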

And what is the price of QBee? € 0.55 per device per month… plus € 179 for Premium Support. One could say we could achieve the same results with Chef/Puppet/etc. My answer is YES, we definitely could achieve the same results with those tools. But it would take us more time. And time matters!

About the author:

Martin Damovský

ShipMonk Head of DevOps
