Leo Bayer

Posted on 18 May, 2023

Devices Misbehaving

One of the core philosophies of our products here at LogicVein is that our users should be able to manage their devices with as little effort as possible. Whatever their environment, it isn’t the user’s responsibility to get our product to work; it’s ours.

We support over fifty different vendors for configuration management in Net LineDancer and a limitless number for monitoring in ThirdEye. Each of those vendors can have hundreds of different combinations of models and OS versions. No product is without bugs, so some of the thousands of devices that we support are bound to have bugs of their own.

Regardless of the existence of bugs in the devices we support, we still want to be able to support them. We put a lot of effort into ensuring that even devices with these quirks work so that our users don’t have to give it any thought.

This work is largely invisible: as a user, you are unlikely to notice anything unless there is a problem.

Time to point some fingers. Here are a couple of specific cases we have seen…

MikroTik SNMP response ordering

Some MikroTik and Yamaha SNMP servers respond to table-based SNMP queries in column-first rather than row-first order. In other products, a user might have to configure an option to make the monitoring system handle this. ThirdEye, however, is smart enough to identify these quirks in the SNMP responses automatically.

There are a number of special tricks that we have implemented with our internal SNMP client to ensure that interaction with devices is smooth, performant, and without issue.
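The ordering quirk described above can be illustrated with a minimal Python sketch. This is not ThirdEye's implementation; it assumes the query is a bulk-style request with one varbind per table column, and the helper names `detect_order` and `to_rows` are hypothetical. The sample OIDs are the standard `ifDescr` and `ifOperStatus` columns of `ifTable`; the values are made up for illustration.

```python
from collections import defaultdict

# Standard ifTable column OIDs (ifDescr and ifOperStatus).
IF_DESCR = "1.3.6.1.2.1.2.2.1.2"
IF_OPER_STATUS = "1.3.6.1.2.1.2.2.1.8"
COLUMNS = [IF_DESCR, IF_OPER_STATUS]


def column_of(oid, columns):
    """Return the position of the requested column that `oid` belongs to."""
    for i, col in enumerate(columns):
        if oid.startswith(col + "."):
            return i
    return None


def detect_order(varbinds, columns):
    """Guess the ordering of a flat list of (oid, value) pairs.

    Hypothetical helper: if the first two varbinds belong to the same
    column, the device sent the table column by column.
    """
    seq = [column_of(oid, columns) for oid, _ in varbinds]
    seq = [c for c in seq if c is not None]
    if len(columns) < 2 or len(seq) < 2:
        return "unknown"
    return "column-first" if seq[0] == seq[1] else "row-first"


def to_rows(varbinds, columns):
    """Group values by row index so the original ordering no longer matters."""
    rows = defaultdict(dict)
    for oid, value in varbinds:
        i = column_of(oid, columns)
        if i is not None:
            col = columns[i]
            index = oid[len(col) + 1:]  # the part of the OID after the column
            rows[index][col] = value
    return dict(rows)


# Illustrative responses for a two-interface table (values are invented).
row_first = [
    (IF_DESCR + ".1", "ether1"), (IF_OPER_STATUS + ".1", 1),
    (IF_DESCR + ".2", "ether2"), (IF_OPER_STATUS + ".2", 2),
]
column_first = [
    (IF_DESCR + ".1", "ether1"), (IF_DESCR + ".2", "ether2"),
    (IF_OPER_STATUS + ".1", 1), (IF_OPER_STATUS + ".2", 2),
]

print(detect_order(row_first, COLUMNS))
print(detect_order(column_first, COLUMNS))
print(to_rows(row_first, COLUMNS) == to_rows(column_first, COLUMNS))
```

Keying each value by the row index found in its OID, rather than by its position in the response, is one way a client can accept either ordering without a user-facing option.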

Apresia SSH server RSA key length issue

We recently encountered an issue where a small number of Apresia devices would fail to back up as a part of a large (a few hundred devices) backup job. After some deep investigation we found that, more precisely, one out of every 256 connection attempts to these devices would fail.
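To see why a 1-in-256 failure rate shows up so reliably in a large job, here is a small back-of-the-envelope calculation in Python. It assumes each connection attempt fails independently with probability 1/256 and that every device in the job is affected; the function name `p_any_failure` and the job sizes are illustrative only.

```python
def p_any_failure(devices, p_fail=1 / 256):
    """Probability that at least one of `devices` independent attempts fails."""
    return 1 - (1 - p_fail) ** devices


for n in (100, 300, 500):
    expected = n / 256  # expected number of failed connections in the job
    print(f"{n} devices: {p_any_failure(n):.1%} chance of a failure, "
          f"about {expected:.2f} failures expected")
```

Under these assumptions, a job of 300 affected devices has roughly a 69% chance of at least one failed backup, which matches the experience of a small number of devices failing in each large run.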

To support SSH and SCP interactions with devices, we use the wonderful PuTTY project, specifically its “plink” and “pscp” implementations. Over the years we have had to make minor improvements to these tools to work around issues with devices. In this specific case, we were able to prevent PuTTY from generating encryption keys that would trigger the bug on Apresia devices.

Some might recommend that when we encounter an issue like this, we should just retry the operation. We feel it is better not to encounter the issue at all. A workaround like automatic retries has negative side effects: it would hide as-yet-unknown bugs that could trigger more harmful situations, and it hurts overall performance by requiring additional connection attempts that aren’t necessary once the root issue is addressed.

It’s also worth noting that the original implementation in the PuTTY project is technically correct as far as the protocol specification is concerned. The problem is that not all devices adhere strictly to the spec.

Epilogue

Whatever bug the target device has, it is our philosophy that our products will still work and our users don’t need to care.

We are sure that more bugs will be discovered in the wild from time to time, and we are excited for the opportunity to identify the root causes and engineer solutions that allow our customers to keep chugging along.
