Protocol Reliability Review
I was wondering if anyone had ever come across a document that details the messaging reliability of various protocols. I am working on a project for critical facilities and we need to determine what happens if a control device or panel locks up completely (processor stopped and watchdog failed to restart it). Would the various protocols all be intelligent enough to pick up on this and raise an alarm at the head end? It seems to me that some protocols, especially those that are register based, would likely just hold and continue to deliver the last known good value, and as such the information might appear normal while systems are actually offline.
I am fairly sure that documentation on this must exist and I didn't want to recreate the evaluation. So if anyone knows of something, the assistance would be appreciated.
As a second point, if anyone has any ideas on workarounds to verify proper operation despite protocol weaknesses, those would be welcome as well.
Thank you very much!
It's more about how the protocols are implemented than about the protocol itself.
Anything polled in a JACE will appear as status.down if polls fail.
EasyIO controllers have comms-lost bits for Modbus and BACnet ... I have set up an internal watchdog for EasyIO controllers before so that stuff will fall back to a certain position. These controllers could be set up to send a watchdog out as well.
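To make the "send a watchdog out" idea concrete, here is a rough sketch of the head-end side, assuming the controller increments a counter point on every program scan and the head end reads it on each poll. The class name, the timeout, and the injectable clock are all made up for illustration; nothing here is an EasyIO or Niagara API.

```python
import time
from typing import Callable, Optional


class HeartbeatMonitor:
    """Treats a controller as failed when its heartbeat counter stops changing."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock                      # injectable so it can be tested
        self._last_value: Optional[int] = None
        self._last_change: Optional[float] = None

    def update(self, counter_value: int) -> None:
        """Record the counter value returned by the latest poll."""
        if counter_value != self._last_value:
            self._last_value = counter_value
            self._last_change = self._clock()

    def is_alive(self) -> bool:
        """True only if the counter has changed within the timeout."""
        if self._last_change is None:
            return False                         # never heard from it
        return (self._clock() - self._last_change) <= self.timeout_s
```

The point of watching for a *change* rather than a successful read is that a gateway or database that keeps serving the last value will still answer every poll, but it cannot make the counter move.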
I have not dealt with a controller where I could not work comms loss into my overall control strategy when needed. If the product was that limited, I would find something else.
I would suspect in most cases it's not the product but how it's applied that determines how robust it is on a communications loss.
I'd pretty much have to agree with the other responders to your inquiry.
It's more a matter of implementation, how things are set up by whoever programs your system, than it is a matter of precisely what protocol is used.
As for some documented study on the subject, I know of none ... that I'd trust as a definitive answer to the question.
In most studies I've seen, there are problems. (1) Authors of such studies do tend to be somewhat biased. (2) Generally speaking, and I've yet to see an exception, authors of such studies tend to be VERY knowledgeable in perhaps 1 protocol, and to have a fairly limited knowledge of any other. (3) And mostly they're more desk-bound theorists than they are widely field experienced with real operating systems.
That's the reason I often trust what the people on this site say about such things: the people who've been here a while and have a history of being knowledgeable about the systems they actually work with.
Most modern front ends, for instance the Tridium Niagara stuff, know when a controller fails to respond to a message such as a request for the current value of an input, or whatever. But whether the system alerts you to the fact or just shows the last value received is a function of what instructions you give the JACE as to what to do in such instances.
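As a rough illustration of that difference, here is a small sketch of a polled point that keeps the last value but also carries an explicit status, so the display or alarm logic can tell "good" from "nobody answered". The `read` callable stands in for whatever driver call does the actual poll, and the status names and failure threshold are made up; this is not how Niagara does it internally.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PolledPoint:
    read: Callable[[], float]        # hypothetical driver call; raises on no reply
    max_failures: int = 3            # consecutive misses before declaring "down"
    value: Optional[float] = None    # last value actually received
    failures: int = 0

    def poll(self) -> str:
        """Poll once and return the resulting status."""
        try:
            self.value = self.read()
            self.failures = 0
        except OSError:              # TimeoutError is a subclass of OSError
            self.failures += 1       # keep the old value, but remember the miss
        return self.status

    @property
    def status(self) -> str:
        if self.failures >= self.max_failures:
            return "down"            # alarm on this
        if self.failures > 0:
            return "stale"           # value shown is not current
        if self.value is None:
            return "unknown"         # never polled successfully
        return "ok"
```

The last value is still there for anyone who wants it, but nothing downstream should treat it as live unless the status says "ok".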
This is pretty much the case with other modern front ends and even with most of the older ones I'm familiar with.
Geez, I am familiar with a front end that's been in use since the mid 90's and many of which are still running, a DOS based system, which could easily do this sort of thing. It was only a matter of whether or not the programmer who set things up TOLD it to check or not, and whether or not the programmer told it to do anything about it if such a thing occurred.
Besides just checking whether a field controller's microprocessor is locked up, offline, or whatever, one can also check things like whether a particular input is reliable, whether someone has done a manual override of a control and forgotten to release it back to automatic, etc.
I find a LOT of instances where a problem is traced down to a failed/faulty input device, something left in manual override, or an operator having entered some unlikely/bogus/impossible adjustment or setpoint. Also, commonly, I find cases where an operator disabled an alarm. These are the source of issues far more often than a locked up microprocessor or offline controller.
If you have a critical facility, it'd be wise to keep these things in mind and have methods implemented to deal with them. Most protocols, and all that I am familiar with, can deal with such tasks.
In my experience, it is more important that you pick your controls contractor wisely, and have specific expectations for what you want as concerns fault conditions laid out clearly ... IN WRITING, and MAKE SURE those items are implemented in the finished product ... than it is which specific protocol is used.
A lot of contractors out there will only do (implement) the minimum items needed to make the system work well enough for you to sign the check and pay 'em. And if it's not in the specs/contract for them to implement the kind of things you may want ... they won't do it.
Thank you to all for the input!
I am unfortunately in a situation where we are installing a system that overlays whatever control system happens to be installed onsite and therefore must make allowances for its weaknesses as well. I may be looking at it wrong, but as the primary control system stays in place, including the workstation or server which we pull data from, it has been my experience that if the primary control system doesn't pick up on a failure it is unlikely that we will. In addition, we often have instances where field control devices are mapped to server-based applications that write the field information to a database, which will retain values even if a controller goes down. We have just such an instance with a JCI field bus (N2) being mapped through an OPC server to deliver information to us.
We of course have some things that are done through our onsite engine such as looking for "stale" points and checking to make sure that values update on a regular basis. In the values example we typically pick several analog input points from each controller and make sure that there are updates to the values on a regular basis. If we go past a predetermined time with no value updates we raise an event that drives a deeper dive into the validity of the data.
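For anyone wanting to try the same thing, the check described above boils down to something like the sketch below. It assumes you sample a handful of analog inputs per controller on a schedule; the class name, point names, and the 10-minute limit used in the usage note are invented for the example and are not from our actual engine.

```python
import time
from typing import Callable, Dict, List


class StaleValueDetector:
    """Flags controllers whose sampled analog values have all stopped changing."""

    def __init__(self, max_age_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_age_s = max_age_s
        self._clock = clock
        self._last: Dict[str, Dict[str, float]] = {}   # controller -> point -> value
        self._changed_at: Dict[str, float] = {}        # controller -> last change time

    def record(self, controller: str, samples: Dict[str, float]) -> None:
        """Store one set of samples; note the time if any value moved."""
        previous = self._last.get(controller)
        moved = previous is None or any(
            previous.get(name) != value for name, value in samples.items()
        )
        if moved:
            self._changed_at[controller] = self._clock()
        self._last[controller] = dict(samples)

    def stale_controllers(self) -> List[str]:
        """Controllers with no value change for longer than max_age_s."""
        now = self._clock()
        return sorted(
            name for name, changed in self._changed_at.items()
            if now - changed > self.max_age_s
        )
```

With `max_age_s=600`, a controller whose chosen points all sit at exactly the same readings for more than 10 minutes shows up in `stale_controllers()`. Watching several analog inputs per controller matters because a single well-controlled value can legitimately sit still for a while, so a hit here is a reason to look closer, not proof of a failure.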
Thanks again and any additional suggestions are always welcome....