MS/TP Andover/McQuay
PasqualeS
04-18-2011, 09:58 PM
This is my environment:
- 18-node segment
- 38.4k
- Andover B3XXX, MicroTech II and III controllers
I have everything online and communicating. The problem is the performance of the network. My graphics are very slow to populate, and opening a point takes forever. Everything is SLOW, SLOW and SLOW.
If I reset the whole network (masters, BMS, devices), everything seems to work properly at first, and then the performance suddenly degrades...
I have tried everything I know to make the network faster. I decided to join your community to get some fresh ideas to implement.
Thanks for any help.
BACnet
04-18-2011, 10:32 PM
Do you have the ability to sniff the network?
PasqualeS
04-18-2011, 10:43 PM
I could. What's the plan? Thanks
BACnet
04-19-2011, 09:45 AM
My reason is that I wonder if the front end is getting itself tied up in knots or if the MS/TP bus really is getting hammered and slowing down.
If you capture a minute or so of the traffic on the MS/TP bus the root cause of the issue should be apparent.
justmike7
04-19-2011, 01:45 PM
Shooting from the hip I wonder if you have something like a very low COV threshold (thus a fluctuation is blasting the network with data) or simple binding issues from redundant DIDs.
Having zero experience with the software you are using I could be completely off base but those are two things that leap to mind.
Agree that network capture is the way to go.
BACnet
04-19-2011, 02:06 PM
I also fear that the MicroTech II controllers are simply not good. Have you had to manually reset any of them?
This thread (http://hvac-talk.com/vbb/showthread.php?t=579002) goes into their problems in greater detail. In it, an engineer from McQuay posted that the Microtech II's are not able to handle traffic properly. He also went a bit farther and said that they have no intention of fixing them.
Again, a sniff of the network will show if these devices are your problem.
justmike7
04-19-2011, 02:09 PM
If the MY stuff is on the same leg you could simply pull it offline and see if it comes up. Maybe quicker than the sniffer if you have never used one...
justmike7
04-19-2011, 02:10 PM
I meant MT stuff! ;-)
sysint
04-19-2011, 02:29 PM
It is BACnet MSTP. This is a deterministic network. I cannot possibly see how you fail to get data in a decent time frame.
I think your watch is broke.
EDIT: However, prove me wrong. Tell me how many points are being polled in your network and at what rate.
Further, indicate what properties are being passed. If you have COV - provide the details on that. Maybe if your watch isn't broke it's not a time check but a reality check.
BACnet
04-19-2011, 02:37 PM
It is BACnet MSTP. This is a deterministic network. I cannot possibly see how you fail to get data in a decent time frame.
I fear that this problem is "craptroller" specific, not protocol specific. According to this thread (http://hvac-talk.com/vbb/showthread.php?t=173200), the only thing worse than a BACnet Microtech II is a LON Microtech II... :gah:
sysint
04-19-2011, 03:15 PM
...The units will not have the ability to communicate updates to outside air damper position, space CO2 and every other point available via BACnet twice a second, and I hope that this is not required and that the customer would be OK with a slower polling rate.
Well, I think I'll just wait on the details from the guy before proclamation of junk.
BACnet
04-19-2011, 03:46 PM
The line you quoted is where their engineer stated that they are junk. The fact that they are junk has not been contested in any thread I've read.
But I agree, once we see the packets the whole story should be quite easy to piece together.
sysint
04-19-2011, 04:03 PM
He said no more than every object 2x a second. One second polling is not fast enough on every object? Nowhere did he say the stuff was junk. This is the one thing I don't appreciate. If you are with another manufacturer and you want to call them out like this then you can stand behind your company name, regardless of whether or not you are right. And, if you can't do that then maybe you shouldn't. This has nothing to do with what you want to do - just your circumstances because you cannot put your money where your mouth is. Sort of hamstrung.
I checked some literature and it looks like around 70 objects. So, not that many. Let's see what he says about his network and what he is doing - COV or straight polling, how many devices, etc...
BACnet
04-19-2011, 04:15 PM
He said no more than every object 2x a second. One second polling is not fast enough on every object? Nowhere did he say the stuff was junk. This is the one thing I don't appreciate. If you are with another manufacturer and you want to call them out like this then you can stand behind your company name, regardless of whether or not you are right.
sys- I don't want to bicker with you on this and I can't understand what's gotten you all fired up.
McQuay representatives stated that these units will lock up if you poll them too quickly. I consider that a problem. You don't.
Remember that I am posting under the name BACnet and the opinions I post on this board are that of a generic BACnet engineer. When a product (anyone's product) comes on the market and calls itself a BACnet product but it locks up under normal operating conditions- As a BACnet guy I call them out. I would consider it dishonorable not to call them out when they have known flaws that are not being addressed. If it was my own product and it was found to have a fatal flaw I'd call it out on this board too. But I would then go back and put out a firmware release that fixed it ASAP.
The concept of me being "right" about these controllers is a laughable point. I am using the McQuay engineer's own posts as my understanding of this product. So settle down, Sys. No need to get riled up over nothing.
sysint
04-19-2011, 04:30 PM
What BACnet OWS do you know where you can set a straight polling rate faster than 1 second?
EDIT:
1. First thing I want to do is verify termination and network cabling.
2. Second thing I want to do is some simple math.
3. Third thing I want to do is put a sniffer on the network.
4. Potentially blame product.
I think if this guy gets past item 1 then my most likely guess is he doesn't get past item 2.
Who makes this box for McQuay? Doesn't look like something they would make on their own.
BACnet
04-19-2011, 04:38 PM
Sniff the network first. That's always the very first thing to do. It will rule in or rule out every other bullet point.
kontrolphreak
04-19-2011, 04:39 PM
Don't mean to sound stupid, but have all the basics been covered?
Correct wire, EOL terminators, Max Master set to the highest MAC address? Been doing a lot of BACnet integration lately and have found that OEM products are a lot more finicky than controllers manufactured by a dedicated EMS/DDC company.
Also have you checked the PICS for the McQuay controllers? Some devices don't support RPM and each property has to be read one at a time which slows down the network.
kontrol out
kontrolphreak
04-19-2011, 04:41 PM
Why poll something at 1 second? Anything critical should be COV.
I generally don't poll points any faster than 5 seconds, and on most points it's around 30 seconds to a minute. Do you really need to update the OSA temp every second?
kontrol out
sysint
04-19-2011, 04:50 PM
Sniff the network first. That's always the very first thing to do. It will rule in or rule out every other bullet point.
If I have to throw down some cash I'm betting he is over-taxing the ability of the MSTP network to deliver the data. Of course you can see that also with sniffing probably but a little math is easy to do to see what possibly you can expect.
My pencil is sharpened and ready.
sysint
04-19-2011, 04:52 PM
Why poll something at 1 second? Anything critical should be COV.
I generally don't poll points any faster than 5 seconds, and on most points it's around 30 seconds to a minute. Do you really need to update the OSA temp every second?
kontrol out
So, we have some reality injected into the argument. So, not being able to get to 2 polls/second shouldn't be an issue, right?
And, don't you think their "OEM" product came from a controls manufacturer?
BACnet
04-19-2011, 05:10 PM
If I have to throw down some cash I'm betting he is over-taxing the ability of the MSTP network to deliver the data. Of course you can see that also with sniffing probably but a little math is easy to do to see what possibly you can expect.
My pencil is sharpened and ready.
I'm back on board with your line of reasoning.
Of course, even with the strict math approach, some assumptions need to be made.
As an example, take the following generic assumptions:
Number of Masters - 18, (0 through 17 specifically)
Number of Slaves - 0
Max_master of highest device set to self? - Yes
Baud Rate - 38.4kbps
Average Delay before responding - 4ms
Front End Turnaround time between Questions? - 4ms
Average delay before passing the token? - 4ms
RPM being used? - No
Max Info Frames on Front End? - 5
Average non-token message length? - 30bytes (I feel I'm being way more than generous here)
In this example, the front end will ask for 5 data points every time it gets the token, then the token will get passed around the loop, then the front end will ask five more questions.
The actual loop would be timed like this:
T = 0: First Question Asked. Transmit Time is 7.8ms (rounded to 8)
T = 8: Q1 received. Now 4ms delay, then 8ms transmission back.
T = 20: Response received. Now 4ms delay then next question asked (12ms)
T = 32: Q2 received. Now 4ms delay, then 8ms transmission back.
T = 44: Response received. Now 4ms delay then next question asked (12ms)
T = 56: Q3 received. Now 4ms delay, then 8ms transmission back.
T = 68: Response received. Now 4ms delay then next question asked (12ms)
T = 80: Q4 received. Now 4ms delay, then 8ms transmission back.
T = 92: Response received. Now 4ms delay then next question asked (12ms)
T = 104: Q5 received. Now 4ms delay, then 8ms transmission back.
T = 116: Device 0 waits 4ms, then passes the token to next device (2ms transmit time)
T = 122: Device 1 waits 4ms, then passes the token to next device (2ms transmit time)
T = 128: Device 2 waits 4ms, then passes the token to next device (2ms transmit time)
T = 134: Device 3 waits 4ms, then passes the token to next device (2ms transmit time)
T = 140: Device 4 waits 4ms, then passes the token to next device (2ms transmit time)
T = 146: Device 5 waits 4ms, then passes the token to next device (2ms transmit time)
T = 152: Device 6 waits 4ms, then passes the token to next device (2ms transmit time)
T = 158: Device 7 waits 4ms, then passes the token to next device (2ms transmit time)
T = 164: Device 8 waits 4ms, then passes the token to next device (2ms transmit time)
T = 170: Device 9 waits 4ms, then passes the token to next device (2ms transmit time)
T = 176: Device 10 waits 4ms, then passes the token to next device (2ms transmit time)
T = 182: Device 11 waits 4ms, then passes the token to next device (2ms transmit time)
T = 188: Device 12 waits 4ms, then passes the token to next device (2ms transmit time)
T = 194: Device 13 waits 4ms, then passes the token to next device (2ms transmit time)
T = 200: Device 14 waits 4ms, then passes the token to next device (2ms transmit time)
T = 206: Device 15 waits 4ms, then passes the token to next device (2ms transmit time)
T = 212: Device 16 waits 4ms, then passes the token to next device (2ms transmit time)
T = 218: Device 17 waits 4ms, then passes the token to next device (2ms transmit time)
T = 224: Device 0 (Front end) has now received the token and waits 4ms to transmit
T = 228: 6th Question ready to be asked.
So in this example, the network is set up to have 5 questions asked and answered every 228ms. That means that it can ask and answer roughly 22 questions per second.
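For anyone who would rather script the pencil math, here is a rough Python sketch of the same loop model. Every delay in it is one of the generic assumptions listed above, not a measured value, and the 8-byte token frame and 10 bits per byte on the wire are my own assumptions for the transmit times. Tallied strictly from those numbers, the loop comes out at about 228 ms, or roughly 22 questions per second.

```python
# Back-of-the-envelope MS/TP loop model. All delays are assumed values
# from the example in the post, not measurements from a real network.

def frame_ms(n_bytes, baud, bits_per_byte=10):
    """Time on the wire in ms, assuming 8 data bits plus start and stop."""
    return n_bytes * bits_per_byte * 1000.0 / baud

def loop_ms(masters=18, baud=38400, max_info_frames=5,
            msg_bytes=30, token_bytes=8,
            reply_delay=4, turnaround=4, token_delay=4):
    """Time from the front end's first question to its next first question."""
    msg = round(frame_ms(msg_bytes, baud))    # 7.8 ms, rounded to 8
    tok = round(frame_ms(token_bytes, baud))  # 2.1 ms, rounded to 2
    # Each question: request out, device thinks, reply back.
    questions = max_info_frames * (msg + reply_delay + msg)
    # The front end pauses between consecutive questions.
    pauses = (max_info_frames - 1) * turnaround
    # Every master waits, then passes the token once per loop.
    token_passes = masters * (token_delay + tok)
    # The front end waits once more before asking again.
    return questions + pauses + token_passes + turnaround

def questions_per_second(**kwargs):
    frames = kwargs.get("max_info_frames", 5)
    return frames * 1000.0 / loop_ms(**kwargs)

if __name__ == "__main__":
    print(loop_ms(), "ms per loop")
    print(round(questions_per_second(), 1), "questions per second")
```

Changing `baud`, `masters` or `max_info_frames` in the call shows how sensitive the estimate is to each assumption.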
Of course I’d tune this network so that it was much faster. I’d up the baud rate to the new de facto standard, 76.8. I’d make any devices that don’t need the token into slaves by bumping their IDs above 127. I’d increase the front end’s max_info_frames to allow it to exert more dominance over the network. And lastly I’d see if read property multiple could be used on any of the controllers.
But remember- sniffing the data on the line will provide answers to each of these unknowns as well as answers as to noise on the line and bad controllers...
sysint
04-19-2011, 05:48 PM
You previously forgot the HVAC-talk mantra that protocol analyzers are largely unnecessary. Next time don't make that mistake. Heck, Freddy was reading this and saw the word sniffer and shut down the PC...
I thought most techs own a pencil or a pen and can write on multiple surfaces. Therefore, they can do some math and it costs very little.
BACnet
04-19-2011, 05:52 PM
You previously forgot the HVAC-talk mantra that protocol analyzers are largely unnecessary. Next time don't make that mistake. Heck, Freddy was reading this and saw the word sniffer and shut down the PC...
I thought most techs own a pencil or a pen and can write on multiple surfaces. Therefore, they can do some math and it costs very little.
Who are you talking to? And what is this mistake you are referring to?
But they aren't needed to solve this fellow's problems. All we need here is a wireshark capture of the MS/TP network and we'll have all the data we need.:cheers:
sysint
04-19-2011, 06:01 PM
Who are you talking to? And what is this mistake you are referring to?
But they aren't needed to solve this fellow's problems. All we need here is a wireshark capture of the MS/TP network and we'll have all the data we need.:cheers:
It's official.... you are an engineer.
PasqualeS
04-20-2011, 07:34 AM
I also fear that the MicroTech II controllers are simply not good. Have you had to manually reset any of them?
This thread (http://hvac-talk.com/vbb/showthread.php?t=579002) goes into their problems in greater detail. In it, an engineer from McQuay posted that the Microtech II's are not able to handle traffic properly. He also went a bit farther and said that they have no intention of fixing them.
Again, a sniff of the network will show if these devices are your problem.
- I am getting prepared to do a few minutes of packet capture.
- Here are a few answers to the multiple posts since today:
Termination resistors in place: 100 ohms
Bias resistors: 520 ohms (MT2 has an option to activate those)
Straight polling or COV? I don't know how Continuum does it. (Mixed environment?)
I used an oscilloscope to get an idea of what was going on on the network. Activating the bias resistors on the MT2 seems to help the performance. The traffic on the segment stays closer to the 0 volt line. Without bias resistors the 0 volt line is all over the place.
- With all the McQuay units removed from the loop, the performance changes radically. I need to try it the other way around.
- The next step will be reducing the speed (19.2) and following some of your hints (set max master, ...)
Thanks, all, for the posts.
kontrolphreak
04-20-2011, 07:42 AM
From my experience the EOL resistor should be 120 Ohm. And I have seen small/short networks that actually were brought down when EOL resistors were in place and functioned fine when removed.
kontrol out
sysint
04-20-2011, 09:06 AM
- I am getting prepared to do a few minutes of packet capture.
- Here are a few answers to the multiple posts since today:
Termination resistors in place: 100 ohms
Bias resistors: 520 ohms (MT2 has an option to activate those)
Straight polling or COV? I don't know how Continuum does it. (Mixed environment?)
I used an oscilloscope to get an idea of what was going on on the network. Activating the bias resistors on the MT2 seems to help the performance. The traffic on the segment stays closer to the 0 volt line. Without bias resistors the 0 volt line is all over the place.
- With all the McQuay units removed from the loop, the performance changes radically. I need to try it the other way around.
- The next step will be reducing the speed (19.2) and following some of your hints (set max master, ...)
Thanks, all, for the posts.
I'd connect up the biasing resistor on one side and then terminate properly on the other because the response you noted was correct.
How many points are you polling per device and total? Think about what was posted earlier. You get how many points (debatable to me) in a second?
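A rough sketch of that math in Python. The 18 nodes come from the original post, the roughly 70 objects per unit from the literature check mentioned earlier, and the roughly 22 reads per second from the loop estimate posted above. Treating one node as the front end (so 17 polled devices) and reading every object once, one property per request, are assumptions for illustration only.

```python
# Simple demand-versus-capacity check. The inputs are thread estimates
# and illustrative assumptions, not values measured on this site.

def full_scan_seconds(devices, points_per_device, reads_per_second):
    """Seconds to read every point once, one property per request."""
    return devices * points_per_device / reads_per_second

def max_points_at_interval(poll_interval_s, reads_per_second):
    """How many points the bus can refresh within one poll interval."""
    return int(poll_interval_s * reads_per_second)

if __name__ == "__main__":
    scan = full_scan_seconds(devices=17, points_per_device=70,
                             reads_per_second=22)
    print("full scan takes about", round(scan), "seconds")
    print("points refreshable every 5 s:", max_points_at_interval(5, 22))
```

If the front end is configured to poll more points per interval than the second function returns, requests will queue up no matter how healthy the wiring is.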
Anyway, I've got proof with other scans how long devices can take to pass a token to another master. Some devices hold on for a long time. The BACnet standard for this is vague because the timeout range is something like anywhere from 20-100ms which torches anyone's calculations or general expected response rates.
Your analyzer log will show this type of thing. And, go figure.... a guy using a $#(&@ scope and an analyzer. I doubt you are from the US. The one thing I wouldn't do at the moment is lower the baud rate. Fix your network and run the scan. And, it's just me but I'd leave that max info frames at 1.
BACnet
04-20-2011, 09:17 AM
I agree with Sys about lowering the baud rate, I've not read anything on this thread that suggests a need for that.
But I can't agree with his assertion to drop the max_info_frames down to 1. That's quite literally tying one hand behind your OWS's back. If anything I'd bump it up to 10 or more on your front end or router as that will greatly increase the throughput on the network.
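To put rough numbers on the max_info_frames disagreement, here is a small Python sketch that reuses the assumed timings from the earlier loop example (8 ms per message, 2 ms per token frame, 4 ms for every delay, 18 masters). These are illustrative assumptions, and the model only covers the front end asking questions; it says nothing about how a particular controller copes with the load.

```python
# Throughput versus max_info_frames under the assumed example timings.
MSG_MS = 8      # assumed time to transmit one 30-byte message
TOKEN_MS = 2    # assumed time to transmit one token frame
DELAY_MS = 4    # assumed reply, turnaround and token-pass delay
MASTERS = 18    # masters on the segment, front end included

def loop_ms(max_info_frames):
    """Token rotation time when only the front end asks questions."""
    questions = max_info_frames * (MSG_MS + DELAY_MS + MSG_MS)
    pauses = (max_info_frames - 1) * DELAY_MS
    token_passes = MASTERS * (DELAY_MS + TOKEN_MS)
    return questions + pauses + token_passes + DELAY_MS

def reads_per_second(max_info_frames):
    return max_info_frames * 1000.0 / loop_ms(max_info_frames)

if __name__ == "__main__":
    for frames in (1, 5, 8, 10, 100):
        print(frames, loop_ms(frames), "ms",
              round(reads_per_second(frames), 1), "reads/s")
```

In this model throughput climbs from under 8 reads per second at a setting of 1 toward a ceiling of about 41 per second, because the fixed cost of passing the token around 18 masters gets spread over more questions. The trade-off is that at very high settings the front end holds the token for seconds at a time, so other masters wait longer for their turn.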
sysint
04-20-2011, 09:30 AM
...But I can't agree with his assertion to drop the max_info_frames down to 1. ...;)
PasqualeS
04-20-2011, 04:33 PM
Sysint and BACnet,
Sorry for any misspellings or typos (I am replying from my BB)... I have been in this business for the past 2 years and I am not a guru like you.
I am reading BACnet manuals and standards, and I am respecting all the rules.
The segment has been inspected and troubleshot by Senior Engineers and Senior Field Technicians. They affirm that the wiring is OK and the software settings are OK.
I would like to learn from you what I am doing wrong and how I can fix it.
As of now everything is online and stable...
I poll 20 points max at a time to populate graphics or a class of objects.
Nothing crazy... Why is it slow?
Will the sniff capture show that?
I am waiting for a converter for my laptop to do the capture.
BACnet
04-20-2011, 04:50 PM
I have to admit that I stare at comm port readouts in real time not unlike the guys in the movie The Matrix. I can tell just about everything that an MS/TP network is up to based on that. When I see something of interest, generally a pause or hiccup, I stop the sniffer and take a closer look at the preceding couple of seconds worth of data. The easiest thing to spot is silence. If there is a break in the stream, it's caused by one of only a few things.
If you do see a large break in comms, (more than 50ms is a good threshold) the usual suspects are the following:
1) There is a unit that (for whatever reason, sys) doesn't operate properly. An example of a "bad device" is one that receives a token, then waits until the network has marked it offline, then it wakes back up and sends the token along, bringing the network down as a result. There are at least 2 manufacturers who sell products that suck in this way. They are not obeying the BACnet standard and should not call their controllers BACnet since they don't follow the BACnet standard.
2) Your front end is making requests of a device that doesn't exist. When this occurs, the entire MS/TP network holds its breath for roughly 200ms waiting for the non-existent device, then the front end gives up and asks its next question of the network. Some front ends will mark a unit offline and give up polling it for a set period of time, others aren't smart enough to understand that. These types of delays are caused by programming errors on the front end and are generally fairly easy to correct.
3) You are literally swamping the network. Everything is working properly and rapidly but the front end wants to ask (using the example above's numbers as a reference) more than the 22 messages that it can ask in a second. This is less likely, but if it is the case, then bump the max_info_frames way up and the problem will go away since it can then ask more questions per second.
Anyway, that's a brief primer on what I would look for and how I would interpret the data.
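A minimal Python sketch of the "look for silence" step, assuming you already have a list of frame start times in milliseconds from a capture. The 50 ms threshold is the rule of thumb above, and the sample timestamps are made up for illustration. Note that it measures start-to-start spacing, so a long frame will slightly inflate the gap.

```python
# Flag breaks in the stream longer than a threshold.
# Timestamps are frame start times in milliseconds (hypothetical data).

def find_gaps(timestamps_ms, threshold_ms=50):
    """Return (time_of_last_frame, gap_length) for each long silence."""
    gaps = []
    for prev, cur in zip(timestamps_ms, timestamps_ms[1:]):
        if cur - prev > threshold_ms:
            gaps.append((prev, cur - prev))
    return gaps

if __name__ == "__main__":
    # Token passes every 6 ms, then a pause of roughly 200 ms.
    sample = [0, 6, 12, 18, 220, 226]
    for start, length in find_gaps(sample):
        print("silence of", length, "ms after the frame at", start, "ms")
```

The frames just before each reported gap are the ones worth opening in the decoder: who had the token, and who was being asked a question.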
sysint
04-20-2011, 05:13 PM
Pasquales, I'm more of an antagonist if anything....
so, speaking of which what is the bacnet standard in regards to T_usage_timeout?
BACnet
04-20-2011, 05:21 PM
Pasquales, I'm more of an antagonist if anything....
so, speaking of which what is the bacnet standard in regards to T_usage_timeout?
Sys,
As you well know, that can be up to 100ms. But 101ms after being passed the token if the device doesn't do anything with it, a new one will be generated.
So if you're implying that he should look for delays longer than 100ms rather than longer than 50ms, technically that would be acceptable. But I have a hunch that the real issues on his network are at least one order of magnitude larger than that.
From the actual BACnet standard:
Tusage_timeout- The minimum time without a DataAvailable or ReceiveError event that a node must wait for a remote node to begin using a token or replying to a Poll For Master frame: 20 milliseconds. (Implementations may use larger values for this timeout, not to exceed 100ms).
Most people implement this as two distinct variables, one value specifically for pfm and one value for everything else.
PasqualeS
04-20-2011, 09:49 PM
3) You are literally swamping the network. Everything is working properly and rapidly but the front end wants to ask (using the example above's numbers as a reference) more than the 22 messages that it can ask in a second. This is less likely, but if it is the case, then bump the max_info_frames way up and the problem will go away since it can then ask more questions per second.
Thanks BACnet!!!
I increased the max_info_frames from 8 to 100 and now the BMS has been responding correctly for the past 4 hours.
Thanks HVAC-TALK.com
sysint
04-20-2011, 10:11 PM
Seems like those "substandard" controllers can work.
BACnet
04-20-2011, 10:24 PM
Thanks BACnet!!!
I increased the max_info_frames from 8 to 100 and now the BMS has been responding correctly for the past 4 hours.
Thanks HVAC-TALK.com
That's great, PasqualeS. Glad it worked out. :cheers:
But I would still sniff the network and see what it looks like now. You are now getting the messages through since the OWS clears its buffer each time around before it passes the token. But that doesn't mean that the token isn't still being mishandled when the front end does release it to travel the loop. If that is the case you might have other issues, (like perceived offline controllers), that might still crop up from time to time.
PasqualeS
05-23-2011, 09:43 PM
That's great, PasqualeS. Glad it worked out. :cheers:
But I would still sniff the network and see what it looks like now. You are now getting the messages through since the OWS clears its buffer each time around before it passes the token. But that doesn't mean that the token isn't still being mishandled when the front end does release it to travel the loop. If that is the case you might have other issues, (like perceived offline controllers), that might still crop up from time to time.
Hi BACnet,
I am here again for some help.
After adding additional nodes to the segment, the MicroTech II and III controllers
are experiencing some problems.
- The M3s are going online/offline on every poll, I would say. The units are operational.
- The M2s are making my programs crash. I am just setting 2 temperature setpoints.
I have everything necessary to do a sniff. Could you please guide me through the process? I am losing my mind on this job site with those new controllers... :gah:
Thank you very much.
BACnet
06-18-2011, 12:05 PM
PasqualeS-
How's this site working for you? Do you still have issues or did you get it sorted out?
PasqualeS
06-20-2011, 05:02 PM
How's this site working for you? Do you still have issues or did you get it sorted out?
Hi BACnet, thanks for the follow-up. I thought you were gone. :)
I increased the APDU Timeout and the online/offline problem seems to have disappeared.
I would love to get a capture of the traffic on the segment and understand who is talking and how things work. Can you post a how-to for doing a capture?
BACnet networks are not as scary as I was told, or as they seemed before I had the knowledge.
Thank you! :putergreet:
PS: Could you please PM your email, just in case…
BACnet
06-20-2011, 05:28 PM
PasqualeS- I've been using a custom home-built MS/TP sniffer for so long that I haven't sampled the field to see what else is out there in quite some time.
I did receive an executable several years ago from Steve Karg that allowed me to pipe a comm port to Wireshark, but it only worked at one baud rate, 38.4kbps. I could send you a zip of that, but it's not a clean and easy way to do it anyway.
I have since seen some programs that will sniff on a comm port and save all of the info in a pcap file so that wireshark can open it later. That seems like the best way to go, but I need to poke around a bit to remember the details/links.
I've got some projects I'm getting into tonight so it might be a few days before I report back with the current best (free) method to sniff rs485. (Others here may beat me to the punch though)
In the meantime I've updated my profile so that my email address is in it as requested. :)
// P.S. I'm glad that BACnet isn't as scary as some had led you to believe ;)
BACnet
06-21-2011, 11:19 AM
This is a very generic BACnet packet sniffing tool, but it is freeware and it is tried and true.
Simply launch the batch file and answer questions about the baud rate and comm port. The app will automatically save data in capture files that are formatted to be opened with wireshark (basically the only decoding application anyone needs anymore).
This uses MSTPcap.exe, which is in itself open source and written by Steve Karg. This won't allow you to debug in real time, but it is fairly nice given that it's free. More details can be found in the included text files.
Link (http://dl.dropbox.com/u/1841107/MSTP_Capture.zip) (Note, I've not permanently hosted this, but I will try to keep it up and clickable for a week or three).
// Note- Ctrl-C will end the capture process and gives you some neat feedback on the individual devices on the network.
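Once a capture file exists, the frame timestamps can also be pulled out with a few lines of Python for a quick look at the spacing between frames. This is only a sketch: it assumes the tool wrote a classic libpcap file (not pcapng), which I have not verified for this particular package, and the file name is whatever you pass on the command line.

```python
# Read frame timestamps from a classic libpcap capture file.
# Assumes classic pcap, not pcapng. Standard library only.
import struct
import sys

MAGICS = {
    b"\xd4\xc3\xb2\xa1": ("<", 1e6),  # little-endian, microseconds
    b"\xa1\xb2\xc3\xd4": (">", 1e6),  # big-endian, microseconds
    b"\x4d\x3c\xb2\xa1": ("<", 1e9),  # little-endian, nanoseconds
    b"\xa1\xb2\x3c\x4d": (">", 1e9),  # big-endian, nanoseconds
}

def pcap_timestamps(path):
    """Return a list of frame timestamps in seconds."""
    stamps = []
    with open(path, "rb") as f:
        header = f.read(24)  # the global header is 24 bytes
        if len(header) < 24 or header[:4] not in MAGICS:
            raise ValueError("not a classic pcap file")
        endian, divisor = MAGICS[header[:4]]
        while True:
            record = f.read(16)  # per-frame header is 16 bytes
            if len(record) < 16:
                break
            sec, frac, incl_len, _orig_len = struct.unpack(
                endian + "IIII", record)
            stamps.append(sec + frac / divisor)
            f.seek(incl_len, 1)  # skip the frame bytes themselves
    return stamps

if __name__ == "__main__" and len(sys.argv) > 1:
    times = pcap_timestamps(sys.argv[1])
    deltas = [b - a for a, b in zip(times, times[1:])]
    print(len(times), "frames")
    if deltas:
        print("longest silence: %.1f ms" % (max(deltas) * 1000))
```

Wireshark remains the right tool for decoding what each frame says; this only answers the narrower question of how long the bus went quiet.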
PasqualeS
06-21-2011, 06:51 PM
Thanks BACnet this is a good starting point for me.
I will attempt my first capture asap.
Pasquale
PasqualeS
03-25-2012, 04:39 PM
This issue has been finally fixed.
I installed biasing resistors on the Eaton devices, as per the manufacturer's schedule, and the online/offline issue disappeared.
The site has not experienced any problems for the past few months.
At the end of the story: integrating multiple vendors can become very painful.
Pasquale :CU:
crab master
03-26-2012, 01:13 PM
Thanks for the update. Integrating can be a pain but going back to basics can go a long ways, as you found out and as I too have found on other networks without terminators/resistors...