Reoccurring Siemens Insight Server and Workstation Problem
For more than a year now my coworkers and I have been running into a problem that causes Insight to close and not re-open until we do a complete database restoration. We are not sure what triggers this problem and we have not been able to get a solution from Siemens either. Let me take a moment to describe the issue:
On Insight Workstations:
I have noticed that closing Insight down and re-opening it is a good way to trigger this problem on a workstation. Once I open Insight again, I get an error message saying something along the lines of "cannot open insight, unable to locate federated database". That is not the exact error message; I am writing this from home. If need be, I can get more specific tomorrow. Once that message has come up, that is the beginning of the end. Soon after I see this on one machine, it will happen to the other workstations, and eventually something similar will happen to the server: Insight will shut itself down and will not come back until the database has been restored on the server.
It does not seem to be an Ethernet communication issue because I can ping the server from the workstations and access all shared network files on the server. The weird thing is that the server cannot access its own federated database, which is stored on its own hard disk. I hope that this is a problem that has surfaced before and has a solution. It is getting very old to have to restore the server every day. We are losing data every time it happens, and the restore essentially erases anything that we accomplished earlier in the day. If anyone can help I would really appreciate it, and I will be happy to contribute additional information if needed. We are running Insight 3.8.
Last edited by mardvard; 12-28-2007 at 11:23 AM.
Nope, not a common problem at all. I can't remember ever having a situation like you describe.
It doesn't really sound like a database issue since you've restored it many times. Did you go back to a time when you think it was stable?
Your situation is rather extreme, and so I'd jump right into extreme measures.
Really, we need to determine if it's a software or hardware issue. I'd run some really intensive diagnostic programs on the memory and the hard drive. If they come out OK, I'd uninstall Insight, reinstall and do a full setup on it and then restore a hopefully good database.
If the problem re-occurs, in the interest of getting the thing working with minimal trips to the site, I'd do this:
Get another PC and install Insight (full setup again) and do not restore a database. Just create a new database and define your panels and suck up everything out of all the panels. You will have full access to the system, you just won't have graphics, trend collection settings and a few other items. It will be a pain to try to run this way but since your problem was re-occurring every day, it shouldn't take long to determine if this is a stable setup.
If this is stable, you very likely have corruption in the old database.
The next step would be determined by the size of your system and how PC literate you are. Meaning, you could export your graphics from the old database and restore them (manually) into the new one, or you may just take the old backgrounds and re-create the dynamic portions, and then recreate the rest of the stuff that you need.
Hope that gives you an idea of where to go. This all assumes that you have had a Siemens tech check the easy things already.
Thanks a lot for your timely reply to my post. We have not had the time to focus 100% on this problem because it has been quite hectic around here. We did have a reoccurrence of the problem on Friday at about 5:00 pm and it kept us at work past midnight. Eventually, we got some help from a Siemens tech and he managed to get us up and running until today. Usually we can restore the server and get it going temporarily, but we were unable to last Friday. Apparently the reason that we could not successfully complete a system restore was that there were some corrupt journal entries on the server. When speaking with Siemens tech support, I learned that as end users and not Siemens employees, we do not have access to these files because we do not have some of the Siemens service software.
Because of the time of this issue on Friday, I did not take the time to try and pry for information from the Siemens service tech. I am not sure if the issue surrounding our inability to complete a system restore is related to my original issue or not. I plan on pursuing this to a much greater extent tomorrow. Since we are a university, we are trying to cram a year’s worth of work into a single month while the buildings are less occupied. It is difficult to get things done right now because it’s so hectic.
Considering the age of our server (approximately 4-5 years), it may indeed be time to replace it. There are some huge advances that have developed since then that we might be able to take advantage of (RAID, cheap RAM, dual and quad core processors, 64-bit support). We are entertaining the idea and should hear back soon on whether there is money available for such an upgrade. I was curious if there has been any testing of Insight on Windows Server 2003 64-bit. If it is a possibility, I would want to keep it in consideration when selecting new hardware.
Ooof, you asked a question that I don't know the answer to... the 64-bit OS? Gah, I don't know.
I do know that I just installed a new dual CPU, dual core Xeon server for a customer with RAID 0+1, dual monitors and a few other goodies. Total overkill (tho Insight can take advantage of dual\dual CPUs, it doesn't really need it often) but I do really, really like working on that machine. It is really, really fast. It is a Win2003 server.
I also hate to cut another Siemens tech's throat, but with the trouble you've had, I can't imagine not telling you where the journal files are and what to do with them. You do actually have access to them and they are your files! I thought about mentioning those in my previous post, but from your description, orphaned JNL files are most certainly a symptom and not the cause of your trouble.
Briefly, they end with the extension JNL, and the file name contains the Windows process number. If you examine the Windows OS and do not find a process # that matches the JNL file, that file is "orphaned". Delete all such files and reboot.
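The check described above (match the process number in each JNL file name against the processes Windows is actually running) could be sketched in Python roughly like this. This is only an illustration: the function name and the file names in the test are made up, and it assumes the process number is the last run of digits in the file name, which you should confirm against your own journal files before trusting it.

```python
import re


def find_orphaned_journals(journal_names, running_pids):
    """Return the JNL file names whose embedded process number is not running.

    journal_names: iterable of file names (not full paths).
    running_pids:  iterable of integer process IDs currently alive.
    """
    live = set(running_pids)
    orphans = []
    for name in journal_names:
        if not name.lower().endswith(".jnl"):
            continue  # not a journal file
        # ASSUMPTION: the process number is the last run of digits
        # in the file name (extension stripped). Verify on your system.
        digit_runs = re.findall(r"\d+", name[:-4])
        if not digit_runs:
            continue  # no process number found; leave the file alone
        if int(digit_runs[-1]) not in live:
            orphans.append(name)
    return orphans
```

The names would come from listing the database directory and the process IDs from Task Manager or the `tasklist` command. The sketch only reports candidates; it deliberately does not delete anything.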
Good luck with your problem. I'll help if I can, but I won't go further in "competing" with whatever Siemens tech\branch you've contacted.
We were running Insight on Windows 2003 Server at my previous employer. It ran on two mirrored servers, each with dual hot pluggable hard drives, tons of RAM and Intel Xeon processors. This is a very slick and very fast system. The only thing that seems to slow this system down is when someone launches a large data retrieval report into some of our slower BLN networks, which isn't the fault of the servers. When I retired they were working on speeding these BLNs up. Under normal conditions we had 4 DDC techs, all with multiple Insight applications open at the same time, working on the system, plus several operations level users logged on, and none of that seemed to slow the system down. With Windows 2003 Server you get Terminal Services, which lets you connect with a Remote Desktop to the server and Insight over the internet or intranet. I was impressed to find out that I could connect from home and run Insight faster from Remote Desktop than I could using our standard Dedicated Insight Workstation at work.
I agree completely. I also Termserv into some customers' sites over the internet. I also find that I get faster response via termserv\internet than I do with a dedicated workstation over Ethernet. I'm not really sure why; the workstations don't even do that much work.
If I were in charge of the world, I'd fix that.
Since I have begun to monitor the journal files, I have not encountered a server problem! I know that you mentioned that this is merely a symptom and not the cause, but with how busy we have been, we have not been able to take any of the drastic measures that you mentioned before. I am still waiting on an answer for the funding of a new server. We did, however, take some measures to reduce the number of panels that were on our main BLN trunk. In the past 2 weeks we reduced our main BLN from 33 panels to 5. We split them up onto 5 different Ethernet BLNs. That made quite a difference in the speed of our server. We probably snipped a couple of miles off of that old BLN. It had needed to be done for quite some time.
Unfortunately, winter vacation is over and I must go back to college. Hopefully I will be able to work 1 or 2 days a week at this during the coming semester.
It is my last semester and I have a second interview lined up for an Energy Engineer position within Siemens in Chicago. I am keeping my fingers crossed.
As of revision 3.8.1, Insight software does not currently support 64-bit operating systems.
I do not believe that there are plans to include 64-bit support in revisions 3.9 or 3.9.1... but I could be wrong.
I still don't know about the 64-bit OS either; I haven't asked the people who know.
I do know 3.9 won't run on Win2000 (or rather, it isn't supported) and I'm disappointed about that.
Originally Posted by nicholasmhughes
According to our Field Support, a 64-bit OS is neither supported nor tested.
Meh, I know this thread is kind of old but anyway... I would check the event log, both system and application logs, to find out what's going on on the server side and see whether an Insight service is crashing. I'm sure dingman will correct me if I have this wrong, but JNL files are typically cleaned up when the process ends normally, i.e. on an orderly shutdown of the process. You will get orphaned JNL files when Insight crashes, for example. Just because you can ping the server and access the shared drives doesn't mean that the services haven't crashed.
We recently had to baseline a workstation because one of our users got some type of spyware or virus that was using the Insight connection to the server to attempt to install an executable from the workstation to the server whenever the user was logged in.
And I agree with the comments about fast servers. Our main server (we have 3 discrete sites with dedicated servers) is a 2 node cluster; each node is a dual Xeon with 4 GB of RAM and about 0.75 terabyte of shared storage. We have 10 active licenses and the entire BLN and MLN is Ethernet. The storage ensures that I don't have to clean up our daily ATOMbacks more than monthly. They're up to around 2 GB a day of backup.
Welcome to Hell. Here's your copy of Metasys.
7 years after this post and my hail mary has led me here.
We are currently using Insight 3.13 (we were using Insight 3.12 previously), and this problem started about a week ago. We have been deleting the orphaned journal files as they pop up to stay ahead and stay operational. Thankfully we went right to that and never had to do the full system restores I read about in this thread. We have been in touch with Siemens the entire week, and our regular technician has visited for 2 straight days, on the phone with the IT guy trying different fixes and chasing different issues.
We moved our server one room over last week and this problem began. Of course, the blame was immediately put on the move of the server. Everything was done correctly and has been verified. We have checked our Ethernet cables with our Fluke Ethernet tester as well and they have come up clean. We did have an issue at first sending packets with the Fluke down to our switch for the workstations: we extended the Ethernet cable length from 345 ft to 415 ft, so we placed a switch to boost the signal where the server used to be, one room over. We thought this fixed things. Yesterday, after placing the switch, we tried logging in and out of workstations to cause orphaned journal files and could not get journal files to orphan. We thought we had the fix, then came in this morning, logged in and out, then back in, and had the same federated database error occur again.
We need some help from someone out there who has experienced this before. If this reaches someone and you can help, thank you in advance.
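A quick bit of arithmetic on the cable lengths mentioned above, as a small Python sketch. It assumes the runs are copper twisted-pair, where the commonly cited maximum channel length is 100 m; the post does not say what kind of cable is installed, so treat the limit as an assumption rather than a diagnosis.

```python
FEET_PER_METER = 3.28084
# ASSUMPTION: copper twisted-pair Ethernet, 100 m maximum channel length.
MAX_COPPER_RUN_M = 100.0


def feet_to_meters(feet):
    """Convert a length in feet to meters."""
    return feet / FEET_PER_METER


def over_limit(feet, limit_m=MAX_COPPER_RUN_M):
    """True if a single unbroken run of this length exceeds the limit."""
    return feet_to_meters(feet) > limit_m


# The two lengths quoted in the post, before and after the server move.
for run_ft in (345, 415):
    print(run_ft, "ft =", round(feet_to_meters(run_ft), 1), "m,",
          "over limit" if over_limit(run_ft) else "within limit")
```

Under that assumption both 345 ft (about 105 m) and 415 ft (about 126 m) are over the limit as single runs, which is consistent with needing a switch partway along to break the run into shorter segments. Each segment on either side of the switch would need to pass the same check.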
Posting here to include some additional detail on this issue as we have seen it at our organization, and also to point to another post (by BaltimoreNIH) that includes a resolution.
So, as an IT individual attempting resolution of this issue, I am including the error message as experienced in our environment to assist future searches by others.
The issue itself was resolved by information included in another thread on here with the title: Siemens Federated Databse Issues causing orphaned journal files. As I have a new account, I cannot link to it directly.
After starting the Objectivity transaction, it was impossible to locate the federated database. Check the OO_FD_BOOT environment variable and your application's access to the database files.
Subsystem: ATOM Identifier : 00000040
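The error text itself suggests two things to check: the OO_FD_BOOT environment variable and access to the database files. A minimal Python sketch of that first-pass check might look like the following. The function name is made up for illustration, and it assumes the variable holds a plain local file path; if your value is in some other form (for example with a host prefix), the existence check will not apply as written.

```python
import os


def check_boot_file(env=None):
    """Report whether OO_FD_BOOT is set and points at a readable file.

    env: mapping of environment variables (defaults to os.environ).
    Returns a short human-readable status string.
    """
    env = os.environ if env is None else env
    value = env.get("OO_FD_BOOT")
    if not value:
        return "OO_FD_BOOT is not set"
    # ASSUMPTION: the value is a plain file path on this machine.
    if not os.path.isfile(value):
        return "boot file not found: " + value
    if not os.access(value, os.R_OK):
        return "boot file not readable: " + value
    return "ok: " + value


if __name__ == "__main__":
    print(check_boot_file())
```

Run it under the same account that runs Insight, since both the environment and the file permissions can differ per user. An "ok" result only rules out the two simplest causes; it says nothing about whether the database services themselves are healthy.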