Friday, July 20, 2007

JVM Crash

In my view, a JVM crash is the most dreaded problem that could ever happen to an app server on a production machine. Unfortunately, one of our client's production machines has been crashing regularly with a certain version of the JDK at high concurrency. We looked at a few crash reports and determined that the crash appears to be happening in one particular piece of code.

Our client, being a Premium partner with Sun, was able to take it up with the Sun support team. They needed some info from our team as well, so a conference call was set up.

This is how it went:

Sun: We have looked into the crash reports, but would like to know if there is any more info you could provide.
Me: Sure. (I ended up describing the changes that we had made.)
Sun: Anything else?
Me: We might have a lot of things to say. But what is it that you are looking for?
Sun: I am just trying to get more info on the problem, as there seems to be nothing in the logs. How did you determine that it was a problem with the JDK?
Me: It crashed and produced a crash report, which we shared with you. You, being the developers of this JDK, should be able to say more about the crash and why it happened.
Sun: I do not see any specific info in the logs that you have sent.
Me: But we see that there is this crash that always happens in a compiler thread.
Sun: Oh okay, that is good info. Let me forward this to my analysis team. But can you tell me where you got this info from?
Me: Did you happen to have a look at the crash logs? It says so in the logs.

After a day, the Sun team came back with the outcome of the analysis. I thought it was impressive. The outcome was that there was a stack overflow during class-native compilation (thanks to Rajiv for taking pains to explain this compilation to me).

A few parameters were suggested. One of them was to increase CompilerThreadStackSize. We tried a couple of values, 1024 and 2048, but to no avail. We had to go back to the Sun team to report our unsuccessful attempts. So one more angle was brought into the picture: there might be some recursion in the code due to which the stack overflow was happening. Well, that sounded logical to me. But where was this happening? Since the current stack trace in the JVM crash report pointed at jvm.dll, I was convinced that it was happening somewhere in the native code of the JVM. But the Sun team differed here. We wanted to know how we could check where this was happening. (All of this was through email correspondence.)
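When trying different values for the compiler thread stack size, it helps to confirm that the running JVM actually received the setting. The sketch below is only an illustration, not what we ran at the time: it assumes the parameter is passed as the HotSpot option -XX:CompilerThreadStackSize=<value>, and the class and method names are made up for this example. It reads the JVM's startup arguments through the standard RuntimeMXBean and reports the value it finds.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: reports whether the compiler thread stack size
// option was passed to the JVM that is running this code.
public class CompilerStackSizeCheck {

    // Assumption: the option is spelled this way on the command line.
    static final String FLAG_PREFIX = "-XX:CompilerThreadStackSize=";

    // Returns the value given for the option, or null if it is absent.
    // If the option appears more than once, the last occurrence is returned.
    static String findCompilerThreadStackSize(List<String> jvmArgs) {
        String value = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith(FLAG_PREFIX)) {
                value = arg.substring(FLAG_PREFIX.length());
            }
        }
        return value;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself (not to the main method).
        List<String> jvmArgs =
                ManagementFactory.getRuntimeMXBean().getInputArguments();
        String value = findCompilerThreadStackSize(jvmArgs);
        if (value == null) {
            System.out.println("CompilerThreadStackSize was not set on the command line");
        } else {
            System.out.println("CompilerThreadStackSize=" + value);
        }
    }
}
```

Running it as java -XX:CompilerThreadStackSize=2048 CompilerStackSizeCheck should echo the value back. This only shows that the option reached the JVM; it says nothing about whether the larger stack is enough to avoid the overflow.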

The next day there was an email from the Sun team in which they provided one way to check where the stack overflow was happening.

"We would like you to capture a thread dump before the crash so that we can analyze the issue. For accuracy, it would be really good if you can capture at least 3-4 thread dumps."

Whoa!!! How do I capture a thread dump before the JVM crash? I need some real Oracle to help me out in predicting the time of crash so that I can capture the thread dump before the crash!!

Well said SUN!!!!

1 comment:

Santhosh said...

Really funny. Knowing that Sun's JVM team is one of the brightest around, it must have been some non-JVM person who had the conversation with you :)