>
> Hi,
>
> From your information it is hard to say something about it. The bug can be in FreeBSD, OpenJDK (the Oracle part or in the BSD port part), in Hadoop or in your own code running on top of Hadoop.
>
> My first idea would be to eliminate some of the possibilities.
> - Run a Linux machine with the same versions of the software.
> - Try FreeBSD 9-stable.
I will try at least FreeBSD-9 soon (never used Linux so it will take more time and not so relevant because I want to continue to use FreeBSD, not just move to Linux)
> - Try an older version of OpenJDK on FreeBSD.
I have already tried the latest versions of openjdk-6/7/8 from ports.
7 and 8 behave the same way (as I described in my original e-mail). Below is the output of jstack for openjdk7 (the java process running the tasktracker):
46897 hadoop 147 21 0 1927M 625M uwait 22 14:31 7.86% java
/tmp# jstack -l 46897
46897: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding
/tmp# jstack -F -l 46897>/tmp/jstack.out
Attaching to process ID 46897, please wait...
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.tools.jstack.JStack.runJStackTool(JStack.java:136)
at sun.tools.jstack.JStack.main(JStack.java:102)
Caused by: sun.jvm.hotspot.debugger.UnalignedAddressException: 746f705b762f4867
at sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal$1.checkAlignment(BsdDebuggerLocal.java:183)
at sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal.readCInteger(BsdDebuggerLocal.java:485)
at sun.jvm.hotspot.debugger.DebuggerBase.readAddressValue(DebuggerBase.java:454)
at sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal.readAddress(BsdDebuggerLocal.java:430)
at sun.jvm.hotspot.debugger.bsd.BsdAddress.getAddressAt(BsdAddress.java:74)
at sun.jvm.hotspot.HotSpotTypeDataBase.readVMTypes(HotSpotTypeDataBase.java:154)
at sun.jvm.hotspot.HotSpotTypeDataBase.<init>(HotSpotTypeDataBase.java:85)
at sun.jvm.hotspot.bugspot.BugSpotAgent.setupVM(BugSpotAgent.java:573)
at sun.jvm.hotspot.bugspot.BugSpotAgent.go(BugSpotAgent.java:494)
at sun.jvm.hotspot.bugspot.BugSpotAgent.attach(BugSpotAgent.java:332)
at sun.jvm.hotspot.tools.Tool.start(Tool.java:163)
at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
... 6 more
/tmp#
(jstack.out file is empty)
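A side note on the value reported in UnalignedAddressException above: every byte of 746f705b762f4867 falls in the printable ASCII range. That may hint that the serviceability agent read text where it expected a pointer, though this is only a guess and not a diagnosis. The sketch below just decodes the value; the little-endian reading assumes an amd64 host, which the thread does not state.

```python
# Address copied verbatim from the UnalignedAddressException in the trace.
ADDR_HEX = "746f705b762f4867"

raw = bytes.fromhex(ADDR_HEX)

# Bytes in the order they are printed (most significant byte first).
as_written = raw.decode("ascii")

# Bytes in the order they would sit in memory on a little-endian host
# (assumption: amd64; the e-mail does not name the architecture).
in_memory = raw[::-1].decode("ascii")

# True when every byte is a printable ASCII character.
all_printable = all(32 <= b < 127 for b in raw)

# A pointer-sized read needs 8-byte alignment; a non-zero remainder
# is what makes checkAlignment() reject the address.
misalignment = int(ADDR_HEX, 16) % 8

print(as_written, in_memory, all_printable, misalignment)
```

If the decoded characters matched a recognizable string from the process, that would strengthen the guess; as they stand they only show that the value is unlikely to be a real address.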
openjdk-6 is different: during the shuffle phase (when portions of intermediate data are copied between data nodes), the java process running the tasktracker consumes a lot of CPU (300-400%) and is often in the "vm map" state. Data transfer is very slow (1MB/sec or less on a 1GB network). With openjdk7/8 the network is utilized at about 40% (~40MB/sec), which is acceptable, though the question of why it isn't 100MB/sec still stands. So the shuffle phase is almost stuck with openjdk6. But if you wait long enough for it to finish, the tasktrackers in idle state behave as expected (they do not consume CPU). Below is the output of top(1) and jstack:
35291 hadoop 209 22 0 1922M 461M vm map 17 46.5H 336.08% java
/tmp# jstack -l 35291
35291: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding
/tmp# jstack -F -l 35291>/tmp/jstack.out
Attaching to process ID 35291, please wait...
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
at java.lang.reflect.Method.invoke(Method.java:622)
at sun.tools.jstack.JStack.runJStackTool(JStack.java:136)
at sun.tools.jstack.JStack.main(JStack.java:102)
Caused by: sun.jvm.hotspot.debugger.UnalignedAddressException: 746f705b762f4867
at sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal$1.checkAlignment(BsdDebuggerLocal.java:183)
at sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal.readCInteger(BsdDebuggerLocal.java:480)
at sun.jvm.hotspot.debugger.DebuggerBase.readAddressValue(DebuggerBase.java:454)
at sun.jvm.hotspot.debugger.bsd.BsdDebuggerLocal.readAddress(BsdDebuggerLocal.java:425)
at sun.jvm.hotspot.debugger.bsd.BsdAddress.getAddressAt(BsdAddress.java:74)
at sun.jvm.hotspot.HotSpotTypeDataBase.readVMTypes(HotSpotTypeDataBase.java:154)
at sun.jvm.hotspot.HotSpotTypeDataBase.<init>(HotSpotTypeDataBase.java:85)
at sun.jvm.hotspot.bugspot.BugSpotAgent.setupVM(BugSpotAgent.java:572)
at sun.jvm.hotspot.bugspot.BugSpotAgent.go(BugSpotAgent.java:493)
at sun.jvm.hotspot.bugspot.BugSpotAgent.attach(BugSpotAgent.java:331)
at sun.jvm.hotspot.tools.Tool.start(Tool.java:163)
at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
... 6 more
/tmp#
(/tmp/jstack.out file is empty)
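To put the shuffle throughput figures quoted above in perspective, here is a small worked calculation. It assumes the network is Gigabit Ethernet (1000 Mbit/s raw line rate), which is an interpretation of the numbers in this e-mail and not something measured; the function name and labels are made up for illustration.

```python
# Assumption: Gigabit Ethernet, i.e. 1000 Mbit/s raw line rate.
LINK_MBIT_PER_S = 1000


def utilization(observed_mb_per_s, link_mbit_per_s=LINK_MBIT_PER_S):
    """Return observed throughput as a fraction of the raw link rate."""
    link_mb_per_s = link_mbit_per_s / 8  # 8 bits per byte, so 125 MB/s
    return observed_mb_per_s / link_mb_per_s


# Rates in MB/s taken from the description of the shuffle phase.
rates = [
    ("openjdk6 shuffle", 1),
    ("openjdk7/8 shuffle", 40),
    ("rate hoped for", 100),
]

for label, rate in rates:
    print(f"{label}: {rate} MB/s = {utilization(rate):.1%} of raw line rate")
```

Under that assumption 40MB/sec is about 32% of the raw line rate (40% if measured against a practical ceiling of 100MB/sec), while 1MB/sec is under 1%.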
> - Try a very simple 'Hello world' style application on Hadoop which mimics the thread usage.
>
> Did you ever run your Hadoop application on FreeBSD before without this symptom? If so, what are the differences between then and now?
No, this is my first install of Hadoop, and I use the bundled terasort test suite (hadoop jar /usr/local/share/examples/hadoop/hadoop-examples-1.2.1.jar terasort <...>).
Since the problem is with the tasktracker (which does not run user-supplied code; it just schedules tasks and performs cleanups), it is hardly relevant which particular task I execute.
>
> On 16 June 2014, at 18:45, Ronald Klop <ronald...@klop.ws> wrote:
>
>>
>> Hi,
>>
>> From your information it is hard to say something about it. The bug can be in FreeBSD, OpenJDK (the Oracle part or in the BSD port part), in Hadoop or in your own code running on top of Hadoop.
>>
>> My first idea would be to eliminate some of the possibilities.
>> - Run a Linux machine with the same versions of the software.
>> - Try FreeBSD 9-stable.
>
> I will try at least FreeBSD-9 soon (never used Linux so it will take more time and not so relevant because I want to continue to use FreeBSD, not just move to Linux)
>
Okay, I set up a similar configuration based on FreeBSD-9/stable.
I see the same problems with both openjdk6 and openjdk7 as described in my previous e-mail.