There was a power outage at the office.
Every hadoop development machine went down..
After rebooting and starting the hadoop namenode, the error below appeared…
2013-01-03 08:30:29,803 INFO org.apache.hadoop.mapred.JobTracker: problem cleaning system directory: hdfs://namenode:9000/data1/hadoop/filesystem/mapreduce/system
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /data1/hadoop/filesystem/mapreduce/system. Name node is in safe mode.
The ratio of reported blocks 0.0000 has not reached the threshold 0.9990. Safe mode will be turned off automatically.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:1992)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:1972)
at org.apache.hadoop.hdfs.server.namenode.NameNode.delete(NameNode.java:792)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
at org.apache.hadoop.ipc.Client.call(Client.java:1066)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy5.delete(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy5.delete(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:828)
at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:234)
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2410)
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2192)
at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2186)
at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:300)
at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:291)
at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4978)
It looks like this error shows up when Hadoop was not shut down cleanly.
After an abnormal shutdown, hadoop comes back up in safe mode and stays there until enough blocks have been reported (the log says the reported ratio 0.0000 hadn't reached the 0.9990 threshold), so to get the restart going without problems I ran the command below to leave safe mode.
]$ ./bin/hadoop dfsadmin -safemode leave
Safe mode is OFF
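For reference, before forcing safe mode off it may be worth checking whether the namenode is simply still waiting for block reports. A minimal sketch with the same dfsadmin tool; the output line is what I would expect from 1.x, not captured from this cluster:

]$ ./bin/hadoop dfsadmin -safemode get     # just report the current state
Safe mode is ON
]$ ./bin/hadoop dfsadmin -safemode wait    # block until the namenode leaves safe mode on its own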
Then, while bringing the datanode back up, a different error appears this time..
2013-01-04 02:26:11,968 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid directory in dfs.data.dir: Incorrect permission for /data1/hadoop, expected: rwxr-xr-x, while actual: rwxrwxrwx
2013-01-04 02:26:12,354 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid value for volsFailed : 1 , Volumes tolerated : 0
at org.apache.hadoop.hdfs.server.datanode.FSDataset.<init>(FSDataset.java:951)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:380)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:290)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1553)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1492)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1510)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1636)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1653)
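So the datanode counts a dfs.data.dir volume with unexpected permissions as a failed volume, and since it tolerates zero failed volumes it aborts startup. As far as I know both of these knobs live in hdfs-site.xml in 1.x; a sketch with the default values, purely for illustration (property names from memory, and the actual fix below was just to chmod the directory back to 755):

<!-- hdfs-site.xml (illustrative only) -->
<property>
  <name>dfs.datanode.data.dir.perm</name>
  <value>755</value>  <!-- permission the datanode expects on each dfs.data.dir -->
</property>
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>0</value>  <!-- how many dfs.data.dir volumes may fail before the datanode gives up -->
</property>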
When I checked…
]$ cd /data1/
]$ ll
total 36
drwxr-xr-x  6 root      root       4096 2012-04-18 09:46 ./
drwxr-xr-x 25 root      root       4096 2012-02-27 21:31 ../
drwxrwxrwx  7 hadoop    hadoop     4096 2012-12-20 16:05 hadoop/
drwxr-xr-x  2 hbase     hbase      4096 2012-04-18 09:46 hbase/
drwx------  2 root      root      16384 2012-02-27 21:28 lost+found/
drwxr-xr-x  3 zookeeper zookeeper  4096 2013-01-03 08:32 zookeeper/
Huh? Why is it 777?
Changed it back to 755.
]$ chmod 755 hadoop/
]$ ll
total 36
drwxr-xr-x  6 root      root       4096 2012-04-18 09:46 ./
drwxr-xr-x 25 root      root       4096 2012-02-27 21:31 ../
drwxr-xr-x  7 hadoop    hadoop     4096 2012-12-20 16:05 hadoop/
drwxr-xr-x  2 hbase     hbase      4096 2012-04-18 09:46 hbase/
drwx------  2 root      root      16384 2012-02-27 21:28 lost+found/
drwxr-xr-x  3 zookeeper zookeeper  4096 2013-01-03 08:32 zookeeper/
Started everything up again and this time it came up cleanly!!
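Note to self: next time the machines have to go down on purpose, stopping the daemons cleanly first should avoid this whole dance. A minimal sketch with the stock 1.x scripts, assuming the usual $HADOOP_HOME/bin layout:

# stop mapreduce first, then hdfs (stop-all.sh does both)
]$ ./bin/stop-mapred.sh
]$ ./bin/stop-dfs.sh

# after the machines come back
]$ ./bin/start-dfs.sh
]$ ./bin/start-mapred.sh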