Hadoop troubleshooting: "Name node is in safe mode."

 

There was a power outage at the office.

All of our Hadoop development machines went down.

After rebooting and starting the Hadoop NameNode, the following error appeared:

2013-01-03 08:30:29,803 INFO org.apache.hadoop.mapred.JobTracker: problem cleaning system directory: hdfs://namenode:9000/data1/hadoop/filesystem/mapreduce/system
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /data1/hadoop/filesystem/mapreduce/system. Name node is in safe mode.
The ratio of reported blocks 0.0000 has not reached the threshold 0.9990. Safe mode will be turned off automatically.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:1992)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:1972)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.delete(NameNode.java:792)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
        at org.apache.hadoop.ipc.Client.call(Client.java:1066)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
        at $Proxy5.delete(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy5.delete(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:828)
        at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:234)
        at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2410)
        at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2192)
        at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2186)
        at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:300)
        at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:291)
        at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4978)

The error seems to occur when Hadoop was not shut down cleanly.

After an unclean shutdown, Hadoop stays in safe mode. Force it out with the command below so the restart can proceed without problems:

]$ ./bin/hadoop dfsadmin -safemode leave 
Safe mode is OFF
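The log line above also explains why the NameNode was refusing the delete: after the outage, no DataNode had reported its blocks yet, so the reported-block ratio was 0.0000, below the 0.9990 threshold. The exit condition can be sketched roughly as below (a simplified Python illustration, not Hadoop's actual code; in Hadoop 1.x the threshold comes from the `dfs.safemode.threshold.pct` setting, default 0.999):

```python
# Simplified sketch of the NameNode's safe-mode exit condition.
# The NameNode leaves safe mode automatically once the ratio of
# reported to total blocks reaches the configured threshold.

def safe_mode_should_exit(reported_blocks, total_blocks, threshold=0.999):
    """Return True once enough blocks have been reported."""
    if total_blocks == 0:  # empty filesystem: nothing to wait for
        return True
    ratio = reported_blocks / float(total_blocks)
    return ratio >= threshold

# Right after the power outage nothing had been reported yet,
# so the ratio was 0.0000 < 0.9990 and the NameNode stayed in safe mode.
print(safe_mode_should_exit(0, 100000))      # False
print(safe_mode_should_exit(99950, 100000))  # True (0.9995 >= 0.999)
```

So `-safemode leave` simply forces the exit instead of waiting for enough block reports to come in.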

Then, while bringing up the DataNode, a different error appeared:

2013-01-04 02:26:11,968 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid directory in dfs.data.dir: Incorrect permission for /data1/hadoop, expected: rwxr-xr-x, while actual: rwxrwxrwx
2013-01-04 02:26:12,354 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid value for volsFailed : 1 , Volumes tolerated : 0
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.<init>(FSDataset.java:951)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:380)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:290)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1553)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1492)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1510)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1636)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1653)

So I checked the directory permissions…

]$ cd /data1/
]$ ll
total 36
drwxr-xr-x  6 root      root       4096 2012-04-18 09:46 ./
drwxr-xr-x 25 root      root       4096 2012-02-27 21:31 ../
drwxrwxrwx  7 hadoop    hadoop     4096 2012-12-20 16:05 hadoop/
drwxr-xr-x  2 hbase     hbase      4096 2012-04-18 09:46 hbase/
drwx------  2 root      root      16384 2012-02-27 21:28 lost+found/
drwxr-xr-x  3 zookeeper zookeeper  4096 2013-01-03 08:32 zookeeper/

Huh? Why is it 777?

Changed it back to 755:

]$ chmod 755 hadoop/
]$ ll
total 36
drwxr-xr-x  6 root      root       4096 2012-04-18 09:46 ./
drwxr-xr-x 25 root      root       4096 2012-02-27 21:31 ../
drwxr-xr-x  7 hadoop    hadoop     4096 2012-12-20 16:05 hadoop/
drwxr-xr-x  2 hbase     hbase      4096 2012-04-18 09:46 hbase/
drwx------  2 root      root      16384 2012-02-27 21:28 lost+found/
drwxr-xr-x  3 zookeeper zookeeper  4096 2013-01-03 08:32 zookeeper/
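As the log said, the DataNode expects each `dfs.data.dir` volume to have mode `rwxr-xr-x` (755), and with zero failed volumes tolerated, a single bad volume is fatal. The check can be mimicked like this (an illustrative sketch, not Hadoop's actual DiskChecker code):

```python
import os
import stat
import tempfile

def check_volume_permission(path, expected=0o755):
    """Mimic the DataNode's check: the directory mode must match exactly."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == expected

# Demonstrate with a temporary directory standing in for /data1/hadoop.
d = tempfile.mkdtemp()
os.chmod(d, 0o777)
print(check_volume_permission(d))  # False: 777, like the failing volume
os.chmod(d, 0o755)
print(check_volume_permission(d))  # True: 755, the DataNode accepts it
```

Note that the check is an exact match, so a directory that is too permissive (777) fails just like one that is too restrictive.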

Started it again, and this time everything came up fine!
