
NameNode HA: HDFS access gives error "Operation category READ is not supported in state standby" #54

Open
juv opened this issue Jul 6, 2018 · 7 comments

Comments

@juv

juv commented Jul 6, 2018

I've set up NameNode HA. The problem is that the hadoop-hdfs-nn Kubernetes service routes my clients to either the active or the standby namenode. For example, I have a Spark History Server running with SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://hadoop-hdfs-nn.my-namespace.svc:9000/shared/spark-logs".
It fails with the error Operation category READ is not supported in state standby:

Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:278)
	at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
	at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1779)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1313)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setSafeMode(NameNodeRpcServer.java:1063)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setSafeMode(ClientNamenodeProtocolServerSideTranslatorPB.java:739)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

	at org.apache.hadoop.ipc.Client.call(Client.java:1475)
	at org.apache.hadoop.ipc.Client.call(Client.java:1412)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
	at com.sun.proxy.$Proxy10.setSafeMode(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setSafeMode(ClientNamenodeProtocolTranslatorPB.java:666)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy11.setSafeMode(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.setSafeMode(DFSClient.java:2596)
	at org.apache.hadoop.hdfs.DistributedFileSystem.setSafeMode(DistributedFileSystem.java:1223)
	at org.apache.spark.deploy.history.FsHistoryProvider.isFsInSafeMode(FsHistoryProvider.scala:673)
	at org.apache.spark.deploy.history.FsHistoryProvider.isFsInSafeMode(FsHistoryProvider.scala:666)
	at org.apache.spark.deploy.history.FsHistoryProvider.initialize(FsHistoryProvider.scala:159)
	at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:156)
	at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:78)
... 6 more

I guess that the Spark History Server has accessed the k8s service and got routed to the standby namenode instead of the active namenode. Have you thought about this problem @kimoonkim? I have an idea to solve this:

  1. add another service, hadoop-hdfs-active-nn
  2. have it select only the active namenode pod
  3. run another lightweight pod that keeps track of which namenode is active
  4. point the label selector at the new active namenode whenever the active namenode switches

Then let all clients connect to the hadoop-hdfs-active-nn service instead of hadoop-hdfs-nn. I am not sure whether it is possible to change a service's label selector from within a pod, though...
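
A rough sketch of what that tracker pod could do, assuming the HDFS client config for the nameservice is available inside the pod and it has RBAC permission to label pods. The namenode IDs (nn0/nn1), pod names, nn-state label key, and polling interval below are made up for illustration:

#!/usr/bin/env bash
# Poll both namenodes and label the currently active one so that a service
# selecting on nn-state=active only routes traffic to it.
while true; do
  for i in 0 1; do
    state=$(hdfs haadmin -getServiceState "nn$i" 2>/dev/null)
    if [ "$state" = "active" ]; then
      kubectl label pod "hdfs-namenode-$i" nn-state=active --overwrite
    else
      kubectl label pod "hdfs-namenode-$i" nn-state=standby --overwrite
    fi
  done
  sleep 10
done

The hadoop-hdfs-active-nn service would then include nn-state=active in its selector, so only the currently active namenode receives client traffic.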

@vvbogdanov87
Contributor

You have to configure the client to work with HDFS in HA mode yourself. You can get the params from hdfs-site.xml:
kubectl describe configmap hdfs-config
Some apps can read the params from the XML config; just put hdfs-site.xml in /etc/hadoop/conf. Other apps can be configured using flags. I use the following syntax to launch Spark (2.3.1) jobs on k8s:

spark-submit ... \
    --conf spark.hadoop.fs.defaultFS="hdfs://hdfs-k8s" \
    --conf spark.hadoop.fs.default.name="hdfs://hdfs-k8s" \
    --conf spark.hadoop.dfs.nameservices="hdfs-k8s" \
    --conf spark.hadoop.dfs.ha.namenodes.hdfs-k8s="nn0,nn1" \
    --conf spark.hadoop.dfs.namenode.rpc-address.hdfs-k8s.nn0="hdfs-namenode-0.hdfs-namenode.default:8020" \
    --conf spark.hadoop.dfs.namenode.rpc-address.hdfs-k8s.nn1="hdfs-namenode-1.hdfs-namenode.default:8020" \
    --conf spark.hadoop.dfs.client.failover.proxy.provider.hdfs-k8s="org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider" \

Homegrown apps can be configured via code changes: https://stackoverflow.com/a/35911455
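
For reference, the same HA client settings in XML form would look roughly like this (a sketch reusing the hdfs-k8s/nn0/nn1 names from the flags above; adjust hostnames and ports to your deployment). fs.defaultFS goes in core-site.xml, the rest in hdfs-site.xml:

<!-- core-site.xml -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hdfs-k8s</value>
</property>

<!-- hdfs-site.xml -->
<property>
  <name>dfs.nameservices</name>
  <value>hdfs-k8s</value>
</property>
<property>
  <name>dfs.ha.namenodes.hdfs-k8s</name>
  <value>nn0,nn1</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.hdfs-k8s.nn0</name>
  <value>hdfs-namenode-0.hdfs-namenode.default:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.hdfs-k8s.nn1</name>
  <value>hdfs-namenode-1.hdfs-namenode.default:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.hdfs-k8s</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>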

@maver1ck

I did the same thing, mounting core-site.xml and hdfs-site.xml into the Spark app as secrets.
The options you gave didn't work.

@xianliangz

We have a similar idea to @juv's, and I am implementing it. Basically, we create a ZooKeeper watcher to get notified of events on the znode /hadoop-ha/ns/ActiveStandbyElectorLock, which tells us which NameNode is active. We then label the active NameNode pod so that the k8s service routes requests only to the active NameNode. This way, clients don't need to retry to figure out which one is active.
I would like to get feedback on this solution from experts.
Thank you!

@dennischin

@maver1ck were you able to get this working with configmaps? I am trying with configmaps as well and failing.

My HDFS HA cluster name is not resolvable, even though the core-site.xml and hdfs-site.xml files are properly mounted from configmaps.

Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: hdfs-k8s
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1866)
at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:721)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:496)
at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:811)
at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:803)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:130)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:803)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:375)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: hdfs-k8s

@vvbogdanov87
Contributor

vvbogdanov87 commented Sep 20, 2019

@dennischin Can you connect to the hdfs-client pod and browse files on HDFS?

kubectl get pod -l=app=hdfs-client
kubectl exec -ti <podname> bash
hadoop fs -ls /
hadoop fs -put /local/file /

You can find the configuration on the hdfs-client pod under /etc/hadoop-custom-conf/.

@dennischin

@vvbogdanov87 yes, I can interact with HDFS (outside of Spark) without issue.

@geekyouth

vim spark-env.sh

export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=30 -Dspark.history.fs.logDirectory=hdfs://mycluster/spark-job-log"
YARN_CONF_DIR=/opt/hadoop-2.7.2-ha/etc/hadoop
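
Presumably the logical URI hdfs://mycluster only resolves if the HA client config defining that nameservice is visible to the History Server, for example by also pointing HADOOP_CONF_DIR at the same directory:

export HADOOP_CONF_DIR=/opt/hadoop-2.7.2-ha/etc/hadoop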
