Failed to invoke a custom component #486
1. Confirm that the routes between the parties are healthy: run `kubectl get dr -A` on one party; the other party should show up as authenticated.
2. Check the node with `kubectl describe node` (see the sketch after this list).
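A minimal sketch of the two checks suggested above; both are standard kubectl commands, and the comments describe what to look for.

```shell
# List Kuscia DomainRoute objects across all namespaces; each peer party
# should appear here with a status indicating the route is established
# and authenticated.
kubectl get dr -A

# Inspect the node: the Conditions section, the Non-terminated Pods table,
# and Allocated resources reveal stuck pods or exhausted capacity.
kubectl describe node
```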
Name:                 idata-kuscia-autonomy-com2023011620063473637
Conditions:
  NetworkUnavailable  False  Tue, 19 Nov 2024 14:11:58 +0800  Tue, 19 Nov 2024 14:11:58 +0800  RouteCreated  RouteController created a route
Non-terminated Pods:
  com2023011620063473637  serving-2024111117494129039-7499b7d4f4-twmqc  0 (0%)  0 (0%)  0 (0%)  0 (0%)  55d
Allocated resources:
  cpu  100m (0%)  32 (200%)
The task error message returned by the API is:
Clean up the internal pods with `kubectl delete pod <pod name>`, then retry the task (a sketch follows).
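A minimal sketch of that cleanup step; `<pod-name>` and `<namespace>` are placeholders to fill in from the first command's output.

```shell
# Find the task pod that is stuck or holding stale resources.
kubectl get pods -A

# Delete it so the controller can recreate it, then retry the job.
kubectl delete pod <pod-name> -n <namespace>
```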
Hi, after cleaning up the pods the problem is still not resolved. The current situation: suppose there are three parties a, b, and c. I start a task on a; a's status is Running while b and c are both AwaitingApproval. Calling the approve API to clear AwaitingApproval returns success, but the status does not change. Querying a via the query API still shows the error: domain [com2023011620060497797,com2023011620072311738] can not reserve resources for pods
Moreover, a task between any two of a, b, and c runs fine and never gets stuck awaiting approval; only the three-party case ends up in this state.
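For reference, the approve/query calls described above look roughly like the sketch below. The endpoint paths, request fields, and TLS options are assumptions about the shape of the Kuscia HTTP job API and vary by version and deployment; treat them as placeholders and check the API reference for your release.

```shell
# HYPOTHETICAL sketch: approve the pending job on party b (repeat on c).
# Endpoint path, body fields, and auth are assumptions, not the verified API.
curl -k -X POST "https://<party-b-kuscia-api>/api/v1/job/approve" \
  -H "Content-Type: application/json" \
  --cert client.crt --key client.key \
  -d '{"job_id": "<job-id>", "result": 1}'

# HYPOTHETICAL sketch: query the job from party a to read per-party status
# and the error message quoted above.
curl -k -X POST "https://<party-a-kuscia-api>/api/v1/job/query" \
  -H "Content-Type: application/json" \
  --cert client.crt --key client.key \
  -d '{"job_id": "<job-id>"}'
```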
Could you share the logs from all three parties? /home/kuscia/var/logs/envoy holds Kuscia's inbound/outbound traffic logs.
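A small sketch for collecting those logs on each party; the exact file names under the envoy directory depend on the deployment, so list the directory first.

```shell
# See which envoy log files exist on this party.
ls /home/kuscia/var/logs/envoy/

# Grab the most recent entries from each traffic log to attach to the issue.
tail -n 200 /home/kuscia/var/logs/envoy/*.log
```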
k3s.log:
I0108 10:57:49.963055 27 pathrecorder.go:248] kube-aggregator: "/apis/kuscia.secretflow/v1alpha1/domains/com2023011620072311738" satisfied by prefix /apis/kuscia.secretflow/v1alpha1/
kuscia.log |
Could you share the route configuration on the task initiator's side?
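One way to dump that configuration, using the same DomainRoute (`dr`) resource referenced earlier in this thread; `<route-name>` and `<namespace>` are placeholders taken from the first command's output.

```shell
# List the DomainRoute objects on the initiator.
kubectl get dr -A

# Export a specific route's full configuration as YAML to share here.
kubectl get dr <route-name> -n <namespace> -o yaml
```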
That 11739 one is not among the three parties a, b, and c; it is a node that has already been shut down.
2025-01-08 14:18:16.315 INFO queue/queue.go:124 Finish processing item: queue id[domain-controller], key[com2023011620072311738] (102.389µs) |
I went through the collaborators' logs carefully and found the cause of the error: the task to be executed cannot be found. Both collaborating parties fail for the same reason.
2025-01-08 14:36:40.394 INFO queue/queue.go:124 Finish processing item: queue id[taskresourcegroup-controller], key[jianew-job-psi] (380.575µs) |
And this is the initiator's log: it keeps waiting for the collaborators to reserve pods.
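Given the taskresourcegroup-controller entries above, checking the TaskResourceGroup on each party shows which side never reserved its pods. This is a sketch; the resource's registered name may differ across Kuscia versions, so verify it with `kubectl api-resources` first.

```shell
# Confirm how the TaskResourceGroup CRD is registered on this cluster.
kubectl api-resources | grep -i taskresource

# List task resource groups and inspect the one for the failing job
# (jianew-job-psi, per the controller logs above) to see its per-party
# reservation status.
kubectl get taskresourcegroup -A
kubectl describe taskresourcegroup jianew-job-psi
```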
|
Found the root cause: it was a Kuscia version issue. Upgrading from 0.8 to 0.13 fixed it, so it seems the older version has problems running multi-party tasks.
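For anyone hitting the same symptom, here is a quick way to confirm which Kuscia version each party is actually running, assuming the standard Docker-based deployment; the `kuscia version` subcommand is an assumption and may not exist in every build.

```shell
# Check the image tag of each running Kuscia container (image tags normally
# encode the release, e.g. ...:0.13.0).
docker ps --format '{{.Names}}\t{{.Image}}' | grep -i kuscia

# ASSUMPTION: if the binary exposes a version subcommand, this prints it;
# check `kuscia --help` on your build first.
docker exec -it <kuscia-container> kuscia version
```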
Issue Type: Api Usage
Search for existing issues similar to yours: Yes
Kuscia Version: 0.8.0b0
Link to Relevant Documentation: No response
Question Details