Infinite loop and latch contention #456

Open
4 tasks
savaresejt opened this issue Oct 29, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@savaresejt

Development environment used

  • Z Open Editor version:
  • Editor Platform
    • [ x ] Visual Studio Code
    • Red Hat CodeReady Workspaces
    • Eclipse Che
    • Standalone Theia
  • Editor Platform Version:
  • Operating System (on which VS Code runs such as Windows 10 2004, otherwise name and version of platform such as OpenShift v4.3):
  • Java Version (when using VS Code or Theia, execute java -version and paste the details here):
  • Related to RSE API?
    • RSE API Plugin version:
    • Zowe CLI version: v2.15.0
    • Node.js version: v18.20.4
  • Logs attached (see here how to get them): yes/no

Problem Description

Detailed steps for reproducing the problem:

  1. First step

Observed behavior

We noticed that a user had over 15,000 dead address spaces. When we contacted the user, we saw a very large number of GET requests coming from their Z Open Editor. We determined that the editor was looking for a copybook that does not exist on the LPAR. After we created the missing copybook, the requests stopped.

Afterwards, we recreated the issue by removing the copybook again and watched the editor go back into an infinite loop. This is related to issue #445.

This is a problem with Z Open Editor interacting with z/OSMF. We explored the recommended system settings in the tuning blog; however, we feel that a single missing copybook should not result in 15,000 dead address spaces. A single lookup would seem to be the appropriate behavior.

Expected behavior

We would expect a failed lookup not to be retried until the user does something explicit, such as clicking on the include.
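To make the expected behavior concrete, here is a minimal TypeScript sketch of a "not found" cache: once a lookup for an include has failed everywhere, the name is remembered and no further requests are sent until an explicit user action clears the entry. The class `NotFoundCache`, its methods, and the case-insensitive naming are illustrative assumptions, not Z Open Editor's actual implementation.

```typescript
// Hypothetical sketch: remembers includes that could not be found so they
// are not requested again until the user explicitly asks for a retry.
class NotFoundCache {
  private readonly missing = new Set<string>();

  // Assumption: member names are compared case-insensitively.
  private key(name: string): string {
    return name.toUpperCase();
  }

  shouldFetch(name: string): boolean {
    return !this.missing.has(this.key(name));
  }

  recordNotFound(name: string): void {
    this.missing.add(this.key(name));
  }

  // Intended to be called only from an explicit user action
  // (e.g. a hover or a command), never from a background timer.
  retry(name: string): void {
    this.missing.delete(this.key(name));
  }
}

// Usage: after one failed lookup, the same copybook is not fetched again.
const cache = new NotFoundCache();
console.log(cache.shouldFetch("DATACPY")); // true
cache.recordNotFound("DATACPY");
console.log(cache.shouldFetch("datacpy")); // false
cache.retry("DATACPY");
console.log(cache.shouldFetch("DATACPY")); // true
```

The design choice is that only a user action can clear an entry, so an idle editor with an open file generates no repeated requests for a missing copybook.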

@phaumer
Member

phaumer commented Oct 29, 2024

Did the ticket you opened with z/OSMF tech support not help? They pinged me for background and I pointed them to these issues and described the problem to them once again. I will send them an email to see if they can continue to help.

@phaumer phaumer added the question Further information is requested label Oct 29, 2024
@savaresejt
Author

We believe there are two issues right now.

One is that the Z Open Editor client is making far too many requests. We don't think it is good behavior that, while an engineer sits idle with a data set open, tens of thousands of requests go out and repeatedly fail to retrieve includes. Instead, it should probably not retry at all unless the developer hovers over the include or does something similar.

The second is that we don't believe z/OSMF should deadlock the system when there are errors.

We opened a ticket with them and sent them logs for the second issue.

@phaumer
Member

phaumer commented Oct 29, 2024

It will stop downloading when you close the file.

If a file is not found, the editor stops looking for it after it has searched all the property group locations. Can you provide log files showing it being requested repeatedly?
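The search described above (try each property group location in order, stop at the first hit, give up once every location has been tried) can be sketched in TypeScript as follows. This is a minimal illustration, not the editor's code: `resolveInclude`, the `Lookup` callback, and the data set names `TEAM.COPYLIB` and `USER.COPYLIB` are hypothetical, and the lookup is synchronous for brevity although real remote calls are asynchronous.

```typescript
// Hypothetical sketch of an ordered include search. `locations` stands in for
// the property group search order; `exists` stands in for one remote lookup.
type Lookup = (location: string, name: string) => boolean;

function resolveInclude(
  name: string,
  locations: readonly string[],
  exists: Lookup,
): string | undefined {
  for (const location of locations) {
    // One request per location; the first hit wins.
    if (exists(location, name)) {
      return location;
    }
  }
  // Every location was tried exactly once: report "not found" and stop.
  return undefined;
}

// Usage with an in-memory stand-in for the remote system.
const available = new Set(["USER.COPYLIB(ACCTCPY)"]);
const calls: string[] = [];
const lookup: Lookup = (location, name) => {
  calls.push(`${location}(${name})`);
  return available.has(`${location}(${name})`);
};

const order = ["TEAM.COPYLIB", "USER.COPYLIB"];
console.log(resolveInclude("ACCTCPY", order, lookup)); // USER.COPYLIB
console.log(resolveInclude("DATACPY", order, lookup)); // undefined
console.log(calls.length); // 4
```

A missing copybook therefore costs one request per configured location, which is why thousands of requests for a single open member point to the search being restarted rather than to the search order itself.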

(Note, there is a logging issue with local files where it shows "Looking for local or down" multiple times, but they will all show the same request id. We will fix that, but they are not repeated requests to a remote server.)

We have to download all the files or we will not be able to parse the COBOL program completely to show syntax errors, outline view etc.

@savaresejt
Author

Where does Z Open Editor store the logs of outbound requests, and where can I upload them? I will contact the engineer and get those to you.

When you say it downloads all the files, you mean the member the engineer is viewing plus its includes, copybooks, etc., correct? Not the entire PDS?

What we are seeing in SDSF is the result of 16,000 requests to z/OSMF from having one PDSE member open; that member has nowhere near that many includes.

@phaumer
Member

phaumer commented Oct 29, 2024

Z Open Editor has a log that can be switched to the DEBUG level to provide detailed output on how it tries to resolve include files: it shows all outgoing requests and Not Found errors as they happen, and whether it then continues searching in other locations or stops searching. See details here: https://ibm.github.io/zopeneditor-about/Docs/locating_local_client_logs.html

Yes, we only search for the include files that are used by a program currently opened in the editor: the language server tries to parse the program, finds a copy or include statement and then asks the editor to fetch it. The editor then uses the search order as defined in the zapp file. Our integration tests run with the default 5 parallel requests and can resolve programs with 1000 (small) copybooks from MVS in under a minute.

As mentioned in the other issue, we have settings to control parallel execution of these requests, as each concurrent request will create a new address space; once finished, the space should be reused by the next request. If that does not happen, then z/OSMF tech support needs to help.
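Conceptually, a parallel-request limit is a bounded-concurrency queue: at most N requests are in flight (5 by default, according to the comment above), and each finished request hands its slot to the next queued one. Below is a minimal TypeScript sketch, assuming each remote fetch is a function returning a `Promise`; `RequestLimiter` is a hypothetical name, not the editor's internal class or setting.

```typescript
// Hypothetical sketch: caps how many remote requests run at the same time.
class RequestLimiter {
  private active = 0;
  private readonly queue: Array<() => void> = [];

  constructor(private readonly maxParallel: number) {}

  get running(): number {
    return this.active;
  }

  get waiting(): number {
    return this.queue.length;
  }

  run<T>(task: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const start = (): void => {
        this.active++;
        let pending: Promise<T>;
        try {
          pending = task();
        } catch (err) {
          pending = Promise.reject(err);
        }
        pending.then(resolve, reject).finally(() => {
          // Free the slot and hand it to the oldest queued request.
          this.active--;
          const next = this.queue.shift();
          if (next) next();
        });
      };
      // Start immediately if a slot is free, otherwise wait in FIFO order.
      if (this.active < this.maxParallel) start();
      else this.queue.push(start);
    });
  }
}

// Usage: 7 requests with a limit of 5, so 5 start now and 2 wait.
const limiter = new RequestLimiter(5);
const finishers: Array<() => void> = [];
for (let i = 0; i < 7; i++) {
  limiter.run(() => new Promise<void>((done) => finishers.push(() => done())));
}
console.log(limiter.running, limiter.waiting); // 5 2
```

Calling any of the stored `finishers` completes a request; once its promise callbacks run, a waiting request takes the freed slot. In this model the number of concurrent requests, and hence of address spaces in use at once, never exceeds the limit.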

@phaumer phaumer added bug Something isn't working and removed question Further information is requested labels Nov 6, 2024
@phaumer
Member

phaumer commented Nov 6, 2024

We are turning this item into a bug because we found an issue with the listBeforeDownload setting: it ran the list operation only for data set members, not for the data sets themselves first.
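The fix described here is about ordering: with listBeforeDownload enabled, the data set itself is listed first, then the member, and the member is downloaded only if both exist, so a missing data set or member is detected by a list request rather than by a download that fails. A minimal TypeScript sketch of that order follows; the `MvsClient` interface and `fetchMember` function are hypothetical, synchronous for brevity, and do not correspond to actual z/OSMF or RSE API calls.

```typescript
// Hypothetical client interface; method names are illustrative only.
interface MvsClient {
  dataSetExists(dsn: string): boolean;
  memberExists(dsn: string, member: string): boolean;
  download(dsn: string, member: string): string;
}

// Returns the member contents, or undefined if the data set or member is missing.
function fetchMember(
  client: MvsClient,
  dsn: string,
  member: string,
  listBeforeDownload: boolean,
): string | undefined {
  if (listBeforeDownload) {
    // List the data set first; skip the member list and download if it is absent.
    if (!client.dataSetExists(dsn)) return undefined;
    if (!client.memberExists(dsn, member)) return undefined;
  }
  return client.download(dsn, member);
}

// Usage: an in-memory client that records each request it receives.
const log: string[] = [];
const client: MvsClient = {
  dataSetExists: (dsn) => {
    log.push(`list ${dsn}`);
    return dsn === "USER.COPYLIB";
  },
  memberExists: (dsn, m) => {
    log.push(`list ${dsn}(${m})`);
    return m === "ACCTCPY";
  },
  download: (dsn, m) => {
    log.push(`get ${dsn}(${m})`);
    return `contents of ${dsn}(${m})`;
  },
};

fetchMember(client, "NO.SUCH.LIB", "DATACPY", true);
console.log(log.join(", ")); // list NO.SUCH.LIB
```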

@TommyTechh

TommyTechh commented Nov 12, 2024

I'm just commenting to let you know that this is also a problem we encounter.

Additionally, we noticed that when our users edit the copybook names in a file, the editor starts looking for the copybook before the user has stopped typing. For example, suppose the user wants to type "COPY DATACPY".
As soon as they finish typing COPY, if there is another word after COPY on a following line, the editor starts looking for that word as a copybook. After that, it looks for a copybook called D, then DA, then DAT, then DATA, and so on, once for every character typed.
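One common way to avoid a lookup per keystroke is to debounce: request a name only after it has stopped changing for a quiet period. The thread notes that this delay is still being discussed internally, so the following TypeScript sketch is purely illustrative; the class `LookupDebouncer` and the 500 ms value are assumptions. Time is passed in explicitly to keep the behavior easy to follow; an editor would call `poll` from a timer.

```typescript
// Hypothetical trailing-edge debouncer for include lookups.
class LookupDebouncer {
  private pendingName: string | undefined;
  private lastEditMs = 0;

  constructor(private readonly quietMs: number) {}

  // Called on every edit; only the latest name is kept.
  onEdit(name: string, nowMs: number): void {
    this.pendingName = name;
    this.lastEditMs = nowMs;
  }

  // Returns a name to look up once edits have paused for quietMs, else undefined.
  poll(nowMs: number): string | undefined {
    if (this.pendingName === undefined || nowMs - this.lastEditMs < this.quietMs) {
      return undefined;
    }
    const name = this.pendingName;
    this.pendingName = undefined;
    return name;
  }
}

// Usage: typing "DATACPY" one character every 100 ms triggers a single lookup.
const debouncer = new LookupDebouncer(500);
const typed = "DATACPY";
for (let i = 1; i <= typed.length; i++) {
  debouncer.onEdit(typed.slice(0, i), i * 100);
}
console.log(debouncer.poll(800)); // undefined (only 100 ms since the last keystroke)
console.log(debouncer.poll(1200)); // DATACPY
console.log(debouncer.poll(1300)); // undefined (nothing pending)
```

The intermediate names D, DA, DAT, and so on are overwritten before the quiet period expires, so they never reach the remote system.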

We have also seen it loop when it is unable to find a copybook.
As a result, it continuously creates tasks until we close VS Code.

We will see if listBeforeDownload and adjusting the parallel requests will help, but would using RSE also fix this issue?

@phaumer
Member

phaumer commented Nov 12, 2024

Thanks @TommyTechh. We are fixing the data set issue. How to configure the delay before a file is requested, and when the language server should give up requesting it, is something we need to discuss internally first.

The listBeforeDownload setting was added mainly for z/OSMF. RSE API logs differently and a 404 is not much of a problem there.
