Sooner or later you will run into problems that require digging a little bit deeper.
This document will guide you through some of the common issues Tile Developers run into and troubleshooting suggestions.
It is important to become familiar with the BOSH CLI and using SSH to connect to the Ops Manager VM and authenticate with the BOSH Director. Instructions on that can be found in Advanced Troubleshooting with BOSH.
-
Suggestion 1: There is a more meaningful error hiding in there! Further error logs can be found by ssh-ing into the Ops Manager VM (Instructions: Advanced Troubleshooting with BOSH) and view the Ops Manager logs:
less /var/log/opsmanager/production.log
-
Suggestion 2: SSH to Ops Manager VM (Instructions: Advanced Troubleshooting with BOSH) and run command
bosh tasks --recent
, look for the relevant (probably failed) task. Run commandbosh tasks <id> --debug
for verbose logs.
Issue: BOSH Errand in my deployment fails, and then deletes the VM where Errand is run, so I'm having trouble troubleshooting or look into logs.
- Error:
Errand <errand-name> completed with error (exit code 1)
- Suggestion: Run the errand manually and use the
--keep-alive
flag. From the Ops Manager VM, runbosh -d <deployment-id> run-errand <errand-name> --keep-alive
. Then you can use the bosh CLI tobosh ssh
into the Errand VM and view logs in/var/vcap/sys/log
. View the BOSH documentation on BOSH SSH and BOSH Errands for more info.
Issue: BOSH Deployment fails because there is a problem in one of your job start or pre-start scripts.
- Error:
Error: <job name> is not running after update. review logs for failed jobs: <job name>
- Error:
Action Failed get_task: Task <id> result: 2 of 3 pre-start scripts failed. Failed Jobs: <job-name>, <job-name>. Successful Jobs: <job-name>
- Suggestion 1: Use the bosh CLI to
bosh ssh
Instructions: Advanced Troubleshooting with BOSH) into the VM and view logs in/var/vcap/sys/log/<job-name>/.log
. View the BOSH documentation on Job Logs for more info. - Suggestion 2: Another option is to run the failing job start script (often
start.erb
) directly on the VM. SSH into the VM withbosh ssh
, and run the start script - found in/var/vcap/jobs/<job-name>/bin/<start-script>
. While on the VM, check to make sure your files have been set up correctly.
Issue: Operations Manager Web UI has crashed due after staging your Tile (clicking plus button) due to something wrong with Tile Metadata
- Suggestion 1: Use the OM tool to call the Ops Manager API and use the
om unstage-product
command to unstage your Tile. The UI should now be accessible. often the reason for this is a bosh add on with no form
- Error:
Error: Action Failed ssh: Getting host public key: OpenSSH is not running: sshd service not running and start type is disabled. To enable ssh on Windows you must run the enable_ssh job from the windows-utilities-release.
- Suggestion: You need to augment your deployment manifest with the
windows-utilities
that enable SSH for Windows VMs:
In your deployment-manifest.yml
under
instance_groups:
jobs:
add :
- name: enable_rdp
release: windows-utilities
consumes: {}
provides: {}
properties:
enable_rdp:
enabled: true
- name: enable_ssh
release: windows-utilities
consumes: {}
provides: {}
properties:
enable_ssh:
enabled: true
under releases:
add:
- name: windows-utilities
version: latest
Redeploy, then you will now be able to ssh into the VM with bosh ssh
.
Here are some more resources on troubleshooting general PCF issues (not specific for Tile Dev):
Please feel free to add to this document with issues you have faced turing Tile Development and suggested troubleshooting steps.