---
title: Troubleshooting Deployment Problems
owner: Ops Manager
---
This guide provides assistance with resolving issues encountered when deploying products with <%= vars.ops_manager %>.
## <a id='retry'></a> Retrying the Deployment
An installation or update can fail for many reasons, but the system is self-healing, and can often
automatically correct or work around hardware or network faults.
Click **Install** or **Review Pending Changes**, then **Apply Changes** again, and the system may
resolve a problem on its own.
Some failures only produce generic errors like `Exited with 1`. In cases like this, where a failure
is not accompanied by useful information, click **Install** or **Review Pending Changes**, then
**Apply Changes** to retry.
When the system does provide informative evidence, review the [Common Issues](#common_issues)
section below to see if your problem is covered.
## <a id='common_issues'></a>Common Issues ##
Compare evidence that you have gathered to the descriptions below. If your issue is covered, try
the recommended remediation procedures.
### <a id='reinstall_bosh'></a>BOSH Does Not Reinstall ###
You might want to reinstall BOSH for troubleshooting purposes.
However, if <%= vars.platform_name %> does not detect any changes, BOSH does not reinstall.
To force a reinstall of BOSH, select **BOSH Director > Resource Sizes**
and change a resource value.
For example, you could increase the amount of RAM by 4 MB.
### <a id='time_out'></a>Creating Bound Missing VMs Times Out ###
This task happens immediately following package compilation, but before job assignment to agents.
For example:
<pre class="terminal">
cloud_controller/0: Timed out pinging to f690db09-876c-475e-865f-2cece06aba79 after 600 seconds (00:10:24)
</pre>
This is most likely a NATS issue with the VM in question.
To identify a NATS issue, inspect the agent log for the VM.
Since the BOSH director is unable to reach the BOSH agent, you must access the
VM using another method.
You will likely also be unable to access the VM using TCP.
In this case, access the VM using your virtualization console.
To diagnose:
1. Access the VM using your virtualization console.
1. To log in, navigate to the **Credentials** tab of the
<%= vars.app_runtime_full %> (<%= vars.app_runtime_abbr %>) tile and locate the VM in question to
find the **VM credentials**.
1. Become root.
1. Run `cd /var/vcap/bosh/log`.
1. Open the file `current`.
1. First, determine whether the BOSH agent and director have successfully
completed a handshake, represented in the logs as a "ping-pong":
<pre class="terminal">
2013-10-03_14:35:48.58456 #[608] INFO: Message: {"method"=>"ping", "arguments"=>[],
"reply_to"=>"director.f4b7df14-cb8f.19719508-e0dd-4f53-b755-58b6336058ab"}
2013-10-03_14:35:48.60182 #[608] INFO: reply_to: director.f4b7df14-cb8f.19719508-e0dd-4f53-b755-58b6336058ab:
payload: {:value=>"pong"}
</pre>
This handshake must complete for the agent to receive instructions from the director.
1. If you do not see the handshake, look for another line near the beginning of
the file, prefixed `INFO: loaded new infrastructure settings`.
For example:
<pre class="terminal">
2013-10-03_14:35:21.83222 #[608] INFO: loaded new infrastructure settings:
{"vm"=>{"name"=>"vm-4d80ede4-b0a5-4992-aea6a0386e18e", "id"=>"vm-360"},
"agent_id"=>"56aea4ef-6aa9-4c39-8019-7024ccfdde4",
"networks"=>{"default"=>{"ip"=>"192.0.2.19",
"netmask"=>"255.255.255.0", "cloud_properties"=>{"name"=>"VMNetwork"},
"default"=>["dns", "gateway"],
"dns"=>["192.0.2.2", "192.0.2.17"], "gateway"=>"192.0.2.2",
"dns_record_name"=>"0.nats.default.cf-d729343071061.microbosh",
"mac"=>"00:50:56:9b:71:67"}}, "disks"=>{"system"=>0, "ephemeral"=>1,
"persistent"=>{}}, "ntp"=>[], "blobstore"=>{"provider"=>"dav",
"options"=>{"endpoint"=>"http://192.0.2.17:25250",
"user"=>"agent", "password"=>"agent"}},
"mbus"=>"nats://nats:nats@192.0.2.17:4222",
"env"=>{"bosh"=>{"password"=>"$6$40ftQ9K4rvvC/8ADZHW0"}}}
</pre>
This is a JSON blob of key-value pairs representing the expected infrastructure
for the BOSH agent.
For this issue, the following key-value pair is the most important:
`"mbus"=>"nats://nats:nats@192.0.2.17:4222"`
This key-value pair represents where the agent expects the NATS server to be.
One diagnostic tactic is to try pinging this NATS IP address from the VM to
determine whether you are experiencing routing issues.
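For example, from the console session on the affected VM, you can check both ICMP reachability and TCP connectivity to the NATS port. This is a minimal sketch using the `mbus` values from the settings above; the IP address and port come from your own agent log, and `nc` may not be installed on every stemcell:
<pre class="terminal">
$ ping -c 3 192.0.2.17
$ nc -zv 192.0.2.17 4222
</pre>
If the ping succeeds but the TCP connection is refused or times out, look for a firewall or security group rule blocking the NATS port.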
### <a id='RSA-cert'></a>Install Exits With a Creates/Updates/Deletes App Failure or With a 403 Error ###
**Scenario 1:**
Your <%= vars.platform_name %> install exits with the following 403 error when you attempt to
log in to the Apps Manager:
<pre>
{"type": "step_finished", "id": "apps-manager.deploy"}
/home/tempest-web/tempest/web/vendor/bundle/ruby/1.9.1/gems/mechanize-2.7.2/lib/mechanize/http/agent.rb:306:in
`fetch': 403 => Net::HTTPForbidden for https://login.api.example.net/oauth/authorizeresponse_type=code&client_id=portal&redirect_uri=https%3...
-- unhandled response (Mechanize::ResponseCodeError)
</pre>
**Scenario 2:**
Your <%= vars.platform_name %> install exits with a `creates/updates/deletes an app (FAILED - 1)` error message with the following stack trace:
<pre>
1) App CRUD creates/updates/deletes an app
Failure/Error: Unable to find matching line from backtrace
CFoundry::TargetRefused:
Connection refused - connect(2)
</pre>
In either of the above scenarios, ensure that you have correctly entered your
domains in wildcard format:
1. Browse to the fully qualified domain name (FQDN) of Ops Manager.
1. Click the <%= vars.app_runtime_abbr %> tile.
1. Select **HAProxy** and click **Generate Self-Signed RSA Certificate**.
1. Enter your system and app domains in wildcard format, optionally enter any custom domains, and click **Save**.
Refer to <%= vars.app_runtime_abbr %> **Cloud Controller** for explanations of these
domain values.
<%= image_tag('rsa_cert.png') %>
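For example, wildcard entries for hypothetical system and app domains might look like the following:
```
*.system.example.com
*.apps.example.com
```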
### <a id='runtime_depend'></a>Install Fails When Gateway Instances Exceed Zero ###
If you configure the number of Gateway instances to be greater than zero for a
given product, you create a dependency on <%= vars.app_runtime_abbr %> for that product
installation.
If you attempt to install a product tile with a <%= vars.app_runtime_abbr %> dependency
before installing <%= vars.app_runtime_abbr %>, the install fails.
To change the number of Gateway instances, click the product
tile, then select **Settings > Resource sizes > INSTANCES** and change the
value next to the product Gateway job.
To remove the <%= vars.app_runtime_abbr %> dependency, change the value of this field to `0`.
### <a id='out_of_disk_space'></a>Out of Disk Space Error ###
<%= vars.platform_name %> displays an `Out of Disk Space` error if log files expand to fill
all available disk space.
If this happens, rebooting the <%= vars.platform_name %> installation VM clears the tmp
directory of these log files and resolves the error.
If users receive `Out of Disk Space` errors when trying to push apps, the Diego cells may be running out of disk capacity.
To perform a detailed analysis of disk usage by containers and host VMs in your <%= vars.platform_name %> deployment,
see [Examining GrootFS Disk Usage](../adminguide/examining_grootfs_disk.html).
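Before rebooting, you can confirm that log files in the tmp directory are what is consuming disk space on the installation VM. The following is a minimal sketch; the `ubuntu` user and the paths shown are common defaults and may differ in your deployment:
<pre class="terminal">
$ ssh ubuntu@OPS-MANAGER-FQDN
$ df -h /tmp /var
$ sudo du -sh /tmp/* 2>/dev/null | sort -h | tail
</pre>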
### <a id='vsphere_fails'></a>Installing BOSH Director Fails in vSphere ###
If the DNS information for the <%= vars.ops_manager %> VM is incorrectly specified when
deploying the <%= vars.ops_manager %> .ova file, installing BOSH Director fails at the "Installing Micro BOSH" step.
To resolve this issue, correct the DNS settings in the <%= vars.ops_manager %> Virtual
Machine properties.
### <a id='delete_om_fails'></a>Deleting a Deployment Fails ###
<%= vars.ops_manager %> displays an error message when it cannot delete your installation. This
scenario might happen if the BOSH Director cannot access the VMs or is experiencing other issues.
To manually delete your installation and all VMs, you must do the following:
1. Use your IaaS dashboard to manually delete the VMs for all installed products, with the exception of the <%= vars.ops_manager %> VM.
1. SSH into your <%= vars.ops_manager %> VM and remove the `installation.yml` file from
`/var/tempest/workspaces/default/`.
<p class="note"><strong>Note</strong>: Deleting the <code>installation.yml</code> file does not
prevent you from reinstalling a deployment. For future deploys,
<%= vars.ops_manager %> regenerates this file when you click <strong>Save</strong> on any
page in the BOSH Director tile.</p>
Your installation is now deleted.
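For reference, step 2 above typically looks like the following. This is a sketch only; the `ubuntu` user name and SSH access method depend on how your <%= vars.ops_manager %> VM was deployed:
<pre class="terminal">
$ ssh ubuntu@OPS-MANAGER-FQDN
$ sudo rm /var/tempest/workspaces/default/installation.yml
</pre>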
### <a id='elastic_runtime_fails'></a>Installing <%= vars.app_runtime_abbr %> Fails ###
The following sections describe errors that may occur when installing <%= vars.app_runtime_abbr %>
and how to resolve them.
#### <%= vars.app_runtime_abbr %> Cannot Verify App Push
If the DNS information for the <%= vars.ops_manager %> VM becomes incorrect after BOSH Director
has been installed, installing <%= vars.app_runtime_abbr %> fails at the "Verifying app push" step.
To resolve this issue, correct the DNS settings in the <%= vars.ops_manager %> Virtual
Machine properties.
#### MySQL Monitor Not Running After Update
When MySQL cannot communicate with UAA, <%= vars.ops_manager %> shows the following error:
> **Error: 'mysql_monitor/12a3b456-cd7e-8fgh-9012-345678b90ijk (0)' is not running after update. Review logs for failed jobs: replication-canary.**
If you see this error, create firewall rules that allow MySQL to reach UAA, using the
[MySQL Network Communications](https://docs.pivotal.io/ops-manager/<%= product_info['local_product_version'].to_s.sub('.','-') %>/security/networking/mysql-network-paths.html) topic as a
reference.
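After adding the firewall rules, you can check that the path from the MySQL monitor VM to UAA is open. This is a hedged sketch; the deployment name, instance name, system domain, and UAA port (commonly 8443) are placeholders that may differ in your environment:
<pre class="terminal">
$ bosh -d CF-DEPLOYMENT-NAME ssh mysql_monitor/0
$ nc -zv uaa.YOUR-SYSTEM-DOMAIN 8443
</pre>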
### <a id='ip_address_taken'></a><%= vars.ops_manager %> Hangs During MicroBOSH Install or HAProxy States "IP Address Already Taken" ###
During a <%= vars.app_runtime_abbr %> installation, you might receive the following errors:
* The <%= vars.ops_manager %> GUI shows that the installation stops at the "Setting MicroBOSH
deployment manifest" task.
* When you set the IP address for the HAProxy, the "IP Address Already Taken" message appears.
When you install the BOSH Director, you assign it an IP address. <%= vars.platform_name %>
then takes the next two consecutive IP addresses, assigns the first to MicroBOSH, and reserves the
second. For example:
```
203.0.113.1 - <%= vars.platform_name %> (User assigned)
203.0.113.2 - MicroBOSH (<%= vars.platform_name %> assigned)
203.0.113.3 - Reserved (<%= vars.platform_name %> reserved)
```
To resolve this issue, ensure that the two IP addresses immediately following the manually assigned
address are unassigned.
### <a id='pcf_slow'></a>Poor Performance ###
If you notice poor network performance in your deployment and your deployment uses a Network Address
Translation (NAT) gateway, your NAT gateway may be under-resourced.
#### <a id='troubleshoot-slow'></a>Troubleshoot
To troubleshoot the issue, set a custom firewall rule in your IaaS console to route traffic
originating from your private network directly to an S3-compatible object store. If you see
decreased average latency and improved network performance, perform the solution below to scale up
your NAT gateway.
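One way to compare before-and-after latency is to time requests to your object store endpoint from a VM on the private network. This is an illustrative sketch; the endpoint URL is a placeholder for your S3-compatible object store:
<pre class="terminal">
$ curl -s -o /dev/null -w "time_total: %{time_total}s\n" https://OBJECT-STORE-ENDPOINT/
</pre>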
#### <a id='solution-slow'></a>Scale Up Your NAT Gateway
Perform the following steps to scale up your NAT gateway:
1. Navigate to your IaaS console.
1. Spin up a new NAT gateway of a larger VM size than your previous NAT gateway.
1. Change the routes to direct traffic through the new NAT gateway.
1. Spin down the old NAT gateway.
The specific procedures will vary depending on your IaaS. Consult your IaaS documentation for more
information.
## <a id='common_problems_firewalls'></a>Common Issues Caused by Firewalls ##
This section describes various issues you might encounter when installing
<%= vars.platform_name %> in an environment that uses a strong firewall.
### <a id='DNS_res_fails'></a>DNS Resolution Fails ###
When you install <%= vars.platform_name %> in an environment that uses a strong firewall, the
firewall might block DNS resolution. To resolve this issue, refer to
[Verify Ops Manager Resolves DNS Entries Behind a Firewall](./config_firewall.html#DNS_fails) in
_Preparing Your Firewall_.
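As a quick check, you can test DNS resolution directly from the <%= vars.ops_manager %> VM. This is a minimal sketch; the hostname is a placeholder, and `nslookup` may not be present on every image, in which case `getent hosts` provides a similar check:
<pre class="terminal">
$ nslookup example.com
$ getent hosts example.com
</pre>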