Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug Report] RegisterTask thread not finalized if ProvisioningDeviceClient is closed too fast #1736

Closed
gabrielandrade2 opened this issue Aug 29, 2023 · 4 comments
Assignees
Labels
bug fix checked in Fix checked into main or preview, but not yet released. IoTSDK

Comments

@gabrielandrade2
Copy link

gabrielandrade2 commented Aug 29, 2023

Context

  • OS and version used: Ubuntu 22.04.1
  • Java runtime used: openjdk version "17.0.8" 2023-07-18
  • SDK version used: 2.1.1 and 2.1.0

Description of the issue

To contextualize a bit, I have a service running under my device that implements Azure DPS and allows clients to provision their devices using symmetric keys. I have a task that periodically retries the provisioning in case of failure, however, the user can start/cancel the provisioning process and the retry task (before the device is assigned in the cloud) at any time. Provisioning is done using the async ProvisioningDeviceClient.registerDevice() method.

However, during tests with some failure cases, after killing the provisioning process at some times I would have a azure-iot-sdk-RegisterTask named thread still lingering around in a WAITING state and it never finishes.

After some debugging, I believe that this happens when the provisioning process is asked to be stopped (thus ProvisioningDeviceClient.close() is called) too quickly after calling ProvisioningDeviceClient.registerDevice().
It appears to me that, if close() is called before the task from RegisterTask is done, the thread will never finish.

I was able to create a scenario to reproduce this problem using a sample code from the SDK with some modifications to use the async method instead (attached below).
If you take a thread dump after the Press any key to exit... message appears in the console, there should be a azure-iot-sdk-RegisterTask thread running.

If you switch the commented lines and wait a longer time, all DPS threads should be cleared. (During some tests I saw a ProvisioningTask thread still running, but that one was finalized after some seconds).

Code sample exhibiting the issue

package samples.com.microsoft.azure.sdk.iot;

import com.microsoft.azure.sdk.iot.device.*;
import com.microsoft.azure.sdk.iot.device.exceptions.IotHubClientException;
import com.microsoft.azure.sdk.iot.provisioning.device.*;
import com.microsoft.azure.sdk.iot.provisioning.device.internal.exceptions.ProvisioningDeviceClientException;
import com.microsoft.azure.sdk.iot.provisioning.security.SecurityProviderSymmetricKey;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

/**
 * Symmetric Key authenticated individual enrollment sample
 */
@SuppressWarnings("CommentedOutCode") // Ignored in samples as we use these comments to show other options.
public class ProvisioningSymmetricKeyIndividualEnrollmentSample
{
    // The scope Id of your DPS instance. This value can be retrieved from the Azure Portal
    private static final String ID_SCOPE = "removed";

    // Typically "global.azure-devices-provisioning.net"
    private static final String GLOBAL_ENDPOINT = "removed";

    // The symmetric key of the individual enrollment. Unlike with enrollment groups, this key can be used directly.

    // For the sake of security, you shouldn't save keys into String variables as that places them in heap memory. For the sake
    // of simplicity within this sample, though, we will save it as a string. Typically this key would be loaded as byte[] so that
    // it can be removed from stack memory.
    private static final String SYMMETRIC_KEY = "removed";

    // The registration Id to provision the device to. When creating an individual enrollment prior to running this sample, you choose this value.
    private static final String REGISTRATION_ID = "removed";

    // Uncomment one line to choose which protocol you'd like to use
    private static final ProvisioningDeviceClientTransportProtocol PROVISIONING_DEVICE_CLIENT_TRANSPORT_PROTOCOL = ProvisioningDeviceClientTransportProtocol.HTTPS;
    //private static final ProvisioningDeviceClientTransportProtocol PROVISIONING_DEVICE_CLIENT_TRANSPORT_PROTOCOL = ProvisioningDeviceClientTransportProtocol.MQTT;
    //private static final ProvisioningDeviceClientTransportProtocol PROVISIONING_DEVICE_CLIENT_TRANSPORT_PROTOCOL = ProvisioningDeviceClientTransportProtocol.MQTT_WS;
    //private static final ProvisioningDeviceClientTransportProtocol PROVISIONING_DEVICE_CLIENT_TRANSPORT_PROTOCOL = ProvisioningDeviceClientTransportProtocol.AMQPS;
    //private static final ProvisioningDeviceClientTransportProtocol PROVISIONING_DEVICE_CLIENT_TRANSPORT_PROTOCOL = ProvisioningDeviceClientTransportProtocol.AMQPS_WS;

    public static void main(String[] args) throws Exception
    {
        System.out.println("Starting...");
        System.out.println("Beginning setup.");
        SecurityProviderSymmetricKey securityClientSymmetricKey;

        securityClientSymmetricKey = new SecurityProviderSymmetricKey(SYMMETRIC_KEY.getBytes(StandardCharsets.UTF_8), REGISTRATION_ID);

        ProvisioningDeviceClient provisioningDeviceClient = ProvisioningDeviceClient.create(
            GLOBAL_ENDPOINT,
            ID_SCOPE,
            PROVISIONING_DEVICE_CLIENT_TRANSPORT_PROTOCOL,
            securityClientSymmetricKey);

        provisioningDeviceClient.registerDevice(ProvisioningSymmetricKeyIndividualEnrollmentSample::registrationCallback, null);
//        Thread.sleep(5000); // Wait enough time before closing, no lingering thread.
        Thread.sleep(500); // RegisterTask should still be running when Scanner line is reached and will never finish.
        provisioningDeviceClient.close();

        System.out.println("Press any key to exit...");
        new Scanner(System.in, StandardCharsets.UTF_8.name()).nextLine();
    }

    private static void registrationCallback(
            ProvisioningDeviceClientRegistrationResult provisioningDeviceClientRegistrationResult,
            Exception exception, Object context)
    {
        System.out.println("Inside registration callback function");

        if (provisioningDeviceClientRegistrationResult.getProvisioningDeviceClientStatus() == ProvisioningDeviceClientStatus.PROVISIONING_DEVICE_STATUS_ASSIGNED)
        {
            System.out.println("IotHub Uri : " + provisioningDeviceClientRegistrationResult.getIothubUri());
            System.out.println("Device ID : " + provisioningDeviceClientRegistrationResult.getDeviceId());

        }
    }
}

Console log of the issue

Log of the code variant that presents the issue (Sleeping for 500ms)

Connected to the target VM, address: '127.0.0.1:36605', transport: 'socket'
Starting...
Beginning setup.
2023-08-30 02:15:19,518 DEBUG (main) [com.microsoft.azure.sdk.iot.provisioning.device.ProvisioningDeviceClient] - Initialized a ProvisioningDeviceClient instance using SDK version 2.1.1
2023-08-30 02:15:19,522 DEBUG (main) [com.microsoft.azure.sdk.iot.provisioning.device.ProvisioningDeviceClient] - Starting provisioning thread...
2023-08-30 02:15:19,529 INFO (global.azure-devices-provisioning.net-5c1594e3-CxnPendingConnectionId-azure-iot-sdk-ProvisioningTask) [com.microsoft.azure.sdk.iot.provisioning.device.internal.task.ProvisioningTask] - Opening the connection to device provisioning service...
2023-08-30 02:15:19,672 INFO (global.azure-devices-provisioning.net-5c1594e3-Cxn5c1594e3-azure-iot-sdk-ProvisioningTask) [com.microsoft.azure.sdk.iot.provisioning.device.internal.task.ProvisioningTask] - Connection to device provisioning service opened successfully, sending initial device registration message
2023-08-30 02:15:19,683 INFO (global.azure-devices-provisioning.net-5c1594e3-Cxn5c1594e3-azure-iot-sdk-RegisterTask) [com.microsoft.azure.sdk.iot.provisioning.device.internal.task.RegisterTask] - Authenticating with device provisioning service using symmetric key
Press any key to exit...

Thread dump of this run
Screenshot from 2023-08-30 02-15-41

Log of the code variant issue-free (Sleeping for 5 secs)

Connected to the target VM, address: '127.0.0.1:37143', transport: 'socket'
Starting...
Beginning setup.
2023-08-30 02:23:02,615 DEBUG (main) [com.microsoft.azure.sdk.iot.provisioning.device.ProvisioningDeviceClient] - Initialized a ProvisioningDeviceClient instance using SDK version 2.1.1
2023-08-30 02:23:02,619 DEBUG (main) [com.microsoft.azure.sdk.iot.provisioning.device.ProvisioningDeviceClient] - Starting provisioning thread...
2023-08-30 02:23:02,628 INFO (global.azure-devices-provisioning.net-9301ed5f-CxnPendingConnectionId-azure-iot-sdk-ProvisioningTask) [com.microsoft.azure.sdk.iot.provisioning.device.internal.task.ProvisioningTask] - Opening the connection to device provisioning service...
2023-08-30 02:23:02,739 INFO (global.azure-devices-provisioning.net-9301ed5f-Cxn9301ed5f-azure-iot-sdk-ProvisioningTask) [com.microsoft.azure.sdk.iot.provisioning.device.internal.task.ProvisioningTask] - Connection to device provisioning service opened successfully, sending initial device registration message
2023-08-30 02:23:02,741 INFO (global.azure-devices-provisioning.net-9301ed5f-Cxn9301ed5f-azure-iot-sdk-RegisterTask) [com.microsoft.azure.sdk.iot.provisioning.device.internal.task.RegisterTask] - Authenticating with device provisioning service using symmetric key
Inside registration callback function
Press any key to exit...

Thread dump of this run
Screenshot from 2023-08-30 02-15-09

timtay-microsoft added a commit that referenced this issue Sep 12, 2023
…t cleanup all threads

If the provisioning task was interrupted by the user closing the client, it needs to shutdown its executor service that runs the RegisterTask

This fix addresses #1736
@timtay-microsoft
Copy link
Member

Thanks for catching this. It looks like the thread running the ProvisioningTask (which owns the thread that runs the RegisterTask thread that is leaking) isn't catching interrupted exceptions and cleaning up appropriately.

#1741 should fix this issue for you

@timtay-microsoft timtay-microsoft self-assigned this Sep 12, 2023
@gabrielandrade2
Copy link
Author

gabrielandrade2 commented Sep 12, 2023

@timtay-microsoft, thank you very much!

timtay-microsoft added a commit that referenced this issue Sep 13, 2023
…t cleanup all threads (#1741)

If the provisioning task was interrupted by the user closing the client, it needs to shutdown its executor service that runs the RegisterTask

This fix addresses #1736
@timtay-microsoft timtay-microsoft added the fix checked in Fix checked into main or preview, but not yet released. label Sep 13, 2023
@timtay-microsoft
Copy link
Member

The fix has been checked in, and we'll try to get a release out soon

@timtay-microsoft
Copy link
Member

Version 2.1.2 has been published with this fix, some I'll close this thread

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug fix checked in Fix checked into main or preview, but not yet released. IoTSDK
Projects
None yet
Development

No branches or pull requests

2 participants