Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FOGL-6891: Handle storage service restart #893

Draft
wants to merge 10 commits into
base: develop
Choose a base branch
from
Draft

Conversation

AmandeepArora
Copy link
Contributor

Storage service restart is handled in south service, north service & notification service. But this hasn't been tested a lot. Also lots of debug logs are still present in this current code.

Signed-off-by: Amandeep Singh Arora <aman@dianomic.com>
Signed-off-by: Amandeep Singh Arora <aman@dianomic.com>
Signed-off-by: Amandeep Singh Arora <aman@dianomic.com>
Signed-off-by: Amandeep Singh Arora <aman@dianomic.com>
Signed-off-by: Amandeep Singh Arora <aman@dianomic.com>
…start

Signed-off-by: Amandeep Singh Arora <aman@dianomic.com>
Signed-off-by: Amandeep Singh Arora <aman@dianomic.com>
Signed-off-by: Amandeep Singh Arora <aman@dianomic.com>
Signed-off-by: Amandeep Singh Arora <aman@dianomic.com>
Copy link
Contributor

@MarkRiddoch MarkRiddoch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit tests are failing.

Please remove code that is not really needed.

There is lots of log output at the info level that looks at best to be debug and much that looks like it is not really needed. Merging this will cause a huge increase in logging output at info level that is meant to be a level to give the user more information, not for the code developer.

The changes are impacting the default log levels of various areas of the code and these should be reverted.

The code is littered with PRINT_FUNC, this looks like your debugging and should be removed.

The use of ps in the storage client limits the deployment options to having all services in the same machine or container, this should be avoided.

@@ -19,7 +19,7 @@
using namespace std;

// uncomment line below to get uSec level timestamps
// #define ADD_USEC_TS
#define ADD_USEC_TS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably best not to checkin with this uncommented

m_urlbase << hostname << ":" << port;
m_logger->info("StorageClient c'tor 2: this=%p, m_urlbase=%s", this, m_urlbase.str().c_str());

m_logger->info("StorageClient c'tor 2: current thread ID=%u, m=%p", std::this_thread::get_id(), m);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is debug rather than info information

storage_service_urlbase.str("");
storage_service_host = "";

/* else
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is not needed please remove

@@ -180,6 +258,8 @@ bool StorageClient::readingAppend(const vector<Reading *>& readings)
ss << m_pid << "#" << thread_id << "_" << m_seqnum_map[thread_id].load();
sto_mtx_client_map.unlock();

m_logger->info("ss=%s", ss.str().c_str());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really does not look like useful info for the user, should be debug at best but probably should be removed.

do
{
PRINT_FUNC;
system("ps -ef | grep fledge.services.storage | grep -v grep | wc -l > /tmp/storage_service_instance_count");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite ugly, not very portable and makes the assumption the storage service is on the same machine as the client that is using it. We have been very careful not to add that restriction so that we may distribute services across machines or containers at some point.

I suspect you can find out what you need to by just using calls to the API and looking at error codes. A TCP connection to the API port can tell you if it is running or not.

@@ -135,8 +155,11 @@ def __init__(self, core_management_host=None, core_management_port=None, is_safe

# Initialize class attributes
if not cls._logger:
cls._logger = logger.setup(__name__, level=logging.INFO)
# cls._logger = logger.setup(__name__, level=logging.DEBUG)
# cls._logger = logger.setup(__name__, level=logging.INFO)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please reinstate original before merging

@@ -439,6 +479,9 @@ async def _check_schedules(self):
if schedule.exclusive and schedule_execution.task_processes:
continue

# if schedule.type == Schedule.Type.STARTUP and self._scheduler_restart:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove commented out code if it is not required

@@ -56,7 +57,7 @@
__license__ = "Apache 2.0"
__version__ = "${VERSION}"

_logger = logger.setup(__name__, level=20)
_logger = logger.setup(__name__, level=logging.DEBUG)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please reinstate before merging


'''
def start_storage():
with open("/tmp/monitor.log", "w+") as file:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this file used for? It looks like it is for your debugging purposes and should not be part of the final merge.

if not self._acl_handler:
self._acl_handler = ACLManager(connect.get_storage_async())
await self._acl_handler.\
resolve_pending_notification_for_acl_change(service_record._name)

check_count[service_record._id] = 1

# self._logger.debug("step D")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this if it is really not required

@AmandeepArora
Copy link
Contributor Author

This is really POC code only.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants