FlexCache Monitoring in Harvest #2501
rahulguptajss started this conversation in Ideas
FlexCache Performance Metrics
This discussion focuses on a solution for issue #2428. The customer wants to obtain the following information at the volume level: the time to retrieve items, the oldest data in the cache volume, and the cache miss rate.
Relevant documentation:
Time to Retrieve Items
Because a FlexCache volume is essentially a FlexGroup volume, its standard volume performance counters already show how long it took to retrieve data from that FlexCache volume.
According to TR-4743, we can identify the latency of operations that are forwarded from the FlexCache volume to the origin volume. Understanding this latency can provide insights into potential delays.
spinhi_flexcache_forward_latency: the average latency of forwarded FlexCache operations. This counter is an array.
spinhi_flexcache_forward_max_time: the maximum time taken to receive a response from the origin for forwarded FlexCache operations. This is also an array.
Oldest Data in the Cache Volume
Currently, there are no available metrics for this.
Cache Miss Rate
spinhi_flexcache_forward_fileops: the number of forwarded FlexCache file operations. This is also an array.
We can also use the flexcache_per_volume perf object, which provides performance data only for FlexCache volumes and their origin volumes. The following counters are useful for calculating the cache miss rate:
blocks_requested_from_client: the total number of blocks requested by clients. A high number here could indicate heavy usage of the system.
blocks_retrieved_from_origin: the total number of blocks retrieved from the origin. If this number is high relative to blocks_requested_from_client, the cache may not be used effectively, because data is frequently being fetched from the origin instead of being served from the cache.
The cache miss rate can be calculated as:
cache miss rate = blocks_retrieved_from_origin / blocks_requested_from_client
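Assuming these are cumulative counters (as many ONTAP perf counters are), the ratio is most useful when computed over the change between two polls rather than on the raw totals. The Go sketch below shows that idea; the flexCacheSample type and the cacheMissRate function are hypothetical names for illustration, not part of Harvest, and the counter values are made up.

```go
package main

import "fmt"

// flexCacheSample holds the two flexcache_per_volume counters captured at one
// poll. This type is a hypothetical illustration, not Harvest's data model.
type flexCacheSample struct {
	BlocksRequestedFromClient uint64
	BlocksRetrievedFromOrigin uint64
}

// cacheMissRate computes blocks_retrieved_from_origin / blocks_requested_from_client
// over the interval between two polls, assuming both counters are cumulative.
// ok is false if no blocks were requested in the interval or if either counter
// decreased (for example after a counter reset).
func cacheMissRate(prev, curr flexCacheSample) (rate float64, ok bool) {
	if curr.BlocksRequestedFromClient < prev.BlocksRequestedFromClient ||
		curr.BlocksRetrievedFromOrigin < prev.BlocksRetrievedFromOrigin {
		return 0, false
	}
	requested := curr.BlocksRequestedFromClient - prev.BlocksRequestedFromClient
	retrieved := curr.BlocksRetrievedFromOrigin - prev.BlocksRetrievedFromOrigin
	if requested == 0 {
		return 0, false
	}
	return float64(retrieved) / float64(requested), true
}

func main() {
	// Made-up counter values for two consecutive polls.
	prev := flexCacheSample{BlocksRequestedFromClient: 10000, BlocksRetrievedFromOrigin: 2000}
	curr := flexCacheSample{BlocksRequestedFromClient: 14000, BlocksRetrievedFromOrigin: 3000}
	if rate, ok := cacheMissRate(prev, curr); ok {
		fmt.Printf("cache miss rate: %.2f%%\n", rate*100) // prints "cache miss rate: 25.00%"
	}
}
```

Dividing the raw cumulative totals instead would give a long-run average since the counters started, which can hide recent changes in cache behavior.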
Additional Metrics
Some other flexcache_per_volume counters that may be helpful:
evict_skipped_reason_disconnected, evict_skipped_reason_config_noent, and evict_skipped_reason_offline: the number of times an eviction (removal of data from the cache) was skipped because the cache was disconnected, had a configuration issue, or was offline.
invalidate_skipped_reason_config_noent, invalidate_skipped_reason_disconnected, and invalidate_skipped_reason_offline: the number of times an invalidate operation was skipped for the same reasons (configuration issue, disconnection, or the cache being offline).
reconciled_data_entries and reconciled_lock_entries: the total number of reconciled data and lock entries on the cache side. A high number of reconciled data entries could indicate that the system frequently has to update the cache to match the origin (primary storage), which may be a sign of performance issues. Conversely, a low number suggests that the cache is staying consistent with the origin.
nix_skipped_reason_config_noent, nix_skipped_reason_disconnected, nix_skipped_reason_in_progress, and nix_skipped_reason_offline: the number of nix operations that were skipped for various reasons, such as the cache being offline.
Overall, the flexcache_per_volume object seems to be a better fit and provides better insight for the customer, but it is not mentioned in TR-4743.
The spinhi object seems spammy: it provides metrics for all volumes, so we would need to filter its records to get only FlexCache volume metrics. Each spinhi array metric has ~90 values, labeled as follows:
Null, Access, Close, Create, DiscardCred, Downgrade, FhToPath, FoldFile, GetAttr, GetAttrBulk, GetCred, GetParent, GetRootFh, Link, Lock, LockGrantedAck, Lookup, MoveLocks, Open, Prefetch, Read, Readdir, Readlink, ReclaimLocksAck, Remove, Rename, ScanMatchingLocks, ScanFile, Setattr, SetCred, Share, Unlink, Unshare, Watch, Write, NrvGetAttrSnapInofile, FhToPath2, ReadLink2, Create2, ChangeFileType, ReplayFindBin, ReplayGetOpInfo, ReplayReleaseBin, Audit, GetVolumeLanguage, GetFsAttrs, CancelFileop, GetQuota, FreeBlocks, WriteZeros, VdiskSetAttr, VdiskGetAttr, VdiskDestroy, SetLuMetadata, GetLuMetadata, Disconnect, Reconnect, MakeResilient, OpenToName, CopyNotify, CopyRevoke, Copy, CopyStatus, CopyAbort, CopyAuthcheck, Close2, CopyOffload, CopyOffloadStatus, TokenCreate, TokenCopy, TokenStatus, TokenAbort, TokenAuthorize, TokenComplete, CopyAuthcheck2, CompareWrite, QueryHoles, FreeBlocks2, DiskOpen, DiskClose, DiskRead, DiskWrite, DiskCancel, DiskSegmentMap, OffloadLunImport, DiskCompareWrite, VdiskMobilityStatus, ReplayReleaseBinLabel, FliWriteSplit, N2N, ListXAttrs
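Because each spinhi array is positional, using it means pairing every value with its label from the list above and discarding records for non-FlexCache volumes. Below is a minimal Go sketch, assuming a record has already been reduced to per-interval values and the set of FlexCache volume names is known in advance (spinhiRecord, summarize, and the volume name fc_vol1 are hypothetical, and the values are made up). It computes an ops-weighted average of spinhi_flexcache_forward_latency, using spinhi_flexcache_forward_fileops as the weights, and ranks the forwarded operations by latency.

```go
package main

import (
	"fmt"
	"sort"
)

// spinhiRecord is a hypothetical, simplified view of one spinhi instance
// after per-interval processing: one array element per operation label.
type spinhiRecord struct {
	Volume         string
	ForwardLatency []float64 // spinhi_flexcache_forward_latency, one value per op
	ForwardFileops []float64 // spinhi_flexcache_forward_fileops, one value per op
}

// opStat is the forwarded latency and file-op count for one labeled operation.
type opStat struct {
	Op      string
	Latency float64
	Ops     float64
}

// summarize returns ok=false for volumes that are not in flexCacheVols.
// Otherwise it pairs each array element with its label, skips operations
// with no forwarded file ops, and returns the ops-weighted average forward
// latency together with per-op stats sorted by descending latency.
func summarize(rec spinhiRecord, labels []string, flexCacheVols map[string]bool) (avg float64, stats []opStat, ok bool) {
	if !flexCacheVols[rec.Volume] {
		return 0, nil, false
	}
	var weighted, total float64
	for i, label := range labels {
		// Stop if the arrays are shorter than the label list.
		if i >= len(rec.ForwardLatency) || i >= len(rec.ForwardFileops) {
			break
		}
		ops := rec.ForwardFileops[i]
		if ops == 0 {
			continue
		}
		lat := rec.ForwardLatency[i]
		stats = append(stats, opStat{Op: label, Latency: lat, Ops: ops})
		weighted += lat * ops
		total += ops
	}
	sort.Slice(stats, func(a, b int) bool { return stats[a].Latency > stats[b].Latency })
	if total > 0 {
		avg = weighted / total
	}
	return avg, stats, true
}

func main() {
	// The first nine labels from the list above; a real record has ~90.
	labels := []string{"Null", "Access", "Close", "Create", "DiscardCred",
		"Downgrade", "FhToPath", "FoldFile", "GetAttr"}
	flexCacheVols := map[string]bool{"fc_vol1": true} // hypothetical FlexCache volume
	rec := spinhiRecord{
		Volume:         "fc_vol1",
		ForwardLatency: []float64{0, 400, 0, 0, 0, 0, 0, 0, 200}, // made-up values
		ForwardFileops: []float64{0, 10, 0, 0, 0, 0, 0, 0, 30},   // made-up values
	}
	avg, stats, ok := summarize(rec, labels, flexCacheVols)
	if !ok {
		return
	}
	fmt.Printf("weighted forward latency: %.1f\n", avg) // 250.0
	for _, s := range stats {
		fmt.Printf("%-8s latency=%.1f ops=%.0f\n", s.Op, s.Latency, s.Ops)
	}
}
```

Weighting by file-op count keeps a rarely forwarded but slow operation from dominating the overall figure, while the sorted per-op list still surfaces it.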