From c06565a80f27d4b7c1440d454d39a6e51d4f9c4f Mon Sep 17 00:00:00 2001
From: Heemin Kim
Date: Fri, 21 Jul 2023 16:13:49 -0700
Subject: [PATCH] Rebase on main (#363)

* Update gradle version to 7.6 (#265)

Signed-off-by: Vijayan Balasubramanian

* Exclude lombok generated code from jacoco coverage report (#268)

Signed-off-by: Heemin Kim

* Make jacoco report to be generated faster in local (#267)

Signed-off-by: Heemin Kim

* Update dependency org.json:json to v20230227 (#273)

Co-authored-by: mend-for-github-com[bot] <50673670+mend-for-github-com[bot]@users.noreply.github.com>

* Baseline owners and maintainers (#275)

Signed-off-by: Vijayan Balasubramanian

* Add Auto Release Workflow (#288)

Signed-off-by: Naveen Tatikonda

* Change package for Strings.hasText (#314)

Signed-off-by: Heemin Kim

* Adding release notes for 2.8 (#323)

Signed-off-by: Martin Gaievski

* Add 2.9.0 release notes (#350)

Signed-off-by: Junqiu Lei

* Update packages according to a change in OpenSearch core (#353)

Signed-off-by: Heemin Kim

* Implement creation of ip2geo feature (#257)

* Update gradle version to 7.6 (#265)

Signed-off-by: Vijayan Balasubramanian

* Implement creation of ip2geo feature

* Implementation of ip2geo datasource creation
* Implementation of ip2geo processor creation

Signed-off-by: Heemin Kim

---------

Signed-off-by: Vijayan Balasubramanian
Signed-off-by: Heemin Kim
Co-authored-by: Vijayan Balasubramanian

* Added unit tests with some refactoring of codes (#271)

* Add Unit tests
* Set cache true for search query
* Remove in memory cache implementation (Two way door decision)
* Relying on search cache without custom cache
* Renamed datasource state from FAILED to CREATE_FAILED
* Renamed class name from *Helper to *Facade
* Changed updateIntervalInDays to updateInterval
* Changed value type of default update_interval from TimeValue to Long
* Read setting value from cluster settings directly

Signed-off-by: Heemin Kim

* Sync from main (#280)

* Update gradle version to 7.6 (#265)

Signed-off-by: Vijayan Balasubramanian

* Exclude lombok generated code from jacoco coverage report (#268)

Signed-off-by: Heemin Kim

* Make jacoco report to be generated faster in local (#267)

Signed-off-by: Heemin Kim

* Update dependency org.json:json to v20230227 (#273)

Co-authored-by: mend-for-github-com[bot] <50673670+mend-for-github-com[bot]@users.noreply.github.com>

* Baseline owners and maintainers (#275)

Signed-off-by: Vijayan Balasubramanian

---------

Signed-off-by: Vijayan Balasubramanian
Signed-off-by: Heemin Kim
Co-authored-by: Vijayan Balasubramanian
Co-authored-by: mend-for-github-com[bot] <50673670+mend-for-github-com[bot]@users.noreply.github.com>

* Add datasource name validation (#281)

Signed-off-by: Heemin Kim

* Refactoring of code (#282)

1. Change variable name from datasourceName to name
2. Change variable name from id to name
3. Added helper methods in test code

Signed-off-by: Heemin Kim

* Change field name from md5 to sha256 (#285)

Signed-off-by: Heemin Kim

* Implement get datasource api (#279)

Signed-off-by: Heemin Kim

* Update index option (#284)

1. Make geodata index as hidden
2. Make geodata index as read only allow delete after creation is done
3. Refresh datasource index immediately after update

Signed-off-by: Heemin Kim

* Make some fields in manifest file as mandatory (#289)

Signed-off-by: Heemin Kim

* Create datasource index explicitly (#283)

Signed-off-by: Heemin Kim

* Add wrapper class of job scheduler lock service (#290)

Signed-off-by: Heemin Kim

* Remove all unused client attributes (#293)

Signed-off-by: Heemin Kim

* Update copyright header (#298)

Signed-off-by: Heemin Kim

* Run system index handling code with stashed thread context (#297)

Signed-off-by: Heemin Kim

* Reduce lock duration and renew the lock during update (#299)

Signed-off-by: Heemin Kim

* Implements delete datasource API (#291)

Signed-off-by: Heemin Kim

* Set User-Agent in http request (#300)

Signed-off-by: Heemin Kim

* Implement datasource update API (#292)

Signed-off-by: Heemin Kim

* Refactoring test code (#302)

Make buildGeoJSONFeatureProcessorConfig method to be more general

Signed-off-by: Heemin Kim

* Add ip2geo processor integ test for failure case (#303)

Signed-off-by: Heemin Kim

* Bug fix and refactoring of code (#305)

1. Bugfix: Ingest metadata can be null if there is no processor created
2. Refactoring: Moved private method to another class for better testing support
3. Refactoring: Set some private static final variable as public so that unit test can use it
4. Refactoring: Changed string value to static variable

Signed-off-by: Heemin Kim

* Add integration test for Ip2GeoProcessor (#306)

Signed-off-by: Heemin Kim

* Add ConcurrentModificationException (#308)

Signed-off-by: Heemin Kim

* Add integration test for UpdateDatasource API (#307)

Signed-off-by: Heemin Kim

* Bug fix on lock management and few performance improvements (#310)

* Release lock before response back to caller for update/delete API
* Release lock in background task for creation API
* Change index settings to improve indexing performance

Signed-off-by: Heemin Kim

* Change index setting from read_only_allow_delete to write (#311)

read_only_allow_delete does not block write to an index. The disk-based shard allocator may add and remove this block automatically. Therefore, use index.blocks.write instead.

Signed-off-by: Heemin Kim

* Fix bug in get datasource API and improve memory usage (#313)

Signed-off-by: Heemin Kim

* Change package for Strings.hasText (#314) (#317)

Signed-off-by: Heemin Kim

* Remove jitter and move index setting from DatasourceFacade to DatasourceExtension (#319)

Signed-off-by: Heemin Kim

* Do not index blank value and do not enrich null property (#320)

Signed-off-by: Heemin Kim

* Move index setting keys to constants (#321)

Signed-off-by: Heemin Kim

* Return null index name for expired data (#322)

Return a null index name for expired data so that it can be deleted by the cleanup process. The cleanup process excludes the current index from deletion.

Signed-off-by: Heemin Kim

* Add new fields in datasource (#325)

Signed-off-by: Heemin Kim

* Delete index once it is expired (#326)

Signed-off-by: Heemin Kim

* Add restoring event listener (#328)

In the listener, we trigger a geoip data update.

Signed-off-by: Heemin Kim

* Reverse forcemerge and refresh order (#331)

Otherwise, OpenSearch does not clear old segment files.

Signed-off-by: Heemin Kim

* Removed parameter and settings (#332)

* Removed first_only parameter
* Removed max_concurrency and batch_size setting

The first_only parameter was added because the current geoip processor has it. However, the parameter has no benefit for the ip2geo processor, as we don't do a sequential search for array data but use multi search.

The max_concurrency and batch_size settings are removed because they only reveal internal implementation and could become a blocker to improving performance later.

Signed-off-by: Heemin Kim

* Add a field in datasource for current index name (#333)

Signed-off-by: Heemin Kim

* Delete GeoIP data indices after restoring complete (#334)

We don't want to use restored GeoIP data indices. Therefore, we delete the indices once the restoring process completes. When the GeoIP metadata index is restored, we create a new GeoIP data index instead.

Signed-off-by: Heemin Kim

* Use bool query for array form of IPs (#335)

Signed-off-by: Heemin Kim

* Run update/delete request in a new thread (#337)

This avoids blocking the transport thread.

Signed-off-by: Heemin Kim

* Remove IP2Geo processor validation (#336)

We cannot query the index to get data to validate the IP2Geo processor. We will add validation when we decide to store some of the data in cluster state metadata.

Signed-off-by: Heemin Kim

* Acquire lock synchronously (#339)

When the lock is acquired asynchronously, the remaining part of the code is run by a transport thread, which does not allow blocking code. We want only a single update to happen in a node, using a single thread. However, that cannot be achieved if I acquire the lock asynchronously and pass the listener.

Signed-off-by: Heemin Kim

* Added a cache to store datasource metadata (#338)

Signed-off-by: Heemin Kim

* Changed class name and package (#341)

Signed-off-by: Heemin Kim

* Refactoring of code (#342)

1. Changed class name from Ip2GeoCache to Ip2GeoCachedDao
2. Moved the Ip2GeoCachedDao from cache to dao package

Signed-off-by: Heemin Kim

* Add geo data cache (#340)

Signed-off-by: Heemin Kim

* Add cache layer to reduce GeoIp data retrieval latency (#343)

Signed-off-by: Heemin Kim

* Use _primary in query preference and few changes (#347)

1. Use _primary preference to get datasource metadata so that it can read the latest data. RefreshPolicy.IMMEDIATE won't refresh replica shards immediately according to #346
2. Update datasource metadata index mapping
3. Move batch size from static value to setting

Signed-off-by: Heemin Kim

* Wait until GeoIP data to be replicated to all data nodes (#348)

Signed-off-by: Heemin Kim

* Update packages according to a change in OpenSearch core (#354)

* Update packages according to a change in OpenSearch core

Signed-off-by: Heemin Kim

* Update packages according to a change in OpenSearch core (#353)

Signed-off-by: Heemin Kim

---------

Signed-off-by: Heemin Kim

---------

Signed-off-by: Vijayan Balasubramanian
Signed-off-by: Heemin Kim
Signed-off-by: Naveen Tatikonda
Signed-off-by: Martin Gaievski
Signed-off-by: Junqiu Lei
Co-authored-by: Vijayan Balasubramanian
Co-authored-by: mend-for-github-com[bot] <50673670+mend-for-github-com[bot]@users.noreply.github.com>
Co-authored-by: Naveen Tatikonda
Co-authored-by: Martin Gaievski
Co-authored-by: Junqiu Lei