Updating the database with the new feature names as they become available. #19

golnazads · 2024-11-04T12:36:57Z

As new feature names become available, users will extract USGS terms and execute a pipeline command to check for any names that need to be added to the database. The CSV file used in this process must contain the following six columns: Feature_ID, Clean_Feature_Name, Target, Feature_Type, Approval_Date, and Approval_Status.

1- The pipeline reads this file and only considers rows where Approval_Status is set to "Approved". It then compares each Feature_ID with the existing entities in the database to identify which rows need to be added.

2- When adding a feature name, the pipeline first checks for any new targets associated with these names and inserts those targets if they don’t already exist. Next, it verifies if the feature type is present for the specified target, adding it if it’s missing. The pipeline also updates a separate table containing unique feature names without links to targets or feature types. If a feature name has not been used before, it is inserted into this table; if it has been used for another celestial body, it will not be added again.

3- Additional checks are in place for multi-word and ambiguous feature names. Specifically, the pipeline manages cases where one word in a multi-word name may represent another feature name (e.g., “C Herschel,” “C Herschel C”). It also identifies cases where a feature name is used for multiple celestial bodies (e.g., “Herschel Crater” on Mars, the Moon, and Mimas). The pipeline will update the database for these names as needed. However, if a name has other contextual uses (e.g., “Herschel” also refers to an asteroid), the pipeline will generate a log message. In this case, a power user will need to review and identify any additional contexts. The pipeline includes a list of new feature names for the power user to check. Once any new context is identified, the developer should be informed so they can add it to the database.

4- Update repo with the latest USGS feature names.
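
Steps 1 and 2 above could be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the SQLite schema, the table names (`target`, `feature_type`, `feature_name`, `feature`), and the function `import_approved` are all assumptions made for the example. Only the six CSV column names come from the description above.

```python
import csv
import sqlite3

# Hypothetical schema, for illustration only; the real database layout may differ.
SCHEMA = """
CREATE TABLE target (name TEXT PRIMARY KEY);
CREATE TABLE feature_type (target TEXT, name TEXT, PRIMARY KEY (target, name));
CREATE TABLE feature_name (name TEXT PRIMARY KEY);  -- unique names, no target link
CREATE TABLE feature (
    feature_id TEXT PRIMARY KEY,
    name TEXT, target TEXT, feature_type TEXT, approval_date TEXT
);
"""

REQUIRED = {"Feature_ID", "Clean_Feature_Name", "Target",
            "Feature_Type", "Approval_Date", "Approval_Status"}


def import_approved(conn, csv_file):
    """Insert approved rows whose Feature_ID is not yet stored; return the new IDs."""
    reader = csv.DictReader(csv_file)
    missing = REQUIRED - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    known = {row[0] for row in conn.execute("SELECT feature_id FROM feature")}
    added = []
    for row in reader:
        # Step 1: keep only approved rows that are not already in the database.
        if row["Approval_Status"].strip() != "Approved":
            continue
        if row["Feature_ID"] in known:
            continue
        # Step 2: add the target, feature type, and bare name only if absent.
        conn.execute("INSERT OR IGNORE INTO target VALUES (?)", (row["Target"],))
        conn.execute("INSERT OR IGNORE INTO feature_type VALUES (?, ?)",
                     (row["Target"], row["Feature_Type"]))
        conn.execute("INSERT OR IGNORE INTO feature_name VALUES (?)",
                     (row["Clean_Feature_Name"],))
        conn.execute("INSERT INTO feature VALUES (?, ?, ?, ?, ?)",
                     (row["Feature_ID"], row["Clean_Feature_Name"], row["Target"],
                      row["Feature_Type"], row["Approval_Date"]))
        known.add(row["Feature_ID"])
        added.append(row["Feature_ID"])
    conn.commit()
    return added
```

Because the comparison is keyed on Feature_ID and the auxiliary inserts use `INSERT OR IGNORE`, running the sketch twice on the same file adds nothing the second time, and a name reused on another celestial body is stored only once in the unique-name table.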

golnazads · 2024-11-04T12:37:10Z

Steps 1 and 2 completed.

golnazads · 2024-11-12T14:24:46Z

Steps 3 and 4 completed.

golnazads · 2024-11-12T15:37:38Z

While importing the latest USGS feature names, extracted on November 7, 2024, I noticed that the approval status column includes Dropped in addition to Approved and Not Approved. As of that date, 199 feature names were marked as dropped. Their approval dates range from 1958 to 2015, with three entries missing a date. There is no information about when these feature names were dropped, so it is unclear how long they remained in use. The top four feature types among the dropped names, with the number of feature names in parentheses, are: Satellite Feature (50), Crater, craters (37), Patera, paterae (20), and Rima, rimae (10).
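
A tally like the one above can be reproduced with a few lines of Python. This is only a sketch, assuming the USGS export is a CSV with the `Approval_Status` and `Feature_Type` columns described earlier; the function name `dropped_by_type` is hypothetical.

```python
import csv
from collections import Counter


def dropped_by_type(csv_file):
    """Count rows whose Approval_Status is 'Dropped', grouped by Feature_Type."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        if row["Approval_Status"].strip() == "Dropped":
            counts[row["Feature_Type"]] += 1
    return counts
```

Calling `.most_common(4)` on the result gives the four largest categories. Feature types such as "Crater, craters" contain a comma, so the file has to be parsed with a real CSV reader rather than split on commas.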

My recommendation is to retain these entries in the database and treat them as historically active, since they were used at some point and may have historical relevance. I suggest adding a column to the feature name table to indicate that these features are currently dropped, and passing this information to Solr so that users are informed of their dropped status.
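
One possible shape for this, as a sketch only: keep Dropped rows alongside Approved ones and carry a boolean flag through to the indexed document. The function `to_solr_doc` and the field names (`feature_id`, `feature_name`, `target`, `feature_type`, `dropped`) are assumptions for illustration, not the existing Solr schema.

```python
def to_solr_doc(row):
    """Build an index document from one CSV row, flagging dropped names.

    Returns None for rows that are neither Approved nor Dropped,
    so "Not Approved" names are still excluded.
    """
    status = row["Approval_Status"].strip()
    if status not in ("Approved", "Dropped"):
        return None
    return {
        "feature_id": row["Feature_ID"],
        "feature_name": row["Clean_Feature_Name"],
        "target": row["Target"],
        "feature_type": row["Feature_Type"],
        "dropped": status == "Dropped",  # hypothetical flag surfaced to users
    }
```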

@aaccomazzi
