-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
G4.82 Revisit the use of scope by the Validator API #82
Comments
Regarding my original vision for scope:When I first brought this up I was picturing scoping as a grouping mechanisms where importers could give the same context to all validators in the same scope and new ones could be easily added without changes being made in the importer per se. For example, I pictured a row level scope where all validators of this type would just be given a row and know what to do with it. In this way, if we later developed a new row level validator we could just add its identifier to an array and the importer would happily add it to the list of things to check. My thoughts on the design I would like to see:In hindsight, this concept might better be described as two variables. First an inputType which indicates what the importer should pass to the validator and perhaps a general category to group them further 🤔
Validators could indicate more then one inputType they support. For example, the emptyValues validator could support both 'header-row' and 'data-row' although you would want to configure the plugin differently for each case. General NotesMe + Carolyn thought perhaps we needed a category to distinguish between header and data rows and both would have the 'row' inputType but now while writing this out I wonder if we just need to let validators support more then one inputType. We may want to discuss giving the first line of the file as an arguement to the ConfiguringRight now there are two approaches born out of this concept of 'configure the validator but pass it the same thing within the same scope'. First was @reynoldtan's take on this where he used Second was @carolyncaron's approach to pass in the context to the Hindsight makes me think maybe we would be best served by configuring the validator once at the top with a number of setters. Each instance/object of a validator could be configured differently in this way allowing the same validator to be used multiple times in a single importer. It would also mean we may not need to pass in additional context to the |
Example of using this designLets consider the trait importer as an example usage since we are all familiar with it now. I picture adding a new method to the importer class which handles configuring the validators. Here is some psudo-code on how that might look. /**
* Configure all the validators this importer uses.
*
* @param array $form_values
* An array of the importer form values provided to formValidate.
* @return array
* A listing of configured validator objects first keyed by
* their inputType. More specifically,
* - [inputType]: and array of validator instances. Not an
* associative array although the keys do indicate what
* order they should be run in.
*/
public function configureValidators($form_values) {
$validators = [];
// Metadata
// Genus Exists
$instance = $manager->createInstance('genusExists');
$validators['metadata']['genus_exists'] = $instance;
// Genus matches the configured project
$instance = use plugin manager to create instance of genusProjectMatch validator
$validators['metadata']['genus_project'] = $instance;
// ...
// File level
$instance = $manager->createInstance('fileExists');
$instance->setFileSuffix(['tsv', 'txt']);
$instance->setMimeType('text/tab-separated-values');
$validators['file']['file_exists'] = $instance;
// do the same sort of thing for any other file level validators.
// each validator would have its own set of setters although some
// could share setters across different validators that are
// defined in a parent class.
// Header Level
// All header row cells are not empty.
$instance = $manager->createInstance('emptyCell');
$instance->setIndicies([0,1,2,3,4,5]);
$validators['header-row']['header_not_empty'] = $instance;
// ...
// Data Row Level
// All data row cells in columns 0,2,4 are not empty
$instance = $manager->createInstance('emptyCell');
$instance->setIndicies([0,2,4]);
$validators['data-row']['data_not_empty'] = $instance;
// ...
$this->validatorObjects = $validators;
return $validators;
} Then when validating the file in the formValidate I see us being able to happily loop through these objects and pass them all the same things. public function formValidate($form, &$form_state) {
$form_values = $form_state->getValues();
// Configure the validators
$validators = $this->configureValidators($form_values);
// Use a variable to keep track of if one input type had recieved errors
// and only continue to the next if no errors.
$validators_failed = FALSE;
// Keep track of failed items.
// We expect the first key to be a unique id of the validator instance
// which is not just the validator id as there can be multiple instances
// of one validator. Then within that we expect line number
$failures = [];
foreach ($this->validators['metadata'] as $key => $validator) {
$result = $validator->validateMetadata($form_value);
// Check for old style...
if (!array_key_exists('valid', $result) && ($return['success'] == 'fail')) {
$validators_failed = TRUE;
$failures['metadata'][$key] = $result['details'];
}
// Then new style.
elseif (array_key_exists('valid', $result) && $result['valid'] === FALSE) {
$failures['metadata'][$key] = $result['failedItems'];
}
}
if ($validators_failed !== FALSE) {
foreach ($this->validators['file'] as $key => $validator) {
$result = $validator->validateFile($form_value['filename'], $form_values['fid']);
// Do same as above but use key file instead of metadata.
}
}
// open the file for reading.
// foreach line of the file...
$line_no = 0;
while ($row = fgets($handle)) {
$line_no++;
// Validate the header.
if ($line_no == 1) {
foreach ($this->validators['header-row'] as $key => $validator) {
$result = $validator->validateRow($row);
// Check for old style...
if (!array_key_exists('valid', $result) && ($return['success'] == 'fail')) {
$validators_failed = TRUE;
// Note: Added line number here.
$failures['header-row'][$key][$line_no] = $result['details'];
}
// Then new style.
elseif (array_key_exists('valid', $result) && $result['valid'] === FALSE) {
$failures['header-row'][$key][$line_no] = $result['failedItems'];
}
}
}
// Otherwise validate the data.
else {
foreach ($this->validators['data-row'] as $key => $validator) {
$result = $validator->validateRow($row);
// Do same as header row but use key data-row.
}
}
}
// Now importer uses $failures array to generate user friendly messages.
} This way if we wanted to add another data row level validator we would only need to add and configure it in the configureValidators() method with no changes needed to formValidate()! NOTES
Still to be dealt with
|
The above code sample was updated based on our conversation yesterday.
|
Additional documentation on return valueCurrently the validators return a very human readable array meant to be passed directly to the user. More specifically, return [
'title' => [Title describing the check meant to have a status symbol in front of it. always shown]
'status' => [either 'pass' or 'fail' depending if the validation check passes or fails]
'details' => [sentence describing the failure and providing helpful details on how to fix the issue. Has the failed items embedded in the string]
]; We've decided with the validators becoming more generic this is really hard to make a good message in the generic validator... Instead we are changing the return value to the following and the importer will make the friendly messages. return [
'case' => [a developer focused string describing the case checked],
'valid' => [either TRUE or FALSE depending on if the data is valid or not]
'failedItems' => [an array of "items" to be used in the message to the user. This is empty if the data was valid]
]; |
Tasks to get us to this design in the trait importerSee the milestone for progress: Upgrade validator API to new design
|
Branch
g4.82-[shortDescriptor]
Groups
Group 2 - Data Importing, Group 4 - API | Services | Plugins
Describe
@laceysanderson and I have had discussions regarding the use of scope by the validators. Prior to PR #77 and #79, each validator was assigned its own scope, but breaking up the previous TraitImportValues validator led to 3 validators under the same scope, which was renamed from
'TRAIT IMPORT VALUES'
to'FILE ROW'
. While this change works for the purpose of those PRs, we'd like to evaluate our use of scope in general, and if we need a new way to categorize validators to streamline their use by the Traits and Phenotype importers.Here are some things to consider:
validate()
which handles the file for each validator. The new validators instead usevalidateRow()
which does not interact with the file directly. It would make the most sense to not do any file iteration within any of the validators, and instead do it one time only in the formValidate() method of the importer. But we don't want to implement validateRow() for every validator that doesn't apply to every row of the input file.Design
Current scopes:
GENUS
FILE
HEADERS
FILE ROW
The text was updated successfully, but these errors were encountered: