G4.88 Split FILE validator into new style validators. #88

Closed
laceysanderson opened this issue Jun 26, 2024 · 3 comments
Labels: Group 2 - Data Importing, Group 4 - API | Services | Plugins

@laceysanderson (Member) commented Jun 26, 2024

Branch

g4.88-fileValidators

Groups

Group 2 - Data Importing, Group 4 - API | Services | Plugins

Dependencies

Describe

This issue is meant to upgrade the existing DataFile validator to the new style described in Issue #82.

It does not remove the original validator, and it does not update the trait importer to use the new validators.

Design

This should be split into two validators.

The first one is a generic check that the file is valid, without looking inside it. It will have an inputType of file and will be given the filename with its full path and the fid (if uploaded). It will check the following:

  • Has a Drupal file ID assigned/created.
  • Has a valid file extension, based on what is configured.
  • Matches the configured MIME type.
  • The assigned/created file ID can be loaded.
  • Is not an empty file.
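A minimal sketch of the extension, MIME type, and non-empty checks in plain PHP, so they can run outside Drupal. It assumes the caller has already handled the two Drupal file ID checks and passes in the MIME type recorded for the file; `checkFileProperties()` and its parameters are hypothetical names, not part of the module, and the allowed extensions are assumed to be given in lowercase.

```php
<?php

/**
 * Hypothetical sketch of the generic file checks (not the module's API).
 *
 * $mime stands in for the MIME type Drupal stores on the file entity.
 * Returns a list of failure messages; an empty array means the file passed.
 */
function checkFileProperties(
  string $path,
  string $mime,
  array $allowedExtensions,
  array $allowedMimes
): array {
  // The file must exist before its size can be read.
  if (!is_file($path)) {
    return ["File does not exist: $path"];
  }

  $failures = [];

  // Extension check: compare the lowercased extension to the configured list.
  $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
  if (!in_array($ext, $allowedExtensions, TRUE)) {
    $failures[] = "Extension '$ext' is not one of: " . implode(', ', $allowedExtensions);
  }

  // MIME type check against the configured list.
  if (!in_array($mime, $allowedMimes, TRUE)) {
    $failures[] = "MIME type '$mime' is not one of: " . implode(', ', $allowedMimes);
  }

  // Empty-file check.
  if (filesize($path) === 0) {
    $failures[] = "File is empty.";
  }

  return $failures;
}
```

Collecting every failure, rather than stopping at the first one, lets the importer report all problems with a file at once.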

The other one is specific to the TSV format, has an inputType of header, and will be given the first row of the file as an array. It will implement validateRow and check that the contents of the first row are in TSV format.

@laceysanderson added the Group 2 - Data Importing and Group 4 - API | Services | Plugins labels Jun 26, 2024
@reynoldtan (Contributor) commented Jul 12, 2024

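use Drupal\file\Entity\File;

/**
 * input_types: {"file"}
 */
class validateFile {

  // Hypothetical placeholders for the configured extensions and MIME types.
  protected $validExtensions = ['tsv', 'txt'];
  protected $validMimeTypes = ['text/tab-separated-values', 'text/plain'];

  /**
   * Returns a list of failure messages (empty means valid). The real return
   * format should follow the validator style described in Issue #82.
   */
  public function validateFileObject($filename, $fid = NULL) {
    // Parameter check: $filename must be an accessible path (when provided).
    if ($filename !== NULL && !is_readable($filename)) {
      throw new \Exception('Invalid or inaccessible file path: ' . $filename);
    }

    // Parameter check: $fid must be a positive integer (when provided).
    if ($fid !== NULL && filter_var($fid, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]) === FALSE) {
      throw new \Exception('File ID must be a positive integer.');
    }

    // Has file ID and ID can be loaded: load by $fid, else by URI.
    $file = NULL;
    if ($fid !== NULL) {
      $file = File::load($fid);
    }
    elseif ($filename !== NULL) {
      $files = \Drupal::entityTypeManager()->getStorage('file')
        ->loadByProperties(['uri' => $filename]);
      $file = $files ? reset($files) : NULL;
    }
    if (!$file) {
      // No file ID was assigned/created, or it cannot be loaded.
      return ['The file could not be loaded; it has no file ID assigned/created.'];
    }

    $uri = $file->getFileUri();

    // File exists check: the file URI must exist in the filesystem.
    if (!file_exists($uri)) {
      return ['The file does not exist: ' . $uri];
    }

    $failures = [];

    // MIME type and file extension check against the configured values.
    $extension = strtolower(pathinfo($uri, PATHINFO_EXTENSION));
    if (!in_array($extension, $this->validExtensions, TRUE)) {
      $failures[] = 'Unsupported file extension: ' . $extension;
    }
    if (!in_array($file->getMimeType(), $this->validMimeTypes, TRUE)) {
      $failures[] = 'Unsupported MIME type: ' . $file->getMimeType();
    }

    // File is not empty check.
    if (filesize($uri) === 0) {
      $failures[] = 'The file is empty.';
    }

    return $failures;
  }
}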

@reynoldtan (Contributor) commented Jul 12, 2024

/**
 * input_types: {"header"}
 */
class validTsvFile {
  public function validateRow($header_row) {
    // Will implement validateRow and check that the contents of the
    // first row are in TSV format.
  }
}

Unsure what to do here 😟

@reynoldtan (Contributor) commented Sep 10, 2024

The validator was set up to validate only the TSV file format, but since the module may support other delimited formats such as CSV, it needs to be redesigned.

Design changes:

  1. Rename the class from validTsvFile to ValidDelimitedFile.
  2. Add a new input type, raw-row (either a data row or the header row).
  3. Implement a validateRawRow method that checks a raw line for the following:
    A. The line is not empty.
    B. A delimiter separates the values.
    C. Any use of another delimiter character is escaped or in quotes.
    D. The line can be split by calling the split-row-into-columns method, and the number of values returned matches the expected number. For example, if the importer expects 10 values, the split step should also return 10 values.
  • A setter/getter for the number of expected values is required.
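A minimal sketch of the proposed validateRawRow covering checks A, B, and D. It assumes tab and comma are the supported delimiters and uses PHP's `str_getcsv()` as a stand-in for the module's split-row-into-columns method. `setExpectedColumns()`, `getExpectedColumns()`, `splitRowIntoColumns()`, and the returned array shape are hypothetical names, not the module's API. Check C is not implemented; quoted values are only honored during the split.

```php
<?php

/**
 * Hypothetical sketch of the proposed ValidDelimitedFile validator.
 * input_types: {"raw-row"}
 */
class ValidDelimitedFile {

  // Assumed set of supported delimiters, checked in this order.
  private const DELIMITERS = ["\t", ','];

  // Number of values each row is expected to contain.
  private ?int $expectedColumns = NULL;

  public function setExpectedColumns(int $count): void {
    if ($count < 1) {
      throw new InvalidArgumentException('Expected column count must be at least 1.');
    }
    $this->expectedColumns = $count;
  }

  public function getExpectedColumns(): ?int {
    return $this->expectedColumns;
  }

  /**
   * Returns ['valid' => bool, 'failures' => string[]].
   */
  public function validateRawRow(string $rawRow): array {
    if ($this->expectedColumns === NULL) {
      throw new LogicException('Call setExpectedColumns() before validating rows.');
    }

    // Drop the line ending so it is not treated as content.
    $line = rtrim($rawRow, "\r\n");

    // A. The line is not empty.
    if ($line === '') {
      return ['valid' => FALSE, 'failures' => ['Row is empty.']];
    }

    // B. A delimiter separates the values: use the first supported one found.
    $delimiter = NULL;
    foreach (self::DELIMITERS as $candidate) {
      if (strpos($line, $candidate) !== FALSE) {
        $delimiter = $candidate;
        break;
      }
    }
    if ($delimiter === NULL) {
      return ['valid' => FALSE, 'failures' => ['No supported delimiter (tab or comma) found.']];
    }

    // C. (other delimiters escaped or quoted) is not checked in this sketch.

    // D. The split must return exactly the expected number of values.
    $values = $this->splitRowIntoColumns($line, $delimiter);
    if (count($values) !== $this->expectedColumns) {
      $message = sprintf('Expected %d values but found %d.', $this->expectedColumns, count($values));
      return ['valid' => FALSE, 'failures' => [$message]];
    }

    return ['valid' => TRUE, 'failures' => []];
  }

  // Hypothetical stand-in for the module's split-row-into-columns method.
  // str_getcsv() honors double quotes, so a quoted delimiter does not split.
  private function splitRowIntoColumns(string $line, string $delimiter): array {
    return str_getcsv($line, $delimiter, '"', '\\');
  }
}
```

With the expected count set to 3, a line such as `a,"b,1",c` passes because the quoted comma stays inside the second value, while `a<TAB>b` fails check D with two values instead of three.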
