
Commit

Merge pull request #316 from RTIInternational/dev_file_load_tweaks
Add some tweaks to the project creation code
AstridKery authored May 6, 2024
2 parents a2fb7f9 + 43d51d3 commit 9956e50
Showing 2 changed files with 9 additions and 15 deletions.
@@ -31,19 +31,22 @@ <h3>Description</h3>
<p>In the <strong>Labels</strong> section, we will create categories for labeling. These labeled observations will be used to train a classification model that predicts which of these categories a new observation is most likely to be.</p>
<h3>Instructions</h3>
<p>Please fill-in below the names of the categories you are interested in predicting. If you have more than two labels, use the <code>add label</code> button to add more rows to the form. If you decide that you want to remove a label after adding it, use the <code>remove label</code> button to remove the label name.</p>
<p>You may also upload a .csv file containing labels and their descriptions (label and description columns required).</p>

<p><i>Labeling Notes:</i></p>
<ul class="list-group">
<li class="list-group-item">SMART <strong>requires at least two category labels</strong> and the labels must be <strong>unique</strong>.</li>
<li class="list-group-item">If you plan on uploading a data file that contains labels, the label categories in the file must match those provided on this page.</li>
<li class="list-group-item">You may add up to 10,000 labels to each project.</li>
<li class="list-group-item">SMART has been tested with up to 50,000 labels.</li>
<li class="list-group-item">You cannot update the labels for a project after the project is created.</li>
<li class="list-group-item">Any labels currently in the table below will be overwritten by the file data if you upload a .csv file.</li>
<li class="list-group-item">.csv's use commas to split fields. If you are using the .csv upload and have commas in your label descriptions please put double quotes "" around the description text to ensure the file reader parses it correctly.</li>
</ul>
<div class="form-group">
<label class="control-label" for="{{ wizard.form.data.id_for_label }}">You may also upload a .csv file containing labels and their descriptions (label and description columns required):</label>
<label class="control-label" for="{{ wizard.form.data.id_for_label }}">Note: Any labels currently in the table will be overwritten by the file data.</label>
<label class="control-label" for="{{ wizard.form.data.id_for_label }}"></label>
<label class="control-label" for="{{ wizard.form.data.id_for_label }}">{{ wizard.form.data.label }}</label>
<p><a href="{% static 'example-labels.csv' %}">An example dataset can be downloaded from here</a>.</p>
<hr>
<p><a href="{% static 'example-labels.csv' %}">An example dataset can be downloaded from here</a>.</p>
<input class="form-control" id="{{ wizard.form.data.id_for_label }}" maxlength="30" name="{{ wizard.form.data.html_name }}" type="file" placeholder="{{ form.data.label }}" onChange="handleUpload(event)" onclick="this.value = null;" />
<button id="rmFileBtn" class="inline-btn">remove uploaded labels</button>
</div>
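
The quoting rule in the labeling notes above can be illustrated with Python's standard `csv` module. This is only a sketch: the column names `Label` and `Description` and the sample rows are assumptions made for illustration, not the contents of `example-labels.csv`.

```python
import csv
import io

# Hypothetical label rows; the second description contains commas.
rows = [
    {"Label": "positive", "Description": "Approving text"},
    {"Label": "negative", "Description": "Critical, hostile, or dismissive text"},
]

# With the default QUOTE_MINIMAL setting, the writer wraps only the fields
# that contain commas (or quotes/newlines) in double quotes.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Label", "Description"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

# Reading the text back keeps each quoted description as a single field.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
```

Writing the file through a CSV library rather than by hand is one way to get the double quotes placed correctly; a spreadsheet export normally does the same.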
13 changes: 2 additions & 11 deletions backend/django/core/utils/utils_form.py
@@ -68,18 +68,9 @@ def clean_data_helper(

     labels_in_data = data["Label"].dropna(inplace=False).unique()
     if len(labels_in_data) > 0 and len(set(labels_in_data) - set(supplied_labels)) > 0:
+        just_in_data = set(labels_in_data) - set(supplied_labels)
         raise ValidationError(
-            "There are extra labels in the file which were not created in step 2. File supplied {0} "
-            "but step 2 was given {1}".format(
-                ", ".join(labels_in_data), ", ".join(supplied_labels)
-            )
-        )
-
-    num_unlabeled_data = len(data[pd.isnull(data["Label"])])
-    if num_unlabeled_data < 1:
-        raise ValidationError(
-            "All text in the file already has a label. SMART needs unlabeled data "
-            "to do active learning. Please upload a file that has less labels."
+            f"There are extra labels in the file which were not created in step 2: {just_in_data}"
         )
 
     if "ID" in data.columns:
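
Going by the hunk's line counts (2 additions, 11 deletions), the change appears to do two things: the error message now names only the labels that are missing from step 2, and the check that required at least one unlabeled row is gone. The set-difference part can be sketched in plain Python as follows. This is an assumption-laden illustration, not the project's code: `find_extra_labels` and `check_labels` are hypothetical helpers, plain lists stand in for the pandas column, and `ValueError` stands in for Django's `ValidationError`.

```python
def find_extra_labels(labels_in_data, supplied_labels):
    """Return the labels used in the file but not supplied in step 2, sorted."""
    # Skip missing values, mirroring the dropna() call in the hunk.
    present = {label for label in labels_in_data if label is not None}
    return sorted(present - set(supplied_labels))


def check_labels(labels_in_data, supplied_labels):
    """Raise if the file uses any label that was not supplied in step 2."""
    extra = find_extra_labels(labels_in_data, supplied_labels)
    if extra:
        # ValueError is a stand-in for django.core.exceptions.ValidationError.
        raise ValueError(
            "There are extra labels in the file which were not created in step 2: "
            + ", ".join(extra)
        )


# A fully labeled file with only known labels passes without error.
check_labels(["spam", "ham", "spam"], ["spam", "ham"])
```

The sketch joins a sorted list instead of formatting the raw set, so the message is stable across runs and omits the braces and quotes of a set's repr. That is a design choice of this example, not something the commit does.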