- For our Fall 2023 Hackathon project topic we chose web scraping.
- All SCSU students have to take General Liberal Education courses. For many, finding the right courses to take that make financial sense and are fun and appealing can be hard. So we set out to fix that.
- Our solution leverages web scraping to gather a student's Degree Audit and Unofficial Transcript and figure out which classes the student has already completed. Our in-house scoring algorithm then ranks the remaining courses by how many requirements each one fulfills, and we use that ranking to generate a custom report that helps the student plan future semesters.
The main problem is the lack of guidance and information about the courses available to students. It is often unclear which classes provide more value to the student or which classes have a less favorable professor.
**(Figure 1)** SCSU requires students to take classes from 10 separate goal areas plus Diversity and Cultural requirements. The university conveniently hosts all of this information on its website. This is the foundation for our web scraper.

| Name | John Doe |
|---|---|
| Student ID | 123456789 |
| Major | Computer Science |
| Academic Advisor | Dr. Jane Smith |

| Term | Course Code | Course Title | Credits | Grade |
|---|---|---|---|---|
| Fall 2019 | CS 101 | Intro to Computing | 4 | A |
| Fall 2019 | MATH 150 | Calculus I | 4 | B+ |
| Spring 2020 | CS 102 | Data Structures | 4 | A- |
| Spring 2020 | ENG 101 | English Composition | 3 | B |
| Fall 2020 | CS 201 | Algorithms | 4 | B+ |
| Fall 2020 | STAT 200 | Statistics | 3 | A |
| Spring 2021 | CS 301 | Operating Systems | 4 | A- |
| Spring 2021 | PHIL 105 | Ethics in Tech | 3 | B+ |

| Total Credits Earned | 29 |
|---|---|
| Cumulative GPA | 3.55 |

Note: This is an unofficial transcript.
*Sketch/Draft and Final versions of our architecture diagram (final shown below):*
```mermaid
graph TD;
    A["Browser & Username + Password"] --> B["API Request"];
    B --> C["Functions Kickoff"];
    C --> D["Degree Audit Parser"];
    C --> E["Transcript Parser"];
    E --> F["Text Formatter"];
    F --> G["Points Formula"];
    D --> G;
    G --> H["API Response"];
    H --> A;
```
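To make the flow in the diagram concrete, here is a minimal sketch of the pipeline in pure Python. The function names (`degree_audit_parser`, `points_formula`, `handle_request`) and the sample data are hypothetical stand-ins; the real parsers drive Selenium against the live Degree Audit and Unofficial Transcript pages.

```python
import json

# Hypothetical stand-in: split raw degree-audit text into tokens
def degree_audit_parser(raw_audit):
    return raw_audit.split()

# Hypothetical stand-in for the points formula: report which goal
# areas the student still needs to fulfill
def points_formula(completed_goals, all_goals):
    remaining = [g for g in all_goals if g not in completed_goals]
    return {"goalsRemaining": remaining}

# End-to-end: audit text in, JSON response out
def handle_request(raw_audit, all_goals):
    completed = set(degree_audit_parser(raw_audit))
    return json.dumps(points_formula(completed, all_goals))

print(handle_request("1 2 6", ["1", "2", "3", "6", "9"]))
# -> {"goalsRemaining": ["3", "9"]}
```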
*Sketch/Draft and Final versions of our front-end code (final shown below):*
```javascript
// Populate the accordion with the course data returned by the API
function populateAccordion(data) {
  const accordionContainer = document.getElementById('accordTemplate');
  Object.keys(data).forEach((mainCategory, mainIndex) => {
    let subAccordions = '';
    const mainAccordionId = `collapseMain${mainIndex}`;
    data[mainCategory].forEach((subCategoryObj) => {
      const subCategoryKey = Object.keys(subCategoryObj)[0];
      const subCategoryData = subCategoryObj[subCategoryKey];
      const subAccordionId = `collapseSub${mainIndex}${subCategoryKey.replace(
        /[^a-zA-Z0-9]/g,
        ''
      )}`;
      subAccordions += `
        <div class="accordion-item">
          <h2 class="accordion-header" id="subHeading${subCategoryKey}${mainIndex}">
            <button class="accordion-button collapsed" type="button" data-bs-toggle="collapse" data-bs-target="#${subAccordionId}" aria-expanded="false" aria-controls="${subAccordionId}">
              ${subCategoryKey}
            </button>
          </h2>
          <div id="${subAccordionId}" class="accordion-collapse collapse" aria-labelledby="subHeading${subCategoryKey}${mainIndex}">
            <div class="accordion-body">
              Goal: ${subCategoryData.join(', ')}
            </div>
          </div>
        </div>
      `;
    });
    const accordionItem = `
      <div class="accordion-item">
        <h2 class="accordion-header" id="heading${mainIndex}">
          <button class="accordion-button collapsed" type="button" data-bs-toggle="collapse" data-bs-target="#${mainAccordionId}" aria-expanded="false" aria-controls="${mainAccordionId}">
            Goal: ${mainCategory}
          </button>
        </h2>
        <div id="${mainAccordionId}" class="accordion-collapse collapse" aria-labelledby="heading${mainIndex}">
          <div class="accordion-body">
            <div class="accordion" id="subAccordion${mainIndex}${mainCategory}">
              ${subAccordions}
            </div>
          </div>
        </div>
      </div>
    `;
    accordionContainer.innerHTML += accordionItem;
  });
}
```
```python
# The Selenium `driver` plus the `username` and `password` variables are
# assumed to be set up earlier in the script. The input() calls are
# manual pauses between navigation steps.
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException, TimeoutException

move_on = input("Move on 0 -->")
# Log in with the student's credentials
try:
    input_field_username = driver.find_element(By.XPATH, '/html/body/div/div[2]/div[1]/div[1]/form/table/tbody/tr[1]/td/input')
    input_field_password = driver.find_element(By.XPATH, '/html/body/div/div[2]/div[1]/div[1]/form/table/tbody/tr[2]/td/input')
    input_field_username.send_keys(username)
    input_field_password.send_keys(password)
    button_on_page = driver.find_element(By.XPATH, '/html/body/div/div[2]/div[1]/div[1]/form/table/tbody/tr[5]/td[2]/input')
    button_on_page.click()
except NoSuchElementException as e:
    print(f"Error: {e}")
except TimeoutException as e:
    print(f"Page load timed out: {e}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")

# Navigate to the Degree Audit
move_on = input("Move on 1 -->")
button_on_page = driver.find_element(By.XPATH, '/html/body/div[2]/div[2]/div[1]/div[1]/ul/li[4]/a')
button_on_page.click()
move_on = input("Move on 2 -->")
button_on_page = driver.find_element(By.XPATH, '/html/body/div[2]/div[1]/div[2]/ul/li[3]/a')
button_on_page.click()
move_on = input("Move on 3 -->")
# Change all elements with 'display: none' to 'display: block' so the
# hidden audit sections become scrapeable
driver.execute_script("""
    var elements = document.querySelectorAll('[style*="display: none"]');
    for (var i = 0; i < elements.length; i++) {
        elements[i].style.display = 'block';
    }
""")

# Navigate to the Unofficial Transcript
move_on = input("Move on 1 -->")
button_on_page = driver.find_element(By.XPATH, '/html/body/div[2]/div[2]/div[1]/div[1]/ul/li[5]/a')
button_on_page.click()
move_on = input("Move on 2 -->")
button_on_page = driver.find_element(By.XPATH, '/html/body/div[2]/div[1]/div/div[2]/div/ul/li[2]/a')
button_on_page.click()
move_on = input("Move on 3 -->")
button_on_page = driver.find_element(By.XPATH, '/html/body/div[2]/div[1]/div/div[2]/div/div[1]/div/div/form/input[2]')
button_on_page.click()
move_on = input("Move on 4 -->")
button_on_page = driver.find_element(By.XPATH, '/html/body/div[2]/div[1]/div/div[2]/div/div[2]/a/img')
button_on_page.click()
```
```python
import json

def findOptimalCourse(courses: dict, goalLeft: list, userNotCompleted: dict):
    hashMap = {}
    res = []
    # Score each course by how many unfulfilled goal areas it covers
    for course in courses:
        points = 0
        goalFulfill = courses[course]
        for goals in goalLeft:
            if goals in goalFulfill:
                points += 1
        hashMap[course] = points
    # Sort by score, highest first, and keep only courses that help
    hashMap = dict(sorted(hashMap.items(), key=lambda x: x[1], reverse=True))
    for key, val in hashMap.items():
        if val > 0:
            res.append(key)
    for i, key in enumerate(res):
        if key in courses:
            res[i] = {key: courses[key]}
    # Group the selected courses under each goal area they fulfill
    courseGoal = {}
    for key in res:
        for val in key.values():
            for item in val:
                if item in courseGoal:
                    courseGoal[item].append(key)
                else:
                    courseGoal[item] = [key]
    newDict = {}
    for key in goalLeft:
        newDict[f"{str(key)} {userNotCompleted[key]}"] = courseGoal[key]
    return json.dumps(newDict)
```
- This function creates a list of optimal courses according to the goal areas the user hasn't fulfilled, in descending order of how many each course covers
- :param courses: The JSON of all courses from the scraper
- :type courses: dict
- :param goalLeft: Goal areas the user still needs to fulfill
- :type goalLeft: list
- :param userNotCompleted: Descriptions of the goal areas the user has not completed
- :type userNotCompleted: dict
- :return: A JSON string of all the optimal courses the user can take, grouped by goal area
- :rtype: str
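To make the input and output shapes concrete, here is a small self-contained run of the same scoring-and-grouping idea. The `find_optimal` helper below is a condensed reimplementation for illustration, and the course data is hypothetical.

```python
import json

def find_optimal(courses, goal_left, user_not_completed):
    # Score each course by how many outstanding goal areas it fulfills
    scores = {c: sum(g in goals for g in goal_left) for c, goals in courses.items()}
    # Keep scoring courses, highest first, wrapped as {course: goals}
    ranked = [{c: courses[c]}
              for c, s in sorted(scores.items(), key=lambda x: x[1], reverse=True)
              if s > 0]
    # Group the ranked courses under each goal area they fulfill
    grouped = {}
    for entry in ranked:
        for goals in entry.values():
            for g in goals:
                grouped.setdefault(g, []).append(entry)
    return json.dumps({f"{g} {user_not_completed[g]}": grouped.get(g, [])
                       for g in goal_left})

courses = {"ENG 101": ["1"], "ART 110": ["6"], "HIST 140": ["5"]}
print(find_optimal(courses, ["1", "6"], {"1": "Communication", "6": "Humanities"}))
# -> {"1 Communication": [{"ENG 101": ["1"]}], "6 Humanities": [{"ART 110": ["6"]}]}
```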
```python
def GetDegreeAudit(DegreeAuditFile):
    with open(DegreeAuditFile, 'r') as file:
        file_contents = file.read()
    CoursesTaken = file_contents.split()
    return CoursesTaken
```
- The `GetDegreeAudit` function takes the degree audit file name, opens the file, and reads its contents into a string. It then splits the data on whitespace into a list and returns that list to main for further processing.
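For reference, a minimal round-trip of `GetDegreeAudit` with hypothetical audit text written to a temporary file (the function is reproduced here so the snippet runs on its own):

```python
import os
import tempfile

def GetDegreeAudit(DegreeAuditFile):
    with open(DegreeAuditFile, 'r') as file:
        return file.read().split()

# Hypothetical audit snippet written to a temp file
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write("CSCI 201 Completed\nCYB 101 In-Progress")
    path = f.name

print(GetDegreeAudit(path))
# -> ['CSCI', '201', 'Completed', 'CYB', '101', 'In-Progress']
os.unlink(path)
```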
```python
def GetTakenCourses(CoursesTaken):
    courses = [["CYB"], ["CSCI"], ["..."]]  # subject-prefix list, truncated here
    flattened_courses = [item for sublist in courses for item in sublist]
    flattened_courses_lower = set(course.lower() for course in flattened_courses)
    filtered_words_with_next = []
    # Keep each uppercase subject prefix along with the token after it
    # (the course number)
    for i in range(len(CoursesTaken) - 1):
        word = CoursesTaken[i]
        if word.isupper() and word.lower() in flattened_courses_lower:
            filtered_words_with_next.append(word)
            filtered_words_with_next.append(CoursesTaken[i + 1])
    return filtered_words_with_next
```
- The `GetTakenCourses` function takes the newly generated list and filters out all the "garbage" data from the scrape that we don't need.
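A quick self-contained illustration of that filtering, using a hypothetical token list and a shortened subject-prefix set:

```python
def GetTakenCourses(CoursesTaken):
    prefixes = {"cyb", "csci"}  # shortened subject-prefix set for this demo
    filtered = []
    # Keep each uppercase subject prefix and the course number after it
    for i in range(len(CoursesTaken) - 1):
        word = CoursesTaken[i]
        if word.isupper() and word.lower() in prefixes:
            filtered.append(word)
            filtered.append(CoursesTaken[i + 1])
    return filtered

tokens = ["Completed", "CSCI", "201", "grade", "A", "CYB", "101"]
print(GetTakenCourses(tokens))
# -> ['CSCI', '201', 'CYB', '101']
```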
```python
def PrintToScreen(filtered_words_with_next):
    FinalList = []
    for n in range(0, len(filtered_words_with_next) - 1, 2):
        FinalList.append(str(filtered_words_with_next[n]) + str(filtered_words_with_next[n + 1]))
    return FinalList
```
- The `PrintToScreen` function takes the newly cleaned list and formats it in a standardized way for our other Python scripts to process.
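The pairing step can be seen in isolation with a hypothetical cleaned list:

```python
def PrintToScreen(filtered_words_with_next):
    # Join each (prefix, number) pair into one course code
    FinalList = []
    for n in range(0, len(filtered_words_with_next) - 1, 2):
        FinalList.append(str(filtered_words_with_next[n]) + str(filtered_words_with_next[n + 1]))
    return FinalList

print(PrintToScreen(["CSCI", "201", "CYB", "101"]))
# -> ['CSCI201', 'CYB101']
```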
```python
import json

def findOptimalCourse(courses: dict, goalLeft: list):
    hashMap = {}
    res = []
    # Score each course by how many unfulfilled goal areas it covers
    for course in courses:
        points = 0
        goalFulfill = courses[course]
        for goals in goalLeft:
            if goals in goalFulfill:
                points += 1
        hashMap[course] = points
    # Keep the scoring courses, highest score first
    hashMap = dict(sorted(hashMap.items(), key=lambda x: x[1], reverse=True))
    for key, val in hashMap.items():
        if val > 0:
            res.append(key)
    return json.dumps({"optimalCourses": res})
```
The essence of the function can be represented by the equation:

$$O = \arg\max_{C} \sum_{c \in C} G_c$$

Where:

- $O$ represents the optimal set of courses.
- $G_c$ denotes the goals fulfilled by course $c$.
- The summation $\sum G_c$ sums the goals fulfilled by each course.
- The max function selects the combination of courses $C$ that maximizes the total number of fulfilled goals.
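A worked instance of that scoring with hypothetical course data: each course is scored by the number of outstanding goal areas it fulfills, then ranked highest first.

```python
import json

# Hypothetical sample data: course -> goal areas it fulfills
courses = {"ENG 101": ["1", "2"], "ART 110": ["6"], "CHEM 210": ["6", "10"]}
goal_left = ["1", "6", "10"]  # goal areas still unfulfilled

# Score each course by how many outstanding goals it covers
scores = {c: sum(g in goals for g in goal_left) for c, goals in courses.items()}
ranked = [c for c, s in sorted(scores.items(), key=lambda x: x[1], reverse=True) if s > 0]
print(json.dumps({"optimalCourses": ranked}))
# -> {"optimalCourses": ["CHEM 210", "ENG 101", "ART 110"]}
```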
- Once the class scores are fully calculated, the data is exported as JSON.
Business Aspect
This product would add great value to students and SCSU as a whole. It would allow students to find more classes they will enjoy at a better value, which in turn would lead to better student retention at SCSU.
Feasibility
- This product could be implemented directly into D2L or e-services.
- Long-term, we would want to pull this data directly from a database that SCSU hosts rather than scraping the public web.
- In the long run this could be very profitable for the university.
Future Works & Additions
- Integrate Rate My Professors into our points system to add more data points to the generated report.
- Connect our product to e-services to gain access to class start times, dates, and semester availability, enabling a more detailed generated report.
- Expand our product's scope to all majors offered at SCSU, not just General Liberal Education courses.
- Expand to all courses and implement a questionnaire/chatbot to help students gauge which elective courses might interest them.
- Web Scraping: The process of using bots to extract content and data from a website.
- API: Application Programming Interface.
- Text Parsing/Parser: The task of separating a given series of text into smaller components based on some rules.
- GUI: Graphical User Interface.