NAPLAN 0.9.10
Results Reporting (NAPRRQL) Changes:
Ingest Check
NAPRRQL ingest check:
- If a user attempts to run naprrql when no data has been ingested via naprrql -ingest, the program will abort rather than leave the user with a web UI that has no data.
Ingest must be run before reporting or browsing with the web UI; without an ingest there is no data to work with.
Mac/Unix:
./naprrql -ingest
Windows:
naprrql.exe -ingest
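The ingest check amounts to a guard that runs before the UI starts. The Python sketch below is illustrative only and is not NAPRRQL's code. It assumes the ingested data lives in a single datastore directory. The names `has_ingested_data` and `require_ingest`, and any directory path passed to them, are hypothetical.

```python
import os
import sys


def has_ingested_data(datastore_dir: str) -> bool:
    """Return True if the (hypothetical) datastore directory exists and has at least one entry."""
    if not os.path.isdir(datastore_dir):
        return False
    with os.scandir(datastore_dir) as entries:
        return any(True for _ in entries)


def require_ingest(datastore_dir: str) -> None:
    """Abort with exit status 1 instead of starting a UI with no data behind it."""
    if not has_ingested_data(datastore_dir):
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("No ingested data found; run naprrql -ingest first.")
```

For example, `require_ingest("./datastore")` (hypothetical path) would stop the program with a message pointing the user at `-ingest` when nothing has been loaded yet.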
ISR Printing Report File Support:
Results Reporting: Added ISR printing file support. Improved throughput for all reports. Ingest now processes input in batches so that very large input files can be handled.
The ISR printing capability creates the input file typically passed to Pearson/Fuji systems for printing Individual Student Reports (ISRs) for NAPLAN.
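Batched ingest means reading the input as a stream and handling fixed-size groups of records, so memory use stays bounded regardless of file size. A minimal Python sketch of that pattern, assuming records arrive as an iterable (for example, lines of a file); `in_batches` and any batch size used with it are hypothetical and not taken from NAPRRQL:

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def in_batches(records: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` records from a (possibly huge) stream."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(records)
    while True:
        batch = list(islice(it, size))  # pulls at most `size` items; nothing else is buffered
        if not batch:
            return
        yield batch
```

Each yielded batch would then be written to the datastore in one operation, with only one batch held in memory at a time.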
Reporting is no longer run automatically after ingest, because with a large number of schools it can mean a long wait before the data can be explored in the UI.
When you are ready to generate reports, run with the -report flag:
Mac/Unix:
./naprrql -report
Windows:
naprrql.exe -report
This generates system-level reports and an individual report for each school in the dataset.
If ISR printing files are required, run with the -isrprint flag:
Mac/Unix:
./naprrql -isrprint
Windows:
naprrql.exe -isrprint
One printing file is created for each NAPLAN year group, covering all students in the dataset.
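Splitting one dataset into a file per year group is a group-by followed by one write per group. The Python sketch below assumes each student record is a dict with a hypothetical `YearLevel` key and writes CSV. The real ISR print file layout and naming are not described here, so the file name pattern `isr_print_year{N}.csv` is also hypothetical.

```python
import csv
import os
from collections import defaultdict
from typing import Dict, List, Sequence


def group_by_year_level(students: Sequence[dict]) -> Dict[str, List[dict]]:
    """Group student records by their (hypothetical) YearLevel field, keeping input order."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for student in students:
        groups[student["YearLevel"]].append(student)
    return dict(groups)


def write_print_files(students: Sequence[dict], out_dir: str, fields: Sequence[str]) -> List[str]:
    """Write one CSV per year group; return the file paths in (lexically) sorted year order."""
    paths = []
    for year, group in sorted(group_by_year_level(students).items()):
        path = os.path.join(out_dir, f"isr_print_year{year}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(fields), extrasaction="ignore")
            writer.writeheader()
            writer.writerows(group)
        paths.append(path)
    return paths
```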
South Australian feature requests:
- Student first name and surname now replace the PSI identifier in tabular displays of student data in NAPRRQL.
NAPCOMP added to distribution:
- A simple command-line tool that compares students between the registration dataset and the results reporting dataset. It highlights students who have results but were not in registration, as well as students who were registered but do not appear in the results data.
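The comparison described above is essentially two set differences. A minimal Python sketch, assuming each dataset has already been reduced to a collection of student identifiers (which identifier NAPCOMP actually matches on is not stated here); `compare_students` is a hypothetical name:

```python
from typing import Iterable, Set, Tuple


def compare_students(registered: Iterable[str], results: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (students with results but no registration, registered students with no results)."""
    reg, res = set(registered), set(results)
    return res - reg, reg - res
```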
Registration Validation (NAPVAL) Changes:
- When data ingest into NAPVAL completes, a popup button appears that lets users check the schools in the ingested data and the student count per school.
- An additional NAPVAL check attempts to match students across schools, using the student attributes nominated in the napval.toml configuration file (fieldStudentMatch; the default setting is ["FamilyName", "GivenName", "BirthDate"]). Any data match is reported in validation as a warning.
- In addition, the validator adds up the FTE fractions of any matched student records; if the sum of FTE fractions is not 1.0, this is also reported as a warning. NOTE: If a student is enrolled across three or more schools, the validator will still report a warning on the second school enrolment, because the FTEs at that point do not yet add up to 1.0.
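The matching and FTE checks can be sketched as below. This is an illustration under stated assumptions, not NAPVAL's implementation. Records are dicts, and the match key is built from the fieldStudentMatch attributes (using the default list above). `SchoolId` and `FTE` are hypothetical field names. Records are checked in order against earlier matches, which reproduces the behaviour in the NOTE: with three enrolments, the FTE warning fires at the second one.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Default fieldStudentMatch attributes, as listed in the release notes.
DEFAULT_MATCH_FIELDS = ("FamilyName", "GivenName", "BirthDate")


def check_matches(records: Sequence[dict],
                  match_fields: Sequence[str] = DEFAULT_MATCH_FIELDS) -> List[str]:
    """Return warnings for students matched across schools and for FTE sums that are not 1.0."""
    warnings: List[str] = []
    seen: Dict[Tuple, List[dict]] = {}
    for rec in records:
        key = tuple(rec[f] for f in match_fields)
        earlier = seen.setdefault(key, [])
        if any(e["SchoolId"] != rec["SchoolId"] for e in earlier):
            warnings.append(f"data match {key}: also enrolled at another school ({rec['SchoolId']})")
            # Only enrolments seen so far are summed, so a later third enrolment
            # is not yet counted when the second one is checked.
            fte_total = sum(float(e["FTE"]) for e in earlier) + float(rec["FTE"])
            if not math.isclose(fte_total, 1.0, abs_tol=1e-9):
                warnings.append(f"FTE sum {fte_total:g} != 1.0 for {key}")
        earlier.append(rec)
    return warnings
```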
Benchmarking
NSIP has now generated a 600-school sample data file for testing NIAS or other downstream systems receiving results data.
Uncompressed, the results for 600 schools total around 30 GB of data; compressed, around 1.2 GB.
The figures below are intended to indicate general performance. RAM usage has been deliberately kept to a minimum, but more processing power and SSD storage will improve performance.
Full data load and reporting on a standard Mac laptop (1.3 GHz i5, 4 GB RAM):
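As a rough check of what these sizes imply (treating GB as decimal gigabytes and assuming the data is spread evenly across schools, which it will not be exactly):

```latex
\frac{30\ \text{GB}}{1.2\ \text{GB}} = 25
\qquad
\frac{30\ \text{GB}}{600\ \text{schools}} \approx 50\ \text{MB per school (uncompressed)}
\qquad
\frac{1.2\ \text{GB}}{600\ \text{schools}} \approx 2\ \text{MB per school (compressed)}
```

That is roughly a 25:1 compression ratio.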
Full data ingest: 45 mins
System & School Reporting: 35 mins
ISR Print File Generation: 12 mins
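Run back to back, the three steps above add up to roughly an hour and a half on that laptop. Dividing the reporting time across 600 schools gives an upper bound on the average time per school, since that time also covers the system-level reports:

```latex
45 + 35 + 12 = 92\ \text{min} \approx 1\ \text{h}\ 32\ \text{min}
\qquad
\frac{35 \times 60\ \text{s}}{600\ \text{schools}} = 3.5\ \text{s per school}
```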
Once data has been ingested (which only needs to be done once), all queries in the UI or data explorer return as quickly as they do with a 4-school dataset. Because the datastore is laid out as lexically ordered keys and values, data volume appears (so far) to have little impact on query performance.
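A toy illustration of why lexically ordered keys keep query cost flat: if related records share a key prefix, a query is a binary search to the first matching key plus a sequential read of the contiguous run. Its cost therefore depends on the number of matching keys plus a logarithmic search term, not on the total dataset size. The Python class and the `school:...:student:...` key scheme below are hypothetical. NAPRRQL's actual key layout and storage engine are not described here, and a real store would use an on-disk structure rather than Python lists.

```python
import bisect
from typing import List, Tuple


class SortedKV:
    """Toy in-memory key-value store that keeps keys in lexical order (illustration only)."""

    def __init__(self) -> None:
        self._keys: List[str] = []
        self._vals: List[str] = []

    def put(self, key: str, value: str) -> None:
        """Insert or overwrite, keeping the key list sorted."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._vals[i] = value
        else:
            self._keys.insert(i, key)
            self._vals.insert(i, value)

    def prefix_scan(self, prefix: str) -> List[Tuple[str, str]]:
        """Binary-search to the first key >= prefix, then read only the contiguous matching run."""
        i = bisect.bisect_left(self._keys, prefix)
        out = []
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            out.append((self._keys[i], self._vals[i]))
            i += 1
        return out
```

Note the trailing `:` in a prefix such as `school:A:`; without it, a scan for `school:A` would also pick up `school:AB`.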
Note for Mac users:
The default per-process ulimit for file handles on macOS is low: only 256 open file descriptors. When querying and printing run at the same time at scale, this limit is easily reached and 'Too many open files...' errors will appear in the console.
To resolve this, raise the process ulimit before launching the binary:
ulimit -n 2048
./naprrql [-ingest | -report | -isrprint]
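The same limit can be raised from a launcher script instead of the shell. A minimal Python sketch using the standard `resource` module (Unix and macOS only); `raise_nofile_limit` is a hypothetical helper, and the 2048 target simply mirrors the `ulimit -n 2048` above:

```python
import resource


def raise_nofile_limit(target: int = 2048) -> int:
    """Raise this process's soft open-file limit toward `target` (never lowering it); return the new soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # an unprivileged process cannot exceed the hard limit
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # already high enough
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

Resource limits are inherited by child processes, so calling this before starting the binary (for example with `subprocess.run(["./naprrql", "-report"])`) has the same effect as running `ulimit -n 2048` in the shell first.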