A normalizing representer for Exercism's Rust track.
A representer's job is to normalize input code by stripping out or replacing trivial details that introduce differences between students' submissions. Comments, whitespace, and variable names do not contribute to the overall logical flow and structure of a student's approach, so comments and whitespace are stripped out and variable names are replaced by a standard placeholder.
The ultimate purpose of the representer is to facilitate quicker response times from mentors by standardizing student implementations so that mentors can provide feedback on the approach the student took to solve the problem.
Given an example submission for the two-fer exercise like the following:

    fn twofer(name: &str) -> String {
        match name {
            "" => "One for you, one for me.".to_string(),
            // use the `format!` macro to return a formatted String
            _ => format!("One for {}, one for me.", name),
        }
    }

The representer will return:

    fn PLACEHOLDER_1(PLACEHOLDER_2: &str) -> String {
        match PLACEHOLDER_2 {
            "" => "One for you, one for me.".to_string(),
            _ => format!("One for {}, one for me.", PLACEHOLDER_2),
        }
    }

Currently the following statement/expression types are visited by the representer:
- `let` bindings
- `struct` names
- `struct` fields
- `enum` names
- `enum` variants
- `fn` definitions
- `fn` calls
- method calls
- `const` names
- `static` names
- `union` names
- `type` aliases
- `match` expressions
- `match` arms
- `macro` arguments
- closure expressions
- `for` loops
- `while` loops
- `loop`s
- `if` expressions
- `impl` blocks
- type annotations
- `if let` bindings
- user-defined types
- user-defined traits
- `mod` imports
- output variable mappings to a JSON file
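
The last item in the list, writing the variable mappings to a JSON file, could look roughly like the sketch below. It uses only the standard library; the function name `mapping_to_json`, the placeholder-to-original key direction, and the layout of the output are all assumptions made for illustration, not details taken from the representer itself.

```rust
/// Hypothetical helper: renders (placeholder, original name) pairs as a
/// JSON object. It assumes the names need no JSON escaping, which holds
/// for plain ASCII Rust identifiers.
fn mapping_to_json(pairs: &[(&str, &str)]) -> String {
    // One `"key": "value"` line per pair, indented by two spaces.
    let body: Vec<String> = pairs
        .iter()
        .map(|(placeholder, original)| format!("  \"{}\": \"{}\"", placeholder, original))
        .collect();
    // Wrap the lines in braces; `{{` and `}}` are escaped braces in `format!`.
    format!("{{\n{}\n}}", body.join(",\n"))
}
```

For the two-fer example above, calling it with the pairs `("PLACEHOLDER_1", "twofer")` and `("PLACEHOLDER_2", "name")` would yield a two-entry object that lets a mentor translate placeholders back to the student's own names.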
The high-level steps the representer takes are as follows:
- It transforms the source code into an AST, stripping out comments in the process.
- From there, it traverses the AST, looking for identifiers.
- When it finds an identifier:
  - It checks whether the identifier is a Rust keyword (or any other sort of identifier that isn't actually being used as a variable/function name).
  - If the identifier isn't a keyword, it then checks if the identifier is one that has been encountered before.
    - If it is, then a placeholder for this identifier has already been generated and stored in a HashMap; the identifier is replaced with the placeholder.
    - If it isn't, then the placeholder needs to be generated and saved in the HashMap before the identifier is replaced by it.
- The transformed output is then put through another formatting pass.
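
The identifier-handling steps above can be sketched with a plain `HashMap`. This is a minimal illustration, not the representer's actual code: the `PlaceholderMap` type, its `replace` method, and the shortened `KEYWORDS` list are hypothetical, and the sketch works on bare identifier strings rather than on AST nodes.

```rust
use std::collections::HashMap;

/// A deliberately incomplete keyword list, for illustration only.
const KEYWORDS: &[&str] = &[
    "fn", "let", "match", "if", "else", "for", "while", "loop", "struct",
    "enum", "impl", "mod", "use", "const", "static", "type",
];

/// Hypothetical bookkeeping for identifier -> placeholder replacement.
#[derive(Default)]
struct PlaceholderMap {
    seen: HashMap<String, String>,
}

impl PlaceholderMap {
    /// Returns the text that should stand in for `ident` in the output.
    fn replace(&mut self, ident: &str) -> String {
        // Keywords are left untouched.
        if KEYWORDS.contains(&ident) {
            return ident.to_string();
        }
        // Placeholders are numbered from 1 in order of first appearance.
        let next = self.seen.len() + 1;
        // Reuse the stored placeholder if the identifier was seen before;
        // otherwise generate one, store it, and return it.
        self.seen
            .entry(ident.to_string())
            .or_insert_with(|| format!("PLACEHOLDER_{}", next))
            .clone()
    }
}
```

Feeding it the identifiers of the two-fer example in order (`twofer`, then `name` three times) gives `PLACEHOLDER_1` for the function name and `PLACEHOLDER_2` for every use of the parameter, while `match` passes through unchanged.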