Skip to content

Motivation

Sewen Thy edited this page Aug 30, 2022 · 12 revisions

Empirical

Methodology for finding examples

Getting list of projects

https://github.com/rust-unofficial/awesome-rust
And the Rust language repository itself.

Getting commits within projects

We know that there are possible refactoring and good commit messages should say what the developer did. We expect that when we search the commit messages that contains the word extract or move and refactor we should be able to find those commits. I did not include function or method as programmers like to be precise and actually put in the function name and where it was extracted from. This will most likely under count the amount of commits but it will save time in actually looking though the code.

Then we examine the commit changes to see the actual refactoring and see if it is correct.

$ git log --grep="extract\| move" --grep="refactor" -i --all-match --all --full-history 

Looking for commits related to lifetimes and borrows:

$ git log --grep="extract\| move" --grep="refactor" --grep="lifetime\|borrow" -i --all-match --all --full-history 

Examples

I am examining 2 state of the art supports for the extract method refactoring in Intellij with its Rust plugin running in the CLion IDE and in Visual Studio Code (VSC) with its Rust Analyzer plugin. Intellij's CLion has the most support for the Rust plugin with the most functionalities enabled and Visual Studio Code's (VSC) Rust Analyzer plugin is the most popular plugin (in terms of downloads) with support for extract method refactoring in VSC.

Zola

Zola is a lightweight web framework in Rust. I found the below commit:

commit 774514f4d4efc65e99a9108a6fc886e9747e567c
Author: Peng Guanwen <pg999w@outlook.com>
Date:   Sat Jan 5 22:37:24 2019 +0800

    refactor markdown_to_html
    
    this commit contains two refactors:
    - extract custom link transformations into a function.
    - separate some trivial markup generation.

The developer extracted the code below:

                   let fixed_link = if link.starts_with("./") {
                        match resolve_internal_link(&link, context.permalinks) {
                            Ok(url) => url,
                            Err(_) => {
                                error = Some(format!("Relative link {} not found.", link).into());
                                return Event::Html(Borrowed(""));
                            }
                        }
                    } else if is_colocated_asset_link(&link) {
                        format!("{}{}", context.current_page_permalink, link)
                    } else if context.config.check_external_links
                        && !link.starts_with('#')
                        && !link.starts_with("mailto:")
                    {
                        let res = check_url(&link);
                        if res.is_valid() {
                            link.to_string()
                        } else {
                            error = Some(
                                format!("Link {} is not valid: {}", link, res.message()).into(),
                            );
                            String::new()
                        }
                    } else {
                        link.to_string()
                    };

Into the following function:

fn fix_link(link: &str, context: &RenderContext) -> Result<String> {
    let result = if link.starts_with("./") {
        match resolve_internal_link(&link, context.permalinks) {
            Ok(url) => url,
            Err(_) => {
                return Err(format!("Relative link {} not found.", link).into());
            }
        }
    } else if is_colocated_asset_link(&link) {
        format!("{}{}", context.current_page_permalink, link)
    } else if context.config.check_external_links
        && !link.starts_with('#')
        && !link.starts_with("mailto:") {
        let res = check_url(&link);
        if res.is_valid() {
            link.to_string()
        } else {
            return Err(
                format!("Link {} is not valid: {}", link, res.message()).into(),
            );
        }
    } else {
        link.to_string()
    };
    Ok(result)
}

And insert the following function call back in the extracted code:

                    let fixed_link = match fix_link(&link, context) {
                        Ok(fixed_link) => fixed_link,
                        Err(err) => {
                            error = Some(err);
                            return Event::Html(Borrowed(""))
                        }
		    };	

Intellij

When I try to do this refactoring in Intellij's CLion using the Rust plugin, I got the following extracted function:

fn fix_link_extracted_ij(context: &RenderContext, mut error: &mut Option<Error>, link: &Cow<str>) -> String {
    let fixed_link = if link.starts_with("./") {
        match resolve_internal_link(&link, context.permalinks) {
            Ok(url) => url,
            Err(_) => {
                error = Some(format!("Relative link {} not found.", link).into());
                return Event::Html(Borrowed(""));
            }
        }
    } else if is_colocated_asset_link(&link) {
        format!("{}{}", context.current_page_permalink, link)
    } else if context.config.check_external_links
        && !link.starts_with('#')
        && !link.starts_with("mailto:")
    {
        let res = check_url(&link);
        if res.is_valid() {
            link.to_string()
        } else {
            error = Some(
                format!("Link {} is not valid: {}", link, res.message()).into(),
            );
            String::new()
        }
    } else {
        link.to_string()
    };
    fixed_link
}

Which has 3 compilation errors:

mismatched types [E0308]expected `&mut Option<Error>`, found `Option<<unknown>>`
mismatched types [E0308]expected `String`, found `Event`
mismatched types [E0308]expected `&mut Option<Error>`, found `Option<<unknown>>`

These are basically how the extract function refactoring handles return values, since the extracted method is inside another function body that has another return value, this return value needs to be propagated, which the developer did but Intellij's Rust plugin did not.

To fix this extraction, I wrapped the function output into a Result crate and fixed the output valut into the crate. I also had to rewrite the mutable error reassignment statements to actually update the referenced value rather than the reference itself:

fn fix_link_extracted_ij_fixed(context: &RenderContext, mut error: &mut Option<Error>, link: &Cow<str>) -> Resut<String, Event> {
    if link.starts_with("./") {
        match resolve_internal_link(&link, context.permalinks) {
            Ok(url) => url,
            Err(_) => {
                *error = Some(format!("Relative link {} not found.", link).into());
                return Err(Event::Html(Borrowed("")));
            }
        }
    } else if is_colocated_asset_link(&link) {
        Ok(format!("{}{}", context.current_page_permalink, link))
    } else if context.config.check_external_links
        && !link.starts_with('#')
        && !link.starts_with("mailto:")
    {
        let res = check_url(&link);
        if res.is_valid() {
            Ok(link.to_string())
        } else {
            *error = Some(
                format!("Link {} is not valid: {}", link, res.message()).into(),
            );
            Ok(String::new())
        }
    } else {
        Ok(link.to_string())
    }
}

Then I rewrite the function call to a pattern match statement which extract from the Result crate:

                    let fixed_link = match fix_link_extracted_ij_fixed(context, &mut error, &link) {
                        Ok(r) => r,
                        Err(e) => return e,
                    };

VSC

It succeeded in refactoring the different return values by putting the return value in the Result<String, Event> type:

fn fix_link_extracted_vsc(link: std::borrow::Cow<str>, context: &RenderContext, error: &mut Option<errors::Error>) -> Result<String, Event> {
    let fixed_link = if link.starts_with("./") {
        match resolve_internal_link(&link, context.permalinks) {
            Ok(url) => url,
            Err(_) => {
                *error = Some(format!("Relative link {} not found.", link).into());
                return Err(Event::Html(Borrowed("")));
            }
        }
    } else if is_colocated_asset_link(&link) {
        format!("{}{}", context.current_page_permalink, link)
    } else if context.config.check_external_links
        && !link.starts_with('#')
        && !link.starts_with("mailto:")
    {
        let res = check_url(&link);
        if res.is_valid() {
            link.to_string()
        } else {
            *error = Some(
                format!("Link {} is not valid: {}", link, res.message()).into(),
            );
            String::new()
        }
    } else {
        link.to_string()
    };
    Ok(fixed_link)
}

Then inserted the following function call:

                    let fixed_link = match fix_link_extracted_vsc(link, context, &mut error) {
                        Ok(value) => value,
                        Err(value) => return value,
                    };

The plugin correctly change the return values and pattern match back the result as well as correctly terminating the parent function call almost exactly as the developer did.

Diem

Diem is an open-source blockchain technology that aims at maintaining cryptocurrency. I found the following commit which extract the code that generates a report to a function involving a very simple lifetime annotation that can be filled with a simple discard '_. The original code is :

Rust

For a complex example involving lifetime, I took a look at the Rust borrow-checker implementation and found this refactoring that involves how the borrow-checker emits errors. The commit for Rust is below:

commit 1f0a96862ac9d4c6ca3e4bb500c8b9eac4d83049
Merge: bf242bb1199 48a48fd1b85
Author: bors <bors@rust-lang.org>
Date:   Wed Feb 9 09:41:48 2022 +0000

    Auto merge of #92306 - Aaron1011:opaque-type-op, r=oli-obk
...
    * The body of `try_extract_error_from_fulfill_cx`
    has been moved out to a new function `try_extract_error_from_region_constraints`.
    This allows us to re-use the same error reporting code between
    canonicalized queries (which can extract region constraints directly
    from a fresh `InferCtxt`) and opaque type handling (which needs to take
    region constraints from the pre-existing `InferCtxt` that we use
    throughout MIR borrow checking).

This very detailed commit message shows not only the extract method refactoring but also described its usefulness.

The code before extraction in the try_extract_error_from_fulfill_cx:

fn try_extract_error_from_fulfill_cx<'tcx>(
    mut fulfill_cx: Box<dyn TraitEngine<'tcx> + 'tcx>,
    infcx: &InferCtxt<'_, 'tcx>,
    placeholder_region: ty::Region<'tcx>,
    error_region: Option<ty::Region<'tcx>>,
) -> Option<DiagnosticBuilder<'tcx>> {
    let tcx = infcx.tcx;

    // We generally shouldn't have errors here because the query was
    // already run, but there's no point using `delay_span_bug`
    // when we're going to emit an error here anyway.
    let _errors = fulfill_cx.select_all_or_error(infcx);

    let (sub_region, cause) = infcx.with_region_constraints(|region_constraints| {
        debug!("{:#?}", region_constraints);
        region_constraints.constraints.iter().find_map(|(constraint, cause)| {
            match *constraint {
                Constraint::RegSubReg(sub, sup) if sup == placeholder_region && sup != sub => {
                    Some((sub, cause.clone()))
                }
                // FIXME: Should this check the universe of the var?
                Constraint::VarSubReg(vid, sup) if sup == placeholder_region => {
                    Some((tcx.mk_region(ty::ReVar(vid)), cause.clone()))
                }
                _ => None,
            }
        })
    })?;

    debug!(?sub_region, "cause = {:#?}", cause);
    let nice_error = match (error_region, sub_region) {
        (Some(error_region), &ty::ReVar(vid)) => NiceRegionError::new(
            infcx,
            RegionResolutionError::SubSupConflict(
                vid,
                infcx.region_var_origin(vid),
                cause.clone(),
                error_region,
                cause.clone(),
                placeholder_region,
                vec![],
            ),
        ),
        (Some(error_region), _) => NiceRegionError::new(
            infcx,
            RegionResolutionError::ConcreteFailure(cause.clone(), error_region, placeholder_region),
        ),
        // Note universe here is wrong...
        (None, &ty::ReVar(vid)) => NiceRegionError::new(
            infcx,
            RegionResolutionError::UpperBoundUniverseConflict(
                vid,
                infcx.region_var_origin(vid),
                infcx.universe_of_region(sub_region),
                cause.clone(),
                placeholder_region,
            ),
        ),
        (None, _) => NiceRegionError::new(
            infcx,
            RegionResolutionError::ConcreteFailure(cause.clone(), sub_region, placeholder_region),
        ),
    };
    nice_error.try_report_from_nll().or_else(|| {
        if let SubregionOrigin::Subtype(trace) = cause {
            Some(
                infcx.report_and_explain_type_error(*trace, &TypeError::RegionsPlaceholderMismatch),
            )
        } else {
            None
        }
    })
}

The main complexity of the function above is that it uses different data structures and types that operates over 2 lifetimes such as rustc_infer::infer::InferCtxt<'a, 'b> which means the extract method refactor needs to account for which lifetime is used within the data structures. The developer extracted the above code as such:

fn try_extract_error_from_region_constraints<'tcx>(
    infcx: &InferCtxt<'_, 'tcx>,
    placeholder_region: ty::Region<'tcx>,
    error_region: Option<ty::Region<'tcx>>,
    region_constraints: &RegionConstraintData<'tcx>,
    mut region_var_origin: impl FnMut(RegionVid) -> RegionVariableOrigin,
    mut universe_of_region: impl FnMut(RegionVid) -> UniverseIndex,
) -> Option<DiagnosticBuilder<'tcx>> {
    let (sub_region, cause) =
        region_constraints.constraints.iter().find_map(|(constraint, cause)| {
            match *constraint {
                Constraint::RegSubReg(sub, sup) if sup == placeholder_region && sup != sub => {
                    Some((sub, cause.clone()))
                }
                // FIXME: Should this check the universe of the var?
                Constraint::VarSubReg(vid, sup) if sup == placeholder_region => {
                    Some((infcx.tcx.mk_region(ty::ReVar(vid)), cause.clone()))
                }
                _ => None,
            }
        })?;

    debug!(?sub_region, "cause = {:#?}", cause);
    let nice_error = match (error_region, sub_region) {
        (Some(error_region), &ty::ReVar(vid)) => NiceRegionError::new(
            infcx,
            RegionResolutionError::SubSupConflict(
                vid,
                region_var_origin(vid),
                cause.clone(),
                error_region,
                cause.clone(),
                placeholder_region,
                vec![],
            ),
        ),
        (Some(error_region), _) => NiceRegionError::new(
            infcx,
            RegionResolutionError::ConcreteFailure(cause.clone(), error_region, placeholder_region),
        ),
        // Note universe here is wrong...
        (None, &ty::ReVar(vid)) => NiceRegionError::new(
            infcx,
            RegionResolutionError::UpperBoundUniverseConflict(
                vid,
                region_var_origin(vid),
                universe_of_region(vid),
                cause.clone(),
                placeholder_region,
            ),
        ),
        (None, _) => NiceRegionError::new(
            infcx,
            RegionResolutionError::ConcreteFailure(cause.clone(), sub_region, placeholder_region),
        ),
    };
    nice_error.try_report_from_nll().or_else(|| {
        if let SubregionOrigin::Subtype(trace) = cause {
            Some(
                infcx.report_and_explain_type_error(*trace, &TypeError::RegionsPlaceholderMismatch),
            )
        } else {
            None
        }
    })
}

And then rewrite the original try_extract_error_from_fulfill_cx to:

fn try_extract_error_from_fulfill_cx<'tcx>(
    mut fulfill_cx: Box<dyn TraitEngine<'tcx> + 'tcx>,
    infcx: &InferCtxt<'_, 'tcx>,
    placeholder_region: ty::Region<'tcx>,
    error_region: Option<ty::Region<'tcx>>,
) -> Option<DiagnosticBuilder<'tcx>> {
    // We generally shouldn't have errors here because the query was
    // already run, but there's no point using `delay_span_bug`
    // when we're going to emit an error here anyway.
    let _errors = fulfill_cx.select_all_or_error(infcx);
    let region_constraints = infcx.with_region_constraints(|r| r.clone());
    try_extract_error_from_region_constraints(
        infcx,
        placeholder_region,
        error_region,
        &region_constraints,
        |vid| infcx.region_var_origin(vid),
        |vid| infcx.universe_of_region(infcx.tcx.mk_region(ty::ReVar(vid))),
    )
}

This extraction involved a bit more magic with functionalization of the argument for the extracted method but the important aspect being the choice the developer had to make as to where the 'tcx lifetime fits with the InferCtxt struct and a successful extract method refactoring for rust should account for that.

Intellij

Intellij extracted the above code into:

fn try_extract_error_from_region_constraints_extracted_ij(infcx: &InferCtxt, placeholder_region: Region, error_region: Option<Region>) -> Option<DiagnosticBuilder<ErrorGuaranteed>> {
    let (sub_region, cause) = infcx.with_region_constraints(|region_constraints| {
        debug!("{:#?}", region_constraints);
        region_constraints.constraints.iter().find_map(|(constraint, cause)| {
            match *constraint {
                Constraint::RegSubReg(sub, sup) if sup == placeholder_region && sup != sub => {
                    Some((sub, cause.clone()))
                }
                // FIXME: Should this check the universe of the var?
                Constraint::VarSubReg(vid, sup) if sup == placeholder_region => {
                    Some((tcx.mk_region(ty::ReVar(vid)), cause.clone()))
                }
                _ => None,
            }
        })
    })?;

    debug!(?sub_region, "cause = {:#?}", cause);
    let nice_error = match (error_region, sub_region) {
        (Some(error_region), &ty::ReVar(vid)) => NiceRegionError::new(
            infcx,
            RegionResolutionError::SubSupConflict(
                vid,
                infcx.region_var_origin(vid),
                cause.clone(),
                error_region,
                cause.clone(),
                placeholder_region,
                vec![],
            ),
        ),
        (Some(error_region), _) => NiceRegionError::new(
            infcx,
            RegionResolutionError::ConcreteFailure(cause.clone(), error_region, placeholder_region),
        ),
        // Note universe here is wrong...
        (None, &ty::ReVar(vid)) => NiceRegionError::new(
            infcx,
            RegionResolutionError::UpperBoundUniverseConflict(
                vid,
                infcx.region_var_origin(vid),
                infcx.universe_of_region(sub_region),
                cause.clone(),
                placeholder_region,
            ),
        ),
        (None, _) => NiceRegionError::new(
            infcx,
            RegionResolutionError::ConcreteFailure(cause.clone(), sub_region, placeholder_region),
        ),
    };
    nice_error.try_report_from_nll().or_else(|| {
        if let SubregionOrigin::Subtype(trace) = cause {
            Some(
                infcx.report_and_explain_type_error(*trace, &TypeError::RegionsPlaceholderMismatch),
            )
        } else {
            None
        }
    })
}

And simply rewrites the original function as:

fn try_extract_error_from_fulfill_cx<'tcx>(
    mut fulfill_cx: Box<dyn TraitEngine<'tcx> + 'tcx>,
    infcx: &InferCtxt<'_, 'tcx>,
    placeholder_region: ty::Region<'tcx>,
    error_region: Option<ty::Region<'tcx>>,
) -> Option<DiagnosticBuilder<'tcx, ErrorGuaranteed>> {
    let tcx = infcx.tcx;

    // We generally shouldn't have errors here because the query was
    // already run, but there's no point using `delay_span_bug`
    // when we're going to emit an error here anyway.
    let _errors = fulfill_cx.select_all_or_error(infcx);

    try_extract_error_from_region_constraints_extracted_ij(infcx, placeholder_region, error_region, tcx)
}

The issue is the missing lifetime annotations; the Rust compiler reports the following error:

error[E0106]: missing lifetime specifier
   --> compiler/rustc_borrowck/src/diagnostics/bound_region_errors.rs:424:176
    |
424 | ...j(infcx: &InferCtxt, placeholder_region: Region, error_region: Option<Region>, tcx: TyCtxt) -> Option<DiagnosticBuilder<ErrorGuarantee...
    |             ----------                      ------                --------------       ------                             ^ expected named lifetime parameter
    |
    = help: this function's return type contains a borrowed value, but the signature does not say whether it is borrowed from one of `infcx`'s 3 lifetimes, `placeholder_region`, `error_region`, or `tcx`
help: consider introducing a named lifetime parameter
    |
424 | fn try_extract_error_from_region_constraints_extracted_ij<'a>(infcx: &'a InferCtxt<'a, 'a>, placeholder_region: Region<'a>, error_region: Option<Region<'a>>, tcx: TyCtxt<'a>) -> Option<DiagnosticBuilder<'a, ErrorGuaranteed>> {
    |                                                          ++++         ++          ++++++++                            ++++                             ++++              ++++                              +++

The compiler suggest solution is to simply fill in a single lifetime for all of this, but an analysis is needed to correctly fill in these lifetime annotations.

VSC

The Rust analyzer plugin does not allow the extract method refactoring for the above example. This means at least the plugin will not refactor and change semantics.

Amethyst

For amethyst, this commit:

commit 9c79a567198e17feda16801a9903cd4a9241b908
Author: Azriel Hoh <azriel91@gmail.com>
Date:   Wed Jul 10 07:12:01 2019 +1200

    Extracted common code for sending `AxisMoved` events.

Toy Examples

Multi-lifetime

I found a basic example on StackOverflow that shows how basic functions over objects with different lifetimes can be written. I updated the example to in-line the function back into the main function at v and added a println statement to ensure that the Foo struct f, need to be alive after v (which is what needs to be extracted).

Consider the code below where a struct Foo can have fields with two lifetimes 'a and 'b. For the value v that could either have the reference f.x which has lifetime 'a, or the reference &ZERO which has lifetime'static. y has lifetime 'b as it is dropped within the inner scope so only x and ZERO have valid use at the last statement for printing the value v:

static ZERO: i32 = 0;

struct Foo<'a, 'b> {
    x: &'a i32,
    y: &'b i32,
}

fn main() {
    let x = 5;
    let v;
    
    {
        let y = 2;
        
        let f = Foo { x: &x, y: &y };
        v = if *f.x > *f.y {
            f.x
        } else {
            &ZERO
        };
		println!("{}", *f.y);
    }
    println!("{}", *v);
}

Say we want to define v either the reference &x or &ZERO, depending on whether the value of the 2 references f.x and f.y. To extract it to a function, the developer needs to handle those two lifetime while making sure that the output has the valid lifetime that can be de-referenced at the last println statement. To wit:

fn get_x_or_zero_ref<'a, 'b>(x: &'a i32, y: &'b i32) -> &'a i32 {
    if *x > *y {
        return x
    } else {
        return &ZERO
    }
}

And replacing v with a function call:

v = get_x_or_zero_ref(f.x, f.y);

The function get_x_or_zero_ref has the 2 lifetimes 'a,'b specified and returning the first one 'a which fits both f.x and &ZERO. A serious extraction method implementation for Rust needs to account for this lifetime features for its safety to be guaranteed.

Intellij

Intellij's Rust plugin extracted the above v value function as:

fn get_x_or_zero_ref_extracted_ij(f: &Foo) -> &i32 {
    if *f.x > *f.y {
        f.x
    } else {
        &ZERO
    }
}

And inserted the following function call:

        v = get_x_or_zero_ref_extracted_ij(&f);

The Rust compiler reports the following error:

error[E0106]: missing lifetime specifier
  --> src/main.rs:23:47
   |
23 | fn get_x_or_zero_ref_extracted_ij(f: &Foo) -> &i32 {
   |                                      ----     ^ expected named lifetime parameter
   |

Since the plugin did not account for the lifetime difference in Foo and the return value of the function, the borrow checker did not accept this definition due to the possible different lifetimes. To fix the current extraction, I simply added the lifetime annotations:

fn get_x_or_zero_ref_extracted_ij_fixed<'a, 'b>(f: &Foo<'a, 'b>) -> & 'a i32 {
	...
}

The choice of whether to pick 'a or 'b needs analysis if we want the correct lifetime chosen.

VSC

VSC's Rust analyzer extracted the above v value function as:

fn get_x_or_zero_ref_extracted_vsc(mut v: &i32, f: &Foo) {
    v = if *f.x > *f.y {
        f.x
    } else {
        &ZERO
    };
}

And inserted the following function call:

get_x_or_zero_ref_extracted_vsc(v, &f);

The Rust compiler reports the following error:

error[E0623]: lifetime mismatch
  --> src/main.rs:25:9
   |
23 | fn get_x_or_zero_ref_extracted_vsc(mut v: &i32, f: &Foo) {
   |                                           ----      --- these two types are declared with different lifetimes...
24 |     v = if *f.x > *f.y {
25 |         f.x
   |         ^^^ ...but data from `f` flows into `v` here

Since the rust analyzer plugin did not account for the lifetime, it did not annotate the lifetime of the extracted function. To fix the current extraction, I would need to add the lifetime annotations and also initialized v to a correct possible reference lifetime:

fn get_x_or_zero_ref_extracted_vsc_fixed<'a, 'b>(mut _v : & 'a i32, f: & Foo<'a, 'b>) {
    _v = if *f.x > *f.y {
        f.x
    } else {
        &ZERO
    };
}

...

        v = f.x;
        get_x_or_zero_ref_extracted_vsc_fixed(v, &f);

Formal approach