Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transformer.compile improvements #451

Open
seanmacavaney opened this issue Aug 8, 2024 · 3 comments · May be fixed by #480
Open

Transformer.compile improvements #451

seanmacavaney opened this issue Aug 8, 2024 · 3 comments · May be fixed by #480
Labels
enhancement New feature or request

Comments

@seanmacavaney
Copy link
Collaborator

Current Approach

Currently, we use matchpy to support pipeline optimization. A set of replacement rules are defined in pyterrier.transformer.rewrite_rules, which all get executed when Transformer.compile() is called.

Transformer.compile is not used heavily, though it can be important for getting the most out of pipelines.

Problems with the current approach

The current approach has several downsides:

  • It decouples the logic for applying combination rules from the transformers themselves. I.e., I'd expect a ComposedPipeline to define how it can be combined/optimized, not some other place in the code.
  • It requires maintaining the global rewrite_rules variable.
  • It introduces additional complexity into the class hierarchy, with certain transformers defined as matchpy.Operations with an arity attribute.
  • Overall, matchpy feels like overkill for our needs, which are relatively simple combination rules. Using it introduces a dependency that we probably don't need.

Proposed Solution

I propose dropping matchpy in favor of defining optimization logic in individual transformers. This will be faciliated with three methods:

  • Transformer.compile(self) -> Transformer -- existing method, but now individual transformers are responsible for implementing it. By default, it returns the transformer itself, but can be overridden with other functionality, such as merging ComposedPipeline together.
  • Transformer.fuse_left(self, left: Transformer) -> Optinal[Transformer] -- Fuses this transformer with a transformer that immediately precedes this one in a pipeline. The new transformer should have the same effect as performing the two transformers in sequence, i.e., pipeline_unfused and pipeline_fused in the following example should provide the same results:
pipeline_unfused = left >> self
pipeline_fused = self.fuse_left(left)
  • Transformer.fuse_right(self, right: Transformer) -> Optinal[Transformer] -- Similar to fuse_left but for a transformer that immediately follows this one.

I'd expect that most pipelines wouldn't need to implement any of these methods. Perhaps only ComposedPipeline would define compile, which would be responsible for performing the merging logic, something like:

def compile(self) -> Transformer:
    out = deque()
    inp = deque(tuple(chain.from_iterable((t.models if isinstance(t, ComposedPipeline) else (t,)) for t in self.models)))
    while inp:
        right = inp.popleft()
        if out and (fused_right := out[-1].fuse_right(right)):
            out.pop()
            inp.appendleft(fused_right)
        elif out and (fused_left := right.fuse_left(out[-1])):
            out.pop()
            inp.appendleft(fused_left)
        else:
            out.append(right)
    return ComposedPipeline(*out)

An example implemetnation of fuse_left that merges a BatchRetrieve into a FeaturesBatchRetrieve, using the current logic:

class FeaturesBatchRetrieve:
  ...
  def fuse_left(self, left: pt.Transformer) -> pt.Transformer:
      if isinstance(left, BatchRetrieve) and if self.indexref == left.indexref:
          controls = dict(controls)
          controls['wmodel'] = left.wmodel
          return FeaturesBatchRetrieve(
              self.indexref,
              self.features,
              controls,
              self.properties,
              self.threads,
          )
@seanmacavaney seanmacavaney added the enhancement New feature or request label Aug 8, 2024
seanmacavaney added a commit that referenced this issue Aug 10, 2024
still to do: apply remaining rewrite_rules
@seanmacavaney
Copy link
Collaborator Author

Also, the current rulesets are a bit limited. For instance, br1 ** br2 gets merged into a single FBR, but in br1 ** br2 ** br3, nothing gets merged. This is in part due to the order of the rules, since the feature union operation gets expanded before binary pairs can be merged. But even so, I'd expect this to work even if the feature union was already nary.

@cmacdonald
Copy link
Contributor

cmacdonald commented Aug 12, 2024

Compilation/rewriting rules was a key contribution of the ICTIR paper, but I'm not sure if we should continue to maintain it.

br1 ** br2 is an example of an expression that isnt a ComposedPipeline that can be rewritten

@seanmacavaney
Copy link
Collaborator Author

I think that in a few use cases, it's valuable. E.g., rank cutoffs, including text metadata during retrieval, etc. Though I'm also not opposed to dropping the feature completely.

There's currently a rule for merging br1 ** br2 into FBR here:

rewrite_rules.append(ReplacementRule(
Pattern(FeatureUnionPipeline(_br1, _br2), BR_index_matches),
lambda _br1, _br2: FeaturesBatchRetrieve(_br1.indexref, ["WMODEL:" + _br1.controls["wmodel"], "WMODEL:" + _br2.controls["wmodel"]])
))

seanmacavaney added a commit that referenced this issue Sep 4, 2024
still to do: apply remaining rewrite_rules
@seanmacavaney seanmacavaney linked a pull request Sep 4, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants