Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Skip recompilation if .proto file unchanged #438

Open
ivu-mawi opened this issue Oct 29, 2024 · 6 comments · May be fixed by #459
Open

Feature Request: Skip recompilation if .proto file unchanged #438

ivu-mawi opened this issue Oct 29, 2024 · 6 comments · May be fixed by #459
Assignees
Labels
general improvements Fixing/updating internal components and code new feature A new user-facing feature.
Milestone

Comments

@ivu-mawi
Copy link

ivu-mawi commented Oct 29, 2024

Running both the protoc compiler and then the java compiler takes a non-negligible amount of time, depending on the size and amount of your .proto files. I would like to have the fastest possible edit-rebuild-debug cycle.

Even when not actually changing any code, this plugin always recompiles all proto files, so we can not take advantage of incremental rebuilding as needed.

We currently use https://github.com/os72/protoc-jar-maven-plugin and it provides a config property very non-aptly named optimizeProtogen which is true by default and skips running protoc if the last build time is more recent than any of the source .proto file last-modified-timestamps. This in turn allows maven to skip running javac in many cases.

I'd love some similar mechanism in this project. I'm not sure if the timestamp comparison from the linked project is the best way. Maybe a hash over the .proto file contents and the used protoc compiler version would be the better. (In contrast to the is-timestamp-newer-check, a hash-check would correctly detect downgrade of a .proto file to an older version as needing a recompilation. It would also keep working if the original timestamp of any proto file gets lost somewhere along the way. But the hash could potentially be much slower than timestamp checking.)

@ascopes ascopes added new feature A new user-facing feature. general improvements Fixing/updating internal components and code labels Oct 29, 2024
@ascopes
Copy link
Owner

ascopes commented Oct 29, 2024

Hey, thanks for the issue.

This has already been asked by someone else before, but at the time I could not think of a good way to consistently do this. If we can ensure that timestamp data for protobuf sources remains consistent when being unpacked from sources like existing archives or dependencies, then this sounds like a good idea and something we should definitely support. We can also hash the files instead as suggested potentially, although we'd need to also keep some kind of persistent file listing so that we know if new sources have been added.

Incremental compilation of subsets of the sources may be a little more complex and require more thought.

Would this be something you'd be interested in working on? If not, it might take me a couple of weeks to be able to pick this up as my schedule is a bit packed at the moment.

@ascopes ascopes modified the milestones: v3.0.x, v2.6.x, v2.7.x Oct 29, 2024
@ivu-mawi
Copy link
Author

I'll see if I can get my employer to allow me to work on this :)

@ascopes ascopes pinned this issue Oct 30, 2024
@ascopes
Copy link
Owner

ascopes commented Nov 3, 2024

@ivu-mawi I have some time this afternoon, so will crack on with this if that is okay with you!

@ascopes ascopes self-assigned this Nov 3, 2024
@ascopes ascopes changed the title Skip recompilation if .proto file unchanged [Feature Request]: Skip recompilation if .proto file unchanged Nov 4, 2024
@ascopes ascopes changed the title [Feature Request]: Skip recompilation if .proto file unchanged Feature Request: Skip recompilation if .proto file unchanged Nov 4, 2024
ascopes added a commit that referenced this issue Nov 9, 2024
…tractor

GH-438: Inline the proto archive extractor
ascopes added a commit that referenced this issue Nov 9, 2024
… URIs

Currently we use absolute paths to generate unique names for archives
to avoid name collision, but since we now support resolution of archives
internally and recursively as part of optimisations for GH-438, there is
now a potential edge case where an archive within an archive can collide
with a different file tree by reusing the same name in a legal context.

Switching to using URIs to generate the digests reduces this risk, as
URIs encapsulate the full source of the file tree recursively.
ascopes added a commit that referenced this issue Nov 9, 2024
This more clearly represents what this class is doing, and will allow separating
out source detection logic elsewhere at a later date to reduce the scope of this
monolithic class.
ascopes added a commit that referenced this issue Nov 9, 2024
…tractor

GH-438: Change unique name digest generation for archives to use full URIs
ascopes added a commit that referenced this issue Nov 9, 2024
This more clearly represents what this class is doing, and will allow separating
out source detection logic elsewhere at a later date to reduce the scope of this
monolithic class.
ascopes added a commit that referenced this issue Nov 9, 2024
GH-438: Rename components to be clearer for their intention
@ascopes
Copy link
Owner

ascopes commented Nov 9, 2024

Just an update that I am still working on this. I've had to move some stuff around which cascaded slightly but it should make this a little easier to look into now on my side.

Sorry for the wait!

@ascopes
Copy link
Owner

ascopes commented Nov 10, 2024

@ivu-mawi hey. I've pushed some new code to support this on the feature/GH-438-incremental-compilation branch.

Right now, it is using SHA-256 digests of all source and dependency files. I don't have a big enough dataset to see what difference it makes between setting <incrementalCompilation> to true and false though. I'm somewhat wary that it may be too slow on large file sets, so I could do with some benchmarking from your side to see what sort of difference it makes.

If all goes well, I should be able to merge this once I've tidied up after myself. If it doesn't create a meaningful improvement for you then it may be time to go back to the drawing board...

ascopes added a commit that referenced this issue Nov 11, 2024
This prevents a regression that incremental compilation has produced where we do not inform javac about source files if
nothing changes from a previous build.
@ascopes ascopes linked a pull request Nov 11, 2024 that will close this issue
ascopes added a commit that referenced this issue Nov 12, 2024
ascopes added a commit that referenced this issue Nov 12, 2024
This prevents a regression that incremental compilation has produced where we do not inform javac about source files if
nothing changes from a previous build.
@ivu-mawi
Copy link
Author

ivu-mawi commented Nov 12, 2024

Wow, thanks for your effort!
I'll try to test this as soon as I can.

In the meantime I dug a bit deeper into how maven compilation works. One problem is that as soon as maven detects any changed java-source-file, it recompiles all sources files (per-module I think), instead of just those that have changed. I think this is because the maven developers thought it would not be worth the effort to come up with an algorithm to determine which files need to be recompiled, as the java compiler is "fast enough"...

However, the protobuf-to-java compilation is a separate step, and I think skipping that when possible is still good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
general improvements Fixing/updating internal components and code new feature A new user-facing feature.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants