Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #419: Code Reorganization #425

Merged
merged 4 commits into from
Sep 23, 2024
Merged

Conversation

JosepSampe
Copy link
Member

@JosepSampe JosepSampe commented Sep 20, 2024

Description

This is PR is for #419 (related to #398), which reorganizes the code in the current project structure, and removes all the delta dependencies from the core/spark code

Type of change

Code refactoring

Checklist:

Here is the list of things you should do before submitting this pull request:

  • New feature / bug fix has been committed following the Contribution guide.
  • Add logging to the code following the Contribution guide.
  • Add comments to the code (make it easier for the community!).
  • Change the documentation.
  • Add tests.
  • Your branch is updated to the main branch (dependent changes have been merged).

@JosepSampe JosepSampe self-assigned this Sep 20, 2024
@JosepSampe JosepSampe marked this pull request as draft September 20, 2024 09:52
@JosepSampe JosepSampe changed the base branch from main to refactor September 20, 2024 10:00
@JosepSampe JosepSampe marked this pull request as ready for review September 20, 2024 16:22
CONTRIBUTING.md Outdated
@@ -193,7 +193,7 @@ sbt assembly
$SPARK_HOME/bin/spark-shell \
--jars ./target/scala-2.12/qbeast-spark-assembly-0.6.0.jar \
--packages io.delta:delta-spark_2.12:3.1.0 \
--conf spark.sql.extensions=io.qbeast.spark.internal.QbeastSparkSessionExtension \
--conf spark.sql.extensions=io.qbeast.spark.delta.QbeastSparkSessionExtension \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the SessionExtension in delta package?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've noticed that the QbeastSparkSessionExtension strongly depends on io.delta.sql.DeltaSparkSessionExtension, which is why I moved it to the delta package. However, we can discuss the best location for it or whether we should create our own format-agnostic SparkSessionExtension

Copy link
Member

@osopardo1 osopardo1 Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to either:

  • The location stays on io.qbeast.spark.internal.QbeastSparkSessionExtension, and all the reference to Delta is deleted (which I guess is tedious)
  • Rename it also to QbeastDeltaSparkSessionExtension

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second point should be faster for this iteration

src/main/scala/io/qbeast/spark/delta/QbeastFileUtils.scala Outdated Show resolved Hide resolved
@@ -74,17 +73,17 @@ case class QbeastOptions(
def toMap: CaseInsensitiveMap[String] = {
val options = Map.newBuilder[String, String]
for (txnAppId <- txnAppId; txnVersion <- txnVersion) {
options += DeltaOptions.TXN_APP_ID -> txnAppId
options += DeltaOptions.TXN_VERSION -> txnVersion
options += "txnAppId" -> txnAppId
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we are not using the DeltaOptions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see that QbeastOptions is an internal object.
Then, I am not sure how to control the mapping between QbeastOptions and DeltaOptions.

Should we add methods in the specific modules to verify and merge both subsets of options?

@osopardo1 osopardo1 self-requested a review September 23, 2024 13:04
@osopardo1 osopardo1 merged commit 57b389b into Qbeast-io:refactor Sep 23, 2024
1 check passed
@JosepSampe JosepSampe deleted the refactoring branch September 23, 2024 15:22
JosepSampe added a commit to JosepSampe/qbeast-spark that referenced this pull request Oct 24, 2024
* Code refactoring: Use Qbeast abstractions instead of Delta imports
osopardo1 pushed a commit that referenced this pull request Oct 28, 2024
* Issue #417: Abstract the Qbeast Snapshot Module (#411)

* Issue #418: Abstract PreCommitHook and StagingDataManager (#421)

* Issue #418: Abstract RollupDataWriter and QbeastStats  (#423)

* Issue #419: Code Reorganization (#425)

* Issue #420: Create Separate Modules (#427)

* Issue #398: Fix small overhead added during the refactoring (#436)

* Issue #441: Fix data change on optimize (#442)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants