Issue #264: Update qviz #401

jorgeMarin1 · 2024-09-06T09:01:54Z

Description

Fixes #264.

The OTree Index Visualization (qviz) crashes when it is run against tables written in the new Qbeast format, so it needs to be updated.

Type of change

Bug fix.

Checklist:

New feature / bug fix has been committed following the Contribution guide.
Add tests.
Your branch is updated to the main branch (dependent changes have been merged).

How Has This Been Tested? (Optional)

The OTree Visualization (qviz) was run locally on my PC.

A pyspark shell was also used to test the implemented fix. A large table of 4,000,000 rows of fake data was created. From this table, three Qbeast tables were created and then read as Delta tables.

For each of these tables, we compared its total number of elements with the sum of the element_count values of its cubes. Since the values matched, we concluded that the fix works as it should.
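
A minimal sketch of this kind of check, in plain Python rather than pyspark. It assumes that each file's block list is stored as a JSON string with an `elementCount` field per block, as in the block example quoted later in this thread. The sample data, `row_count`, and the helper name `total_element_count` are hypothetical and only illustrate the comparison.

```python
import json


def total_element_count(blocks_per_file):
    """Sum elementCount over every block of every file."""
    total = 0
    for blocks_json in blocks_per_file:
        for block in json.loads(blocks_json):
            total += block["elementCount"]
    return total


# Hypothetical sample: one JSON string per data file, each holding that file's blocks.
blocks_per_file = [
    '[{"cubeId": "", "elementCount": 3}, {"cubeId": "A", "elementCount": 2}]',
    '[{"cubeId": "A", "elementCount": 5}]',
]

# Hypothetical value; in the real test this would be the table's row count.
row_count = 10

# The check passes when the block counts add up to the table's row count.
assert total_element_count(blocks_per_file) == row_count
```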

jorgeMarin1 · 2024-09-10T10:34:04Z

Cubes are obtained by using: cubes = df["tags.blocks"]

These cubes are in a pandas Series and have the following structure (this is one entry of the Series):
[{"cubeId":"wggg","minWeight":-103008559,"maxWeight":2147483647,"elementCount":9521,"replicated":false},{"cubeId":"wggQ","minWeight":-100165008,"maxWeight":2147483647,"elementCount":880,"replicated":false},{"cubeId":"wgg","minWeight":-1401341541,"maxWeight":-103459089,"elementCount":9759,"replicated":false},{"cubeId":"wggA","minWeight":-102673026,"maxWeight":2147483647,"elementCount":5982,"replicated":false},{"cubeId":"wggw","minWeight":-91591261,"maxWeight":2147483647,"elementCount":1242,"replicated":false}]

Each of these is a block, and each block is related to a cube through its cubeId.
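
As a rough illustration of that relationship, the sketch below groups blocks by `cubeId` and adds up their element counts per cube. It assumes each Series entry is a JSON string. The first entry reuses two blocks from the example above; the second entry, and the helper name `group_blocks_by_cube`, are hypothetical and are not taken from qviz.

```python
import json
from collections import defaultdict

file_blocks = [
    # Two blocks copied from the example above (one file).
    '[{"cubeId":"wggg","minWeight":-103008559,"maxWeight":2147483647,'
    '"elementCount":9521,"replicated":false},'
    '{"cubeId":"wgg","minWeight":-1401341541,"maxWeight":-103459089,'
    '"elementCount":9759,"replicated":false}]',
    # Hypothetical second file with another block of cube "wgg".
    '[{"cubeId":"wgg","minWeight":-5,"maxWeight":10,'
    '"elementCount":100,"replicated":false}]',
]


def group_blocks_by_cube(file_blocks):
    """Map each cubeId to the list of blocks that reference it."""
    cubes = defaultdict(list)
    for raw in file_blocks:
        for block in json.loads(raw):
            cubes[block["cubeId"]].append(block)
    return dict(cubes)


cubes = group_blocks_by_cube(file_blocks)

# Total number of elements per cube, summed over all of its blocks.
element_count = {
    cube_id: sum(block["elementCount"] for block in blocks)
    for cube_id, blocks in cubes.items()
}
```

With this sample, cube `wgg` ends up with two blocks, one from each file, while `wggg` has a single block.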

jorgeMarin1 · 2024-09-12T12:11:53Z

This is the structure of the metadata for revision_id = 1:

Metadata: {'revisionID': 1, 'timestamp': 1726134712571, 'tableID': '/Users/jorgemarin/Documents/qviz-bug/qbeast-spark/utils/visualizer/tests/resources/test_table_delta_multiples_writes/', 'desiredCubeSize': 10000, 'columnTransformers': [{'className': 'io.qbeast.core.transform.LinearTransformer', 'columnName': 'user_id', 'dataType': 'IntegerDataType'}, {'className': 'io.qbeast.core.transform.StringHistogramTransformer', 'columnName': 'brand', 'dataType': 'StringDataType'}], 'transformations': [{'className': 'io.qbeast.core.transform.LinearTransformation', 'minNumber': 274969076, 'maxNumber': 566347286, 'nullValue': 391116828, 'orderedDataType': 'IntegerDataType'}, {'className': 'io.qbeast.core.transform.StringHistogramTransformation', 'histogram': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']}]}
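
A small sketch of how this metadata could be read. It assumes that `columnTransformers` and `transformations` are aligned by position, which the example above suggests but does not state. The dictionary is a shortened copy of the metadata above (histogram truncated, `timestamp` and `tableID` omitted), and `describe_columns` is a hypothetical helper.

```python
metadata = {
    "revisionID": 1,
    "desiredCubeSize": 10000,
    "columnTransformers": [
        {"className": "io.qbeast.core.transform.LinearTransformer",
         "columnName": "user_id", "dataType": "IntegerDataType"},
        {"className": "io.qbeast.core.transform.StringHistogramTransformer",
         "columnName": "brand", "dataType": "StringDataType"},
    ],
    "transformations": [
        {"className": "io.qbeast.core.transform.LinearTransformation",
         "minNumber": 274969076, "maxNumber": 566347286,
         "nullValue": 391116828, "orderedDataType": "IntegerDataType"},
        {"className": "io.qbeast.core.transform.StringHistogramTransformation",
         "histogram": ["a", "b", "c"]},  # shortened for the example
    ],
}


def short_name(class_name):
    """Drop the package prefix from a fully qualified class name."""
    return class_name.rsplit(".", 1)[-1]


def describe_columns(metadata):
    """Pair each column transformer with the transformation at the same index."""
    return [
        (transformer["columnName"],
         short_name(transformer["className"]),
         short_name(transformation["className"]))
        for transformer, transformation in zip(
            metadata["columnTransformers"], metadata["transformations"]
        )
    ]


columns = describe_columns(metadata)
```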

Jiaweihu08 · 2024-09-17T12:25:33Z

Can you add ruff for code formatting? @jorgeMarin1

Jiaweihu08 · 2024-09-17T12:27:08Z

Also, remove the redundant test tables. You can use the most complex one to cover all the test cases.
You don't need the JSON log files that end with crc.

utils/visualizer/qviz/content_loader.py

utils/visualizer/qviz/cube.py

osopardo1

Some small things that I noticed while reviewing!

utils/visualizer/qviz/qviz.py

utils/visualizer/tests/content_loader_test.py

utils/visualizer/tests/drawing_elements_test.py

utils/visualizer/pyproject.toml

…fied by 0.01 in the dash server

….0 version of Apache Spark

Jiaweihu08 · 2024-10-17T12:33:11Z

I'm closing this PR as I'm not allowed to update the branch. We will be working on this issue with #437 instead.

initial commit

42e11d6

jorgeMarin1 added the "type: bug" label Sep 6, 2024

jorgeMarin1 self-assigned this Sep 6, 2024

Jiaweihu08 changed the title ~~Issue 264: Update qviz~~ Issue #264: Update qviz Sep 9, 2024

jorgeMarin1 and others added 3 commits September 10, 2024 09:29

Merge branch 'Qbeast-io:main' into qviz-bug

b0fda5d

initial commit

0fe8b08

added process table function using delta tables

5fb0956

jorgeMarin1 added 6 commits September 10, 2024 17:40

create qviz using Delta tables

5f60426

added custom table

0ee8232

added custom table

30256f8

removed ecommerce300k_2019

cf9cfc3

fixed visualization

cbcbaed

deleted parquet files and added a new folder for table test

92cf732

jorgeMarin1 added 5 commits September 12, 2024 16:21

added comments on the code

e9559f7

added comments to the code

ee95b87

Merge branch 'main' into qviz-bug

2f0aea8

Merge main branch into qviz-bug branch, so it is up to date with the main repository.

deleted code and files that won't be used

a267f87

added unit test

032077c

jorgeMarin1 requested a review from Jiaweihu08 September 13, 2024 15:52

Jiaweihu08 reviewed Sep 17, 2024

utils/visualizer/qviz/content_loader.py (outdated)

Jiaweihu08 reviewed Sep 17, 2024

utils/visualizer/qviz/content_loader.py (outdated)

Jiaweihu08 reviewed Sep 17, 2024

utils/visualizer/qviz/content_loader.py (outdated)

Jiaweihu08 reviewed Sep 17, 2024

utils/visualizer/qviz/content_loader.py (outdated)

jorgeMarin1 added 3 commits September 17, 2024 16:39

addressed changes

398494d

addressed changes

353011f

snake case

13e95a7

jorgeMarin1 requested a review from Jiaweihu08 September 18, 2024 14:48