-
Technical trick: maybe the metadata file name should be a hash of the source path, so that we can quickly tell whether a given path is a registered table.
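A minimal sketch of that idea, in case it helps the discussion. Everything here is an assumption rather than Onyxia's actual implementation: the function names `metadata_key` and `is_registered` are hypothetical, SHA-256 is just one possible hash, and the prefix is borrowed from the example path in the proposal (`.onyxia/data-catalog/default-source/`).

```python
import hashlib

# Assumed prefix, borrowed from the example path given in the proposal.
CATALOG_PREFIX = ".onyxia/data-catalog/default-source"


def metadata_key(source_path: str, prefix: str = CATALOG_PREFIX) -> str:
    """Derive a deterministic metadata object key from a source path."""
    # Normalize so that "data/sales/" and "data/sales" map to the same key.
    normalized = source_path.strip().rstrip("/")
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return f"{prefix}/{digest}.json"


def is_registered(source_path: str, existing_keys: set) -> bool:
    """Check registration with one set lookup instead of reading metadata files."""
    return metadata_key(source_path) in existing_keys


# Usage: existing_keys would come from listing the catalog prefix in the bucket.
existing_keys = {metadata_key("data/sales/2023.parquet")}
print(is_registered("data/sales/2023.parquet", existing_keys))
print(is_registered("data/other.csv", existing_keys))
```

The trade-off is that hashed file names are no longer human-readable, so the source path would probably need to be stored inside the metadata file as well.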
-
💯 Let's get to work!
-
Hi! Exciting new feature. If I've understood correctly, the metadata would be in JSON format?
-
Really promising! Looking forward to seeing that. Actually, your approach is really in line with what we have been doing in the oceanographic field for a few years now, especially:
Both Zarr and STAC are Open Geospatial Consortium standards. Not fully related, but speaking of OGC standards and data, we are currently playing around with charts that launch not services but (container-based) functions that execute scientific (or other) computations with inputs and outputs. Thanks again to the team, great work!
-
Nice! It would be great to discuss this between Onyxia and data.gouv.fr. In terms of product inspiration, here are some resources we used when building our data explorer: You can also learn more about our own data explorer here.
-
We are ready to start development. The proposal for the first step is here.
-
There will have to be some implementation for
-
Dear Onyxia community members,
At the heart of our ongoing commitment to evolve and enhance Onyxia is our belief in co-creation with the community that drives Onyxia's dynamism. Today, we are eager to introduce a proposal for a new feature: the "Data Explorer." But before diving into its intricacies, let's discuss the motivation behind its inception.
Why the Need for "Data Explorer"?
Our existing File Explorer has served the Onyxia community diligently, allowing users to organize and manage their files efficiently. However, its primary function is file management, which means it doesn't provide the necessary features to view and describe the actual content of the data. While the File Explorer is great for handling files, the community has expressed a desire for a more in-depth interaction with their data.
What is the "Data Explorer"?
The Data Explorer is a new, distinct part of the Onyxia application. While it will operate separately from the File Explorer, some light integration will be present to ensure users can conveniently transition between the two. The Data Explorer is designed to delve into the intricacies of data content, enabling users to view, describe, and interact with their data beyond mere file management. This distinction ensures that while the File Explorer handles the organization, the Data Explorer tackles comprehension, enhancing the overall user experience in the Onyxia ecosystem.
How Will Onyxia Handle the "Data Explorer"?
When a user switches to the Data Explorer mode, they are not only changing the view but immersing themselves in a more analytical environment. This mode provides a deeper dive into the dataset, beyond what the File Explorer offers. Let's delve into its core features:
Dataset Description: Upon accessing a dataset, the Data Explorer will showcase a summary. This encapsulates metadata like the dataset's origin, size, last update timestamp, and other pivotal details offering clarity about the data's nature and lineage.
Attribute Exploration: For those datasets which translate to dataframes (akin to tables), users will be greeted with a list of attributes or columns. These attributes come enriched with data type information, potential value ranges, and even brief descriptions if annotations exist.
Data Preview: Beyond just metadata and structural insight, users can catch a glimpse of the actual data. The initial rows of the dataset surface, offering a snapshot of its content and the information treasures it might hold.
SQL Editor: Our roadmap for the Data Explorer also includes an integrated SQL editor. It will harness DuckDB compiled to WebAssembly (WASM) to provide SQL editing directly in the browser, allowing users to query and manipulate their datasets seamlessly. The aim is to bridge the gap between data viewing and data analysis, providing a comprehensive toolset right within Onyxia. This feature will be implemented in a second iteration.
Metadata Registration in Data View: Once a user is in the Data Explorer mode and has an analytical view over a specific dataset, they have the added capability to register or update the associated metadata. This feature is beneficial for enriching the data catalog and ensuring datasets are accompanied by comprehensive and up-to-date information.
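To make the description, attribute, and preview features above more concrete, here is a rough sketch of the kind of summary the Data Explorer could compute for a tabular dataset. It is an illustration under stated assumptions, not the planned implementation: the proposal relies on DuckDB in WebAssembly in the browser, whereas this stand-in uses only Python's standard `csv` module, and the names `infer_type` and `describe` as well as the output shape are hypothetical.

```python
import csv
import io


def _parses(values, cast):
    """Return True if every value can be converted with `cast`."""
    try:
        for value in values:
            cast(value)
    except ValueError:
        return False
    return True


def infer_type(values):
    """Classify a column of raw strings as 'integer', 'number', or 'string'."""
    present = [v for v in values if v != ""]
    if not present:
        return "string"
    if _parses(present, int):
        return "integer"
    if _parses(present, float):
        return "number"
    return "string"


def describe(csv_text, preview_rows=5):
    """Summarize a CSV document: row count, per-column info, and first rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = []
    for name in reader.fieldnames or []:
        # Missing trailing cells come back as None; treat them as empty.
        values = [row[name] or "" for row in rows]
        kind = infer_type(values)
        column = {"name": name, "type": kind}
        if kind != "string":
            cast = int if kind == "integer" else float
            numbers = [cast(v) for v in values if v != ""]
            column["min"], column["max"] = min(numbers), max(numbers)
        columns.append(column)
    return {"row_count": len(rows), "columns": columns, "preview": rows[:preview_rows]}


sample = "city,population,area_km2\nParis,2100000,105.4\nLyon,520000,47.87\n"
summary = describe(sample, preview_rows=1)
for column in summary["columns"]:
    print(column)
```

A real implementation would read Parquet or CSV objects from S3 and would likely delegate type inference to the query engine rather than reimplementing it.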
Metadata Storage & Integration in Onyxia
1. Metadata Storage
bucket/.onyxia/data-catalog/default-source/tableX.json
2. Metadata Capture
3. Data Explorer Integration
4. Integration with Self-Services
This mechanism ensures efficient metadata management, promoting a collaborative environment and ensuring seamless integration with various open-source tools.
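As an illustration of what one metadata file under `.onyxia/data-catalog/` might contain, here is a small sketch that builds a record and serializes it to JSON. The proposal does not define a schema, so every field name (`source_path`, `size_bytes`, `last_updated`, `columns`) and the helpers `catalog_key` and `to_json` are assumptions chosen to mirror the features described earlier (origin, size, last update timestamp, column types and descriptions).

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Column:
    name: str
    type: str
    description: str = ""  # optional user annotation


@dataclass
class TableMetadata:
    source_path: str   # where the data lives in the bucket (assumed field)
    size_bytes: int    # dataset size (assumed field)
    last_updated: str  # ISO 8601 timestamp (assumed field)
    columns: list = field(default_factory=list)


def catalog_key(table_name: str, source: str = "default-source") -> str:
    """Build the object key, following the example path in the proposal."""
    return f".onyxia/data-catalog/{source}/{table_name}.json"


def to_json(meta: TableMetadata) -> str:
    """Serialize the record; asdict also converts the nested Column objects."""
    return json.dumps(asdict(meta), indent=2, sort_keys=True)


meta = TableMetadata(
    source_path="data/cities.parquet",
    size_bytes=1024,
    last_updated="2023-09-01T12:00:00Z",
    columns=[Column("city", "string", "City name"), Column("population", "integer")],
)
print(catalog_key("tableX"))
print(to_json(meta))
```

Keeping the record as plain JSON means any tool that can read from S3 can consume the catalog without an Onyxia-specific client.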
Metadata Management in Onyxia: A Simple Yet Robust Approach
Rationale:
In the complex world of data management, there's no one-size-fits-all standard for metadata. The landscape is diverse, with varying requirements, tools, and standards. In navigating this complexity, Onyxia's strategy has been to prioritize simplicity and practicality.
1. Why S3 for Metadata Storage?
Scalability: S3 is inherently scalable, accommodating vast amounts of data without manual intervention.
Reliability: S3-compatible object storage is designed for high durability and availability, so metadata remains accessible and intact.
Integration: Since the data is already in S3, there's no additional system to integrate. This streamlines the architecture, reducing potential points of failure and ensuring faster access.
2. Potential Drawbacks
3. Onyxia's Vision
Despite the potential drawbacks, Onyxia's proposal to use S3 stems from a need for a simple, low-engineering solution. Instead of building a complex system from scratch or integrating several components, leveraging S3's capabilities presents an efficient way to manage metadata. The approach acknowledges the vendor lock-in risk but considers the trade-off acceptable given the benefits of reduced engineering complexity and rapid deployment.