Student Projects
Below is a list of tasks that are good for student projects (course or graduate work).
YDB has a Coordination Service which allows your client application to elect a leader via a distributed lock (similar to ZooKeeper). The task is to add support for the Coordination Service to the Go SDK.
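To illustrate the idea of electing a leader through a lock, here is a minimal sketch. It is written in Python for brevity and uses a hypothetical in-memory `InMemoryLock` class as a stand-in; it is not the YDB Coordination Service API, and a real distributed lock would also need sessions or leases so that a crashed leader loses the lock.

```python
import threading


class InMemoryLock:
    """Hypothetical stand-in for a distributed lock (NOT the YDB API)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._owner = None

    def try_acquire(self, candidate):
        # The first candidate to arrive becomes the owner; others are refused.
        with self._guard:
            if self._owner is None:
                self._owner = candidate
            return self._owner == candidate

    def release(self, candidate):
        # Only the current owner may release the lock.
        with self._guard:
            if self._owner == candidate:
                self._owner = None

    def owner(self):
        with self._guard:
            return self._owner


def elect(lock, candidates):
    """Every candidate tries to take the lock; the holder is the leader."""
    for candidate in candidates:
        lock.try_acquire(candidate)
    return lock.owner()


lock = InMemoryLock()
leader = elect(lock, ["node-a", "node-b", "node-c"])

# When the leader steps down, the remaining candidates compete again.
lock.release(leader)
next_leader = elect(lock, ["node-b", "node-c"])
```

The point of the sketch is only that leadership is defined as "whoever currently holds the lock", so re-election is just another attempt to acquire it.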
Mentor: Aleksei Miasnikov (https://github.com/asmyasnikov)
Add support for sqlx.StructScan() from github.com/jmoiron/sqlx to the Go SDK.
Mentor: Aleksei Miasnikov (https://github.com/asmyasnikov)
Add the capability to load plugins implemented as shared libraries (so/dll) into the C++ SDK.
Example: YDB supports different authorisation mechanisms. It's a good idea to implement them as plugins to keep code dependencies clear.
Mentor: Daniil Cherednik (https://github.com/dcherednik)
Out-of-the-box monitoring for your client application is awesome. We have some ideas on how to extend the C++ SDK monitoring facilities.
TODO: detailed description
Mentor: Daniil Cherednik (https://github.com/dcherednik)
The current C++ SDK implementation requires calling the driver.Stop(true) method at the end of the program. Some internal SDK routines can issue gRPC calls outside the context of a user call, but gRPC does not allow such calls after main has returned. Requiring a call to driver.Stop(true) is inconvenient in real applications because it is often difficult to control where the driver is constructed. The simplest solution is to make the driver a singleton object. A singleton is reasonable here because a driver can work with multiple databases or multiple clusters while effectively sharing threads, connections, and other gRPC resources. Other solutions (for example, using the atexit function) are open for discussion. This task requires good knowledge of multithreaded programming and the ability to write portable code.
Mentor: Daniil Cherednik (https://github.com/dcherednik)
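The singleton-plus-exit-hook idea can be sketched as follows. This is a Python illustration only, with a hypothetical `Driver` class and `get_driver` helper; it is not the YDB C++ SDK, and in C++ the ordering of atexit handlers, static destructors, and gRPC teardown would need separate care.

```python
import atexit
import threading


class Driver:
    """Hypothetical stand-in for an SDK driver (not the YDB C++ class)."""

    def __init__(self):
        self.stopped = False

    def stop(self, wait=True):
        # A real driver would drain in-flight calls here when wait is True.
        self.stopped = True


_instance = None
_instance_lock = threading.Lock()


def get_driver():
    """Return the process-wide driver, creating it on first use."""
    global _instance
    with _instance_lock:
        if _instance is None:
            _instance = Driver()
            # Stop the driver automatically at interpreter exit, so callers
            # do not have to remember to do it themselves.
            atexit.register(_instance.stop, True)
        return _instance


first = get_driver()
second = get_driver()
```

With this shape, user code never constructs or stops the driver directly; every caller shares the one instance and shutdown happens in a single, known place.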
Currently YDB CLI (https://ydb.tech/en/docs/getting_started/cli) doesn't support interactive mode. Interactive mode means that you can run the ydb program and it will let you write queries and get responses, much like the psql program does.
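As a rough sketch of what an interactive loop involves, the Python example below reads lines, accumulates them until a terminating semicolon, and prints the result of each query. The `run_repl` and `fake_execute` names, the `;` terminator, and the `\q` quit command are assumptions made for illustration, not existing YDB CLI behavior.

```python
import io


def run_repl(lines, execute, out):
    """Read lines, run each ';'-terminated query, write the result to out."""
    buffer = []
    for line in lines:
        stripped = line.strip()
        # Quit commands are only recognised between queries.
        if not buffer and stripped in ("\\q", "exit", "quit"):
            break
        if not stripped:
            continue
        buffer.append(stripped)
        if stripped.endswith(";"):
            query = " ".join(buffer)
            buffer.clear()
            try:
                out.write(f"{execute(query)}\n")
            except Exception as exc:
                # Report the error and keep the session alive.
                out.write(f"error: {exc}\n")


def fake_execute(query):
    # Hypothetical executor; a real one would send the query to the database.
    return f"executed: {query}"


out = io.StringIO()
run_repl(["SELECT 1", "  + 1;", "", "\\q", "SELECT 2;"], fake_execute, out)
result = out.getvalue()
```

In an actual tool the `lines` argument could be `sys.stdin` and `out` could be `sys.stdout`; line editing, history, and completion are the parts that make the real task substantial.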
Mentor: Nikolay Perfilov (https://github.com/pnv1)
Currently YDB CLI (https://ydb.tech/en/docs/getting_started/cli) supports only the CSV and TSV input formats. There are many other common formats we should support, such as JSON, Parquet, Avro, MessagePack, Debezium (over JSON or Avro), ORC, Protobuf, and so on. You may be interested in this task:
- if you want to know how modern systems serialize their data
- if you want to get experience in data transfer between such systems
Mentor: Artem Zuikov (https://github.com/4ertus2)
If you want to dive into YDB's core, this is the task you are looking for. Writing to and reading from YDB Distributed Storage (DS) efficiently is very important. For every write to DS, YDB's dsproxy component generates several TEvVPut messages (https://github.com/ydb-platform/ydb/blob/main/ydb/core/blobstorage/vdisk/common/vdisk_events.h#L504) because we write multiple replicas or erasure parts. Each TEvVPut message is serialized to be sent over the wire. The task is to optimize TEvVPut serialization. Currently we use Google protobuf for message serialization, but the options are:
- Use Google protobuf for metadata serialization only, and don't put opaque data into the proto message. Put it right after the protobuf message;
- Use FlatBuffers;
- Use a custom protocol.
We expect you to propose some solutions, implement them, and compare their performance with benchmarks.
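The first option (serialized metadata followed by the raw payload) can be sketched as a simple length-prefixed frame. The Python example below is an illustration under stated assumptions: JSON stands in for the protobuf-encoded metadata, the 8-byte header layout and the `blob_id` and `part` field names are hypothetical, and none of this reflects the actual TEvVPut wire format.

```python
import json
import struct

# Header: metadata length and payload length as little-endian uint32 values.
_HEADER = struct.Struct("<II")


def encode(metadata: dict, payload: bytes) -> bytes:
    """Serialize only the metadata; append the opaque payload untouched."""
    meta = json.dumps(metadata, separators=(",", ":")).encode("utf-8")
    return _HEADER.pack(len(meta), len(payload)) + meta + payload


def decode(frame: bytes):
    """Parse the metadata and return a zero-copy view of the payload."""
    meta_len, payload_len = _HEADER.unpack_from(frame, 0)
    meta_start = _HEADER.size
    payload_start = meta_start + meta_len
    if len(frame) != payload_start + payload_len:
        raise ValueError("frame length does not match header")
    metadata = json.loads(frame[meta_start:payload_start].decode("utf-8"))
    # memoryview slicing avoids copying the (potentially large) payload.
    payload = memoryview(frame)[payload_start:payload_start + payload_len]
    return metadata, payload


blob = b"\x00\x01\xff" * 4
frame = encode({"blob_id": "b1", "part": 2}, blob)
meta, view = decode(frame)
```

The design point is that the payload bytes never pass through the metadata serializer, so the receiver can hand them on without parsing or copying them; a benchmark would compare this against encoding the payload as a field inside the message.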
Mentor: Aleksey Stankevichus (https://github.com/the-ancient-1)
YDB uses the lwtrace library to trace events in the system and to debug issues.
TODO: add detailed description
Mentor: Aleksey Stankevichus (https://github.com/the-ancient-1)
TODO: add detailed description
Mentor: Andrey Fomichev (https://github.com/fomichev3000)
TODO: add detailed description
Mentor: Oleg Doronin (https://github.com/dorooleg)
https://github.com/ydb-platform/ydb/issues/101
Mentor: Ilnaz Nizametdinov (https://github.com/CyberROFL)
Federated queries are the ability to run a single query and get results based on data from many different sources. Airflow™ pipelines are defined in Python, which allows you to write code that generates and instantiates pipelines dynamically. In this project, it is necessary to implement a node of such a pipeline that communicates with YDB Federated Queries.
Languages: C++, Python
Mentor: Oleg Doronin (https://github.com/dorooleg)
Federated queries are the ability to run a single query and get results based on data from many different sources. DBT, short for data build tool, is an open source project for managing data transformations in a data warehouse. DBT is part of the modern data stack. In this project, it is necessary to integrate YDB Federated Queries into DBT.
Languages: Python
Mentor: Oleg Doronin (https://github.com/dorooleg)
Federated queries are the ability to run a single query and get results based on data from many different sources. Once the data has been processed, it is useful to visualize it. Apache Superset serves this purpose. Apache Superset uses SQLAlchemy to communicate with external systems. SQLAlchemy already supports YDB and is used in DataLens. In this project, this functionality needs to be integrated into Apache Superset.
Languages: Python
Mentor: Oleg Doronin (https://github.com/dorooleg)
Kafka supports different sources/sinks for communicating with external systems. It's easy to find the list of these sources/sinks here. As we can see, YDB is not among the sinks. This project involves implementing a new YDB sink. An example of a sink implementation can be found here.
Languages: Java
Apache NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. In this project, you need to implement YDB as a sink for the data graph in Apache NiFi.
Languages: Java
Mentor: Oleg Doronin (https://github.com/dorooleg)
The Jupyter Notebook is a web-based interactive computing platform. This tool is very popular for data analysis and visualization. It is necessary to develop a library that will allow users to conveniently interact with Federated Queries in the Jupyter Notebook environment
Languages: Python
Mentor: Oleg Doronin (https://github.com/dorooleg)