Pre-alpha phase for the "model compose" project.
1.1. Clearly define the problem the software is solving.
Standardise the ML development workflow, using the Docker infrastructure as a base. Centralise the versioning, management, visualization, reporting, and deployment of models in a reproducible manner.
1.2. Understand and outline your target audience or user persona.
This might include data scientists, machine learning engineers, researchers, and other stakeholders involved in ML development and deployment.
1.3. Validate the concept through open source community feedback, surveys, and competitor analysis.
- Investigate MLflow: Research MLflow, an open-source platform designed to manage the ML lifecycle, including experimentation, reproducibility, and deployment. Understand its features, strengths, and weaknesses. - https://github.com/mlflow/mlflow/
- Explore DVC (Data Version Control): Look into DVC, an open-source version control system for ML projects. Understand how it brings Git-like version control to data science teams, tracking data, models, and experiments. - https://github.com/iterative/dvc
- Study Kubeflow: Examine Kubeflow, an open-source project developed by Google to run machine learning workflows on Kubernetes. Assess how it simplifies deployments of ML workflows and understand its portability and scalability aspects. - https://www.kubeflow.org
- Understand Seldon: Investigate Seldon, a platform that enables data scientists and engineers to deploy, scale, and monitor their machine learning models in production. Determine how it manages these aspects and how it could potentially integrate with your project. - https://www.seldon.io - NOT OPEN SOURCE
- Look into Tecton: Review Tecton, a feature store for operational machine learning, designed to help data scientists manage and access features for model training and inference. Explore how it manages and provides access to ML features. - https://www.tecton.ai - NOT OPEN SOURCE
- Research Neptune.ai: Analyze Neptune.ai, a platform that aids in tracking machine learning experiments and facilitates monitoring and visualizing metrics and outputs. Understand how it achieves these functionalities and how they may fit into your project. - https://neptune.ai/homepage - NOT OPEN SOURCE
I'm still reading through these, but they all seem to be standalone platforms, separate from the Docker environment. Kubeflow is perhaps the exception, but it does not seem suitable for individuals and small enterprises. I'm looking for tight integration with Docker and Compose: a superset of Docker Compose. This would minimise the friction of adoption and of integration with ongoing projects.
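To make the "superset of Docker Compose" idea a bit more concrete, here is a rough sketch only, not a design decision. It assumes the model metadata could live under a top-level `x-models` key, relying on the fact that Compose ignores top-level keys prefixed with `x-` (extension fields), so an existing Compose file would keep working unchanged. The key name `x-models`, the fields `service` and `version`, and both helper functions are hypothetical and chosen purely for illustration. The Compose file is shown as an already-parsed mapping to avoid depending on a YAML library.

```python
from copy import deepcopy

# Hypothetical top-level key for model metadata; it is not part of Compose itself.
# Compose ignores top-level keys that start with "x-" (extension fields).
MODELS_KEY = "x-models"


def split_compose(document: dict) -> tuple[dict, dict]:
    """Separate plain Compose content from the hypothetical model section."""
    compose_part = deepcopy(document)
    models_part = compose_part.pop(MODELS_KEY, {})
    return compose_part, models_part


def check_model_services(document: dict) -> list[str]:
    """Return the names of models whose 'service' is not a declared Compose service."""
    compose_part, models_part = split_compose(document)
    services = compose_part.get("services", {})
    return [
        name
        for name, spec in models_part.items()
        if spec.get("service") not in services
    ]


# An existing Compose file (as a parsed mapping) plus one extra, assumed section.
example = {
    "services": {
        "trainer": {"image": "python:3.11", "command": "python train.py"},
    },
    "x-models": {
        "classifier": {"service": "trainer", "version": "0.1.0"},
    },
}

compose_only, models = split_compose(example)
print(sorted(compose_only))              # top-level keys Compose itself would use
print(models["classifier"]["version"])   # metadata only the new tool would read
print(check_model_services(example))     # models pointing at undeclared services
```

The point of the sketch is the adoption argument above: an ongoing project would only add a section to the file it already has, and plain Docker Compose would continue to run the services as before.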
1.4. Document the key findings and insights.
2.1. Identify key stakeholders (users, contributors, etc.) and define their roles.
2.2. Gather detailed requirements through online feedback and discussion threads.
2.3. Analyze and document the gathered requirements.
2.4. Prioritize requirements based on feedback, community needs, and technical feasibility.
3.1. Define and document the high-level architecture of the software.
3.2. Design and document the user interface layout and navigation (mockups, wireframes).
3.3. Plan database structure and data flow diagram.
3.4. Develop and finalize detailed design documents including software blueprints, diagrams, and technical specifications.
4.1. Define the development methodology (Waterfall, Agile, etc.).
4.2. Break down requirements into user stories or tasks.
4.3. Estimate the effort required for each task.
4.4. Create a project timeline.
5.1. Develop a simple PoC or prototype for complex features.
5.2. Validate the feasibility of the prototype through community feedback.
5.3. Incorporate feedback and revise as necessary.
6.1. Set up a coding environment (IDE, language/framework version, etc.).
6.2. Set up version control systems (GitHub, GitLab, Bitbucket, etc.).
6.3. Define code review and branching strategies.
7.1. Plan and set up a testing environment similar to the production environment.
7.2. Prepare test data for different testing types (Unit, Integration, System).
7.3. Set up automated testing tools, if applicable.
8.1. Identify potential project risks (technical, community acceptance, etc.).
8.2. Develop a risk mitigation plan.
8.3. Define contingency plans for high-impact risks.
9.1. Define the required documentation (technical, user manual, contributor guidelines, etc.).
9.2. Start documenting as you go.
10.1. Establish a regular communication schedule with the community (updates, reports, etc.).
10.2. Open channels for discussion, feedback, and collaboration.
10.3. Use social media or relevant platforms to keep the community updated.