Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Firestore stag db config #6

Conversation

saiyam3243
Copy link

@saiyam3243 saiyam3243 commented Nov 27, 2023

Motivation

For the integration of firestore db with this repo

Changes

Integration of firestore in repo and passed env variables via ci.yml

Checklist

  • added myself as assignee
  • correct reviewers
  • descriptive PR title using conventional commits.
  • description explains the motivation and details of the changes
  • tests cover my changes
  • documentation is updated
  • CI is green
  • breaking changes are discussed with the team and documented in the PR title ! (e.g. feat!: Update endpoint)

Copy link

linear bot commented Nov 27, 2023

META-36 Connect Firestore database to Sourcing Repo(s)

Just figured that one of your tickets was a duplicate, therefore adding a new one as otherwise your workload wouldn't match ours this week :)

Basically the same task of connecting firebase as for the analytics repo, but for at least 1 sourcing repo! Mostly reusing the same logic as in the analytics ticket.

Copy link

github-actions bot commented Nov 27, 2023

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines Covered Coverage Threshold Status
163 55 34% 0% 🟢

New Files

File Coverage Status
parma_db/firestore_service.py 0% 🟢
parma_mining/parma_db/init.py 100% 🟢
parma_mining/parma_db/firestore_service.py 0% 🟢
TOTAL 33% 🟢

Modified Files

No covered modified files...

updated for commit: d1aca00 by action🐍

@github-actions github-actions bot added the enhancement New feature or request label Nov 27, 2023
@egekocabas
Copy link
Contributor

This is the same as here right?
Do you know how to test this db connection?

@saiyam3243
Copy link
Author

This is the same as here right? Do you know how to test this db connection?

yes the task is similar. We need the db connection in both repos.

@saiyam3243 saiyam3243 closed this Nov 27, 2023
@saiyam3243 saiyam3243 reopened this Nov 27, 2023
@Constantin343
Copy link
Contributor

Does this give "selective" access to the database? I.e., only the part that is relevant for the Github module, so we can't interfere with other modules writing to the db?

Also, I would need the secrets to test.

@saiyam3243
Copy link
Author

Does this give "selective" access to the database? I.e., only the part that is relevant for the Github module, so we can't interfere with other modules writing to the db?

Also, I would need the secrets to test.

Will update you after team meet and secrets are sent.

@robinholzi
Copy link
Contributor

Does this give "selective" access to the database? I.e., only the part that is relevant for the Github module, so we can't interfere with other modules writing to the db?

Also, I would need the secrets to test.

Yes, that would also be a requirement before merging this!

Copy link
Contributor

@robinholzi robinholzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments as here

Comment on lines 1 to 4
import firebase_admin
import os
from firebase_admin import credentials
from firebase_admin import firestore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like these imports are little bit mixed.

Also I couldn't find a package for firebase_admin and firebase-admin using micromamba. But after activating the micromamba environment, I was able to use pip command which is for the activated environment

(parma-mining-github) egekocabas@Eges-Air parma-mining-github % which pip
/Users/egekocabas/micromamba/envs/parma-mining-github/bin/pip
(parma-mining-github) egekocabas@Eges-Air parma-mining-github % 

GitHub firebase-admin here
Firebase setup here

It seems that we need to use pip to install this dependency. If its the case then it should be added into the README.md

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This caught on my eye.

I think in here we should add below line into Makefile --> install

pip install firebase-admin

then I worked for me 👌🏻

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this pip install needs to be added to Makefile and also in Dockerfile

parma_db/firestore_db.py Outdated Show resolved Hide resolved
parma_db/firestore_db.py Outdated Show resolved Hide resolved
@robinholzi
Copy link
Contributor

@saiyam3243 How's the status here? As this is still a task from last week I'd like to see this finished as soon as possible! As It's only ~80 lines of code , I think this shouldn't be too hard to finish in time.

Please let us know if you can foreseeably not finish all the tasks assigned to you so we can reallocate our resources accordingly!

@Constantin343
Copy link
Contributor

@saiyam3243 @robinholzi Would be great if we can merge this soon. Not being able to use the db is starting to be a blocker for us.

@robinholzi
Copy link
Contributor

@saiyam3243 @robinholzi Would be great if we can merge this soon. Not being able to use the db is starting to be a blocker for us.

@Constantin343 I see that this topic is quite important for you, that's why we already started it in a sprint of Nov. 20th.
As the respective PRs still seem to be far from ready I think I might need to take this over as @saiyam3243 is not available this weekend afaik (or maybe that changed with the weather conditions?)

I hope that you/your team (@Constantin343) and @saiyam3243 have finished the schema specifications by now. (meta linear task)

@saiyam3243
Copy link
Author

@saiyam3243 @robinholzi Would be great if we can merge this soon. Not being able to use the db is starting to be a blocker for us.

Hi @Constantin343, I am waiting for your response on slack since two days on the topic of firestore db schema, my implementation is ready but I am just waiting if you need any changes there, if you think my proposed schema is fine, I can push my changes. I have already pushed my code in analytics team and the code will look similar but with few changes.

@Constantin343
Copy link
Contributor

@saiyam3243 @robinholzi Would be great if we can merge this soon. Not being able to use the db is starting to be a blocker for us.

Hi @Constantin343, I am waiting for your response on slack since two days on the topic of firestore db schema, my implementation is ready but I am just waiting if you need any changes there, if you think my proposed schema is fine, I can push my changes. I have already pushed my code in analytics team and the code will look similar but with few changes.

@saiyam3243 sorry, somehow I didn't get notified by Slack and therefore did not give you feedback on the schema. Do you need something else from data sourcing to proceed?

@robinholzi
Copy link
Contributor

@Constantin343 @saiyam3243 Ping me on Slack once you resolve the unclear points!

@Constantin343
Copy link
Contributor

Does this give "selective" access to the database? I.e., only the part that is relevant for the Github module, so we can't interfere with other modules writing to the db?

Also, I would need the secrets to test.

@saiyam3243 How is the access handeled now? I saw that I can define the datasource as a string to write to a collection (e.g., function add_new_raw_data), which looks like we can access all collections from every data source or is there already some mechanism in place to restrict the access?

Copy link
Contributor

@egekocabas egekocabas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments 👌🏻 but before starting, it is better to update this branch first (merge from main to your branch)

.github/workflows/ci.yml Outdated Show resolved Hide resolved
Comment on lines 1 to 4
import firebase_admin
import os
from firebase_admin import credentials
from firebase_admin import firestore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this pip install needs to be added to Makefile and also in Dockerfile

parma_db/firestore_db.py Outdated Show resolved Hide resolved
parma_db/firestore_service.py Show resolved Hide resolved
parma_db/firestore_service.py Show resolved Hide resolved
parma_db/firestore_service.py Show resolved Hide resolved
parma_db/firestore_service.py Show resolved Hide resolved
.collection(page_id)
.document("raw_data")
)
doc_ref.set(raw_data_content)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that would mean we overwrite the old raw_data or how would it work? I think we need a way to ensure we don't do that, maybe we can add a new document with the timestamp every time we write raw data.
It is important for the client that we collect the data over time

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as I mentioned in the last comment, we can add time stamp and last _modified_page fields to the pages document to avoid raw_data to get overwritten!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why is page_id a string here and not a dict as before? That seems to be a bit inconsistent

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats a mistake. changed now!

@Constantin343
Copy link
Contributor

Does this give "selective" access to the database? I.e., only the part that is relevant for the Github module, so we can't interfere with other modules writing to the db?
Also, I would need the secrets to test.

@saiyam3243 How is the access handeled now? I saw that I can define the datasource as a string to write to a collection (e.g., function add_new_raw_data), which looks like we can access all collections from every data source or is there already some mechanism in place to restrict the access?

Any updates regarding this topic?

@saiyam3243
Copy link
Author

Does this give "selective" access to the database? I.e., only the part that is relevant for the Github module, so we can't interfere with other modules writing to the db?
Also, I would need the secrets to test.

@saiyam3243 How is the access handeled now? I saw that I can define the datasource as a string to write to a collection (e.g., function add_new_raw_data), which looks like we can access all collections from every data source or is there already some mechanism in place to restrict the access?

Any updates regarding this topic?

When using the Firebase Admin SDK, security rules do not apply because it assumes you have administrative privileges. To restrict access on the server side, we need to implement our logic like defining a list of authenticated users for a particular data source and checking their access every time!

@saiyam3243
Copy link
Author

Does this give "selective" access to the database? I.e., only the part that is relevant for the Github module, so we can't interfere with other modules writing to the db?
Also, I would need the secrets to test.

@saiyam3243 How is the access handeled now? I saw that I can define the datasource as a string to write to a collection (e.g., function add_new_raw_data), which looks like we can access all collections from every data source or is there already some mechanism in place to restrict the access?

Any updates regarding this topic?

When using the Firebase Admin SDK, security rules do not apply because it assumes you have administrative privileges. To restrict access on the server side, we need to implement our logic like defining a list of authenticated users for a particular data source and checking their access every time!

I found out a way. Within firestore db rules, we can specify the rights of the user/devs, in that way we can save db to get accessed by anyone

@robinholzi
Copy link
Contributor

Does this give "selective" access to the database? I.e., only the part that is relevant for the Github module, so we can't interfere with other modules writing to the db?
Also, I would need the secrets to test.

@saiyam3243 How is the access handeled now? I saw that I can define the datasource as a string to write to a collection (e.g., function add_new_raw_data), which looks like we can access all collections from every data source or is there already some mechanism in place to restrict the access?

Any updates regarding this topic?

When using the Firebase Admin SDK, security rules do not apply because it assumes you have administrative privileges. To restrict access on the server side, we need to implement our logic like defining a list of authenticated users for a particular data source and checking their access every time!

I found out a way. Within firestore db rules, we can specify the rights of the user/devs, in that way we can save db to get accessed by anyone

fyi. For that, we need also need to authenticate first and have "service accounts" within the firebase auth service. I will build something for that as currently there's nothing in place as far as I can see.

@robinholzi robinholzi marked this pull request as draft December 5, 2023 17:28
@robinholzi
Copy link
Contributor

This solution uses the firebase_admin SDK which is NOT appropriate for usage in sourcing modules as it grants root like access to Firestore. This access cannot be restricted (through e.g. Firestore rules). Therefore we cannot rely on direct Firebase access here. Therefore I'll close this PR.

@robinholzi robinholzi closed this Dec 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants