This repository has been archived by the owner on Aug 14, 2024. It is now read-only.

docs(self-hosted): external storage configurations #1269

Closed
wants to merge 5 commits into from
3 changes: 2 additions & 1 deletion src/components/sidebar.tsx
@@ -116,7 +116,8 @@ export default () => {
 <SidebarLink to="/self-hosted/geolocation/">Geolocation</SidebarLink>
 <SidebarLink to="/self-hosted/sso/">Single Sign-On (SSO)</SidebarLink>
 <SidebarLink to="/self-hosted/csp/">Content Security Policy (CSP)</SidebarLink>
-<SidebarLink to="/self-hosted/reverse-proxy">Reverse Proxy</SidebarLink>
+<SidebarLink to="/self-hosted/reverse-proxy/">Reverse Proxy</SidebarLink>
+<SidebarLink to="/self-hosted/external-storage/">External Storage</SidebarLink>
 <SidebarLink to="/self-hosted/troubleshooting/">Troubleshooting</SidebarLink>
 <SidebarLink to="/self-hosted/support/">Support</SidebarLink>
 </ul>
89 changes: 89 additions & 0 deletions src/docs/self-hosted/external-storage.mdx
@@ -0,0 +1,89 @@
---
title: External Storage
@stayallive May 16, 2024

Maybe we should generalize to "Data Storage" or something.

This way this document can explain where the data is stored by default and can list alternatives if there are any.

Could have the following sections:

  • Sentry (with a general explanation about postgres, clickhouse and kafka maybe)
    • Filestore (Uploads, Replays)
      • Database
      • Object Storage
    • Nodestore (Event data)
      • Database
  • Vroom (Profiles)
    • Docker volume
    • Object Storage

We should probably either rename those sections to what they specifically store or explain that in the intro. "Vroom" is not very descriptive, but once it's explained that the component is responsible for (ingesting and) storing profiling data, it makes a lot more sense.

Maybe wait until someone else also chimes in before rewriting the whole thing, in case I'm off base with this outline, but this sounds like a document I would love to have had when I started my self-hosted adventures 👍

For the Object Storage thing we might want to link to the relevant documentation instead of adding examples for every option under the sun because otherwise there is no bound to the size of this document.

---

<!-- Hello! If you're reading this, you're in luck because I can't decide whether to make.. wait let me copy the text from Discord.
Contributor

nit: much easier to review if questions/comments like this are added in the GH comment system, rather than in-line in the PR.


I got some time before Monday to write up some docs about setting up S3 storage for a self-hosted instance, but I can't decide whether I should put it under a big "External Services" page, in which people can include external postgres, external redis, and that kind of thing, or under a page called "External Storage".
Contributor

I would call the proposed page "Integrating with Major Cloud Providers" or something similar, just to make it clear that we are specifically referring to GCS/AWS/Azure.

Member

External storage sounds fine. I don't think we should recommend people to use external postgres, redis, etc as that can introduce a lot of issues for people trying to set that up unless they really know what they're doing.

Contributor Author

I agree with "don't think we should recommend people to use external postgres, redis, etc", but if there are some people who wish to do that... I don't know if there's a better way to tell them that "they're on their own".


There. Please help me decide this. I'll delete this comment afterwards -->
Contributor

I think we should have a separate page that, rather than "External Services", says something like "Unsupported Workflows".

For example, S3 storage support technically exists, but it is (a) untested and (b) unused by Sentry internally, so there's no real pressure ensuring that it stays functional over time. Ultimately, what we have at the moment is a (very possibly bit-rotted) thin wrapper around Django's FileStore capabilities. We do not want to indicate to users that it is something we'll offer support for, because realistically we can't offer very good support for it, and folks will be left disappointed.

For this specific doc, we need to be very clear that this is provided as a rough best effort template, and that we offer very limited support for it.

Contributor Author

> because realistically we can't offer very good support for it, and folks will be left disappointed.

I know this, but I think the current S3 support is good enough for selfhosted. Users can always pull out their own Django plugins though. Like @stayallive's S3 Nodestore plugin https://github.com/stayallive/sentry-nodestore-s3

> we need to be very clear that this is provided as a rough best effort template

I agree. Need to take some time to come up with good enough copywriting for this lol.


<Alert title="Note" level="info">
After changing configuration files, re-run the <code>./install.sh</code> script to rebuild and restart the containers. See the <Link to="/self-hosted/#configuration">configuration section</Link> for more information.
</Alert>

<!-- Should we add a description about what "external storage" is? -->
Contributor

What exactly do you mean by "external storage"? Does that essentially mean "storage supplied by a cloud services provider like AWS/GCP/Azure"?

Contributor Author

I think it's anything that's not strictly on the same filesystem as the Sentry self-hosted instance, though it excludes things like a NAS or an external bind-mount used to store Sentry data. For files, it's the blob storage provided by each cloud provider. For databases, it's an external database, managed or unmanaged, that is separate from the Sentry self-hosted instance. Do you have any suggestions on how to phrase this better?

Member

I think we can safely assume that people can find that out for themselves if they are looking at this page


## Filestore
Contributor

nit:

Suggested change:
- ## Filestore
+ ## Django Filestore

Sentry (confusingly) maintains a separate service called Filestore, which acts as an intermediate layer in front of e.g. GCP, though we don't really recommend this for self-hosted use.


Filestore stores attachments, source maps, and replays. Configure it in the `sentry/config.yml` file.

### S3 backend
Member

Just to make sure, have you tried the Azure and S3-compatible backends without issues? We're using GCS, so I wanted to make sure.


The `s3` backend for S3-compatible storage maps to `sentry.filestore.s3.S3Boto3Storage`.

```yaml
filestore.backend: 's3'
filestore.options:
  bucket_acl: 'private'
  default_acl: 'private'
  access_key: '<REDACTED>'
  secret_key: '<REDACTED>'
  bucket_name: 'my-bucket'
  region_name: 'auto'
  endpoint_url: 'https://<REDACTED>'
  addressing_style: 'path' # For regular AWS S3, use "auto" or "virtual". For other S3-compatible API like MinIO or Ceph, use "path".
  signature_version: 's3v4'
```

Refer to [botocore configuration](https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html) for valid configuration values.
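The relationship between `endpoint_url` and `addressing_style` noted in the comment above can be checked before restarting the containers. The following is a minimal sketch, not part of Sentry: `check_addressing_style` is a hypothetical helper, the three accepted values are the ones botocore documents for `addressing_style`, and treating a missing value as `auto` is an assumption.

```python
from urllib.parse import urlsplit

# Values botocore documents for the S3 addressing_style setting.
VALID_ADDRESSING_STYLES = {"auto", "virtual", "path"}


def check_addressing_style(options):
    """Return warnings for the addressing_style / endpoint_url pair.

    `options` is a plain dict shaped like `filestore.options` above.
    Hypothetical helper for illustration; Sentry does not ship it.
    """
    warnings = []
    # Assumption: a missing addressing_style behaves like "auto".
    style = options.get("addressing_style", "auto")
    if style not in VALID_ADDRESSING_STYLES:
        warnings.append(f"unknown addressing_style: {style!r}")

    host = urlsplit(options.get("endpoint_url", "")).hostname or ""
    is_aws = host == "amazonaws.com" or host.endswith(".amazonaws.com")
    # Non-AWS endpoints such as MinIO or Ceph generally need path-style addressing.
    if host and not is_aws and style != "path":
        warnings.append(f"{host} is not an AWS endpoint; consider addressing_style 'path'")
    return warnings


print(check_addressing_style({
    "endpoint_url": "https://minio.example.com",
    "addressing_style": "virtual",
}))
```

The hostname `minio.example.com` is a placeholder. An empty list means the helper found nothing to flag; it does not prove that the bucket is reachable.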

<!-- ### Google Cloud Storage backend

I don't know how this works. The source code for this configuration:
- https://github.com/getsentry/sentry/blob/751ef4a029dda5802311fc424a5f63d72b7efd3d/src/sentry/conf/server.py#L2149
- https://github.com/getsentry/sentry/blob/751ef4a029dda5802311fc424a5f63d72b7efd3d/src/sentry/filestore/gcs.py#L226-L245 -->

## Vroom

Vroom is the service that handles profiling. By default, profiling data is saved to the local filesystem. On a self-hosted deployment, change this by overriding the `SENTRY_BUCKET_PROFILES` environment variable. Additional environment variables may be needed, depending on the chosen backend.

### S3 backend

```bash
# For regular AWS S3
s3://my-bucket?awssdk=v1&region=us-west-1&endpoint=amazonaws.com

# For other S3-compatible APIs
s3://my-bucket?awssdk=v1&region=any-region&endpoint=minio.yourcompany.com&s3ForcePathStyle=true&disableSSL=false
```

Additional environment variables should be provided:
- `AWS_ACCESS_KEY=foobar`
- `AWS_SECRET_KEY=foobar`
Contributor

nit (here and elsewhere): change the value from foobar to something else, to make clear that the two keys will be different in practice. I would suggest something like your_secret_key or similar.

- `AWS_SESSION_TOKEN=foobar` (optional)

Further explanation on the query string options:
- `region`: The AWS region for requests.
- `endpoint`: The endpoint URL (hostname only or fully qualified URI).
- `disableSSL`: A value of "true" disables SSL when sending requests.
- `s3ForcePathStyle`: A value of "true" forces the request to use path-style addressing.
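To show how the query string options combine into a `SENTRY_BUCKET_PROFILES` value, here is a minimal sketch that assembles the two example URLs above and splits one back into its parts. The helpers `build_s3_bucket_url` and `describe_bucket_url` are hypothetical names introduced for illustration; they only manipulate strings and do not contact any storage service.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_s3_bucket_url(bucket, region, endpoint, force_path_style=False, disable_ssl=None):
    """Assemble an s3:// bucket URL from the query string options listed above."""
    params = {"awssdk": "v1", "region": region, "endpoint": endpoint}
    if force_path_style:
        params["s3ForcePathStyle"] = "true"
    if disable_ssl is not None:
        params["disableSSL"] = "true" if disable_ssl else "false"
    return f"s3://{bucket}?{urlencode(params)}"


def describe_bucket_url(url):
    """Split a bucket URL back into its bucket name and query string options."""
    parts = urlsplit(url)
    options = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return parts.netloc, options


# Regular AWS S3.
aws_url = build_s3_bucket_url("my-bucket", "us-west-1", "amazonaws.com")

# Another S3-compatible API (placeholder hostname).
minio_url = build_s3_bucket_url(
    "my-bucket", "any-region", "minio.yourcompany.com",
    force_path_style=True, disable_ssl=False,
)

print(aws_url)
print(minio_url)
print(describe_bucket_url(minio_url))
```

Building the value programmatically is mostly useful when the bucket, region, and endpoint come from separate settings; for a single deployment, writing the URL by hand works just as well.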

### Azure Blob Storage backend
Contributor

My inclination is to remove Azure for now, and indicate somewhere in this document that there are no shims for it atm. Since we don't have a Filestore shim for Azure, in practice it will be very hard to run vroom on Azure, and users will likely work themselves into a corner if they try.


```bash
azblob://my-container?protocol=https&domain=yourcompany.blob.core.windows.net&localemu=false&cdn=false
```

Additional environment variables that should be provided (pick what's compatible with your configuration):
- `AZURE_STORAGE_ACCOUNT=foobar` - The service account name. Required when used with `AZURE_STORAGE_KEY`, because the pair sets the authentication mechanism to [azblob.NewSharedKeyCredential](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob#NewSharedKeyCredential), which creates immutable shared key credentials. Otherwise, the `storage_account` query string parameter in the URL can be used instead.
- `AZURE_STORAGE_KEY=foobar` - The shared key credential, used together with `AZURE_STORAGE_ACCOUNT`.
- `AZURE_STORAGE_SAS_TOKEN=foobar` - The SAS token, if you authenticate with one.

Other authentication options and details can be found in the [gocloud.dev/blob/azureblob documentation](https://pkg.go.dev/gocloud.dev@v0.37.0/blob/azureblob#hdr-URLs).
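As a rough illustration of how the environment variables listed above relate to each other, the sketch below classifies a set of variables by the authentication mode they imply. `azure_auth_mode` is a hypothetical helper, and its rules are an assumption based only on the descriptions in this section; it is not the credential resolution logic of gocloud.dev, so check the linked documentation for the authoritative behavior.

```python
import os


def azure_auth_mode(env):
    """Classify the Azure credentials present in `env` (a dict-like mapping).

    Assumption: mirrors the variable descriptions in this section only.
    """
    if env.get("AZURE_STORAGE_KEY"):
        # A shared key is only usable together with the account name.
        if not env.get("AZURE_STORAGE_ACCOUNT"):
            raise ValueError("AZURE_STORAGE_KEY requires AZURE_STORAGE_ACCOUNT")
        return "shared_key"
    if env.get("AZURE_STORAGE_SAS_TOKEN"):
        return "sas_token"
    # Anything else falls under the other options in the linked documentation.
    return "other"


print(azure_auth_mode(os.environ))
```

Passing `os.environ` inspects the current process; passing a plain dict makes it easy to check a planned configuration before applying it.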

Further explanation on the query string options:
- `domain`: Your storage domain.
- `protocol`: Network protocol (`http` or `https`).
- `cdn`: A value of "true" specifies that the blob server is a CDN.
- `localemu`: A value of "true" specifies that the blob server is a local emulator.