A serverless, pseudonymizing, DLP layer between Worklytics and the REST API of your data sources.
Psoxy replaces PII in your organization's data with hash tokens to enable Worklytics's analysis to be performed on anonymized data which we cannot map back to any identifiable individual.
Psoxy is a pseudonymization service that acts as a Security / Compliance layer, which you can deploy between your data sources (SaaS tool APIs, Cloud storage buckets, etc) and the tools that need to access those sources.
Psoxy ensures more secure, granular data access than direct connections between your tools will offer - and enforces access rules to fulfill your Compliance requirements.
Psoxy functions as an API-level Data Loss Prevention (DLP) layer, blocking sensitive fields / values / endpoints that would otherwise be exposed when you connect a data source's API to a 3rd party service. It can ensure that data which would otherwise be exposed to a 3rd party service, due to the granularity of source API models/permissions, is not accessed by or transferred to the service.
Objectives:
- serverless - we strive to minimize the moving pieces required to run psoxy at scale, keeping your attack surface small and operational complexity low. Furthermore, we define infrastructure-as-code to ease setup.
- transparent - psoxy's source code is available to customers, to facilitate code review and white box penetration testing.
- simple - psoxy's functionality focuses on performing secure authentication with the 3rd party API and then performing minimal transformation on the response (pseudonymization, field redaction), to ease code review and auditing of its behavior.
Psoxy may be hosted in Google Cloud or AWS.
Psoxy instances reside on your premises (in the cloud) and act as an intermediary between Worklytics and the data source you wish to connect. In this role, the proxy performs the authentication necessary to connect to the data source's API and then any required transformation (such as pseudonymization or redaction) on the response.
Orchestration continues to be performed on the Worklytics side.
Source API data may include PII such as:
```json
{
  "id": "1234567890",
  "name": "John Doe",
  "email": "john.doe@acme.com"
}
```
But Psoxy ensures Worklytics only sees:
```json
{
  "id": "t~A80SJXrbfawKpDRcddGnKI4QDKyjQI9KtjJZDb8FZ27UE_toS68FyWz7Y22fnQYLP91SHJ",
  "email": "p~SIoJOpeSgYF7YUPQ28IWZexVuHyN9A80SJXrbfawKpDRcddGnKI4QDKyjQI9KtjJZDb8FZ27UE_toS68FyWz7Y22fnQYLP91SHJGVwQiN3E@acme.com"
}
```
These pseudonyms leverage SHA-256 hashing / AES encryption, with salt/keys that are known only to your organization and never transferred to Worklytics.
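As a rough illustration of the transformation above, the following minimal sketch drops the `name` field and replaces `id` and `email` with salted SHA-256 digests. It is not Psoxy's actual algorithm or token format (the `t~` / `p~` prefixes and the AES-based variant are not reproduced); the salt value and the function names `pseudonymize`, `pseudonymize_email`, and `sanitize` are assumptions made for this example.

```python
import base64
import hashlib

# Hypothetical salt for illustration; in a real deployment this secret
# stays inside your own cloud account and is never shared.
SALT = b"example-salt-known-only-to-your-org"


def pseudonymize(value: str, salt: bytes = SALT) -> str:
    """Return a URL-safe, salted SHA-256 digest of value (illustrative format)."""
    digest = hashlib.sha256(salt + value.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def pseudonymize_email(email: str, salt: bytes = SALT) -> str:
    """Hash the case-normalized address, but keep the domain readable."""
    normalized = email.strip().lower()
    domain = normalized.rsplit("@", 1)[1]
    return f"{pseudonymize(normalized, salt)}@{domain}"


def sanitize(record: dict) -> dict:
    """Redact the name field entirely; pseudonymize id and email."""
    return {
        "id": pseudonymize(record["id"]),
        "email": pseudonymize_email(record["email"]),
    }


source = {"id": "1234567890", "name": "John Doe", "email": "john.doe@acme.com"}
sanitized = sanitize(source)
print(sanitized)
```

Because the digest depends on the salt, the same input always maps to the same pseudonym within one deployment, while anyone without the salt cannot recompute it.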
Psoxy enforces that Worklytics can only access API endpoints you've configured (principle of least privilege) using HTTP methods you allow (eg, limit to `GET` to enforce read-only for RESTful APIs).
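The allow-list idea described above can be sketched as follows. This is only an illustration of the principle, not Psoxy's rule format: the `ALLOWED_ENDPOINTS` structure, the `is_allowed` function, and the example paths are all hypothetical.

```python
import re

# Hypothetical rule set: each entry pairs a set of allowed HTTP methods
# with a regular expression that the request path must match in full.
ALLOWED_ENDPOINTS = [
    ({"GET"}, re.compile(r"/calendar/v3/users/[^/]+/events")),
    ({"GET"}, re.compile(r"/calendar/v3/users/[^/]+/settings")),
]


def is_allowed(method: str, path: str) -> bool:
    """Allow a request only if some rule matches both its method and its path."""
    method = method.upper()
    return any(
        method in methods and pattern.fullmatch(path) is not None
        for methods, pattern in ALLOWED_ENDPOINTS
    )


print(is_allowed("GET", "/calendar/v3/users/me/events"))
print(is_allowed("POST", "/calendar/v3/users/me/events"))
print(is_allowed("GET", "/admin/directory/v1/users"))
```

Anything not explicitly listed is rejected, so a new endpoint added by the source API stays blocked until you opt in to it.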
For data source APIs which require keys/secrets for authentication, such values remain stored on your premises and are never accessible to Worklytics.
You authorize your Worklytics tenant to access your proxy instance(s) via the IAM platform of your cloud host.
Worklytics authenticates your tenant with your cloud host via Workload Identity Federation. This eliminates the need for any secrets to be exchanged between your organization and Worklytics, or the use of any API keys/certificates for Worklytics which you would need to rotate.
See also: API Data Sanitization
As of March 2023, the following sources can be connected to Worklytics via psoxy.
Note: Some sources require specific licenses to transfer data via the APIs/endpoints used by Worklytics, or impose per-request costs for such transfers.
For all of these, a Google Workspace Admin must authorize the Google OAuth client you provision (with provided terraform modules) to access your organization's data. This requires a Domain-wide Delegation grant with a set of scopes specific to each data source, via the Google Workspace Admin Console.
If you use our provided Terraform modules, specific instructions that you can pass to the Google Workspace Admin will be output for you.
Source | Examples | Scopes Needed |
---|---|---|
Google Calendar | data - rules | calendar.readonly |
Google Chat | data - rules | admin.reports.audit.readonly |
Google Directory | data - rules | admin.directory.user.readonly admin.directory.user.alias.readonly admin.directory.domain.readonly admin.directory.group.readonly admin.directory.group.member.readonly admin.directory.orgunit.readonly |
Google Drive | data - rules | drive.metadata.readonly |
GMail | data - rules | gmail.metadata |
Google Meet | data - rules | admin.reports.audit.readonly |
NOTE: the above scopes are copied from infra/modules/worklytics-connector-specs. Please refer to that module for a definitive list.
NOTE: the 'Google Directory' connection is a required prerequisite for all other Google Workspace connectors.
NOTE: you may need to enable the various Google Workspace APIs within the GCP project in which you provision the OAuth Clients. If you use our provided terraform modules, this is done automatically.
NOTE: the above OAuth scopes omit the `https://www.googleapis.com/auth/` prefix. See OAuth 2.0 Scopes for Google APIs for details of scopes.
See details: sources/google-workspace/README.md
For all of these, a Microsoft 365 Admin (at minimum, a Privileged Role Administrator) must authorize the Azure Application you provision (with provided terraform modules) to access your Microsoft 365 tenant's data with the scopes listed below. This is done via the Azure Portal (Active Directory). If you use our provided Terraform modules, specific instructions that you can pass to the Microsoft 365 Admin will be output for you.
Source | Examples | Application Scopes |
---|---|---|
Entra ID (formerly Azure Active Directory) | data - rules | User.Read.All Group.Read.All MailboxSettings.Read |
Calendar | data - rules | User.Read.All Group.Read.All Calendars.Read MailboxSettings.Read |
Mail | data - rules | User.Read.All Group.Read.All Mail.ReadBasic.All MailboxSettings.Read |
Teams (beta) | data - rules | User.Read.All Team.ReadBasic.All Channel.ReadBasic.All Chat.Read.All ChannelMessage.Read.All CallRecords.Read.All OnlineMeetings.Read.All |
NOTE: the above scopes are copied from infra/modules/worklytics-connector-specs. Please refer to that module for a definitive list.
NOTE: usage of the Microsoft Teams APIs may be billable, depending on your Microsoft 365 licenses and level of Teams usage. Please review: Payment models and licensing requirements for Microsoft Teams APIs
See details: sources/microsoft-365/README.md
These sources will typically require some kind of "Admin" within the tool to create an API key or client, grant the client access to your organization's data, and provide you with the API key/secret which you must provide as a configuration value in your proxy deployment.
The API key/secret will be used to authenticate with the source's REST API and access the data.
Source | Details + Examples | API Permissions / Scopes |
---|---|---|
Asana | sources/asana | a Service Account (provides full access to Workspace) |
GitHub | sources/github | Read Only permissions for: Repository: Contents, Issues, Metadata, Pull requests Organization: Administration, Members |
Jira Cloud | sources/atlassian/jira-cloud | "Classic Scopes": read:jira-user read:jira-work "Granular Scopes": read:group:jira read:user:jira "User Identity API" read:account |
Jira Server / Data Center | sources/atlassian/jira-server | Personal Access Token on behalf of user with access to equivalent of above scopes for entire instance |
Salesforce | sources/salesforce | api chatter_api refresh_token offline_access openid lightning content cdp_query_api |
Slack | sources/slack | discovery:read |
Zoom | sources/zoom | meeting:read:past_meeting:admin meeting:read:meeting:admin meeting:read:list_past_participants:admin meeting:read:list_past_instances:admin meeting:read:list_meetings:admin meeting:read:participant:admin report:read:list_meeting_participants:admin report:read:meeting:admin report:read:user:admin user:read:user:admin user:read:list_users:admin |
NOTE: the above scopes are copied from infra/modules/worklytics-connector-specs. Please refer to that module for a definitive list.
Other data sources, such as Human Resource Information System (HRIS), Badge, or Survey data can be exported to a CSV file. The "bulk" mode of the proxy can be used to pseudonymize these files by copying/uploading the original to a cloud storage bucket (GCS, S3, etc), which will trigger the proxy to sanitize the file and write the result to a 2nd storage bucket, which you then grant Worklytics access to read.
Alternatively, the proxy can be used as a command line tool to pseudonymize arbitrary CSV files
(eg, exports from your HRIS), in a manner consistent with how a psoxy instance will pseudonymize
identifiers in a target REST API. This is REQUIRED if you want SaaS accounts to be linked with HRIS
data for analysis (eg, Worklytics will match email set in HRIS with email set in SaaS tool's account
so these must be pseudonymized using an equivalent algorithm and secret). See java/impl/cmd-line/
for details.
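To make the consistency requirement concrete, here is a minimal sketch of sanitizing a CSV export: one column is pseudonymized with a salted hash and another is dropped. It does not reproduce the proxy's bulk mode or the command line tool; the column names, the salt, and the functions `pseudonymize` and `sanitize_csv` are hypothetical. The point it illustrates is that an email pseudonymized here matches the pseudonym of the same email from an API source only if both use the same normalization, algorithm, and secret.

```python
import csv
import hashlib
import io


def pseudonymize(value: str, salt: bytes) -> str:
    """Salted SHA-256 of the trimmed, lower-cased value (illustrative only)."""
    normalized = value.strip().lower()
    return hashlib.sha256(salt + normalized.encode("utf-8")).hexdigest()


def sanitize_csv(raw_csv: str, to_pseudonymize: set, to_redact: set, salt: bytes) -> str:
    """Return a copy of raw_csv with some columns hashed and others removed."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    kept = [name for name in reader.fieldnames if name not in to_redact]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=kept, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow({
            name: pseudonymize(row[name], salt) if name in to_pseudonymize else row[name]
            for name in kept
        })
    return out.getvalue()


# Hypothetical HRIS export and salt, for illustration.
salt = b"example-salt-known-only-to-your-org"
hris_export = "EMPLOYEE_EMAIL,NAME,DEPARTMENT\nJohn.Doe@acme.com,John Doe,Sales\n"
result = sanitize_csv(hris_export, {"EMPLOYEE_EMAIL"}, {"NAME"}, salt)
print(result)
```

In the bulk flow described above, the input would be read from the first storage bucket and the sanitized output written to the second one.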
See also: Bulk File Sanitization
The prerequisites and dependencies you will need for Psoxy are determined by:
- Where will you host Psoxy? eg, Amazon Web Services (AWS) or Google Cloud Platform (GCP)
- Which data sources will you connect to? eg, Microsoft 365, Google Workspace, Zoom, etc, as defined in previous sections.
Once you've gathered that information, you can identify the required software and permissions in the next section, and the best environment from which to deploy Psoxy.
At a high-level, you need 3 things:
- a cloud host platform account to which you will deploy Psoxy (eg, AWS account or GCP project)
- an environment on which you will run the deployment tools (usually your laptop)
- some way to authenticate that environment with your host platform as an entity with sufficient permissions to perform the deployment. (usually an AWS IAM Role or a GCP Service Account, which your personal AWS or Google user can assume).
You, or the IAM Role / GCP Service Account you use to deploy Psoxy, usually do NOT need to be authorized to access or manage your data sources directly. Data access permissions and the steps to grant them vary by data source, and generally require action by the data source administrator AFTER you have deployed Psoxy.
As of Feb 2023, Psoxy is implemented with Java 11 and built via Maven. The proxy infrastructure is provisioned and the Psoxy code deployed using Terraform, relying on Azure, Google Cloud, and/or AWS command line tools.
You will need all the following in your deployment environment (eg, your laptop):
Tool | Version | Test Command |
---|---|---|
git | 2.17+ | git --version |
Maven | 3.6+ | mvn -v |
Java JDK 11+ | 11, 17, 21 (see notes) | `mvn -v \| grep Java` |
Terraform | 1.3+, <= 1.9 | terraform version |
NOTE: we will support Java versions for the duration of their official support windows, in particular the LTS versions. As of Nov 2023, we still support Java 11 but may end this at any time. Non-LTS versions, such as 12-16 and 18-20, which are out of official support, may work but are not routinely tested.
NOTE: Using `terraform` is not strictly necessary, but it is the only supported method. You may provision your infrastructure via your host's CLI, web console, or another infrastructure provisioning tool, but we don't offer documentation or support in doing so. Adapting one of our terraform examples or writing your own config that re-uses our modules will simplify things greatly.
NOTE: Avoid Terraform 1.4.x versions earlier than v1.4.3; we've seen bugs with them.
NOTE: from v0.4.59, we've relaxed Terraform version constraint on our modules to allow up to 1.9.x. However, we are not officially supporting this, as we strive to maintain compatibility with both OpenTofu and Terraform.
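The Terraform version rules from the table and notes above (1.3 or later, up to 1.9.x, excluding 1.4.0 through 1.4.2) can be expressed as a small check. This is a sketch for illustration, not part of the provided tooling; the function name `terraform_version_supported` is hypothetical, and pre-release suffixes are ignored.

```python
import re

# Matches "1.5", "1.5.7" or "v1.5.7"; anything after the patch number is ignored.
_VERSION_RE = re.compile(r"v?(\d+)\.(\d+)(?:\.(\d+))?")


def terraform_version_supported(version: str) -> bool:
    """Return True if version is within 1.3 .. 1.9.x and not 1.4.0 - 1.4.2."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    patch = int(match.group(3) or 0)
    if not (1, 3) <= (major, minor) <= (1, 9):
        return False
    if (major, minor) == (1, 4) and patch < 3:
        return False
    return True


for candidate in ["1.2.9", "1.3.0", "1.4.2", "1.4.3", "v1.9.8", "1.10.0"]:
    print(candidate, terraform_version_supported(candidate))
```

Comparing `(major, minor)` as a tuple avoids the string-comparison pitfall where "1.10" would sort before "1.9".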
Depending on your Cloud Host / Data Sources, you will need:
Condition | Tool | Test Command | Roles / Permissions (Examples, YMMV) |
---|---|---|---|
if deploying to AWS | AWS CLI 2.2+ | aws --version | |
if deploying to GCP | Google Cloud CLI 1.0+ | gcloud version | |
if connecting to Microsoft 365 | Azure CLI 2.29+ | az --version | Cloud Application Administrator |
if connecting to Google Workspace | Google Cloud CLI 1.0+ | gcloud version | |
For testing your psoxy instance, you will need:
Tool | Version | Test Command |
---|---|---|
Node.js | 16+ (ideally, an LTS version) | node --version |
npm (should come with `node`) | 8+ | npm --version |
NOTE: Node.js v16 has been unmaintained since Oct 2023, so we recommend a newer version, such as v18 or v20. Some Node.js versions (e.g. v21) may display warning messages when running the test scripts.
We provide a script to check these prereqs, at `tools/check-prereqs.sh`. That script has no dependencies itself, so it should run on any plain POSIX-compliant shell (eg, `bash`, `zsh`, etc) that we'd expect you to find on most Linux, MacOS, or even Windows with Subsystem for Linux (WSL) platforms.
```shell
# from the root of a clone of this repository
./tools/check-prereqs.sh
```
- Choose the cloud platform you'll deploy to, and follow its 'Getting Started' guide.
- Based on that choice, pick from the example template repos below. Use your chosen option as a template to create a new GitHub repo or, if you're not using GitHub Cloud, create a clone/fork of the chosen option in your source control system:
  - AWS - https://github.com/Worklytics/psoxy-example-aws
  - GCP - https://github.com/Worklytics/psoxy-example-gcp

  You will make changes to the files contained in this repo as appropriate for your use-case. These changes should be committed to a repo that is accessible to other members of your team who may need to support your Psoxy deployment in the future.
- Pick the location from which you will deploy (provision) the psoxy instance. This location will need the software prereqs defined in the previous section. Some suggestions:
  - your local machine - if you have the prereqs installed and can authenticate it with your host platform (AWS/GCP) as a sufficiently privileged user/role, this is a simple option
  - Google Cloud Shell - if you're using GCP and/or connecting to Google Workspace, this option simplifies authentication. It includes the prereqs above out-of-the-box, EXCEPT the AWS/Azure CLIs.
  - Terraform Cloud - this works, but adds the complexity of authenticating it with your host platform (AWS/GCP)
  - Ubuntu Linux VM/Container - we provide some setup instructions covering prereq installation for Ubuntu variants of Linux, and specific authentication help.
- Follow the 'Setup' steps in the READMEs of those repos, ultimately running `terraform apply` to deploy your Psoxy instance(s).
- Follow any `TODO` instructions produced by Terraform, such as:
  - provision API keys / make OAuth grants needed by each Data Connection
  - create the Data Connection from Worklytics to your psoxy instance (Terraform can provide a `TODO` file with detailed steps for each)
- Various test commands are provided in local files, as the output of the Terraform; you may use these examples to validate the performance of the proxy. Please review the proxy behavior and adapt the rules as needed. Customers needing assistance adapting the proxy behavior for their needs can contact support@worklytics.co
Review release notes in GitHub.
Psoxy is maintained by Worklytics, Co. Support as well as professional services to assist with configuration and customization are available. Please contact sales@worklytics.co for more information or visit www.worklytics.co.