For auditing or compliance purposes, you may want to see the exact data that has been shared from your Enterprise instance. All data shared from the instance ends up in an S3 bucket located in an AWS account owned by Gitpod. See Data and Observability for more information on the observability architecture.
Customers can access the S3 bucket where the data is stored from any role or user in the Gitpod instance's AWS account by following these steps:
Upon request, your Gitpod account manager can give you the name of the S3 bucket to which data from your instance is sent.
Set up the AWS CLI environment to assume any role or user in the AWS account where Gitpod is installed. For example, the user or role that was used to apply the CloudFormation template when installing Gitpod works.
```bash
# e.g. if they're using env variables
export AWS_SECRET_ACCESS_KEY=""
export AWS_ACCESS_KEY_ID=""
export AWS_SESSION_TOKEN=""
```
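Before touching the bucket, you can confirm the credentials are active by checking which identity the CLI resolves to (a quick sanity check, not a required step):

```bash
# Should print the account ID and ARN of the role/user you intend to use
aws sts get-caller-identity
```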
Use the CLI to inspect the data
```bash
# List the bucket
aws s3 ls <bucket-name-provided-by-gitpod>

# Download a specific file
aws s3 cp s3://<bucket-name-provided-by-gitpod>/k8s-state/meta/2023/06/03/23/k8s-state-meta-1-2023-06-03-23-59-21-12dab8f5-0d40-4069-a679-172f94f13304 kubstate-example.json
```
The storage format depends on the telemetry type. For non-metrics data, the files can be directly inspected. For metrics data, see the instructions below.
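For example, non-metrics exports such as the k8s-state file downloaded above are JSON, so they can be pretty-printed for review (assuming jq is installed; any JSON viewer works):

```bash
# Pretty-print the downloaded file and page through it
jq . kubstate-example.json | less
```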
Prometheus metrics data is stored in protobuf format, which requires special handling to decode and inspect. The following steps show how to access, decode, and search through it:
Use the prometheus-decoder tool to decode the metrics files and search for specific metrics:
```bash
# Decode a single file
./prometheus-decoder -input ./metrics-data/filename

# Decode all files and search for a specific metric
find ./metrics-data -type f -exec ./prometheus-decoder -input {} \; | grep 'metric_name' -A 20 -B 2 > ./results.txt
```
The results will be in JSON format, making it easier to read and analyze the metrics data.
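If you plan to run several searches, it can be cheaper to decode everything once and grep the decoded output repeatedly. A minimal helper sketch, using the decoder's `-input` flag as shown above (the `./decoded` directory name is illustrative):

```bash
# Decode each metrics file once into ./decoded/<name>.json
mkdir -p ./decoded
find ./metrics-data -type f | while IFS= read -r f; do
  ./prometheus-decoder -input "$f" > "./decoded/$(basename "$f").json"
done
```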
Example: To search for the gitpod_scm_token_refresh_requests_total metric:
```bash
find ./metrics-data -type f -exec ./prometheus-decoder -input {} \; | grep 'gitpod_scm_token_refresh_requests_total' -A 20 -B 2 > ./results.txt
```
This finds all instances of the metric in the decoded data, includes 2 lines before and 20 lines after each match, and saves the results to a file called results.txt.
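If you decoded everything up front (see the helper above), a quick per-file match count can narrow down where a metric appears (paths follow the illustrative `./decoded` layout):

```bash
# Show match counts per decoded file, hiding files with zero hits
grep -c 'gitpod_scm_token_refresh_requests_total' ./decoded/*.json | grep -v ':0$'
```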
If data containing personally identifiable or confidential information that should not have been shared is found in the S3 bucket, the process for notifying Gitpod and remediating the issue is as follows:
Customer inspects data to identify potentially sensitive leaks: Customers can inspect any data that was sent to Gitpod by accessing the S3 bucket to which all data from an instance is sent (see "Accessing the Data Shared" above).
Customer reports the data leak: Upon identification of confidential data leakage, a customer can trigger a security incident via their Gitpod account manager.
Data is deleted: The leaked data is identified and measures are taken to delete it from S3 and then from any third-party systems.
For S3, there is the option to delete the entire bucket. In any case, the data in this bucket is configured with a very short retention period. See Data and Observability.
If the effort is deemed worthwhile, the data can also be deleted individually, as sketched after this list.
For third-party services, details will depend on the service and the data that was leaked.
Improvements made: The root cause of the leak is identified, and measures are put in place to prevent it from recurring.
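For reference, individual deletion can be as simple as removing the affected objects with the AWS CLI. This is a sketch only: it assumes the role you assumed earlier has delete permissions on the bucket (access may be read-only, in which case Gitpod performs the deletion), and the object key and prefix below are illustrative.

```bash
# Delete a single leaked object
aws s3 rm s3://<bucket-name-provided-by-gitpod>/<object-key>

# Preview a recursive delete under a prefix without removing anything, then run it
aws s3 rm s3://<bucket-name-provided-by-gitpod>/k8s-state --recursive --dryrun
aws s3 rm s3://<bucket-name-provided-by-gitpod>/k8s-state --recursive
```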