Provider Datalab – Usage & Concepts
This section explains how to use the provider-datalab configuration packages once they are installed. It focuses on the concepts of Sessions, Files, vclusters, Storage Secrets, Databases and the optional Keycloak integration for identity and access.
Read this page as an operator-facing contract. A Datalab gives users a smooth workspace, but durable state should stay visible to the platform team: object-storage credentials, persistent volumes, databases, key-value/cache stores, vector stores, registries, identity resources, and network policy. That visibility lets operators own lifecycle, monitoring, and backups.
Concepts
Sessions
A Datalab claim may declare one or more spec.sessions. A session is the named workspace identity for a user or workflow. It owns a durable home PVC and may also have a live Educates runtime.
Multiple started sessions
A single Datalab can have multiple sessions started at the same time. This supports patterns such as a default human workspace plus a separate analysis, automation, or agent workspace. Each started session gets its own runtime and durable workspace PVC, while shared Datalab-level credentials and managed services remain operator-visible.
- Each spec.sessions entry declares a session by name. state defaults to started.
- state: started creates the WorkshopSession runtime for that session and mounts the session PVC as the home workspace.
- state: stopped keeps the declared session and its PVC, but does not create the WorkshopSession runtime. Switching back to started reuses the same workspace PVC.
- If no sessions are given, no declared session PVC or runtime is pre-created. The shared runtime namespace and non-session resources can still be reconciled and tested without a WorkshopSession.
Sessions can also be patched into the spec later if needed.
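The session rules above can be summarised in a few lines of Python. This is only an illustrative sketch of the described behaviour, not the composition's actual code; the names SessionPlan and plan_sessions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionPlan:
    name: str
    has_pvc: bool      # durable home workspace PVC
    has_runtime: bool  # live WorkshopSession runtime


def plan_sessions(sessions):
    """Map declared spec.sessions entries to the resources described in the text."""
    plans = []
    for entry in sessions or []:
        state = entry.get("state", "started")  # state defaults to started
        if state not in ("started", "stopped"):
            raise ValueError(f"unsupported state: {state!r}")
        # Every declared session keeps its PVC; only started sessions get a runtime.
        plans.append(SessionPlan(entry["name"], True, state == "started"))
    return plans


plans = plan_sessions([{"name": "default"}, {"name": "analysis", "state": "stopped"}])
```

An empty or omitted list yields no plans, which corresponds to the sessionless case where only the shared namespace and non-session resources are reconciled.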
Persistence
Each declared Datalab session is equipped with a persistent volume for storing files, in addition to the connected object storage. This ensures that user data and session state are preserved even if the workshop pod is restarted, rescheduled by Kubernetes, or intentionally stopped through state: stopped. Installing code libraries, handling metadata, or working with Git repositories often generates many small files that are updated frequently. A storage class with NFS-like capabilities is usually a good fit for these workloads; object storage abstractions are not.
Provider Datalab creates a stable PVC per declared session, including sessions with state: stopped, in the Educates workshop namespace and configures Educates to use that claim as the /home/eduk8s workspace volume. The size comes from spec.quota.storage, and spec.persistence.storageClassName may select a StorageClass subject to the operator allowlist in EnvironmentConfig.data.storageClasses.allowed.
When EnvironmentConfig.data.storageClasses.allowed is non-empty, a requested storageClassName is used only if it appears in that list; otherwise Provider Datalab uses the first allowed class. If the list is omitted or empty, any requested class is allowed and an omitted storageClassName lets Kubernetes use the cluster default.
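The allowlist rule can be read as a small decision function. The sketch below mirrors the wording of the paragraph above; select_storage_class is a hypothetical name, and it assumes that an omitted request combined with a non-empty allowlist also falls back to the first allowed class.

```python
def select_storage_class(requested, allowed):
    """Return the StorageClass name to use, or None for the cluster default.

    requested: value of spec.persistence.storageClassName (None if omitted)
    allowed:   value of EnvironmentConfig.data.storageClasses.allowed (None or list)
    """
    if allowed:  # non-empty allowlist: honour the request only if it is listed
        return requested if requested in allowed else allowed[0]
    # List omitted or empty: any requested class is accepted;
    # None lets Kubernetes pick the cluster default.
    return requested
```

For example, requesting "ceph" against an allowlist of ["nfs", "fast"] would resolve to "nfs" under these assumptions.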
For operators, this is the responsibility boundary. Session PVCs are useful for workspace state and many-small-file workloads, but they are not a replacement for managed data services. If data must survive upgrades, disaster recovery events, or independent service lifecycles, use a managed database, a bucket provisioned outside Provider Datalab, or another store with a clear backup policy.
Database
Many Datalab workloads require a stateful database in addition to files and object storage, for example metadata catalogs or application backends.
Instead of running databases inside sessions, Datalabs attach to a platform-managed database cluster. The platform creates logical databases inside that cluster and provisions credentials automatically.
spec:
  databases:
    pg0:
      names:
        - dev
        - prod
      storage: 1Gi
      backupStorage: 3Gi
- pg0 - target database cluster managed by the platform
- names - logical databases created inside the cluster
- storage - persistent storage allocation
- backupStorage - space reserved for backups
The platform automatically:
- creates databases and users
- stores credentials in a Secret
- injects connection details into sessions
- configures backups according to the operator-managed database setup
- keeps data independent from session lifecycle
This keeps compute ephemeral while database state remains durable.
If a Kubernetes gateway service is running in the cluster and enabled in the global configuration, the database can also be exposed externally. In that case, corresponding environment variables such as the external hostname or external URL are injected into the session as well.
For each declared PostgreSQL host, Provider Datalab also exposes host-scoped aliases in the generated <datalab>-datalab Secret. For example, pg0 receives variables such as POSTGRES_PG0_HOST, POSTGRES_PG0_PORT, POSTGRES_PG0_DATABASES, POSTGRES_PG0_DEV_URL, and, when gateway exposure is configured, POSTGRES_PG0_DEV_URL_EXTERNAL.
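Inside a session, these variables can be looked up by host key and database name. The following is a minimal sketch that assumes the naming pattern generalises from the pg0/dev example by upper-casing both keys; the helper postgres_settings is hypothetical, and names containing characters other than letters and digits are not handled.

```python
import os


def postgres_settings(host_key, database, env=None):
    """Collect the host-scoped PostgreSQL variables for one logical database."""
    env = os.environ if env is None else env
    prefix = f"POSTGRES_{host_key.upper()}"
    db = database.upper()
    return {
        "host": env.get(f"{prefix}_HOST"),
        "port": env.get(f"{prefix}_PORT"),
        "url": env.get(f"{prefix}_{db}_URL"),
        # Only present when gateway exposure is configured.
        "external_url": env.get(f"{prefix}_{db}_URL_EXTERNAL"),
    }
```

Calling postgres_settings("pg0", "dev") in a session would read POSTGRES_PG0_HOST, POSTGRES_PG0_PORT, POSTGRES_PG0_DEV_URL, and POSTGRES_PG0_DEV_URL_EXTERNAL; missing variables come back as None.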
Note: The Postgres endpoint is exposed through a gateway TLSRoute, which requires immediate TLS with SNI (direct TLS). The PostgreSQL server and libpq-based clients (e.g. psql, psycopg) fully support this. However, some non-libpq drivers such as asyncpg do not yet implement this negotiation correctly and may fail during connection setup.
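For libpq-based clients, direct TLS is requested through the sslnegotiation connection parameter, which is available in libpq 17 and later and requires sslmode=require or stronger. The sketch below only builds a keyword/value connection string; the helper name and the example hostname are hypothetical, and values containing spaces would need additional quoting.

```python
def direct_tls_conninfo(host, port, dbname, user, sslmode="require"):
    """Build a libpq keyword/value connection string that requests direct TLS."""
    params = {
        "host": host,
        "port": str(port),
        "dbname": dbname,
        "user": user,
        "sslmode": sslmode,          # use verify-full to also validate the certificate
        "sslnegotiation": "direct",  # start the TLS handshake immediately (libpq >= 17)
    }
    return " ".join(f"{key}={value}" for key, value in params.items())


conninfo = direct_tls_conninfo("pg0.lab.acme.org", 5432, "dev", "dev")
```

The resulting string can be passed to psql or to a libpq-based driver; the password is best supplied separately, for example from the credentials Secret.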
Document, Cache, and Vector Stores
For non-relational workloads, a Datalab can also provision optional document, cache, and vector stores:
spec:
  documentStores:
    prod:
      storage: 1Gi
  cacheStores:
    prod:
      storage: 1Gi
  vectorStores:
    prod:
      storage: 1Gi
- documentStores provisions MongoDBCommunity resources (mongodbcommunity.mongodb.com/v1).
- cacheStores provisions Redis resources (redis.redis.opstreelabs.in/v1beta2).
- vectorStores provisions QdrantCluster resources (qdrant.io/v1alpha1).
- Access credentials are created as namespaced Secrets with predictable names:
  - Mongo: <store>-mongodb-auth (key: password)
  - Redis: <store>-redis-auth (key: password)
  - Qdrant: <store>-qdrant-auth (keys: apiKey, readApiKey)
Provider Datalab also exposes connection details through the generated <datalab>-datalab Secret, which sessions import as environment variables. For a store key such as prod, the session receives variables such as MONGO_PROD_URI, REDIS_PROD_URL, and QDRANT_PROD_URL, plus split fields for host, port, user, database, and credentials where applicable.
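Because the variables share a predictable prefix, a session can gather everything that belongs to one store. This sketch assumes the prefix is the upper-cased store kind (MONGO, REDIS, QDRANT) followed by the upper-cased store key, as in the examples above; store_variables is a hypothetical helper.

```python
import os


def store_variables(kind, store_key, env=None):
    """Return all variables for one store, keyed by the part after the prefix."""
    env = os.environ if env is None else env
    prefix = f"{kind.upper()}_{store_key.upper()}_"
    return {
        name[len(prefix):]: value
        for name, value in env.items()
        if name.startswith(prefix)
    }
```

With MONGO_PROD_URI set, store_variables("mongo", "prod") would return a dictionary containing the key URI along with any split fields that are present.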
These stores should be visible platform resources, not ad-hoc services hidden inside a user terminal. The Datalab claim records that they exist. The underlying operators handle persistence, upgrades, monitoring, and backups. Before enabling them in production, check the backup and restore guarantees of your MongoDB, Redis, or Qdrant operator setup.
Docker Registry
A Datalab can optionally provide an in-session Docker registry:
spec:
  registry:
    enabled: true
    storage: 3Gi
This is useful when users need to push and pull images inside the lab. Treat it as workspace-scoped registry storage. If registry contents need backup, retention, scanning, or promotion into a central registry, define that in platform policy before enabling it.
Authentication
Provider Datalab is a workspace building block. It can wire authentication into the runtime, but the cleaner production pattern is often to delegate authentication to the surrounding platform, especially at ingress.
Multiple options are possible:
- Enable built-in runtime authentication. By default, auth.type = credentials uses the same credentials that are used to access the connected object-storage buckets for session login. This is a simple basic-auth style option, but it ties workspace users to the credentials known by the Datalab runtime.
- Set auth.type = delegated and let another platform component protect access before requests reach the workspace. This does not mean that unauthenticated access is required; it means authentication is delegated to another layer, such as the Kubernetes ingress controller.
Delegating authentication is often more flexible because users accessing a workspace do not necessarily have to exist in the same identity model used by the Datalab composition. For example, NGINX Ingress can call a shared oauth2-proxy, while APISIX can enforce OIDC directly with its openid-connect plugin, optionally combined with keycloak-authz or the OPA plugin for authorization.
Those controller-specific settings should be added by platform policy instead of being repeated in every Datalab. Kyverno is one option, but the same result can be achieved with a mutating admission webhook, GitOps post-processing, or any other automation that consistently targets the generated Educates Ingress resources. In all examples below, the Datalab environment keeps auth.type: delegated; the protection is established externally at the ingress layer.
Generated workshop session resources
For a Datalab named s-jane with a default session and ingress.domain: lab.acme.org, Educates creates session ingress hosts such as:
s-jane-default.lab.acme.org
editor-s-jane-default.lab.acme.org
s-jane-default-editor.lab.acme.org
data-s-jane-default.lab.acme.org
s-jane-default-data.lab.acme.org
The generated ingresses carry labels that are suitable for platform policy:
training.educates.dev/application: workshop
training.educates.dev/component: session
training.educates.dev/environment.name: s-jane
Provider Datalab also creates a Keycloak client named after the Datalab. For s-jane, the generated client includes redirect and web-origin entries for the workspace root and each declared session host:
https://s-jane.lab.acme.org/*
https://s-jane-default.lab.acme.org/*
https://editor-s-jane-default.lab.acme.org/*
https://s-jane-default-editor.lab.acme.org/*
https://data-s-jane-default.lab.acme.org/*
https://s-jane-default-data.lab.acme.org/*
http://localhost:*
This allows ingress-layer OIDC implementations, such as APISIX openid-connect, to reuse the Datalab-owned Keycloak client without an extra Keycloak mutation policy.
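The host and redirect lists above follow a regular pattern. The sketch below reproduces them for the s-jane example; it is derived only from the listed values, so the pattern for other names is an assumption, and both function names are hypothetical.

```python
def session_hosts(datalab, session, domain):
    """Ingress hosts for one session, in the order shown in the example."""
    base = f"{datalab}-{session}"
    names = [base]
    for component in ("editor", "data"):
        # Each component appears in prefix and suffix form.
        names += [f"{component}-{base}", f"{base}-{component}"]
    return [f"{name}.{domain}" for name in names]


def redirect_uris(datalab, sessions, domain):
    """Redirect entries: workspace root, every session host, then localhost."""
    uris = [f"https://{datalab}.{domain}/*"]
    for session in sessions:
        uris += [f"https://{host}/*" for host in session_hosts(datalab, session, domain)]
    uris.append("http://localhost:*")
    return uris


uris = redirect_uris("s-jane", ["default"], "lab.acme.org")
```

A helper like this can be handy for checking that the hosts an ingress policy matches are the same ones registered on the Keycloak client.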
Shared delegated-auth environment configuration
Both nginx and APISIX examples use delegated auth at the Datalab layer. Change ingress.class to match the ingress controller you operate.
apiVersion: apiextensions.crossplane.io/v1beta1
kind: EnvironmentConfig
metadata:
  name: datalab
data:
  iam:
    realm: acme
  auth:
    type: delegated
  ingress:
    class: apisix # use "nginx" for the nginx example
    domain: lab.acme.org
    secret: workspace-tls
  storage:
    endpoint: https://s3.acme.org
    provider: Other
    region: acme
    force_path_style: "true"
    secretNamespace: workspace
    type: s3
The Educates installation must use the same ingress class, domain, and TLS secret. For example, an APISIX-based deployment would set:
clusterIngressDomain: lab.acme.org
clusterIngressClass: apisix
tlsCertificateRef:
  name: workspace-tls
  namespace: workspace
For TLS, use a wildcard certificate for the session domain:
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: workspace-cert
  namespace: workspace
spec:
  dnsNames:
    - "*.lab.acme.org"
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt-dns-prod
  secretName: workspace-tls
NGINX Ingress with oauth2-proxy
With NGINX Ingress, the usual pattern is an externally deployed oauth2-proxy instance and NGINX external-auth annotations on the generated workshop-session ingresses.
In this model, oauth2-proxy normally uses its own OAuth client, for example with redirect URI:
https://auth.lab.acme.org/oauth2/callback
The Datalab-generated Keycloak clients are still useful for direct OIDC ingress controllers, but a central oauth2-proxy does not need one client per Datalab unless you intentionally deploy it that way.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: protect-workshop-sessions-nginx
spec:
  admission: true
  background: false
  rules:
    - name: add-oauth2-proxy-annotations
      match:
        any:
          - resources:
              kinds:
                - Ingress
              selector:
                matchLabels:
                  training.educates.dev/application: workshop
                  training.educates.dev/component: session
      preconditions:
        all:
          - key: "{{ request.object.spec.ingressClassName || '' }}"
            operator: Equals
            value: nginx
          - key: "{{ (request.object.spec.rules || [])[?host != null && ends_with(host, '.lab.acme.org')] | length(@) }}"
            operator: GreaterThan
            value: 0
      mutate:
        patchStrategicMerge:
          metadata:
            annotations:
              +(nginx.ingress.kubernetes.io/auth-url): "https://auth.lab.acme.org/oauth2/auth"
              +(nginx.ingress.kubernetes.io/auth-signin): "https://auth.lab.acme.org/oauth2/start?rd=https://$host$escaped_request_uri"
              +(nginx.ingress.kubernetes.io/auth-response-headers): "Authorization,X-Auth-Request-User,X-Auth-Request-Email,X-Auth-Request-Preferred-Username"
Configure oauth2-proxy with a cookie domain that covers the workshop hosts, for example .lab.acme.org, and restrict allowed redirect domains to the same boundary.
APISIX Ingress with openid-connect
With APISIX, the ingress controller can enforce OIDC directly. This pattern mirrors the EOEPCA deployment, adapted to acme.org.
Kyverno needs permission to create ApisixPluginConfig resources in the generated session namespaces:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kyverno:workspace-session-apisix-pluginconfigs
  labels:
    rbac.kyverno.io/aggregate-to-admission-controller: "true"
    rbac.kyverno.io/aggregate-to-background-controller: "true"
rules:
  - apiGroups:
      - apisix.apache.org
    resources:
      - apisixpluginconfigs
    verbs:
      - get
      - list
      - watch
      - create
      - update
      - patch
      - delete
The policy generates one APISIX plugin config per session namespace and annotates the matching workshop ingress to use it:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: protect-workshop-sessions-apisix
spec:
  admission: true
  background: false
  rules:
    - name: generate-apisix-oidc-plugin-config
      match:
        any:
          - resources:
              kinds:
                - Ingress
              selector:
                matchLabels:
                  training.educates.dev/application: workshop
                  training.educates.dev/component: session
      preconditions:
        all:
          - key: "{{ request.object.spec.ingressClassName || '' }}"
            operator: Equals
            value: apisix
          - key: "{{ (request.object.spec.rules || [])[?host != null && ends_with(host, '.lab.acme.org')] | length(@) }}"
            operator: GreaterThan
            value: 0
          - key: "{{ request.object.metadata.labels.\"training.educates.dev/environment.name\" || '' }}"
            operator: NotEquals
            value: ""
      generate:
        apiVersion: apisix.apache.org/v2
        kind: ApisixPluginConfig
        name: "workspace-oidc-{{ request.object.metadata.labels.\"training.educates.dev/environment.name\" }}"
        namespace: "{{ request.namespace }}"
        synchronize: false
        data:
          metadata:
            labels:
              training.educates.dev/application: workshop
              training.educates.dev/component: session
              training.educates.dev/environment.name: "{{ request.object.metadata.labels.\"training.educates.dev/environment.name\" }}"
          spec:
            plugins:
              - name: openid-connect
                enable: true
                config:
                  discovery: "https://iam-auth.acme.org/realms/acme/.well-known/openid-configuration"
                  use_jwks: true
                  bearer_only: false
                  client_id: "{{ request.object.metadata.labels.\"training.educates.dev/environment.name\" }}"
                  client_secret: ""
                  session:
                    secret: "{{ random('[A-Za-z0-9]{32}') }}"
                  access_token_in_authorization_header: false
                  set_access_token_header: false
                  set_id_token_header: false
                  set_userinfo_header: false
                  set_refresh_token_header: false
    - name: add-apisix-oidc-plugin-config
      match:
        any:
          - resources:
              kinds:
                - Ingress
              selector:
                matchLabels:
                  training.educates.dev/application: workshop
                  training.educates.dev/component: session
      preconditions:
        all:
          - key: "{{ request.object.spec.ingressClassName || '' }}"
            operator: Equals
            value: apisix
          - key: "{{ (request.object.spec.rules || [])[?host != null && ends_with(host, '.lab.acme.org')] | length(@) }}"
            operator: GreaterThan
            value: 0
          - key: "{{ request.object.metadata.labels.\"training.educates.dev/environment.name\" || '' }}"
            operator: NotEquals
            value: ""
      mutate:
        patchStrategicMerge:
          metadata:
            annotations:
              +(k8s.apisix.apache.org/plugin-config-name): "workspace-oidc-{{ request.object.metadata.labels.\"training.educates.dev/environment.name\" }}"
The plugin uses the Datalab name as client_id, taken from training.educates.dev/environment.name. Because Provider Datalab creates the matching public Keycloak client and redirect URIs, no additional Keycloak mutation is required for declared sessions.
The session secret is generated when Kyverno creates the ApisixPluginConfig. synchronize: false keeps the generated object stable; if you intentionally change the plugin template for existing sessions, recreate the generated plugin config or restart the session so Kyverno can generate a fresh one.
Token and userinfo forwarding flags are disabled by default. Enable them only when the upstream workspace application explicitly needs those headers.
Other ingress controllers follow the same delegated pattern: set auth.type: delegated, match the generated workshop session ingresses by label and domain, and attach the controller-specific authentication policy.
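Whatever automation applies the policy, the selection logic stays the same: labels, ingress class, and host suffix. The following sketch expresses that match as a plain predicate over an Ingress manifest loaded as a dictionary; it mirrors the Kyverno preconditions shown earlier but is not part of any controller, and the function name is hypothetical.

```python
SESSION_LABELS = {
    "training.educates.dev/application": "workshop",
    "training.educates.dev/component": "session",
}


def is_protected_session_ingress(ingress, ingress_class, domain_suffix):
    """True if the Ingress is a generated workshop session ingress in scope."""
    labels = ingress.get("metadata", {}).get("labels") or {}
    if any(labels.get(key) != value for key, value in SESSION_LABELS.items()):
        return False
    spec = ingress.get("spec", {})
    if spec.get("ingressClassName", "") != ingress_class:
        return False
    # At least one rule must serve a host under the workshop domain.
    hosts = [rule.get("host") for rule in spec.get("rules") or []]
    return any(host is not None and host.endswith(domain_suffix) for host in hosts)
```

A mutating webhook or GitOps post-processor could call such a predicate before attaching the controller-specific annotations.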
Full example manifests are available in the repository:
- examples/ingress-protection/nginx-oauth2-proxy-workshop-session-protection.yaml
- examples/ingress-protection/apisix-workshop-session-protection.yaml
Keycloak-managed access is supported. When it is used, the composition automatically provisions the Keycloak client, groups, roles, role bindings, and memberships needed for the workspace.
Files and the Workshop Tab
The spec.files array is optional.
- When empty or omitted, no workshop tab is rendered in the Educates UI.
- When at least one source is defined, workshop and/or data content is mounted and the tab is enabled.
Supported sources:
- OCI image (spec.files[].image)
- Git repository (spec.files[].git)
- HTTP(S) download (spec.files[].http)
Filters (includePaths, excludePaths, newRootPath, path) control what ends up visible.
vcluster toggle
spec.vcluster is a boolean flag.
- true → the datalab provisions a vcluster for runtime isolation.
- false → workloads run directly in the namespace.
Storage Secret
A Datalab requires credentials to an S3-compatible storage system. Provider Datalab does not create the bucket. Create it manually, through your platform process, or with Provider Storage.
Provider Datalab reads the credentials from a Kubernetes Secret named via spec.secretName, or by the Datalab name when spec.secretName is omitted. The Secret lives in EnvironmentConfig.data.storage.secretNamespace, which is usually the same namespace as the Datalab claim.
This secret must include at least AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. The endpoint and provider are defined in EnvironmentConfig.data.storage.
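As an illustration of the expected shape, the sketch below builds such a Secret manifest and prints it as JSON, which kubectl accepts in place of YAML. The credential values are placeholders and storage_secret is a hypothetical helper; only the two key names and the naming rule come from the text above.

```python
import base64
import json


def storage_secret(datalab_name, namespace, access_key, secret_key, secret_name=None):
    """Build a Secret manifest holding the S3 credentials for one Datalab."""

    def b64(value):
        return base64.b64encode(value.encode()).decode()

    return {
        "apiVersion": "v1",
        "kind": "Secret",
        # Falls back to the Datalab name when spec.secretName is omitted.
        "metadata": {"name": secret_name or datalab_name, "namespace": namespace},
        "type": "Opaque",
        "data": {
            "AWS_ACCESS_KEY_ID": b64(access_key),
            "AWS_SECRET_ACCESS_KEY": b64(secret_key),
        },
    }


manifest = storage_secret("s-joe", "workspace", "EXAMPLEKEY", "example-secret")
print(json.dumps(manifest, indent=2))
```

The namespace argument should match EnvironmentConfig.data.storage.secretNamespace so that Provider Datalab can find the Secret.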
Security and Access Policy
The spec.security section controls access permissions and runtime privilege level for sessions.
Key fields:
- policy: defines the Pod Security Standard (restricted, baseline, privileged). privileged enables Docker-in-Docker with 20 Gi of local storage.
- kubernetesAccess: whether a Kubernetes service account token is mounted inside the session.
- kubernetesRole: defines the in-namespace RBAC level (admin, edit, view).
Resource Quotas
The spec.quota section allows per-Datalab overrides of default compute and storage budgets.
- memory: memory allocation per session (default 2 Gi).
- storage: persistent volume size (default 1 Gi).
- budget: Educates resource budget profile (small, medium, large, x-large, etc.).
When unspecified, defaults from the EnvironmentConfig apply.
| Budget | CPU | Memory |
|---|---|---|
| small | 1000m | 1Gi |
| medium | 2000m | 2Gi |
| large | 4000m | 4Gi |
| x-large | 8000m | 8Gi |
| xx-large | 8000m | 12Gi |
| xxx-large | 8000m | 16Gi |
Identity and Keycloak Resources
When Keycloak-managed access is used, users listed under spec.users must already exist in Keycloak.
When a Datalab is created for that pattern, the composition automatically provisions the required Keycloak resources:
- Groups for the datalab and datalab administrators
- Group memberships for the listed users
- A dedicated OAuth2 client
- User and admin roles, plus the role bindings for the generated groups
This ensures that authentication and authorization are consistently enforced across the runtime and UI. If authentication is delegated to the ingress or another platform component, the identities allowed through that outer layer are managed by that component and do not necessarily have to be users in the Datalab Keycloak realm.
Example: Joe (no session by default)
# Joe gets a personal datalab s-joe with no pre-created session.
# He must explicitly declare and start a session himself; nothing is running by default.
# No vcluster is provisioned and no workshop files are attached.
# Credentials to storage are expected to exist in a secret "s-joe" in the same namespace.
# A Keycloak group, role, and client are created; user "joe" must exist in Keycloak.
apiVersion: pkg.internal/v1beta2
kind: Datalab
metadata:
  name: s-joe
spec:
  users:
    - joe
  secretName: s-joe
- Joe’s Datalab exists but is idle until he launches a session.
- Useful for lightweight, on-demand environments.
- Keycloak ensures Joe is authorized to access his workspace.
Example: Jeff, Jim, and Jane (shared store validation, privileged with Docker)
# Jeff (owner), Jim (admin) and Jane (user) share a datalab s-jeff with no pre-created session.
# This is the canonical shared store-validation example: the lab stays sessionless by default.
# The lab does not use a vcluster and has no workshop files.
# Credentials to storage are expected to exist in a secret "s-jeff" in the same namespace.
# A Keycloak group, role, and client are created; users "jeff", "jim" and "jane" must exist in Keycloak.
# This configuration runs the lab in privileged mode:
# - Security policy: "privileged" → automatically enables Docker with 20 Gi workspace storage.
# - Docker registry is disabled for this shared example.
# - Session quota: increased to 6 Gi memory, 1 Gi storage, budget class "x-large".
# - Kubernetes API access is disabled (kubernetesAccess=false).
# The data component for the object storage mount and browser UI is disabled.
# Additionally, two PostgreSQL databases are provisioned for the lab: "prod" and "dev".
# Additionally, one MongoDB-backed document store is provisioned:
# - prod with 1 Gi storage
# Additionally, one Redis-backed cache store is provisioned:
# - prod with 1 Gi storage
# Additionally, one Qdrant-backed vector store is provisioned:
# - prod with 1 Gi storage
# Access credentials are generated as secrets in the runtime namespace:
# - MongoDB: <store>-mongodb-auth
# - Redis: <store>-redis-auth
# - Qdrant: <store>-qdrant-auth
apiVersion: pkg.internal/v1beta2
kind: Datalab
metadata:
  name: s-jeff
spec:
  users:
    - jeff
    - jim
    - jane
  userOverrides:
    jim:
      grantedAt: "2025-01-10T19:00:00Z"
      role: admin
  secretName: s-jeff
  sessions: []
  vcluster: false
  data:
    enabled: false
  quota:
    memory: 6Gi
    storage: 1Gi
    budget: x-large
  files: []
  security:
    policy: privileged
    kubernetesAccess: false
  registry:
    enabled: false
    storage: 3Gi
  documentStores:
    prod:
      storage: 1Gi
  cacheStores:
    prod:
      storage: 1Gi
  vectorStores:
    prod:
      storage: 1Gi
  databases:
    pg0:
      names:
        - dev
        - prod
      storage: 1Gi
      backupStorage: 3Gi
- No WorkshopSession is pre-created for this shared example. The runtime namespace and backing services can be validated without a session pod.
- Runs in privileged mode with Docker support and increased ephemeral disk (20 Gi).
- No Kubernetes API access is granted inside the environment. The shared example leaves the registry disabled.
- Access is secured through the corresponding Keycloak group and role.
Example: Jane (isolated vcluster with admin role and higher quota)
# Jane runs a datalab s-jane with a default session automatically created.
# That session will run permanently until stopped by the operator,
# and a dedicated vcluster is provisioned for runtime isolation.
# No workshop files are attached. Credentials to storage are expected
# to exist in a secret "s-jane" in the same namespace.
# A Keycloak group, role, and client are created; user "jane" must exist in Keycloak.
# This configuration explicitly overrides default resource quotas and security settings:
# - Security policy: "privileged" → automatically enables Docker with 20 Gi workspace storage.
# - Docker registry is enabled with 3 Gi storage.
# - Session quota: increased to 4 Gi memory, 40 Gi storage, budget class "x-large".
# - Kubernetes role: elevated to "admin" for full namespace permissions.
# The data component for the object storage mount and browser UI is configured as readonly.
# Additionally, one PostgreSQL database is provisioned for the lab: "analytics".
apiVersion: pkg.internal/v1beta2
kind: Datalab
metadata:
  name: s-jane
spec:
  users:
    - jane
  secretName: s-jane
  sessions:
    - name: default
      state: started
  vcluster: true
  data:
    readOnlyMount: true
  quota:
    memory: 4Gi
    storage: 40Gi
    budget: x-large
  registry:
    enabled: true
    storage: 3Gi
  security:
    policy: privileged
    kubernetesRole: admin
  databases:
    pg0:
      names:
        - analytics
      storage: 1Gi
      backupStorage: 3Gi
- Jane’s workloads run inside an isolated virtual cluster (vcluster: true).
- The lab also runs in privileged mode, which enables Docker with 20 Gi of session-local workspace storage.
- The admin role grants full control within her namespace/vcluster.
- This is the registry-enabled example, so session-backed registry behavior can be validated here.
- Suitable for advanced development or testing requiring full Kubernetes control.
- Keycloak enforces role-based access protection for this lab.
Example: John (with Git-based workshop files)
# John has a datalab s-john with a default session automatically created.
# That session will run permanently until stopped by the operator.
# No vcluster is provisioned. Workshop and data files are pulled from Git,
# enabling the workshop tab in the Educates UI.
# The analysis session is declared but stopped, so it keeps its workspace PVC
# without creating a runtime.
# Credentials to storage are expected in a secret "s-john" in the same namespace.
# A Keycloak group, role, and client are created; user "john" must exist in Keycloak.
apiVersion: pkg.internal/v1beta2
kind: Datalab
metadata:
  name: s-john
spec:
  users:
    - john
  secretName: s-john
  sessions:
    - name: default
    - name: analysis
      state: stopped
  vcluster: false
  files:
    - git:
        url: https://github.com/versioneer-tech/datalab-example
        ref: origin/main
      includePaths:
        - /workshop/**
        - /data/**
        - /README.md
      path: .
- Preloads workshop materials from Git.
- Activates the workshop tab in the UI for guided exercises.
- Keycloak ensures only John has access to this environment and tooling.
Verifying Provisioning
Once a Datalab claim has been applied, you can verify that the provisioning worked.
Check Composite Status
kubectl get datalabs -n workspace
You should see all Datalabs READY=True once reconciliation is complete:
NAME     SYNCED   READY   COMPOSITION        AGE
s-joe    True     True    datalab-educates   2m
s-jeff   True     True    datalab-educates   2m
s-jane   True     True    datalab-educates   2m
s-john   True     True    datalab-educates   2m
Inspect details:
kubectl describe datalab s-jeff -n workspace
Look for conditions like Ready=True and any event messages.
Find the Storage Secret
Each Datalab references a storage Secret named by spec.secretName, or by the Datalab name when spec.secretName is omitted. The Secret lives in EnvironmentConfig.data.storage.secretNamespace, which is usually the same namespace as the Datalab claim.
For example, the claim s-jeff with secretName: s-jeff requires a Secret named s-jeff.
kubectl get secret s-jeff -n workspace -o yaml
Decode credentials (AWS-style):
kubectl get secret s-jeff -n workspace -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d; echo
kubectl get secret s-jeff -n workspace -o jsonpath='{.data.AWS_SECRET_ACCESS_KEY}' | base64 -d; echo
Connect to Databases
Starting with version 0.3.0, databases can be provisioned on a dedicated PostgreSQL host. Optionally, these databases can also be exposed externally using a TLSRoute, enabling secure access from outside the cluster. External exposure requires a Kubernetes Gateway Controller that operates at Layer-4, such as Envoy.
All additional users are created as regular database roles with limited privileges. Full administrative access is provided through the built-in postgres superuser account. This account can create extensions, manage schemas, and grant permissions to other users as needed.
Database credentials are managed by the PostgreSQL operator and stored as Kubernetes Secrets. To locate the credentials for database users, look for Secrets matching *-pguser-*. These Secrets contain the connection details required to authenticate against the corresponding PostgreSQL roles.
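When many Secrets live in the namespace, the pattern can be applied to a list of names, for instance one obtained from kubectl get secrets -o name. This is a small sketch using shell-style matching; the example Secret names in the usage line are made up.

```python
from fnmatch import fnmatch


def pguser_secrets(secret_names):
    """Keep only the Secret names that match the *-pguser-* pattern."""
    return [name for name in secret_names if fnmatch(name, "*-pguser-*")]


matches = pguser_secrets(["pg0-pguser-dev", "s-jeff", "prod-redis-auth"])
```

Only names that contain -pguser- between a prefix and a suffix are kept; store credentials such as the Redis Secret are filtered out.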
Summary
- A Datalab defines users, sessions, optional vcluster, quotas, and security policies.
- A Datalab can also define platform-managed databases, document stores, key-value/cache stores, vector stores, and registry storage.
- Security controls combine Pod Security Standards, Kubernetes roles, and Docker privilege toggles.
- Each Datalab requires a storage credential Secret.
- Durable data services remain visible to operators, which is the basis for backup, restore, monitoring, and lifecycle responsibility.
- Object-storage buckets are created outside Provider Datalab, for example with Provider Storage; Provider Datalab consumes the resulting credentials.
- For Keycloak-managed access, users must already exist in Keycloak; the Datalab provisions groups, memberships, a client, roles, and role bindings.
- For delegated access, auth.type = delegated leaves authentication to the ingress layer or another platform component.
- Sessions may be started for live work or stopped while keeping their workspace PVC.
- Workshop files enable the Educates UI workshop tab.
- Check kubectl get datalabs for readiness and confirm Secret and Keycloak resource creation where applicable.