Welcome to Provider Datalab
The Provider Datalab package turns Kubernetes into a platform for collaborative, cloud-native workspaces, built on Crossplane v2. It gives end users self-service access to reproducible environments for coding, data exploration, and analysis — and it gives operators a unified control plane to provision and secure those environments at scale.
It also works seamlessly with Provider Storage, enabling S3-compatible storage provisioning and integration directly into Datalabs.
Instead of hand-crafting Jupyter or Educates deployments, every workspace is declared through a single Kubernetes Custom Resource: the Datalab claim. This claim captures who should have access, whether a virtual cluster is needed, what sessions should run, and what files or datasets should be preloaded — while Crossplane and the compositions take care of provisioning all the moving parts.
For end users, this means:
- Launch personal or shared analysis environments with one manifest.
- Get preconfigured access to storage, credentials, and workshop material.
- Work inside familiar tools like VS Code Server, JupyterLab, or terminals, bundled with utilities such as awscli and rclone.
For operators, this means:
- A consistent, declarative model for managing heterogeneous runtime stacks.
- Automated provisioning of vclusters, optional identity integration via Keycloak, and storage connections.
- Extensibility to plug in additional runtimes or policies without changing the user-facing API.
At its core, Provider Datalab provides:
- A Datalab Composite Resource Definition (XRD)
- Compositions powered by Crossplane v2 to provision environments with storage, sessions, vclusters, and identity wiring
- Authentication and access-control integration, either built into the runtime or delegated to the platform ingress layer
With Provider Datalab, workspaces become declarative, multi-tenant, and self-service, while operators retain full control over identity, security, and resource governance.
Features
- Workspace abstraction
  Define and provision full-featured data labs based on Educates or Jupyter as a single resource.
- Multi-tenant support
  Each Datalab can run isolated inside a Kubernetes namespace or in a dedicated virtual cluster (vcluster).
- Integrated or delegated identity
  Use Keycloak-managed workspace access where appropriate, or keep runtime auth disabled and delegate authentication to the platform ingress layer.
- Declarative storage
  Provision and attach buckets with access policies.
- Extensible by design
  Built on Crossplane, ready to extend with new resources.
Installation
To install a configuration package into your Crossplane environment, for example the Educates-based one, apply:
apiVersion: pkg.crossplane.io/v1
kind: Configuration
metadata:
  name: datalab-educates
spec:
  package: ghcr.io/versioneer-tech/provider-datalab/educates:0.5.0-rc.3
  skipDependencyResolution: true
Quickstart
Minimal Example
apiVersion: pkg.internal/v1beta1
kind: Datalab
metadata:
  name: team-wonderland
spec:
  users:
  - alice
  sessions:
  - default
  vcluster: true
This provisions a vcluster within a dedicated Kubernetes namespace and starts the Educates tooling stack (including VS Code Server and a terminal), together with bundled utilities. A storage browser is available with storage automatically mounted, and additional tools such as awscli and rclone are preinstalled to support typical data lab tasks like coding, data exploration, and wrangling.
Access to the datalab is intended for Alice, since she is currently the only user associated with this lab. Depending on the platform configuration, access can be enforced by Keycloak-managed resources or by delegated ingress authentication.
Combined with a small, cluster-specific EnvironmentConfig (realm, ingress domain/class, storage secret), the platform handles the rest—provisioning the chosen runtime, mounting credentials, and preloading content.
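To make the shape of such a cluster-specific configuration more concrete, here is a minimal sketch of a Crossplane EnvironmentConfig carrying the three kinds of settings mentioned above (realm, ingress domain/class, storage secret). The resource name and every key under data are hypothetical placeholders chosen for illustration; the keys that the provider-datalab compositions actually read may differ, so check the repository before relying on any of them.

```yaml
# Illustrative sketch only: the name and all keys under `data` are assumptions,
# not the schema the provider-datalab compositions are known to consume.
apiVersion: apiextensions.crossplane.io/v1beta1
kind: EnvironmentConfig
metadata:
  name: datalab                    # hypothetical name
data:
  realm: example-realm             # Keycloak realm (hypothetical key and value)
  ingress:
    domain: datalab.example.org    # ingress domain (hypothetical)
    class: nginx                   # ingress class (hypothetical)
  storage:
    secretName: datalab-storage    # secret holding storage credentials (hypothetical)
```

Keeping these values in one cluster-scoped object means a Datalab manifest stays portable: the same claim can be applied on another cluster that supplies its own environment values.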
Note
All configuration packages built from provider-datalab (educates, jupyter, ...) share the same Composite Resource Definition.
More Examples
Check the examples folder in the GitHub repository for complete scenarios, including:
- Datalabs with multiple users
- Datalabs with integrated storage
- Identity-aware environments
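As a rough sketch of the multi-user scenario, the minimal example can be extended with a second user. It uses only the fields shown in the quickstart (users, sessions, vcluster). The user name bob and the resource name are hypothetical, and the sketch assumes that setting vcluster to false selects plain namespace isolation instead of a virtual cluster, which the feature list suggests but does not spell out. The manifests in the examples folder remain the authoritative reference.

```yaml
# Sketch of a shared Datalab; names are illustrative, not taken from the repository.
apiVersion: pkg.internal/v1beta1
kind: Datalab
metadata:
  name: team-wonderland-shared   # hypothetical name
spec:
  users:
  - alice
  - bob                          # hypothetical second user
  sessions:
  - default
  vcluster: false                # assumption: run in a namespace, without a vcluster
```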