Deploy Nebari on Existing Kubernetes Clusters
Nebari can be deployed on existing Kubernetes clusters across major cloud providers (AWS EKS, Azure AKS, Google Cloud GKE) or custom Kubernetes installations. This guide walks you through the process step by step.
For bare metal deployments using K3s, see Deploy Nebari on Bare Metal with K3s.
Prerequisites
Before starting, ensure you have:
- An existing Kubernetes cluster (EKS, AKS, GKE, or custom)
- kubectl configured with access to your cluster (a quick check follows this list)
- Nebari CLI installed (installation guide)
- Appropriate node groups/pools with sufficient resources
- DNS domain for your Nebari deployment
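To confirm the kubectl requirement before you begin, a quick sanity check (this assumes your kubeconfig already points at the target cluster):
# Show the cluster kubectl is currently talking to
kubectl cluster-info
# Confirm you can reach the API server and see the nodes
kubectl get nodes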
Overview
The deployment process follows these general steps:
- Evaluate your existing infrastructure
- Create/verify appropriate node groups
- Configure kubectl context
- Initialize Nebari configuration
- Configure Nebari for your cluster
- Deploy Nebari
Let's walk through this process for each cloud provider.
Evaluating the Infrastructure
Before deploying Nebari, review your existing cluster to ensure it meets the requirements.
- AWS
- Azure
- GCP
AWS EKS Requirements
For this example, we assume you have an existing EKS cluster. If you need to create one, follow AWS's EKS setup guide.
Your existing EKS cluster should have:
- VPC with at least three subnets in different Availability Zones
- Subnets configured to automatically assign public IP addresses
- IAM Role with the following policies:
  - AmazonEKSWorkerNodePolicy
  - AmazonEC2ContainerRegistryReadOnly
  - AmazonEKS_CNI_Policy
Additionally, for cluster autoscaling support, ensure the IAM role has the custom policy below:
Custom CNI and Autoscaling Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "eksWorkerAutoscalingAll",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeLaunchTemplateVersions",
        "autoscaling:DescribeTags",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeAutoScalingGroups"
      ],
      "Resource": "*"
    },
    {
      "Sid": "eksWorkerAutoscalingOwn",
      "Effect": "Allow",
      "Action": [
        "autoscaling:UpdateAutoScalingGroup",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "autoscaling:SetDesiredCapacity"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "autoscaling:ResourceTag/k8s.io/cluster-autoscaler/enabled": [
            "true"
          ],
          "autoscaling:ResourceTag/kubernetes.io/cluster/<your-cluster-name>": [
            "owned"
          ]
        }
      }
    }
  ]
}
Minimum Node Requirements:
- General nodes: 8 vCPU / 32 GB RAM (e.g., t3.2xlarge)
- User/Worker nodes: 4 vCPU / 16 GB RAM (e.g., t3.xlarge)
- Storage: 200 GB EBS volume per node
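To check your current EKS configuration (a sketch using standard AWS CLI commands; replace the placeholder names and region with your own):
# List your EKS clusters
aws eks list-clusters --output table
# Inspect the cluster's status, version, and VPC configuration
aws eks describe-cluster --name <cluster-name> --region <region>
# List existing node groups
aws eks list-nodegroups --cluster-name <cluster-name> --region <region>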
Azure AKS Requirements
For Azure AKS deployments, you need an existing AKS cluster. If you don't have one, follow Azure's AKS setup guide.
Your existing AKS cluster should have:
- Resource Group containing the AKS cluster
- Virtual Network with appropriate subnet sizing
- Service Principal or Managed Identity with required permissions:
  - Azure Kubernetes Service RBAC Cluster Admin
  - Contributor role on the resource group
Minimum Node Requirements:
- General nodes: 8 vCPU / 32 GB RAM (e.g., Standard_D8s_v3)
- User/Worker nodes: 4 vCPU / 16 GB RAM (e.g., Standard_D4s_v3)
- Storage: 200 GB managed disk per node
Network Requirements:
- Ensure your AKS cluster has a public IP or Load Balancer for ingress
- Configure Network Security Groups (NSGs) to allow HTTPS traffic (port 443)
To check your current AKS configuration:
# List your AKS clusters
az aks list --output table
# Get cluster credentials
az aks get-credentials --resource-group <resource-group> --name <cluster-name>
# Verify node pools
az aks nodepool list --resource-group <resource-group> --cluster-name <cluster-name>
Google Cloud GKE Requirements
For GKE deployments, you need an existing GKE cluster. If you need to create one, follow Google's GKE setup guide.
Your existing GKE cluster should have:
- VPC Network with appropriate subnet configuration
- Service Account with required permissions:
  - Kubernetes Engine Admin
  - Service Account User
  - Compute Admin (for node management)
Minimum Node Requirements:
- General nodes: 8 vCPU / 32 GB RAM (e.g., n2-standard-8)
- User/Worker nodes: 4 vCPU / 16 GB RAM (e.g., n2-standard-4)
- Storage: 200 GB persistent disk per node
Network Requirements:
- Cluster should have HTTP(S) Load Balancing enabled
- Firewall rules allowing ingress on port 443
To check your current GKE configuration:
# List your GKE clusters
gcloud container clusters list
# Get cluster credentials
gcloud container clusters get-credentials <cluster-name> --zone <zone>
# Verify node pools
gcloud container node-pools list --cluster <cluster-name> --zone <zone>
Creating Node Groups
Nebari requires three types of node groups for optimal operation:
- general: Core Nebari services (JupyterHub, monitoring, databases)
- user: User JupyterLab notebook servers
- worker: Dask distributed computing workers
Skip this step if appropriate node groups already exist.
- AWS
- Azure
- GCP
Creating EKS Node Groups
Follow AWS's guide to create managed node groups.
General Node Group:
- Name: general
- Node IAM Role: The IAM role with policies described above
- Instance type: t3.2xlarge or similar (8 vCPU / 32 GB RAM)
- Disk size: 200 GB
- Scaling: Min 1, Max 3, Desired 1
- Subnets: Include all EKS subnets
User Node Group:
- Name: user
- Instance type: t3.xlarge or similar (4 vCPU / 16 GB RAM)
- Disk size: 200 GB
- Scaling: Min 0, Max 10, Desired 1
- Enable autoscaling: Yes
Worker Node Group:
- Name: worker
- Instance type: t3.xlarge or similar (4 vCPU / 16 GB RAM)
- Disk size: 200 GB
- Scaling: Min 0, Max 20, Desired 1
- Enable autoscaling: Yes
Using AWS CLI:
# Create general node group
aws eks create-nodegroup \
--cluster-name <cluster-name> \
--nodegroup-name general \
--node-role <iam-role-arn> \
--subnets <subnet-id-1> <subnet-id-2> <subnet-id-3> \
--scaling-config minSize=1,maxSize=3,desiredSize=1 \
--instance-types t3.2xlarge \
--disk-size 200 \
--labels nodegroup=general
# Create user node group
aws eks create-nodegroup \
--cluster-name <cluster-name> \
--nodegroup-name user \
--node-role <iam-role-arn> \
--subnets <subnet-id-1> <subnet-id-2> <subnet-id-3> \
--scaling-config minSize=0,maxSize=10,desiredSize=1 \
--instance-types t3.xlarge \
--disk-size 200 \
--labels nodegroup=user
# Create worker node group
aws eks create-nodegroup \
--cluster-name <cluster-name> \
--nodegroup-name worker \
--node-role <iam-role-arn> \
--subnets <subnet-id-1> <subnet-id-2> <subnet-id-3> \
--scaling-config minSize=0,maxSize=20,desiredSize=1 \
--instance-types t3.xlarge \
--disk-size 200 \
--labels nodegroup=worker
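After issuing the create commands, you can wait for each node group to become active and confirm its state (standard AWS CLI; the names match the examples above):
# Block until the node group finishes provisioning
aws eks wait nodegroup-active --cluster-name <cluster-name> --nodegroup-name general
# Confirm the node group status and scaling configuration
aws eks describe-nodegroup --cluster-name <cluster-name> --nodegroup-name general --query 'nodegroup.[status,scalingConfig]'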
Creating AKS Node Pools
Follow Azure's guide to add node pools.
General Node Pool:
- Name: general
- VM Size: Standard_D8s_v3 (8 vCPU / 32 GB RAM)
- Disk size: 200 GB
- Count: Min 1, Max 3
- Labels: nodepool=general
User Node Pool:
- Name: user
- VM Size: Standard_D4s_v3 (4 vCPU / 16 GB RAM)
- Disk size: 200 GB
- Count: Min 0, Max 10
- Enable autoscaling: Yes
- Labels: nodepool=user
Worker Node Pool:
- Name: worker
- VM Size: Standard_D4s_v3 (4 vCPU / 16 GB RAM)
- Disk size: 200 GB
- Count: Min 0, Max 20
- Enable autoscaling: Yes
- Labels: nodepool=worker
Using Azure CLI:
RESOURCE_GROUP="<resource-group>"
CLUSTER_NAME="<cluster-name>"
# Create general node pool
az aks nodepool add \
--resource-group $RESOURCE_GROUP \
--cluster-name $CLUSTER_NAME \
--name general \
--node-count 1 \
--min-count 1 \
--max-count 3 \
--enable-cluster-autoscaler \
--node-vm-size Standard_D8s_v3 \
--node-osdisk-size 200 \
--labels nodepool=general
# Create user node pool
az aks nodepool add \
--resource-group $RESOURCE_GROUP \
--cluster-name $CLUSTER_NAME \
--name user \
--node-count 1 \
--min-count 0 \
--max-count 10 \
--enable-cluster-autoscaler \
--node-vm-size Standard_D4s_v3 \
--node-osdisk-size 200 \
--labels nodepool=user
# Create worker node pool
az aks nodepool add \
--resource-group $RESOURCE_GROUP \
--cluster-name $CLUSTER_NAME \
--name worker \
--node-count 1 \
--min-count 0 \
--max-count 20 \
--enable-cluster-autoscaler \
--node-vm-size Standard_D4s_v3 \
--node-osdisk-size 200 \
--labels nodepool=worker
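Once the pools are created, confirm they provisioned successfully (standard Azure CLI, reusing the variables defined above):
# Check the provisioning state of a pool
az aks nodepool show --resource-group $RESOURCE_GROUP --cluster-name $CLUSTER_NAME --name general --query provisioningState
# List all pools with their VM sizes and counts
az aks nodepool list --resource-group $RESOURCE_GROUP --cluster-name $CLUSTER_NAME --output table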
Creating GKE Node Pools
Follow Google's guide to add node pools.
General Node Pool:
- Name: general
- Machine type: n2-standard-8 (8 vCPU / 32 GB RAM)
- Disk size: 200 GB
- Count: Min 1, Max 3
- Labels: nodepool=general
User Node Pool:
- Name: user
- Machine type: n2-standard-4 (4 vCPU / 16 GB RAM)
- Disk size: 200 GB
- Count: Min 0, Max 10
- Enable autoscaling: Yes
- Labels: nodepool=user
Worker Node Pool:
- Name: worker
- Machine type: n2-standard-4 (4 vCPU / 16 GB RAM)
- Disk size: 200 GB
- Count: Min 0, Max 20
- Enable autoscaling: Yes
- Labels: nodepool=worker
Using gcloud CLI:
CLUSTER_NAME="<cluster-name>"
ZONE="<zone>" # e.g., us-central1-a
# Create general node pool
gcloud container node-pools create general \
--cluster=$CLUSTER_NAME \
--zone=$ZONE \
--machine-type=n2-standard-8 \
--disk-size=200 \
--num-nodes=1 \
--min-nodes=1 \
--max-nodes=3 \
--enable-autoscaling \
--node-labels=nodepool=general
# Create user node pool
gcloud container node-pools create user \
--cluster=$CLUSTER_NAME \
--zone=$ZONE \
--machine-type=n2-standard-4 \
--disk-size=200 \
--num-nodes=1 \
--min-nodes=0 \
--max-nodes=10 \
--enable-autoscaling \
--node-labels=nodepool=user
# Create worker node pool
gcloud container node-pools create worker \
--cluster=$CLUSTER_NAME \
--zone=$ZONE \
--machine-type=n2-standard-4 \
--disk-size=200 \
--num-nodes=1 \
--min-nodes=0 \
--max-nodes=20 \
--enable-autoscaling \
--node-labels=nodepool=worker
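As with the other providers, confirm the pools finished provisioning before moving on (standard gcloud commands, reusing the variables defined above):
# Check the status of a node pool
gcloud container node-pools describe general --cluster=$CLUSTER_NAME --zone=$ZONE --format="value(status)"
# List all node pools with machine types and disk sizes
gcloud container node-pools list --cluster=$CLUSTER_NAME --zone=$ZONE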
Configuring kubectl Context
Ensure kubectl is pointed at the cluster you intend to deploy to. Verify the current context with:
kubectl config current-context
If you need to switch contexts:
kubectl config use-context <context-name>
To list all available contexts:
kubectl config get-contexts
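If your cluster doesn't appear in the context list yet, each provider's CLI can add it to your kubeconfig (these are the standard credential commands; substitute your own names, regions, and zones):
# AWS EKS
aws eks update-kubeconfig --name <cluster-name> --region <region>
# Azure AKS
az aks get-credentials --resource-group <resource-group> --name <cluster-name>
# Google Cloud GKE
gcloud container clusters get-credentials <cluster-name> --zone <zone>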
Deploying Nebari
Now you're ready to initialize and deploy Nebari on your existing cluster.
- AWS
- Azure
- GCP
Initialize Nebari Configuration
Initialize Nebari using the existing provider:
nebari init existing \
--project <project_name> \
--domain <domain_name> \
--auth-provider github
This creates a nebari-config.yaml file in your current directory.
Configure nebari-config.yaml
Update the configuration file with your EKS-specific settings. The key sections to modify are:
project_name: <project_name>
provider: existing
domain: <domain_name>
certificate:
  type: lets-encrypt
  acme_email: admin@example.com
security:
  authentication:
    type: GitHub
    config:
      client_id: <github-oauth-client-id>
      client_secret: <github-oauth-client-secret>
      oauth_callback_url: https://<domain_name>/hub/oauth_callback
local:
  # Set this to your EKS cluster context name
  kube_context: arn:aws:eks:<region>:<account-id>:cluster/<cluster-name>
  # Configure node selectors based on your node group labels
  node_selectors:
    general:
      key: eks.amazonaws.com/nodegroup
      value: general
    user:
      key: eks.amazonaws.com/nodegroup
      value: user
    worker:
      key: eks.amazonaws.com/nodegroup
      value: worker
profiles:
  jupyterlab:
    - display_name: Small Instance
      description: 2 CPU / 8 GB RAM
      default: true
      kubespawner_override:
        cpu_limit: 2
        cpu_guarantee: 1.5
        mem_limit: 8G
        mem_guarantee: 5G
    - display_name: Medium Instance
      description: 4 CPU / 16 GB RAM
      kubespawner_override:
        cpu_limit: 4
        cpu_guarantee: 3
        mem_limit: 16G
        mem_guarantee: 10G
  dask_worker:
    Small Worker:
      worker_cores_limit: 2
      worker_cores: 1.5
      worker_memory_limit: 8G
      worker_memory: 5G
      worker_threads: 2
    Medium Worker:
      worker_cores_limit: 4
      worker_cores: 3
      worker_memory_limit: 16G
      worker_memory: 10G
      worker_threads: 4
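Before moving on, it can help to confirm that the label key and values above match what your nodes actually carry (an optional sanity check with kubectl):
# Show each node with its node group label as a column
kubectl get nodes -L eks.amazonaws.com/nodegroup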
Complete example nebari-config.yaml for EKS
project_name: my-nebari
provider: existing
domain: nebari.example.com
certificate:
  type: lets-encrypt
  acme_email: admin@example.com
security:
  authentication:
    type: GitHub
    config:
      client_id: your-github-client-id
      client_secret: your-github-client-secret
      oauth_callback_url: https://nebari.example.com/hub/oauth_callback
ci_cd:
  type: github-actions
  branch: main
terraform_state:
  type: remote
namespace: dev
local:
  kube_context: arn:aws:eks:us-west-2:123456789012:cluster/my-eks-cluster
  node_selectors:
    general:
      key: eks.amazonaws.com/nodegroup
      value: general
    user:
      key: eks.amazonaws.com/nodegroup
      value: user
    worker:
      key: eks.amazonaws.com/nodegroup
      value: worker
profiles:
  jupyterlab:
    - display_name: Small Instance
      description: 2 CPU / 8 GB RAM
      default: true
      kubespawner_override:
        cpu_limit: 2
        cpu_guarantee: 1.5
        mem_limit: 8G
        mem_guarantee: 5G
        image: quansight/nebari-jupyterlab:latest
    - display_name: Medium Instance
      description: 4 CPU / 16 GB RAM
      kubespawner_override:
        cpu_limit: 4
        cpu_guarantee: 3
        mem_limit: 16G
        mem_guarantee: 10G
        image: quansight/nebari-jupyterlab:latest
  dask_worker:
    Small Worker:
      worker_cores_limit: 2
      worker_cores: 1.5
      worker_memory_limit: 8G
      worker_memory: 5G
      worker_threads: 2
      image: quansight/nebari-dask-worker:latest
    Medium Worker:
      worker_cores_limit: 4
      worker_cores: 3
      worker_memory_limit: 16G
      worker_memory: 10G
      worker_threads: 4
      image: quansight/nebari-dask-worker:latest
environments:
  environment-default.yaml:
    name: default
    channels:
      - conda-forge
    dependencies:
      - python=3.11
      - ipykernel
      - ipywidgets
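Before deploying, you can run Nebari's validate subcommand to catch schema and configuration errors early:
# Validate the configuration file without deploying
nebari validate --config nebari-config.yaml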
Deploy Nebari
Deploy Nebari to your EKS cluster:
nebari deploy --config nebari-config.yaml
When prompted, update your DNS records to point your domain to the cluster's load balancer. Nebari will provide the necessary DNS configuration details during deployment.
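If you need to look up the endpoint yourself, the ingress LoadBalancer service exposes it in the EXTERNAL-IP column (the exact service name and namespace depend on your configuration):
# Find LoadBalancer services across all namespaces
kubectl get svc -A | grep LoadBalancer
On EKS this is typically an ELB hostname, so a CNAME record pointing your domain at it is usually the right DNS entry.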
Initialize Nebari Configuration
Initialize Nebari using the existing provider:
nebari init existing \
--project <project_name> \
--domain <domain_name> \
--auth-provider github
Configure nebari-config.yaml
Update the configuration file with your AKS-specific settings:
project_name: <project_name>
provider: existing
domain: <domain_name>
certificate:
  type: lets-encrypt
  acme_email: admin@example.com
security:
  authentication:
    type: GitHub
    config:
      client_id: <github-oauth-client-id>
      client_secret: <github-oauth-client-secret>
      oauth_callback_url: https://<domain_name>/hub/oauth_callback
local:
  # Set this to your AKS cluster context name
  kube_context: <cluster-name> # e.g., "my-aks-cluster"
  # Configure node selectors based on your node pool labels
  node_selectors:
    general:
      key: agentpool
      value: general
    user:
      key: agentpool
      value: user
    worker:
      key: agentpool
      value: worker
profiles:
  jupyterlab:
    - display_name: Small Instance
      description: 2 CPU / 8 GB RAM
      default: true
      kubespawner_override:
        cpu_limit: 2
        cpu_guarantee: 1.5
        mem_limit: 8G
        mem_guarantee: 5G
    - display_name: Medium Instance
      description: 4 CPU / 16 GB RAM
      kubespawner_override:
        cpu_limit: 4
        cpu_guarantee: 3
        mem_limit: 16G
        mem_guarantee: 10G
  dask_worker:
    Small Worker:
      worker_cores_limit: 2
      worker_cores: 1.5
      worker_memory_limit: 8G
      worker_memory: 5G
      worker_threads: 2
    Medium Worker:
      worker_cores_limit: 4
      worker_cores: 3
      worker_memory_limit: 16G
      worker_memory: 10G
      worker_threads: 4
AKS uses agentpool as the label key for node pools by default. If you used custom labels when creating your node pools with --labels, adjust the node_selectors accordingly.
Deploy Nebari
nebari deploy --config nebari-config.yaml
Update your DNS records when prompted. You'll need to point your domain to the Azure Load Balancer IP address created by Nebari.
Initialize Nebari Configuration
Initialize Nebari using the existing provider:
nebari init existing \
--project <project_name> \
--domain <domain_name> \
--auth-provider github
Configure nebari-config.yaml
Update the configuration file with your GKE-specific settings:
project_name: <project_name>
provider: existing
domain: <domain_name>
certificate:
  type: lets-encrypt
  acme_email: admin@example.com
security:
  authentication:
    type: GitHub
    config:
      client_id: <github-oauth-client-id>
      client_secret: <github-oauth-client-secret>
      oauth_callback_url: https://<domain_name>/hub/oauth_callback
local:
  # Set this to your GKE cluster context name
  kube_context: gke_<project-id>_<zone>_<cluster-name>
  # Configure node selectors based on your node pool labels
  node_selectors:
    general:
      key: cloud.google.com/gke-nodepool
      value: general
    user:
      key: cloud.google.com/gke-nodepool
      value: user
    worker:
      key: cloud.google.com/gke-nodepool
      value: worker
profiles:
  jupyterlab:
    - display_name: Small Instance
      description: 2 CPU / 8 GB RAM
      default: true
      kubespawner_override:
        cpu_limit: 2
        cpu_guarantee: 1.5
        mem_limit: 8G
        mem_guarantee: 5G
    - display_name: Medium Instance
      description: 4 CPU / 16 GB RAM
      kubespawner_override:
        cpu_limit: 4
        cpu_guarantee: 3
        mem_limit: 16G
        mem_guarantee: 10G
  dask_worker:
    Small Worker:
      worker_cores_limit: 2
      worker_cores: 1.5
      worker_memory_limit: 8G
      worker_memory: 5G
      worker_threads: 2
    Medium Worker:
      worker_cores_limit: 4
      worker_cores: 3
      worker_memory_limit: 16G
      worker_memory: 10G
      worker_threads: 4
GKE automatically applies the cloud.google.com/gke-nodepool label to nodes based on their node pool name. If you used custom labels with --node-labels, adjust the node_selectors accordingly.
Deploy Nebari
nebari deploy --config nebari-config.yaml
Update your DNS records when prompted. You'll need to point your domain to the GCP Load Balancer IP address created by Nebari.
Important Configuration Notes
Understanding kube_context
The kube_context field in your nebari-config.yaml is critical: it tells Nebari which Kubernetes cluster to deploy to, and it must exactly match a context name from your kubeconfig.
To find your context name:
kubectl config get-contexts
The output shows all available contexts. Use the value from the NAME column:
CURRENT NAME CLUSTER AUTHINFO
* arn:aws:eks:us-west-2:123456789:cluster/my-cluster arn:aws:eks:... arn:aws:eks:...
gke_my-project_us-central1_my-cluster gke_my-project_... gke_my-project_...
my-aks-cluster my-aks-cluster clusterUser_...
Node Selectors
Node selectors ensure Nebari components are scheduled on the appropriate nodes:
- general: Core services (JupyterHub, Prometheus, etc.) - require stable, always-on nodes
- user: User notebook servers - benefit from autoscaling
- worker: Dask workers - benefit from aggressive autoscaling for compute workloads
The node selector keys vary by provider:
- AWS EKS: eks.amazonaws.com/nodegroup
- Azure AKS: agentpool (default) or custom labels
- GCP GKE: cloud.google.com/gke-nodepool (default) or custom labels
You can verify node labels with:
kubectl get nodes --show-labels
Verifying the Deployment
After deployment completes:
- Check pods are running: kubectl get pods -A
- Verify ingress is configured: kubectl get ingress -A
- Check services: kubectl get svc -A
- Access Nebari: Navigate to https://<your-domain> in your browser
Troubleshooting
Pods Stuck in Pending
If pods remain in Pending state:
kubectl describe pod <pod-name> -n <namespace>
Common causes:
- Node selector mismatch: Labels in nebari-config.yaml don't match actual node labels
- Insufficient resources: Nodes don't have enough CPU/memory
- No nodes available: Node group/pool hasn't scaled up yet
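To narrow down which cause applies, recent events and node capacity are usually the quickest signals (generic kubectl commands; adjust the namespace to your deployment):
# Recent scheduling events, newest last
kubectl get events -n <namespace> --sort-by=.lastTimestamp
# Allocatable capacity and current requests per node
kubectl describe nodes | grep -A 8 "Allocated resources"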
Authentication Issues
If you can't log in to Nebari:
- Verify OAuth application credentials in your nebari-config.yaml
- Check the callback URL matches exactly: https://<domain>/hub/oauth_callback
- Review JupyterHub logs: kubectl logs -n <namespace> deployment/hub -f
LoadBalancer Service Pending
If the LoadBalancer service stays in Pending:
AWS EKS:
- Verify subnets are tagged correctly for load balancer provisioning
- Check AWS Load Balancer Controller is installed
Azure AKS:
- Ensure the AKS cluster has permissions to create load balancers
- Check resource group has available quota
GCP GKE:
- Verify HTTP(S) Load Balancing is enabled on the cluster
- Check firewall rules allow traffic on port 443
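In all three cases, the events attached to the pending service usually state why the cloud load balancer could not be created (substitute the name and namespace of the pending service):
# Read the events attached to the pending LoadBalancer service
kubectl describe svc <service-name> -n <namespace>
# AWS only: confirm the subnets carry the load balancer role tag (e.g., kubernetes.io/role/elb)
aws ec2 describe-subnets --subnet-ids <subnet-id> --query 'Subnets[].Tags'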
Next Steps
- Configure custom environments
- Set up monitoring
- Configure backup strategies
- Explore Dask for distributed computing