Serverless workloads on Kubernetes: Knative Serving, Eventing, and OpenShift Serverless

serverless · knative · serving · eventing · scale-to-zero · openshift · faas

Kubernetes Serverless

Serverless workloads on Kubernetes with Knative and OpenShift Serverless.

What is Knative?

Knative extends Kubernetes to run serverless workloads:

  • Serving: Request-driven auto-scaling (including scale-to-zero)
  • Eventing: Event-driven architecture with sources and brokers
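
On OpenShift, both components ship as the OpenShift Serverless operator: install it from OperatorHub, then enable Serving and Eventing by creating their custom resources. A minimal sketch, assuming the operator is already installed (the names and namespaces below follow the operator defaults, and the API version can differ across operator releases):

# Verify the installed API version first, e.g.:
#   oc api-resources | grep operator.knative.dev
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
---
apiVersion: operator.knative.dev/v1beta1
kind: KnativeEventing
metadata:
  name: knative-eventing
  namespace: knative-eventing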

Knative Serving

Basic Knative Service

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: ml-inference
  namespace: ml-serving
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "0"
        autoscaling.knative.dev/maxScale: "10"
        autoscaling.knative.dev/target: "100"
    spec:
      containers:
        - image: my-registry/ml-model:v1
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
          env:
            - name: MODEL_PATH
              value: "/models/v1"
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
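
A quick way to deploy and smoke-test the service, assuming the manifest above is saved as ml-inference.yaml (the exact URL shape depends on your cluster's domain configuration):

# Deploy the Knative Service
kubectl apply -f ml-inference.yaml

# ksvc is the short name for Knative Services; this prints the
# auto-generated URL and readiness status
kubectl get ksvc ml-inference -n ml-serving

# Hit the readiness endpoint through the generated URL
curl http://ml-inference.ml-serving.example.com/health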

Auto-Scaling Configuration

metadata:
  annotations:
    # Scale to zero when no traffic
    autoscaling.knative.dev/minScale: "0"
    # Maximum replicas
    autoscaling.knative.dev/maxScale: "20"
    # Concurrent requests per pod
    autoscaling.knative.dev/target: "50"
    # Metric type: concurrency or rps
    autoscaling.knative.dev/metric: "concurrency"
    # Delay before removing pods after traffic drops
    autoscaling.knative.dev/scale-down-delay: "30s"
    # Keep the last pod this long after traffic reaches zero
    # (the scale-to-zero-grace-period setting itself is cluster-wide,
    # configured in the config-autoscaler ConfigMap, not per revision)
    autoscaling.knative.dev/scale-to-zero-pod-retention-period: "60s"
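
With the concurrency metric, the Knative Pod Autoscaler (KPA) sizes the revision at roughly ceil(observed concurrency / target): at target "50", about 300 in-flight requests scale the service to 6 pods, clamped between minScale and maxScale. Once traffic drains, pods linger for the scale-down delay before scale-to-zero kicks in.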

Traffic Splitting (Canary / Blue-Green)

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: ml-inference
spec:
  template:
    metadata:
      name: ml-inference-v2
    spec:
      containers:
        - image: my-registry/ml-model:v2
  traffic:
    - revisionName: ml-inference-v1
      percent: 80
    - revisionName: ml-inference-v2
      percent: 20

# Canary → full rollout
kn service update ml-inference --traffic ml-inference-v2=100

# Rollback
kn service update ml-inference --traffic ml-inference-v1=100
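
Tags give a revision its own stable preview URL, which is useful for validating a canary before shifting any main traffic. A short sketch (the tag name candidate is just an example):

# Expose v2 at a tagged URL without changing the 80/20 split
kn service update ml-inference --tag ml-inference-v2=candidate

# Inspect per-tag URLs, e.g. candidate-ml-inference.<domain>
kn route describe ml-inference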

kn CLI

# Create service
kn service create my-app --image my-registry/app:v1 --port 8080

# List services
kn service list

# Update service
kn service update my-app --image my-registry/app:v2

# Describe service
kn service describe my-app

# Delete service
kn service delete my-app

# List revisions
kn revision list

# Get service URL
kn service describe my-app -o url

Knative Eventing

Event Source → Service

apiVersion: sources.knative.dev/v1
kind: ApiServerSource
metadata:
  name: pod-events
spec:
  serviceAccountName: event-watcher
  mode: Resource
  resources:
    - apiVersion: v1
      kind: Pod
  sink:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: event-processor
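
The event-watcher service account needs RBAC permissions to list and watch the resources the source observes; a minimal sketch matching the source above:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: event-watcher
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: event-watcher
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: event-watcher
subjects:
  - kind: ServiceAccount
    name: event-watcher
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: event-watcher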

Broker & Trigger Pattern

apiVersion: eventing.knative.dev/v1
kind: Broker
metadata:
  name: default
  namespace: ml-events
---
apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: ml-training-trigger
spec:
  broker: default
  filter:
    attributes:
      type: ml.data.updated
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: training-pipeline
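
To exercise the trigger, POST a CloudEvent to the broker's ingress from a pod inside the cluster; the URL below follows the default multi-tenant broker layout (broker-ingress.knative-eventing/<namespace>/<broker>):

curl -v "http://broker-ingress.knative-eventing.svc.cluster.local/ml-events/default" \
  -X POST \
  -H "Ce-Id: test-event-001" \
  -H "Ce-Specversion: 1.0" \
  -H "Ce-Type: ml.data.updated" \
  -H "Ce-Source: data-pipeline" \
  -H "Content-Type: application/json" \
  -d '{"dataset": "training-v2"}'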

Serverless vs Deployment

| Feature       | Deployment                    | Knative Serverless           |
| ------------- | ----------------------------- | ---------------------------- |
| Minimum pods  | 1+ always running             | 0 (scale-to-zero)            |
| Scaling       | HPA (manual config)           | Automatic (request-based)    |
| Cost          | Pays for idle replicas        | Nothing at zero traffic      |
| Cold start    | None                          | Possible (scale from 0)      |
| Traffic split | Manual (multiple Deployments) | Built-in                     |
| Use case      | Steady traffic                | Bursty/intermittent traffic  |

When to Use Serverless

  • ML inference APIs with variable/bursty traffic
  • Data preprocessing triggered by events
  • Webhook handlers and event processors
  • Dev/staging environments (cost savings with scale-to-zero)
  • Batch scoring triggered on demand