This article was written using version 0.8.0 of the Trino community helm charts and version 1.21.9-gke.1002 of Kubernetes.

In this article, we are going to walk through deploying Trino with the community helm charts on the Google Cloud Platform.

This article assumes you already have a Kubernetes cluster created in Google Cloud and have a connection to it configured for kubectl.
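
If you have not yet pointed kubectl at your cluster, something like the following will configure the connection (the cluster name, region, and project here are placeholders for your own values):

gcloud container clusters get-credentials my-trino-cluster --region us-east1 --project my-project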

We’ll start with a simple, bare-bones deployment and progress from there.

First, we need to add the helm repository for the Trino community charts.

helm repo add trino https://trinodb.github.io/charts/
helm repo update

Now that we have the chart repository added, we can get started.

Simple cluster with defaults

Let’s start with the simplest possible deployment. We will use mostly the default values for the Trino chart.

The main things we will change are the environment name and how the cluster is exposed. For now, we will expose the cluster with a LoadBalancer to keep testing simple for this article.

Our custom values will be stored in a trino.yaml file. For our first example, our trino.yaml file will have the following contents:

server:
  node:
    environment: test

service:
  type: LoadBalancer

Now, let’s deploy this cluster with helm:

helm install trino trino/trino --version 0.8.0 --values trino.yaml

This will create a Trino cluster with 1 coordinator and 2 workers. We can verify this by looking at the pods that have been deployed:

$ kubectl get pods
NAME                                 READY   STATUS    RESTARTS   AGE
trino-coordinator-5dbcff8f8f-2g7b9   1/1     Running   0          51s
trino-worker-7cdd97b6c-kd26h         1/1     Running   0          51s
trino-worker-7cdd97b6c-vmhrf         1/1     Running   0          51s
$

Next, we’ll verify a LoadBalancer has been created with an external IP:

$ kubectl get svc trino
NAME    TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
trino   LoadBalancer   10.81.96.139   34.148.83.2   8080:32764/TCP   87s
$

Now that a LoadBalancer has been created with an external IP, we can connect with the Trino CLI and run a query:

$ trino --server http://34.148.83.2:8080 --user padraig
trino> select * from system.runtime.nodes;
              node_id               |        http_uri        | node_version | coordinator | state
------------------------------------+------------------------+--------------+-------------+--------
 trino-coordinator-5dbcff8f8f-2g7b9 | http://10.84.1.25:8080 | 375          | true        | active
 trino-worker-7cdd97b6c-vmhrf       | http://10.84.1.24:8080 | 375          | false       | active
 trino-worker-7cdd97b6c-kd26h       | http://10.84.2.13:8080 | 375          | false       | active
(3 rows)

Query 20220401_012800_00000_2numb, FINISHED, 2 nodes
Splits: 2 total, 2 done (100.00%)
2.56 [3 rows, 186B] [1 rows/s, 73B/s]

trino>

This shows all the nodes in our cluster.
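
The default chart values also include the tpch and tpcds catalogs, so if you want a quick sanity query that exercises the workers, something like the following works (the query itself is just an example):

trino --server http://34.148.83.2:8080 --user padraig --execute "select count(*) from tpch.tiny.nation"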

Add a MySQL catalog

Next, let’s add a new MySQL catalog to our Trino cluster.

First, we’ll deploy an ephemeral MySQL service that we can use in our cluster.

helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

The mysql.yaml file with my custom values looks like this:

auth:
  database: trino
  rootPassword: root

We deploy MySQL with helm:

helm install mysql bitnami/mysql --values mysql.yaml

This will deploy a MySQL server and create a schema named trino. It will also set the password for the root user to root.
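
If you want to confirm MySQL is up before wiring it into Trino, one quick way is to run the mysql client inside the mysql-0 pod that the chart created (the pod name comes from the kubectl output later in this article; adjust if yours differs):

kubectl exec -it mysql-0 -- mysql -uroot -proot -e "SHOW DATABASES"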

Now, we will update our trino.yaml to add a new catalog for connecting to this MySQL database.

server:
  node:
    environment: test

additionalCatalogs:
  mysql: |
    connector.name=mysql
    connection-url=jdbc:mysql://mysql:3306/
    connection-user=root
    connection-password=root

service:
  type: LoadBalancer

Now, we can update our Trino deployment with helm:

helm upgrade trino trino/trino --install --version 0.8.0 --values trino.yaml
kubectl rollout restart deployment trino-coordinator
kubectl rollout restart deployment trino-worker

Notice that we need to restart all the pods for the configuration change to take effect immediately. There is work in progress in the Trino charts to automatically restart the pods in this scenario and avoid the need for the additional kubectl rollout restart commands.
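
If you want to wait for the restart to finish before reconnecting, kubectl can watch the rollouts:

kubectl rollout status deployment trino-coordinator
kubectl rollout status deployment trino-worker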

Once the Trino pods are restarted, try connecting to Trino with the CLI and verify the MySQL catalog is present with a schema named trino.

$ trino --server http://34.148.83.2:8080 --user padraig
trino> show schemas in mysql;
       Schema
--------------------
 trino
 information_schema
 performance_schema
(3 rows)

Query 20220401_014443_00004_typwa, FINISHED, 3 nodes
Splits: 6 total, 6 done (100.00%)
0.38 [3 rows, 58B] [7 rows/s, 153B/s]

trino>

Enable autoscaling

Let’s now enable autoscaling for our Trino cluster to add workers when necessary. The only metric that can be used for autoscaling at the moment is CPU utilization.

To enable autoscaling, our trino.yaml file will now look like:

server:
  node:
    environment: test
  autoscaling:
    enabled: true
    maxReplicas: 4
    targetCPUUtilizationPercentage: 15

additionalCatalogs:
  mysql: |
    connector.name=mysql
    connection-url=jdbc:mysql://mysql:3306/
    connection-user=root
    connection-password=root

service:
  type: LoadBalancer

The default target CPU utilization is 50%, so I set it to 15% to have autoscaling kick in quickly for demo purposes.

Now, let’s redeploy Trino:

helm upgrade trino trino/trino --install --version 0.8.0 --values trino.yaml
kubectl rollout restart deployment trino-coordinator
kubectl rollout restart deployment trino-worker

And again, verify that there are 2 worker pods initially.

$ kubectl get pods
NAME                                 READY   STATUS    RESTARTS   AGE
mysql-0                              1/1     Running   0          28m
trino-coordinator-5dbcff8f8f-4mk6g   1/1     Running   0          32s
trino-worker-7cdd97b6c-jcrgr         1/1     Running   0          32s
trino-worker-7cdd97b6c-ls8tc         1/1     Running   0          32s
$
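
The chart also creates a HorizontalPodAutoscaler for the workers when autoscaling is enabled, so you can confirm the 15% CPU target was picked up (the HPA name in your cluster may differ depending on your release name):

kubectl get hpa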

Now, let’s kick off some CPU-intensive queries. For example, run this query a few times in parallel:

select sum(quantity) from tpch.sf100000.lineitem;

While you have 4 or 5 of these CPU-intensive queries running in parallel, monitor the number of worker pods and you will see autoscaling kick in:

NAME                                 READY   STATUS    RESTARTS   AGE
mysql-0                              1/1     Running   0          31m
trino-coordinator-5dbcff8f8f-4mk6g   1/1     Running   0          3m22s
trino-worker-7cdd97b6c-j5wg6         0/1     Running   0          6s
trino-worker-7cdd97b6c-jcrgr         1/1     Running   0          3m22s
trino-worker-7cdd97b6c-ls8tc         1/1     Running   0          3m22s

Notice the trino-worker-7cdd97b6c-j5wg6 worker pod that is starting up now.

Once the worker pod becomes ready, you will see the number of workers in the Trino web UI increase.

Change JVM configs

So far, we have not been setting the max heap size for the JVM on either the workers or the coordinator. Let’s say we want to set it explicitly.

This is possible by updating our trino.yaml file to look like:

server:
  node:
    environment: test
  autoscaling:
    enabled: true
    maxReplicas: 4
    targetCPUUtilizationPercentage: 15

coordinator:
  jvm:
    maxHeapSize: "12G"

worker:
  jvm:
    maxHeapSize: "12G"

additionalCatalogs:
  mysql: |
    connector.name=mysql
    connection-url=jdbc:mysql://mysql:3306/
    connection-user=root
    connection-password=root

service:
  type: LoadBalancer

Notice that the workers and the coordinator could have different JVM heap sizes if we wished. Now, let’s redeploy Trino:

helm upgrade trino trino/trino --install --version 0.8.0 --values trino.yaml
kubectl rollout restart deployment trino-coordinator
kubectl rollout restart deployment trino-worker

To verify the JVM heap settings on the pods, we can use jcmd:

$ kubectl exec -it trino-coordinator-57bf54b59-rb5xs -- /bin/bash -c "jcmd 1 VM.flags"
-XX:CICompilerCount=2 -XX:ConcGCThreads=1 -XX:+ExitOnOutOfMemoryError -XX:+ExplicitGCInvokesConcurrent -XX:G1ConcRefinementThreads=1 -XX:G1HeapRegionSize=33554432 -XX:GCDrainStackTargetSize=64 -XX:+HeapDumpOnOutOfMemoryError -XX:InitialHeapSize=67108864 -XX:MarkStackSize=4194304 -XX:MaxHeapSize=12884901888 -XX:MaxNewSize=7717519360 -XX:MinHeapDeltaBytes=33554432 -XX:NonNMethodCodeHeapSize=5825164 -XX:NonProfiledCodeHeapSize=265522874 -XX:PerBytecodeRecompilationCutoff=10000 -XX:PerMethodRecompilationCutoff=10000 -XX:ProfiledCodeHeapSize=265522874 -XX:ReservedCodeCacheSize=536870912 -XX:+SegmentedCodeCache -XX:-UseBiasedLocking -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseFastUnorderedTimeStamps -XX:+UseG1GC -XX:+UseGCOverheadLimit
$

Notice that -XX:MaxHeapSize=12884901888 corresponds to the 12G value specified in the YAML file: 12 × 1024³ bytes = 12,884,901,888 bytes.

TLS termination and authentication

Finally, we want to enable TLS and password authentication for our Trino cluster. We will use file-based password authentication.

We need to update our Trino config to enable PASSWORD authentication. For file-based authentication, the helm chart allows us to provide a list of usernames and passwords that will automatically be placed in a password.db file on the coordinator pod.

The password must be in one of the formats outlined in the Trino docs. For this example, I am going to use the bcrypt format and add a username/password entry for bob/bob.
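
One way to generate a bcrypt entry like the one used below is the Apache htpasswd utility (assuming you have it installed locally); your hash will differ from mine since bcrypt uses a random salt:

htpasswd -nbBC 10 bob bob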

We are also going to stop exposing the cluster with a LoadBalancer and instead expose it using the default method of ClusterIP.

As we are going to be doing TLS termination, we need to set one additional config property on the coordinator: http-server.process-forwarded.

With all that said, our trino.yaml file will look like the following:

server:
  node:
    environment: test
  config:
    authenticationType: "PASSWORD"
  autoscaling:
    enabled: true
    maxReplicas: 4
    targetCPUUtilizationPercentage: 15
  coordinatorExtraConfig:
    http-server.process-forwarded=true

auth:
  passwordAuth: "bob:$2y$10$5bJMqeHSHUa/zh1GNJNJhOUuaOL/kF5clDgKz9s.cNlHrHIL8U/HW"

coordinator:
  jvm:
    maxHeapSize: "12G"

worker:
  jvm:
    maxHeapSize: "12G"

additionalCatalogs:
  mysql: |
    connector.name=mysql
    connection-url=jdbc:mysql://mysql:3306/
    connection-user=root
    connection-password=root

Redeploy our Trino cluster:

helm upgrade trino trino/trino --install --version 0.8.0 --values trino.yaml
kubectl rollout restart deployment trino-coordinator
kubectl rollout restart deployment trino-worker

The services on our Kubernetes cluster will now look like:

$ kubectl get service
NAME             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
kubernetes       ClusterIP   10.81.96.1     <none>        443/TCP    4d2h
mysql            ClusterIP   10.81.96.189   <none>        3306/TCP   3d
mysql-headless   ClusterIP   None           <none>        3306/TCP   3d
trino            ClusterIP   10.81.96.179   <none>        8080/TCP   2d
$

Now to expose the cluster and configure TLS termination, we are going to manually create an Ingress.

First, we will create a static IP and configure a DNS A record. Since this article is for the Google Cloud Platform, the way we create a global static IP is:

gcloud compute addresses create trino-static-ip --global

Once the static IP is ready, we configure a DNS A record with this IP.
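
If you need to look up the address that was allocated, you can describe the reserved IP and use the address value from the output for the A record:

gcloud compute addresses describe trino-static-ip --global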

We are going to use Google Managed certificates for enabling TLS to this cluster.

The first thing we need to do is create a managed certificate with the DNS record we configured for the static IP we created. We will place the following in a managed-cert.yaml file:

apiVersion: networking.gke.io/v1
kind: ManagedCertificate
metadata:
  name: managed-cert
spec:
  domains:
    - trino-gateway.starburst-customer-success.com

And we will deploy it with kubectl:

kubectl apply -f managed-cert.yaml

It can take up to 60 minutes to provision a managed certificate on the Google Cloud Platform. Once it is provisioned, it should show a status of Active:

$ kubectl get managedcertificate
NAME           AGE   STATUS
managed-cert   47h   Active
$

Now we are ready to create an Ingress. We will put the following in an ingress.yaml file:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: trino-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: "trino-static-ip"
    networking.gke.io/managed-certificates: managed-cert
    kubernetes.io/ingress.class: "gce"
spec:
  defaultBackend:
    service:
      name: trino
      port:
        number: 8080

Notice we reference the managed certificate and static IP we created earlier.

Now deploy the Ingress:

kubectl apply -f ingress.yaml
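
Provisioning the underlying Google Cloud load balancer can take several minutes; you can check whether the Ingress has been assigned an address with:

kubectl get ingress trino-ingress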

After the Ingress has an external IP allocated, navigate to the web UI at the hostname we configured our DNS record for.

Trino web UI

We can also now connect with a username and password with the Trino CLI:

$ trino --server https://trino-gateway.starburst-customer-success.com --user bob --password
Password:
trino> show catalogs;
 Catalog
---------
 mysql
 system
 tpcds
 tpch
(4 rows)

Query 20220402_021515_00000_qhent, FINISHED, 2 nodes
Splits: 6 total, 6 done (100.00%)
4.03 [0 rows, 0B] [0 rows/s, 0B/s]

trino>

Troubleshooting

If you are unable to connect to the Trino cluster through the Ingress you configured, you should look at whether the Ingress thinks the backend Trino service is healthy or not.

You want to see something like this in the output of kubectl describe ingress trino-ingress:

  Host        Path  Backends
  ----        ----  --------
  *           *     trino:8080 (10.84.1.48:8080)
Annotations:  ingress.gcp.kubernetes.io/pre-shared-cert: mcrt-a18fe6f3-77de-4e3f-b1da-5a4dff917a1b
              ingress.kubernetes.io/backends: {"k8s1-a536ee33-default-trino-8080-1f12238e":"HEALTHY"}

If the backend Trino service is UNKNOWN or not HEALTHY, then we need to create a custom health check for the Trino service. This can be done by putting the following in a trino_hc_config.yaml file:

apiVersion: cloud.google.com/v1beta1
kind: BackendConfig
metadata:
  name: trino-hc-config
spec:
  healthCheck:
    checkIntervalSec: 15
    port: 8080
    type: HTTP
    requestPath: /v1/status

Now deploy the custom health check config:

kubectl apply -f trino_hc_config.yaml

And finally, annotate the Trino service with the custom health check we just created:

kubectl annotate svc trino cloud.google.com/backend-config='{"default": "trino-hc-config"}'

The above is specific to the Google Cloud Platform.

Conclusion

We covered a few of the different configurations that are possible with the Trino community helm charts.

Many more features are planned or in progress, such as Ingress support in the helm charts and access control.

If anyone has any questions or would like to see more examples, please feel free to reach out to me on the Trino community Slack or message me on Twitter.