Node tuning
Manage node-level tuning with the Node Tuning Operator.
Creating a simple TuneD profile for setting sysctl settings
If you would like to set some node-level tuning on the nodes in your hosted cluster, you can use the Node Tuning Operator. In HyperShift, node tuning can be configured by creating ConfigMaps which contain Tuned objects, and referencing these ConfigMaps in your NodePools.
- Create a ConfigMap which contains a valid Tuned manifest and reference it in a NodePool. The example Tuned manifest below defines a profile which sets vm.dirty_ratio to 55. In this example the recommend entry sets no match label, so the profile is applied to all Nodes in any NodePool that references the ConfigMap; to apply it only to Nodes that carry a specific label, such as tuned-1-node-label with any value, add a match entry (see the note and the sketch after it).

Save the ConfigMap manifest in a file called tuned-1.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: tuned-1
  namespace: clusters
data:
  tuning: |
    apiVersion: tuned.openshift.io/v1
    kind: Tuned
    metadata:
      name: tuned-1
      namespace: openshift-cluster-node-tuning-operator
    spec:
      profile:
      - data: |
          [main]
          summary=Custom OpenShift profile
          include=openshift-node
          [sysctl]
          vm.dirty_ratio="55"
        name: tuned-1-profile
      recommend:
      - priority: 20
        profile: tuned-1-profile

NOTE: When no labels are added to an entry in the spec.recommend section of the Tuned spec, NodePool-based matching is assumed, so the highest-priority profile in the spec.recommend section is applied to all Nodes in the pool. More fine-grained Node label based matching is still possible by setting a label value in Tuned .spec.recommend.match, but be aware that Node labels do not persist during an upgrade unless the NodePool .spec.management.upgradeType is set to InPlace.
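For illustration only, here is a minimal sketch of what a recommend entry with Node label based matching could look like; the tuned-1-node-label label name is an assumption for this example:

recommend:
- priority: 20
  profile: tuned-1-profile
  match:
  - label: tuned-1-node-label  # any value matches because no value field is set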
Create the ConfigMap in the management cluster:

oc --kubeconfig="$MGMT_KUBECONFIG" create -f tuned-1.yaml

Reference the ConfigMap in the NodePool's spec.tuningConfig field, either by editing an existing NodePool or by creating a new one. In this example we assume there is only one NodePool, called nodepool-1, containing 2 Nodes.

apiVersion: hypershift.openshift.io/v1alpha1
kind: NodePool
metadata:
  ...
  name: nodepool-1
  namespace: clusters
  ...
spec:
  ...
  tuningConfig:
  - name: tuned-1
status:
  ...

NOTE: You may reference the same ConfigMap in multiple NodePools. In HyperShift, NTO appends a hash of the NodePool name and namespace to the name of each synced Tuned to distinguish them. Outside of this case, be careful not to create multiple TuneD profiles with the same name in different Tuned resources for the same hosted cluster.
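One possible way to add the reference to an existing NodePool in place (a sketch, using the nodepool-1 name and clusters namespace assumed above) is a merge patch:

oc --kubeconfig="$MGMT_KUBECONFIG" patch nodepool nodepool-1 -n clusters \
  --type merge -p '{"spec":{"tuningConfig":[{"name":"tuned-1"}]}}'

Note that a merge patch replaces the entire tuningConfig list, so include every ConfigMap reference you want to keep.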
- Now that the ConfigMap containing a Tuned manifest has been created and referenced in a NodePool, the Node Tuning Operator will sync the Tuned objects into the hosted cluster. You can check which Tuneds are defined and which profile is set for each Node.
List the Tuned objects in the hosted cluster:
oc --kubeconfig="$HC_KUBECONFIG" get Tuneds -n openshift-cluster-node-tuning-operator

Example output:

NAME       AGE
default    7m36s
rendered   7m36s
tuned-1    65s

List the Profiles in the hosted cluster:

oc --kubeconfig="$HC_KUBECONFIG" get Profiles -n openshift-cluster-node-tuning-operator

Example output:

NAME                  TUNED             APPLIED   DEGRADED   AGE
nodepool-1-worker-1   tuned-1-profile   True      False      7m43s
nodepool-1-worker-2   tuned-1-profile   True      False      7m14s

As we can see, both worker Nodes in the NodePool have the tuned-1-profile applied. Note that if no custom profiles are created, the openshift-node profile is applied by default.
- To confirm the tuning was applied correctly, we can start a debug shell on a Node and check the sysctl values:
oc --kubeconfig="$HC_KUBECONFIG" debug node/nodepool-1-worker-1 -- chroot /host sysctl vm.dirty_ratio

Example output:
vm.dirty_ratio = 55
Applying tuning which requires kernel boot parameters
You can also use the Node Tuning Operator for more complex tuning which requires setting kernel boot parameters. As an example, the following steps show how to create a NodePool with huge pages reserved.
- Create the following ConfigMap, which contains a Tuned object manifest for creating 50 hugepages of size 2M.

Save this ConfigMap manifest in a file called tuned-hugepages.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: tuned-hugepages
  namespace: clusters
data:
  tuning: |
    apiVersion: tuned.openshift.io/v1
    kind: Tuned
    metadata:
      name: hugepages
      namespace: openshift-cluster-node-tuning-operator
    spec:
      profile:
      - data: |
          [main]
          summary=Boot time configuration for hugepages
          include=openshift-node
          [bootloader]
          cmdline_openshift_node_hugepages=hugepagesz=2M hugepages=50
        name: openshift-node-hugepages
      recommend:
      - priority: 20
        profile: openshift-node-hugepages

NOTE: The .spec.recommend.match field is intentionally left blank. In this case the Tuned is applied to all Nodes in every NodePool where this ConfigMap is referenced. It is advised to group Nodes with the same hardware configuration into the same NodePool; otherwise, TuneD operands may calculate conflicting kernel parameters for two or more Nodes sharing the same NodePool.
Create the ConfigMap in the management cluster:

oc --kubeconfig="$MGMT_KUBECONFIG" create -f tuned-hugepages.yaml
Create a new NodePool manifest YAML file, customize the NodePools upgrade type, and reference the previously created ConfigMap in the
spec.tuningConfigsection before creating it in the management cluster.Create the NodePool manifest and save it in a file called
hugepages-nodepool.yaml:NODEPOOL_NAME=hugepages-example INSTANCE_TYPE=m5.2xlarge NODEPOOL_REPLICAS=2 hypershift create nodepool aws \ --cluster-name $CLUSTER_NAME \ --name $NODEPOOL_NAME \ --replicas $NODEPOOL_REPLICAS \ --instance-type $INSTANCE_TYPE \ --render > hugepages-nodepool.yamlEdit
hugepages-nodepool.yaml. Set.spec.management.upgradeTypetoInPlace, and set.spec.tuningConfigto reference thetuned-hugepagesConfigMap you created.apiVersion: hypershift.openshift.io/v1alpha1 kind: NodePool metadata: name: hugepages-nodepool namespace: clusters ... spec: management: ... upgradeType: InPlace ... tuningConfig: - name: tuned-hugepagesNOTE: Setting
.spec.management.upgradeTypetoInPlaceis recommended to avoid unnecessary Node recreations when applying the new MachineConfigs. With theReplaceupgrade type, Nodes will be fully deleted and new nodes will replace them when applying the new kernel boot parameters that are calculated by the TuneD operand.Create the NodePool in the management cluster:
oc --kubeconfig="$MGMT_KUBECONFIG" create -f hugepages-nodepool.yaml
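It can take several minutes for the new Nodes to be provisioned. As a quick check while you wait, you can watch them join the hosted cluster, for example:

oc --kubeconfig="$HC_KUBECONFIG" get nodes -w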
After the Nodes become available, the containerized TuneD daemon will calculate the required kernel boot parameters based on the applied TuneD profile. After the Nodes become
Readyand reboot once to apply the generated MachineConfig, you can verify that the Tuned profile is applied and that the kernel boot parameters have been set.List the Tuned objects in the hosted cluster:
oc --kubeconfig="$HC_KUBECONFIG" get Tuneds -n openshift-cluster-node-tuning-operatorExample output:
NAME                 AGE
default              123m
hugepages-8dfb1fed   1m23s
rendered             123m

List the Profiles in the hosted cluster:

oc --kubeconfig="$HC_KUBECONFIG" get Profiles -n openshift-cluster-node-tuning-operator

Example output:

NAME                          TUNED                      APPLIED   DEGRADED   AGE
nodepool-1-worker-1           openshift-node             True      False      132m
nodepool-1-worker-2           openshift-node             True      False      131m
hugepages-nodepool-worker-1   openshift-node-hugepages   True      False      4m8s
hugepages-nodepool-worker-2   openshift-node-hugepages   True      False      3m57s

Both worker Nodes in the new NodePool have the openshift-node-hugepages profile applied.
To confirm the tuning was applied correctly, we can start a debug shell on a Node and check
/proc/cmdlineoc --kubeconfig="$HC_KUBECONFIG" debug node/nodepool-1-worker-1 -- chroot /host cat /proc/cmdlineExample output:
BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-... hugepagesz=2M hugepages=50
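Optionally, to confirm the huge pages were actually allocated by the kernel, you can run a similar debug command and inspect /proc/meminfo (a small sketch reusing the hugepages-nodepool-worker-1 Node from the output above):

oc --kubeconfig="$HC_KUBECONFIG" debug node/hugepages-nodepool-worker-1 -- chroot /host grep -i hugepages_total /proc/meminfo

With the profile above, HugePages_Total should report 50 if the allocation succeeded.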
How to debug Node Tuning issues
If you face issues with Node Tuning, first check the ValidTuningConfig condition in the NodePool that references your Tuned config. It reports any issue that may prevent the configuration from being loaded.
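For example, assuming the nodepool-1 NodePool in the clusters namespace from the first example, you can inspect the NodePool with the command below and look for the ValidTuningConfig entry under status.conditions. A failed validation looks similar to the snippet that follows.

oc --kubeconfig="$MGMT_KUBECONFIG" get nodepool nodepool-1 -n clusters -o yaml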
- lastTransitionTime: "2023-03-06T14:30:35Z"
  message: ConfigMap "tuned" not found
  observedGeneration: 2
  reason: ValidationFailed
  status: "False"
  type: ValidTuningConfig
If the NodePool condition shows no issues, the configuration has been loaded and propagated to the NodePool. You can then check the status of the relevant Profile Custom Resource in your HostedCluster. Its conditions show whether the configuration has been applied successfully and whether there are any outstanding warnings or errors. An example is shown below.
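As a sketch of how to retrieve that status (the nodepool-1-worker-1 Node name is just the one assumed in the first example):

oc --kubeconfig="$HC_KUBECONFIG" get profile.tuned.openshift.io nodepool-1-worker-1 -n openshift-cluster-node-tuning-operator -o yaml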
status:
  bootcmdline: ""
  conditions:
  - lastTransitionTime: "2023-03-06T14:22:14Z"
    message: The TuneD daemon profile not yet applied, or application failed.
    reason: Failed
    status: "False"
    type: Applied
  - lastTransitionTime: "2023-03-06T14:22:14Z"
    message: 'TuneD daemon issued one or more error message(s) during profile application.
      TuneD stderr: ERROR tuned.daemon.controller: Failed to reload TuneD: Cannot
      load profile(s) ''tuned-1-profile'': Cannot find profile ''openshift-node-notexistin''
      in ''[''/etc/tuned'', ''/usr/lib/tuned'']''.'
    reason: TunedError
    status: "True"
    type: Degraded
  tunedProfile: tuned-1-profile