Getting started
LLM Kernel Tuner is a framework for tuning and optimizing GPU kernels by utilizing Large Language Models (LLMs).
Here is a minimal example of tuning a matrixMultiply kernel:
```python
from llm_kernel_tuner import LLMKernelTransformer
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model_name='gpt-5')

kernel_string = """
__global__ void matrixMultiply(float *A, float *B, float *C, int A_width, int A_height, int B_width) {
    int col = threadIdx.x + blockDim.x * blockIdx.x;
    int row = threadIdx.y + blockDim.y * blockIdx.y;
    if (col < B_width && row < A_height) {
        float sum = 0;
        for (int k = 0; k < A_width; ++k) {
            sum += A[row * A_width + k] * B[k * B_width + col];
        }
        C[row * B_width + col] = sum;
    }
}
"""

if __name__ == "__main__":
    kernel_transformer = LLMKernelTransformer(kernel_string, model)
    tuned_kernel, best_params, performance_tracker = kernel_transformer.make_kernel_tunable()
    print("Final kernel:")
    print(tuned_kernel.code)
    print("Best params:")
    print(best_params)
    print(f"Optimization steps: {len(performance_tracker.steps)}")
    if performance_tracker.has_improvements():
        print(f"Total improvement: {performance_tracker.get_total_improvement():.2f}%")
```
LLM Kernel Tuner uses the clang library to parse CUDA kernel code. If the Python clang bindings cannot automatically find your `libclang.so` (or equivalent) file, you may need to set the `LIBCLANG_PATH` environment variable. For example: `export LIBCLANG_PATH=/usr/lib/llvm-14/lib/libclang.so`. Replace the path with the actual location of the libclang shared library on your system.
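If you are unsure where libclang lives on your machine, a small script can search for it before you export the variable. This is only a sketch: the helper name and the glob patterns below are our assumptions about typical Linux install layouts, not part of LLM Kernel Tuner's API.

```python
import ctypes.util
import glob
import os

def find_libclang():
    """Best-effort search for the libclang shared library.

    Tries ctypes' standard library lookup first, then a few common
    Linux install locations (assumed paths). Returns a path string,
    or None if nothing was found.
    """
    hit = ctypes.util.find_library("clang")
    if hit and os.path.isabs(hit):
        return hit
    patterns = [
        "/usr/lib/llvm-*/lib/libclang.so*",
        "/usr/lib/*/libclang*.so*",
        "/usr/local/lib/libclang*.so*",
    ]
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        if matches:
            return matches[0]
    return None

path = find_libclang()
if path is not None:
    # Equivalent to `export LIBCLANG_PATH=...` for this process.
    os.environ["LIBCLANG_PATH"] = path
print(path)
```

Setting the variable from inside the process, as above, only affects that Python run; exporting it in your shell profile makes it permanent.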
Similarly, `clang_args` can be specified when creating `LLMKernelTransformer`, like so: `kernel_transformer = LLMKernelTransformer(..., clang_args=['your', 'args'])`. If you are getting the error `fatal error: '__clang_cuda_runtime_wrapper.h' file not found`, you may need to specify the resource directory: `clang_args=['-resource-dir', '/usr/lib/clang/18']` (replace with your actual clang version and path).
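Rather than hard-coding the resource directory, you can ask clang itself: the clang driver supports a `-print-resource-dir` flag that prints the correct path for the installed version. A sketch (the helper name is ours; it assumes a `clang` binary is on your `PATH`):

```python
import shutil
import subprocess

def clang_resource_dir():
    """Query the installed clang for its resource directory.

    Uses clang's -print-resource-dir flag. Returns the directory as a
    string, or None if clang is missing or the query fails.
    """
    clang = shutil.which("clang")
    if clang is None:
        return None
    result = subprocess.run(
        [clang, "-print-resource-dir"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None

resource_dir = clang_resource_dir()
if resource_dir is not None:
    # Suitable for passing as LLMKernelTransformer(..., clang_args=clang_args)
    clang_args = ['-resource-dir', resource_dir]
```

Querying clang this way keeps the path correct when the system's clang version changes.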