Getting started
LLM Kernel Tuner is a framework for tuning and optimizing GPU kernels by utilizing Large Language Models (LLMs).
Here is a minimal example of tuning a matrixMultiply kernel:
```python
from llm_kernel_tuner import LLMKernelTransformer
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model_name='gpt-5')

kernel_string = """
__global__ void matrixMultiply(float *A, float *B, float *C, int A_width, int A_height, int B_width) {
    int col = threadIdx.x + blockDim.x * blockIdx.x;
    int row = threadIdx.y + blockDim.y * blockIdx.y;
    if (col < B_width && row < A_height) {
        float sum = 0;
        for (int k = 0; k < A_width; ++k) {
            sum += A[row * A_width + k] * B[k * B_width + col];
        }
        C[row * B_width + col] = sum;
    }
}
"""

if __name__ == "__main__":
    kernel_transformer = LLMKernelTransformer(kernel_string, model)
    tuned_kernel, best_params, performance_tracker = kernel_transformer.make_kernel_tunable()
    print("Final kernel:")
    print(tuned_kernel.code)
    print("Best params:")
    print(best_params)
    print(f"Optimization steps: {len(performance_tracker.steps)}")
    if performance_tracker.has_improvements():
        print(f"Total improvement: {performance_tracker.get_total_improvement():.2f}%")
```
LLM Kernel Tuner uses the clang library to parse CUDA kernel code. If the Python clang bindings cannot automatically find your `libclang.so` (or equivalent) file, you may need to set the `LIBCLANG_PATH` environment variable. For example: `export LIBCLANG_PATH=/usr/lib/llvm-14/lib/libclang.so`. Replace the path with the actual location of the libclang shared library on your system.
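If you are unsure where libclang lives on your machine, a small script can search for it before you export the variable. This is only a sketch: the helper name and the glob patterns below are our assumptions about typical Linux install layouts, not part of LLM Kernel Tuner's API.

```python
import ctypes.util
import glob
import os

def find_libclang():
    """Best-effort search for the libclang shared library.

    Tries ctypes' standard library lookup first, then a few common
    Linux install locations (assumed paths). Returns a path string,
    or None if nothing was found.
    """
    hit = ctypes.util.find_library("clang")
    if hit and os.path.isabs(hit):
        return hit
    patterns = [
        "/usr/lib/llvm-*/lib/libclang.so*",
        "/usr/lib/*/libclang*.so*",
        "/usr/local/lib/libclang*.so*",
    ]
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        if matches:
            return matches[0]
    return None

path = find_libclang()
if path is not None:
    # Equivalent to `export LIBCLANG_PATH=...` for this process.
    os.environ["LIBCLANG_PATH"] = path
print(path)
```

Setting the variable from inside the process, as above, only affects that Python run; exporting it in your shell profile makes it permanent.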
Similarly, `clang_args` can be specified when creating `LLMKernelTransformer`, like so: `kernel_transformer = LLMKernelTransformer(..., clang_args=['your', 'args'])`. If you are getting the error `fatal error: '__clang_cuda_runtime_wrapper.h' file not found`, you may need to specify the resource directory: `clang_args=['-resource-dir', '/usr/lib/clang/18']` (replace with your actual clang version and path).
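Rather than hard-coding the resource directory, you can ask clang itself: the clang driver supports a `-print-resource-dir` flag that prints the correct path for the installed version. A sketch (the helper name is ours; it assumes a `clang` binary is on your `PATH`):

```python
import shutil
import subprocess

def clang_resource_dir():
    """Query the installed clang for its resource directory.

    Uses clang's -print-resource-dir flag. Returns the directory as a
    string, or None if clang is missing or the query fails.
    """
    clang = shutil.which("clang")
    if clang is None:
        return None
    result = subprocess.run(
        [clang, "-print-resource-dir"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None

resource_dir = clang_resource_dir()
if resource_dir is not None:
    # Suitable for passing as LLMKernelTransformer(..., clang_args=clang_args)
    clang_args = ['-resource-dir', resource_dir]
```

Querying clang this way keeps the path correct when the system's clang version changes.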