Building a Custom OpenAI-Compatible API Server with Kotlin, Spring Boot, LangChain4j
Overview
Since OpenAI released ChatGPT in November 2022, OpenAI's API has become the de facto standard interface for LLMs. Many open-source and commercial LLM integration solutions therefore expose OpenAI-compatible APIs that behave exactly like OpenAI's own. As a result, companies can build and operate their own OpenAI-compatible servers tailored to their internal security environments and use cases.
An LLM Proxy serves as an intermediary layer between client applications and various LLM providers. It standardizes the interaction interface while adding essential enterprise features such as authentication, monitoring, and failover capabilities. This approach allows organizations to maintain control over their AI operations while leveraging different LLM services through a unified interface.
In this post, we'll walk through building an OpenAI-compatible server with Kotlin, Spring Boot, and LangChain4j, backed by Azure OpenAI and Amazon Bedrock Claude.
Why Should You Run Your Own OpenAI-Compatible API Server?
Integration with internal authentication systems (SSO, OAuth, etc.) enables permission management and usage limits at the department or team-member level. It also allows for detailed usage monitoring and audit log management.
Sensitive corporate data can be securely processed using internal LLMs only, and prompt filtering can be implemented when necessary to prevent data leakage.
Multiple LLM services such as Azure OpenAI and Amazon Bedrock can be flexibly selected and used according to specific situations.
Automatic failover to alternative LLMs is possible when a specific LLM experiences an outage.
While preserving these advantages, popular LLM integration tools such as LangChain and Aider can consume your server immediately as an OpenAI-compatible API, and migrating existing OpenAI-based applications is straightforward.
OpenAI-Compatible Server Specification
- The core of an OpenAI-compatible server is accurately emulating the OpenAI Chat Completions API. The server must accept client requests like the following and run the corresponding LLM operation:
$ curl -X POST "http://localhost:8080/v1/openai/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer {YOUR_API_KEY}" \
-d '{
"model": "gpt4-o",
"messages": [
{
"role": "user",
"content": "Hello, how are you?"
}
],
"maxTokens": 4096,
"temperature": 0.1,
"stream": true
}'
- For streaming responses, the server sends each response chunk to the client using Server-Sent Events, as follows:
{
"id": "unique-emitter-id",
"object": "chat.completion.chunk",
"created": 1633024800,
"model": "gpt4-o",
"choices": [
{
"delta": {
"content": "Hello"
}
}
]
}
- When the streaming response is complete, the server should be able to send a completion message to the client using Server-Sent Events as follows:
[DONE]
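Concretely, the client receives a text/event-stream body in which each chunk is a data: line followed by a blank line. An illustrative stream (payload values are placeholders):
data: {"id":"unique-emitter-id","object":"chat.completion.chunk","created":1633024800,"model":"gpt-4o","choices":[{"delta":{"content":"Hello"}}]}

data: {"id":"unique-emitter-id","object":"chat.completion.chunk","created":1633024800,"model":"gpt-4o","choices":[{"delta":{"content":"!"}}]}

data: [DONE]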
Project Creation
- Install the Spring Boot CLI locally (its spring init command scaffolds projects through Spring Initializr) and create a new project as follows:
$ sdk install springboot
$ spring init --type gradle-project-kotlin --language kotlin --java-version 21 --dependencies=web openai-comp-demo
$ cd openai-comp-demo
build.gradle.kts
- Add the LangChain4j and AWS SDK library dependencies to the build.gradle.kts file in the project root as follows:
val langChain4jVersion = "0.35.0"
val awsSdkVersion = "2.29.6"
dependencies {
implementation("dev.langchain4j:langchain4j-core:$langChain4jVersion")
implementation("dev.langchain4j:langchain4j-azure-open-ai:$langChain4jVersion")
implementation("software.amazon.awssdk:bedrockruntime:$awsSdkVersion")
implementation("software.amazon.awssdk:apache-client:$awsSdkVersion")
implementation("software.amazon.awssdk:netty-nio-client:$awsSdkVersion")
}
Creating JsonConfig
- Create an ObjectMapper bean that converts REST API requests and responses to and from the DTOs. Because OpenAI's wire format uses snake_case field names (e.g., max_completion_tokens, finish_reason), we register the SNAKE_CASE property naming strategy so the camelCase Kotlin properties map correctly.
import com.fasterxml.jackson.annotation.JsonInclude
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.databind.PropertyNamingStrategies
import com.fasterxml.jackson.databind.SerializationFeature
import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule
import com.fasterxml.jackson.module.kotlin.kotlinModule
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration
import org.springframework.context.annotation.Primary
import org.springframework.http.converter.json.Jackson2ObjectMapperBuilder
@Configuration
class JsonConfig {
@Bean("objectMapper")
@Primary
fun objectMapper(): ObjectMapper {
return Jackson2ObjectMapperBuilder
.json()
// OpenAI's wire format is snake_case (e.g., max_completion_tokens, finish_reason)
.propertyNamingStrategy(PropertyNamingStrategies.SNAKE_CASE)
.serializationInclusion(JsonInclude.Include.ALWAYS)
.failOnEmptyBeans(false)
.failOnUnknownProperties(false)
.featuresToDisable(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS)
.modulesToInstall(kotlinModule(), JavaTimeModule())
.build()
}
}
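As a quick sanity check of the naming strategy, a snake_case payload should bind to camelCase Kotlin properties. A minimal sketch (the Sample class is hypothetical; run it, e.g., in a test with the objectMapper bean injected):
data class Sample(val maxCompletionTokens: Int = 0)

// "max_completion_tokens" on the wire binds to the camelCase property
val sample = objectMapper.readValue("""{"max_completion_tokens": 42}""", Sample::class.java)
check(sample.maxCompletionTokens == 42)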
Creating OpenAiCompatibleChatCompletionDTO
- Create DTOs that comply with the OpenAI-compatible API as follows:
import com.fasterxml.jackson.annotation.JsonProperty
import com.fasterxml.jackson.core.JsonGenerator
import com.fasterxml.jackson.core.JsonParser
import com.fasterxml.jackson.core.JsonToken
import com.fasterxml.jackson.core.type.TypeReference
import com.fasterxml.jackson.databind.DeserializationContext
import com.fasterxml.jackson.databind.JsonDeserializer
import com.fasterxml.jackson.databind.JsonSerializer
import com.fasterxml.jackson.databind.SerializerProvider
import com.fasterxml.jackson.databind.annotation.JsonDeserialize
import com.fasterxml.jackson.databind.annotation.JsonSerialize
/**
* Represents a chat completion request in OpenAI-compatible format.
* @property model The model identifier to use for completion
* @property messages The conversation history as a list of messages
* @property maxCompletionTokens Maximum tokens to generate in the response
* @property temperature Controls randomness in the response (0.0 = deterministic, 1.0 = creative)
* @property stream Whether to stream the response or return it all at once
*/
data class OpenAiCompatibleChatCompletionRequest(
val model: String = "gpt-4o",
val messages: List<OpenAiCompatibleChatMessage>,
val maxCompletionTokens: Int = 16384,
val temperature: Float = 0.0f,
val stream: Boolean = false
)
/**
* Represents a chat message in OpenAI-compatible format.
* @property role The role of the message sender (e.g., "system", "user", "assistant")
* @property content List of content items that can include text and images
*/
data class OpenAiCompatibleChatMessage(
val role: String = "user",
@JsonDeserialize(using = ContentDeserializer::class)
@JsonSerialize(using = ContentSerializer::class)
val content: List<OpenAiCompatibleContentItem>? = null
)
/**
* Represents a single content item in a chat message.
* @property type Content type identifier ("text" or "image_url")
* @property text The text content if type is "text"
* @property imageUrl The image URL details if type is "image_url"
*/
data class OpenAiCompatibleContentItem(
val type: String = "text",
val text: String? = null,
@JsonProperty("image_url")
val imageUrl: ImageUrl? = null
)
/**
* Contains image URL information for image content items.
* @property url The actual URL of the image (can be http(s) or base64 data URI)
* @property detail The desired detail level for image analysis
*/
data class ImageUrl(
val url: String,
val detail: String? = "auto"
)
/**
* Represents a complete response from the chat completion API.
* @property id Unique identifier for the completion
* @property object Type identifier for the response
* @property created Timestamp of when the completion was created
* @property model The model used for completion
* @property choices List of completion choices/responses
* @property usage Token usage statistics for the request
*/
data class OpenAiCompatibleChatCompletionResponse(
val id: String,
val `object`: String,
val created: Long,
val model: String,
val choices: List<OpenAiCompatibleChoice>,
val usage: OpenAiCompatibleUsage? = null
)
/**
* Represents a single completion choice in the response.
* @property message The generated message content
* @property finishReason Why the completion stopped (e.g., "stop", "length")
*/
data class OpenAiCompatibleChoice(
val message: OpenAiCompatibleChatMessage,
val finishReason: String? = null
)
/**
* Represents a chunk of the streaming response.
* Used when stream=true in the request.
*/
data class OpenAiCompatibleChatCompletionChunk(
val id: String,
val `object`: String,
val created: Long,
val model: String,
val choices: List<OpenAiCompatibleChunkChoice>
)
/**
* Represents a choice within a streaming response chunk.
*/
data class OpenAiCompatibleChunkChoice(
val delta: OpenAiCompatibleDelta,
val finishReason: String? = null
)
/**
* Represents the incremental changes in a streaming response.
*/
data class OpenAiCompatibleDelta(
val content: String? = null,
val role: String? = null
)
/**
* Contains token usage statistics for the request.
* @property promptTokens Number of tokens in the input prompt
* @property completionTokens Number of tokens in the generated completion
* @property totalTokens Total tokens used (prompt + completion)
*/
data class OpenAiCompatibleUsage(
val promptTokens: Int,
val completionTokens: Int,
val totalTokens: Int
)
/**
* Custom serializer for chat message content.
* Converts structured content arrays to string format for compatibility with litellm.
*/
class ContentSerializer : JsonSerializer<List<OpenAiCompatibleContentItem>>() {
override fun serialize(
value: List<OpenAiCompatibleContentItem>?,
gen: JsonGenerator,
serializers: SerializerProvider
) {
when {
value == null -> gen.writeNull()
value.isEmpty() -> gen.writeString("")
else -> {
// Combine all text content into a single string
val combinedText = value.mapNotNull { item ->
when (item.type) {
"text" -> item.text
else -> null
}
}.joinToString("\n")
gen.writeString(combinedText)
}
}
}
}
/**
* Custom deserializer for chat message content.
* Handles both string-only content and structured content arrays.
* Converts legacy string content to the new structured format for compatibility.
*/
class ContentDeserializer : JsonDeserializer<List<OpenAiCompatibleContentItem>>() {
override fun deserialize(p: JsonParser, ctxt: DeserializationContext): List<OpenAiCompatibleContentItem> {
return when (p.currentToken) {
JsonToken.VALUE_STRING -> {
// Convert legacy string content to structured format
listOf(OpenAiCompatibleContentItem(type = "text", text = p.valueAsString))
}
JsonToken.START_ARRAY -> {
// Parse structured content array
val typeRef = object : TypeReference<List<OpenAiCompatibleContentItem>>() {}
p.codec.readValue(p, typeRef)
}
JsonToken.VALUE_NULL -> {
emptyList()
}
else -> {
throw ctxt.weirdStringException(p.text, List::class.java, "Unexpected JSON token")
}
}
}
}
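Thanks to ContentDeserializer, clients may send content either as a plain string (legacy form) or as a structured array. A minimal sketch of the round-trip, assuming the objectMapper bean from earlier is in scope:
// Both content forms deserialize to the same structured list
val legacy = objectMapper.readValue(
    """{"role":"user","content":"Hello"}""",
    OpenAiCompatibleChatMessage::class.java
)
val structured = objectMapper.readValue(
    """{"role":"user","content":[{"type":"text","text":"Hello"}]}""",
    OpenAiCompatibleChatMessage::class.java
)
// content = [OpenAiCompatibleContentItem(type = "text", text = "Hello")] in both cases
check(legacy == structured)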
Creating OpenAiCompatibleService
- Before writing the concrete service classes that act as the LLM proxy, create an interface so that different LLM backends can be plugged in:
import org.springframework.web.servlet.mvc.method.annotation.SseEmitter
interface OpenAiCompatibleService {
fun createChatCompletion(request: OpenAiCompatibleChatCompletionRequest): OpenAiCompatibleChatCompletionResponse
fun createStreamingChatCompletion(request: OpenAiCompatibleChatCompletionRequest): SseEmitter
}
Creating OpenAiCompatibleAzureOpenAiServiceImpl
- Create an OpenAiCompatibleAzureOpenAiServiceImpl bean that supports both streaming and non-streaming methods:
import com.fasterxml.jackson.databind.ObjectMapper
import dev.langchain4j.data.message.AiMessage
import dev.langchain4j.data.message.SystemMessage
import dev.langchain4j.data.message.UserMessage
import dev.langchain4j.model.StreamingResponseHandler
import dev.langchain4j.model.azure.AzureOpenAiChatModel
import dev.langchain4j.model.azure.AzureOpenAiStreamingChatModel
import dev.langchain4j.model.output.Response
import org.springframework.http.MediaType
import org.springframework.stereotype.Service
import org.springframework.web.servlet.mvc.method.annotation.SseEmitter
import java.io.IOException
import java.time.Instant
import java.util.*
import java.util.concurrent.ConcurrentHashMap
@Service
class OpenAiCompatibleAzureOpenAiServiceImpl(
private val objectMapper: ObjectMapper
) : OpenAiCompatibleService {
private val emitters = ConcurrentHashMap<String, SseEmitter>()
override fun createChatCompletion(request: OpenAiCompatibleChatCompletionRequest): OpenAiCompatibleChatCompletionResponse {
val chatLanguageModel = AzureOpenAiChatModel.builder()
.apiKey("{your-azure-openai-api-key}")
.endpoint("{your-azure-openai-endpoint}")
.deploymentName("{your-azure-openai-deployment-name}")
.temperature(request.temperature.toDouble())
.maxTokens(request.maxCompletionTokens)
.topP(0.3)
.logRequestsAndResponses(true)
.build()
val messages = request.messages.map { msg ->
val content = msg.content?.joinToString("\n") { item ->
when (item.type) {
"text" -> item.text ?: ""
else -> ""
}
} ?: ""
// Preserve the OpenAI role instead of flattening everything into user messages
when (msg.role) {
"system" -> SystemMessage.from(content)
"assistant" -> AiMessage.from(content)
else -> UserMessage.from(content)
}
}
val response = chatLanguageModel.generate(messages.toList())
return OpenAiCompatibleChatCompletionResponse(
id = UUID.randomUUID().toString(),
`object` = "chat.completion",
created = Instant.now().epochSecond,
model = request.model,
choices = listOf(
OpenAiCompatibleChoice(
OpenAiCompatibleChatMessage(
role = "assistant",
content = listOf(OpenAiCompatibleContentItem(type = "text", text = response.content().text()))
)
)
)
)
}
override fun createStreamingChatCompletion(request: OpenAiCompatibleChatCompletionRequest): SseEmitter {
val streamingChatLanguageModel = AzureOpenAiStreamingChatModel.builder()
.apiKey("{your-azure-openai-api-key}")
.endpoint("{your-azure-openai-endpoint}")
.deploymentName("{your-azure-openai-deployment-name}")
.temperature(request.temperature.toDouble())
.maxTokens(request.maxCompletionTokens)
.logRequestsAndResponses(true)
.build()
val emitter = SseEmitter()
val emitterId = UUID.randomUUID().toString()
emitters[emitterId] = emitter
val messages = request.messages.map { msg ->
val content = msg.content?.joinToString("\n") { item ->
when (item.type) {
"text" -> item.text ?: ""
else -> ""
}
} ?: ""
// Preserve the OpenAI role instead of flattening everything into user messages
when (msg.role) {
"system" -> SystemMessage.from(content)
"assistant" -> AiMessage.from(content)
else -> UserMessage.from(content)
}
}
streamingChatLanguageModel.generate(messages.toList(), object : StreamingResponseHandler<AiMessage> {
override fun onNext(token: String) {
val chunk = OpenAiCompatibleChatCompletionChunk(
id = emitterId,
`object` = "chat.completion.chunk",
created = Instant.now().epochSecond,
model = request.model,
choices = listOf(OpenAiCompatibleChunkChoice(OpenAiCompatibleDelta(content = token)))
)
try {
emitter.send(
SseEmitter.event()
.data(objectMapper.writeValueAsString(chunk), MediaType.APPLICATION_NDJSON)
)
} catch (e: IOException) {
emitter.completeWithError(e)
emitters.remove(emitterId)
}
}
override fun onComplete(response: Response<AiMessage>) {
try {
emitter.send(SseEmitter.event().data("[DONE]"))
emitter.complete()
emitters.remove(emitterId)
} catch (e: IOException) {
emitter.completeWithError(e)
emitters.remove(emitterId)
}
}
override fun onError(error: Throwable) {
emitter.completeWithError(error)
emitters.remove(emitterId)
}
})
return emitter
}
}
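Rather than hardcoding the {your-azure-openai-*} placeholders, the credentials can be externalized to configuration. A minimal sketch of the constructor, assuming hypothetical azure.openai.* properties declared in application.yml (these names are our own convention, not a Spring or LangChain4j standard):
import org.springframework.beans.factory.annotation.Value

@Service
class OpenAiCompatibleAzureOpenAiServiceImpl(
    private val objectMapper: ObjectMapper,
    // Hypothetical property names; declare them in application.yml
    @Value("\${azure.openai.api-key}") private val apiKey: String,
    @Value("\${azure.openai.endpoint}") private val endpoint: String,
    @Value("\${azure.openai.deployment-name}") private val deploymentName: String
) : OpenAiCompatibleService {
    // ... build the chat models with .apiKey(apiKey).endpoint(endpoint)
    //     .deploymentName(deploymentName) exactly as above
}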
Creating OpenAiCompatibleAmazonBedrockClaudeServiceImpl
- Create an OpenAiCompatibleAmazonBedrockClaudeServiceImpl bean that supports both streaming and non-streaming methods:
import com.fasterxml.jackson.databind.ObjectMapper
import org.springframework.http.MediaType
import org.springframework.stereotype.Service
import org.springframework.web.servlet.mvc.method.annotation.SseEmitter
import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider
import software.amazon.awssdk.core.SdkBytes
import software.amazon.awssdk.http.apache.ApacheHttpClient
import software.amazon.awssdk.http.nio.netty.ProxyConfiguration
import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.bedrockruntime.BedrockRuntimeAsyncClient
import software.amazon.awssdk.services.bedrockruntime.BedrockRuntimeClient
import software.amazon.awssdk.services.bedrockruntime.model.*
import java.net.HttpURLConnection
import java.time.Duration
import java.time.Instant
import java.util.*
import java.util.concurrent.CompletableFuture
import java.util.concurrent.ExecutionException
import java.util.concurrent.TimeUnit
import java.util.concurrent.TimeoutException
/**
* Implementation of OpenAI-compatible API using Amazon Bedrock Claude model.
* Provides both streaming and non-streaming chat completions with OpenAI-compatible interface.
*/
@Service
class OpenAiCompatibleAmazonBedrockClaudeServiceImpl(
private val objectMapper: ObjectMapper
) : OpenAiCompatibleService {
companion object {
// Maximum time to wait for model response before timing out
private const val TIMEOUT_SECONDS = 180L
// Claude model identifier - latest stable version as of 2024
private const val MODEL_ID = "anthropic.claude-3-5-sonnet-20241022-v2:0"
}
/**
* Synchronous Bedrock client configured with appropriate timeouts and AWS credentials.
* Note: the implementation below uses the async client for both paths; this blocking
* client is kept for cases where a synchronous converse call is preferred.
*/
private val bedrockRuntimeClient: BedrockRuntimeClient by lazy {
val httpClient = ApacheHttpClient.builder()
.connectionTimeout(Duration.ofSeconds(TIMEOUT_SECONDS))
.socketTimeout(Duration.ofSeconds(TIMEOUT_SECONDS))
.build()
BedrockRuntimeClient.builder()
.region(Region.US_WEST_2)
.credentialsProvider(DefaultCredentialsProvider.create())
.httpClient(httpClient)
.build()
}
/**
* Asynchronous Bedrock client optimized for streaming responses.
* Configured with proxy settings to bypass corporate proxies for AWS services,
* appropriate timeouts, and AWS credentials.
*/
private val bedrockRuntimeAsyncClient: BedrockRuntimeAsyncClient by lazy {
System.setProperty("http.nonProxyHosts", "*.amazonaws.com|*.amazon.com")
val asyncHttpClient = software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient.builder()
.connectionTimeout(Duration.ofSeconds(TIMEOUT_SECONDS))
.readTimeout(Duration.ofSeconds(TIMEOUT_SECONDS))
.proxyConfiguration(
ProxyConfiguration.builder()
.nonProxyHosts(setOf("*.amazonaws.com", "*.amazon.com"))
.useSystemPropertyValues(true)
.build()
)
.build()
BedrockRuntimeAsyncClient.builder()
.region(Region.US_WEST_2)
.credentialsProvider(DefaultCredentialsProvider.create())
.httpClient(asyncHttpClient)
.build()
}
/**
* Creates a non-streaming chat completion using Claude model.
* Handles the asynchronous request-response cycle with Amazon Bedrock,
* maintaining OpenAI API compatibility for seamless integration.
*/
override fun createChatCompletion(request: OpenAiCompatibleChatCompletionRequest): OpenAiCompatibleChatCompletionResponse {
try {
// Normalize and validate message sequence
val normalizedMessages = normalizeMessages(request.messages)
validateMessages(normalizedMessages)
// Set up CompletableFuture for async response handling
val future = CompletableFuture<OpenAiCompatibleChatCompletionResponse>()
// Invoke Bedrock's Claude model asynchronously
bedrockRuntimeAsyncClient.converse { params ->
params.modelId(MODEL_ID)
.messages(normalizedMessages)
.inferenceConfig { config ->
config.maxTokens(request.maxCompletionTokens)
.temperature(request.temperature)
}
}.whenComplete { response, error ->
if (error != null) {
future.completeExceptionally(error)
} else {
val inputText = normalizedMessages.joinToString("\n") { msg ->
msg.content().joinToString("\n") { item ->
when (item.type()) {
ContentBlock.Type.TEXT -> item.text()
else -> ""
}
}
}
val outputText = response.output().message().content()[0].text()
val usage = response.usage()
println("===== Input text: $inputText")
println("===== Output text: $outputText")
println("===== Input tokens: ${usage.inputTokens()}")
println("===== Output tokens: ${usage.outputTokens()}")
println("===== Total tokens: ${usage.totalTokens()}")
val compatibleResponse = OpenAiCompatibleChatCompletionResponse(
id = UUID.randomUUID().toString(),
`object` = "chat.completion",
created = Instant.now().epochSecond,
model = request.model,
choices = listOf(
OpenAiCompatibleChoice(
OpenAiCompatibleChatMessage(
role = "assistant",
content = listOf(OpenAiCompatibleContentItem(type = "text", text = outputText))
)
)
)
)
future.complete(compatibleResponse)
}
}
return future.get(TIMEOUT_SECONDS, TimeUnit.SECONDS)
} catch (e: Exception) {
when (e) {
is TimeoutException -> throw RuntimeException("Request timed out after $TIMEOUT_SECONDS seconds", e)
is ExecutionException -> throw RuntimeException("Bedrock API Error: ${e.cause?.message}", e)
else -> throw RuntimeException("Unexpected error: ${e.message}", e)
}
}
}
/**
* Creates a streaming chat completion using Claude model.
* Uses Server-Sent Events (SSE) to stream responses in OpenAI-compatible format.
*/
override fun createStreamingChatCompletion(request: OpenAiCompatibleChatCompletionRequest): SseEmitter {
// Initialize SSE emitter with timeout
val emitter = SseEmitter(TIMEOUT_SECONDS * 1000)
val emitterId = UUID.randomUUID().toString()
// StringBuilder to accumulate response text
val responseBuilder = StringBuilder()
val inputText = request.messages.joinToString("\n") { msg ->
msg.content?.joinToString("\n") { item ->
when (item.type) {
"text" -> item.text ?: ""
else -> ""
}
} ?: ""
}
// Variable to track token usage
var lastTokenUsage: TokenUsage? = null
try {
val normalizedMessages = normalizeMessages(request.messages)
validateMessages(normalizedMessages)
val responseStreamHandler = ConverseStreamResponseHandler.builder()
.subscriber(
ConverseStreamResponseHandler.Visitor.builder()
.onContentBlockDelta { chunk ->
val deltaContent = chunk.delta().text()
responseBuilder.append(deltaContent)
val compatibleChunk = OpenAiCompatibleChatCompletionChunk(
id = emitterId,
`object` = "chat.completion.chunk",
created = Instant.now().epochSecond,
model = request.model,
choices = listOf(
OpenAiCompatibleChunkChoice(
delta = OpenAiCompatibleDelta(content = deltaContent)
)
)
)
emitter.send(
SseEmitter.event()
.data(objectMapper.writeValueAsString(compatibleChunk), MediaType.APPLICATION_JSON)
)
}
.onMetadata { metadata ->
// Update token usage metrics from metadata
lastTokenUsage = metadata.usage()
}
.build()
)
.onError { err ->
emitter.completeWithError(RuntimeException("Bedrock API Error: ${err.message}"))
}
.build()
bedrockRuntimeAsyncClient.converseStream(
{ builder ->
builder.modelId(MODEL_ID)
.messages(normalizedMessages)
.inferenceConfig { config ->
config.maxTokens(request.maxCompletionTokens)
.temperature(request.temperature)
}
},
responseStreamHandler
).whenComplete { _, error ->
if (error != null) {
emitter.completeWithError(error)
} else {
println("===== Input text: $inputText")
println("===== Output text: $responseBuilder")
lastTokenUsage?.let { usage ->
println("===== Input tokens: ${usage.inputTokens()}")
println("===== Output tokens: ${usage.outputTokens()}")
println("===== Total tokens: ${usage.totalTokens()}")
}
emitter.send(SseEmitter.event().data("[DONE]"))
emitter.complete()
}
}
} catch (e: Exception) {
emitter.completeWithError(e)
}
return emitter
}
/**
* Converts OpenAI message format to Claude's expected format.
* Handles:
* - Adding default system message if not present
* - Converting message roles (system/user/assistant)
* - Processing text and image content
* - Merging consecutive messages from same role
*
* @param messages List of OpenAI-formatted messages
* @return List of Claude-formatted messages
*/
private fun normalizeMessages(messages: List<OpenAiCompatibleChatMessage>): List<Message> {
// Claude's Converse API has no "system" role in the message list, so we emulate a
// default system prompt by prepending it as a user message when none is supplied
val defaultSystemMessage = Message.builder()
.content(ContentBlock.fromText("You are a helpful assistant."))
.role(ConversationRole.USER)
.build()
val convertedMessages = messages.mapIndexed { index, msg ->
val contentBlocks = mutableListOf<ContentBlock>()
msg.content?.forEach { item ->
when (item.type) {
"text" -> item.text?.let { text ->
contentBlocks.add(ContentBlock.fromText(text))
}
"image_url" -> item.imageUrl?.let { imageUrl ->
val sdkBytes = when {
imageUrl.url.startsWith("data:") -> {
val base64Data = imageUrl.url.substringAfter("base64,")
val decodedBytes = Base64.getDecoder().decode(base64Data)
SdkBytes.fromByteArray(decodedBytes)
}
imageUrl.url.startsWith("http://") || imageUrl.url.startsWith("https://") -> {
val connection =
java.net.URI(imageUrl.url).toURL().openConnection() as HttpURLConnection
connection.connectTimeout = 10000
connection.readTimeout = 10000
connection.inputStream.use { inputStream ->
SdkBytes.fromInputStream(inputStream)
}
}
else -> throw IllegalArgumentException("Unsupported image URL format: ${imageUrl.url}")
}
contentBlocks.add(
ContentBlock.fromImage(
ImageBlock.builder()
.source(ImageSource.builder().bytes(sdkBytes).build())
// Format is assumed to be PNG here; in production, detect it from the
// data URI media type or the HTTP Content-Type header
.format(ImageFormat.PNG)
.build()
)
)
}
}
}
Message.builder()
.content(contentBlocks)
.role(
when {
index == 0 && msg.role == "system" -> ConversationRole.USER
msg.role == "user" -> ConversationRole.USER
msg.role == "assistant" -> ConversationRole.ASSISTANT
else -> ConversationRole.USER
}
)
.build()
}
// Prepend default system message if needed
val initialMessages = if (messages.firstOrNull()?.role != "system") {
listOf(defaultSystemMessage) + convertedMessages
} else {
convertedMessages
}
// Merge consecutive messages from the same role
return initialMessages.fold(mutableListOf<Message>()) { acc, message ->
if (acc.isEmpty() || acc.last().role() != message.role()) {
acc.add(message)
} else {
val lastMessage = acc.last()
acc[acc.lastIndex] = Message.builder()
.content(
ContentBlock.fromText(
buildString {
lastMessage.content().forEach { block ->
if (block.type() == ContentBlock.Type.TEXT) {
append(block.text())
append("\n")
}
}
message.content().forEach { block ->
if (block.type() == ContentBlock.Type.TEXT) {
append(block.text())
append("\n")
}
}
}.trimEnd()
)
)
.role(lastMessage.role())
.build()
}
acc
}
}
/**
* Validates message sequence according to Claude model requirements.
* Ensures:
* - Messages list is not empty
* - Proper role alternation between user and assistant
*
* @param messages List of normalized messages to validate
* @throws IllegalArgumentException if validation fails
*/
private fun validateMessages(messages: List<Message>) {
if (messages.isEmpty()) {
throw IllegalArgumentException("Messages cannot be empty")
}
messages.windowed(2).forEach { (prev, current) ->
if (prev.role() == current.role()) {
throw IllegalArgumentException("Messages must alternate between user and assistant roles")
}
}
}
}
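For intuition, here is how normalizeMessages reshapes a typical OpenAI-style conversation (an illustrative trace, shown as comments):
// OpenAI input:                      Converse output:
// [system]    "Be brief."      \
// [user]      "Hello"           ->   [user]      "Be brief.\nHello"
// [assistant] "Hi there!"       ->   [assistant] "Hi there!"
// [user]      "Thanks"          ->   [user]      "Thanks"
// The leading system turn is demoted to a user turn, then merged with the adjacent
// user message so that roles strictly alternate, which validateMessages enforces.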
Creating OpenAiCompatibleController
- Finally, create the OpenAiCompatibleController bean:
import org.springframework.beans.factory.annotation.Qualifier
import org.springframework.http.MediaType
import org.springframework.web.bind.annotation.*
@RestController
@RequestMapping("/v1/openai")
class OpenAiCompatibleController(
// Specify the implementation for [Azure OpenAI] or [Amazon Bedrock Claude]
@Qualifier("openAiCompatibleAmazonBedrockClaudeServiceImpl") private val openAiCompatibleService: OpenAiCompatibleService
) {
@PostMapping("/chat/completions", produces = [MediaType.APPLICATION_JSON_VALUE, MediaType.TEXT_EVENT_STREAM_VALUE])
fun chatCompletions(
@RequestHeader("Authorization") authHeader: String?,
@RequestBody request: OpenAiCompatibleChatCompletionRequest
): Any {
val apiKey = authHeader?.removePrefix("Bearer ")
// Custom authentication can be applied using the obtained API_KEY
return if (request.stream) {
openAiCompatibleService.createStreamingChatCompletion(request)
} else {
openAiCompatibleService.createChatCompletion(request)
}
}
}
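Where the comment above marks the authentication hook, a minimal API-key check could look like the following sketch (the allow-list is a hypothetical stand-in for your real credential store):
import org.springframework.http.HttpStatus
import org.springframework.web.server.ResponseStatusException

// Hypothetical allow-list; replace with a lookup against your internal auth system
private val validApiKeys = setOf("{YOUR_API_KEY}")

private fun requireValidApiKey(apiKey: String?) {
    if (apiKey.isNullOrBlank() || apiKey !in validApiKeys) {
        throw ResponseStatusException(HttpStatus.UNAUTHORIZED, "Invalid API key")
    }
}
Call requireValidApiKey(apiKey) before dispatching to the service.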
Testing the OpenAI-Compatible API
- The OpenAI-compatible server is now complete. Run it and point Aider, a popular AI coding assistant, at it via environment variables to verify end-to-end operation.
# Run the project
$ ./gradlew bootRun
# Set the API of the running project in Aider's environment variables
$ export OPENAI_API_BASE=http://localhost:8080/v1/openai/
$ export OPENAI_API_KEY={YOUR_API_KEY}
# Override token limits and pricing metadata when using the Amazon Bedrock Claude implementation
$ nano ~/.aider.model.metadata.json
{
"openai/gpt-4o": {
"max_tokens": 8192,
"max_input_tokens": 200000,
"max_output_tokens": 8192,
"input_cost_per_token": 0.000003,
"output_cost_per_token": 0.000015,
"litellm_provider": "openai",
"mode": "chat",
"supports_function_calling": true,
"supports_vision": true,
"tool_use_system_prompt_tokens": 159,
"supports_assistant_prefill": true
}
}
# Run Aider
$ aider --model openai/gpt-4o
Aider v0.63.1
Model: openai/custom with whole edit format, infinite output
Git repo: .git with 22 files
Repo-map: disabled
Use /help <question> for help, run "aider --help" to see cmd line args
> Hello, how are you?
Hello! I'm doing well, thank you. How can I assist you with your project today? If you have any specific changes or questions, feel
free to let me know!
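You can also smoke-test the streaming endpoint directly with curl (-N disables buffering so chunks print as they arrive):
$ curl -N -X POST "http://localhost:8080/v1/openai/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer {YOUR_API_KEY}" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'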