White Paper: How an AI-Based SLM Works and Deployment Strategies

1. Introduction
As part of an AI learning initiative, an in-depth research study was undertaken to design a model capable of processing text inputs, analyzing contextual information, and generating coherent, human-like responses. The vision was to create a domain-specific model tailored to our Channels operations, effectively serving as a Smart Language Model (SLM). The goal was to enhance response time and simulate human-like interactions in digital environments.
2. Objective
Following extensive investigation, a Proof of Concept (POC) was initiated with the following technical objectives:
To develop a text-processing AI model using state-of-the-art NLP techniques.
To implement the model using the Hugging Face T5 Transformer architecture.
To leverage PyTorch for training and fine-tuning the model to ensure accurate prompt interpretation and optimized output.
To export the trained model in the ONNX (Open Neural Network Exchange) format, enabling offline usability on mobile platforms.
The POC aimed not only to evaluate the technical feasibility of building a localized SLM but also to validate its effectiveness in real-world deployment scenarios.
3. Methodology & Tools Used
Model Architecture:
Model: Hugging Face T5 Transformer
Framework: PyTorch
Fine-tuning: Customized datasets tailored to the domain-specific prompts
Export Format: ONNX for platform independence and offline accessibility
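As a minimal sketch of the stack above (the t5-small checkpoint is an assumption; any T5 variant can be substituted), the base model and tokenizer load directly from the Hugging Face Hub:

```python
# Load the base T5 checkpoint and tokenizer; t5-small is illustrative.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")  # ~60M for t5-small
```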
Key Implementation Steps:
Prompt engineering to simulate real-world queries.
Iterative training and tuning to refine model accuracy.
ONNX conversion to ensure cross-platform support (a training and export sketch follows this list).
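The steps above can be compressed into a rough sketch. The (prompt, target) pair, hyperparameters, and output paths are illustrative assumptions, and the ONNX conversion shown uses Hugging Face Optimum rather than a hand-rolled export:

```python
# One illustrative fine-tuning step; real training iterates over the
# full domain dataset for several epochs.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

inputs = tokenizer("I want to view my statement", return_tensors="pt")
labels = tokenizer("action: view_statement", return_tensors="pt").input_ids

model.train()
loss = model(**inputs, labels=labels).loss  # seq2seq cross-entropy
loss.backward()
optimizer.step()
optimizer.zero_grad()

model.save_pretrained("t5-domain")      # hypothetical checkpoint directory
tokenizer.save_pretrained("t5-domain")

# ONNX conversion via Optimum (requires the optimum[onnxruntime] package);
# it splits T5 into encoder and decoder ONNX graphs automatically.
from optimum.onnxruntime import ORTModelForSeq2SeqLM
ORTModelForSeq2SeqLM.from_pretrained("t5-domain", export=True).save_pretrained("t5-domain-onnx")
```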
4. Demonstration
The POC included a hands-on demonstration of the model's capabilities in understanding and responding to domain-specific text prompts. The AI exhibited context-aware understanding and generated natural language outputs aligned with expected responses, thereby validating the model’s performance.
The model was trained on more than 5,000 prompt examples. When a prompt such as "I want to view my statement" was submitted, the model broke it down, returned the requested data, and identified the corresponding action to call.
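A minimal inference sketch of that behavior, assuming the fine-tuned checkpoint from the training sketch above and an illustrative "action: ..." output convention:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-domain")
model = T5ForConditionalGeneration.from_pretrained("t5-domain")

input_ids = tokenizer("I want to view my statement", return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
# e.g. "action: view_statement" (hypothetical), which the client maps to a backend call
```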
5. Next Steps
Deployment Strategy:
API-Based Deployment:
Deploy as an API using FastAPI or Flask.
Host the model server-side to allow easy integration with applications.
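A minimal FastAPI wrapper might look like the following sketch; the endpoint path, payload schema, and checkpoint directory are assumptions:

```python
# main.py - serve the fine-tuned model behind a single POST endpoint.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import T5ForConditionalGeneration, T5Tokenizer

app = FastAPI()
tokenizer = T5Tokenizer.from_pretrained("t5-domain")
model = T5ForConditionalGeneration.from_pretrained("t5-domain")

class Query(BaseModel):
    prompt: str

@app.post("/generate")
def generate(query: Query):
    input_ids = tokenizer(query.prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=32)
    return {"response": tokenizer.decode(output_ids[0], skip_special_tokens=True)}
```

Run locally with uvicorn main:app --port 8000 and POST a JSON body such as {"prompt": "I want to view my statement"}.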
Offline Deployment:
Convert and deploy the model on mobile devices using the ONNX format for local inference without server dependency.
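On a workstation the exported model can be exercised through ONNX Runtime via Optimum, as sketched below; an actual mobile app would use the ONNX Runtime bindings for Android or iOS rather than Python:

```python
# Inference runs on ONNX Runtime rather than the PyTorch training graph.
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-domain")            # saved during training
model = ORTModelForSeq2SeqLM.from_pretrained("t5-domain-onnx")  # exported earlier

input_ids = tokenizer("I want to view my statement", return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```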
Cloud Deployment:
Use Gradio for the user interface and integrate with Hugging Face Spaces.
Deploy using Hugging Face CLI (huggingface-cli login and git push).
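A minimal Gradio app for a Space might be the following sketch (app.py, with the checkpoint name as an assumption):

```python
# app.py - Gradio front end for the fine-tuned model, suitable for a Space.
import gradio as gr
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-domain")
model = T5ForConditionalGeneration.from_pretrained("t5-domain")

def respond(prompt: str) -> str:
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=32)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

gr.Interface(fn=respond, inputs="text", outputs="text",
             title="Domain SLM Demo").launch()
```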
Containerization:
Containerize the FastAPI or Flask service (e.g., with Docker).
Deploy to AWS Lambda + API Gateway, provided the model can be compressed to under 512 MB.
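A handler sketch for the Lambda + API Gateway path is shown below; the /opt/model path, the payload shape, and the use of Optimum inside the function are all assumptions:

```python
# lambda_function.py - API Gateway proxy integration calling the ONNX model.
import json

from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import T5Tokenizer

# Loaded once per warm container to amortize cold-start cost.
tokenizer = T5Tokenizer.from_pretrained("/opt/model")
model = ORTModelForSeq2SeqLM.from_pretrained("/opt/model")

def lambda_handler(event, context):
    prompt = json.loads(event["body"])["prompt"]
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=32)
    answer = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return {"statusCode": 200, "body": json.dumps({"response": answer})}
```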
Cost Estimate:
Compute (assuming 10,000 requests/month, 2 GB memory, 1 s per request): 10,000 × 2 GB × 1 s = 20,000 GB-seconds
Compute pricing: 20,000 GB-seconds × $0.00001667 per GB-second ≈ $0.33/month
Request cost: $0.20 per 1M requests ≈ $0.002 for 10,000 requests
Total ≈ $0.33/month
High-Performance Hosting (For Larger Models):
Use AWS SageMaker
Cost estimate: 8 hours/day × 30 days × $0.086/hour = $20.64/month
Minor model storage (~$0.10/month)
6. Conclusion
The development and demonstration of the domain-specific AI SLM model provide a strong foundation for leveraging advanced NLP within the organization. The model’s adaptability, combined with cost-effective deployment options, offers significant potential for scalable integration across platforms, both online and offline.
7. Acknowledgment
Thank you to all contributors and stakeholders involved in the ideation, development, and evaluation of this POC.