Introduction

University life is one of the most pivotal stages in a person’s journey, laying the foundation for future career decisions and personal growth. A key aspect of this experience is coursework, which is carefully designed to help students learn, develop, and specialize in their chosen fields. Yet, despite its importance, many students find the process of selecting the right courses both challenging and overwhelming. Often, this confusion stems from uncertainty about which subjects will best support their career goals, or simply from the sheer number of options available in a large course catalog.

From my conversations with university students, I’ve observed that this is a widespread issue, especially in colleges that offer a broad range of electives. While academic advisors are available to guide students through these choices, they often lack the capacity to provide personalized, in-depth advice to every individual. As a result, students frequently end up enrolling in “popular” courses recommended by their seniors or following their friends’ choices without fully considering how these decisions align with their long-term educational and career aspirations.

Motivation

When university coursework plays such a critical role in shaping one’s career and when students invest significant time and money in each course, the decision-making process shouldn’t be left to chance or peer influence. I believe there should be a system that allows anyone to access personalized course recommendations, along with clear explanations for why each course is suggested.

I first introduced this concept at the RISE Expo (Research, Innovation, Scholarship, and Entrepreneurship) held at Northeastern University in Spring 2024. There, I presented a poster on the idea and discussed it with both undergraduate and graduate students, as well as educators. The feedback was overwhelmingly positive, though I wasn’t able to pursue implementation until now.

With this project, my goal was to develop an AI-based course recommendation system that could assist both students and advisors in identifying the most suitable coursework for each individual. The system, called Ayuda (Spanish for “Help”), builds a comprehensive user profile by considering factors such as skills, achievements, work experience, and previous coursework. Based on this analysis, Ayuda recommends the best-matching courses tailored to each student’s unique background and goals.

Dataset

Every AI project starts with data. For the initial version of this project, I focused on a small dataset sourced from Northeastern University, specifically using the MGEN department’s course curriculum from the Graduate College of Engineering. The dataset is structured in CSV format, containing columns such as course_id, course_name, course_description, prerequisites, major, domains, and skills_associated.

course_id: The official course ID as designated by the university
course_name: The title of the course
course_description: A brief summary of the course content
prerequisites: Any required prior courses (can be zero or more)
major: For this version, only MGEN department majors such as CSYE, DAMG, INFO, etc., are included
domains: The subject area or specialization, like AI or Software Development
skills_associated: Key skills or concepts linked to the course

The complete MGEN dataset consists of about 73 entries. Since the dataset was relatively small and centralized, I was able to gather it manually. I then used a Python script to clean and format the data according to the project’s requirements.
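A cleaning step of this kind might look like the sketch below. The column names come from the dataset description above; the semicolon separator for multi-valued cells, the `clean_course_rows` helper, and the sample rows are assumptions made for illustration, not the project's actual script or data.

```python
import csv
import io

# Columns whose cells hold zero or more values (the ";" separator is an assumption).
MULTI_VALUED = ("prerequisites", "domains", "skills_associated")

def clean_course_rows(csv_text: str, sep: str = ";") -> list[dict]:
    """Parse course rows, trim whitespace, and split multi-valued cells into lists."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {key: (value or "").strip() for key, value in raw.items()}
        for column in MULTI_VALUED:
            row[column] = [item.strip() for item in row[column].split(sep) if item.strip()]
        rows.append(row)
    return rows

# Made-up sample rows; the IDs and names are placeholders, not real catalog entries.
sample = (
    "course_id,course_name,course_description,prerequisites,major,domains,skills_associated\n"
    "X100, Intro Course ,Basics,,INFO,Software Development,python; sql\n"
    "X200,Advanced Course,More depth,X100,INFO,AI,embeddings\n"
)
courses = clean_course_rows(sample)
print(courses[0]["course_name"], courses[0]["prerequisites"], courses[0]["skills_associated"])
```

Storing prerequisites and skills as lists up front keeps later steps (graph loading, keyword indexing) free of string parsing.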

Approach / Solution

High-level Architecture and Workflow

Ayuda is an AI-powered course recommendation system that delivers personalized course suggestions to students, tailoring recommendations to their unique academic backgrounds and profiles. The system is built on a modular, service-oriented architecture, leveraging modern technologies at each stage:

Frontend

Students interact with Ayuda through a sleek web interface built with React, CSS and Material-UI. The frontend handles everything from user registration and login to uploading resumes and viewing personalized course recommendations.

Backend

The core application logic runs on FastAPI, which exposes RESTful APIs for user management, course operations, recommendation processing, and admin tasks. This ensures smooth communication between the frontend and all underlying services.

Data Storage

Ayuda utilizes a combination of specialized databases, each serving a unique purpose:

PostgreSQL: Manages all relational data, such as user profiles and course catalogs.
Redis: Handles caching and provides fast, keyword-based lookups for efficient data retrieval.
Neo4j: Stores and manages course prerequisites and dependencies in a graph structure, making it easy to visualize and query complex relationships.
Pinecone: Functions as a vector database, storing course and resume embeddings for semantic similarity search.

GenAI Integration

Advanced AI capabilities are at the heart of Ayuda:

Hugging Face Sentence Transformers are used to generate vector embeddings from student resumes.
Ollama, which serves the Llama3 large language model (LLM) locally, powers the system’s ability to provide explainable, personalized course recommendations.

How It All Works

User Onboarding: Students register or log in and upload their resumes via the web interface. They also update their profile with additional skills and previously completed coursework.
Profile Analysis: The system extracts relevant information and generates embeddings from the uploaded resumes.
Recommendation Engine: Ayuda combines semantic matching (via Pinecone) and keyword search (via Redis and PostgreSQL), checks prerequisite requirements using Neo4j, and delivers explainable recommendations with the help of the Llama3 model served through Ollama.
Admin Features: Administrators can easily manage users, courses, and monitor system health through dedicated backend endpoints.

Choice of Algorithms and Models: Why They Matter

To deliver highly personalized and accurate course recommendations, Ayuda leverages a combination of advanced algorithms and models. Each component plays a specific role in the recommendation pipeline:

Semantic Similarity (Sentence Transformers + Pinecone)

Ayuda uses state-of-the-art Sentence Transformers to generate vector embeddings from both user resumes and course descriptions. These embeddings are stored and searched using Pinecone, a vector database. This approach enables the system to match students with courses based on deeper, contextual similarities, going far beyond simple keyword overlap. As a result, recommendations reflect the nuanced alignment between a student’s background and course content.
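To make "semantic similarity" concrete: embeddings are compared with cosine similarity, the metric this write-up names later for the Pinecone index. The sketch below is a plain-Python illustration with toy 3-dimensional vectors; the vectors and course labels are made up, and a real deployment would delegate this comparison to the vector database.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real 384-dimensional embeddings.
resume_vec = [0.9, 0.1, 0.0]
course_vecs = {"course_a": [0.8, 0.2, 0.0], "course_b": [0.0, 0.1, 0.9]}

# Rank courses from most to least similar to the resume.
ranked = sorted(
    course_vecs,
    key=lambda cid: cosine_similarity(resume_vec, course_vecs[cid]),
    reverse=True,
)
print(ranked)
```

Because cosine similarity depends on direction rather than magnitude, a long resume and a short course description can still score as close matches.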

Keyword Matching (Redis)

To complement semantic search, Ayuda also incorporates traditional keyword matching using Redis. This ensures that explicit skills or topics mentioned in the user’s resume are matched directly with relevant courses. Redis’s in-memory architecture enables lightning-fast lookups, contributing to a seamless, real-time user experience.

Hybrid Scoring

The system combines scores from both semantic similarity and keyword matching, resulting in more robust and accurate recommendations. This hybrid approach ensures that both subtle contextual matches and explicit requirements are considered.

Prerequisite Checking (Neo4j Graph Queries)

To ensure students are only recommended courses they are eligible for, Ayuda uses Neo4j to manage and query course prerequisites and dependencies. This guarantees that every recommendation is both relevant and academically feasible for the user.
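The eligibility rule described above can be sketched without a database, which is handy for unit tests. This version assumes a plain dictionary that maps each course to its direct prerequisites; the course IDs and the `prerequisite_status` helper are hypothetical, and the real system answers the same question with a Neo4j query.

```python
def prerequisite_status(course_id: str, prerequisites: dict[str, set[str]],
                        completed: set[str]) -> dict:
    """Count direct prerequisites and how many of them the student has completed."""
    required = prerequisites.get(course_id, set())
    done = required & completed
    return {
        "completed_count": len(done),
        "total_count": len(required),
        "eligible": done == required,       # vacuously true when nothing is required
        "missing": sorted(required - done),
    }

# Made-up prerequisite graph: C300 needs C100 and C200, C200 needs C100.
graph = {"C300": {"C100", "C200"}, "C200": {"C100"}, "C100": set()}
status = prerequisite_status("C300", graph, completed={"C100"})
print(status)
```

Returning the missing courses alongside the counts makes it possible to tell a student what stands between them and a course that is otherwise a strong match.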

LLM Reasoning (Ollama Llama3)

For the final touch, Ayuda employs Llama3, a large language model served locally through Ollama, to generate personalized and human-like explanations for each recommendation. This not only increases transparency but also helps students understand the rationale behind each suggested course, building trust in the system.

Technologies and Tools Behind Ayuda

Ayuda brings together a modern tech stack designed for performance, scalability, and robust user experience. Here’s a look at the core technologies powering the platform:

Backend

FastAPI: A high-performance Python web framework, ideal for building scalable APIs quickly and efficiently.
PostgreSQL 17: Serves as the primary relational database, providing reliable and persistent storage for user profiles, courses, and system data.
Redis 7: Acts as an in-memory cache and a high-speed keyword store, ensuring real-time responsiveness for lookups and searches.
Neo4j 5+: A graph database used to model and query complex course prerequisite relationships, enabling advanced academic path validation.
Pinecone: A managed vector database that supports fast and scalable semantic search, crucial for matching students to relevant courses.
Ollama (Llama3 model): A locally hosted Large Language Model used to generate clear, human-like explanations for course recommendations.
Alembic: Manages database migrations, making it easy to evolve the database schema as the project grows.
Sentence Transformers: Generates vector embeddings from resume text, powering Ayuda’s semantic search capabilities.
PyMuPDF, python-docx: Libraries for extracting and parsing text from uploaded resumes in various formats.
httpx, asyncio: Enable asynchronous and non-blocking backend operations, boosting performance and scalability.
Docker & Docker Compose: Streamline local development, deployment, and service orchestration across different environments.
Custom Logging: Implements structured JSON logs for effective monitoring and analytics.

Other Core Components

JWT & bcrypt: Secure the platform with robust authentication (JWT) and password hashing (bcrypt).
Pydantic: Handles data validation and serialization for incoming and outgoing API data.
SQLAlchemy: An ORM (Object Relational Mapper) that simplifies database interactions.
pytest: A powerful testing framework, ensuring code quality and reliability.

Unique Techniques or Innovations

Hybrid Recommendation Engine: What sets Ayuda apart is its integration of several innovative techniques to deliver a truly personalized and reliable course recommendation experience. At the heart of the system is a hybrid recommendation engine that combines semantic search (using vector-based similarity) with traditional keyword matching. This dual approach ensures that recommendations are not only contextually relevant by capturing the nuanced relationships between student profiles and course content, but also precise when it comes to explicit skill or topic requirements.

Graph-based Prerequisite Validation: To make sure every suggestion is actionable and fits each student’s academic path, Ayuda leverages graph-based prerequisite validation powered by Neo4j. By modeling course dependencies as a graph, the system can dynamically assess whether a student meets all necessary prerequisites, ensuring that recommendations are always feasible.

Explainable AI via LLM: A major innovation lies in Ayuda’s focus on explainable AI. Using a locally hosted large language model (Llama3) via Ollama, the platform generates clear, personalized explanations for every recommendation. This transparency helps users understand why certain courses are suggested, building trust and confidence in the system.

Resume Embedding Pipeline: The platform also features an automated resume embedding pipeline, which handles the extraction and vectorization of resume data from both PDF and DOCX files. This streamlines the onboarding process and ensures that recommendations are based on a comprehensive understanding of each student’s experiences and achievements.

Performance Optimizations, Admin Analytics and Logging: To deliver a responsive and scalable user experience, Ayuda incorporates advanced performance optimizations including asynchronous programming, connection pooling, and efficient caching via Redis. Finally, a robust analytics and logging layer tracks system health, user interactions, and recommendation effectiveness, enabling continuous improvement and offering valuable insights for administrators.

Services and Components

Recommendation Engine

# For users with completed courses:
hybrid_score = (
    semantic_score * 0.4 +      # 40% semantic similarity
    keyword_score * 0.2 +       # 20% keyword matching
    prerequisite_bonus +        # 30% prerequisite completion bonus
    progression_bonus * 0.05 +  # 5% progression bonus
    background_bonus * 0.05     # 5% background fit
)

# For new users (no completed courses):
hybrid_score = (
    semantic_score * 0.35 +     # 35% semantic similarity
    keyword_score * 0.25 +      # 25% keyword matching (higher weight)
    prerequisite_bonus +        # 30% prerequisite bonus (foundational courses)
    background_bonus * 0.1      # 10% background fit (higher weight)
)

Why These Weights?

When designing Ayuda’s hybrid recommendation score, I carefully chose the weight of each component to reflect its importance in delivering both relevant and achievable course suggestions. Here’s the reasoning behind these choices:

Semantic Score (40% for experienced users, 35% for new users):
Semantic similarity receives the highest weight because it allows Ayuda to go beyond simple keyword matching, capturing the deeper context and meaning in a student’s resume. This enables the system to recommend courses that are truly aligned with a student’s actual interests, skills, and academic journey. For new users, this weight is slightly reduced (35%) since they benefit from more explicit guidance through keywords.

Prerequisite Bonus (30%):
Ensuring that a student is eligible to take a course is crucial. That’s why the prerequisite bonus consistently gets a substantial 30% weight, whether the user is experienced or new. This component validates that the student has completed the required foundational knowledge, guaranteeing that recommendations are not only relevant but also actionable.

Keyword Score (20% for experienced users, 25% for new users):
Keyword matching directly connects the skills and topics in a student’s resume to those specified in the course catalog. For new users, this weight is slightly higher (25%) to provide more direct, easily interpretable matches, which is an important factor when students have little to no course history for the system to learn from. For users with completed courses, it is still important but plays more of a supporting role (20%).

Background Bonus (5% for experienced users, 10% for new users):
This bonus checks how well the recommended courses align with the student’s declared major or academic background. New users receive a higher background bonus (10%) because, in the absence of a course history, it’s especially important to keep recommendations tightly aligned with their intended field. For experienced users, this weight is lower (5%), as their course history and skills already provide strong context.

Progression Bonus (5%, experienced users only):
Finally, for students who have already completed some courses, an additional progression bonus encourages recommendations that build logically on their prior coursework, supporting smooth academic growth.

By thoughtfully tuning these weights, Ayuda delivers recommendations that are not only tailored and aspirational, but also achievable and practical for every student’s unique situation.
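As a sanity check on these weights, the two formulas can be wrapped in one function. This is a sketch under one stated assumption: because `prerequisite_bonus` is added without a multiplier, it is taken to be pre-scaled to the range 0.0-0.3, while every other input lies in 0.0-1.0. The function signature and the `has_completed_courses` flag are illustrative.

```python
def hybrid_score(semantic_score: float, keyword_score: float, prerequisite_bonus: float,
                 background_bonus: float, progression_bonus: float = 0.0,
                 has_completed_courses: bool = True) -> float:
    """Weighted sum using the weights listed above.

    Assumption: prerequisite_bonus is already scaled to 0.0-0.3, since it is
    added without a multiplier; the other inputs are in 0.0-1.0.
    """
    if has_completed_courses:
        return (semantic_score * 0.4 + keyword_score * 0.2 + prerequisite_bonus
                + progression_bonus * 0.05 + background_bonus * 0.05)
    # New users: no progression term, more weight on keywords and background.
    return (semantic_score * 0.35 + keyword_score * 0.25 + prerequisite_bonus
            + background_bonus * 0.1)

# With every component at its maximum, both variants should top out at 1.0.
best_experienced = hybrid_score(1.0, 1.0, 0.3, 1.0, progression_bonus=1.0)
best_new = hybrid_score(1.0, 1.0, 0.3, 1.0, has_completed_courses=False)
print(round(best_experienced, 6), round(best_new, 6))
```

Under that assumption both variants share the same 0-1 scale, so scores for new and experienced users remain comparable.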

User Management & Resume Processing

Embedding generation pipeline:

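from sentence_transformers import SentenceTransformer

_embedding_model = None  # Lazy loading: created on first use (helper name below is illustrative)

def get_embedding_model():
    global _embedding_model
    if _embedding_model is None:
        _embedding_model = SentenceTransformer('all-MiniLM-L6-v2')  # 384-dimensional embeddings
    return _embedding_model

# Enhanced embedding creation
def create_enhanced_user_embedding(self, user_id: str,
                                   resume_weight: float = 0.7,
                                   skills_weight: float = 0.3):
    # Combines resume embedding (70%) with skills embedding (30%)
    ...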

A key part of Ayuda’s intelligence lies in how it represents each user’s background through embeddings (compact vector representations) that capture the essence of their experience and skills. To do this efficiently and accurately, I designed a pipeline that combines information from both the user’s resume and their self-reported skills.

Model Choice:
For generating embeddings, I chose the all-MiniLM-L6-v2 model from Sentence Transformers. This model strikes an ideal balance between speed, accuracy, and resource usage: its 384-dimensional output is both powerful and memory-efficient, making it well-suited for real-time recommendations and scalable deployments (as opposed to larger, slower 768-dimensional models).

Weighted Embedding Combination:
When creating a user profile, Ayuda generates two separate embeddings: one from the uploaded resume and another from the explicit skills the user has listed. These are then combined, with the resume embedding given a higher weight (70%) and the skills embedding given a supporting weight (30%). This approach reflects the reality that the resume typically contains richer, more detailed information about the user’s academic and professional journey, while skills provide explicit signals about areas of expertise.
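One plausible way to perform this combination is a weighted sum followed by rescaling to unit length, sketched below. The final normalization step and the `combine_embeddings` name are assumptions for illustration; the write-up only specifies the 70/30 weights.

```python
import math

def combine_embeddings(resume_vec: list[float], skills_vec: list[float],
                       resume_weight: float = 0.7,
                       skills_weight: float = 0.3) -> list[float]:
    """Weighted sum of two same-length vectors, rescaled to unit length."""
    if len(resume_vec) != len(skills_vec):
        raise ValueError("embeddings must have the same dimension")
    mixed = [resume_weight * r + skills_weight * s
             for r, s in zip(resume_vec, skills_vec)]
    norm = math.sqrt(sum(x * x for x in mixed))
    # Leave an all-zero vector untouched to avoid dividing by zero.
    return [x / norm for x in mixed] if norm else mixed

# Toy 2-dimensional vectors in place of 384-dimensional embeddings.
profile = combine_embeddings([1.0, 0.0], [0.0, 1.0])
print([round(x, 3) for x in profile])
```

Rescaling keeps the combined profile on the same footing as individually generated embeddings when it is later compared against course vectors.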

Lazy Loading for Performance:
To optimize performance, the embedding model is loaded lazily (only when it’s first needed), rather than on every single request. This prevents the unnecessary overhead of loading a large (150MB+) model into memory each time, ensuring the system remains fast and responsive even as usage scales.

By thoughtfully structuring the embedding pipeline this way, Ayuda efficiently builds comprehensive user profiles that power more accurate and personalized course recommendations.

Graph Database (Neo4j) Operations

Neo4j Connection pooling configuration:

self.driver = GraphDatabase.driver(
    settings.neo4j_uri,
    auth=(settings.neo4j_username, settings.neo4j_password),
    max_connection_lifetime=3600,  # 1 hour
    max_connection_pool_size=50,   # Increased pool size
    connection_acquisition_timeout=60,  # 60 seconds timeout
    connection_timeout=30,  # 30 seconds connection timeout
    max_transaction_retry_time=15  # 15 seconds retry timeout
)

Ayuda uses Neo4j as its graph database to manage and traverse complex course prerequisite relationships, ensuring every course recommendation is both academically feasible and tailored to the student’s learning path. To keep Neo4j responsive and reliable, especially under heavy usage, the system uses a carefully tuned connection pooling strategy.

Optimized Connection Pool Size:
The pool is configured to allow up to 50 concurrent connections. This larger pool size enables Ayuda to handle multiple recommendation requests at the same time without bottlenecks, which is crucial as the user base grows or during periods of high activity.

Connection Lifetime (1 hour):
Each connection in the pool has a maximum lifetime of 3600 seconds (one hour). This strikes a balance between reusing established connections for efficiency and retiring long-lived connections before they go stale, helping to keep resource usage in check.

Acquisition Timeout (60 seconds):
The connection acquisition timeout is set to 60 seconds, meaning that if all connections are busy, the system will wait up to a minute before returning an error. This prevents requests from getting stuck indefinitely, even when the database is under heavy load.

Quick Failure Detection (15 seconds retry timeout):
The max_transaction_retry_time setting caps at 15 seconds how long the driver keeps retrying a failed managed transaction. If the problem persists beyond that window, the request fails promptly instead of hanging, maintaining a smooth user experience.

By thoughtfully tuning these parameters, Ayuda ensures that its graph database operations are both robust and scalable, delivering accurate course recommendations efficiently, even as demand fluctuates.

Prerequisite Checking Algorithm:

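# Cypher query for prerequisite validation
# Counts all prerequisites and, separately, those already completed.
# (Filtering with WHERE would drop unmet prerequisites and make both counts equal.)
query = """
MATCH (course:Course {id: $course_id})-[r:PREREQUISITE]->(prereq:Course)
RETURN count(CASE WHEN prereq.id IN $completed_courses THEN 1 END) AS completed_count,
       count(r) AS total_count
"""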

Caching & Keyword Matching using Redis

Keyword Extraction Algorithm:

# Text normalization and filtering
normalized_text = re.sub(r'[^\w\s]', ' ', text.lower())
words = set(normalized_text.split())

# Stop word filtering (40+ common words removed)
stop_words = {'the', 'a', 'an', 'and', 'or', 'but', ...}

# Keyword filtering criteria
keywords = {
    word for word in words 
    if (len(word) >= 3 and 
        word not in stop_words and 
        not word.isdigit())
}

Hybrid Scoring:

def calculate_hybrid_score(self, semantic_score: float, 
                            keyword_score: float, 
                            semantic_weight: float = 0.7, 
                            keyword_weight: float = 0.3):
    return (semantic_score * semantic_weight) + (keyword_score * keyword_weight)

The parameters in Ayuda’s keyword extraction and scoring process were chosen to maximize recommendation quality and relevance. A minimum word length of three characters filters out short, insignificant words so that only meaningful terms are considered. Assigning a 70% weight to semantic similarity emphasizes deeper, contextual understanding between a student’s background and the course content, while the 30% keyword weight ensures that explicit skill matches are not overlooked. Finally, stop word removal strips common, non-informative words, an estimated 40-50% of the words in the text, which reduces noise and lets the system focus on the information that matters.
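Putting the extraction rules together gives a small runnable example. The stop-word list here is an abbreviated stand-in for the 40+ word list mentioned above, and `keyword_score` (the fraction of course keywords found in the resume) is an assumed scoring rule, since the write-up does not spell out how the keyword score is computed.

```python
import re

# Abbreviated, illustrative stop-word list; the original list has 40+ entries.
STOP_WORDS = {"the", "a", "an", "and", "or", "but", "with", "for", "in", "of", "to"}

def extract_keywords(text: str) -> set[str]:
    """Lowercase, strip punctuation, keep 3+ character words that are not stop words or digits."""
    normalized = re.sub(r"[^\w\s]", " ", text.lower())
    return {
        word for word in normalized.split()
        if len(word) >= 3 and word not in STOP_WORDS and not word.isdigit()
    }

def keyword_score(resume_text: str, course_text: str) -> float:
    """Fraction of course keywords that also appear in the resume (assumed rule)."""
    course_keywords = extract_keywords(course_text)
    if not course_keywords:
        return 0.0
    return len(course_keywords & extract_keywords(resume_text)) / len(course_keywords)

# Made-up texts for illustration.
resume = "Built data pipelines with Python and SQL for 3 years."
course = "Data engineering: Python, SQL, and cloud pipelines"
score = keyword_score(resume, course)
print(sorted(extract_keywords(course)), round(score, 3))
```

Normalizing by the number of course keywords keeps the score between 0 and 1, so it can be blended with a similarity score on the same scale.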

AI Reasoning Engine using Ollama

Prompt Engineering:

# Structured prompt with specific sections
prompt = f"""
STUDENT PROFILE:
- Major: {major}
- Completed Courses: {completed_courses}
- Skills from Resume: {resume_skills}

COURSE INFORMATION:
- Course Name: {course_name}
- Skills Covered: {skills_associated}

RECOMMENDATION CONTEXT:
- Semantic Similarity Score: {semantic_score:.2f}
- Prerequisite Status: {prerequisite_status}

Provide a concise explanation (1 paragraph, max 100 words)...
"""

Connection Configuration:

async with httpx.AsyncClient(
    timeout=httpx.Timeout(
        connect=5.0,    # 5 seconds to establish connection
        read=15.0,      # 15 seconds for response
        write=10.0,     # 10 seconds to write request
        pool=30.0       # 30 seconds pool timeout
    )
) as client:

Each timeout parameter for connecting to the Ollama AI engine is tuned for reliability and user experience. A 5-second connect timeout ensures the system quickly detects network issues, so users are not left waiting on failed connections. The 15-second read timeout strikes a balance between giving the AI model enough time to generate high-quality responses and keeping users from waiting too long. A 10-second write timeout covers the transmission of large, structured prompts without unnecessary delays. Finally, the 30-second pool timeout caps how long a request waits to acquire a connection from the client’s connection pool, so requests fail with a clear error instead of queueing indefinitely during high demand.

Course Management Service

Embedding Generation:

# Same model as the user-profile pipeline (loaded once, then reused)
model = SentenceTransformer('all-MiniLM-L6-v2')

# Embed each course; vectors are then sent to Pinecone in batches
for course in courses:
    embedding = model.encode(text).tolist()  # 384-dimensional vector
    metadata = {
        "type": "course",
        "course_name": name,
        "major": major,
        "domains": domains,
        "skills_associated": skills
    }

The design of Ayuda’s course management service emphasizes both efficiency and consistency. The embedding model (all-MiniLM-L6-v2) is deliberately the same one used for user profiles, so course and user embeddings share a single vector space and can be compared reliably. Each course embedding is stored with a structured metadata dictionary containing the course name, major, domains, and associated skills, which allows precise filtering and context-aware searches when students look for relevant courses. Additionally, embeddings for multiple courses are processed in batches, cutting the number of API calls made to Pinecone by up to 90%. This speeds up the overall workflow, conserves computational resources, and improves scalability as the number of courses grows.
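The effect of batching on request count is easy to illustrate. In the sketch below, the batch size of 10 and the `batched` helper are hypothetical, and the loop only counts requests rather than contacting a vector store. With 73 placeholder records (the dataset size mentioned earlier), batches of 10 need 8 requests instead of 73, a reduction of about 89%.

```python
from typing import Iterator, Sequence

def batched(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Placeholder records shaped like (id, vector) payloads for a vector store.
records = [{"id": f"course-{i}", "values": [0.0] * 384} for i in range(73)]

calls = 0
sent = 0
for batch in batched(records, batch_size=10):
    calls += 1            # a real pipeline would send `batch` to the vector store here
    sent += len(batch)
print(calls, sent)
```

Larger batches reduce the request count further, at the cost of bigger payloads per request.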

Results

When a user requests course recommendations, the system identifies the top k courses for which the user is currently eligible, as well as k additional courses that closely match the user’s profile but are currently unavailable due to prerequisite requirements.
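The split into eligible and not-yet-eligible results can be sketched as a sort followed by two filtered slices. The record fields (`course_id`, `score`, `eligible`) and the `split_recommendations` helper are assumptions for illustration, as are the sample scores.

```python
def split_recommendations(scored: list[dict], k: int) -> tuple[list[dict], list[dict]]:
    """Return the top-k eligible courses and the top-k courses blocked by prerequisites."""
    ranked = sorted(scored, key=lambda c: c["score"], reverse=True)
    eligible = [c for c in ranked if c["eligible"]][:k]
    locked = [c for c in ranked if not c["eligible"]][:k]
    return eligible, locked

# Made-up candidates with precomputed hybrid scores and eligibility flags.
candidates = [
    {"course_id": "A", "score": 0.91, "eligible": True},
    {"course_id": "B", "score": 0.88, "eligible": False},
    {"course_id": "C", "score": 0.75, "eligible": True},
    {"course_id": "D", "score": 0.60, "eligible": True},
]
eligible, locked = split_recommendations(candidates, k=2)
print([c["course_id"] for c in eligible], [c["course_id"] for c in locked])
```

Keeping the locked list separate lets the interface show students which strong matches become available once they complete the missing prerequisites.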

If the user chooses to click the Analyze Recommendation button, a custom-tuned, Ollama-based language model provides a holistic analysis of the recommendation in relation to the user’s profile. It then generates a clear, easy-to-understand explanation, offering deeper insights into why each course has been suggested and how it aligns with the user’s goals.

Future Work

While the current version of Ayuda demonstrates strong potential, there are several important features still needed to make the system truly production-ready and valuable for real users.

At present, Ayuda relies exclusively on a content-based filtering approach, where recommendations are generated by measuring the semantic similarity between a user’s profile and available courses. This is achieved through 384-dimensional embeddings created by Sentence Transformers (all-MiniLM-L6-v2) for both users and courses, stored and compared in Pinecone using cosine similarity. Exact keyword matching further refines these results, as NLP techniques identify critical keywords from both the user profiles and course descriptions. Additionally, the system tracks course prerequisites using a Neo4j graph database, allowing it to recommend courses that a user is eligible for as well as courses that are a strong fit but require additional prerequisites. The system also incorporates contextual filters based on a student’s major, completed courses, and academic background to ensure recommendations align with their progression.

However, one key enhancement that would elevate Ayuda’s effectiveness is the addition of collaborative filtering. By analyzing data from other users, the system could recommend courses based not just on content similarity, but also on the behaviors and preferences of students with similar profiles. This approach, known as user-based (user-user) collaborative filtering, would allow recommendations to reflect course popularity and user ratings, improving the relevance and usefulness of suggestions. Implementing a feedback and rating system would further enrich this feature, enabling the system to learn from direct user input.

Looking ahead, another exciting direction is to evolve Ayuda into an agentic AI platform. Imagine an autonomous Course Advisor Agent that actively assists students in planning their academic journeys, leveraging real-time database queries and insights from language models in a multi-step, interactive manner. Additionally, introducing a learning agent that adapts its recommendations based on continuous user feedback and performance metrics, rather than relying solely on manual oversight, would make the system even more dynamic and effective.

By integrating these advanced features, Ayuda can become a comprehensive, intelligent academic advisor that continuously improves its guidance and empowers students to make better, more informed choices throughout their educational journey.

Repository

You can find the code on GitHub:
Server Repository: https://github.com/Ayuda-ai/ayuda_server
UI Repository: https://github.com/Ayuda-ai/ayuda_ui

Ayuda - Course Recommendation System for University Students

Kartikey Hebbar
