Initial structure of the theoretical rule-based AI model


Introduction
In the field of artificial intelligence, large language models (LLMs) such as GPT and BERT have become fundamental components of natural language processing (NLP). However, these models face inherent challenges related to interpretability and to their reliance on vast datasets. In this article, I review how current models work and where they fall short, then present a preliminary theoretical framework as a starting point for envisioning our research model; more comprehensive details will follow in upcoming articles.
Mechanism of Current Models
Current models transform text into numerical representations (tokens and their embeddings), with patterns learned from massive training datasets. These representations are processed through deep neural networks that predict the next word, or complete sentences, from probability distributions, without relying on explicit semantic understanding.
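To make the "prediction from statistics, not meaning" point concrete, here is a deliberately tiny illustration (not the article's model, and far simpler than an LLM): a bigram predictor that picks the next word purely from co-occurrence counts in its training text.

```python
from collections import Counter, defaultdict

# Toy corpus; the "model" knows nothing about what the words mean.
corpus = "the sky is blue . the apple is red . the sky is clear".split()

# Count which word follows which.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(word):
    """Return the statistically most likely next word, or None."""
    following = counts[word]
    return following.most_common(1)[0][0] if following else None

print(predict_next("the"))  # "sky" -- simply the most frequent follower
```

The prediction is driven entirely by frequency: if the corpus had said "the sky is green" often enough, the model would happily predict that, which is the statistical root of the hallucination problem discussed below.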
Shortcomings of Current Models
Current models encounter several challenges, including:
Lack of Interpretability: They operate as opaque systems, making it difficult to trace the reasoning behind the selection of a specific word, particularly in critical applications such as medicine or law.
Reliance on Vast Datasets: They require enormous amounts of data for training, limiting their effectiveness in low-resource languages or specialized domains.
Hallucination: They may generate inaccurate or inconsistent information due to reliance on statistical patterns rather than logical knowledge.
Preliminary Theoretical Framework for Our New Model
We propose a theoretical model based on a structured approach to language understanding. Like current models, our research model completes a missing word, mimicking their word or token selection process, but through our own research method. The model analyzes text components into three categories: direct verbs or events as "functions" (e.g., runs, draws), nouns/entities as "classes", and the remaining elements as distinct linguistic tools. The linguistic tools serve to assign values to entity properties, such as (sky : color = blue), and to set parameter values in events, such as (running(speed = 1 meter per second)). Which assignment applies is determined by the context before and after the tool, since a single tool can have multiple uses depending on its surroundings. For example, "is" with a noun before it and a type designation after it, as in "John is a doctor", contrasts with its use as a property-value assigner for an entity, as in "the apple is red". The latter case is the correct reading of "the sky is blue", where the sky has a property of type color. This property has default values determined by an event and the parameters that lead to that value: the sky is blue due to the interaction of atmospheric gases with the time parameter set to daytime, and it can be red for the same event with the time parameter set to pre-evening, or due to the sunset event.
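The mapping above can be sketched in code. This is a minimal, hypothetical illustration of the idea (all class and function names are my own, not the article's final design): entities become classes holding properties, events become functions with explicit parameters, and the tool "is" is disambiguated by what follows it.

```python
# Hypothetical sketch: entities as classes, events as functions,
# and "is" as a context-dependent assigner.

class Entity:
    def __init__(self, name):
        self.name = name
        self.properties = {}

def running(entity, speed):
    """An event as a function with an explicit parameter (m/s)."""
    return f"{entity.name} runs at {speed} meter per second"

def apply_is(subject, value, known_types):
    """Disambiguate 'is': type designation vs. property assignment."""
    if value in known_types:               # "John is a doctor"
        subject.properties["type"] = value
    else:                                  # "the sky is blue" -> color
        subject.properties["color"] = value

sky = Entity("sky")
apply_is(sky, "blue", known_types={"doctor", "teacher"})
print(sky.properties)  # {'color': 'blue'}
```

In a real system the `known_types` check would itself be a lookup against the entity/type dictionaries described below, rather than a hard-coded set.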
The model’s concept relies on organizing language through clear structures, where the "dir" column indicates the pathway of an event or entity within systems, serving two purposes:
Principle of Inheritance: From general to specialized systems, such as a human belonging to living organisms, then to the animal kingdom, vertebrates, mammals, and finally to the representation of a human, inheriting characteristics from broader systems, with a path like: living organisms/animals/vertebrates/mammals/.../human, used only when needed based on context.
Multiple Meanings: A single word may carry different meanings across various systems, such as "reflection," which means the redirection of light in physics and self-contemplation in psychology.
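The two purposes above can be sketched as follows. This is an illustrative, hypothetical fragment (the paths, property tables, and lookup keys are invented for the example): inheritance merges properties along the general-to-specialized path, and a word's meaning is resolved by the system it appears in.

```python
# Principle of inheritance: a path from general to specialized systems.
HUMAN_PATH = ["living organisms", "animals", "vertebrates", "mammals", "human"]

# Properties attached at different levels of the hierarchy (illustrative).
PROPERTIES = {
    "living organisms": {"alive": True},
    "mammals": {"warm_blooded": True},
    "human": {"speaks": True},
}

def inherited_properties(path):
    """Merge properties from the most general system down to the entity."""
    merged = {}
    for system in path:
        merged.update(PROPERTIES.get(system, {}))
    return merged

# Multiple meanings: the same word resolved by its system.
MEANINGS = {
    ("reflection", "physics"): "redirection of light",
    ("reflection", "psychology"): "self-contemplation",
}

print(inherited_properties(HUMAN_PATH))
print(MEANINGS[("reflection", "psychology")])
```

Because the path is consulted "only when needed based on context", a practical implementation would walk it lazily instead of merging everything up front.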
Dictionaries of entity and event labels, together with linguistic tools acting as guides or assigners, are the core components for understanding and handling natural language. Their contents vary by language, especially the linguistic tools, but the fundamental idea remains consistent. In contrast, the graph data structure is uniform across all languages: it defines the operational properties of entities and events, which are retrieved from it through the language components, enabling our model to truly understand the input text, unlike current models. A separate section will address building a rule-based machine learning model tailored to our system, since rule-based machine learning differs significantly from current machine learning methods. It need not be slower than LLMs, which require extensive processing before the final "guess" for each word, whereas our model retrieves content directly through explicit relationships.
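The separation between per-language dictionaries and a language-independent graph can be sketched as follows. All identifiers here are invented for illustration, under the assumption that each language maps surface words onto shared graph nodes, and retrieval is a direct lookup over explicit relationships rather than a statistical guess.

```python
# Language-independent graph of entities and events (illustrative).
GRAPH = {
    "sky": {"kind": "entity", "color": {"daytime": "blue", "sunset": "red"}},
    "run": {"kind": "event", "parameters": ["speed"]},
}

# Per-language dictionaries mapping surface words to graph nodes.
DICTIONARIES = {
    "en": {"sky": "sky", "runs": "run"},
    "es": {"cielo": "sky", "corre": "run"},
}

def lookup(word, language):
    """Resolve a surface word to its language-independent graph node."""
    node_id = DICTIONARIES[language].get(word)
    return GRAPH.get(node_id)

# The same node is reached from different languages.
print(lookup("sky", "en") is lookup("cielo", "es"))  # True
```

Adding a new language then means writing a new dictionary (and its linguistic tools), while the graph, and everything the model "knows", stays untouched.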
Conclusion
While current models offer significant flexibility, their limitations call for exploring alternatives. The proposed theoretical model represents an initial step toward a more accurate and insightful artificial intelligence, with further development and details to be explored in future research.
