How to Use NER to Extract Core Product Terms from E-Commerce Titles

Jiangyu Zheng

1 Introduction

In e-commerce, product titles are often written by sellers in a detailed and descriptive manner, including brand names, places of origin, attributes, and other qualifiers. Unlike English, which uses spaces to separate words, Chinese lacks clear word boundaries, making it harder to extract core product terms—the words that specify what the product actually is. For example, in the Chinese title “漂亮实用的自行车手套” (“beautiful and practical bicycle gloves”), the core term is “gloves,” not “bicycle.” Another challenge is that titles are often redundant and filled with noisy or irrelevant terms. Accurately identifying core terms is essential for improving search and recommendation systems by enabling better user preference modeling and more relevant results.

Named Entity Recognition (NER) is a natural language processing technique used to identify and extract specific types of entities from text, such as names of people, locations, organizations, and other proper nouns. NER typically relies on sequence labeling methods, a machine learning paradigm where a sequence of input tokens X is mapped to a sequence of output labels Y, indicating the entity class of each token.

To represent the position of each token within a named entity, a common labeling scheme called BIEOS is often used. This notation includes five labels:

  • B (Begin): the beginning of an entity,

  • I (Inside): a token inside an entity, but not at its beginning or end,

  • E (End): the final token of a multi-token entity,

  • O (Outside): a token that does not belong to any named entity,

  • S (Single): a single-token entity.

The labels are often combined with entity types such as PER (person), LOC (location), or ORG (organization). For example, in the Chinese sentence: "海钓比赛地点在厦门与金门之间的海域" ("The location of the sea fishing competition is in the waters between Xiamen and Kinmen"), the tokens are labeled as follows:

海 O
钓 O
比 O
赛 O
地 O
点 O
在 O
厦 B-LOC
门 E-LOC
与 O
金 B-LOC
门 E-LOC
之 O
间 O
的 O
海 O
域 O

This shows that “厦门” and “金门” are recognized as two separate location entities.
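
As an illustration, the following minimal Python sketch (the function and variable names are our own, not from any particular library) decodes such a character-level BIEOS sequence back into entity spans:

```python
# Minimal sketch: decode character-level BIEOS tags into (text, type) entity spans.
# The tag sequence matches the example sentence above; all names are illustrative.
def decode_bieos(chars, tags):
    entities, start = [], None
    for i, (ch, tag) in enumerate(zip(chars, tags)):
        if tag.startswith("B-"):
            start = i                                    # a multi-character entity begins
        elif tag.startswith("E-") and start is not None:
            entities.append(("".join(chars[start:i + 1]), tag[2:]))
            start = None                                 # the entity ends here
        elif tag.startswith("S-"):
            entities.append((ch, tag[2:]))               # single-character entity
        elif tag == "O":
            start = None                                 # reset outside entities
    return entities

chars = list("海钓比赛地点在厦门与金门之间的海域")
tags = ["O"] * 7 + ["B-LOC", "E-LOC", "O", "B-LOC", "E-LOC"] + ["O"] * 5
print(decode_bieos(chars, tags))  # [('厦门', 'LOC'), ('金门', 'LOC')]
```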

Another commonly used notation is BIO (Begin, Inside, Outside), which is a simplified version of BIEOS. BIO omits the End and Single labels, marking only the beginning and continuation of entities. The choice between BIEOS and BIO often depends on the specific NER task and model architecture, with BIEOS providing more granular information at the cost of increased complexity.
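
Converting between the two schemes is mechanical; for instance, a BIEOS sequence can be collapsed to BIO by rewriting E as I and S as B, as in this small illustrative sketch:

```python
# Sketch: collapse BIEOS tags to the simpler BIO scheme
# (E-x becomes I-x, S-x becomes B-x; B, I, and O are unchanged).
def bieos_to_bio(tags):
    converted = []
    for tag in tags:
        if tag.startswith("E-"):
            converted.append("I-" + tag[2:])
        elif tag.startswith("S-"):
            converted.append("B-" + tag[2:])
        else:
            converted.append(tag)
    return converted

print(bieos_to_bio(["B-LOC", "E-LOC", "O", "S-PER"]))
# ['B-LOC', 'I-LOC', 'O', 'B-PER']
```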

2 Dataset

To construct a high-quality dataset for core product term extraction, we selected product titles based on two key criteria: product category and order volume. The objective was to ensure broad coverage across major e-commerce categories while also prioritizing titles associated with a higher number of orders, thus capturing products that are more relevant and representative of real-world consumer behavior.

To enhance data quality, we applied preprocessing steps to filter out titles with excessive noise. Titles containing unusually short tokens or a high proportion of irrelevant or meaningless content were removed. This helped reduce ambiguity and improved the overall clarity of the entity boundaries during the annotation process.

After filtering, we curated a final dataset consisting of approximately 28,000 product titles. Each title was annotated using the BIEOS tagging scheme, which offers finer granularity and better entity boundary representation compared to simpler schemes like BIO.

The dataset was then divided into three subsets for model training and evaluation: training set (~90%), validation set (~1%), and test set (~9%). This distribution ensures that the model has ample data to learn from while retaining a robust and unbiased test set for performance evaluation.
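
A rough sketch of this split is shown below; the variable names, shuffling seed, and exact ratios are assumptions for illustration, not our production code.

```python
# Sketch of the ~90% / ~1% / ~9% train/validation/test split described above.
# `annotated_titles` is assumed to hold one BIEOS-annotated title per element.
import random

def split_dataset(annotated_titles, train_ratio=0.90, val_ratio=0.01, seed=42):
    titles = list(annotated_titles)
    random.Random(seed).shuffle(titles)          # shuffle before slicing
    n_train = int(len(titles) * train_ratio)
    n_val = int(len(titles) * val_ratio)
    train = titles[:n_train]
    val = titles[n_train:n_train + n_val]
    test = titles[n_train + n_val:]              # remaining ~9%
    return train, val, test
```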

3 Model design & evaluation

Model design

The BiLSTM-CRF architecture has been widely used in early NER research [1], where BiLSTM captures bidirectional context and the CRF layer enforces label consistency (e.g., "I-" must follow "B-"). For Chinese NER, character-level embeddings are preferred over word-level ones due to the language's lack of explicit word boundaries.
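
The sketch below shows the character-embedding, BiLSTM, and per-step projection parts of such a tagger in Keras; the CRF layer that would sit on top of the emission scores is only indicated in a comment, since its implementation comes from add-on packages. All sizes and names are illustrative assumptions, not our exact configuration.

```python
# Minimal sketch of a character-level BiLSTM tagger that produces per-character
# emission scores for a CRF layer; vocabulary size, dimensions, and tag count
# are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers

vocab_size = 6000   # number of distinct characters (assumption)
num_tags = 13       # e.g. B/I/E/S for three entity types, plus O

char_ids = tf.keras.Input(shape=(None,), dtype="int32")
x = layers.Embedding(vocab_size, 128, mask_zero=True)(char_ids)        # character embeddings
x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)   # bidirectional context
emissions = layers.TimeDistributed(layers.Dense(num_tags))(x)          # per-character label scores
# A CRF layer (from an add-on library) would be stacked on `emissions` to
# enforce valid transitions, e.g. that an "E-" tag follows a "B-" or "I-" tag.
model = tf.keras.Model(char_ids, emissions)
```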

The introduction of BERT [2] marked a shift in NER modeling. Unlike a BiLSTM, which processes text in two separate directions, BERT captures bidirectional context natively and excels at modeling long-range dependencies. This has led to hybrid models such as BERT + BiLSTM + CRF.

In our experiments, removing the BiLSTM layer from this setup resulted in only a 0.1 drop in F1 score, while significantly reducing the parameter count and speeding up inference. This indicates that BERT alone often provides sufficient contextual understanding for effective NER, making an additional BiLSTM layer unnecessary in many cases.

Evaluation

Since our labeled dataset is not publicly available, we conducted a series of benchmark experiments on the People's Daily (Renmin Ribao) dataset, a widely used corpus for Chinese NER. This dataset includes annotations for three common entity types: Organization (ORG), Person (PER), and Location (LOC). For evaluation, we adopted the standard metrics of Precision, Recall, and F1-score, reported per entity type as well as micro- and macro-averaged. The experimental results for the three models (BiLSTM + CRF, BiLSTM + TimeDistributed Dense + CRF, and BERT + TimeDistributed Dense + CRF) are presented below:

BiLSTM + CRF

              Precision   Recall    F1-score
  LOC         0.8493      0.8653    0.8572
  PER         0.9121      0.8604    0.8855
  ORG         0.7898      0.8012    0.7955
  micro avg   0.8462      0.8455    0.8458
  macro avg   0.8473      0.8455    0.8461

BiLSTM + TimeDistributed Dense + CRF

              Precision   Recall    F1-score
  LOC         0.8671      0.8586    0.8629
  ORG         0.8168      0.7942    0.8053
  PER         0.8837      0.8759    0.8798
  micro avg   0.8568      0.8441    0.8504
  macro avg   0.8565      0.8441    0.8503

BERT + TimeDistributed Dense + CRF

              Precision   Recall    F1-score
  LOC         0.9329      0.9426    0.9377
  ORG         0.8918      0.8869    0.8894
  PER         0.9751      0.9687    0.9719
  micro avg   0.9313      0.9328    0.9320
  macro avg   0.9313      0.9328    0.9320

To map BERT embeddings to per-token label scores that the CRF layer can consume, the same projection must be applied at every time step for every sample. TimeDistributed(Dense(...)) ensures that the same Dense layer is applied independently to each character in the sequence.
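
As a concrete illustration (the hidden size of 768 and the tag count are assumptions tied to the particular BERT variant and tag set):

```python
# Sketch of the per-time-step projection described above: the same Dense weights
# map each contextual embedding to label (emission) scores for the CRF layer.
import tensorflow as tf
from tensorflow.keras import layers

num_tags = 13                                       # assumption: BIEOS tags for 3 types + O
bert_outputs = tf.keras.Input(shape=(None, 768))    # (batch, seq_len, hidden) from BERT
emissions = layers.TimeDistributed(layers.Dense(num_tags))(bert_outputs)
# `emissions` has shape (batch, seq_len, num_tags); a CRF layer then decodes
# these per-character scores into a consistent tag sequence.
projection = tf.keras.Model(bert_outputs, emissions)
```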

The results clearly show that BERT significantly outperforms BiLSTM-based models across all metrics. This performance gain is largely attributed to BERT’s pretraining on large-scale corpora, which enables it to capture deep contextual representations with minimal task-specific labeled data.

In practice, BERT demonstrates strong performance even with limited annotated data, whereas BiLSTM models require more extensive labeled datasets to achieve comparable results. This highlights BERT's efficiency and effectiveness in low-resource NER scenarios, making it a favorable choice for real-world applications where annotated data is scarce.

4 References

[1] Lample, Guillaume, et al. "Neural architectures for named entity recognition." arXiv preprint arXiv:1603.01360 (2016).

[2] Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.
