Coreference Resolution System

Building a Python Coreference Resolution System from Scratch
TL;DR
Developed a robust coreference resolution engine using spaCy, HuggingFace Transformers, and FastAPI
Created a simple Streamlit UI for real-time visualization
Integrated rule-based and neural methods, evaluation scripts, and test cases
Shared the full source code, setup instructions, and bonus features on GitHub: https://github.com/AyaanShaheer/coreference-resolution-system
1. Why Coreference Resolution?
Coreference resolution identifies when different expressions refer to the same entity, e.g., "Alice" and "She." It's essential for natural language understanding, chatbots, summarization, and more.
2. Project Architecture
coreference-resolution-system/
│
├── coref_model/
│   └── resolver.py        # Core logic (rule-based + Transformers)
│
├── api/
│   └── main.py            # FastAPI service endpoint
│
├── app/
│   └── main.py            # Streamlit UI
│
├── evaluation/
│   └── evaluate.py        # Metrics (MUC, B³, CEAF) + sample cases
│
├── tests/
│   └── test_coref.py      # pytest unit tests
│
└── README.md              # Setup & usage
3. Core Algorithm
Rule-based layers:
Gender & number agreement
Proximity scoring (closer mentions get a stronger link score)
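The two rule-based layers can be sketched as a filter-then-rank step: discard earlier mentions that disagree in gender or number, then prefer the nearest survivor. This is a minimal illustration, not the code in resolver.py; the `Mention` record, its feature values, and the `1 / (1 + distance)` score are hypothetical choices.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mention:
    """Hypothetical mention record: token offset plus agreement features."""
    text: str
    start: int                 # token index where the mention begins
    gender: Optional[str]      # "male", "female", "neuter", or None if unknown
    number: Optional[str]      # "singular", "plural", or None if unknown


def agrees(a: Mention, b: Mention) -> bool:
    """Gender and number must match unless one side is unknown (None)."""
    gender_ok = a.gender is None or b.gender is None or a.gender == b.gender
    number_ok = a.number is None or b.number is None or a.number == b.number
    return gender_ok and number_ok


def proximity_score(anaphor: Mention, candidate: Mention) -> float:
    """Closer candidates score higher; 1 / (1 + distance) is an assumed formula."""
    distance = anaphor.start - candidate.start
    return 1.0 / (1.0 + distance)


def best_antecedent(anaphor: Mention, mentions: list[Mention]) -> Optional[Mention]:
    """Pick the highest-scoring earlier mention that passes the agreement filter."""
    candidates = [
        m for m in mentions
        if m.start < anaphor.start and agrees(anaphor, m)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda m: proximity_score(anaphor, m))


mentions = [
    Mention("Alice", 0, "female", "singular"),
    Mention("Bob", 2, "male", "singular"),
    Mention("the kids", 5, None, "plural"),
    Mention("She", 8, "female", "singular"),
]
she = mentions[-1]
print(best_antecedent(she, mentions).text)  # Alice
```

"Bob" fails gender agreement and "the kids" fails number agreement, so "Alice" is chosen even though it is the farthest candidate.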
Neural layer (optional):
Span-based model over RoBERTa embeddings
Optional; can be enabled for higher accuracy
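In a span-based neural model, each candidate span is represented by vectors built from its contextual token embeddings, and each pair of spans receives an antecedent score. The sketch below shows only the shape of that computation with toy numbers: the hand-written 2-dimensional vectors stand in for RoBERTa outputs, the dot product stands in for a trained scoring network, and the fixed "dummy" threshold of 0.0 (meaning "no antecedent") is an assumed simplification.

```python
def span_embedding(token_vectors: list[list[float]], start: int, end: int) -> list[float]:
    """Represent span [start, end] (inclusive) as [first token ; last token ; mean]."""
    first = token_vectors[start]
    last = token_vectors[end]
    width = end - start + 1
    mean = [sum(vec[d] for vec in token_vectors[start:end + 1]) / width
            for d in range(len(first))]
    return first + last + mean


def pair_score(span_a: list[float], span_b: list[float]) -> float:
    """Toy antecedent score: a dot product in place of a trained scorer."""
    return sum(x * y for x, y in zip(span_a, span_b))


def rank_antecedents(token_vectors, spans, anaphor_index, dummy_score=0.0):
    """Return earlier span indices that beat the dummy score, best first."""
    anaphor = span_embedding(token_vectors, *spans[anaphor_index])
    scored = []
    for i in range(anaphor_index):
        score = pair_score(span_embedding(token_vectors, *spans[i]), anaphor)
        if score > dummy_score:
            scored.append((score, i))
    return [i for _, i in sorted(scored, reverse=True)]


# Toy "contextual embeddings" for: Alice met Bob . She smiled
token_vectors = [
    [1.0, 0.0],   # Alice
    [0.0, 0.2],   # met
    [0.0, 1.0],   # Bob
    [0.0, 0.0],   # .
    [0.9, -0.1],  # She
    [0.1, 0.1],   # smiled
]
spans = [(0, 0), (2, 2), (4, 4)]  # Alice, Bob, She
print(rank_antecedents(token_vectors, spans, 2))  # [0]
```

In a real model the embeddings come from the transformer and the scorer is learned, but the structure (span vectors, pairwise scores, a "no antecedent" option) is the same idea.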
Pipeline:
Tokenize & identify mentions using spaCy
Apply rule- or neural-based linking
Output chains of coreferent spans
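The last pipeline step turns pairwise links into chains. A minimal way to do this, assuming the linker emits hypothetical (mention, antecedent) index pairs, is union-find: linked mentions merge into one cluster, and every cluster with more than one mention becomes a chain. The pair format is an assumption here, not necessarily what resolver.py returns.

```python
def build_chains(num_mentions: int, links: list[tuple[int, int]]) -> list[list[int]]:
    """Group mention indices into coreference chains from pairwise links."""
    parent = list(range(num_mentions))

    def find(i: int) -> int:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for mention, antecedent in links:
        root_a, root_b = find(mention), find(antecedent)
        if root_a != root_b:
            parent[root_a] = root_b

    clusters: dict[int, list[int]] = {}
    for i in range(num_mentions):
        clusters.setdefault(find(i), []).append(i)

    # Singletons are not chains; order chains by their first mention.
    return sorted(
        (members for members in clusters.values() if len(members) > 1),
        key=lambda members: members[0],
    )


mentions = ["Alice", "Bob", "She", "him", "her"]
links = [(2, 0), (3, 1), (4, 2)]  # She->Alice, him->Bob, her->She
for chain in build_chains(len(mentions), links):
    print([mentions[i] for i in chain])
# ['Alice', 'She', 'her']
# ['Bob', 'him']
```

Because links are transitive, "her" joins Alice's chain through "She" without ever being linked to "Alice" directly.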
4. Testing & Benchmarking
Unit tests included (pytest; test cases such as "Alice … She")
The evaluation script scores predicted chains against gold annotations using the MUC, B³, and CEAF metrics
Target: >80% F1 on CoNLL-2012
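The first two metrics are short enough to sketch directly. This is a from-scratch illustration, not evaluate.py, and it leaves out CEAF, which needs an optimal alignment between gold and predicted chains. Chains are sets of mentions; a mention here is a hypothetical (start, end) token-offset pair. MUC recall asks how many links of each gold chain survive: a chain of size |K| split into p pieces by the predicted chains keeps |K| - p of its |K| - 1 links. B³ instead averages per-mention overlap. Precision swaps the roles of gold and predicted chains.

```python
def _f1(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def _muc_side(gold: list[set], predicted: list[set]) -> float:
    """MUC recall of `predicted` against `gold` (swap arguments for precision)."""
    numerator = denominator = 0
    for cluster in gold:
        # Count the pieces this gold chain is split into; a mention absent
        # from every predicted chain counts as its own piece.
        pieces = set()
        for mention in cluster:
            owner = next((i for i, p in enumerate(predicted) if mention in p), None)
            pieces.add(("missing", mention) if owner is None else owner)
        numerator += len(cluster) - len(pieces)
        denominator += len(cluster) - 1
    return numerator / denominator if denominator else 0.0


def muc(key: list[set], response: list[set]) -> tuple[float, float, float]:
    recall = _muc_side(key, response)
    precision = _muc_side(response, key)
    return precision, recall, _f1(precision, recall)


def b_cubed(key: list[set], response: list[set]) -> tuple[float, float, float]:
    """B-cubed computed at the chain level: sum of |K & R|^2 / |K| over chain pairs."""
    def side(a: list[set], b: list[set]) -> float:
        total = sum(len(c) for c in a)
        score = sum(len(c & d) ** 2 / len(c) for c in a for d in b)
        return score / total if total else 0.0

    recall = side(key, response)
    precision = side(response, key)
    return precision, recall, _f1(precision, recall)


key = [{(0, 0), (4, 4), (9, 9)}]          # gold: Alice, she, her
response = [{(0, 0), (4, 4)}, {(9, 9)}]   # system missed the "her" link
for name, metric in (("MUC", muc), ("B3", b_cubed)):
    p, r, f = metric(key, response)
    print(f"{name:4} P={p:.2f} R={r:.2f} F1={f:.2f}")
# MUC  P=1.00 R=0.50 F1=0.67
# B3   P=1.00 R=0.56 F1=0.71
```

Splitting one gold chain costs recall under both metrics but no precision, since every predicted link is correct.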
5. Interactive and API Access
Streamlit UI
Color-coded highlights of chains
Real-time UI via streamlit run app/main.py
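The color-coded highlighting can be sketched as a function that wraps each mention in a colored HTML `<mark>`, one color per chain. This assumes chains are lists of inclusive (start, end) token spans that do not overlap or nest; the palette and function name are hypothetical. In Streamlit, the resulting string could be rendered with `st.markdown(html, unsafe_allow_html=True)`.

```python
from html import escape

# Hypothetical palette; any set of distinguishable colors works.
PALETTE = ["#ffd6a5", "#caffbf", "#9bf6ff", "#bdb2ff", "#ffc6ff"]


def highlight_chains(tokens: list[str], chains: list[list[tuple[int, int]]]) -> str:
    """Return HTML with each mention wrapped in a <mark> colored by its chain.

    Assumes mentions are inclusive token spans that do not overlap or nest.
    """
    starts: dict[int, str] = {}
    ends: set[int] = set()
    for chain_id, chain in enumerate(chains):
        color = PALETTE[chain_id % len(PALETTE)]
        for start, end in chain:
            starts[start] = f'<mark style="background-color:{color}">'
            ends.add(end)

    html_parts = []
    for i, token in enumerate(tokens):
        piece = escape(token)  # never inject raw user text into HTML
        if i in starts:
            piece = starts[i] + piece
        if i in ends:
            piece += "</mark>"
        html_parts.append(piece)
    return " ".join(html_parts)


tokens = ["Alice", "said", "she", "was", "tired", "."]
chains = [[(0, 0), (2, 2)]]
print(highlight_chains(tokens, chains))
```

Joining tokens with spaces is a simplification (it produces "tired ."); a fuller version would keep spaCy's trailing-whitespace information.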
FastAPI Endpoint
POST /resolve with JSON {"text": "..."} returns coreference chains
Interactive docs auto-generated at /docs
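A standard-library client for the endpoint might look like this. It assumes only what is described above (POST /resolve with a JSON body containing "text") plus uvicorn's default address, http://127.0.0.1:8000. The response is returned as parsed JSON without assuming its structure, and the function names are hypothetical.

```python
import json
import urllib.request

# Assumption: the API runs at uvicorn's default host and port.
API_URL = "http://127.0.0.1:8000"


def build_resolve_request(text: str, base_url: str = API_URL) -> urllib.request.Request:
    """Build a POST /resolve request whose JSON body is {"text": text}."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/resolve",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def resolve(text: str, base_url: str = API_URL, timeout: float = 10.0):
    """Send the request and return the decoded JSON response as-is."""
    request = build_resolve_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API running (uvicorn api.main:app --reload), calling resolve("Alice said she was tired.") returns whatever chain structure the endpoint defines; the /docs page shows the exact response schema.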
6. Setup & Usage
git clone https://github.com/AyaanShaheer/coreference-resolution-system.git
cd coreference-resolution-system
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
# Run UI:
streamlit run app/main.py
# Run API:
uvicorn api.main:app --reload
7. Results & Next Steps
Achieved solid baseline performance
Documented limitations (nested entities, pronoun ambiguity)
Roadmap:
Better neural model training
Multilingual expansion
Docker / CI/CD pipelines (Jenkins/GitHub Actions)
8. Final Thoughts
This project demonstrates how to progress from rule-based heuristics to a neural coreference system, complete with visual tools and web interfaces. It's modular, extensible, and offers a hands-on grasp of real-world NLP pipelines.
Want to try it out?
GitHub repo: https://github.com/AyaanShaheer/coreference-resolution-system
Happy to answer any questions in the comments!
Written by Ayaan Shaheer
