Coreference Resolution System

Ayaan Shaheer

🔍 Building a Python Coreference Resolution System from Scratch

🚀 TL;DR

  • Developed a robust coreference resolution engine using spaCy, HuggingFace Transformers, and FastAPI

  • Created a simple Streamlit UI for real-time visualization

  • Integrated rule-based and neural methods, evaluation scripts, and test cases

  • Shared the full source code, instructions, and bonus features on GitHub ← link here


📌 1. Why Coreference Resolution?

Coreference resolution identifies when different expressions refer to the same entity, e.g., “Alice” and “She.” It’s essential for natural language understanding, chatbots, summarization, and more.


🧱 2. Project Architecture

coreference-resolution-system/
│
├── coref_model/
│   └── resolver.py      # Core logic (rule-based + Transformers)
│
├── api/
│   └── main.py          # FastAPI service endpoint
│
├── app/
│   └── main.py          # Streamlit UI
│
├── evaluation/
│   └── evaluate.py      # Metrics (MUC, B³, CEAF) + sample cases
│
├── tests/
│   └── test_coref.py    # pytest unit tests
│
└── README.md            # Setup & usage

🧠 3. Core Algorithm

Rule-based layers:

  • Gender & number agreement

  • Proximity scoring (closer mentions stronger link)
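To make the two rule-based layers concrete, here is a minimal sketch of how agreement filtering and proximity scoring could fit together. It is not the code in resolver.py: the `Mention` class, the `link_score` and `best_antecedent` helpers, and the `1 / (1 + distance)` decay are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mention:
    """Hypothetical mention record; field names are illustrative."""
    text: str
    start: int                      # token index of the mention's first token
    gender: Optional[str] = None    # "masc", "fem", "neut", or None if unknown
    number: Optional[str] = None    # "sing", "plur", or None if unknown


def agrees(a: Optional[str], b: Optional[str]) -> bool:
    """Unknown (None) is compatible with anything; known values must match."""
    return a is None or b is None or a == b


def link_score(pronoun: Mention, candidate: Mention) -> float:
    """Return 0.0 if the link is ruled out, else a score that decays with distance."""
    if candidate.start >= pronoun.start:
        return 0.0  # only earlier mentions are considered as antecedents
    if not (agrees(pronoun.gender, candidate.gender)
            and agrees(pronoun.number, candidate.number)):
        return 0.0  # gender or number mismatch
    distance = pronoun.start - candidate.start
    return 1.0 / (1.0 + distance)  # closer mentions get a stronger link


def best_antecedent(pronoun: Mention, candidates: list) -> Optional[Mention]:
    """Pick the highest-scoring candidate, or None if every link is ruled out."""
    scored = [(link_score(pronoun, c), c) for c in candidates]
    score, best = max(scored, key=lambda pair: pair[0], default=(0.0, None))
    return best if score > 0.0 else None


alice = Mention("Alice", 0, "fem", "sing")
bob = Mention("Bob", 2, "masc", "sing")
she = Mention("She", 5, "fem", "sing")
print(best_antecedent(she, [alice, bob]).text)  # Alice
```

Treating unknown gender or number as compatible keeps the filter conservative: a candidate is only rejected when both sides carry conflicting values.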

Neural layer (optional):

  • Span-based model over RoBERTa embeddings

  • Easy to enable for improved accuracy

Pipeline:

  1. Tokenize & identify mentions using spaCy

  2. Apply rule- or neural-based linking

  3. Output chains of coreferenced spans
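Step 3 can be sketched as follows, assuming the linking step produces pairwise (antecedent, anaphor) links by mention index. The `build_chains` helper is hypothetical and uses a small union-find to merge linked mentions into chains; the project itself may cluster differently.

```python
def build_chains(mentions, links):
    """Group mentions into chains from pairwise (antecedent, anaphor) index links."""
    parent = list(range(len(mentions)))

    def find(i):
        # Walk up to the root, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Attach the later root to the earlier one so the first mention leads.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    chains = {}
    for i, mention in enumerate(mentions):
        chains.setdefault(find(i), []).append(mention)
    # Singletons are dropped: a chain needs at least two mentions.
    return [chain for chain in chains.values() if len(chain) > 1]


mentions = ["Alice", "Bob", "She", "her", "He"]
links = [(0, 2), (2, 3), (1, 4)]
print(build_chains(mentions, links))
# [['Alice', 'She', 'her'], ['Bob', 'He']]
```

Because links are merged transitively, "her" joins Alice's chain even though it was only linked to "She".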


🧪 4. Testing & Benchmarking

  • Unit tests included (pytest; test examples like “Alice … She”)

  • Evaluation script scores system output against benchmark data using the MUC, B³, and CEAF metrics

  • Target: >80% F1 on CoNLL-2012
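As an illustration of one of these metrics, the sketch below computes B³ precision, recall, and F1 for two clusterings. It is a simplified stand-in for evaluate.py, not a copy of it: the `b_cubed` function name and the lists-of-sets input format are assumptions, and reported CoNLL-2012 numbers should come from the official reference scorer.

```python
def b_cubed(key, response):
    """Return (precision, recall, f1) for gold `key` and system `response` clusters.

    Each argument is a list of sets of hashable mention ids, e.g. (start, end) tuples.
    """
    key_of = {m: frozenset(c) for c in key for m in c}
    resp_of = {m: frozenset(c) for c in response for m in c}

    def average(own, other):
        # For each mention, the share of its own cluster that the other side also has.
        if not own:
            return 0.0
        total = sum(
            len(own[m] & other.get(m, frozenset())) / len(own[m]) for m in own
        )
        return total / len(own)

    recall = average(key_of, resp_of)      # averaged over gold mentions
    precision = average(resp_of, key_of)   # averaged over system mentions
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = [{"a", "b", "c"}, {"d", "e"}]
system = [{"a", "b"}, {"c", "d", "e"}]
p, r, f = b_cubed(gold, system)
print(round(p, 4), round(r, 4), round(f, 4))  # 0.7333 0.7333 0.7333
```

In this example the system moved one mention ("c") into the wrong cluster, which costs both precision and recall and gives 11/15 for each.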


🎨 5. Interactive and API Access

Streamlit UI

  • Color-coded highlights of chains

  • Real-time UI via streamlit run app/main.py
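One way to produce color-coded highlights is to wrap each mention in a styled HTML span, one color per chain. The sketch below assumes chains arrive as lists of (start, end) character offsets; the `highlight` helper and the `PALETTE` colors are hypothetical and not taken from app/main.py.

```python
import html

PALETTE = ["#ffd54f", "#81d4fa", "#a5d6a7", "#f48fb1"]  # illustrative colors


def highlight(text, chains):
    """Return HTML in which every mention is wrapped in a span colored by chain.

    `chains` is a list of chains, each a list of (start, end) character offsets.
    """
    spans = sorted(
        (start, end, idx)
        for idx, chain in enumerate(chains)
        for start, end in chain
    )
    parts, cursor = [], 0
    for start, end, idx in spans:
        if start < cursor:
            continue  # this simple sketch skips overlapping or nested spans
        parts.append(html.escape(text[cursor:start]))
        color = PALETTE[idx % len(PALETTE)]
        parts.append(
            f'<span style="background:{color}">{html.escape(text[start:end])}</span>'
        )
        cursor = end
    parts.append(html.escape(text[cursor:]))
    return "".join(parts)


print(highlight("Alice said she won.", [[(0, 5), (11, 14)]]))
```

Inside a Streamlit app, the returned string could be rendered with `st.markdown(highlight(text, chains), unsafe_allow_html=True)`.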

FastAPI Endpoint

  • POST /resolve with JSON {"text": "..."} returns coreference chains

  • Interactive API docs auto-generated at /docs
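For a quick check from Python, a client can be sketched with the standard library alone. The base URL below assumes uvicorn's default host and port; `build_request` and `resolve` are hypothetical helper names, and since the response schema is not described here, the sketch simply returns whatever JSON the service sends back.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumption: uvicorn's default host and port


def build_request(text, base_url=BASE_URL):
    """Build the POST /resolve request with a {"text": ...} JSON body."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/resolve",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def resolve(text, base_url=BASE_URL):
    """Send the request and return the decoded JSON response (schema not assumed)."""
    with urllib.request.urlopen(build_request(text, base_url), timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API running, `resolve("Alice said she won.")` would return the coreference chains for that sentence.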


🛠️ 6. Setup & Usage

git clone https://github.com/your-gh/coreference-resolution-system.git
cd coreference-resolution-system
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# Run UI:
streamlit run app/main.py

# Run API:
uvicorn api.main:app --reload

📈 7. Results & Next Steps

  • Achieved solid baseline performance

  • Documented limitations (nested entities, pronoun ambiguity)

  • Roadmap:

    • Better neural model training

    • Multilingual expansion

    • Docker / CI/CD pipelines (Jenkins/GitHub Actions)


✍️ 8. Final Thoughts

This project shows how to build up from rule-based heuristics to a neural coreference system, complete with visual tools and web interfaces. It’s modular, extensible, and gives a real-world, hands-on grasp of NLP pipelines.


💬 Want to try it out?

