Arquibot: From Wikimedia Edits to Web Archives

Naomi Ibe

It's already Week 5 of my Outreachy internship at Wikimedia, and time really does fly when you're learning so much! In this blog, I’ll walk you through what I’ve been working on: a project called Arquibot. I’ll explain what Arquibot is, why it matters, and how it connects to the broader Wikimedia mission. Whether you're curious about Outreachy, bots, or just love behind-the-scenes tech, I hope this post gives you a glimpse into the exciting world I’ve been exploring.

Have you ever clicked a link on Wikipedia only to find it broken or missing? Wikipedia articles rely heavily on external links for citations, but over time many of those links break or disappear, leaving readers unable to check the sources. This is called link rot, and it’s a huge challenge because Wikipedia depends on reliable, verifiable sources to stay credible.

Our project, Arquibot, is designed to help fix this problem by automatically finding and archiving these external references. Arquibot scans recent article edits for newly added links and submits their URLs to the Wayback Machine for archiving. This means that even if an original link later becomes inaccessible, the archived copy remains available. Arquibot is part of a larger effort within the Wikimedia community to maintain the quality and verifiability of content. Once completed, it will work alongside other bots and tools that handle tasks like reverting vandalism or updating templates.
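To make that flow concrete, here is a minimal Python sketch of the two steps: asking the MediaWiki API for recent article edits, and asking the Wayback Machine's Save Page Now endpoint to snapshot a URL. This is my own simplified illustration, not Arquibot's actual code; the function names are hypothetical, and the real bot handles pagination, filtering, and API etiquette that I've left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PTWIKI_API = "https://pt.wikipedia.org/w/api.php"

def recent_changes_params():
    """Parameters for MediaWiki's list=recentchanges query (article edits only)."""
    return {
        "action": "query",
        "list": "recentchanges",
        "rcnamespace": 0,            # 0 = main/article namespace
        "rctype": "edit|new",
        "rcprop": "title|ids|timestamp",
        "rclimit": 50,
        "format": "json",
    }

def titles_from_response(payload):
    """Pull the edited page titles out of a recentchanges JSON response."""
    changes = payload.get("query", {}).get("recentchanges", [])
    return [change["title"] for change in changes]

def fetch_recent_titles():
    """Query the live API and return recently edited article titles."""
    url = PTWIKI_API + "?" + urlencode(recent_changes_params())
    with urlopen(url, timeout=30) as resp:
        return titles_from_response(json.load(resp))

def archive_url(url, timeout=60):
    """Ask the Wayback Machine's Save Page Now endpoint to snapshot a URL."""
    with urlopen("https://web.archive.org/save/" + url, timeout=timeout) as resp:
        return resp.status == 200
```

A bot run would then loop over the recent titles, diff each edit for new links, and call something like `archive_url` on each one.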

By automating link archiving, editors benefit because they no longer have to manually archive every link they add. Readers benefit too, because references remain accessible, supporting Wikipedia’s credibility. More broadly, it helps preserve knowledge, ensuring that useful information doesn’t disappear just because a website changes or goes offline. Arquibot also logs key statistics, like how many links were checked and archived, so the community can track its impact.
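Those statistics could be tracked with something as simple as a counter object. The sketch below is my own illustration of the idea, not the bot's real logging code; the field names and summary format are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class RunStats:
    """Counters a bot run could log for community reporting."""
    links_checked: int = 0
    links_archived: int = 0
    failures: int = 0

    def record(self, archived: bool):
        """Count one attempted link, noting whether archiving succeeded."""
        self.links_checked += 1
        if archived:
            self.links_archived += 1
        else:
            self.failures += 1

    def summary(self) -> str:
        """One-line, timestamped summary suitable for a log page or webpage."""
        ts = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
        return (f"[{ts}] checked={self.links_checked} "
                f"archived={self.links_archived} failed={self.failures}")
```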

For me, the most exciting part is building a tool that will make a real difference in preserving knowledge for everyone. I’ve been learning a lot about how Portuguese Wikipedia templates work, how to interact with APIs like the MediaWiki API and the Wayback Machine API, and how to write clean, maintainable Python code. It’s also inspiring to collaborate with a global community and contribute to a tool that will be widely used.

That said, it hasn’t been all smooth sailing. Over the past five weeks I’ve had to figure out how to archive only the URLs we actually need, how to get different APIs to work together, how to properly log bot activity and show real-time stats on a webpage, how to detect recent Wikipedia edits and extract reference links from them, and how to retrieve and access the archived links. At first I was completely lost and didn’t know where to start; I had never heard of the Wayback Machine or citation templates like {{citar web}}. But with the guidance of my mentors, I broke the project down into smaller pieces and tackled each challenge one step at a time. Slowly, it started to make sense.
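As an example of the link-extraction step: on Portuguese Wikipedia, references often arrive inside templates like {{citar web |url=... |titulo=...}}, so one rough way to find them is to pull the |url= values out of the wikitext with a regular expression. This pattern is a deliberate simplification of my own; a real bot would lean on a proper wikitext parser such as mwparserfromhell.

```python
import re

# Matches |url= parameters inside citation templates such as {{citar web}}.
# Simplified: stops at whitespace, "|" or "}", and ignores template nesting.
URL_PARAM = re.compile(r"\|\s*url\s*=\s*(https?://[^\s|}]+)")

def extract_citation_urls(wikitext: str) -> list[str]:
    """Return the external URLs cited in a page's wikitext, in order."""
    return URL_PARAM.findall(wikitext)
```

Running it over a snippet like `{{citar web |url=https://example.com/artigo |titulo=Artigo}}` would yield the cited URL, ready to be handed to the archiving step.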

Being part of the Wikimedia community has been very rewarding; it has pushed me past the limits of what I thought I could do. I'm inspired by the global team behind it and proud to contribute to something so meaningful. Arquibot might seem like a small tool right now, but it will play a big role in keeping Wikipedia reliable and trustworthy. I can’t wait to see how much more I’ll learn and build in the weeks ahead!
