My Journey Through Cohere Labs ML Summer School: From Beginner to AI Enthusiast

Table of contents
- What is the Cohere Labs ML Summer School?
- A Quick Look at the Sessions
- Session 1: ML Math Refresher
- 🔍 Session 2: Introduction to Embeddings & Retrieval
- Session 3: Introduction to Transformers and the Evolution of Large Language Models
- Session 4: Scaling Self-Supervised Learning for Vision — An Introduction to DINOv2
- Session 5: A Temperature Check on Web Agents
- Session 6: Test-Time Scaling Small LMs to o1 Level
- Session 7: Secret Life of Noise — Understanding Diffusion Models
- Session 8: Understanding Transformers via N-gram Statistics
- Session 9: Distributed Training in Machine Learning
- Session 10: Research Mentorship
- Session 11: ML Open Science Social
- Reflections: More Than Just a Summer School
- What’s Next?
- Explore More & Stay Connected
Ever wondered how Netflix knows exactly what show you'll binge next, or how self-driving cars "see" the road? That's the magic of machine learning (ML for short) — a type of AI where computers spot patterns in data all on their own. If that sounds intriguing but a bit mysterious, you're in the right place!
I recently joined the Cohere Labs Open Science Community ML Summer School, a super accessible program designed for anyone curious about machine learning. As a complete beginner, I stumbled upon it and immediately got excited. Honestly, it turned out to be one of the best ways I could’ve spent my summer.
Every session was packed with cool ideas, exciting research paper reviews, and mind-blowing concepts. Sure, there were moments when I didn’t fully understand everything, but I learned to embrace the confusion, trust the process, and just keep exploring. And that made all the difference.
What is the Cohere Labs ML Summer School?
The Cohere Labs Open Science Community ML Summer School is based on a simple but powerful idea: machine learning should be accessible to anyone, no matter their background, location, or experience level. It’s about learning together, staying curious, and building with others.
This summer, Cohere Labs launched an amazing learning initiative featuring speakers from INRIA, Meta (FAIR), Google DeepMind, Cohere, and more. These are some of the leading minds in the field, and they shared insights on topics like foundation models, retrieval systems, multimodal learning, and even how AI can be used for social good.
The best part? It was completely open and beginner-friendly. Whether you were just getting started or already experimenting with models, there was something for everyone. At the end of the program, every participant received a digital certificate recognizing their participation.
But what really made it special was the community. Being part of a global group of learners who were all equally excited to explore ML made the experience even more inspiring. It wasn’t just about learning concepts — it was about growing together.
A Quick Look at the Sessions
The summer school was structured around a series of live sessions, each led by experts working at the cutting edge of machine learning. Every session brought something new, from core concepts to advanced techniques, and made them surprisingly approachable.
Session 1: ML Math Refresher
Speaker: Katrina Lawrence
Applied Mathematician
Topic: Foundational Math for Machine Learning
We began the summer school with a back-to-basics session that felt essential, especially for someone like me coming in without a strong math background.
Katrina walked us through the core mathematical concepts that underpin most machine learning algorithms:
Derivatives
Vector Calculus
Linear Algebra
What I appreciated most was how clearly she explained things. She emphasized that understanding these basics isn’t about memorizing formulas, but about building intuition. Whether it’s calculating gradients for optimization or working with matrices in neural networks, this session helped demystify the math behind ML.
If you're looking for an approachable way to refresh your math skills, check out her YouTube channel, Math Unlocked.
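To make the "gradients for optimization" idea concrete for myself, I put together a tiny toy example afterwards (my own illustration, not something from the session): gradient descent on a one-variable function.

```python
# Minimize f(w) = (w - 3)^2 by repeatedly stepping against the derivative.

def f(w):
    return (w - 3) ** 2

def df_dw(w):
    return 2 * (w - 3)          # derivative of (w - 3)^2

w = 0.0                         # starting guess
learning_rate = 0.1

for step in range(50):
    w -= learning_rate * df_dw(w)   # move downhill along the gradient

print(f"w after training: {w:.4f}, f(w) = {f(w):.6f}")   # w ends up close to 3
```

The same loop, with vectors and matrices in place of a single number, is essentially what training a neural network boils down to.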
🔍 Session 2: Introduction to Embeddings & Retrieval
Speaker: Nils Reimers
VP of AI Search at Cohere
Topic: How Embeddings Power Modern Search Systems
In the second session, we dove into the world of embeddings and neural retrieval with Nils Reimers, who leads AI Search at Cohere.
This was a big shift from theory to real-world application. Nils explained how transformer-based models (like BERT) are used to generate embeddings — dense numerical representations that capture the meaning of text. These embeddings allow models to search and compare information in a far more nuanced way than traditional keyword methods.
Key concepts he covered:
Retriever + Ranker architecture
Dense vs. Sparse Embeddings
Challenges with limited labeled data
Context Engineering for better search relevance
What stood out most to me was how context engineering and smart architecture choices can make or break a search system. It was a powerful reminder that great models alone aren’t enough; how you use them really matters.
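To see what the retriever step looks like in practice, here’s a toy sketch I wrote after the session. The vectors are made up; in a real system they would come from an embedding model like the ones Nils described, and a ranker would then re-score the top hits.

```python
import numpy as np

# Toy retriever step: rank documents by cosine similarity between embeddings.
doc_embeddings = {
    "how to bake bread":        np.array([0.9, 0.1, 0.0]),
    "training neural networks": np.array([0.1, 0.8, 0.3]),
    "gradient descent basics":  np.array([0.2, 0.7, 0.4]),
}
query_embedding = np.array([0.15, 0.75, 0.35])   # pretend this encodes "how do I optimize a model?"

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

ranked = sorted(
    doc_embeddings.items(),
    key=lambda item: cosine_similarity(query_embedding, item[1]),
    reverse=True,
)
for doc, emb in ranked:
    print(f"{doc}: {cosine_similarity(query_embedding, emb):.3f}")
```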
Session 3: Introduction to Transformers and the Evolution of Large Language Models
Speaker: Siddhant Gupta
NLP Community Lead, Cohere Labs
Topic: How Transformers Changed the Game in NLP
On Day 2, we explored the architecture that powers modern AI: the Transformer, in a session led by Siddhant Gupta, who leads the NLP community at Cohere Labs.
We started with a brief history of models that came before transformers:
RNNs (Recurrent Neural Networks): Designed for sequential data, but struggled with long-term memory.
LSTMs (Long Short-Term Memory networks): An improvement over RNNs with gated memory units for handling longer sequences better.
Then came the real highlight — understanding Transformers, which completely reshaped how language models work. Instead of relying on sequential processing, transformers introduced attention mechanisms, allowing models to process entire sequences in parallel while still maintaining context.
Key Components of a Transformer:
Embedding Layer: Converts tokens (words or subwords) into dense vectors.
Positional Encoding: Adds information about token order to embeddings.
Self-Attention & Multi-Head Attention: Enables the model to focus on relevant words throughout the sequence.
Feedforward Neural Networks: Processes the attended information.
Stacked Layers: Allow deeper understanding by layering multiple transformer blocks.
Concepts Covered:
Tokenization: Splitting text into smaller units like words or subwords.
Word Embeddings: Vector representations that help the model understand meaning and relationships.
Attention Mechanisms:
Self-Attention: Each word attends to others in the same input.
Cross-Attention: Used in encoder-decoder architectures, where the decoder attends to the encoder’s outputs.
Transformer Variants:
We also explored different transformer-based models and how they work:
BERT (Encoder-only): For tasks like classification, sentiment analysis, and question answering.
GPT (Decoder-only): Ideal for text generation and conversational AI.
RAG (Retrieval-Augmented Generation): Merges retrieval and generation, useful for providing accurate and up-to-date responses.
Siddhant did a great job breaking down what can be an overwhelming topic into something digestible. I particularly appreciated how he showed real-world applications like Google Docs suggestions and Gmail’s Smart Compose using BERT, or how GPT is behind models like ChatGPT.
If you're interested in going through the session slides, you can check them out here.
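For my own notes, here is a minimal NumPy sketch of single-head scaled dot-product self-attention, the operation at the heart of every transformer block (the dimensions are made up purely for illustration):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a token sequence X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # how strongly each token attends to every other token
    weights = softmax(scores, axis=-1)        # rows sum to 1
    return weights @ V                        # each output is a weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                   # 4 tokens, embedding dimension 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (4, 8): one updated vector per token
```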
Session 4: Scaling Self-Supervised Learning for Vision — An Introduction to DINOv2
Speaker: Timothée Darcet
PhD Researcher, Meta AI (FAIR) & Inria
Topic: Self-Supervised Learning in Computer Vision with DINOv2
This session introduced us to the exciting world of self-supervised learning (SSL) in computer vision — a method where models learn to understand images without needing manually labeled data. Instead of relying on external annotations, these models generate their own pseudo-labels during training.
Timothée Darcet explained the motivation behind SSL and walked us through key techniques like contrastive learning and masked image modeling. The highlight was a deep dive into DINOv2, a cutting-edge SSL model used for learning high-quality visual representations.
Key Takeaways:
DINOv2 is trained on a curated dataset of 142 million images using a mix of loss functions (DINO, iBOT, KoLeo).
It outperforms models like CLIP in tasks such as segmentation and feature extraction.
Its general-purpose nature makes it suitable for specialized domains, including medical imaging.
DINOv2 is particularly strong in feature map quality and interpretability, enabling precise image understanding without labels.
This session helped bridge the gap between complex CV models and practical applications, offering a fresh perspective on how vision models are evolving beyond supervised learning.
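If you want to play with DINOv2 features yourself, the model is published on torch.hub. The snippet below is my own quick sketch based on the public facebookresearch/dinov2 repo; double-check the entry-point names there before relying on them.

```python
import torch

# Pull a pretrained DINOv2 backbone from torch.hub and extract features
# with no labels involved. Entry-point names are taken from the public
# facebookresearch/dinov2 repo; verify them there.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

# Dummy batch of one RGB image; DINOv2 uses 14x14 patches, so the sides
# should be multiples of 14 (224 = 16 * 14).
image = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    features = model(image)   # one embedding per image, usable for retrieval, clustering, etc.

print(features.shape)
```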
Session 5: A Temperature Check on Web Agents
Speaker: Lawrence Jang
Researcher at Meta
Topic: Autonomous Web Interaction with Language Models
With large language models gaining the ability to understand and generate text, the next frontier is getting them to act, especially on the web. In this session, Lawrence Jang explored the emerging field of LLM-powered web agents, which can autonomously navigate websites, click buttons, scroll pages, and even fill out forms using natural language instructions.
Highlights:
WebArena was introduced as a benchmark where humans achieve around 80% task success while LLM agents currently reach only around 14%, highlighting how early the field still is.
Advanced benchmarks like VisualWebArena and VideoWebArena extend evaluation to visual and video-based tasks.
ICAL is one approach that uses human feedback to fine-tune web agents for better task performance.
The session addressed major challenges:
Following instructions accurately
Aligning text with visual content
Memory and long-term planning
Preventing hallucinations and ensuring safe behavior
We also got a glimpse into practical tools like LangChain, and discussions on future directions such as multi-agent systems, visual grounding, and ethical considerations.
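To wrap my head around what a web agent actually does, I sketched the basic observe-decide-act loop below. Everything in it is a stub; a real agent would pair a browser driver with an LLM call, but the shape of the loop is the same.

```python
# The helper functions here are hypothetical placeholders, not a real browser or LLM API.

def get_page_text() -> str:
    return "A search box and a 'Submit' button"       # stand-in for a rendered page

def llm_choose_action(goal: str, observation: str) -> dict:
    # A real agent would prompt an LLM with the goal plus the observation
    # and parse its reply into a structured action.
    return {"type": "done"}

def run_agent(goal: str, max_steps: int = 10) -> None:
    for _ in range(max_steps):
        observation = get_page_text()                  # what the agent "sees"
        action = llm_choose_action(goal, observation)  # what it decides to do
        if action["type"] == "done":
            print("Agent believes the task is complete.")
            break
        # click / type / scroll actions would be executed against the browser here

run_agent("Find the pricing page")
```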
Together, these two sessions (4 & 5) gave us a look into the cutting edge of AI, from models that learn to see without supervision to agents that learn to act in the digital world. The possibilities, and the challenges, are both massive and inspiring.
Session 6: Test-Time Scaling Small LMs to o1 Level
Speaker: Isha Puri
AI PhD at MIT
Date: July 10, 2025
As large language models reach diminishing returns from scale, Isha Puri presented a compelling direction: achieving high performance at test time without retraining. Her method, rooted in particle-based inference and process reward models, emphasizes diversity, balancing exploration and exploitation during decoding.
The results are remarkable: small models (1.5B parameters) were shown to outperform GPT-4o in just four inference rollouts. For 7B models, scaling up to o1-level capabilities took only 32 rollouts.
What stood out was how this technique bypasses the early pruning limitations of greedy or beam search. By unlocking latent capabilities through smarter inference rather than brute-force training, this approach opens the door to democratizing powerful LLM reasoning at lower cost, latency, and compute.
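As a toy illustration of the general idea, not Isha’s actual particle-based algorithm, here is what "spend more compute at inference" can look like in its simplest best-of-N form, with the language model and reward model stubbed out:

```python
import random

# Toy best-of-N sampling: sample several rollouts and let a reward model choose.

def generate_rollout(prompt: str) -> str:
    # Stand-in for sampling one reasoning chain from a small LM.
    return f"candidate answer {random.randint(0, 999)}"

def reward_model(prompt: str, rollout: str) -> float:
    # Stand-in for a process reward model scoring the rollout.
    return random.random()

def best_of_n(prompt: str, n: int = 4) -> str:
    rollouts = [generate_rollout(prompt) for _ in range(n)]
    scores = [reward_model(prompt, r) for r in rollouts]
    return rollouts[scores.index(max(scores))]   # keep the highest-scoring rollout

print(best_of_n("What is 17 * 24?", n=4))
```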
Session 7: Secret Life of Noise — Understanding Diffusion Models
Speaker: Gowthami Somepalli
Research Scientist at Adobe Firefly
Date: July 11, 2025
Gowthami walked us through the evolution of diffusion models — the engines behind modern generative art tools like Firefly. Beginning with DDPMs (Denoising Diffusion Probabilistic Models) and extending to DDIMs (Denoising Diffusion Implicit Models, their largely deterministic variant), the session was a deep dive into how structured noise can be harnessed to produce realistic, diverse outputs.
She clarified how noise schedules determine the quality and control of generated content, while also introducing flow matching, a deterministic framework offering more direct distribution transformation, potentially bridging the gap between variational autoencoders and diffusion models.
Notably, these models offer:
Superior sample quality over GANs
Stable training dynamics
Mathematical rigor
Inference-time flexibility, making them ideal for creative applications.
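To get a feel for the forward (noising) process behind DDPMs, I tried a tiny NumPy sketch on a 1-D signal standing in for an image. The linear noise schedule here is only illustrative; real models tune it carefully.

```python
import numpy as np

# Forward (noising) process: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise.

T = 1000
betas = np.linspace(1e-4, 0.02, T)           # the noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

x0 = np.sin(np.linspace(0, 2 * np.pi, 64))   # a clean "image"
rng = np.random.default_rng(0)

def noisy_sample(x0, t):
    noise = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

for t in (0, 250, 999):
    xt = noisy_sample(x0, t)
    corr = np.corrcoef(x0, xt)[0, 1]
    print(f"t={t}: correlation with the clean signal = {corr:.3f}")   # fades toward 0 as t grows
```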
Session 8: Understanding Transformers via N-gram Statistics
Speaker: Timothy Nguyen
AI Researcher at Google DeepMind
Date: July 11, 2025
This session reimagined the transformer’s inner workings not as black boxes, but as statistical machines. Timothy Nguyen revealed how up to 79% of transformer predictions on the TinyStories dataset could be explained using optimal N-gram rules derived from training data.
Key takeaways:
Low-variance predictions align closely with N-gram patterns
Transformers exhibit curriculum-like learning, progressing from simple to complex rules
The work introduces a novel, training-intrinsic metric that detects overfitting without needing a validation set
This reframing provides practical tools to better understand when LLMs memorize, generalize, or hallucinate, offering a statistically grounded perspective on model interpretability.
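Here is a toy version of the kind of N-gram rule the paper compares transformers against: predict the next word as the most frequent continuation seen in training text. This made-up corpus just shows the flavor of the idea; the paper derives optimal rules from TinyStories.

```python
from collections import Counter, defaultdict

# Toy bigram "rule": predict the next word as the most frequent continuation.
corpus = "the cat sat on the mat and the cat slept on the sofa".split()

next_counts = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    next_counts[current_word][next_word] += 1

def bigram_predict(word: str) -> str:
    return next_counts[word].most_common(1)[0][0]

print(bigram_predict("the"))   # 'cat', the most frequent word after 'the' here
```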
Session 9: Distributed Training in Machine Learning
Speaker: Arthur Douillard
Senior Researcher, Google DeepMind
Topic: Distributed Training Strategies for Large Language Models
In this session, Arthur Douillard took us behind the scenes of what it really takes to train large language models (LLMs). With their enormous size, these models can’t be trained on a single GPU; distributed training is essential. Arthur unpacked the core strategies used in practice today, like Fully Sharded Data Parallelism (FSDP), Tensor and Pipeline Parallelism, and Expert Parallelism.
What stood out was his dive into experimental methods like DiLoCo, SWARM, PowerSGD, and DeMo. These techniques aim to scale LLMs across devices even when they’re not co-located, though often at the cost of some accuracy or performance. He also touched on the real-world challenges: GPU hardware failures, communication bottlenecks, and the inherent complexity of coordinating planetary-scale training. While we’re not fully there yet, we’re inching closer to a future where training across global clusters is a reality.
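To build intuition for the data-parallel part of this, I simulated it on one machine: each "worker" computes a gradient on its own shard, the gradients are averaged (the all-reduce step), and every worker applies the same update. Real systems do this across GPUs with frameworks like DDP/FSDP; this only mimics the math.

```python
import numpy as np

# Single-machine simulation of data-parallel training on a linear model.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(1024, 4)), rng.normal(size=1024)
w = np.zeros(4)                                  # the shared model weights

num_workers = 4
shards = np.array_split(np.arange(len(X)), num_workers)

for step in range(100):
    local_grads = []
    for shard in shards:                         # in reality these run in parallel
        pred = X[shard] @ w
        grad = 2 * X[shard].T @ (pred - y[shard]) / len(shard)
        local_grads.append(grad)
    global_grad = np.mean(local_grads, axis=0)   # "all-reduce": average across workers
    w -= 0.01 * global_grad                      # identical update everywhere

print("final weights:", np.round(w, 3))
```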
Session 10: Research Mentorship
Speaker: Sara Hooker
Head of Cohere Labs
Topic: Finding Meaningful Directions in ML Research
Sara Hooker’s mentorship session felt like a compass for anyone early in their ML research journey. She began with a reflection on the evolution of AI research, urging us to think deeply about how and why we choose problems to work on. Instead of chasing incremental papers or buzzwords, she encouraged us to:
Master a topic deeply and thoroughly
Collaborate openly and generously
Learn by teaching others
Constantly ask: "Is this scientifically meaningful?"
Sara also introduced the idea of a “third path” between academia and industry, represented by Cohere Labs and other open science communities. These spaces provide an alternative for those who want to contribute to cutting-edge research without being bound by the formal structures of universities or corporate labs. Her session was as inspiring as it was practical, offering a vision of research that is both rigorous and radically accessible.
Session 11: ML Open Science Social
Speakers: Madeline Smith & Brittawnya Prince
Team: Cohere Labs Operations
Topic: Building Community Through Open Science
To wrap up the summer school, Cohere Labs hosted a virtual social, an informal yet deeply meaningful session. It was a space for researchers from across the globe to connect, share stories, and brainstorm future ideas together. The event captured the spirit of open science: diverse voices, shared curiosity, and a collective drive to explore the unknown.
More than just a networking event, it felt like a celebration of everything we had learned, unlearned, and reimagined during the program. It was a fitting finale to a summer spent not just learning machine learning but living it as a collaborative, creative, and community-first endeavor.
Reflections: More Than Just a Summer School
Looking back, the Cohere Labs ML Summer School wasn’t just a series of lectures — it was a turning point. Coming in with beginner-level knowledge, I walked away not only understanding complex topics like self-supervised learning, distributed training, and tokenization (maybe not always completely, but still great learning) but also feeling part of a vibrant open science community.
What stood out most was the spirit of accessibility. The sessions weren’t about gatekeeping knowledge; they were about opening doors. Each speaker, from leading researchers at Meta and DeepMind to pioneers at Cohere Labs, made the content feel approachable without watering it down.
I also learned that doing machine learning research isn't about knowing everything from the start — it’s about being curious, collaborative, and resilient. Whether it's contributing to open-source projects, diving deeper into topics like explainability or fairness, or just asking better questions, I now feel equipped to take meaningful next steps in my ML journey.
What’s Next?
This summer school planted the seed, and now it’s up to me (and all of us who joined) to keep it growing. I’m planning to build hands-on projects, explore open research challenges, and stay connected with the community I’ve found here.
If you’ve ever felt like machine learning was too vast or too complex to dive into, trust me: you’re not alone. But with communities like Cohere Labs and the right mindset, you can absolutely get started.
Let the exploration continue 🚀
Explore More & Stay Connected
If you’re interested in watching the recorded sessions or learning more about the Cohere Labs Open Science Community, you can visit:
🔗 https://sites.google.com/cohere.com/coherelabs-community/community-programs/summer-school
They regularly host talks, reading groups, and other open learning initiatives - highly recommended for anyone passionate about ML and open science!
Feel free to connect with me if you’d like to discuss anything from the sessions, share ideas, or collaborate on projects.
📬 Connect with me on LinkedIn