Available Thesis Topics

When applying for a thesis topic, follow the procedure described here and CC the advisor(s) in your email.



Bachelor Topics

Master Topics

Vector databases are essential components of modern applications such as machine learning, pattern recognition, and recommendation systems. They are designed to efficiently store, index, and query high-dimensional vectors. Exact nearest neighbor search becomes inefficient for high-dimensional data, whereas Approximate Nearest Neighbor Search (ANNS) offers significant performance improvements for the retrieval process by finding near-optimal matches with reduced computational cost. This thesis aims to explore recent advances in ANNS systems, with a focus on disk-based solutions.
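
To make the trade-off between exact and approximate search concrete, the following is a minimal in-memory sketch, not the method of any particular system. It compares a brute-force exact search with a greedy best-first search over a k-nearest-neighbor graph. The function names, the `degree` and `beam` parameters, and the naive graph construction are assumptions made for illustration; disk-based systems use more elaborate indexes and place the graph on SSD.

```python
import heapq
import math
import random


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(data, query, k):
    """Exact search: compare the query against every vector (O(n * d))."""
    return heapq.nsmallest(k, range(len(data)), key=lambda i: l2(data[i], query))


def build_knn_graph(data, degree):
    """Hypothetical index: link every point to its `degree` nearest neighbors."""
    graph = []
    for i, v in enumerate(data):
        others = (j for j in range(len(data)) if j != i)
        graph.append(heapq.nsmallest(degree, others, key=lambda j: l2(data[j], v)))
    return graph


def greedy_search(data, graph, query, k, entry=0, beam=16):
    """Approximate search: best-first graph traversal with a bounded result list."""
    visited = {entry}
    frontier = [(l2(data[entry], query), entry)]  # min-heap of nodes to expand
    best = [(-frontier[0][0], entry)]             # max-heap of the closest nodes so far
    while frontier:
        dist, node = heapq.heappop(frontier)
        if len(best) == beam and dist > -best[0][0]:
            break  # the closest unexpanded node cannot improve the result list
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d = l2(data[nb], query)
            if len(best) < beam or d < -best[0][0]:
                heapq.heappush(frontier, (d, nb))
                heapq.heappush(best, (-d, nb))
                if len(best) > beam:
                    heapq.heappop(best)  # drop the farthest candidate
    ranked = sorted((-neg_d, i) for neg_d, i in best)[:k]
    return [i for _, i in ranked], len(visited)


if __name__ == "__main__":
    random.seed(0)
    data = [[random.random() for _ in range(8)] for _ in range(300)]
    query = [random.random() for _ in range(8)]
    graph = build_knn_graph(data, degree=8)
    approx, touched = greedy_search(data, graph, query, k=5)
    print("exact: ", exact_knn(data, query, k=5))
    print("approx:", approx, "after computing", touched, "distances out of", len(data))
```

The approximate search computes distances only for the nodes it visits, which is where the savings come from. In a disk-based setting each visited node may cost a storage read, so the number of visited nodes is a rough proxy for I/O.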

Goal & Steps:

  • Comprehensive literature review
    • Developing a thorough understanding of the basics of vector databases and ANNS
    • Reviewing state-of-the-art ANNS algorithms and systems, with a focus on disk-based schemes
    • Analyzing algorithmic and system-level optimizations
  • Benchmarking the state-of-the-art schemes
    • Identifying the common baseline and the non-GPU-based schemes to evaluate
    • Conducting an empirical benchmarking study of existing schemes on realistic system setups and workloads
  • Proposing novel schemes
    • Designing and implementing novel solutions and evaluating them against the existing solutions
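
As an illustration of the benchmarking step, ANNS evaluations commonly report recall@k together with throughput and latency. The sketch below shows one way such a harness could look. The `benchmark` function, its `search_fn(query, k)` callback, and the result keys are hypothetical names chosen for this example, not part of any existing benchmark suite.

```python
import time


def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true k nearest neighbors contained in the approximate result."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k


def benchmark(search_fn, queries, ground_truth, k):
    """Run search_fn(query, k) on every query; report mean recall, QPS, and mean latency."""
    latencies, recalls = [], []
    for query, truth in zip(queries, ground_truth):
        start = time.perf_counter()
        result = search_fn(query, k)
        latencies.append(time.perf_counter() - start)
        recalls.append(recall_at_k(result, truth, k))
    total = sum(latencies)
    return {
        "recall@k": sum(recalls) / len(recalls),
        "qps": len(queries) / total if total > 0 else float("inf"),
        "mean_latency_ms": 1000 * total / len(queries),
    }
```

For disk-based schemes, the measured latency also depends on whether the operating system's page cache is warm, so the cache state should be controlled and reported alongside the numbers.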

Target: M.Sc. Students

Prerequisites & Considerations:

  • Proficiency in C/C++ & Python programming
  • Strong interest in systems research, with a focus on system optimizations for emerging applications such as machine learning
  • Required system setup & optimizations
  • Problem solving & research capability
  • For a master’s thesis, the expected outcome is a contribution of publishable quality; however, publishing the paper is not mandatory

To become more familiar with the topic, you can start by looking at the following papers:

  • S. J. Subramanya, F. Devvrit, H. V. Simhadri, R. Krishnaswamy, and R. Kadekodi, “DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node,” in NeurIPS, 2019.
  • H. Guo and Y. Lu, “Achieving Low-Latency Graph-Based Vector Search via Aligning Best-First Search Algorithm with SSD,” in USENIX OSDI, 2025.
  • H. Guo and Y. Lu, “OdinANN: Direct Insert for Consistently Stable Performance in Billion-Scale Graph-Based Vector Search,” in USENIX FAST, 2026.

Advisor: Mostafa Hadizadeh

Arancini is a hybrid binary translator developed in our group that translates x86 binaries to Arm and RISC-V architectures. Currently, it only supports translating Linux applications. To increase its usability, we want to add support for translating Windows applications and executing them on Linux platforms.

Goals of the thesis: In this context, you will add support for Windows applications in Arancini by:

  • Exploring existing Windows emulation layers such as Wine or Proton
  • Implementing the necessary system call translations in Arancini
  • Testing and evaluating the performance of Windows applications executed through Arancini
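
To give a rough idea of what a system call translation layer involves, the sketch below maps two NT-style calls onto their POSIX counterparts through a dispatch table and a handle table. It is a simplified illustration and does not reflect Arancini's actual design: the class `GuestSyscallLayer`, the method `register_fd`, and the reduced argument lists are assumptions, and the real `NtWriteFile` takes considerably more parameters. Note also that Wine implements most of the Windows API in user-space libraries rather than by translating individual system calls.

```python
import os

# NTSTATUS codes returned to the guest application.
STATUS_SUCCESS = 0x00000000
STATUS_NOT_IMPLEMENTED = 0xC0000002
STATUS_INVALID_HANDLE = 0xC0000008


class GuestSyscallLayer:
    """Hypothetical translation layer: Windows-style calls in, Linux calls out."""

    def __init__(self):
        self._handles = {}     # guest handle -> host file descriptor
        self._next_handle = 4  # arbitrary starting value for this sketch
        self._table = {
            "NtWriteFile": self._write_file,
            "NtClose": self._close,
        }

    def register_fd(self, fd):
        """Expose a host file descriptor to the guest as an opaque handle."""
        handle = self._next_handle
        self._next_handle += 4
        self._handles[handle] = fd
        return handle

    def dispatch(self, name, *args):
        """Look up the guest call by name and run the matching host-side handler."""
        handler = self._table.get(name)
        if handler is None:
            return STATUS_NOT_IMPLEMENTED, None
        return handler(*args)

    def _write_file(self, handle, data):
        fd = self._handles.get(handle)
        if fd is None:
            return STATUS_INVALID_HANDLE, None
        return STATUS_SUCCESS, os.write(fd, data)  # number of bytes written

    def _close(self, handle):
        fd = self._handles.pop(handle, None)
        if fd is None:
            return STATUS_INVALID_HANDLE, None
        os.close(fd)
        return STATUS_SUCCESS, None
```

Calls that are missing from the table return `STATUS_NOT_IMPLEMENTED`, which makes it easy to log which calls a test application needs before they are implemented.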

Target: Master

Prerequisites:

  • Programming language: C++
  • Prior knowledge of operating systems and systems programming is appreciated

Advisor: Redha Gouicem

To reduce disk access latency, modern operating systems rely on page caches to keep frequently accessed data in memory.

In virtualized environments, however, modern hypervisors typically enforce static memory partitioning, allocating a fixed amount of RAM to each virtual machine (VM). If one VM becomes highly I/O-intensive, its guest page cache remains constrained by this fixed memory limit. Ideally, this I/O-bound VM should be able to utilize the unused memory of the host machine or idle neighboring VMs, preventing resource stranding.

Therefore, instead of each VM managing its own isolated page cache, the core idea behind this thesis is to bypass the guest page caches entirely and rely on a unified, dynamically shared page cache managed at the host level.

Goal of this thesis: This project offers significant room for independent research and architectural design. Your main objectives will be to:

  • Design and implement an optimized, host-level page cache shared across co-located VMs, exploring the best mechanisms to minimize latency and context-switching overhead.
  • Investigate and propose global page cache eviction policies, evaluating your own algorithms for managing memory pressure fairly under diverse workloads.
  • Explore architectural extensions, such as supporting efficient file sharing or memory deduplication, driving the project toward the optimizations you find most promising.
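
The idea of a single host-level cache with a global eviction policy can be sketched as a user-space toy model. The example below is only an illustration under simplifying assumptions. The class `SharedPageCache`, the `read_from_disk` callback, and the choice of plain LRU are hypothetical; an actual implementation would live in the host kernel or hypervisor and would need a fairer policy.

```python
from collections import OrderedDict


class SharedPageCache:
    """Toy host-level page cache shared by all VMs, evicted via one global LRU order."""

    def __init__(self, capacity_pages, read_from_disk):
        self.capacity = capacity_pages
        self.read_from_disk = read_from_disk  # callback: (file_id, page_no) -> bytes
        self.pages = OrderedDict()            # (file_id, page_no) -> data, oldest first
        self.hits = {}                        # per-VM hit counters
        self.misses = {}                      # per-VM miss counters

    def read(self, vm_id, file_id, page_no):
        """Serve a page read issued by `vm_id`, filling the cache on a miss."""
        key = (file_id, page_no)  # keyed by file, so VMs reading the same file share pages
        if key in self.pages:
            self.pages.move_to_end(key)  # mark as most recently used
            self.hits[vm_id] = self.hits.get(vm_id, 0) + 1
            return self.pages[key]
        self.misses[vm_id] = self.misses.get(vm_id, 0) + 1
        data = self.read_from_disk(file_id, page_no)
        self.pages[key] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the globally least recently used page
        return data
```

With plain global LRU, a single I/O-intensive VM can claim most of the capacity. That is the desired behavior when its neighbors are idle, but it can evict the working sets of other VMs under contention, which is why the per-VM hit and miss counters are tracked here as a starting point for fairness-aware policies.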

Target: M.Sc. students

Prerequisites:

  • Programming language: C
  • Linux Kernel Programming

Advisor: Jérôme Coquisart