thesis
declarative recall for approximate nearest neighbor vector search
Declarative recall is a mechanism to elimate parameter tuning from approximate nearest neighbor vector search algorithms, since it allows users to specify a desired recall level instead of trying to tune the various hyperparameters of the underlying algorithm. Recently, declarative recall has been provided by using early termination techniques, which allow the search to stop early when a certain recall level is reached.
my thesis work (May 2025 - present)
Our work on declarative recall is based recall prediction, a simple1 but powerful early termination idea that estimates the recall of a query at runtime, and allows the search to stop early when the desired recall is reached.
Our work constitutes the first general solution to the problem: works for k-NN graph and IVF indexes, supports all common distance measures, allows users to choose a different target recall for each query at query time, and users do not need to set any parameters.
Simplicity is a great virtue but it requires hard work to achieve it and education to appreciate it. And to make matters worse: complexity sells better.
- DARTH: Declarative recall through early termination for approximate nearest neighbor search [paper] [code]Proceedings of the ACM on Management of Data, 2025Volume 3, number 4, pages 1--26. ACM New York, NY, USA.DARTH is the basis of the declarative recall mechanism that we are currently developing. It proposes a recall prediction model that estimates the recall of individual queries at runtime, and allows the search to terminate early when the desired recall is reached or surpassed. DARTH idea is general, can be applied to any index and distance measure, and elimates all hyperparameter tuning.
- DARTH+: Approximate Nearest Neighbor Search with Declarative Recall and Quality Guarantees [preprint]March 2026DARTH+ extends and optimizes DARTH. It significantly improves recall prediction quality and required training times of DARTH, while it also provides optional probabilistic quality guarantees for the predicted recall of each query, allowing for robust early termination.
- VADER: Declarative Recall for Filtered Vector SearchSubmitted, coming soon!Filtered vector search is extremely important for real-world applications, since it allows users to filter the search space based on certain attributes of the data. However, hyperparameter tuning for filtered vector search is even more difficult than for unfiltered vector search, since the filtering can significantly change the search space. VADER fixes that! Stay tuned for more details.
- ArXiv preprint, 2026.Learning-based methods for vector search usually require groundtruths. DaiSy, an open-source library for super-fast exact vector and data series similarity search, can be used to generate groundtruths for large datasets, and eliminates the overheads of brute-force search.
reading list
I am maintaining a github repo summarizing the latest and most interesting papers on early termination.