3rd Workshop: Perspectives on the Evaluation of Recommender Systems
Workshop at ACM Recommender Systems 2023


Evaluation is essential when conducting rigorous research in recommender systems (RS). It may span everything from the evaluation of early ideas and approaches to elaborate systems in operation, and it may target a wide spectrum of aspects. Naturally, we do (and have to) take various perspectives on the evaluation of RS. The term “perspective” may, for instance, refer to the various purposes of a RS, the various stakeholders affected by a RS, or the potential risks that ought to be minimized. Further, we have to consider that different methodological approaches and experimental designs represent different perspectives on evaluation. The perspective on the evaluation of RS may also be substantially shaped by the available resources: access to resources will likely differ between, say, PhD students and established researchers in industry.

The goal of the workshop is to capture the current state of evaluation and to gauge whether there is, or should be, a different target that RS evaluation should strive for. The workshop addresses the question of where we as a community should go from here, and aims to come up with concrete steps for action.

We have a particularly strong commitment to inviting and integrating researchers at the beginning of their careers, and we equally want to integrate established researchers and practitioners, from industry and academia alike. It is our particular concern to give a voice to the various perspectives involved.

PERSPECTIVES is a highly interactive workshop with discussions and group work.
PERSPECTIVES is open for everyone at RecSys to attend.

Call for Papers

Topics of interest include, but are not limited to, the following:

  • Case studies of difficult, hard-to-evaluate scenarios
  • Evaluations with contradicting results
  • Showcasing (structural) problems in RS evaluation
  • Integration of offline and online experiments
  • Multi-stakeholder evaluation
  • Divergence between evaluation goals and what is actually captured by the evaluation
  • Nontrivial and unexpected experiences from practitioners

We deliberately solicit papers reporting problems and (negative) experiences regarding RS evaluation, as we consider the reflection on unsuccessful, inadequate, or insufficient evaluations as a fruitful source for yet another perspective on RS evaluation that can spark discussions at the workshop. This also includes papers reporting negative study results. Accordingly, submissions may also address the following themes:

(a) “lessons learned” from the successful application of RS evaluation or from “post mortem” analyses describing specific evaluation strategies that failed to uncover decisive elements,
(b) “overview papers” analyzing patterns of challenges or obstacles to evaluation,
(c) “solution papers” presenting solutions for specific evaluation scenarios, and
(d) “visionary papers” discussing novel and future evaluation aspects.


We solicit two forms of contributions. First, we solicit paper submissions that will undergo peer review. Accepted papers will be published and presented at the workshop. Second, we offer the opportunity to present ideas without a paper submission. In this case, we call for the submission of abstracts which will be reviewed by the workshop organizers. Accepted abstracts will be presented at the workshop, but not published.

Paper Submissions

We solicit papers of 4 to 12 pages (excluding references). We do not distinguish between full and short (or position) papers. Papers should be formatted using CEURART’s single-column template, which is also available as an Overleaf template.

Submitted papers must not be under review in any other conference, workshop, or journal at the time of submission. Papers must be submitted through the workshop’s EasyChair page.

Submissions will undergo single-blind peer review by at least three program committee members and will be selected based on quality, novelty, clarity, and relevance. Authors of accepted papers will be invited to present their work at the workshop, and accepted papers will be published as open-access workshop proceedings. At least one author of each accepted paper must attend the workshop and present the work.

Abstract Submissions

We solicit abstracts of 200–350 words, submitted through the workshop’s EasyChair page.

The workshop organizers will select abstracts based on quality, clarity, relevance, and their potential to spark interesting discussions during the workshop. Authors of accepted abstracts will be invited to present their work during the workshop.

Important Dates

  • Paper submission: July 21st, 2023 AoE (extended deadline: July 26th, 2023 AoE)
  • Author notification: August 11th, 2023
  • Camera-ready version: September 1st, 2023
  • Workshop: September 19th, 2023, 9:00am to 12:35pm, Singapore local time (remote participation possible)


Tuesday, September 19th, 2023, 09:00-12:35 (hybrid)

09:00-09:20 Welcome & Introduction
09:20-10:10 Keynote & Q&A: Noam Koenigstein
10:10-10:30 Topic pitch (5 min pitch, 5 min questions)
10:30-11:00 Break
11:00-12:15 Group discussions on site (breakout rooms on Zoom)
12:15-12:35 Wrap up

All times are given in Singapore Standard Time (SGT), i.e., Singapore local time.


Keynote: Teaching Algorithms to Explain Recommender Systems: A Counterfactual Evaluation Approach

Noam Koenigstein

Abstract: In this talk, I will introduce the Learning to eXplain Recommendations (LXR) framework, a model-agnostic, post-hoc framework to explain recommender systems. LXR can work with any differentiable recommender and learns to score the importance of users’ personal data with respect to a recommended item. The framework’s objective employs a novel counterfactual loss function that aims to identify the user data that best explains the item’s recommendation. Additionally, in order to evaluate LXR, we propose several new evaluation metrics, inspired by saliency map evaluation in computer vision, for assessing explanations in recommender systems. Assessing explanations in artificial intelligence, especially those pertaining to recommender systems, is an emerging area of study that currently lacks universally accepted metrics or standardized test sets. Our evaluations are based on counterfactual tests that attempt to gauge the impact of the provided “explanations” on the ultimate recommendations. LXR’s code is publicly available.
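The counterfactual evaluation idea described in the abstract — remove the user data an explainer marks as most important and measure how strongly the recommendation changes — can be illustrated with a small sketch. Everything below (the dummy dot-product recommender, the function names, and the toy data) is a hypothetical illustration under simplifying assumptions, not the actual LXR code or metrics.

```python
import numpy as np

def score(user_hist, item_vec, item_embs):
    # Dummy recommender (an assumption for illustration): the user profile is
    # the mean embedding of interacted items; the score is a dot product.
    interacted = user_hist > 0
    if not interacted.any():
        return 0.0
    user_vec = item_embs[interacted].mean(axis=0)
    return float(user_vec @ item_vec)

def counterfactual_drop(user_hist, attributions, item_vec, item_embs, k=1):
    # Mask the k interactions with the highest attribution scores and return
    # the resulting drop in the recommended item's score; under this metric,
    # a better explainer should produce a larger drop for the same k.
    base = score(user_hist, item_vec, item_embs)
    top_k = np.argsort(-attributions)[:k]  # attributions assumed 0 for non-interacted items
    masked = user_hist.copy()
    masked[top_k] = 0
    return base - score(masked, item_vec, item_embs)
```

For example, with two interacted items and an attribution vector that puts almost all weight on the first one, masking that single item removes its entire contribution to the score, yielding a large drop; a random attribution would, on average, yield a smaller drop.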

Dr. Noam Koenigstein earned his B.Sc. in Computer Science with honors from the Technion – Israel Institute of Technology in Haifa in 2007. He followed this with an M.Sc. in Electrical Engineering from Tel-Aviv University in 2009 and a Ph.D. from the same institution in 2013. Noam began his professional career at Microsoft in 2011, joining the Xbox Machine Learning research team where he was instrumental in developing the recommendation algorithm that now serves millions of users globally. He later led Microsoft Store’s recommendation research team.

In 2017, he transitioned to the financial sector, taking on the role of Senior Vice President and Head of Data Science at Citi Bank’s Israeli Innovation Lab. At Citi, he directed all data science initiatives at the Israeli research facility. The following year, Noam returned to academia, joining Tel Aviv University’s Industrial Engineering Department as an Assistant Professor (Senior Lecturer). He currently spearheads the DELTA Lab, focusing on the practical application of deep-learning technologies with a special emphasis on recommender systems. Under his guidance, students explore a wide range of machine learning applications across various real-world challenges.

Accepted Contributions


Proceedings

Accepted Papers

Multiobjective Hyperparameter Optimization of Recommender Systems
Marta Moscati, Yashar Deldjoo, Giulio Davide Carparelli, Markus Schedl

Annotation Practices in Societally Impactful Machine Learning Applications: What are Popular Recommender Systems Models Actually Trained On?
Andra-Georgiana Sav, Andrew M. Demetriou, Cynthia C. S. Liem

Exploring Effect-Size-Based Meta-Analysis for Multi-Dataset Evaluation
Mete Sertkan, Sophia Althammer, Sebastian Hofstätter, Peter Knees, Julia Neidhardt

Unveiling Challenging Cases in Text-based Recommender Systems
Ghazaleh Haratinezhad Torbati, Anna Tigunova, Gerhard Weikum

The Effect of Random Seeds for Data Splitting on Recommendation Accuracy
Lukas Wegmeth, Tobias Vente, Lennart Purucker, Joeran Beel

Accepted Abstract

A Common Misassumption in Online Experiments with Machine Learning Models
Olivier Jeunen

Program Committee

  • Joeran Beel (University of Siegen, Germany)
  • Li Chen (Hong Kong Baptist University, China)
  • Michael Ekstrand (Boise State University, USA)
  • Mehdi Elahi (University of Bergen, Norway)
  • Andrés Ferraro (Pandora-SiriusXM, USA)
  • Hanna Hauptmann (Utrecht University, The Netherlands)
  • Dietmar Jannach (University of Klagenfurt, Austria)
  • Olivier Jeunen (ShareChat, UK)
  • Mesut Kaya (Aalborg University Copenhagen, Denmark)
  • Jaehun Kim (Pandora-SiriusXM, USA)
  • Bart Knijnenburg (Clemson University, USA)
  • Dominik Kowald (Know-Center Graz, Austria)
  • Lien Michiels (University of Antwerp, Belgium)
  • Julia Neidhardt (TU Wien, Austria)
  • Maria Soledad Pera (TU Delft, The Netherlands)
  • Lorenzo Porcaro (Joint Research Centre (European Commission), Italy)