Perspectives on the Evaluation of Recommender Systems Workshop at ACM Recommender Systems 2021
Evaluation is essential when conducting rigorous research in recommender systems (RS). It may span the evaluation of early ideas and approaches up to elaborate systems in operation, and it may target a wide range of aspects. Naturally, we do (and have to) take various perspectives on the evaluation of RS. Here, the term “perspective” may, for instance, refer to the various purposes of an RS, the various stakeholders affected by an RS, or the potential risks that ought to be minimized. Further, we have to consider that various methodological approaches and experimental designs represent different perspectives on evaluation. The perspective on the evaluation of RS may also be substantially shaped by the available resources: access to resources will likely differ between PhD students and established researchers in industry.
Acknowledging that there are various perspectives on the evaluation of RS, we want to put up for discussion whether there is a “golden standard” for the evaluation of RS and, if so, whether it is indeed “golden” in any sense. We postulate that the various perspectives are valid and reasonable, and we aim to reach out to the community to discuss and reason about them.
The goal of the workshop is to capture the current state of evaluation and to gauge whether there is, or should be, a different target that RS evaluation should strive for. The workshop will address the question “Where should we go from here as a community?” and aims to come up with concrete steps for action.
We have a particularly strong commitment to invite and integrate researchers at the beginning of their careers and want to equally integrate established researchers and practitioners, from industry and academia alike. It is our particular concern to give a voice to the various perspectives involved.
Topics of interest include, but are not limited to, the following:
We deliberately solicit papers reporting problems and (negative) experiences regarding RS evaluation, as we consider the reflection on unsuccessful, inadequate or insufficient evaluations as a fruitful source for yet another perspective on RS evaluation that can spark discussions at the workshop. Accordingly, submissions may also address the following themes:
(a) “lessons learned” from the successful application of RS evaluation or from “post mortem” analyses describing specific evaluation strategies that failed to uncover decisive elements,
(b) “overview papers” analyzing patterns of challenges or obstacles to evaluation,
(c) “solution papers” presenting solutions for specific evaluation scenarios, and
(d) “visionary papers” discussing novel and future evaluation aspects.
We solicit two forms of contributions. First, we solicit paper submissions that will undergo peer review. Accepted papers will be published and presented at the workshop. Second, we offer the opportunity to present ideas without a paper submission. In this case, we call for the submission of abstracts that will be reviewed by the workshop organizers. Accepted abstracts will be presented at the workshop, but not published.
We solicit papers of 4 to 10 pages (excluding references). Along the lines of this year’s call for papers of the main conference, we do not distinguish between full and short (or position) papers. Papers should be formatted in the new ACM single-column format, following the official templates. In LaTeX, use the following command to generate the output in a single-column format:
\documentclass[manuscript]{acmart}
Submitted papers must not be under review at any other conference, workshop, or journal at the time of submission. Papers should be submitted through the workshop’s EasyChair page at https://easychair.org/conferences/?conf=perspectives2021.
Submissions will undergo single-blind peer review by a minimum of three program committee members and will be selected based on quality, novelty, clarity, and relevance. Accepted papers will be published as open-access workshop proceedings via ceur-ws.org, and their authors will be invited to present their work during the workshop. At least one author of each accepted paper must attend the workshop and present the work.
We solicit abstracts of 200 to 350 words, to be submitted through the workshop’s EasyChair page at https://easychair.org/conferences/?conf=perspectives2021.
The workshop organizers will select abstracts based on quality, clarity, relevance, and their potential to spark interesting discussion during the workshop. Authors of accepted abstracts will be invited to present their work during the workshop.
Please watch the videos of the accepted papers before the workshop takes place. We will focus on discussion at the workshop.
15.00-15.10 Welcome
15.10-15.45 Keynote: Recommender system evaluation: One gold standard, but no silver bullets by Zeno Gantner, Zalando
15.45-16.00 Break
16.00-16.45 Discussions in break-out rooms
16.45-17.15 Break
17.15-17.30 Evaluating Recommenders with Distributions (Michael D. Ekstrand, Ben Carterette, Fernando Diaz)
17.30-18.00 General discussions
Times are in CEST (Amsterdam local time).
We will have an on-site meeting where we’ll (informally) discuss open issues and problems regarding RecSys evaluation.
Zeno Gantner, Zalando
Abstract
In the field of recommender systems, we have a large and diverse set of evaluation methods at our disposal for both academic research and industrial applications.
Randomized controlled trials in the form of online A/B tests are widely accepted for data-driven decision making, but because of their cost in terms of time and effort they cannot support every single decision. We need different methods for different scenarios.
I will present a case study on controlling the effects of different levels of exploration between the control and treatment groups in an online A/B test.
Besides that, I will also talk about so-called diff-testing, which is a set of methods that allow estimating the impact of a change without relying on annotations or user feedback. Diff-testing is not covered much in the literature, but can be a valuable addition to a practitioner’s toolkit of evaluation methods.
Bio
Zeno Gantner is a principal applied scientist at Zalando, responsible for the area of fashion recommendations. He has more than 10 years of industry experience implementing, running, and improving ML-based production services for millions of users. His publications on diverse AI topics, such as applied machine learning, knowledge representation, reasoning, and recommender systems, have been cited more than 6,000 times according to Google Scholar. Zeno has contributed to more than a dozen Free Software/Open Source projects and has been a contributor to the online encyclopedia Wikipedia since 2002. The first ACM RecSys conference he participated in was in 2009 in New York.
All Teaser Videos on a single page.
Extended abstract about the workshop as part of the RecSys proceedings.
Proceedings (CEUR-WS.org).
Coupled or Decoupled Evaluation for Group Recommendation Methods?
Ladislav Peska and Ladislav Maleček
Video
Evaluating recommender systems with and for children: towards a multi-perspective framework
Emilia Gómez, Vicky Charisi and Stephane Chaudron
Video
MOCHI: an Offline Evaluation Framework for Educational Recommendations
Chunpai Wang, Shaghayegh Sahebi and Peter Brusilovsky
Video
Modeling Online Behavior in Recommender Systems: The Importance of Temporal Context
Milena Filipovic, Blagoj Mitrevski, Diego Antognini, Emma Lejal Glaude, Boi Faltings and Claudiu Musat
Video
On Evaluating Session-Based Recommendation with Implicit Feedback
Fernando Diaz
Video
Prediction Accuracy and Autonomy
Anton Angwald, Kalle Areskoug and Alan Said
Video
Recommender systems meet species distribution modelling
Indre Zliobaite
Video
Sequence or Pseudo-Sequence? An Analysis of Sequential Recommendation Datasets
Daniel Woolridge, Sean Wilner and Madeleine Glick
Video
Statistical Inference: The Missing Piece of RecSys Experiment Reliability Discourse
Ngozi Ihemelandu and Michael Ekstrand
Video
Time-dependent Evaluation of Recommender Systems (Best Paper Award)
Teresa Scheidt and Joeran Beel
Video
Toward Benchmarking Group Explanations: Evaluating the Effect of Aggregation Strategies versus Explanation
Francesco Barile, Shabnam Najafian, Tim Draws, Oana Inel, Alisa Rieger, Rishav Hada and Nava Tintarev
Video
Unboxing the Algorithm with Understandability: On Algorithmic Experience in Music Recommender Systems
Anna Marie Schröder and Maliheh Ghajargar
Video
Evaluating Recommenders with Distributions
Michael D. Ekstrand, Ben Carterette and Fernando Diaz
Slides