Time & Location: every 2-3 weeks on Tuesdays from 10:15 am to 12:00 pm in room BIN-2-A.10.
Please note that the room has changed from the previous semester.
Online participation via the MS Teams Team CL Colloquium is also possible.
Responsible: Sina Ahmadi
Date | Talk 1 | Talk 2
17.09.2024 | Dr. Gail Weiss (EPFL) | —
01.10.2024 | Dr. Yingqiang Gao | Dr. Sina Ahmadi
15.10.2024 | Cui Ding | Dr. Jannis Vamvas
29.10.2024 | Patrick Haller | Dr. Reto Gubelmann & Ghassen Karray
12.11.2024 | Andrianos Michail | Sant Muniesa
26.11.2024 | Jan Brasser | Lucas Möller (Universität Stuttgart)
10.12.2024 | Prof. Dr. Sarah Ebling & IICT team | Michelle Wastl
With the help of the RASP programming language, we can better imagine how transformers---the powerful attention-based sequence-processing architecture---solve certain tasks. Some tasks, such as simply repeating or reversing an input sequence, have reasonably straightforward solutions, but many others are more difficult. To unlock a fuller intuition of what can and cannot be achieved with transformers, we must understand not just the RASP operations but also how to use them effectively. In this session, I would like to discuss some useful tricks with you in more detail. How is the powerful selector_width operation derived from the core RASP operations? How can a fixed-depth RASP program perform long addition on inputs of arbitrary length, despite the equally large number of potential carry operations such a computation entails? How might a transformer perform in-context reasoning? And are any of these solutions reasonable, i.e., realisable in practice? I will begin with a brief introduction to the base RASP operations to ground our discussion, and then walk through several interesting task solutions. Armed with this deeper intuition of how transformers solve these tasks, we will conclude with a discussion of what it implies for how knowledge and computation must be spread across transformer layers and embeddings in practice.
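As a rough illustration of the operations the abstract mentions (a toy sketch, not material from the talk), RASP's select and aggregate primitives, and the BOS-token trick that derives selector_width from them, can be mimicked in plain Python. The function names mirror RASP's, but this simplified implementation is our own assumption:

```python
def select(keys, queries, pred):
    # Boolean selection matrix: sel[q][k] is True iff pred(keys[k], queries[q]),
    # analogous to a hard attention pattern.
    return [[pred(k, q) for k in keys] for q in queries]

def aggregate(sel, values, default=0.0):
    # Uniform average of the selected values at each query position,
    # analogous to attention with uniform weights over selected keys.
    out = []
    for row in sel:
        picked = [v for s, v in zip(row, values) if s]
        out.append(sum(picked) / len(picked) if picked else default)
    return out

def selector_width(sel):
    # The BOS trick: prepend a beginning-of-sequence position that is always
    # selected, then aggregate an indicator of that position. The result is
    # agg = 1 / (width + 1), so width = 1 / agg - 1.
    widths = []
    for row in sel:
        row_bos = [True] + row            # BOS is always selected
        vals = [1.0] + [0.0] * len(row)   # indicator of the BOS position
        agg = sum(v for s, v in zip(row_bos, vals) if s) / sum(row_bos)
        widths.append(round(1.0 / agg - 1.0))
    return widths

# Example: for each position of "aab", count how many positions hold
# the same character.
sel = select("aab", "aab", lambda k, q: k == q)
print(selector_width(sel))  # [2, 2, 1]
```

The point of the trick is that aggregate only produces averages, never counts; the always-selected BOS position turns an average back into a count via a fixed arithmetic inversion.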
In today's talk, we show the critical role of well-structured argumentation in scientific texts: it enhances clarity, reduces misinformation, and promotes knowledge dissemination. We identify the challenges researchers face in maintaining coherence and factual accuracy during the writing process, highlighting the need for automation through AI-driven tools that integrate text retrieval and generation. Despite advances in Natural Language Processing and Large Language Models, effective scientific writing assistants still face hurdles, particularly in automatic text alignment and the reliability of generated content. To address these issues, we investigate empirical unsupervised methods for retrieving, aligning, and generating arguments in scientific documents, culminating in a web application that applies these argument-mining techniques.
Lexical borrowing, the adoption of words from one language into another, is a ubiquitous linguistic phenomenon influenced by geopolitical, societal, and technological factors. This talk explores lexical borrowing from a computational linguistics perspective. I present our effort to create a novel contrastive dataset comprising sentences with and without loanwords, designed to evaluate the impact of borrowings. Using this dataset, the performance of state-of-the-art machine translation and pretrained language models is assessed, quantifying their behavior and robustness in the presence and absence of loanwords. Our findings provide valuable insights into the challenges lexical borrowing poses for computational models and offer extensive analysis in multilingual contexts.
Psycholinguistic theories traditionally assume similar cognitive mechanisms across different speakers. However, researchers have recently begun to recognize the need to account for individual differences when explaining human cognition. To address this issue, a growing body of work is investigating how individual differences interact with human sentence processing. Implicitly, these studies assume that individual-level effects are replicable across experimental sessions and that the method of assessment (e.g., eye-tracking vs. self-paced reading) is interchangeable. However, as the reliability paradox (Hedge et al., 2018) shows, this assumption is unwarranted. A crucial first step for a principled investigation of individual differences in sentence processing is establishing their measurement reliability, that is, the correlation of individual-level effects across multiple experimental sessions and methodological contexts. In this talk, I present the first German naturalistic reading corpus with four experimental sessions per participant (two eye-tracking and two self-paced reading sessions), including a comprehensive assessment of participants' cognitive capacities and reading skills. I deploy a two-task Bayesian hierarchical model to assess the measurement reliability of individual differences across a range of effects in response to predictors of sentence-processing difficulty that are well established at the population level.
I am introducing a new research project called «InvestigaDiff», which aims to enable synchronization of documents across different languages. Inspired by how programmers use diff tools to highlight changes in code, we are exploring whether similar concepts can be applied to natural language texts, even when they are in different languages. One research direction involves representation learning at the token level. I will present an idea for an approach that uses soft prompts to guide an LLM in rewriting one text into the other, with these soft prompts serving as the vector representations of textual difference.
A growing body of work has been querying LLMs with questionnaires developed for human respondents to evaluate their potential biases, such as political or cultural biases. In this talk, I will present two projects aimed at studying the stability and reliability of these evaluations in the context of political questions. The first project investigates response stability in language models by probing LMs using 500 paraphrases per question to assess variability and structural biases. The second project introduces Questionnaire Modeling, a new probing task that incorporates human survey data as in-context examples to improve the stability of bias evaluation.
We present research examining the political bias and brittleness of LLMs in natural language inference (NLI). We first distinguish a strict, non-political notion of formal validity from notions of material and informal validity that are inherently perspectival. We then assess state-of-the-art LLMs for political bias and brittleness in judging the validity or quality of such inferences. We run all experiments in English with samples from American politics as well as in German with sample arguments from Swiss politics. Our results show that the models exhibit bias in English, which can be mitigated with few-shot prompts, as well as substantial brittleness, which, in contrast, increases with few-shot prompting.
The task of determining whether two texts are paraphrases has long been a challenge in NLP. However, the prevailing notion of paraphrase is often quite simplistic, offering only a limited view of the vast spectrum of paraphrase phenomena. Indeed, we find that evaluating models on a single paraphrase dataset can leave uncertainty about their true semantic understanding. To alleviate this, we release paraphrasus, a benchmark designed for multi-dimensional assessment of paraphrase detection models and finer-grained model selection. We find that, under a fine-grained evaluation lens, paraphrase detection models exhibit trade-offs that cannot be captured by a single classification dataset.
TBA
Siamese encoder models such as sentence transformers (SBERT) learn similarities between two inputs. They have proven to generate highly generalizable embeddings for tasks including semantic textual similarity, information retrieval, classification, and clustering. However, little is known about how they actually compare their inputs. One barrier is that similarities depend on feature interactions rather than on individual features alone, so common feature-attribution methods do not work for this model class. To address this gap, in recent work we derived a local attribution method specifically for Siamese encoders. The output takes the form of a token–token matrix that points out which token pairs from the two inputs are important for an individual prediction. Applying it to SBERT models, we gain insights into which parts of speech and syntactic roles these models attend to, confirm that they mostly ignore negation, explore how they judge semantically opposite adjectives, and find that they exhibit a lexical bias. In a collaboration with UZH, we are now looking into multilingual models, and first results indicate that these models can learn strong cross-lingual alignment despite their simple contrastive training objective.
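To give a concrete picture of what a token–token attribution matrix looks like (a toy illustration of ours, not the speaker's method, which is a dedicated attribution technique for Siamese encoders): for a model whose similarity score is the dot product of mean-pooled token embeddings, the score decomposes exactly into one contribution per token pair:

```python
import numpy as np

def pair_attributions(E, F):
    # E: (n, d) token embeddings of input A; F: (m, d) token embeddings of B.
    # For a model scoring sim = mean(E) @ mean(F), the score decomposes as
    #   sim = (1 / (n * m)) * sum_ij (E_i @ F_j),
    # so entry [i, j] is the exact contribution of token pair (i, j).
    n, m = E.shape[0], F.shape[0]
    return (E @ F.T) / (n * m)

# Toy inputs: a 3-token and a 5-token "sentence" with 4-dim embeddings.
E = np.random.default_rng(0).normal(size=(3, 4))
F = np.random.default_rng(1).normal(size=(5, 4))
A = pair_attributions(E, F)          # (3, 5) token–token matrix
sim = E.mean(axis=0) @ F.mean(axis=0)  # the matrix sums to the model score
```

For real Siamese encoders the embeddings are contextual and the interactions nonlinear, which is exactly why a dedicated attribution method is needed; this sketch only shows the form of the output such a method produces.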
TBA
TBA