Speaker: Shangqian Gao
Date: Nov 21, 2:15 – 3:05 pm

Abstract: Routing has emerged as a fundamental mechanism for enabling efficiency, specialization, and adaptive computation in modern AI systems. At its core, routing seeks to dynamically allocate computation, whether by activating specific subnetworks or by selecting among multiple experts or models, based on the characteristics of each input. In this presentation, we will discuss a series of recent works that explore routing from three perspectives: word/token-level routing, where fine-grained token representations guide expert construction and activation within a single Large Language Model (LLM); sentence/prompt-level routing, where semantic embeddings determine which expert or model to use within an expert pool or model zoo; and temporal-level routing, where temporal embeddings guide expert selection to capture the evolving complexity of the denoising trajectory in diffusion models. Across these settings, routing decisions are driven by distinct forms of representation (token features, prompt embeddings, and temporal signals), yet they share the common objective of tailoring computation to the specific difficulty and structure of each input. By examining routing in both LLMs and diffusion models, this presentation highlights a unified view: effective routing aligns computational resources with input-specific needs to achieve the best trade-off between cost and performance, whether at the token, sentence, or temporal level.

Location: LOV 307 and ZOOM
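To make the token-level routing idea concrete, the following is a minimal NumPy sketch of softmax top-k expert gating, the common mechanism behind mixture-of-experts LLM layers. All names, shapes, and dimensions here are illustrative assumptions, not details from the talk.

```python
import numpy as np

def route_tokens(token_embeddings, gate_weights, top_k=2):
    """Route each token to its top_k experts via a softmax gate.

    token_embeddings: (n_tokens, d) token representations
    gate_weights:     (d, n_experts) learned gating matrix (assumed)
    Returns per-token expert indices and renormalized mixing weights.
    """
    logits = token_embeddings @ gate_weights                 # (n_tokens, n_experts)
    shifted = logits - logits.max(axis=-1, keepdims=True)    # numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)               # softmax over experts
    top_idx = np.argsort(-probs, axis=-1)[:, :top_k]         # top_k experts per token
    top_p = np.take_along_axis(probs, top_idx, axis=-1)
    top_p /= top_p.sum(axis=-1, keepdims=True)               # renormalize kept weights
    return top_idx, top_p

# Illustrative usage: 4 tokens, embedding dim 8, 6 experts.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
gate = rng.normal(size=(8, 6))
idx, w = route_tokens(tokens, gate)
```

Each token activates only its `top_k` experts, so compute scales with the number of selected experts rather than the full expert pool, which is the cost/performance trade-off the abstract refers to.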