Pruneformer

Title:	Pruneformer
DNr:	Berzelius-2022-27
Project Type:	LiU Berzelius
Principal Investigator:	Ulme Wennberg <ulme.wennberg@gmail.com>
Affiliation:	Kungliga Tekniska högskolan
Duration:	2022-02-14 – 2022-09-01
Classification:	10208
Keywords:

Abstract

The original self-attention mechanism proposed by Vaswani et al. has time and memory costs that are quadratic in sequence length. This puts a practical cap on the sequence length that can be used, because training large language models on long sequences becomes cumbersome. Pruneformer is, to our knowledge, the first transformer variant that simultaneously models both local and global attention patterns while its memory usage scales linearly with sequence length and its number of parameters is independent of sequence length. The idea is to combine two approaches:
- O(n) self-attention
- extrapolation abilities (something that almost all language models handle very poorly)
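
As a rough illustration of the scaling argument above, the sketch below counts how many attention scores are computed for full self-attention versus a simple sliding-window (local) variant. This is not the Pruneformer mechanism, which the abstract does not specify; the function names, the `window` parameter, and the toy inputs are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention(q, k, v, window=None):
    """Scaled dot-product attention on lists of vectors.

    window=None: every position attends to every position (n * n scores).
    window=w:    position i attends only to positions i-w .. i+w
                 (at most 2w + 1 scores per position, so O(n) overall).
    Returns (outputs, number_of_scores_computed).
    """
    n = len(q)
    outputs, n_scores = [], 0
    for i in range(n):
        if window is None:
            idx = range(n)
        else:
            idx = range(max(0, i - window), min(n, i + window + 1))
        scale = math.sqrt(len(q[i]))
        weights = softmax([dot(q[i], k[j]) / scale for j in idx])
        n_scores += len(weights)
        # Weighted sum of the value vectors this position attends to.
        out = [sum(w * v[j][d] for w, j in zip(weights, idx))
               for d in range(len(v[0]))]
        outputs.append(out)
    return outputs, n_scores


def make_inputs(n, d=2):
    """Hypothetical toy inputs: n deterministic d-dimensional vectors."""
    return [[math.sin(i + j) for j in range(d)] for i in range(n)]


if __name__ == "__main__":
    for n in (8, 16):
        x = make_inputs(n)
        _, full = attention(x, x, x)
        _, local = attention(x, x, x, window=1)
        print(n, full, local)
```

Doubling the sequence length from 8 to 16 quadruples the number of scores for full attention (64 to 256) but only roughly doubles it for the windowed variant (22 to 46). A purely local window cannot capture global patterns on its own, which is the gap the abstract says Pruneformer aims to close.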

National Supercomputer Centre at Linköping University
