Decoder-Only Transformers: The Brains Behind Generative AI, Large Language Models and Large Multimodal Models

Dishita Naik, Ishita Naik, Nitin Naik

Research output: Chapter in Book/Published conference output › Conference publication

Abstract

The rise of creative machines is attributed to generative AI, which enables machines to create new content. The introduction of an advanced neural network architecture known as the transformer revolutionized the landscape of generative AI. A transformer transforms one sequence into another sequence and is primarily used in natural language processing and computer vision tasks. It determines the relationships between tokens or words in a sequence to understand the context, while processing these tokens or words simultaneously. Transformers were built to resolve various issues of earlier neural networks, such as the Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM), and are now the brains behind the majority of generative AI models, for example, most Large Language Models (LLMs) and Large Multimodal Models (LMMs). This paper explains the transformer, its architectural components and its working. Subsequently, it illustrates the decoder-only transformer architecture, its components and working, including the reason why this type of transformer architecture is used in most generative AI models, such as the majority of LLMs and LMMs.
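To make the abstract's description concrete, the following is a minimal, illustrative sketch (not taken from the paper) of the masked self-attention step that lets a decoder-only transformer relate each token to the tokens before it while processing all positions in parallel. The shapes and names (seq_len, d_model, W_q, W_k, W_v) are assumptions chosen for illustration only.

```python
import numpy as np

def causal_self_attention(x, W_q, W_k, W_v):
    """x: (seq_len, d_model) token embeddings; W_*: (d_model, d_k) projection matrices."""
    Q, K, V = x @ W_q, x @ W_k, x @ W_v              # project tokens to queries, keys, values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise token-to-token relevance
    seq_len = scores.shape[0]
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)         # causal mask: a token cannot attend to future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ V                               # context-aware token representations

# Example: 4 tokens with 8-dimensional embeddings, single attention head
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(causal_self_attention(x, W_q, W_k, W_v).shape)  # (4, 8)
```

The causal mask is what distinguishes a decoder-only block from an encoder block: each output position depends only on earlier tokens, which is what allows the same model to be used for left-to-right text generation.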
Original language: English
Title of host publication: Contributions Presented at The International Conference on Computing, Communication, Cybersecurity and AI, July 3–4, 2024, London, UK: The C3AI 2024
Editors: Nitin Naik, Paul Jenkins, Shaligram Prajapat, Paul Grace
Pages: 315-331
ISBN (Electronic): 978-3-031-74443-3
DOIs
Publication status: Published - 20 Dec 2024

Publication series

Name: Lecture Notes in Networks and Systems (LNNS)
Publisher: Springer Cham
Volume: 884
ISSN (Print): 2367-3370
ISSN (Electronic): 2367-3389
