TY - GEN
T1 - Decoder-Only Transformers: The Brains Behind Generative AI, Large Language Models and Large Multimodal Models
AU - Naik, Dishita
AU - Naik, Ishita
AU - Naik, Nitin
PY - 2024/12/20
Y1 - 2024/12/20
N2 - The rise of creative machines is attributed to generative AI, which enabled machines to create new content. The introduction of the advanced neural network architecture known as the transformer revolutionized the landscape of generative AI. A transformer transforms one sequence into another sequence and is primarily used in natural language processing and computer vision tasks. It determines the relationships between tokens or words in a sequence to understand the context, while processing these tokens or words simultaneously. Transformers were built to resolve various issues of their predecessor neural networks, such as the Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM), and are now the brains behind the majority of generative AI models, for example, most Large Language Models (LLMs) and Large Multimodal Models (LMMs). This paper explains the transformer, its architectural components, and its working. Subsequently, it illustrates the decoder-only transformer architecture, its components, and its working, including the reason why this type of transformer architecture is used in most generative AI models, such as the majority of LLMs and LMMs.
AB - The rise of creative machines is attributed to generative AI, which enabled machines to create new content. The introduction of the advanced neural network architecture known as the transformer revolutionized the landscape of generative AI. A transformer transforms one sequence into another sequence and is primarily used in natural language processing and computer vision tasks. It determines the relationships between tokens or words in a sequence to understand the context, while processing these tokens or words simultaneously. Transformers were built to resolve various issues of their predecessor neural networks, such as the Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM), and are now the brains behind the majority of generative AI models, for example, most Large Language Models (LLMs) and Large Multimodal Models (LMMs). This paper explains the transformer, its architectural components, and its working. Subsequently, it illustrates the decoder-only transformer architecture, its components, and its working, including the reason why this type of transformer architecture is used in most generative AI models, such as the majority of LLMs and LMMs.
UR - https://www.techrxiv.org/users/845749/articles/1240125-decoder-only-transformers-the-brains-behind-generative-ai-large-language-models-and-large-multimodal-models?commit=025665ebfdde27f88642ff31edef365c9a9087e7
UR - https://link.springer.com/chapter/10.1007/978-3-031-74443-3_19
U2 - 10.1007/978-3-031-74443-3_19
DO - 10.1007/978-3-031-74443-3_19
M3 - Conference publication
SN - 978-3-031-74442-6
T3 - Lecture Notes in Networks and Systems (LNNS)
SP - 315
EP - 331
BT - Contributions Presented at The International Conference on Computing, Communication, Cybersecurity and AI, July 3–4, 2024, London, UK: The C3AI 2024
A2 - Naik, Nitin
A2 - Jenkins, Paul
A2 - Prajapat, Shaligram
A2 - Grace, Paul
ER -