Abstract
Recurrent neural network (RNN)-based equalizers, especially the bidirectional long short-term memory (biLSTM) structure, have already been shown to outperform feed-forward NNs in nonlinearity mitigation in coherent optical systems. However, the recurrent connections prevent the computation from being fully parallelized. To circumvent the non-parallelizability of recurrent equalizers, we propose, for the first time, knowledge distillation (KD) to recast the biLSTM into a parallelizable feed-forward 1D-convolutional NN (1D-CNN) structure. In this work, we apply KD to a cross-architecture regression problem, an application that is still in its infancy. We highlight how KD helps the student learn from the teacher in the regression setting. Additionally, we provide a comparative study of NN-based equalizer performance for the teacher and for students with different NN architectures, carried out in terms of the Q-factor, inference speed, and computational complexity. The equalization performance was evaluated using both simulated and experimental data. Among the student models, the 1D-CNN achieved the best Q-factor. The proposed 1D-CNN showed a significant reduction in inference time compared to the biLSTM while maintaining comparable performance on the experimental data and suffering only a slight Q-factor degradation on the simulated data.
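To make the cross-architecture distillation idea concrete, below is a minimal PyTorch sketch of a biLSTM teacher guiding a feed-forward 1D-CNN student on a sequence-regression task. The layer sizes, kernel width, loss weight `alpha`, and toy data are illustrative assumptions, not the configuration used in the paper.

```python
# Minimal sketch (assumed configuration) of regression KD: a trained biLSTM
# "teacher" supervises a parallelizable 1D-CNN "student".
import torch
import torch.nn as nn

class BiLSTMTeacher(nn.Module):
    def __init__(self, in_dim=2, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 2)   # e.g. I/Q output per symbol

    def forward(self, x):                      # x: (batch, seq, in_dim)
        h, _ = self.lstm(x)
        return self.head(h)

class CNNStudent(nn.Module):
    def __init__(self, in_dim=2, channels=64, kernel=11):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_dim, channels, kernel, padding=kernel // 2),
            nn.ReLU(),
            nn.Conv1d(channels, 2, kernel, padding=kernel // 2),
        )

    def forward(self, x):                      # x: (batch, seq, in_dim)
        return self.net(x.transpose(1, 2)).transpose(1, 2)

def distillation_loss(student_out, teacher_out, target, alpha=0.5):
    # Regression KD: blend the usual MSE against the ground truth with an
    # MSE that pulls the student toward the teacher's predictions.
    mse = nn.functional.mse_loss
    return alpha * mse(student_out, target) + (1 - alpha) * mse(student_out, teacher_out)

# Toy training step on random tensors standing in for received/transmitted symbols.
teacher, student = BiLSTMTeacher(), CNNStudent()
teacher.eval()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
x, y = torch.randn(32, 128, 2), torch.randn(32, 128, 2)
with torch.no_grad():
    t_out = teacher(x)                         # teacher inference is offline
loss = distillation_loss(student(x), t_out, y)
opt.zero_grad(); loss.backward(); opt.step()
```

At inference time only the student runs; its convolutions have no sequential dependency across symbols, which is the parallelizability gain the abstract refers to.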
| Original language | English |
| --- | --- |
| Number of pages | 10 |
| Journal | Journal of Lightwave Technology |
| Early online date | 29 Nov 2023 |
| DOIs | |
| Publication status | E-pub ahead of print - 29 Nov 2023 |
Bibliographical note
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0

Keywords
- Artificial intelligence
- machine learning
- recurrent neural networks
- parallelization
- knowledge distillation
- nonlinear equalizer
- coherent detection