Comparative study of famous deep learning papers

Introduction

In the fast-moving world of deep learning and computer vision, a few foundational works set the standard for later research and paved the way for applications in academia and industry. Two such works are “ImageNet Classification with Deep Convolutional Neural Networks” and “Deep Residual Learning for Image Recognition.” For simplicity, we will refer to them as AlexNet (RA1) and ResNet (RA2).

RA1, known as AlexNet, not only won the ImageNet competition in 2012 but also redefined what machines could achieve in pattern recognition and image classification. ImageNet, created by the Stanford Vision Lab, Stanford University, and Princeton University, is a definitive image database for the field, containing more than 14 million images, and has had a major effect on CV (computer vision) and DL (deep learning). RA2, commonly referred to as ResNet, changed the architecture of neural networks by adding residual learning, which addressed the problem that “Deeper neural networks are more difficult to train.” In this report, we explore the distinct strategies and rhetorical approaches that “ImageNet Classification with Deep Convolutional Neural Networks” and “Deep Residual Learning for Image Recognition” use to achieve their research goals in deep learning and computer vision.

Rhetorical situation

Unquestionably, deep learning has changed how machines see, recognize, and understand images and text. Two revolutionary works, AlexNet and ResNet, accelerated progress in this field. These seminal papers were written for people who study or work in CV (computer vision), DL (deep learning), and NLP (natural language processing). The authors use precise technical language and clearly stated ideas so that their audience can understand and apply the results, and they assume that readers are already familiar with the underlying concepts. Since the outcomes of both papers are remarkable, readers may enjoy reading them and experimenting with the accompanying models.

Both papers have similar goals: to present new methods and architectures and to demonstrate their superiority. AlexNet showed that CNNs (convolutional neural networks) can be trained on massive datasets and achieve strong results, and it demonstrated the importance of GPU-accelerated training. ResNet showed that residual learning makes it possible to train very deep neural networks effectively, improving model performance and expressiveness. The venues matter as well. AlexNet appeared in Advances in Neural Information Processing Systems (Krizhevsky et al., 2012), and ResNet was published at CVPR (IEEE/CVF Conference on Computer Vision and Pattern Recognition), the premier annual computer vision event (CVPR, 2023) and one of the most important meetings in the field. Publication at these venues means that the papers were peer-reviewed and that many people would read and discuss them. The purpose of both papers is also similar: to inform peers and learners how the new methods solve image classification problems and to push the field forward. They also invite peers to solve or optimize further problems so that better models can be built.

Rhetorical strategies

Ethos, pathos, and logos are essential to a strong argument. They are three established persuasive appeals used to influence readers. AlexNet and ResNet are very important papers in artificial intelligence, and they also use these appeals to enrich the reader’s experience and spread their groundbreaking ideas.

Ethos

First, ethos concerns the authors’ authority, which makes it easier for the audience to trust them. The authors of both papers came from leading universities and technology companies. The AlexNet team all came from UoT (University of Toronto), a top-30 university in the QS rankings (QS World University Rankings, 2024), which lends the team institutional credibility. The paper had also been cited 142819 times on Google Scholar (Google Scholar, 2023), which shows how influential it is. The members of the ResNet team all came from MSRA (Microsoft Research Asia), another world-leading research organization. Their paper had been cited 184618 times on Google Scholar (Google Scholar, 2023), which places it among the most cited papers of the 21st century. These affiliations and citation counts, combined with the detail of the papers themselves, strengthen the reader’s trust.

Pathos

Pathos appeals to emotions by creating a connection between reader and writer. Because humans are largely emotional creatures, pathos can be a very powerful strategy in argument (The Three Appeals of Argument). In these two papers, pathos comes primarily from results, benchmark tables, and graphs. The dramatic increase in CNN accuracy reported by AlexNet and the highest performance on ImageNet reported by ResNet make the audience want to read more and become more interested in the specific experimental process. Readers may also become more hopeful about the future of AGI (Artificial General Intelligence).

Logos

Logos uses the power of logic, which suits these technical computer vision papers well. Both papers use many formulas, code, images, graphs, and architecture diagrams to help readers understand the technical details and how the results were obtained. For instance, the ResNet paper uses a training error line chart to illustrate the challenges of training deep, multi-layered neural networks. AlexNet uses a diagram of its convolutional neural network architecture, split across two GPUs, to show clearly how the network works. (This kind of split across devices was not widely used for a long time, but it has become useful again for training huge transformer models.) In summary, all these classic persuasive appeals help the authors spread their ideas more successfully and set new standards for AI (artificial intelligence).

Argument Organization

Although both papers have similar goals, they differ in viewpoint and structure. Since both are in the field of computer vision, both use many architecture diagrams and experimental data as evidence. AlexNet’s key point is to convince readers that its convolutional neural network can use GPUs to train fast, use dropout to prevent overfitting, and use the ReLU activation function (a non-saturating nonlinearity, not a normalization method) to speed up training, and that together these choices achieve great results on large-scale image recognition. Notably, the authors trained the whole model with purely supervised learning on labeled ImageNet data, without unsupervised pre-training, which was an uncommon practice at the time. As for its conclusion, the paper does not have one. The authors mainly describe how they did the work, with little explanation of why the process works, so the last part of the paper reads more like a discussion. ResNet, on the other hand, uses residual learning to solve the degradation problem, in which accuracy gets worse as plain networks become deeper. Interestingly, this paper also lacks a conclusion, but the likely reason is that CVPR limits papers to 8 pages, which left no room for one (CVPR 2023 Submission Policies, 2023). Thankfully, the authors put most of the conclusive information in the introduction, supported by data and formulas.
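For readers who want to see the techniques named in this section in concrete form, the following is a minimal sketch in plain Python, not the code from either paper. It shows ReLU and dropout (the AlexNet ingredients) and a residual block that computes F(x) + x (the ResNet idea). The function names, the use of fully connected layers instead of convolutions, and the "inverted" dropout scaling common in modern frameworks are all assumptions made for illustration; the AlexNet paper instead scales outputs at test time, and the ResNet paper uses convolutional layers with batch normalization.

```python
import random


def relu(values):
    """Rectified linear unit: negative inputs become zero."""
    return [max(0.0, v) for v in values]


def dropout(values, rate, rng, training=True):
    """Inverted dropout (an assumption, see lead-in): during training,
    zero each unit with probability `rate` and rescale the survivors."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def linear(values, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [
        sum(w * v for w, v in zip(row, values)) + b
        for row, b in zip(weights, bias)
    ]


def residual_block(x, weights1, bias1, weights2, bias2):
    """Compute relu(F(x) + x), where F is two linear layers
    with a ReLU between them. Output size must equal input size."""
    hidden = relu(linear(x, weights1, bias1))
    fx = linear(hidden, weights2, bias2)
    # The shortcut connection adds the input back onto F(x).
    return relu([f + xi for f, xi in zip(fx, x)])
```

When every weight and bias in the block is zero, F(x) is zero and a non-negative input passes through unchanged. This illustrates why a residual block can fall back to an identity mapping, which is the intuition the ResNet paper offers for why added layers need not make a deep network worse.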

Conclusion

To conclude, both AlexNet and ResNet laid the groundwork for later research in computer vision and deep learning. The careful use of ethos, pathos, and logos is very important for an argument because it attracts the audience’s attention and leads to discussion. In this report, we discussed two foundational papers, both of which use a suitable genre to explain their work. These papers show how the field combines innovation with clear communication, and they highlight the values of the computer vision and deep learning research community. Ultimately, we believe that continued research based on these papers will help pave the way toward Artificial General Intelligence (AGI).

However, our analysis has some limits. We focused on just two papers, so we may not have seen all the writing styles and techniques used across the field. Future studies would benefit from examining a larger and more varied set of papers, which would offer a more complete view of rhetorical techniques in computer vision and deep learning.

References

[1] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778).

[2] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25.

[3] Stanford Vision Lab, Stanford University, & Princeton University. (2021). ImageNet. https://www.image-net.org/index.php

[4] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2023, August 2). Attention is all you need. arXiv.org. https://arxiv.org/abs/1706.03762

[5] CVPR 2024. (n.d.). https://cvpr.thecvf.com/

[6] CVPR 2023. (n.d.). https://cvpr.thecvf.com/Conferences/2023/AuthorGuidelines

[7] Google Scholar. (n.d.). https://scholar.google.com/

[8] 跟李沐学Ai [Learn AI with Mu Li]. (2021, October 22). Resnet论文逐段精读 [ResNet paper, read paragraph by paragraph]. https://www.bilibili.com/video/BV1P3411y7nn/

[9] Three appeals argument - university writing center. (2023). https://uwc.cah.ucf.edu/wp-content/uploads/sites/9/2015/04/Three_Appeals_Argument.pdf
