The scale of vision models has increased dramatically in recent years, from tens of millions of parameters to hundreds of millions, or even billions, for Transformers. This growth comes with serious downsides. The first is that fine-tuning becomes more difficult, as huge models can easily overfit a typically sized downstream dataset, not to mention the increased computational and storage costs.
There has been renewed interest in parameter-efficient tuning algorithms in recent years. The essential idea is to attach a small trainable module to a large pre-trained model and update only that module's parameters by minimizing a task-specific loss, such as cross-entropy for classification. The most representative methods are Adapter, Low-Rank Adaptation (LoRA), and Visual Prompt Tuning (VPT). An adapter is a bottleneck-shaped neural network added to the output of a network block; LoRA is a "residual" layer consisting of rank decomposition matrices; and VPT adds extra learnable tokens to the input of a Transformer block, which can be thought of as adding learnable "pixels".
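To make the three designs concrete, here is a minimal PyTorch sketch of each module. The hidden size, bottleneck width, rank, and prompt length below are illustrative assumptions, not the settings used in the paper:

```python
import torch
import torch.nn as nn

d = 768  # assumed Transformer hidden size (e.g., ViT-B/16); illustrative only

class Adapter(nn.Module):
    """Bottleneck MLP added residually to a block's output."""
    def __init__(self, dim, bottleneck=8):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank residual x @ A @ B."""
    def __init__(self, dim, rank=4):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        for p in self.base.parameters():  # pre-trained weights stay frozen
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(dim, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, dim))  # zero init: starts as identity residual

    def forward(self, x):
        return self.base(x) + x @ self.A @ self.B

class VPT(nn.Module):
    """Prepends learnable prompt tokens to a block's input sequence."""
    def __init__(self, dim, num_tokens=10):
        super().__init__()
        self.prompts = nn.Parameter(torch.zeros(1, num_tokens, dim))

    def forward(self, x):  # x: (batch, seq_len, dim)
        p = self.prompts.expand(x.size(0), -1, -1)
        return torch.cat([p, x], dim=1)

x = torch.randn(2, 197, d)   # a ViT-B/16-like token sequence
print(VPT(d)(x).shape)       # torch.Size([2, 207, 768])
```

In all three cases, only the small added module is trained, so the number of tunable parameters is a tiny fraction of the frozen backbone's.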
These three parameter-efficient tuning approaches, however, have a handful of serious flaws. To begin with, none of the three techniques works well on all datasets: for a given dataset, an in-depth study of several tuning strategies is needed to determine the best one. Second, design choices such as the adapter feature dimension or the token length in VPT are found to noticeably affect performance.
In a recent paper, researchers from Nanyang Technological University in Singapore interpreted existing parameter-efficient tuning approaches as prompt modules and proposed using a neural architecture search (NAS) algorithm to automatically find the best prompt design from the data. Their approach, Neural prOmpt seArcH (NOAH), targets large vision models, especially those built from Transformer blocks. The search space is formed by subsuming Adapter, LoRA, and VPT into each Transformer block, as sketched below.
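A minimal sketch of the kind of per-block configuration such a search explores is shown below. The candidate values and the random sampler are illustrative assumptions for exposition; the paper itself trains a weight-sharing supernet and selects configurations with an architecture search rather than by random sampling:

```python
import random

# Illustrative candidate choices per Transformer block: the search decides,
# jointly across all blocks, how large each prompt module should be and
# whether it should be used at all (0 = module disabled).
SEARCH_SPACE = {
    "adapter_bottleneck": [0, 1, 5, 10],
    "lora_rank":          [0, 1, 5, 10],
    "vpt_tokens":         [0, 1, 5, 10],
}

def sample_config(num_blocks=12):
    """Randomly sample one prompt-module configuration for every block."""
    return [
        {name: random.choice(choices) for name, choices in SEARCH_SPACE.items()}
        for _ in range(num_blocks)
    ]

config = sample_config()
print(config[0])  # e.g., {'adapter_bottleneck': 5, 'lora_rank': 0, 'vpt_tokens': 10}
```

The point of the combined search space is that the data, not the practitioner, decides which module (or mixture of modules) each block gets, sidestepping the per-dataset manual tuning described above.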
The researchers tested NOAH on a variety of vision datasets spanning a wide range of visual domains, including objects, scenes, textures, and satellite photographs. On 10 of the 19 datasets, NOAH significantly outperforms the individual prompt modules, while its performance on the remaining datasets is highly competitive. They also compared NOAH against handcrafted prompt modules in few-shot learning and domain generalization settings, with the results confirming NOAH's superiority.
Conclusion
Model sizes in neural networks have grown alongside the spread of large-scale pre-training data in pursuit of greater learning capacity. This significant increase in model size, in turn, has sparked interest in developing efficient transfer learning approaches. Researchers at Nanyang Technological University in Singapore recently published results on how the recently proposed parameter-efficient tuning approaches, or prompt modules, perform on computer vision tasks. The results also highlight a major problem: manually designing an appropriate prompt module for each downstream dataset is incredibly difficult. Fully exploiting NOAH's potential would require more labeled data. For future study, the team intends to dig deeper into the mechanisms behind NOAH to better understand these encouraging results, and to apply NOAH to areas beyond computer vision, such as natural language processing (NLP).
This article is written as a summary by Marktechpost staff based on the paper 'Neural Prompt Search'. All credit for this research goes to the researchers on this project. Check out the paper and GitHub.