Deep Image Aesthetics Classification using Inception Modules and Fine-tuning Connected Layer

Xin Jin1,∗ , Jingying Chi2 , Siwei Peng2 , Yulu Tian1 , Chaochen Ye1 and Xiaodong Li1,∗

1Beijing Electronic Science and Technology Institute, Beijing 100070, China

2Beijing University of Chemical and Technology, Beijing 100029, China


In this paper, we investigate the image aesthetics classification problem, i.e., automatically classifying an image as having low or high aesthetic quality, a challenging problem that goes beyond image recognition. Deep convolutional neural network (DCNN) methods have recently shown promising results for image aesthetics assessment. The recently proposed inception module achieves very high performance in object classification, but it has not yet been applied to the image aesthetics assessment problem. We propose a novel DCNN structure, codenamed ILGNet, for image aesthetics classification. It introduces the Inception module and connects intermediate Local layers to the Global layer for the output. We start from GoogLeNet, an image classification CNN pre-trained on the ImageNet dataset, and fine-tune our connected local and global layer on the large-scale aesthetics assessment dataset AVA [1]. The experimental results show that the proposed ILGNet outperforms state-of-the-art results for image aesthetics assessment on the AVA benchmark.






The architecture of the proposed ILGNet: Inception with connected Local and Global layers. We use one pre-treatment layer and three inception layers. The first two inception layers extract local features and the last one extracts global features. Recent work shows value in directly connecting intermediate layers to the output. Thus, we connect the two layers of local features to the layer of global features to form a 1024-dimensional concat layer, which feeds a fully connected layer. The output layer is 1-dimensional and indicates low or high aesthetic quality. The network is 13 layers deep when counting only layers with parameters (or 17 layers if we also count pooling). The labels (1)-(7) are used for the visualization in section IV.
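The local-to-global connection described in this caption can be illustrated with a small sketch. It is not the authors' Caffe implementation: the 256/256/512 split of the 1024-dimensional concat layer, the sigmoid on the single output, the 0.5 decision threshold, and all function names are assumptions made for illustration, and the feature vectors are random stand-ins for real inception outputs.

```python
import math
import random

# Assumed split of the 1024-d concat layer (the caption only gives the total).
LOCAL1_DIM, LOCAL2_DIM, GLOBAL_DIM = 256, 256, 512
CONCAT_DIM = LOCAL1_DIM + LOCAL2_DIM + GLOBAL_DIM  # 1024


def concat_features(local1, local2, global_feat):
    """Join two local feature vectors and one global feature vector."""
    return list(local1) + list(local2) + list(global_feat)


def fully_connected(x, weights, bias):
    """Single-output fully connected layer: dot(weights, x) + bias."""
    return sum(w * v for w, v in zip(weights, x)) + bias


def sigmoid(z):
    """Numerically stable logistic function (an assumed output activation)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(local1, local2, global_feat, weights, bias, threshold=0.5):
    """Return ("high" or "low", score) for one image's features."""
    x = concat_features(local1, local2, global_feat)
    if len(x) != len(weights):
        raise ValueError("feature and weight dimensions differ")
    score = sigmoid(fully_connected(x, weights, bias))
    return ("high" if score >= threshold else "low"), score


# Random stand-ins for the features of one image and for trained weights.
rng = random.Random(0)
local1 = [rng.gauss(0, 1) for _ in range(LOCAL1_DIM)]
local2 = [rng.gauss(0, 1) for _ in range(LOCAL2_DIM)]
global_feat = [rng.gauss(0, 1) for _ in range(GLOBAL_DIM)]
weights = [rng.gauss(0, 0.01) for _ in range(CONCAT_DIM)]

label, score = classify(local1, local2, global_feat, weights, bias=0.0)
print(label, round(score, 3))
```

In the real network the three feature vectors would come from the first, second, and third inception layers, and the fully connected weights would be learned during fine-tuning on AVA.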



The histogram/distribution of the mean scores and the number of votes per image in the AVA dataset.
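The two quantities plotted here can be derived from per-image vote counts. The sketch below assumes each AVA image comes with the number of votes it received for each score from 1 to 10. The cut-off of 5 used to turn a mean score into a low/high label is a commonly used convention and is an assumption here, not something stated on this page. The function names are hypothetical.

```python
def mean_score_and_votes(vote_counts):
    """vote_counts[i] is the number of votes for score i + 1 (scores 1..10)."""
    if len(vote_counts) != 10:
        raise ValueError("expected 10 vote counts, one per score 1..10")
    num_votes = sum(vote_counts)
    if num_votes == 0:
        raise ValueError("image has no votes")
    total = sum((i + 1) * count for i, count in enumerate(vote_counts))
    return total / num_votes, num_votes


def aesthetic_label(mean_score, threshold=5.0):
    """Binary label from a mean score; the threshold is an assumed convention."""
    return "high" if mean_score > threshold else "low"


# Toy example: two votes for score 5 and two votes for score 6.
counts = [0, 0, 0, 0, 2, 2, 0, 0, 0, 0]
mean, votes = mean_score_and_votes(counts)
print(mean, votes, aesthetic_label(mean))
```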


The visualization results of the weights of the first three convolutional layers. The labels of (1), (2), (3) correspond to the same labels in the third picture.

The visualization results of the weights of the features extracted by our ILGNet in important layers for images with high (top) and low (bottom) labels. The labels of (1)-(7) correspond to the same labels in the third picture.


Paper and Codes

Xin Jin, Jingying Chi, Siwei Peng, Yulu Tian, Chaochen Ye and Xiaodong Li. Deep Image Aesthetics Classification using Inception Modules and Fine-tuning Connected Layer. The 8th International Conference on Wireless Communications and Signal Processing (WCSP), Yangzhou, China, 13-15 October, 2016.

Paper (pdf), Codes (github)


  author    = {Xin Jin and
               Jingying Chi and
               Siwei Peng and
               Yulu Tian and
               Chaochen Ye and
               Xiaodong Li},
  title     = {Deep image aesthetics classification using inception modules and fine-tuning
               connected layer},
  booktitle = {8th International Conference on Wireless Communications {\&} Signal
               Processing, {WCSP} 2016, Yangzhou, China, October 13-15, 2016},
  pages     = {1--6},
  year      = {2016},
  crossref  = {DBLP:conf/wcsp/2016},
  url       = {},
  doi       = {10.1109/WCSP.2016.7752571},
  timestamp = {Fri, 26 May 2017 00:49:51 +0200},
  biburl    = {},
  bibsource = {dblp computer science bibliography}

The Trained Models

The size of the trained model is above 500MB.

You can download them from the BaiduYun cloud disk or Google Drive:

baiduyun ILGnet-AVA1.caffemodel  ILGnet-AVA2.caffemodel
googledrive ILGnet-AVA1.caffemodel  ILGnet-AVA2.caffemodel

