Xin Jin1,∗ , Jingying Chi2 , Siwei Peng2 , Yulu Tian1 , Chaochen Ye1 and Xiaodong Li1,∗
1Beijing Electronic Science and Technology Institute, Beijing 100070, China
2Beijing University of Chemical Technology, Beijing 100029, China
In this paper, we investigate the image aesthetics classification problem, i.e., automatically classifying an image into low or high aesthetic quality, which is a challenging problem beyond conventional image recognition. Deep convolutional neural network (DCNN) methods have recently shown promising results for image aesthetics assessment. The powerful inception module has achieved very high performance in object classification, but it has not yet been applied to the image aesthetics assessment problem. In this paper, we propose a novel DCNN structure named ILGNet for image aesthetics classification, which introduces the inception module and connects intermediate local layers to the global layer for the output. In addition, we take GoogLeNet, an image classification CNN pre-trained on the ImageNet dataset, and fine-tune our connected local and global layers on the large-scale AVA aesthetics assessment dataset. The experimental results show that the proposed ILGNet outperforms the state-of-the-art methods for image aesthetics assessment on the AVA benchmark.
The architecture of the proposed ILGNet: Inception modules with connected Local and Global layers. We use one pre-treatment layer and three inception modules. The first two inception modules extract local features and the last one extracts global features. Recent work has shown the value of directly connecting intermediate layers to the output. Thus, we connect the two layers of local features to the layer of global features, forming a 1024-dimensional concat layer that feeds a fully connected layer. The output layer is one-dimensional, indicating low or high aesthetic quality. The network is 13 layers deep counting only layers with parameters (17 if pooling layers are also counted). The labels (1)-(7) are used for the visualization in Section IV.
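The local-global connection described above can be sketched in a few lines of PyTorch. This is a minimal, illustrative sketch only: the `InceptionBlock` internals, channel counts, and pooling choices are our assumptions, not the authors' exact GoogLeNet-based configuration. What it does reproduce is the key idea of concatenating pooled features from the two local inception stages and the global stage into a 1024-dimensional vector before a one-dimensional aesthetic-quality output.

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Simplified inception block: parallel 1x1, 3x3, 5x5 convs, concatenated.
    Branch widths are illustrative assumptions."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        branch = out_ch // 4
        self.b1 = nn.Conv2d(in_ch, branch, kernel_size=1)
        self.b3 = nn.Conv2d(in_ch, branch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, 2 * branch, kernel_size=5, padding=2)

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)

class ILGNetSketch(nn.Module):
    """Sketch of the ILGNet idea: concat intermediate local features
    with global features before the output."""
    def __init__(self):
        super().__init__()
        # Pre-treatment layer (assumed 7x7 conv + pooling, GoogLeNet-style)
        self.pre = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.local1 = InceptionBlock(64, 256)    # local features (1)
        self.local2 = InceptionBlock(256, 256)   # local features (2)
        self.glob = InceptionBlock(256, 512)     # global features
        self.pool = nn.AdaptiveAvgPool2d(1)
        # 256 + 256 + 512 = 1024-d concat layer, as in the paper's caption
        self.fc = nn.Linear(1024, 1)             # 1-d output: low vs. high

    def forward(self, x):
        x = self.pre(x)
        l1 = self.local1(x)
        l2 = self.local2(l1)
        g = self.glob(l2)
        # Pool each feature map to a vector and concatenate local + global
        feats = [self.pool(t).flatten(1) for t in (l1, l2, g)]
        return torch.sigmoid(self.fc(torch.cat(feats, dim=1)))
```

A forward pass on a batch of 224x224 images yields one probability per image, interpretable as the likelihood of high aesthetic quality.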
The histograms of the mean scores and the number of votes per image in the AVA dataset.
Visualization of the weights of the first three convolutional layers. The labels (1), (2), and (3) correspond to the same labels in the architecture figure.
Visualization of the features extracted by our ILGNet at important layers, for images labeled high (top) and low (bottom) aesthetic quality. The labels (1)-(7) correspond to the same labels in the architecture figure.
The Trained Models
The trained models are over 500 MB in size.
You can download them from the BaiduYun cloud disk or Google Drive:
Google Drive Links: