Building climate data cleaning method based on fuzzy climate clustering and improved BP neural network
LIN Kangqiang1, LIN Yusong2*

(1. School of Architecture and Applied Art, Guangzhou Academy of Fine Arts, Guangzhou 510006, China; 2. Shenzhen Finance Institute, The Chinese University of Hong Kong (Shenzhen), Shenzhen 518000, China)

Keywords: building energy saving; climate data; data cleaning; adaptive clustering; BP neural network

DOI: 10.15986/j.1006-7930.2021.02.017

Abstract: To address the poor quality of climate data used for building energy conservation, a building climate data cleaning method is proposed that combines K-means-based fuzzy climate clustering with an improved BP neural network model. First, the K-means algorithm divides the data into subclasses according to data correlation. To address the choice of the number of clusters and the initial cluster centers, principal component analysis (PCA) is combined with K-means, and the principal components obtained by PCA are used as the initial cluster centers. A BP neural network is then used to build a separate data cleaning model for each subclass, which reduces computational complexity. In addition, the genetic simulated annealing (GSA) algorithm performs a global search for the initial values of the BP neural network, which addresses the difficulty of parameter selection and the tendency of BP networks to fall into local extrema while improving the data cleaning performance of the model. Experiments on actual climate data from a city were carried out to verify the data cleaning performance of the proposed method. The results show that the proposed method achieves a cleaning efficiency above 94% and remains robust with small samples.
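The clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the principal components are turned into initial cluster centers, so this sketch assumes one simple scheme: points are projected onto the first principal axis, sorted by that score, split into k equal-sized groups, and the group means are used as seeds. It runs plain (hard) K-means rather than a fuzzy variant. The function names (`first_principal_axis`, `pca_seeded_centers`, `kmeans`) and the synthetic temperature/humidity records are hypothetical and do not come from the paper.

```python
import random


def first_principal_axis(rows, iters=200, seed=0):
    """Estimate the mean and first principal axis via power iteration."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Covariance matrix (d x d), normalised by n.
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # degenerate case: no variance along v
            break
        v = [x / norm for x in w]
    return mean, v


def pca_seeded_centers(rows, k):
    """ASSUMPTION (not from the paper): sort points by their score on the
    first principal axis, split into k equal-sized groups, use group means."""
    mean, axis = first_principal_axis(rows)

    def score(r):
        return sum((x - m) * a for x, m, a in zip(r, mean, axis))

    ordered = sorted(rows, key=score)
    n = len(ordered)  # requires n >= k so that no group is empty
    centers = []
    for i in range(k):
        chunk = ordered[i * n // k:(i + 1) * n // k]
        centers.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return centers


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(rows, centers, iters=100):
    """Standard (hard) Lloyd iterations starting from the given centers."""
    centers = [list(c) for c in centers]  # do not mutate the caller's seeds
    labels = None
    for _ in range(iters):
        new_labels = [min(range(len(centers)),
                          key=lambda c: sq_dist(r, centers[c]))
                      for r in rows]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical records: (temperature in deg C, relative humidity in %).
rng = random.Random(42)
rows = [[rng.gauss(10, 1), rng.gauss(80, 1)] for _ in range(20)]
rows += [[rng.gauss(30, 1), rng.gauss(40, 1)] for _ in range(20)]

labels, centers = kmeans(rows, pca_seeded_centers(rows, k=2))
print(labels)
```

Because the seeds are derived deterministically from the data, repeated runs start from the same centers, which is the practical motivation for replacing random initialization. In the pipeline the abstract describes, each resulting subclass would then get its own BP neural network model; that step is not sketched here.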