Zhou Heng, Ba Qingfang, Yuan Zheming, Dai Zhijun. Huaxue Tongbao (Chemistry), 2022, 85(6): 736-745.
QSAR Modeling on Angiotensin-Converting Enzyme Inhibitory Peptides Based on Stepwise Non-linear Regression
Received: 2021-09-30  Revised: 2021-11-04
DOI:
Keywords: quantitative structure-activity relationship  feature selection  support vector machine  stepwise non-linear regression  angiotensin-converting enzyme inhibitory peptide
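Funding: National Natural Science Foundation of China (No. 31701164), Natural Science Foundation of Hunan Province (No. 2018JJ3238), Scientific Research Project of the Hunan Provincial Department of Education (No. 18C0171)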
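Author  Affiliation  E-mail
Zhou Heng  Hunan Agricultural University  1047216032@qq.com
Ba Qingfang  Hunan Agricultural University
Yuan Zheming  Hunan Agricultural University
Dai Zhijun*  Hunan Agricultural University  daizhijun@hunau.edu.cn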
Abstract:
      Linear feature selection methods can improve the predictive ability of quantitative structure-activity relationship (QSAR) models, but they tend to miss non-linear relationships between features (physicochemical properties) and molecular activity. We proposed a non-linear feature selection algorithm, support vector regression (SVR)-based Stepwise Non-linear Regression (SSNR), and applied it to a QSAR study of antihypertensive angiotensin-converting enzyme (ACE) inhibitory peptides. First, five groups of molecular descriptors with different backgrounds were each used to characterize the peptide sequences. SSNR was then employed to select the features that may be non-linearly related to molecular activity. Next, an intelligent consensus model (ICM) combined the activities predicted by the sub-models built on each descriptor group, through a weighting strategy, to give the final predicted activities. QSAR modeling on two data sets, ACE-inhibitory dipeptides and tripeptides, indicates that the feature subsets obtained by SSNR coupled with the ICM strategy effectively improve predictive ability (average Q²_pred = 0.675±0.002 for dipeptides and 0.663±0.013 for tripeptides), which is superior to the reference feature selection algorithms genetic algorithm-partial least squares (GA-PLS) (0.538±0.049 and 0.599±0.047, respectively) and stepwise linear regression (0.583±0.041 and 0.675±0.010, respectively). Compared with the reference regression models, partial least squares (PLS) regression and multiple linear regression (MLR), SVR generally gives more stable predictions. Finally, we used the proposed QSAR pipeline, built on peptides of known inhibitory activity, to predict the activity of all peptides whose activity is unknown. Combining the activities of the known peptides with the predicted activities of the unknown ones, we analyzed the high-activity peptides and the amino acids preferred at each residue position, which suggests possible sequence combinations for synthesizing potentially highly active ACE-inhibitory peptides.
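The weighted integration step and the Q²_pred metric mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the ICM weighting rule or the exact Q²_pred formula, so the following are assumptions made only for illustration: each sub-model is weighted by a hypothetical cross-validated Q² clipped at zero, and Q²_pred is taken as 1 - PRESS/SS with SS measured around the training-set mean. The function names `q2_pred` and `weighted_consensus` and all numbers are invented for this example and are not taken from the paper.

```python
from typing import List, Sequence


def q2_pred(y_true: Sequence[float], y_pred: Sequence[float],
            y_train_mean: float) -> float:
    """External predictive Q^2 (assumed form): 1 - PRESS / SS,
    where SS is the test-set spread around the training-set mean."""
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss = sum((t - y_train_mean) ** 2 for t in y_true)
    return 1.0 - press / ss


def weighted_consensus(sub_model_preds: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> List[float]:
    """Weighted average, per peptide, of the predictions from the
    sub-models built on each descriptor group."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_samples = len(sub_model_preds[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, sub_model_preds)) / total
        for i in range(n_samples)
    ]


# Toy data, made up for illustration (not values from the paper).
y_test = [1.0, 2.0, 3.0, 4.0]
preds_by_group = [
    [1.2, 1.8, 3.3, 3.9],  # sub-model on hypothetical descriptor group A
    [0.8, 2.4, 2.7, 4.3],  # sub-model on hypothetical descriptor group B
]
cv_q2 = [0.6, 0.3]  # hypothetical cross-validated Q^2 per sub-model
weights = [max(q, 0.0) for q in cv_q2]  # assumption: clip negative Q^2 to 0

consensus = weighted_consensus(preds_by_group, weights)
score = q2_pred(y_test, consensus, y_train_mean=2.5)
print(consensus, score)
```

Clipping the weights at zero keeps a sub-model that predicts worse than the mean from pulling the consensus in the wrong direction. Other weighting schemes, such as selecting sub-models per compound, would fit the same interface.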