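Research on establishing gastric cancer lymph node metastasis prediction model based on machine learning and routine laboratory indicators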

doi:10.3969/j.issn.1006-5725.2024.06.019

Abstract

Objective: To establish a prediction model for lymph node metastasis (LNM) of gastric cancer based on routine laboratory indicators using machine learning algorithms. Methods: Data from 741 gastric cancer patients at the Affiliated Hospital of Nantong University between January 2020 and January 2022 were collected for model training and testing, and data from 102 gastric cancer patients between January 2023 and October 2023 were collected for model validation. The XGBoost algorithm was used to calculate indicator importance and to filter a set of important indicators from 66 candidate indicators. Five machine learning algorithms (K-nearest neighbor, support vector machine, multilayer perceptron, random forest and AdaBoost) were constructed and trained for comparative analysis, and the stability and predictive ability of the model were further verified on the validation set. Results: A set of 9 routine laboratory indicators was selected, and the gastric cancer LNM prediction model V9 was trained on it. In the comparative experiments, the AdaBoost algorithm, which is based on the boosting strategy, performed best: its evaluation metrics, including area under the curve, F1 score, accuracy, sensitivity and specificity, ranged from 0.833 to 0.968, and its prediction accuracy on the validation set was 94.12%. Conclusion: V9 is a gastric cancer LNM prediction model with auxiliary clinical diagnostic value. It can accurately assess patient risk and provide a basis for clinical decision-making.

Key words: gastric cancer, lymph node metastasis, routine laboratory indicators, machine learning
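The abstract attributes the best performance to AdaBoost, a boosting method. As a rough illustration of what boosting does, the following is a minimal sketch of discrete AdaBoost over one-feature threshold rules (decision stumps), written with the Python standard library only. The toy data, the function names and the label coding (+1 / -1) are assumptions made for illustration; they are not the study's data, indicators or implementation, which the source does not describe at this level.

```python
import math


def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best stump.

    A stump predicts `sign` when row[feature] >= threshold, else -sign.
    """
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                preds = [sign if row[j] >= thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] >= thr else -sign


def adaboost_fit(X, y, n_rounds=3):
    """Fit discrete AdaBoost; labels in y must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)         # avoid division by zero
        if err >= 0.5:                     # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified samples, down-weight correct ones.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical toy data: one made-up indicator, labels +1 / -1.
# No single threshold separates the classes, but three stumps combined do.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]

single = adaboost_fit(X, y, n_rounds=1)
boosted = adaboost_fit(X, y, n_rounds=3)
single_preds = [adaboost_predict(single, row) for row in X]
boosted_preds = [adaboost_predict(boosted, row) for row in X]
print(single_preds, boosted_preds)
```

On this toy set a single stump gets 4 of 6 labels right, while the three-round ensemble reproduces all six. In practice a library implementation would normally be used instead of hand-written code.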

CLC number: R319

Jianliang YAN, Zeyu XIE, Rongrong JING, Ming CUI. Research on establishing gastric cancer lymph node metastasis prediction model based on machine learning and routine laboratory indicators[J]. The Journal of Practical Medicine, 2024, 40(6): 844-849.

Figures/Tables (6): Table 1, Figure 1, Figure 2, Figure 3, Table 2, Table 3


Clinical indicator	Variable	Training set (n = 555)				Test set (n = 186)				χ² value	P value
		non-LNM (n = 197)	LNM (n = 358)	χ² value	P value	non-LNM (n = 68)	LNM (n = 118)	χ² value	P value		
Sex	Male	146	242	2.263	0.133	38	87	5.451	0.020	0.360	0.548
	Female	51	116			30	31
Age	< mean (66)	98	159	1.247	0.264	29	60	0.857	0.355	0.078	0.779
	≥ mean (66)	99	199			39	58
Differentiation grade	G1	19	2	44.560	< 0.001	17	1	44.437	< 0.001	22.399	< 0.001
	G2	102	214			24	49
	G3	56	132			18	66
	Missing	20	10			9	2
NLR	< median (2.118)	110	167	3.933	0.047	39	52	2.539	0.111	0.022	0.882
	≥ median (2.118)	87	191			29	66
pTNM T stage	T1	107	23	207.072	< 0.001	44	3	95.694	< 0.001	3.743	0.442
	T2	41	37			8	12
	T3	45	272			15	89
	T4	1	22			0	13
	Missing	3	4			1	1
pTNM N stage	N0	197	0	555.000	< 0.001	68	0	186.000	< 0.001	0.452	0.929
	N1	0	105			0	38
	N2	0	98			0	30
	N3	0	155			0	50
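The χ² values for the two-level rows above can be checked from the counts. The following is a minimal sketch, using only the Python standard library, of a chi-square test on a 2×2 table with Yates' continuity correction. The source does not say which variant of the test was used; the correction is an assumption, chosen because it appears to reproduce the reported values for the Sex rows. The function name is hypothetical.

```python
import math


def chi2_yates_2x2(a, b, c, d):
    """Chi-square test with Yates' correction for the table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Continuity-corrected numerator: |ad - bc| - n/2, floored at zero.
    num = max(abs(a * d - b * c) - n / 2.0, 0.0)
    chi2 = n * num ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Sex rows: male / female counts in the non-LNM and LNM columns.
chi2_train, p_train = chi2_yates_2x2(146, 242, 51, 116)   # training set
chi2_test, p_test = chi2_yates_2x2(38, 87, 30, 31)        # test set
print(round(chi2_train, 3), round(chi2_test, 3))
```

Under this assumption the statistics round to 2.263 and 5.451, in line with the Sex rows of the table. Rows with more than two levels (differentiation grade, T stage, N stage) need a general r×c test and are not covered by this sketch.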

	Training set (n = 555, 197 LNM, 358 non-LNM)							Test set (n = 186, 68 LNM, 118 non-LNM)
Model	AUC	F1	ACC	Sensitivity	Specificity	PPV	NPV	AUC	F1	ACC	Sensitivity	Specificity	PPV	NPV
AdaBoost	0.999	0.982	0.977	0.974	0.978	0.960	0.986	0.968	0.926	0.903	0.887	0.911	0.833	0.942
RF	0.990	0.950	0.935	0.918	0.944	0.899	0.955	0.960	0.896	0.866	0.815	0.893	0.803	0.900
KNN	0.879	0.844	0.800	0.720	0.845	0.724	0.843	0.813	0.823	0.763	0.690	0.797	0.606	0.850
SVM	0.763	0.767	0.659	0.550	0.684	0.276	0.874	0.750	0.771	0.661	0.548	0.684	0.258	0.883
MLP	0.749	0.778	0.659	0.586	0.668	0.171	0.933	0.742	0.800	0.683	0.818	0.674	0.136	0.983
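All of the metrics in this table except AUC can be derived from a confusion matrix. The following is a minimal sketch of those definitions in plain Python. The counts are hypothetical round numbers chosen for illustration and are not taken from the study; the function name is also an assumption. AUC is omitted because it needs the ranked prediction scores rather than the four counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute threshold-based metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),            # true positive rate
        "specificity": tn / (tn + fp),            # true negative rate
        "ppv": tp / (tp + fp),                    # positive predictive value
        "npv": tn / (tn + fn),                    # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),        # harmonic mean of PPV and sensitivity
    }


# Hypothetical counts for 100 cases (not from the study).
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because PPV and NPV depend on how many positive and negative cases are in the sample, they shift with class balance even when sensitivity and specificity stay the same.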
