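Hi everyone, I'm 带我去滑雪!

Determining whether a patient's lungs show lesions helps detect disease early, guide treatment, and monitor disease progression. Regular lung assessment and examination is essential for protecting lung health, preventing disease, and improving quality of life. In this post we combine clinical data with logistic regression to judge whether a patient has lung lesions.

The response variable is group: 1 means a lung lesion is present and 0 means normal. The feature variables are:

ESR: erythrocyte sedimentation rate
CRP: C-reactive protein
ALB: albumin
Anti-SSA: anti-SSA antibody
Glandular involvement: whether the glands are involved
gender: sex
c-PSA: cancer-specific prostate-specific antigen
CA 15-3: Cancer Antigen 15-3
TH17: Th17 cells
ANA: antinuclear antibody
CA125: Cancer Antigen 125
LDH: lactate dehydrogenase

The data file also contains an Age column (data.Age). Let's now use logistic regression to predict lung lesions.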
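For readers new to the model, here is a minimal sketch of what logistic regression computes: a linear score of the features is passed through the logistic (sigmoid) function to give the probability that group = 1, and the score's sign decides the predicted class. The feature values, coefficients, and intercept below are made-up numbers for illustration only; they are not fitted to the data in this post, and the function names are hypothetical helpers rather than anything from scikit-learn.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def lesion_probability(x, intercept, coef):
    """P(group = 1 | x) under a logistic regression model."""
    return sigmoid(intercept + np.dot(coef, x))

# Hypothetical numbers, for illustration only (not fitted to the post's data).
x = np.array([1.0, 2.0])        # two made-up feature values
coef = np.array([0.5, 0.25])    # made-up coefficients
intercept = -0.5                # made-up intercept

# Linear score: -0.5 + 0.5*1.0 + 0.25*2.0 = 0.5
p = lesion_probability(x, intercept, coef)
label = int(p > 0.5)            # probability above 0.5 -> predict class 1
print(p, label)
```

A positive coefficient pushes the predicted probability of a lesion up as that feature increases, and a negative one pushes it down, which is how fitted coefficients are usually read.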
(1) Import the modules and data

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import confusion_matrix
    from sklearn.metrics import classification_report
    from sklearn.metrics import cohen_kappa_score
    from scipy.stats import logistic  # import packages

    titanic = pd.read_csv('filename1.csv')
    titanic  # load and display the data

Output (column names are abbreviated here; in the file they are data.Age, impute.data.ESR..mean., impute.data.CRP..mean., impute.data.ALB..mean., impute.data.Anti.SSA..median., impute.data.Glandular.involvement..median., impute.data.Gender..median., impute.data.c.PSA..mean., impute.data.CA153..mean., impute.data.TH17..mean., impute.data.ANA..median., impute.data.CA125..mean., impute.data.LDH..mean., and data.group):

         Age        ESR        CRP        ALB  Anti.SSA  Gland.  Gender     c.PSA     CA153       TH17  ANA      CA125         LDH  group
    0     67  21.000000   4.810000  38.692661         0       0       0  0.300000   3.50000  10.330000    1   3.000000  212.210493      0
    1     78  33.000000  12.089916  41.100000         0       0       0  0.610931  22.40000   7.465353    1  17.500000  485.000000      0
    2     69  24.000000   2.250000  42.700000         0       0       0  0.300000   5.40000   8.020000    0   4.360000  236.000000      0
    3     71  43.000000  21.800000  39.200000         0       0       0  0.300000  11.11000   5.500000    1   6.700000  166.000000      0
    4     69  20.000000   2.430000  47.600000         3       0       0  0.300000   6.93000   4.310000    0   3.520000  223.000000      0
    ...
    954   63  40.274914   2.370000  40.300000         2       0       0  0.430000   6.10000   6.560000    0   7.720000  234.000000      0
    955   68  27.000000   3.520000  41.000000         3       0       0  0.320000   7.52000   4.780000    1   7.150000  254.000000      0
    956   61  40.274914  12.089916  40.700000         0       0       0  0.610931  12.46303   1.790000    1   9.392344  161.000000      0
    957   60  27.000000  35.400000  38.300000         0       0       0  0.200000   7.68000   5.700000    0   9.290000  256.000000      0
    958   68  30.000000   2.280000  44.400000         0       0       0  0.200000   5.32000   4.430000    0   4.710000  172.000000      0

    959 rows × 14 columns

(2) Data processing

The last column is the response; all other columns are features.

    X = titanic.iloc[:, :-1]   # features: every column except the last
    y = titanic.iloc[:, -1]    # response: data.group
    X = pd.get_dummies(X, drop_first=True)  # one-hot encode any categorical columns
    X

(3) Split into training and test sets

    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression

    # hold out 20% of the rows as the test set
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=None, random_state=0)

(4) Fit the logistic regression

Setting C=1e10 makes the default L2 penalty negligible, so the fit is effectively unregularized.

    model = LogisticRegression(C=1e10)
    model.fit(X_train, y_train)
    model.intercept_  # model intercept
    model.coef_       # regression coefficients

Output:

    array([[ 0.03899236,  0.00458312,  0.000863  , -0.10140358, -0.09681747,
             0.74167081,  0.56011254,  0.24636358,  0.0226635 , -0.02681392,
             0.4987412 , -0.01932326,  0.00211805]])

(5) Evaluate classification accuracy on the test set

    model.score(X_test, y_test)

Output:

    0.6822916666666666

(6) Predict class probabilities on the test set

    prob = model.predict_proba(X_test)
    prob[:5]

Output (first column is the probability of class 0, second column the probability of class 1):

    array([[0.71336774, 0.28663226],
           [0.34959506, 0.65040494],
           [0.91506198, 0.08493802],
           [0.24008149, 0.75991851],
           [0.55969043, 0.44030957]])

(7) Model predictions

    pred = model.predict(X_test)
    pred[:5]  # predicted labels for the test set, first five shown

Output:

    array([0, 1, 0, 1, 0], dtype=int64)

(8) Compute the confusion matrix

    table = pd.crosstab(y_test, pred, rownames=['Actual'], colnames=['Predicted'])
    table

Output:

    Predicted   0   1
    Actual
    0          99  22
    1          39  32

(9) Compute evaluation metrics based on the confusion matrix

    print(classification_report(y_test, pred, target_names=['yes', 'no']))

Output:

                  precision    recall  f1-score   support

             yes       0.72      0.82      0.76       121
              no       0.59      0.45      0.51        71

        accuracy                           0.68       192
       macro avg       0.65      0.63      0.64       192
    weighted avg       0.67      0.68      0.67       192

Note that target_names is applied in label order, so "yes" here is class 0 (normal) and "no" is class 1 (lesion). The model identifies normal cases fairly well (recall 0.82) but catches fewer than half of the lesion cases (recall 0.45).

(10) Plot the ROC curve

    from scikitplot.metrics import plot_roc

    plot_roc(y_test, prob)
    x = np.linspace(0, 1, 100)
    plt.plot(x, x, 'k--', linewidth=1)   # diagonal reference line
    plt.title('ROC Curve (Test Set)')    # draw the ROC curve
    plt.savefig(r'E:\工作\硕士\博客\squares1.png', bbox_inches='tight', pad_inches=1,
                transparent=True, facecolor='w', edgecolor='w', dpi=300,
                orientation='landscape')

Output: [Figure: ROC Curve (Test Set); image not preserved]

If you need the dataset, you can get it from Baidu Netdisk (the link is valid permanently):
Link: https://pan.baidu.com/s/1E59qYZuGhwlrx6gn4JJZTg?pwd=2138
Extraction code: 2138
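As a cross-check on steps (8) and (9), the reported metrics can be recomputed by hand from the four confusion-matrix counts. The sketch below assumes only those counts (99, 22, 39, 32) and plain NumPy. It also computes Cohen's kappa, which the post imports as cohen_kappa_score but never calls; including it here is my addition, not part of the original analysis.

```python
import numpy as np

# Counts from the confusion matrix in step (8):
# rows are actual classes 0/1, columns are predicted classes 0/1.
cm = np.array([[99, 22],
               [39, 32]])

total = cm.sum()
accuracy = np.trace(cm) / total                 # correct predictions / all predictions

precision = np.diag(cm) / cm.sum(axis=0)        # per class: correct / predicted as that class
recall = np.diag(cm) / cm.sum(axis=1)           # per class: correct / actually that class
f1 = 2 * precision * recall / (precision + recall)

# Cohen's kappa: agreement beyond what the row and column totals give by chance.
expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2
kappa = (accuracy - expected) / (1 - expected)

for label in (0, 1):
    print(f"class {label}: precision={precision[label]:.2f} "
          f"recall={recall[label]:.2f} f1={f1[label]:.2f}")
print(f"accuracy={accuracy:.4f} kappa={kappa:.4f}")
```

The per-class values agree with the classification report once rounded to two decimals, with class 0 appearing there as "yes" and class 1 as "no".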