Breast Cancer Wisconsin (Diagnostic) Data Set

Original text:

Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.

The actual linear program used to obtain the separating plane in the 3-dimensional space is that described in: [K. P. Bennett and O. L. Mangasarian: “Robust Linear Programming Discrimination of Two Linearly Inseparable Sets”, Optimization Methods and Software 1, 1992, 23-34].

This database is also available through the UW CS ftp server:

ftp ftp.cs.wisc.edu

cd math-prog/cpo-dataset/machine-learn/WDBC/

It can also be found in the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29

Attribute Information:

1) ID number

2) Diagnosis (M = malignant, B = benign)

3-32) Ten real-valued features are computed for each cell nucleus:

a) radius (mean of distances from center to points on the perimeter)

b) texture (standard deviation of gray-scale values)

c) perimeter

d) area

e) smoothness (local variation in radius lengths)

f) compactness (perimeter^2 / area - 1.0)

g) concavity (severity of concave portions of the contour)

h) concave points (number of concave portions of the contour)

i) symmetry

j) fractal dimension (“coastline approximation” - 1)
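The geometric definitions in the list above (radius, perimeter, area, compactness) can be illustrated on a toy contour. This is only a sketch, not the original feature-extraction code: the function name `contour_features`, the polygon representation, and the choice of the mean boundary point as the "center" are all assumptions made for illustration, and the scaling used in the actual dataset may differ.

```python
import math


def contour_features(points):
    """Compute radius, perimeter, area and compactness of a closed contour.

    `points` is a list of (x, y) vertices listed in order around the boundary.
    (Hypothetical helper; the polygon representation is an assumption.)
    """
    n = len(points)
    # Assumed center: the mean of the boundary points.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # a) radius: mean of distances from the center to points on the perimeter.
    radius = sum(math.hypot(x - cx, y - cy) for x, y in points) / n
    perimeter = 0.0
    twice_area = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        # c) perimeter: sum of the edge lengths.
        perimeter += math.hypot(x1 - x0, y1 - y0)
        # d) area: shoelace formula, accumulated as twice the signed area.
        twice_area += x0 * y1 - x1 * y0
    area = abs(twice_area) / 2.0
    # f) compactness: perimeter^2 / area - 1.0
    compactness = perimeter ** 2 / area - 1.0
    return {
        "radius": radius,
        "perimeter": perimeter,
        "area": area,
        "compactness": compactness,
    }


# A near-circular contour of radius 10, sampled at 360 points.
circle = [
    (10 * math.cos(2 * math.pi * k / 360), 10 * math.sin(2 * math.pi * k / 360))
    for k in range(360)
]
features = contour_features(circle)
print(features)
```

For a perfect circle, perimeter^2 / area equals 4π, so the compactness formula gives 4π - 1 (about 11.57), which is the smallest value any closed shape can reach; more irregular contours score higher.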

The mean, standard error and “worst” or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features. For instance, field 3 is Mean Radius, field 13 is Radius SE, field 23 is Worst Radius.
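The three per-image statistics can be sketched as follows. The function name `summarize` and the input numbers are hypothetical, and the use of the sample standard deviation for the standard error is an assumption, since the description does not say which estimator was used.

```python
import math
import statistics


def summarize(values):
    """Return (mean, standard error, worst) for one feature measured on
    every nucleus in an image. Hypothetical helper for illustration."""
    n = len(values)
    mean = statistics.fmean(values)
    # Standard error of the mean: sample standard deviation / sqrt(n) (assumed estimator).
    se = statistics.stdev(values) / math.sqrt(n)
    # "Worst": mean of the three largest values.
    worst = statistics.fmean(sorted(values)[-3:])
    return mean, se, worst


# Made-up per-nucleus radii for a single image (not taken from the dataset).
radii = [10.0, 12.0, 14.0, 16.0, 18.0]
mean_radius, radius_se, worst_radius = summarize(radii)
print(mean_radius, radius_se, worst_radius)
```

Applying the same three summaries to each of the ten features yields the 30 values stored per image.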

All feature values are recoded with four significant digits.

Missing attribute values: none
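The field layout described above can be turned into column names and a small record parser. This is a minimal sketch: the column names and `parse_record` are hypothetical, the grouping of fields 3-12, 13-22 and 23-32 is inferred from the three examples given (fields 3, 13 and 23), and the comma-separated format is an assumption about how the data file is laid out.

```python
FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]
STATS = ["mean", "se", "worst"]

# Field 1 is the ID, field 2 the diagnosis; fields 3-12 are assumed to hold the
# means, 13-22 the standard errors and 23-32 the worst values.
COLUMNS = ["id", "diagnosis"] + [f"{feat}_{stat}" for stat in STATS for feat in FEATURES]


def parse_record(line):
    """Parse one comma-separated record into a dict (hypothetical helper)."""
    parts = line.strip().split(",")
    if len(parts) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(parts)}")
    diagnosis = parts[1]
    if diagnosis not in ("M", "B"):
        raise ValueError(f"unknown diagnosis code: {diagnosis!r}")
    record = {"id": parts[0], "diagnosis": diagnosis}
    # No missing values are expected, so every feature field must parse as a float.
    for name, raw in zip(COLUMNS[2:], parts[2:]):
        record[name] = float(raw)
    return record


# Placeholder record with made-up values, only to exercise the parser.
example = parse_record("0,B," + ",".join(["1.0"] * 30))
print(COLUMNS[2], COLUMNS[12], COLUMNS[22])  # fields 3, 13 and 23
```

List indices are zero-based, so field 3 (Mean Radius) is `COLUMNS[2]`.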

Class distribution: 357 benign, 212 malignant
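The class counts above imply a moderately imbalanced dataset. The short calculation below works out the class shares and the accuracy of a trivial classifier that always predicts the majority class; the variable names are illustrative only.

```python
counts = {"benign": 357, "malignant": 212}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# A classifier that always predicts the majority class sets the accuracy floor.
majority_label = max(counts, key=counts.get)
baseline_accuracy = shares[majority_label]

print(f"total: {total}")  # total: 569
for label, n in counts.items():
    print(f"{label}: {n} ({shares[label]:.1%})")
# benign: 357 (62.7%)
# malignant: 212 (37.3%)
```

Any model trained on this data should therefore be judged against roughly 62.7% accuracy rather than 50%.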

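Translation:

Breast Cancer Wisconsin (Diagnostic) Data Set

Predict whether the cancer is benign or malignant.

Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.

The actual linear program used to obtain the separating plane in the 3-dimensional space is that described in: [K. P. Bennett and O. L. Mangasarian: “Robust Linear Programming Discrimination of Two Linearly Inseparable Sets”, Optimization Methods and Software 1, 1992, 23-34].

This database is also available through the UW CS ftp server:

ftp ftp.cs.wisc.edu

cd math-prog/cpo-dataset/machine-learn/WDBC/

It can also be found in the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29

Attribute information:

1) ID number

2) Diagnosis (M = malignant, B = benign)

3-32) Ten real-valued features are computed for each cell nucleus:

a) radius (mean of distances from center to points on the perimeter)

b) texture (standard deviation of gray-scale values)

c) perimeter

d) area

e) smoothness (local variation in radius lengths)

f) compactness (perimeter^2 / area - 1.0)

g) concavity (severity of concave portions of the contour)

h) concave points (number of concave portions of the contour)

i) symmetry

j) fractal dimension (“coastline approximation” - 1)

The mean, standard error and “worst” or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features. For instance, field 3 is Mean Radius, field 13 is Radius SE, field 23 is Worst Radius.

All feature values are recoded with four significant digits.

Missing attribute values: none

Class distribution: 357 benign, 212 malignant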
