Anthony Awuley Machine Learning and Software Engineer

12 Mar

Using Genetic Programming to Study Pima Indians Diabetes Data Set

The Pima Indians Diabetes data set is a well-known, challenging pattern recognition problem from the UCI Machine Learning Repository. I used genetic programming to evolve a classifier that predicts the diabetes status of a patient. The data set contains the following attributes:

1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-Hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function
8. Age (years)
9. Class variable (0 or 1)
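
To make the setup concrete, here is a minimal sketch of how a genetic programming individual could be represented and scored on these attributes. It is not the exact system I used: the feature identifiers, the function set, the nested-tuple tree encoding, and the "output greater than zero means class 1" rule are all assumptions chosen for illustration, and the two sample rows are invented values, not records from the data set.

```python
import operator

# Identifiers for the eight input attributes listed above (names are my own).
FEATURES = [
    "pregnancies", "glucose", "blood_pressure", "skin_thickness",
    "insulin", "bmi", "pedigree", "age",
]


def protected_div(a, b):
    """Division that returns 1.0 instead of failing when the divisor is zero."""
    return a / b if b != 0 else 1.0


# Assumed function set; every function takes two arguments.
FUNCTIONS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": protected_div,
}


def evaluate(tree, row):
    """Evaluate an expression tree on one patient record (a dict).

    A tree is a tuple (function_name, left, right), a feature name,
    or a numeric constant.
    """
    if isinstance(tree, tuple):
        name, left, right = tree
        return FUNCTIONS[name](evaluate(left, row), evaluate(right, row))
    if isinstance(tree, str):
        return row[tree]
    return float(tree)


def predict(tree, row):
    """Map the tree output to a class: 1 if positive, otherwise 0."""
    return 1 if evaluate(tree, row) > 0 else 0


def accuracy(tree, rows, labels):
    """Fitness: fraction of records the tree classifies correctly."""
    hits = sum(predict(tree, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


# A hand-written candidate: predict class 1 when glucose exceeds 125.
candidate = ("sub", "glucose", 125.0)

# Two invented records, for illustration only (not taken from the data set).
toy_rows = [
    dict(zip(FEATURES, [1, 150, 70, 20, 80, 30.0, 0.5, 40])),
    dict(zip(FEATURES, [0, 100, 65, 25, 90, 24.0, 0.3, 25])),
]
toy_labels = [1, 0]

print(accuracy(candidate, toy_rows, toy_labels))
```

In a full run, a population of such trees would be generated at random and improved over generations through selection, crossover, and mutation, with `accuracy` (or a similar measure) serving as the fitness function.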

