When a crime is under investigation, especially one with many unanswered questions, investigators need to narrow down the list of suspects to solve the case. Any trace found at the crime scene can help, such as a strand of hair, a fingerprint, or a DNA sample. Even when the DNA recovered is not complete enough to identify the suspect, it can still yield useful information, such as eye color or skin color.
Single Nucleotide Polymorphisms (SNPs, pronounced “snips”) are single-base positions in the genome where people differ. For example, if most of a population has the nucleotide C (cytosine) at a specific genome position and a minority of the same population has the nucleotide A (adenine) at that position, then an SNP occurs there.
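To make the C versus A example concrete, here is a minimal Python sketch that counts the alleles observed at one genome position and flags the position as an SNP when a second allele is common enough. The function names, the toy data, and the 1% minor-allele threshold are assumptions made for this illustration; they do not come from any of the tools discussed in this post.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Return the relative frequency of each allele seen at one position."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def is_snp(alleles, min_minor_freq=0.01):
    """Flag a position as an SNP when the second most common allele
    reaches min_minor_freq (an assumed threshold, not a fixed rule)."""
    freqs = sorted(allele_frequencies(alleles).values(), reverse=True)
    return len(freqs) > 1 and freqs[1] >= min_minor_freq


# Toy population: 90 observations of C and 10 of A at the same position.
position_alleles = ["C"] * 90 + ["A"] * 10
print(allele_frequencies(position_alleles))
print(is_snp(position_alleles))
```

A position where every observation is the same nucleotide has no second allele, so the function reports that it is not an SNP.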
The human genome has 10 to 30 million SNPs [1], but only some of them are related to external traits, which makes it hard to decide which SNPs to use when predicting pigmentation traits.
Many studies have used SNPs to determine pigmentation traits, each looking for a state-of-the-art solution. There is no consensus on the best approach, and each study tackles the problem differently. IrisPlex [2] is a tool developed to predict blue, intermediate, and brown eye colors for forensic use. The tool uses six SNPs, and data from 6168 Dutch Europeans was used to establish that these six SNPs carry the most important eye color information.
IrisPlex gave good results for brown and blue eyes, but intermediate eye colors were harder to predict with the proposed model and the available SNPs. After the first paper was published, the authors developed an extended version of IrisPlex, called HIrisPlex-S, to predict eye, skin, and hair color. Today, IrisPlex is the reference solution for forensic use. The tool is a multinomial logistic regression model: the probability that an individual has brown, blue, or intermediate eye color is calculated with a formula for each category (see the paper for how the authors derived the formulas).
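To give a feel for how a multinomial logistic regression turns genotypes into category probabilities, here is a minimal Python sketch. The SNP names (snp_1, snp_2), the intercepts, and the weights are hypothetical placeholders, not the published IrisPlex parameters, and treating brown as the reference category is an assumption of this sketch. The input is the number of minor alleles (0, 1, or 2) carried at each SNP.

```python
import math

# Hypothetical parameters for illustration only; these are NOT the
# published IrisPlex coefficients. Brown is the assumed reference category.
INTERCEPTS = {"blue": 1.0, "intermediate": -0.5}
WEIGHTS = {
    "blue": {"snp_1": -2.0, "snp_2": 0.5},
    "intermediate": {"snp_1": -1.0, "snp_2": 0.3},
}


def eye_color_probabilities(minor_allele_counts):
    """Return a probability for each eye color category.

    minor_allele_counts maps an SNP name to the number of minor alleles
    (0, 1, or 2) the individual carries at that SNP.
    """
    scores = {}
    for category, intercept in INTERCEPTS.items():
        linear = intercept + sum(
            weight * minor_allele_counts.get(snp, 0)
            for snp, weight in WEIGHTS[category].items()
        )
        scores[category] = math.exp(linear)
    # The reference category contributes exp(0) = 1 to the denominator.
    denominator = 1.0 + sum(scores.values())
    probabilities = {c: s / denominator for c, s in scores.items()}
    probabilities["brown"] = 1.0 / denominator
    return probabilities


print(eye_color_probabilities({"snp_1": 2, "snp_2": 0}))
```

The three probabilities always sum to 1, and the predicted color is usually taken to be the category with the highest probability, possibly subject to a minimum threshold.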
Muneeb and Henschel [3] presented an experiment to classify eye color and type-2 diabetes using nine types of classifiers: Random Forest, Extreme Gradient Boosting, Artificial Neural Network (ANN), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), Bidirectional LSTM (BILSTM), 1D Convolutional Neural Network (1DCNN), ensembles of ANN, and ensembles of LSTM. The eye color dataset was randomly split into 540 samples for training and 266 samples for testing, preserving the class proportions (Brown and Green). Each algorithm was trained using different numbers of SNPs. The results of all models were very close, but in their case the ensembles of LSTM had the highest accuracy (96%), using 1560 SNPs for eye color prediction.
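A random split that keeps the class proportions is usually called a stratified split. The following Python sketch shows one simple way to do it with the standard library. It is not the authors' code: the function name is made up for this example, and the per-class counts (403 Brown and 403 Green) are hypothetical, chosen only so that the totals match the 540 and 266 samples mentioned above.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction, seed=0):
    """Split sample indices into train and test sets, class by class,
    so that both sets keep roughly the original class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


# Hypothetical class counts; only the 806-sample total comes from the text.
labels = ["Brown"] * 403 + ["Green"] * 403
train, test = stratified_split(labels, test_fraction=266 / 806)
print(len(train), len(test))
```

Shuffling inside each class before cutting is what keeps the split random while still preserving the balance between the classes.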
Hart et al. [4] presented a heuristic solution for eye color and skin color prediction using 8 SNPs, with the goal of improving the 7-Plex system, which uses 7 SNPs. Their training set has 803 samples. The classes for eye color are Blue, Brown, and Green, while the classes for skin color are Dark, Medium, and Light. Eye color prediction happens in two steps: the first step classifies the sample as Not Brown or Not Blue, and the second step classifies the eye as Blue, Brown, or Green. Both the eye and the skin classification are based on the alleles observed at each SNP (AA, GC, etc.). On the European test data, the call rate of the solution was approximately 94%, and no errors occurred for eye color prediction.
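The two-step idea and the call rate can be sketched in a few lines of Python. Everything specific here is hypothetical: the SNP names (snp_a, snp_b) and the genotype-to-color rules are placeholders invented for this example, not the published 8-SNP rules. The sketch also assumes that the call rate is the fraction of samples for which the rules produce a prediction at all, and that errors are counted only among those called samples.

```python
from typing import Dict, List, Optional

# Hypothetical rule tables for illustration; NOT the published rules.
STEP_ONE = {"GG": "not_brown", "AA": "not_blue"}  # genotype at "snp_a"
STEP_TWO = {  # (step-one result, genotype at "snp_b") -> final color
    ("not_brown", "CC"): "blue",
    ("not_brown", "CT"): "green",
    ("not_blue", "CC"): "brown",
    ("not_blue", "CT"): "green",
}


def predict_eye_color(genotypes: Dict[str, str]) -> Optional[str]:
    """Return a color, or None when the rules are inconclusive (no call)."""
    coarse = STEP_ONE.get(genotypes.get("snp_a", ""))
    if coarse is None:
        return None
    return STEP_TWO.get((coarse, genotypes.get("snp_b", "")))


def call_rate(predictions: List[Optional[str]]) -> float:
    """Fraction of samples that received a prediction."""
    if not predictions:
        return 0.0
    return sum(p is not None for p in predictions) / len(predictions)


def error_rate(predictions: List[Optional[str]], truths: List[str]) -> float:
    """Fraction of called samples whose prediction is wrong."""
    called = [(p, t) for p, t in zip(predictions, truths) if p is not None]
    if not called:
        return 0.0
    return sum(p != t for p, t in called) / len(called)


samples = [
    {"snp_a": "GG", "snp_b": "CC"},
    {"snp_a": "AA", "snp_b": "CC"},
    {"snp_a": "AG", "snp_b": "CC"},  # no step-one rule: inconclusive
    {"snp_a": "GG", "snp_b": "TT"},  # no step-two rule: inconclusive
]
truths = ["blue", "brown", "brown", "green"]
predictions = [predict_eye_color(s) for s in samples]
print(call_rate(predictions), error_rate(predictions, truths))
```

Separating the two numbers matters: a heuristic can reach zero errors by refusing to call difficult samples, so a low error rate is only meaningful when read together with the call rate.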
Conclusion:
This post presented an overview of current solutions for eye and skin color prediction using SNPs for forensic use. Most of the solutions proposed so far use data collected from Europeans and only a few admixed samples. Two challenges for future work are to create a predictive model that generalizes well beyond European populations and to understand which SNPs are the most relevant for eye and skin color prediction, since each tool uses its own set of SNPs.
Did this subject spark your interest? Let us know in the comments.
References:
[1] KWOK, P.-Y. SNPs: Why Do We Care? In: Single Nucleotide Polymorphisms: methods and protocols. 1st. ed. Totowa, N.J: Humana Press, 2003. p. 1–11. ISBN 978-1-59259-327-9
[2] WALSH, S. et al. IrisPlex: A sensitive DNA tool for accurate prediction of blue and brown eye colour in the absence of ancestry information. Forensic Science International: Genetics, Elsevier BV, v. 5, n. 3, p. 170–180, jun 2011. Available from Internet: <https://doi.org/10.1016/j.fsigen.2010.02.004>
[3] MUNEEB, M.; HENSCHEL, A. Eye-color and type-2 diabetes phenotype prediction from genotype data using deep learning methods. BMC Bioinformatics, Springer Science and Business Media LLC, v. 22, n. 1, p. 1–26, apr 2021. Available from Internet: <https://doi.org/10.1186/s12859-021-04077-9>
[4] HART, K. L. et al. Improved eye- and skin-color prediction based on 8 SNPs. Croatian Medical Journal, Croatian Medical Journals, v. 54, n. 3, p. 248–256, jun 2013. Available from Internet: <https://doi.org/10.3325/cmj.2013.54.248>