Deep Learning to Uncover Carcinogenic Human Metabolites
Author: Nitin Sreekumar ‘25
Editor: Lisa Liong ‘25
People are exposed to a variety of chemicals throughout their lives through air, food, personal care items, and medications. Some of these chemicals can cause harmful health effects, including cancer. Chemicals that cause cancer are called carcinogens; they pose significant health risks and contribute to the rising incidence of cancer worldwide. They dysregulate cellular processes, which results in a loss of cellular homeostasis, disrupts the cells’ survival mechanisms, and can make the cells malignant. It is therefore very important to identify these carcinogens and take specific measures to avoid or limit our exposure to them.
Recent reports suggest that a chemical carcinogen can induce carcinogenicity by directly impairing the epigenome, by interfering with the DNA damage response pathway, and/or by activating anti-apoptotic pathways. While carcinogens can be identified through in vivo experiments, artificial intelligence can significantly accelerate the pre-screening of candidate compounds, including new drugs, other compounds, and industrial by-products. Most computational approaches have used quantitative structure-activity relationship (QSAR) models to classify compounds as carcinogens or non-carcinogens, and numerous other computational methods have also been proposed for carcinogenicity prediction.
However, existing models for carcinogenicity prediction are trained on a limited number of experimentally validated carcinogens and non-carcinogens. Many of them rely heavily on the genotoxicity or mutagenicity of compounds, which leads to unsatisfactory predictive performance because some carcinogens are not genotoxic and some genotoxic compounds are not carcinogenic. Advances in functional assays have established that a potential carcinogen might induce cellular proliferation, genomic instability, oxidative stress response, anti-apoptotic response, and epigenetic alterations. Based on this information, scientists recently developed a method called Metabokiller, which uses a classification approach to identify human metabolites that might possess carcinogenic properties. They released Metabokiller as a Python package.
Cancer cells are highly proliferative and possess altered epigenetic signatures, elevated reactive oxygen species (ROS) levels, and activated anti-apoptotic pathways. Metabokiller uses these characteristics to identify carcinogens. To build Metabokiller, the scientists manually curated and compiled datasets of compounds reported to affect cellular proliferation, genomic stability, oxidative state, the epigenetic landscape, and apoptotic response. To study the chemical heterogeneity between the classes, they performed principal component analysis (PCA). The study reported that Metabokiller outperformed other methods widely used for carcinogenicity prediction, and that it uncovered endogenous metabolic compounds that could potentially cause cancer. Metabokiller is also interpretable: it reports the individual contribution of each of its six core models, which describe the biochemical properties associated with carcinogenicity. The researchers used Metabokiller to perform a large-scale computational screening of human metabolites from the HMDB database and identified many metabolites with carcinogenic potential, some previously known and some not.
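The idea of combining several property-specific models and reporting how much each one contributes to a prediction can be illustrated with a short Python sketch. This is not the Metabokiller package or its API. The property names, the simple averaging rule, the 0.5 threshold, and the function name combine_property_scores are all assumptions made for illustration, and only the five properties named in the text are included. The real tool uses trained machine-learning models and its own ensemble strategy.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical property names based on the hallmarks listed in the text.
PROPERTIES = [
    "cellular_proliferation",
    "genomic_instability",
    "oxidative_stress",
    "epigenetic_alteration",
    "anti_apoptotic_response",
]


@dataclass
class EnsemblePrediction:
    score: float                     # mean of the per-property probabilities
    is_carcinogen: bool              # True when score >= threshold
    contributions: Dict[str, float]  # each property's share of the total


def combine_property_scores(
    property_probs: Dict[str, float], threshold: float = 0.5
) -> EnsemblePrediction:
    """Average per-property probabilities and report each property's share.

    Assumption: every property model outputs a probability in [0, 1] that
    the compound affects that property. Averaging is a stand-in for
    whatever ensemble rule a real tool would use.
    """
    missing = [p for p in PROPERTIES if p not in property_probs]
    if missing:
        raise ValueError(f"missing property scores: {missing}")

    total = sum(property_probs[p] for p in PROPERTIES)
    score = total / len(PROPERTIES)

    # Normalize so the contributions sum to 1 (or are all 0 if total is 0).
    if total > 0:
        contributions = {p: property_probs[p] / total for p in PROPERTIES}
    else:
        contributions = {p: 0.0 for p in PROPERTIES}

    return EnsemblePrediction(score, score >= threshold, contributions)


if __name__ == "__main__":
    # Made-up probabilities for a single hypothetical compound.
    example = {
        "cellular_proliferation": 0.9,
        "genomic_instability": 0.8,
        "oxidative_stress": 0.7,
        "epigenetic_alteration": 0.6,
        "anti_apoptotic_response": 0.5,
    }
    result = combine_property_scores(example)
    print(result.score, result.is_carcinogen)
    for name, share in result.contributions.items():
        print(f"{name}: {share:.2f}")
```

Reporting the per-property shares alongside the overall score is what makes a prediction of this kind explainable: a reader can see which biochemical property drove the call rather than receiving only a yes or no answer.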
Despite its advantages, Metabokiller has several limitations. The ensemble model does not account for some properties of carcinogens, such as chronic inflammation, inhibition of senescence, cell transformation, and changes in growth factors, energetics, signaling pathways related to cellular replication, cell cycle control, and angiogenesis. In addition, the carcinogenicity of some compounds is dose-dependent, and Metabokiller does not provide dose information. It also does not predict tissue-specific toxicity.
Despite these limitations, Metabokiller provides a robust and accurate alternative for carcinogenicity prediction. Furthermore, its interpretability module provides a biochemically grounded explanation for each prediction. The large-scale screening of the human metabolome with Metabokiller yields useful information about predicted carcinogenic metabolites and opens a new avenue in functional metabolomics. This may help us identify the major cancer-associated metabolites and understand their role in cancer progression.
1. Nguyen-Ba G, Vasseur P. Epigenetic events during the process of cell transformation induced by carcinogens (review). Oncology Reports [Internet]. 1999 Jul 1 [cited 2023 Aug 31]; 6(4): 925–57. Available from: https://www.spandidos-publications.com/or/6/4/925
2. Mittal A, Mohanty SK, Gautam V, Arora S, Saproo S, Gupta R, et al. Artificial intelligence uncovers carcinogenic human metabolites. Nat Chem Biol [Internet]. 2022 Nov [cited 2023 Aug 31]; 18(11): 1204–13. Available from: https://pubmed.ncbi.nlm.nih.gov/35953549/
Nataljacernecka. Women silhouette with neural networks connections in brain. Artificial Intelligence System. High tech digital technology. Print for scientific research in biology, physics and nanotechnologies [image on the Internet]. [Date Unknown] [cited 2023 Aug 31]. Available from: https://depositphotos.com/vectors/neural-networks.html?qview=250686090