OPTIMIZING STRUCTURE-BASED VIRTUAL SCREENING PROTOCOL TO IDENTIFY PHYTOCHEMICALS AS CYCLOOXYGENASE-2 INHIBITORS

Email: enade@usd.ac.id ABSTRACT By employing Databases of Useful Decoys (DUD) and its enhanced version (DUD-E), several attempts to construct validated Structure-based Virtual Screening (SBVS) protocols to identify cyclooxygenase-2 (COX-2) inhibitors have been performed. Both databases tagged active COX-2 inhibitors for compounds with IC50 values < 1M. In the search for phytochemicals as natural COX-2 inhibitors, however, most of their IC50 values are in the micromolar range, which will likely be identified as non-inhibitors for COX-2 by the available SBVS protocols. In this article, validation of an SBVS protocol by adding marginal active COX-2 inhibitors from DUD-E as active compounds is presented. Binary quantitative-structure activity relationship analysis by using recursive partition and regression tree method was performed subsequently to optimize the predictive ability of the protocol. The enrichment factor and the F-measure values of the optimized protocol could reach 44.78 and 0.47, respectively. The optimized protocol could identify 1 out of 9 phytochemicals as COX-2 inhibitors.

The search for COX-2 inhibitors has also involved natural products (Orlikova et al., 2013;Pany et al., 2013).Nevertheless, two most well-known natural COX-2 inhibitors curcumin (Figure 1A) and resveratrol (Figure 1B) showed IC50 values of 79.2μM and 32.0μM, respectively (Gautam et al., 2011;Larsson et al., 2005).By using the method employed by Istyastono et al.  (2015 b ) to have additional active histamine H4 receptor, the data of compounds that have been tested as COX-2 inhibitors which have been stored in ChEMBL version 21 (ChEMBL_21; https://www.ebi.ac.uk/chembl/) (Bento et al., 2014) were downloaded and examined.By taking into account only compounds that published in Journal of Natural Products (http://pubs.acs.org/journal/jnprdf), it was recorded that at least 74 phytochemicals have been examined as COX-2 inhibitors (Bento et al., 2014).Similar to curcumin and resveratrol, most of those 74 phytochemicals are however marginal COX-2 inhibitors with IC50 values > 1μM, which will unlikely identified as COX-2 inhibitors by SBVS protocols validated using data from DUD or DUD-E (Bento et al., 2014;Huang et al., 2006;Mysinger et al., 2012).In fact, only 9 compounds out of the 74 phytochemicals that have IC50 values as COX-2 inhibitors < 1μM (Bento et al., 2014).Moreover, only 7 out of those 9 compounds that meet the Lipinski's rule of 5 (Figure 2) (Lipinski et al., 2001).Therefore, development of validated SBVS protocols that can identify marginal and potent COX-2 inhibitors to cover phytochemicals as potential lead compounds is required.
The research presented in this paper was aimed to construct and retrospectively validate SBVS protocols to identify marginal to potent COX-2 inhibitors by using data from DUD-E with additional marginal active COX-2 inhibitors from DUD-E as active compounds (Mysinger et al., 2012).The protocols were constructed by employing PLANTS1.2 as the molecular docking software (Korb et al., 2007;Korb et al., 2009) and PyPLIF to identify Protein-Ligand Interaction Fingerprints (PLIF) to COX-2 as the re-scoring functions (Radifar  et al., 2013 a ; Radifar et al., 2013 b ).The quality of the SBVS protocol was subsequently assessed (Cannon et al., 2007;de Graaf et al., 2011;Desaphy et al., 2013;Powers, 2011) and Figure 1.Structures of curcumin, the active substance found in turmeric (Curcuma longa) (A) and resveratrol, a compound mainly found in grapes and red wine (B) (Orlikova et al., 2013;Setyaningsih et al., 2013;Yuniarti et al., 2012) Figure 2. Structures of phytochemicals published in Journal of Natural Products and stored in ChEMBL_21 database that have IC50 values as COX-2 inhibitors < 1 M and do not violate the Lipinski's rule of 5 (Bento et al., 2014).compared to the original SBVS accompanying the release of DUD-E (Mysinger et al., 2012).Very recently, Istyastono (2015) showed that using recursive partition and regression tree (RPART) method with the PLANTS1.2docking score (ChemPLP score) and the PLIF bitstrings resulted from PyPLIF as the descriptors increased significantly the SBVS predictive ability (Istyastono, 2015;Therneau et al., 2015).This approach could avoid the dependency of the application of PyPLIF towards the reference compound (Istyastono, 2015).By employing the same strategy, this research could increase significantly the predictive ability of the SBVS protocols to identify marginal and potent COX-2 inhibitors with enrichment factor (EF) value of 44.78.The optimized protocol was subsequently employed to virtually screen phytochemicals in figures 1 and 2 as COX-2 inhibitors.

Virtual molecular target preparation
The crystal structure of COX-2 with the PDB id of 3LN1 (Wang et al., 2010 a ) was downloaded from http://www.rcsb.org/pdb/explore.do?structur eId=3ln1.Only chain A of the crystal structure used further in this research (Mysinger et al., 2012).The module splitpdb in SPORES was subsequently used to split and to convert the splitted files into mol2 files the virtual COX-2 (protein.mol2),the co-crystal ligand celecoxib (ligand_CEL682_0.mol2),and the water molecules.The mol2 files were then ready to be employed in molecular docking simulation employing PLANTS1.2 docking software.

Ligands preparation for retrospective virtual screening
Known COX-2 active and marginal inhibitors and the decoys were downloaded in their SMILES format from DUD-e (Mysinger et al., 2012).They were stored locally as actives_final.ism,marginal_actives_nM_chembl.ism and decoys_final.ism.Each compound in the files was then subjected to Open Babel 2.2.3 conversion software to be converted in its three dimensional (3D) format at pH 7.4 as a mol2 file.The settypes module in SPORES was subsequently employed to properly check and assign the mol2 file into a proper mol2 file ready to dock by using PLANTS1.2docking software.

Similar
to previously published procedures (Istyastono and Setyaningsih, 2015;  Istyastono et al., 2015 a ; Setiawati et al., 2014), all virtual screenings were performed by docking program PLANTS1.2.For each compound, 50 poses were calculated and scored by the ChemPLP scoring function at speed setting 2. The binding pocket of COX-2 was defined by the coordinates of the center of the co-crystal ligand celecoxib and a radius of 5 Å (which is the maximum distance from the center defined by a 5 Å radius around the reference ligand).All other options of PLANTS1.2 were left at their default setting (Anita et al., 2012;Istyastono and Setyaningsih, 2015).Every compound was virtually screened five times independently.

Rescoring using protein-ligand interaction fingerprints calculated by PyPLIF
Seven different interaction types (negatively charged, positively charged, hydrogen bond (H-bond) acceptor, H-bond donor, aromatic face-to-edge, aromatic face-to-face, and hydrophobic interactions) were used to define the PLIF for each docking pose (Radifar et al., 2013 b ; Setiawati et al., 2014).The cavity used for the PLIF analysis consisted of a set of amino acid residues in the binding pocket of COX-2 defined in subsection Automated molecular docking and virtual screening.

Optimizing the SBVS predictive ability using RPART
The docking pose with the best ChemPLP score was selected for each virtually screened compound.The results were then ranked based on their ChemPLP score and the enrichment factor of True Positives (TP) at 1% False Positives (FP) value or the EF1% value was calculated (EF1% = %TP/FP1%) (de Graaf and Rognan, 2008;Istyastono and Setyaningsih, 2015).The compound was predicted as a COX-2 inhibitor if it showed ChemPLP score ≤ the ChemPLP score of the compound at 1% FP (Istyastono et al., 2015 b ).Therefore, the protocol was encoded as the EF1%-ChemPLP based SBVS.By employing RPART package (R Core Team, 2015;Therneau et al., 2015), decision trees were generated using ChemPLP score resulted from PLANTS1.2 (Korb et al., 2009) and all PLIF bitstrings resulted from PyPLIF (Radifar et al., 2013 b ) as the descriptors (Istyastono, 2015).The predictive quality of the best decision tree resulted from RPART method was measured by examining the EF value (de Graaf et al., 2011), the balance accuracy (Cannon et al., 2007;Therneau et al., 2015) and F-measure value (Desaphy et al., 2013).The predictive quality of the best decision tree was also compared to the predictive quality of the EF1%-ChemPLP based SBVS by employing McNemar's test (Cannon et al., 2007).

Virtual screening on some phytochemicals
By employing the best SBVS protocol resulted from the subsection Optimizing the SBVS predictive ability using RPART, virtual screening campaigns to predict whether the COX-2 inhibitors (Figures 1 and 2) were also identified as COX-2 inhibitors virtually were performed.The virtual compounds were downloaded in their SMILE formats from the ChEMBL_21 database (Bento et al., 2014).
Subsequently, the virtual compounds were prepared and examined using the best SBVS protocol.

RESULTS AND DISCUSSION
Aimed to construct and evaluate the validation of an SBVS protocol to identify phytochemicals as inhibitors for COX-2, this research employed marginal and potent COX-2 inhibitors organized and stored in DUD-E (Mysinger et al., 2012) as the retrospective compounds for the validation.Similar to the construction of an SBVS protocol to identify potent ligands for adrenergic β2 receptor (Istyastono and Setyaningsih, 2015), the SBVS constructed here employed PLANTS1.2 as the molecular docking software (Korb et al., 2007;Korb et al., 2009) and PyPLIF as the PLIF identification software for rescoring the results from PLANTS1.2 (Radifar et al., 2013 a ; Radifar  et al., 2013 b ).Additionally, the SBVS protocol constructed here employed the ChemPLP scores from PLANTS1.2 (Korb et al., 2009) and the PLIF bitstrings from PyLIF (Radifar et  al., 2013 b ) obtained from the docking pose with the best ChemPLP score in each virtually screened compound as descriptors to generate decision tress using RPART package in R (Istyastono, 2015;R Core Team, 2015;Therneau et al., 2015).

Retrospective Validation of the SBVS protocol
The retrospective SBVS campaigns on COX-2 ligands and their decoys have resulted in 3,767,550 docking poses and 1,318,642,500 PLIF bitstrings resulted from 25,117 out of 25,123 screened compounds.Six decoys could not pass the constructed protocol, which were then assigned as True Negatives (TN).The EF1%-ChemPLP based SBVS resulted in ChemPLP score of -108.028 as the cutoff score.Employing this cutoff score resulted in a confusion matrix presented in Table 1.The EF value of the protocol was 3.55, which was very low compared to the reference (EF value = 12.9) and was not recommended to be employed further (Istyastono et al., 2015 b ;  Mysinger et al., 2012).Inspired by Istyastono (2015), RPART package in R computational statistics software (R Core Team, 2015; Figure 3.The decision tree adopted from the best decision tree resulted from the RPART method (Istyastono, 2015;Therneau et al., 2015).Complexity parameter of the decision tree; b) The selected decision tree with the lowest training set error rate and the lowest 10-fold cross-validation error rate (see figure 3).No evidence of overfitting was found since the ratio of the 10-fold cross-validation error rate over the training error rate is less than 1.5 (Cappel et al., 2015).Therneau et al., 2015) was used to generate decision trees using ChemPLP scores and PLIF bitstrings descriptors to optimize the SBVS protocol.The best decision tree (Figure 3) resulted in a confusion matrix with EF value of 44.78.The EF value of the optimized SBVS protocol was higher than the EF1%-ChemPLP based SBVS (3.55) and the reference (12.9).Moreover the predictive ability of the optimized protocol was statistically better in confidence level of 95% compared to the EF1%-ChemPLP based SBVS (p-value < 0.05) using McNemar's test with chi-squared value of 417.23 (Cannon et al., 2007;R Core Team, 2015).The predictive ability of the optimized protocol was considered as acceptable since it outperformed the reference protocol (Mysinger et al., 2012).By examining the statistical significances (Table I), although the optimized protocol could be used further in prospective campaigns since the EF value was sufficiently high (de Graaf et al., 2011; Istyastono et al.,  2015 b ), the sensitivity value was still considered as low (Desaphy et al., 2013).The predictive ability was mainly contributed by the high specificity value.This indicated that if a compound predicted as a COX-2 inhibitor using this optimized in silico screening protocol, it would be high likely as COX-2 inhibitor in vitro.But, if a compound predicted as a non COX-2 inhibitor, it would still likely be a COX-2 inhibitor in vitro since the sensitivity value was low caused by the high number of the false negatives (FN; Table 1).This high number of the FN was the limitation of the optimized SBVS protocol that could be improved to increase the predictive ability of the SBVS protocol by employing some more advanced approaches, for example: (i) employing anchor reactions during molecular docking simulations (Yuniarti et al., 2011) or post-docking pose selection (de Graaf et al., 2011; Istyastono et al.,  2015 b ), and/or (ii) employing advanced used of PLIF bitstrings (Desaphy et al., 2013).
Based on Figure 3, the most important descriptor to identify COX-2 inhibitors was the hydrogen bond interaction to ARG499 (previously reported as ARG513 in the older COX-2 crystal structure (Kurumbail et al., 1996)) with the inhibitors as the acceptor (PLIF bitstring #221).This interaction was identified previously as the anchor interaction of COX-2 inhibitors to COX-2 binding pocket (Kurumbail et al., 1996; Wang et al.,  2010 a ; Yuniarti et al., 2011).This anchor interaction was identified in the interaction of selective COX-2 inhibitor celecoxib in the COX-2 binding pocket (Wang et al., 2010 a ).Alternatives important interactions identified in this research were: (i) hydrogen bond interaction with the residue as the acceptor, which were to GLN178 (PLIF bitstring # 68; Previously reported as GLN192 in the older COX-2 crystal structure (Kurumbail et al., 1996)), LEU338 (PLIF bitstring #138;  Previously reported as LEU352 in the older COX-2 crystal structure (Kurumbail et al., 1996)) or PHE504 (PLIF bitstring #242; Previously reported as PHE518 in the older COX-2 crystal structure (Kurumbail et al., 1996)); (ii) hydrogen bond interaction to HIS75 (PLIF bitstring #18; Previously reported as HIS90 in the older COX-2 crystal structure (Kurumbail et al., 1996)) with the residue as the donor; and (iii) edge-to-face aromatic interaction to HIS75 (PLIF bitstring #17).
Celecoxib was also reported having the interaction to GLN178, HIS75 and PHE504 (Wang et al., 2010 a ).The interaction to LEU352 could categorized as novel interaction, but it was not rare in COX-2 binding since very recently several COX-2 inhibitors could proceed further to the clinical trial phase although the compound did not show interaction to the previously known important residues in the COX-2 binding pocket (Wang et  al., 2010 a ; Wang et al., 2010 b ).The decision tree Figure 4.The selected docking pose of resveratrol (sticks mode; carbon atoms in cyan) in COX-2 binding pocket (lines mode; carbon atoms in green).For clarity, (i) only polar hydrogens (in white) are presented, and (ii) only main chain atoms are presented for the binding pocket except for the important residues: HIS75, GLN178, ARG499, and PHE504 (see also Figure 3).Oxygens and nitrogens are colored in red and blue, respectively.Hydrogen bonds and aromatic interactions are presented in black dashed lines and yellow dashed lines, respectively.
resulted from RPART method was therefore could identify alternative important interactions which in turn could increase the predictive ability of SBVS protocols by decreasing the number of FP and FN.Moreover, the 10-fold cross validation in the construction of decision trees (Table 2) showed that there was no evidence of overfitting of the selected decision tree, and the 1000 times Y-randomization showed that there is no evidence of chance correlation (Cappel et al., 2015;Lim et al., 2009).

Phytochemicals Virtual Screening Employing Optimized Protocol
Nine compounds presented in Figures 1  and 2 were examined using the optimized SBVS protocol (Figure 3 and Table 1).The results are presented in Table 3. Surprisingly, only resveratrol was predicted as a COX-2 inhibitor in this research.It was therefore suggested that beside the protocol should be improved to reduce the number of FN, as described in the previous subsection, the protocol exclusively identified marginal COX-2 inhibitors.This should be verified by testing more representative numbers of other external marginal and potent COX-2 inhibitors.
The docking protocol used here is the same as the docking protocol employed by Mumpuni et al. (2015), which re-dock cocrystalized ligand celecoxib to the crystal structure 3LN1 with the root-mean-square deviation (RMSD) value of 0.525 Ǻ.Since the value was less than 2.0 Ǻ (Mumpuni et al., 2015), the selected pose of resveratrol here (Figure 4) could be considered as the right pose.In the visual inspection on the best pose of resveratrol in COX-2 binding pocket (Figure 4) using PyMOL (Lill and Danielson, 2011) and the examination of Figure 3 and Table 3, resveratrol was predicted as COX-2 inhibitor by binding to HIS75 (edge-to-face aromatic interaction), GLN178 (hydrogen bond), and PHE504 (hydrogen bond).Surprisingly, the selected pose for resveratrol did not bind to ARG499.The decision tree (Figure 3) provided alternative interactions in COX-2 ligand binding (Wang et al., 2010 a ; Wang et al., 2010 b ).Since the first branch of the decision tree involved hydrogen bond to ARG499 (Figure 3), employing this as the anchor interaction in the molecular docking simulation (Wang et al., 2010a;Yuniarti et al., 2011) could therefore increase the predictive ability of the SBVS protocol.

CONCLUSIONS
The optimized SBVS protocol employing PLANTS1.2 and PyPLIF followed by RPART method to produce decision tree to identify phytochemicals as COX-2 inhibitors has been retrospectively validated using DUD-E with additional marginal active compounds as the active compounds.The protocol resulted in better predictive ability in COX-2 inhibitors identification compared to the original protocol accompanying the release of DUD-E.However the sensitivity value was still considered as low, which was also indicated by predicting correctly only 1 out of 9 phytochemical as COX-2 inhibitors.The improvement could be achieved by employing hydrogen bond to ARG499 as the anchor interaction during the molecular docking simulations using PLANTS1.2before the PLIF identification using PyPLIF.
:(Bento et al., 2014); b) See Figure3for more explanation; c) TP and FN stand for True Positive and False Negative, respectively

Table I .
The confusion matrices and the statistical significances resulted from the validation of the SBVS protocol to identify COX-2 inhibitors

Table III .
The in silico screening results on some phytochemicals as COX-2 inhibitors