COMPUTER-AIDED DESIGN OF CHALCONE DERIVATIVES AS LEAD COMPOUNDS TARGETING ACETYLCHOLINESTERASE

Email: enade@usd.ac.id ABSTRACT One of well-established biological activities for chalcone derivatives is as acetylcholinesterase inhibitors, which can be developed for the therapy of Alzheimer’s disease. Assisted by retrospectively validated structure-based virtual screening (SBVS) protocol to identify potent acetylcholinesterase inhibitors, 80 chalcone derivatives were designed and virtually screened. The F-measure value as the parameter of the predictive ability of the SBVS protocol developed in the research presented in this article was 0.413, which was considerably better than the original SBVS protocol (Fmeasure=0.226). Among the screened chalcone derivatives two were selected as potential lead compounds to design potent inhibitors for acetylcholinesterase: 3-[4-(benzyloxy)-3methoxyphenyl]-1-(4-hydroxy-3-methoxy-phenyl)prop-2-en-1one(3k) and 3-[4-(benzyloxy)-3-methoxyphenyl]-1-(4hydroxyphenyl)prop-2-en-1-one (4k).


INTRODUCTION
Alzheimer's disease (AD), a progressive brain disorder, is a neurodegenerative disease which becomes symptomatic after brain changing happened over the years (Aggarwal et al., 2015).The prevalence of dementia, as AD symptom were varied (Rizzi et al., 2014) and the number of people living with AD have been predicted to be increased two times every two decades from 46.8 million by 2015 to 74.7 million people by 2030 (Prince et al., 2015).The deficiency of the brain neurotransmitter acetylcholine (ACh) is often associated with pathogenesis of AD (Tabet, 2006).Acetylcholine plays important roles in the nervous system such as increasing neurotransmitter release, supporting synaptic transmission, inducing synaptic plasticity, and coordinating firing of groups of neurons (Picciotto et al., 2012;Tsuda, 2012).The hydrolysis of ACh into choline and acetic acid, a reaction needed to allow the returning of cholinergic neuron to the resting state, was catalyzed by a family of enzymes called cholinesterase (Colovic et al., 2013).Acetylcholinesterase (AChE), one of cholinesterase types found in many types of conducting tissue, is a highly possible therapeutic target of Alzheimer disease (Mehta et al., 2012).
Recently, research on AChE inhibitor has been rapidly developed due to the availability of supporting facilities for designing AChE inhibitor compounds in the treatment of AD (Singh et al., 2013).Promising compounds to be developed as AChE inhibitor were chalcone derivatives (Sukumaran et al., 2016;Tran et al., 2016).Chalcones or 1,3-diphenyl-2propene-1-one can be obtained both from the plants (Abdelwahab, 2013;Adewusi et al., 2010) and from the synthetic way due to condensation reaction between substituted aromatic aldehyde with substituted acetophenones in alkaline condition (Jayapal and Sreedhar, 2010).Nevertheless, it is important to consider the structure-activity relationship and computational approach in designing more active AChE inhibitors (Andersson et al., 2013;Ece et al., 2015).Some strategies to optimize predictive abilities of Structure-based Virtual Screening (SBVS) protocols have been developed (de  Graaf et al., 2011; Istyastono, 2015; Istyastono  et al., 2015 a ; Istyastono and Setyaningsih, 2015;  Radifar et al., 2013 b ; Sirci et al., 2012).Some of the strategies have been employed successfully in the prospective screenings to discover novel potent fragments for histamine H1 ( de Graaf et al., 2011), H3 (Sirci et al., 2012) and H4 receptors (Istyastono et al., 2015 a ), and in the repurposing selective cyclooxygenase-2 inhibitor celecoxib as a potent ligand for estrogen α receptor (Istyastono et al., 2015 b ; Radifar et al., 2013 b ).In 2011, by combining ChemPLP score (Korb et al., 2009) and the post-docking scoring functions of the interaction fingerprint (IFP) program (Marcou and Rognan, 2007) as the cutoff values, de Graaf et al. could reach Fmeasure value of about 0.515 and discover 19 novel fragments with Ki ranging from 10 μM to 6 nM for histamine H1 receptor after virtually screened about 13 million compounds and tested only 26 hits resulted from the prospective virtual screening (de Graaf et al., 2011).In 2012, employing Fingerprint for Ligand and Protein (FLAP) software as the post-docking modeling strategy to identify active histamine H3 fragments Sirci et al. could discover 18 novel fragments out of 29 tested hits resulted from prospective virtual screening of 156,090 compounds (Sirci et al., 2012).In 2015, by employing several homology models with different reference ligands, using postdocking scoring functions of the IFP program (Marcou and Rognan, 2007), Istyastono et al. could optimize the F-measure value from 0.018 to 0.553 and discover 9 novel histamine H4 fragments after virtually screen 43,326 fragments and tested 37 hits resulted from the prospective virtual screening (Istyastono et al.,  2015 a ).Fortunately the structure of histamine receptors was well-studied, which established ASP 3.32 as the anchor of the receptor-ligand binding (Istyastono et al., 2011 a ; Jongejan et al.,  2008).These success stories employed the anchor interaction to filter the docking poses before proceeding to the subsequent step in the SBVS protocols (de Graaf et al., 2011;  Istyastono et al., 2015 a ; Sirci et al., 2012).However, the information of the anchor interaction is still rare for other targets in drug discovery.Moreover, instead of having anchor interaction, some targets accept more poses for their potent ligands (Istyastono, 2015).Several attempts to optimize SBVS protocols by taking into account several poses of active ligands (Istyastono et al., 2011 b ; Wang et al., 2015) by decision trees construction have been performed targeting estrogen α receptor and could optimize the F-measure value from 0.215 to 0.642 (Istyastono, 2015;Setiawati et al., 2014).Notably, the same strategy could not be applied in this research since no decision tree could be constructed.Another strategy to optimize the predictive ability of the SBVS protocol in this research should be developed and tested.
The research presented in this paper was aimed to design chalcone derivatives as potent AChE inhibitors by utilizing retrospectively validated SBVS protocols constructed by employing PLANTS1.2 as the molecular docking software (Korb et al., 2009(Korb et al., , 2007))and PyPLIFas the Protein-Ligand Interaction Fingerprints (PLIF) identification software for re-scoring (Radifar et al., 2013 a ; Radifar et al.,  2013 b ).New post PLIF identification descript or employing ensemble PLIF (abbreviated to ensplif) was introduced and employed here to construct and optimize SBVS protocols (Istyastono et al., 2017), which was then assessed (Cannon et al., 2007;de Graaf et al., 2011;Desaphy et al., 2013;Powers, 2011) and compared to the original SBVS accompanying the release of DUD-E (Mysinger et al., 2012).The best SBVS protocol constructed in this research could outperform the predictive ability of the SBVS protocol of DUD-E.The protocol was subsequently employed to assist the selection of chalcone derivatives as lead compounds in the development of acetylcholinesterase inhibitors

Ligands preparation for retrospective virtual screening
Similar to virtual target preparation, previously published methods for ligands preparation (Istyastono, 2016;Istyastono and Yuniarti, 2016) were adopted and modified.Known AChE active inhibitors and their decoys were downloaded in their SMILES format from DUD-E (Mysinger et al., 2012).They were stored locally as actives_final.ism, and decoys_final.ism.Each compound in the files was then subjected to Open Babel 2.2.3 conversion software to be converted in its three dimensional (3D) format as a mol2 file.The reprot module in SPORES was subsequently employed to properly check and assign the protonation state of the mol2 file into a proper mol2 file ready to be docked by using PLANTS1.2docking software.

Similar
to previously published procedures (Istyastono et al., 2015 b ; Istyastono  and Setyaningsih, 2015; Setiawati et al., 2014), all virtual screenings were performed by docking program PLANTS1.2.For each compound, 50 poses were produced and clustered at root mean square deviation (RMSD) of 1.0 Å.The ChemPLP scoring function was used at speed setting 2. The binding pocket was defined by the coordinates of the center of the co-crystal ligand_HUX803_0.mol2and a radius of 5 Å.All other options of PLANTS1.2 were left at their default setting.Every compound was virtually screened five times independently.Seven different interaction types (negatively charged, positively charged, hydrogen bond (H-bond) acceptor, H-bond donor, aromatic face-toedge, aromatic face-to-face, and hydrophobic interactions) were subsequently identified by employing PyPLIF for each docking pose (Radifar et al., 2013 a ; Radifar et al., 2013 b ).

Predictive ability of the SBVS protocols
The predictive ability of the SBVS protocols was determined by their F-measure value of the retrospective validation (Cannon et al., 2007;Desaphy et al., 2013).The SBVS protocols predict screened compound as active or inactive.Since the biology activity as AChE inhibitor of the compounds in the retrospective validation attempts using DUD-E has already known (Mysinger et al., 2012), the confusion matrix could be constructed (Cannon et al., 2007): (i) Active compounds which were predicted as active were assigned as true positives (TP); (ii) Active compounds which were predicted as inactive were assigned as false negatives (FN); (iii) Decoys which were predicted as active were assigned as false positives (FP); and (iv) Decoys which were predicted as inactive were assigned as true negatives (TN).The F-measure value (Cannon et al., 2007;Desaphy et al., 2013) was calculated as follows: (2 × TP)/((2 × TP) + FN + FP).

Predictive ability optimization of the SBVS protocols
For comparison as the standard, the compounds in the retrospective SBVS were ranked based on the best ChemPLP score and the ChemPLP score of the 1% identified FP was used as the cutoff to predict as active or inactive (de Graaf et al., 2011; Istyastono et al.,  2015 a ; Korb et al., 2009;Mysinger et al., 2012).The optimization of the SBVS protocols was performed by optimizing the post SBVS decision trees constructed in RPART package (Istyastono, 2015;R Core Team, 2016;Therneau et al., 2015) by using ensplif as the descriptors.Besides considering all poses resulted in the SBVS, ensplif as descriptors were calculated by selecting poses by using ChemPLP scores systematically from -125 to 0. The algorithm of the ensplif calculation was as follows: (i) Poses selection based on the ChemPLP score as the cutoff; (ii) In every PLIF bitstring, all "on" interactions were counted; (iii) The numbers resulted from point (ii) were divided by 250 since the molecular docking resulted in 50 docking poses in each run and every compound was screened 5 times (see subsection Automated molecular docking and virtual screening).

Computer-aided identification of chalcone derivatives as lead compounds to discover potent AChE inhibitors
Eighty chalcone derivatives were designed (Table I).Every structure in table I was built and converted to SMILES format by using module Copy as smiles in Marvin Sketch 14.11.10.0 (ChemAxon, 2014).The structure was then undergone ligand preparation process as presented in the subsection Ligands preparation for retrospective virtual screening.The best SBVS protocol to identify potent AChE inhibitors resulted from the subsection Predictive ability optimization of the SBVS protocols was used to virtually screen the designed compound.Based on the results of the virtual screening, some lead compounds were selected.The synthesis feasibility of the selected lead compounds was subsequently analyzed.

RESULTS AND DISCUSSION
Aimed to initiate the discovery and development of potent AChE inhibitors, this research constructed and optimized the predictive ability of SBVS protocol to identify potent AChE inhibitors and employed the best protocol to select best potential inhibitors from 80 chalcone derivatives presented in Table I as lead compounds.The construction and optimization of the SBVS protocol to identify potent AChE inhibitors employed potent AChE inhibitors with IC50 value of less than 1 M and their decoys organized and stored in DUD-E (Mysinger et al., 2012) as the retrospective compounds for the validation.Chalcone derivatives were selected since the synthesis expertise in the field is accessible for further investigation in the subsequent discovery and development process (Imran et al., 2015).

Construction and Predictive Ability Optimization of the SBVS Protocol
During molecular docking and PLIF identification phases in the retrospective SBVS campaign, nineteen ligands and 1,735 decoys could not pass the protocol, which were then predicted as inactive compounds.Therefore, the protocol has resulted in 19 FN and 1735 TN during the molecular docking and the PLIF identification phases.In every remaining result, the docking pose with the best ChemPLP score was collected and ranked based on the ChemPLP score.In this ranked database, at 1% FP (263 compounds) there were 52 TP, 401 FN, and 25,987 TN.Therefore the F-measure value of this ranked database was 0.135, which was corresponded to the enrichment factor value of 11.46.This predictive ability of the SBVS protocol by using 1% FP as the cutoff to predict the activity was much worse than the original protocol accompanying the release of DUD-E, which showed F-measure and enrichment factor values of 0.226 and 20.1, respectively (Mysinger et al., 2012).Moreover, with this predictive ability, the SBVS protocol is not suggested to be used for prospective virtual screening campaigns (Chen, 2015; de  Graaf et al., 2011; Istyastono et al., 2015 a ; Sirci et  al., 2012).Predictive ability optimization of this SBVS protocol was therefore required.
Equipped with the facts that some ligands could interact with their protein targets in more than one pose (Istyastono et al., 2011 b ;  Wang et al., 2015) and inspired by the lock-andkey theory (Koshland, 1994;Stoddard and Koshland, 1992), we introduce a novel post PLIF identification technique to optimize the predictive ability of the SBVS protocol by using ensplif.This strategy was successfully utilized to slightly increase the SBVS protocol to identify potent ligands for estrogen α receptor (Istyastono, 2015;Istyastono et al., 2017).Therefore, we were tempted to apply the same technique in this research since employing the technique used by Istyastono (2015) could not result in any constructed decision tree in this research.Taking into account all docking poses in every retrospective screened compound in this research resulted in a post SBVS decision tree using ensplif as the descriptors with Fmeasure value of 0.371, which was better than the ChemPLP-based ranked database at 1% FP and to the original protocol accompanying the release of DUD-E(F-measure = 0.226; Mysinger et al., 2012).Notably, with this Fmeasure value the SBVS protocol has already accepted for further prospective virtual screening.However, the protocol could still be Table I.Designed and virtually screened chalcone derivatives in this research.
optimized by selecting the more plausible docking poses with a certain ChemPLP score as the cutoff (de Graaf et al., 2011).The optimization by systematically assessing the predictive ability of employing ChemPLP scores from -125 to 0 as the cutoff for pose selection resulted in F-measure values ranging from 0.103 to 0.413.The best SBVS protocol with F-measure value of 0.413 was comprised of molecular docking simulation using PLANTS1.2(Korb et al., 2009), PLIF identification of the docking poses using PyPLIF (Radifar et al., 2013 a ; Radifar et al.,  2013 b ), and the decision tree constructed using RPART (Therneau et al., 2015) with ensplif calculated from PLIF of docking poses with ChemPLP score ≤ -33 (Figure 1).
The RPART method to construct decision trees used in the research is one of supervised machine learning methods which are susceptible for overfitting (Cannon et al., 2007;Gabel et al., 2014) and chance correlation (Lim et al., 2009;Tarcsay et al., 2013;Zambre et al., 2007).A model is overfitting if the ratio of the cross-validation error rate over the training error rate is more than 1.5 (Cappel et al., 2015;Istyastono, 2016).Since the ratio of the crossvalidation error rate over the training error rate of the best decision tree (Figure 1) was 1.158, this model was not overfitting.In order to ensure if the best decision tree was not a chance correlation, 1000 times y-randomization were performed (Lim et al., 2009;Smits et al., 2010) and there was no y-randomized model with F-measure higher than the F-measure of the best decision tree (Figure 1).Therefore, there was no evidence of overfitting and chance correlation of the SBVS protocol comprising the best decision tree constructed using RPART with ensplif calculated from PLIF of docking poses with ChemPLP score ≤ -33 as the descriptors.
There are 11 selected ensplif in the decision tree (Table II).Notably, hydrophobic Figure 1.The best decision tree employing ensplif as descriptors to identify potent ligands for AChE.Four types ofhow ligands bind to AChE or "key" are identified.
interaction to PHE 331 is essential for AChE ligand since the ligand should have ensplif #302 ≥ 0.878 to be predicted active as potent ligand for AChE, otherwise it will be predicted as inactive (Figure 1).Hydrophobic interactions dominate the selected interactions.This interaction type is however responsible only for the affinity but not for the selectivity because of its lack of directionality (Bissantz et al., 2010;Desaphy et al., 2013;Marcou and Rognan, 2007).Only three out of these 11 interactions have directionality (Bissantz et al., 2010;Desaphy et al., 2013;Marcou and Rognan, 2007;Radifar et al., 2013b): (i) Ensplif#193 (hydrogen bond to TYR 130 with tyrosine as the donor); (ii) Ensplif#208 (hydrogen bond to SER 200 with serine as the acceptor); and (iii) Ensplif#297 (aromatic edge-to-face interaction to PHE 330 ).These three interactions are responsible not only for the ligand affinity but also for the ligand selectivity (Bissantz et al., 2010;Istyastono et al., 2011aIstyastono et al., , 2011b)).Hydrogen bond and aromatic interaction have been identified as the main factors in activity cliffs with at least 100-fold activity increase (Furtmann et al., 2015).Based on the decision tree presented in Figure 1, there are 4 types of "key" that can bind to the "lock" AChE.Notably, all branches leading to active compound in the decision tree (Figure 1) or the "keys" involve either the hydrogen bond (ensplif#208 or #193) or the aromatic interaction (ensplif#297).
Oddly, in contrary with the hydrogen bonds, which are favorable, the aromatic interaction here is unfavorable.
In the retrospective attempts, the optimized protocol resulted in F-measure value of 0.413 from 124 TP; 329 FN; 26,226 TN and 24 FP.By employing the retrospective results, the sensitivity (true positive rate) and the specificity (true negative rate) can be calculated (Cannon et al., 2007) and get the following results: 0.274 and 0.999, respectively.This indicates that the optimized protocol tends to correctly predict inactive compounds.A predicted active compound using the optimized protocol is therefore highly probable to be an active AChE inhibitors but it is highly suggested to further investigate a predicted inactive compound especially the one that has ensplif number 302 of more than or equal to 0.878 (Figure 1).

Virtual Screening on Designed Chalcone Derivatives Employing the Optimized Protocol
Eighty chalcone derivatives presented in Table I were virtually screened employing the optimized SBVS protocol (Figure 1).However, none of these compounds were predicted as active.Therefore, selection of lead compounds instead of identification of potent AChE inhibitors from the chalcone derivatives (Table I) was the subsequent step.As mentioned in the previous subsection, Table II.The ensplif as selected descriptors in the decision tree (Figure 1  a) Ref: (Radifar et al., 2013b) all predicted active compounds must have ensplif #302 ≥ 0.878 (Figure 1).Among the virtually screened chalcone derivatives (Table I), only compounds 3k and 4k showed ensplif #302 ≥ 0.878.The ensplif #302 of compounds 3k and 4k were 0.956 and 0.944 (Figure 2).Therefore, compounds 3k and 4k were selected as lead compounds for further investigation in  thisquest to discover potent AChE inhibitors.Since compounds 3k and 4k showed ensplif #365 < 0.678 (Figure 2), the shortest path to design potent AChE inhibitor by employing these lead compounds is the path of key #1 (Figure 1).Additional hydrogen bond to SER 200 is therefore required (Figures 1 and 2).Based on visual inspection using PyMOL in docking poses of 3k and which show hydrophobic interaction to PHE 331 but do not show hydrophobic interaction to GLY 441 (Figure 2), this could be achieved by appending or substituting the R2 part of the selected lead compounds (Table I) with functional groups which have possibility to be hydrogen bond donors, e.g., guanidine, amide and isothiourea.Synthesis and further verification using in vitro analysis should be done in the near future to provide evidences that compounds 3k and 4k could serve as lead compounds.Retrosynthetic analysis or synthon disconnection approaches of compounds 3k and 4k (Figure 3).We propose here o-benzylvanillin (4-(benzyloxy)-3methoxybenzaldehyde) to be used as starting material which could be synthesized using vanillin (4-hydroxy-3-methoxybenzaldehyde) and benzylbromide (bromomethylbenzene). Further, o-benzylvanillin could be reacted with acetovanillone (1-(4-hydroxy-3-methoxyphenyl) ethanone) for yielding compound 3k and 3'-hydroxyacetophenone (1-(4-hydroxyphenyl) ethanone) for yielding compound 4k.

CONCLUSIONS
Compounds 3k and 4k have been identified as potent lead compounds in the drug discovery and development for AD therapy targeting AChE.The identification of these lead compounds employed a retrospectively validated SBVS protocol with a decision tree employing ensplif resulted from molecular docking PLANTS1.2 and PLIF identification PyPLIF.In the retrospective validation, the SBVS protocol showed Fmeasure value of 0.413, which outperformed the original SBVS accompanying the database used for the retrospective validation.

Figure 2 .
Figure 2. Docking poses of 3k(A) and 4k(B) which show hydrophobic interaction to PHE 331 but do not show hydrophobic interaction to GLY 441 and the corresponding important ensplif values (C).Carbon atoms of compounds 3k and 4k are presented in magenta.Carbon atoms of AChE are presented in green.Oxygen and nitrogen atoms are presented in red and blue, respectively.Hydrogen atoms are not shown for clarity.The ligands are in lines mode, while the enzyme is in surface mode with 0.5 transparencies.Important residues following path of key #1 in Figure 1 are also shown in stick mode.
) and their corresponding interactions in the AChE binding pocket