CONSTRUCTION AND RETROSPECTIVE VALIDATION OF STRUCTURE-BASED VIRTUAL SCREENING PROTOCOLS TO IDENTIFY POTENT LIGANDS FOR HUMAN ADRENERGIC BETA-2 RECEPTOR

Email: enade@usd.ac.id

ABSTRACT
Adrenergic Beta-2 Receptor (ADRB2) is a member of the G-protein coupled receptor family, which has served as the target for more than 30% of top-selling drugs in the market. Recently, an enhanced dataset of ligands and decoys for ADRB2 has been made publicly available. However, the original retrospective structure-based virtual screening campaign accompanying the dataset showed relatively poor quality, with an enrichment factor of true positives at 1% false positives (EF1%) value of 3.9. In this article, the construction and retrospective validation of a structure-based virtual screening protocol employing PLANTS1.2 as the molecular docking software and PyPLIF as an alternative post-docking scoring function are presented. The results show that the developed protocols have better quality compared to the original structure-based virtual screening, with EF1% values of 24.24 and 8.22 by using ChemPLP from PLANTS1.2 and by using Tc-PLIF from PyPLIF, respectively. Further investigation by performing systematic filtering resulted in the identification of D113, S203, and N293 as molecular determinants in ADRB2-ligand binding.


INTRODUCTION
Adrenergic Beta-2 Receptor (ADRB2) plays an important role as the molecular target for drugs in the therapy of diseases as diverse as heart failure, hypertension and asthma (Cherezov et al., 2007; Taylor, 2007). ADRB2 is a member of the G-Protein Coupled Receptor (GPCR) family, to which more than 30% of top-selling drugs in the market bind (Klabunde and Hessler, 2002; Surgand et al., 2006). Notably, human ADRB2 was also the first human GPCR to be crystallized and made publicly available, providing insight into how ligands bind to GPCRs (Cherezov et al., 2007). The ADRB2 crystal structure has subsequently been employed in several prospective Structure-Based Virtual Screening (SBVS) campaigns, which successfully discovered novel potent ADRB2 ligands (Kolb et al., 2009; Yakar and Akten, 2014).
The successful three-dimensional (3D) structure characterization through crystallography of ADRB2 bound to its antagonist carazolol (Cherezov et al., 2007) was followed by 3D characterization of several other GPCRs (Chien et al., 2010; Jaakola et al., 2008; Shimamura et al., 2011; Wacker et al., 2010; Wu et al., 2010). These structures have offered opportunities to construct, validate and perform SBVS to discover novel potent ligands for a particular GPCR, both on crystal structures (Carlsson et al., 2010; de Graaf et al., 2011a; Katritch et al., 2010; Kolb et al., 2009; Yakar and Akten, 2014) and on homology models (Carlsson et al., 2011; de Graaf et al., 2011b; Istyastono et al., 2011b; Sirci et al., 2012; Tarcsay et al., 2013). The sole use of SBVS approaches on the Histamine H1 Receptor (HRH1) crystal structure in recent virtual screening campaigns showed extraordinary results, both retrospectively and prospectively (de Graaf et al., 2011a). One of the key strategies of that virtual screening was filtering on the Protein-Ligand Interaction Fingerprint (PLIF) (Marcou and Rognan, 2007; Radifar et al., 2013a): only docking poses that form a hydrogen bond (H-bond) and an ionic interaction with D107 were considered (de Graaf et al., 2011a). This strategy can be recognized as "using prior knowledge" in SBVS campaigns (Seifert, 2009; Yuniarti et al., 2011). The customization of SBVS protocols by filtering on key interactions has increased SBVS quality significantly (de Graaf et al., 2011a; Sirci et al., 2012; Yuniarti et al., 2011). Unfortunately, information on key interactions is available only for a few drug targets. The key interaction used in the SBVS campaigns on the crystal structure of HRH1 was identified from previous Site-Directed Mutagenesis (SDM) studies and chemogenomic analysis (de Graaf et al., 2011a; Shin et al., 2002; Surgand et al., 2006). Besides SDM studies and chemogenomic analysis, some computer-aided strategies could be employed to obtain information on key interactions that can assist the improvement of SBVS quality. Istyastono et al. (2011a) employed QSAR, 3D-QSAR, homology modeling and molecular dynamics to identify the molecular determinants of ligand binding modes in the Histamine H3 and H4 Receptors (HRH3 and HRH4, respectively). The obtained information was subsequently used in SBVS campaigns on homology models of the receptors (Istyastono et al., 2011a; Sirci et al., 2012). However, the use of multiple or combined approaches is time- and resource-consuming. Therefore, the development of more effective and efficient computational methods to identify key interactions as well as the molecular determinants in protein-ligand binding, in order to increase SBVS quality, is of considerable interest.
The research presented in this paper aimed to perform retrospective SBVS campaigns on a newly published enhanced dataset of ligands and decoys (DUD-e) for ADRB2 (Mysinger et al., 2012) and to identify the PLIF of each compound to ADRB2 by employing PyPLIF (Radifar et al., 2013a; Radifar et al., 2013b; Setyaningsih et al., 2013). The SBVS qualities were subsequently assessed (de Graaf and Rognan, 2008) and compared to the original SBVS accompanying the release of DUD-e (Mysinger et al., 2012). The results showed that both scoring strategies employed in this research, i.e., the ChemPLP score and Tc-PLIF, resulted in a better SBVS. Notably, the SBVS quality using ChemPLP scores outperformed the SBVS quality using Tc-PLIF values. The PLIFs of ADRB2-ligand complexes identified in this research were subsequently employed to identify key interactions in a further investigation by systematic filtering. These approaches have led to the identification of D113, S203 and N293 as the molecular determinants in ADRB2-ligand binding.

MATERIAL AND METHODS
The crystal structure of human ADRB2 obtained from the Protein Data Bank (PDB) with the PDB id of 3NY8 (Wacker et al., 2010) was used as the reference structure. Ligands (231) and decoys (15,000) for ADRB2 from DUD-e (Mysinger et al., 2012) were employed as the test compounds to perform retrospective SBVS. All calculations and computational simulations were performed on a Linux (Ubuntu 10.04 LTS Lucid Lynx) machine with an Intel(R) Xeon(R) CPU E31220 (@ 3.10 GHz) as the processor and 8.00 GB of RAM. The computational medicinal chemistry applications employed in this research were SPORES (ten Brink and Exner, 2009), PLANTS1.2 (Korb et al., 2009), Open Babel 2.2.3 (O'Boyle et al., 2011), PyPLIF 0.1.1 (Radifar, 2013a), and PyMOL 1.2r1 (Lill and Danielson, 2011). Statistical analysis was performed by using R 3.1.0 (R Development Core Team, 2008). A shell script was used to take into account only poses that have the predefined interaction bitstring after the PLIF identification using PyPLIF (Table I).

Computational methods Virtual molecular target preparation
The crystal structure of human ADRB2 with the PDB id of 3NY8 (Wacker et al., 2010) was downloaded from the PDB website (http://www.rcsb.org/pdb/explore.do?structureId=3ny8). The module splitpdb in SPORES was used to split the receptor, the co-crystal ligand, and the water molecules found in the PDB file and to subsequently convert the files into mol2 files ready to be employed in molecular docking simulations with the PLANTS1.2 docking software. This procedure produced the virtual target protein.mol2 and the co-crystal ligand ligand_JRZ1203_0.mol2.

Ligands preparation for retrospective virtual screening
Known ADRB2 active ligands and their decoys were downloaded in SMILES format from DUD-e (Mysinger et al., 2012). There were 231 ligands and 15,000 decoys downloaded and stored locally as actives_final.ism and decoys_final.ism. The files were subsequently concatenated into a file named all.smi. Each compound in the file was then converted with Open Babel 2.2.3 into its three-dimensional (3D) format at pH 7.4 as a mol2 file. The settypes module in SPORES was subsequently employed to check the mol2 file and assign proper atom types, producing a mol2 file ready to dock by using the PLANTS1.2 docking software.

Automated molecular docking and virtual screening
All virtual screenings were performed with the docking program PLANTS1.2. For each compound, 50 poses were calculated and scored by the ChemPLP scoring function at speed setting 2. The binding pocket of ADRB2 was defined by the coordinates of the center of the reference ligand and a radius of 5 Å (which is the maximum distance from the center defined by a 5 Å radius around the reference ligand). All other options of PLANTS1.2 were left at their default settings. Every compound was virtually screened three times.
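As an illustration of how the binding-pocket center could be derived from the co-crystal ligand file, the following minimal Python sketch computes the unweighted mean of the atomic coordinates in a mol2 record. It assumes that "center of the reference ligand" means this geometric centroid, which the text does not specify; the function name ligand_center and the two-atom toy record are hypothetical and are not part of SPORES or PLANTS1.2.

```python
def ligand_center(mol2_text):
    """Return the unweighted mean (x, y, z) of all atoms in a mol2 record.

    Assumption: the "center" is the plain geometric centroid of the atoms.
    """
    coords = []
    in_atoms = False
    for line in mol2_text.splitlines():
        if line.startswith("@<TRIPOS>"):
            # Only the ATOM section carries coordinates.
            in_atoms = line.strip() == "@<TRIPOS>ATOM"
            continue
        if in_atoms and line.strip():
            fields = line.split()
            # mol2 ATOM columns: id, name, x, y, z, type, ...
            coords.append(tuple(float(v) for v in fields[2:5]))
    if not coords:
        raise ValueError("no atoms found in mol2 text")
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


# Hypothetical two-atom record, used only to exercise the function.
EXAMPLE = """@<TRIPOS>MOLECULE
toy
 2 1 0 0 0
SMALL
USER_CHARGES

@<TRIPOS>ATOM
      1 C1          0.0000    0.0000    0.0000 C.3     1  LIG  0.0000
      2 N1          2.0000    4.0000   -6.0000 N.4     1  LIG  0.0000
@<TRIPOS>BOND
     1     1     2    1
"""

center = ligand_center(EXAMPLE)
print(center)
```

In practice the text of ligand_JRZ1203_0.mol2 would be passed in place of the toy record, and the resulting coordinates would be supplied to the docking software together with the chosen radius.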

Rescoring using protein-ligand interaction fingerprints calculated by PyPLIF
The co-crystal ligand binding mode in the ADRB2 crystal structure was used to generate the reference PLIF by using PyPLIF. Seven different interaction types (negatively charged, positively charged, hydrogen bond (H-bond) acceptor, H-bond donor, aromatic face-to-edge, aromatic face-to-face and hydrophobic interactions) were used to define the PLIF. The cavity used for the PLIF analysis consisted of a set of residues in the binding pocket of ADRB2 defined in the subsection Automated molecular docking and virtual screening. Note that for each PLANTS docking pose, a unique subset of protein coordinates with rotated hydroxyl hydrogen atoms was used to define the PLIF. Standard PLIF scoring parameters were applied, and a Tanimoto coefficient (Tc-PLIF) measuring PLIF similarity with the reference molecule pose was used to re-rank the docking poses of the known active ADRB2 ligands and their decoys.
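The Tanimoto comparison of two interaction bitstrings can be sketched as follows. This is a minimal Python illustration using the standard Tanimoto definition (bits "on" in both strings divided by bits "on" in either); the function name tanimoto and the 7-bit example strings are hypothetical, and whether PyPLIF treats edge cases (for example, two all-zero fingerprints) in the same way is an assumption.

```python
def tanimoto(reference, pose):
    """Tanimoto coefficient between two equal-length '0'/'1' bitstrings."""
    if len(reference) != len(pose):
        raise ValueError("bitstrings must have the same length")
    pairs = list(zip(reference, pose))
    both_on = sum(1 for r, p in pairs if r == "1" and p == "1")
    either_on = sum(1 for r, p in pairs if r == "1" or p == "1")
    # Assumption: two fingerprints without any interaction score 0.0.
    return both_on / either_on if either_on else 0.0


# Hypothetical fingerprints for one residue (seven interaction types).
reference_plif = "1101000"
pose_plif = "1001010"
tc = tanimoto(reference_plif, pose_plif)
print(tc)
```

Here two bits are "on" in both strings and four bits are "on" in at least one of them, so the coefficient is 2/4. A value of 1 would mean that the docking pose reproduces exactly the interactions of the reference pose.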

SBVS quality assessment
The docking pose with the best ChemPLP score or the best Tc-PLIF value was selected for each virtually screened compound. Virtual screening accuracies were determined in terms of the Area Under the Curve (AUC) of the Receiver-Operator Characteristic (ROC) plots computed with R statistical computing software version 3.1.0 and the enrichment in True Positive rate (TP) reported at a False Positive rate (FP) of 1% (EF1%). The EF1% values were calculated as follows: EF1% = TP/FP1%, i.e., the percentage of known ligands recovered at the point where 1% of the decoys have been recovered, divided by that 1%.

Table I. Shell script to filter based on the predefined interaction bitstring (Radifar, 2013a). The shell script should be adapted according to the relevant predefined bitstring. In this example by Radifar et al. (2013a), bitstring number 103 is used.
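The two quality measures can be sketched in a few lines of Python. This is an illustrative reimplementation and not the R code used in the study: the function names ef_at_fp and roc_auc, the toy scores, and the convention that a lower (more negative) score ranks better are assumptions. For Tc-PLIF, where higher is better, the values would be negated before ranking.

```python
def ef_at_fp(scored, fp_percent=1.0):
    """EF at a given false-positive percentage.

    scored: list of (score, is_active); lower score means better rank.
    Returns (percent of actives recovered) / (percent of decoys allowed).
    """
    n_actives = sum(1 for _, active in scored if active)
    n_decoys = len(scored) - n_actives
    allowed = int(n_decoys * fp_percent / 100)  # decoys tolerated
    actives_seen = decoys_seen = 0
    for _, is_active in sorted(scored, key=lambda pair: pair[0]):
        if is_active:
            actives_seen += 1
        else:
            decoys_seen += 1
            if decoys_seen > allowed:
                break  # false-positive budget exceeded
    return (100.0 * actives_seen / n_actives) / fp_percent


def roc_auc(scored):
    """ROC AUC as the fraction of (active, decoy) pairs ranked correctly."""
    actives = [s for s, active in scored if active]
    decoys = [s for s, active in scored if not active]
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a < d:
                wins += 1.0
            elif a == d:
                wins += 0.5  # ties count as half
    return wins / (len(actives) * len(decoys))


# Toy data: 4 hypothetical actives and 100 hypothetical decoys.
active_scores = [-100.0, -96.0, -90.0, -40.0]
decoy_scores = [-95.0] + [-50.0 + 0.1 * i for i in range(99)]
scored = [(s, True) for s in active_scores] + [(s, False) for s in decoy_scores]

ef1 = ef_at_fp(scored)
auc = roc_auc(scored)
print(ef1, auc)
```

In the toy set, three of the four actives are ranked before the second decoy, so 75% of the actives are recovered at a 1% false-positive rate. The AUC is returned as a fraction; multiplying by 100 gives the percentage scale used for the AUC values reported in this article.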

Systematic filtering on PLIFs
A shell script to perform systematic filtering on the PLIFs obtained in the subsection Rescoring using protein-ligand interaction fingerprints calculated by PyPLIF was created by adapting the one (Table I) provided by Radifar et al. (2013a). For every filtering result, a new ranking based on the ChemPLP values was created and the EF1% values were then calculated (de Graaf et al., 2011a; Sirci et al., 2012). The molecular determinants were identified by correlating the interaction bitstrings that gave significantly better EF1% values than the default ones (without PLIF filtering) to the relevant binding pocket residues (Wacker et al., 2010). The results were then retrospectively validated by examining available mutation data from the literature stored in GPCRDB (Vroling et al., 2010).
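The systematic filtering can be outlined as a loop over every bit position and both bit states. The Python sketch below is an illustration and not the shell script used in the study: the pose records, field names and function names are hypothetical, and it assumes that compounds left without any pose after filtering still count in the ligand and decoy totals.

```python
def ef_at_fp(ranked_labels, n_actives, n_decoys, fp_percent=1.0):
    """EF from a best-first list of is_active flags."""
    allowed = int(n_decoys * fp_percent / 100)
    actives_seen = decoys_seen = 0
    for is_active in ranked_labels:
        if is_active:
            actives_seen += 1
        else:
            decoys_seen += 1
            if decoys_seen > allowed:
                break
    return (100.0 * actives_seen / n_actives) / fp_percent


def best_per_compound(poses):
    """Keep the lowest-ChemPLP pose of each compound, sorted best-first."""
    best = {}
    for pose in poses:
        current = best.get(pose["compound"])
        if current is None or pose["chemplp"] < current["chemplp"]:
            best[pose["compound"]] = pose
    return sorted(best.values(), key=lambda p: p["chemplp"])


def scan_filters(poses):
    """Return (unfiltered EF1%, {(bit, state): EF1% after filtering})."""
    compounds = {p["compound"]: p["is_active"] for p in poses}
    n_actives = sum(compounds.values())
    n_decoys = len(compounds) - n_actives

    def ef(subset):
        labels = [p["is_active"] for p in best_per_compound(subset)]
        return ef_at_fp(labels, n_actives, n_decoys)

    results = {}
    for bit in range(len(poses[0]["plif"])):
        for state in ("1", "0"):  # "on" and "off" filters
            kept = [p for p in poses if p["plif"][bit] == state]
            results[(bit, state)] = ef(kept)
    return ef(poses), results


# Hypothetical poses with 2-bit fingerprints (real PLIFs are much longer).
poses = [
    {"compound": "A1", "is_active": True, "chemplp": -90.0, "plif": "10"},
    {"compound": "A2", "is_active": True, "chemplp": -70.0, "plif": "11"},
    {"compound": "D1", "is_active": False, "chemplp": -95.0, "plif": "01"},
    {"compound": "D2", "is_active": False, "chemplp": -80.0, "plif": "10"},
    {"compound": "D3", "is_active": False, "chemplp": -60.0, "plif": "00"},
]
baseline, results = scan_filters(poses)
improved = sorted(key for key, value in results.items() if value > baseline)
print(baseline, improved)
```

In this toy set the best-scoring compound is a decoy that lacks bit 0, so requiring bit 0 to be "on" removes it and raises the enrichment. Filters that improve on the baseline in this way are the candidates to be mapped back to binding pocket residues.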

Visual inspection
Visual inspection using PyMOL 1.2r1 (Lill and Danielson, 2011) was performed to manually investigate some representative docking poses and examine the plausible molecular determinants of ADRB2-ligand binding.

RESULTS AND DISCUSSION
This research aimed to construct a valid SBVS protocol to identify potent human ADRB2 ligands by employing PLIF identification using PyPLIF as an alternative rescoring strategy (Radifar et al., 2013b). The additional rescoring procedure offers the possibility of identifying the molecular determinants in ADRB2-ligand binding by providing PLIF bitstrings covering every interaction type of every docking pose with all amino acids in the binding pocket (de Graaf et al., 2011a; Marcou and Rognan, 2007; Radifar et al., 2013a). Subsequent investigation by performing systematic filtering on the bitstrings could lead to the identification of the critical bitstrings that affect the SBVS quality. The identified critical bitstrings were suggested to be correlated to the potential molecular determinants in ADRB2-ligand binding (de Graaf and Rognan, 2008; Istyastono et al., 2011b). The virtual screening campaigns resulted in 2,284,650 docking poses and 799,627,500 bitstrings for all 15,231 screened ADRB2 ligands and decoys downloaded from DUD-e. By employing either the ChemPLP score originating from PLANTS1.2 or the Tc-PLIF value of PyPLIF (Korb et al., 2009; Radifar et al., 2013a), the best pose for each screened compound was selected. In order to evaluate and compare the SBVS qualities, the selected poses were ranked according to the relevant scoring functions, and the ROC curves were plotted accordingly (Figure 1). The results showed that the developed protocols had better quality compared to the original SBVS, with EF1% values of 24.24 by using ChemPLP from PLANTS1.2 as the scoring function and of 8.22 by using Tc-PLIF from PyPLIF as the scoring function (Figure 1). Based on Figure 1, the AUC values were calculated at the 95% confidence level (de Graaf and Rognan, 2008). The AUC values obtained by employing ChemPLP from PLANTS1.2 and Tc-PLIF from PyPLIF as the scoring functions were 82.97 and 60.23, respectively.
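The reported pose and bit counts can be cross-checked with a short calculation. The Python sketch below uses only numbers stated in this article (231 ligands, 15,000 decoys, 50 poses per run, three runs, seven interaction types); the variable names are hypothetical, and the number of binding pocket residues it derives is an inference from these figures that is not stated in the text.

```python
n_ligands = 231
n_decoys = 15000
poses_per_run = 50
runs = 3
interaction_types = 7

n_compounds = n_ligands + n_decoys
n_poses = n_compounds * poses_per_run * runs

reported_bits = 799_627_500
# Bits stored per docking pose, if the reported total counts single bits.
bits_per_pose, remainder = divmod(reported_bits, n_poses)
# Inference only: residues in the PLIF cavity implied by the bit count.
residues_implied = bits_per_pose // interaction_types

print(n_compounds, n_poses, bits_per_pose, remainder, residues_implied)
```

The pose count reproduces the reported 2,284,650, and the reported bit total divides evenly into 350 bits per pose, which would correspond to 50 residues with seven interaction types each.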
Compared to the original SBVS accompanying the DUD-e release (Mysinger et al., 2012), with an EF1% value of 3.9 and an AUC value of 69.26, the SBVS protocols using PLANTS1.2 developed in this research showed better quality in terms of EF1% and AUC values. Interestingly, the results also indicated that the SBVS on ADRB2 employing ChemPLP as the scoring function outperformed the SBVS employing Tc-PLIF as the scoring function. However, the PLIFs resulting from the PLIF identification using PyPLIF could serve as starting points in the identification of the critical bitstrings, which in turn could be correlated to the important residues in ADRB2-ligand binding.
The PLIFs of the docking poses obtained in this research subsequently served as useful tools to identify the molecular determinants in ADRB2-ligand binding. The systematic filtering on all PLIF bitstrings for both "on" (represents favorable interaction) and "off" (represents unfavorable interaction) resulted in some bitstrings that gave better EF1% values compared to the unfiltered SBVS campaigns (the default ones). These important bitstrings and their related amino acid residues are presented in Table II. By employing mutation data stored in GPCRDB (Vroling et al., 2011), the following molecular determinants in ADRB2-ligand binding were identified and retrospectively validated: D113, S203 and N293. These could serve as key information in future ADRB2-ligand design (Figure 2): potent ligands for ADRB2 should form an ionic interaction with D113 and H-bonds to both D113 (Strader et al., 1987; Elling et al., 1999; Ballesteros et al., 2001; Gouldson et al., 1997) and S203 (Suryanarayana and Kobilka, 1993; Sato et al., 1999; Liapakis et al., 2000; Rasmussen et al., 2011), but not to N293 (Wieland et al., 1996).
The SBVS campaign using the ChemPLP score as the scoring function and considering only poses that have an ionic interaction to D113 resulted in a better virtual screening quality, with an EF1% of 33.33 (Table II). This means that a virtually screened compound possessing an ionic interaction to D113 with a ChemPLP score better than that of the compound recognized at the EF1% cutoff in the retrospective SBVS campaigns, CHEMBL38205 (Ki at ADRB2 = 3-7 nM (Tejani-Butt and Brunswick, 1986; Mysinger et al., 2012); ChemPLP score = -101.916), has a 33.33 times better chance of being confirmed as an ADRB2 ligand than any randomly selected compound. Figure 2 is presented to examine how the representative ligand CHEMBL38205 interacts with the ADRB2 binding pocket in comparison to the co-crystal ligand ICI 118,551 (Wacker et al., 2010). As can be seen in Figure 2, the co-crystal ligand ICI 118,551 forms an ionic bond to D113, H-bonds to D113 and N312, and an aromatic interaction with F290 (Wacker et al., 2010) (Figure 2A), while the representative ligand forms an ionic bond to D113, H-bonds to D113 and S203, and an aromatic interaction with F290 (Figure 2B). Notably, neither compound forms an H-bond to N293. Similar to histamine receptors, ADRB2 as an aminergic GPCR has a conserved D113 residue as an ionic bond anchor (Cherezov et al., 2007; de Graaf et al., 2011a; Istyastono et al., 2011a; Istyastono et al., 2011b; Shimamura et al., 2011; Wacker et al., 2010). The SBVS protocols developed in this research showed that adding knowledge of the molecular determinants in ADRB2-ligand binding to the protocols could increase the virtual screening quality as well as help identify the most plausible binding pose of ligands in the ADRB2 binding pocket. A similar strategy has been shown to be successful in SBVS on an HRH1 crystal structure and HRH3 homology models (de Graaf et al., 2011a; Sirci et al., 2012).

Table II. Filtering on PLIF bitstrings for both "on" (represents favorable interaction) and "off" (represents unfavorable interaction) that gave better EF1% values compared to the unfiltered SBVS campaigns (EF1% = 24.24).
*) Based on mutation data stored in GPCRDB (Vroling et al., 2011). N/A: Not available.

CONCLUSIONS
The constructed SBVS protocol employing PLANTS1.2 and PyPLIF to identify ligands for ADRB2 has been retrospectively validated using the newly published database DUD-e. The protocol showed better virtual screening quality in ligand identification compared to the original protocol accompanying the release of DUD-e. A further improvement in virtual screening quality was subsequently achieved by adding information on the molecular determinants in ADRB2-ligand binding to the protocol. The produced PLIFs have served as useful tools in the recognition of the molecular determinants in ADRB2-ligand binding: D113, S203, and N293.

Figure 1. ROC curves resulting from the retrospective SBVS campaign. The black lines represent the ROC curves when the results were ranked by ChemPLP scores, while the grey lines represent the ROC curves when the results were ranked by Tc-PLIF values. The dashed lines represent random selection.

Figure 2. The pose of the co-crystal ligand ICI 118,551 (cyan carbon atoms; balls and sticks mode) in the ADRB2 binding pocket (Wacker et al., 2010) (A) and the docking pose of the representative ligand CHEMBL38205 (magenta carbon atoms; balls and sticks mode; Ki at ADRB2 = 3-7 nM (Tejani-Butt and Brunswick, 1986; Mysinger et al., 2012); ChemPLP score = -101.916) in the ADRB2 binding pocket resulting from the SBVS with filtering on poses that have an ionic interaction to D113 (see Table II) with the residue as the anion (B). The ADRB2 (green carbon atoms) is presented in cartoon mode, with only some important residues presented in balls and sticks mode. Oxygen, nitrogen and hydrogen are presented in red, blue and white, respectively. For clarity, only polar hydrogens and residues interacting with the crystal ligand are shown as balls and sticks, while ADRB2 residues from sequence 161 to sequence 200 are not shown. H-bonds and ionic bonds are depicted by black dashed lines and red dashed lines, respectively.