SQL query for CHEMBL21

It was just memorandum for my self.
PostgreSql can save sql result as csv using COPY( sql statement ) TO ‘filelocation/filename’
And ‘\i file.sql’ command can execute SQL from file.
https://www.postgresql.org/docs/9.1/static/app-psql.html

COPY( SELECT
             AC.DOC_ID,
             AC.MOLREGNO,
             ST.MOLREGNO,
             ST.CANONICAL_SMILES,
             CP.QED_WEIGHTED,
             CP.ALOGP,
             CP.MOLECULAR_SPECIES,
             AC.STANDARD_TYPE,
             AC.STANDARD_RELATION,
             AC.STANDARD_VALUE,
             AC.STANDARD_UNITS,
             TD.PREF_NAME,
             BS.SITE_NAME,
             PCS.PROTEIN_CLASS_SYNONYM,
             PCS.SYN_TYPE
      FROM ASSAYS AS A,
           ACTIVITIES AS AC,
           COMPOUND_STRUCTURES AS ST,
           COMPOUND_PROPERTIES AS CP,
           TARGET_DICTIONARY AS TD,
           BINDING_SITES AS BS,
           PROTEIN_CLASS_SYNONYMS AS PCS,
           COMPONENT_CLASS AS CC,
           COMPONENT_SEQUENCES AS CS,
           TARGET_COMPONENTS AS TC
      WHERE A.ASSAY_ID=AC.ASSAY_ID
            AND ST.MOLREGNO=AC.MOLREGNO
            AND TD.TID = A.TID
            AND TD.TID = BS.TID
            AND TD.TID = TC.TID
            AND ST.MOLREGNO = CP.MOLREGNO
            AND TC.COMPONENT_ID = CS.COMPONENT_ID
            AND CC.COMPONENT_ID = CS.COMPONENT_ID
            AND CC.PROTEIN_CLASS_ID = PCS.PROTEIN_CLASS_ID
            AND CONFIDENCE_SCORE >= 9
            AND ASSAY_TYPE='B'
            AND AC.STANDARD_RELATION='='
            AND AC.STANDARD_TYPE = 'IC50'
            AND AC.STANDARD_UNITS='nM'
            AND TD.TARGET_TYPE='SINGLE PROTEIN'
            AND CP.MW_FREEBASE <= 600 )
  TO '/Users/iwatobipen/chembldata/chembl.csv' DELIMITER ',';
Advertisement

Published by iwatobipen

I'm medicinal chemist in mid size of pharmaceutical company. I love chemoinfo, cording, organic synthesis, my family.

8 thoughts on “SQL query for CHEMBL21

  1. I’m totally new to extracting data from Chembl using sql. I downloaded chembl sql database. However, I’m not sure how to point to the database and use your script to extract. Could you please direct me how to do that? My main goal is to obtain chembl data for target prediction but I wanted to start from your script to learn how to do these extractions first. Thank you!

  2. Hi, thanks for your comment. To run the script, you should install postgreSQL and create ChEMBL DB on it.
    You can get ChEMBL DB from here. https://chembl.gitbook.io/chembl-interface-documentation/downloads
    After installation, run the PostgreSQL on your PC, then login. Then you can run the script from console.
    \i hogehoge.sql is command of postgreSQL console which executes script written in hogehoge.sql.
    Pls let me know you have questions or troubles.
    Also if you use ubuntu following URL will be help for you or if you use MacOX, you can install postgreSQL with homebrew.
    https://www.digitalocean.com/community/tutorials/how-to-install-and-use-postgresql-on-ubuntu-18-04

    Thanks,

    1. Thank you for your help! I’ll do that and let you know if I run into any issues.

      1. So I installed postgresql on linuxdownload the chembl postgresql database from the link you gave (i have chembl_25_postgresql.dmp downloaded)

        in chembl postgresql installation instructions it asks to do 2 things:

        I created a database name like this:
        sudo -i -u postgres
        psql
        create database chembl_25

        But I’m not sure what to give for HOST, PORT, USERNAME arguments in this commands 
        $> pg_restore -h HOST -p PORT -U USERNAME -d chembl_25 chembl_25_postgresql.dmp

        Sorry these must be really basic sql database setup questions!

        Thank you!

      2. Hi Sue_chem,
        If you want to use ChEMBLDB in you PC only, you don’t pass port and host option. FYI, postgresql uses port 5432.

        If I could understand your situation correctly, now you have installed postgres SQL and got Chembldb.
        Now you created postgresql user.
        Following code is an example.

        $sudo systemctl start postgresql.service
        $sudo systemctl enable postgresql.service
        $sudo -iu postgres
        #set postgers password
        $psql
        $ALTER USER postgres PASSWORD 'newPassword';
        $\q
        
        # init DB
        $mkdir /var/lib/postgresql/data
        $/usr/lib/postgresql/10/bin/initdb -D /var/lib/postgresql/data
        $exit
        

        Then changed pg_hba.conf.
        Please check following URL.
        https://stackoverflow.com/questions/18664074/getting-error-peer-authentication-failed-for-user-postgres-when-trying-to-ge

        Now ready, run the pg_restore.

        $ pg_restore -U postgres -d chembl_25 chembl_25_postgresql.dmp 
        # it will take several minutes.
        

        OK check db.

        $ psql -U postgres -d chembl_25
        
        #from psql chell, type the command.
        
        chembl_25=# SELECT canonical_smiles FROM compound_structures limit 10;
                                      canonical_smiles                              
        ----------------------------------------------------------------------------
         Cc1cc(ccc1C(=O)c2ccccc2Cl)N3N=CC(=O)NC3=O
         Cc1cc(ccc1C(=O)c2ccc(cc2)C#N)N3N=CC(=O)NC3=O
         Cc1cc(cc(C)c1C(O)c2ccc(Cl)cc2)N3N=CC(=O)NC3=O
         Cc1ccc(cc1)C(=O)c2ccc(cc2)N3N=CC(=O)NC3=O
         Cc1cc(ccc1C(=O)c2ccc(Cl)cc2)N3N=CC(=O)NC3=O
         Cc1cc(ccc1C(=O)c2ccccc2)N3N=CC(=O)NC3=O
         Cc1cc(Br)ccc1C(=O)c2ccc(cc2Cl)N3N=CC(=O)NC3=O
         Clc1ccc(C(=O)c2ccc(cc2Cl)N3N=CC(=O)NC3=O)c(Cl)c1
         CS(=O)(=O)c1ccc(cc1)C(=O)c2ccc(cc2Cl)N3N=CC(=O)NC3=O
         C1CCCCCNc2cc[n+](Cc3cccc(c3)c4cccc(C[n+]5ccc(NCCCC1)c6ccccc56)c4)c7ccccc27
        (10 rows)
        
  3. Thank you so much for the detailed instructions! The linux environment I used is on a AWS instance. I then also tried to follow your instructions on my mac. Installed postgresql using brew and downloaded the chembl postgresql database.However systemctl wasn’t recognized on the mac. And seems like I can’t even get it on my mac. So I again went back to the linux instance on the AWS. Gave the host (ec2 id), port 5432 but got errors. I went back and checked on AWS /usr/lib/postgresql/9.3/bin/initdb is not even there. That folder has many other files but not initdb you used. I’m not sure what would be easier, try on Mac or the AWS linux instance. Sorry this is a way too complicated question to ask.

  4. Hi, thank you for your prompt reply.
    I’m sorry that I have no experience the installation on AWS. And systemctl is not available on MacOS. My understanding is correctly, you couldn’t find initdb. How about try to updatedb command from shell and search initdb?
    https://stackoverflow.com/questions/44104241/postgresql-error-initdb-command-not-found
    Here is an example for OSX postgresql installation.
    https://www.codementor.io/engineerapart/getting-started-with-postgresql-on-mac-osx-are8jcopb
    And there is another option I think. ChEBLDB is provided not only postgresql dump file and also sqlite3 dumpfile. If you use sqlite3 dumpfile, you don’t need to setup DB server.
    Here is an example http://www.dalkescientific.com/writings/diary/archive/2017/03/24/chembl_bioactivity_data.html.
    Also sqlite3 is available from python.
    I hope this will help for you.
    Thanks,

  5. Thank you so much for all your suggestions! I appreciate it very very much! It’s more than I ever expected. I’m going to take all your suggestions and try to get it to work. So many things to try. Thank you!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: