# Chemical Checker Repository

The **Chemical Checker (CC)** is a resource of small molecule signatures. In the CC, compounds are described from multiple viewpoints, spanning every aspect of the drug discovery pipeline, from chemical properties to clinical outcomes.

* For a quick exploration of what this resource enables, please visit the [CC web app](http://chemicalchecker.org).
* For full documentation of the Python package, please see the [Documentation](http://packages.sbnb-pages.irbbarcelona.org/chemical_checker).
* Concepts and methods are best described in the original CC publication, [Duran-Frigola et al. 2019](https://biorxiv.org/content/10.1101/745703v1).
* For more information about this repository, discussion, notes, etc., please refer to our [Wiki page](http://gitlabsbnb.irbbarcelona.org/packages/chemical_checker/wikis/home).

The **Chemical Checker Repository** holds the current implementation of the CC in our `SB&NB` laboratory. As such, the repository contains a significant number of functionalities and data not presented in the primary CC manuscript. The repository follows this directory structure:

* `container`: Deals with the containerization of the CC; contains the definition files for the Singularity image.
* `notebook`: Contains example Jupyter Notebooks that showcase some CC features.
* `package`: The backbone of the CC, in the form of a Python package.
* `pipelines`: The pipeline scripts (e.g. for updating the CC, generating data for the web app, etc.).
* `setup`: The setup script to install the CC.


Due to the heavy computational requirements of our pipeline, the code has been written and optimized to work in our local HPC facilities. The installation guides below are mainly addressed to `SB&NB` users. As stated in the manuscript, the main deliverable of our resource is the set of CC _signatures_, which can be easily accessed:

* through a [REST API](https://chemicalchecker.com/help),
* downloaded as [data files](https://chemicalchecker.com/downloads) or 
* predicted from SMILES with the [Signaturizer](http://gitlabsbnb.irbbarcelona.org/packages/signaturizer).

## Chemical Checker `lite`

The CC package can be installed in a couple of minutes directly via `pip`:

```bash
pip install chemicalchecker
```

This installs the `lite` version of the CC, which can be used for basic tasks (e.g. opening signatures) but lacks most of the more advanced capabilities of the CC package.

_**N.B.** Only bare minimum dependencies are installed along with the package_
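
After installing, it can be useful to confirm that the package is importable from the interpreter you intend to use. The snippet below is a minimal sketch: `can_import` is a hypothetical helper written for this example (it is not part of the CC), and it assumes `python3` is the interpreter that `pip` installed into.

```bash
# Hypothetical helper: succeed if the given Python module can be imported.
# The second argument (interpreter) is optional and defaults to python3.
can_import() {
    module="$1"
    python_bin="${2:-python3}"
    "$python_bin" -c "import ${module}" >/dev/null 2>&1
}

if can_import chemicalchecker; then
    echo "chemicalchecker is importable"
else
    echo "chemicalchecker is not importable; try: pip install chemicalchecker"
fi
```

If you work inside a virtual environment, pass its interpreter as the second argument, e.g. `can_import chemicalchecker /path/to/venv/bin/python`.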

# Installation

All the dependencies for the CC are bundled within a Singularity image generated during the installation process.
Generating such an image requires roughly 20 minutes:

1. [Install Singularity](https://sylabs.io/guides/3.8/admin-guide/admin_quickstart.html#installation-from-source)

                $ sudo apt-get update && sudo apt-get install -y \
                    build-essential \
                    uuid-dev \
                    libgpgme-dev \
                    squashfs-tools \
                    libseccomp-dev \
                    wget \
                    pkg-config \
                    git \
                    cryptsetup-bin \
                    golang-go

                # adjust VERSION as necessary
                $ export VERSION=3.8.0 && \
                    wget https://github.com/sylabs/singularity/releases/download/v${VERSION}/singularity-ce-${VERSION}.tar.gz && \
                    tar -xzf singularity-ce-${VERSION}.tar.gz && \
                    cd singularity-ce-${VERSION}

                $ ./mconfig && \
                    make -C ./builddir && \
                    sudo make -C ./builddir install

2. Clone this repository to your code folder:
        
        cd ~ && mkdir -p code && cd code
        git clone http://gitlabsbnb.irbbarcelona.org/packages/chemical_checker.git

3. Run the setup script (you will be asked to type your password):

        cd chemical_checker && sh setup/setup_chemicalchecker.sh
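
Before running the setup script, you may want to check that the tools used in the steps above are on your `PATH`. This is a minimal sketch; `have_cmd` is a hypothetical helper defined only for this example.

```bash
# Hypothetical helper: succeed if a command is available on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report the tools required by the installation steps.
for tool in git singularity; do
    if have_cmd "$tool"; then
        echo "found: $tool"
    else
        echo "missing: $tool"
    fi
done
```

If `singularity` is reported as found, `singularity --version` should print the version you built in step 1.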

## Running `Vanilla` Chemical Checker

This is the easiest scenario where you simply use the CC code 'as is'.

The `setup_chemicalchecker.sh` script creates an alias in your `~/.bashrc`, so you can start the CC with:
```bash
source ~/.bashrc
chemcheck
```

_**N.B.** If you are using another shell (e.g. zsh), just copy the `chemcheck` alias from your `.bashrc` to your `.zshrc`._
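
Copying the alias can also be scripted. The sketch below assumes the setup script wrote the alias as a single line starting with `alias chemcheck=`; this is an assumption about the format, so check your `.bashrc` first. `copy_chemcheck_alias` is a hypothetical helper, not something shipped with the CC.

```bash
# Hypothetical helper: copy the chemcheck alias line from one rc file to
# another, unless the target already contains that exact line.
# Assumes a single-line entry of the form: alias chemcheck=...
copy_chemcheck_alias() {
    src="$1"
    dst="$2"
    alias_line=$(grep '^alias chemcheck=' "$src" 2>/dev/null | head -n 1)
    # Fail if the source file does not define the alias.
    [ -n "$alias_line" ] || return 1
    if ! grep -qxF -- "$alias_line" "$dst" 2>/dev/null; then
        printf '%s\n' "$alias_line" >> "$dst"
    fi
}
```

For zsh this would be called as `copy_chemcheck_alias ~/.bashrc ~/.zshrc`, followed by `source ~/.zshrc`.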

## Running custom Chemical Checker

If you are contributing code to the CC, you can run the Singularity image pointing to your local development branch:

```bash
chemcheck -d /path/to/your/code/chemical_checker/package/
```
    
## Running with alternative config file

The CC relies on a single config file containing the information for the current environment (e.g. the HPC, location of the default CC, database, etc.). The default configuration refers to our `SB&NB` environment and must be overridden by specifying a custom config file when working in a different environment:

```bash
chemcheck -c /path/to/your/cc_config.json
```
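
A malformed config file is an easy mistake to make when editing by hand. The sketch below checks that the file parses before starting the CC. It assumes the config is plain JSON (as the `.json` extension suggests) and that `python3` is available on the host; `is_valid_json` is a hypothetical helper for this example.

```bash
# Hypothetical helper: succeed only if the file exists and parses as JSON.
is_valid_json() {
    [ -f "$1" ] && python3 -m json.tool "$1" >/dev/null 2>&1
}

config="/path/to/your/cc_config.json"
if is_valid_json "$config"; then
    chemcheck -c "$config"
else
    echo "not a readable JSON file: $config"
fi
```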

## Running with alternative image

You might want to use a previously compiled or downloaded image:

```bash
chemcheck -i /path/to/your/cc_image.simg
```

## Usage

We make it trivial to either start a Jupyter Notebook within the image or to run a shell:

1. Run a Jupyter Notebook with:

        chemcheck

    1.1. Open your browser and paste the URL printed by the script.

    1.2. Start a new notebook (at the top right of the Jupyter page, click New -> Python).

    1.3. Type `import chemicalchecker`

2. Run a shell within the image:

        chemcheck -s [-d <PATH_TO_SOURCE_CODE_ROOT>] [-c <PATH_TO_CONFIG_FILE>]
        
    2.1 Type `ipython`
    
    2.2 Type `import chemicalchecker`

## Running an update pipeline

When properly configured, the CC can be updated or generated from scratch. This operation depends critically on the available infrastructure. On our HPC infrastructure, comprising 12 nodes and 364 cores, a complete update takes roughly 2 weeks. Please check the `pipelines` directory for more details.


# `SB&NB` configuration

## Mounting `/aloy` in Singularity

1. Add bind paths to the Singularity config file:

        echo "bind path = /aloy/web_checker" | sudo tee -a /etc/singularity/singularity.conf


2. Make sure that `/aloy/web_checker` is available on your workstation (e.g. `ls /aloy/web_checker` should give a list of directories). If it is **not**:

        sudo mkdir -p /aloy/web_checker
        echo "fs-paloy.irbbarcelona.pcb.ub.es:/pa_webchecker /aloy/web_checker       nfs     defaults,_netdev 0 0" | sudo tee -a /etc/fstab
        sudo mount -a
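
Appending a line to a system file adds a duplicate entry each time the command is re-run. A minimal sketch of an idempotent alternative is shown below; `append_line_once` is a hypothetical helper for this example, and it writes with the privileges of the calling shell, so for `/etc/fstab` or `singularity.conf` it would need to be run from a root shell.

```bash
# Hypothetical helper: append a line to a file only if that exact line
# is not already present (the file is created if it does not exist).
append_line_once() {
    line="$1"
    file="$2"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (as root):
#   append_line_once "bind path = /aloy/web_checker" /etc/singularity/singularity.conf
```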

## Working from a laptop

First, check that you are connected to the `SB&NB` local network:
```bash
ping pac-one-head.irb.pcb.ub.es
```
Then, mount the remote filesystem:
```bash
sudo mkdir /aloy
sudo chown <laptop_username>:<laptop_username> /aloy
sshfs <sbnb_username>@pac-one-head.irb.pcb.ub.es:/aloy /aloy
```
You can unmount the filesystem with:
```bash
# Linux
fusermount -u /aloy
# macOS
umount /aloy
```

# Contributing

## Introducing new dependencies

### Adding a package or software to the image

1. Enter the Singularity sandbox:

        cd ~/chemical_checker
        sudo singularity shell --writable sandbox

2. Install the package/software and exit the image:

        pip install <package_of_your_dreams>
        exit

3. Re-generate the image:

        rm cc.simg
        sudo singularity build cc.simg sandbox

4. If you use the HPC utility, remember to copy your newly generated image to a directory accessible by the queuing system and edit the config file (section `PATH.SINGULARITY_IMAGE`) accordingly, e.g.:

        cp cc.simg /aloy/scratch/<your_user>/cc.simg
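
After editing the config file, you can read the value back to confirm it points to the new image. The sketch below assumes the config is JSON with a top-level `PATH` object containing a `SINGULARITY_IMAGE` key; this layout is inferred from the section name above and may not match your file exactly. `get_image_path` is a hypothetical helper for this example.

```bash
# Hypothetical helper: print the image path stored in a JSON config file.
# Assumes the layout {"PATH": {"SINGULARITY_IMAGE": "..."}}.
get_image_path() {
    python3 -c '
import json
import sys

with open(sys.argv[1]) as handle:
    config = json.load(handle)
print(config["PATH"]["SINGULARITY_IMAGE"])
' "$1"
}
```

For example, `get_image_path /path/to/your/cc_config.json` should print the location you copied `cc.simg` to.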


### Adding a permanent dependency to the package

Not re-inventing the wheel is a great philosophy, but each dependency we introduce comes at the cost of maintainability. Double check that the module you want to add is the best option for doing what you want to do. Check that it is actively developed and that it supports Python 3. Test it thoroughly using the sandbox approach presented above. When your approach is mature you can happily add the new dependency to the package.

To do so, add a `pip install <package_of_your_dreams>` line to the following files in `container/singularity`:

* cc-full.def (the definition file used by setup_chemicalchecker.sh)
* cc_py36.def (unit-testing Python 3 environment)

Don't forget to add a short comment on why and where this new dependency is used, and mention it in the commit message as well, e.g. "Added dependency used in preprocessing for space B5.003". The idea is that whenever B5.003 becomes obsolete, we can also safely remove the dependency.
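
In a definition file, the comment and the install line can sit next to each other. The fragment below is only a sketch: it assumes the `pip install` lines live in the `%post` section (the standard Singularity section for build-time commands), and the actual layout of `cc-full.def` and `cc_py36.def` may differ.

```
%post
    # Used in preprocessing for space B5.003; remove when B5.003 is obsoleted.
    pip install <package_of_your_dreams>
```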