
Sunday, October 27, 2019

Installing RDKit, TensorFlow, and Keras on Windows

1. Install RDKit

Open the Anaconda Prompt (Anaconda3) terminal:
$ conda create -c rdkit -n my-rdkit-env rdkit python=3.7
$ conda activate my-rdkit-env

create -c rdkit -n my-rdkit-env creates an environment named my-rdkit-env inside the Windows Anaconda3 installation. Because TensorFlow (at the time of writing) supports Python only up to 3.7, python=3.7 is appended so that my-rdkit-env uses Python 3.7. After that, TensorFlow and Keras can be installed in order.

2. Install TensorFlow 2.0 in the my-rdkit-env environment
$ conda install tensorflow-gpu=2.0 python=3.7

3. Install Keras in the my-rdkit-env environment
$ conda install Keras

4. Install Jupyter Notebook in the my-rdkit-env environment
$ conda install jupyter

5. Install CUDA
Because TensorFlow 2.0 only supports CUDA 10.0, download CUDA 10.0 from the link below:
https://developer.nvidia.com/cuda-toolkit-archiv


Saturday, October 26, 2019

Disabling mail notifications on CentOS

Method 1:
$ echo "unset MAILCHECK" >> /etc/profile
$ source /etc/profile




Method 2:
$ vi /etc/crontab

MAILTO=""
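MAILTO applies to the jobs listed after it, so an empty value can also silence only part of a crontab. A sketch of what the edited /etc/crontab might look like (the sa1 job line is an illustrative example, not from this post):

```shell
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin

# An empty MAILTO suppresses mail for every job below this line
MAILTO=""
*/10 * * * * root /usr/lib64/sa/sa1 1 1
```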




CentOS: "Cannot open /var/log/sa/sa*: No such file or directory" appears in /var/spool/mail/root

Problem:
The host keeps sending messages to /var/spool/mail/root containing: Cannot open /var/log/sa/sa*: No such file or directory

Solution:
$ cd /var/log
$ rm -r sa    # delete the old sa directory, then recreate it
$ mkdir sa
$ sar -o 26   (26 is the current day of the month)



Reference
https://www.twblogs.net/a/5b8fdeeb2b7177672215e6c4

Knowledge Graph – A Powerful Data Science Technique to Mine Information from Text (with Python code)

https://www.analyticsvidhya.com/blog/2019/10/how-to-build-knowledge-graph-text-using-spacy/?utm_content=103320382&utm_medium=social&utm_source=facebook&hss_channel=fbp-236305876563528

Towards a Super Resolution Autoencoder (code included)

https://www.linkedin.com/pulse/towards-super-resolution-autoencoder-code-included-ibrahim-sobh-phd?fbclid=IwAR39_nyGVc0j2y6fbuqhnwPFbUpBntxhY1PoU2UtsFegXlZCxp2NN3Qn1sU

Sunday, October 20, 2019

Saturday, October 19, 2019

[Level-4] Generative Models and Adversarial Examples: An Introduction to Variational Autoencoders and Their Interesting Properties

https://zhuanlan.zhihu.com/p/82420402

On the use of the Kullback–Leibler divergence in Variational Autoencoders

The loss function of a variational autoencoder (VAE) has two parts. The first measures reconstruction quality, i.e. the error between the original sample and its reconstruction. The second is the Kullback-Leibler divergence (KL divergence for short) with respect to a standard multivariate normal distribution. The article illustrates graphically how the KL divergence affects the encoder and decoder outputs.
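The two terms can be sketched numerically. Below is a minimal NumPy version of the VAE loss, assuming a diagonal-Gaussian encoder that outputs a mean and a log-variance per latent dimension, using the closed-form KL against N(0, I):

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    # Closed-form KL divergence between N(mu, diag(exp(log_var))) and N(0, I)
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))

def vae_loss(x, x_recon, mu, log_var):
    # First term: reconstruction error between the sample and its reconstruction
    recon = np.sum((x - x_recon) ** 2)
    # Second term: KL divergence pulling the latent code toward N(0, I)
    return recon + kl_to_standard_normal(mu, log_var)

# When the encoder already outputs mu = 0 and log_var = 0 (unit variance),
# the KL term vanishes
print(kl_to_standard_normal(np.zeros(3), np.zeros(3)) == 0.0)  # True
```

This is only a sketch of the objective; in the Keras models discussed here the reconstruction term is usually a (binary) cross-entropy and both terms are averaged over the batch.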

Friday, October 11, 2019

Step-by-step understanding LSTM Autoencoder layers

https://towardsdatascience.com/step-by-step-understanding-lstm-autoencoder-layers-ffab055b6352

define one-to-one, one-to-many, many-to-one, and many-to-many LSTM neural networks in Keras

https://stackoverflow.com/questions/52138290/how-can-we-define-one-to-one-one-to-many-many-to-one-and-many-to-many-lstm-ne

Python keras.layers.RepeatVector() Examples

https://www.programcreek.com/python/example/89689/keras.layers.RepeatVector

LSTM Neural Network Study Notes

http://mark1002.github.io/2018/04/04/LSTM-%E9%A1%9E%E7%A5%9E%E7%B6%93%E7%B6%B2%E8%B7%AF%E5%AD%B8%E7%BF%92%E7%B4%80%E9%8C%84/

Setting input_shape in the LSTM arguments

Why is input_shape often used even though it does not appear in the Arguments section of the Keras documentation? For example:

model.add(LSTM(32, input_shape=(10, 64)))

This is because all recurrent layers (LSTM, GRU, SimpleRNN) inherit from the same abstract recurrent base layer, and input_shape is a keyword accepted by any Keras layer used as the first layer of a model. The following arguments can therefore be used with any recurrent layer.

  • cell: A RNN cell instance. A RNN cell is a class that has:
    • call(input_at_t, states_at_t) method, returning (output_at_t, states_at_t_plus_1). The call method of the cell can also take the optional argument constants, see section "Note on passing external constants" below.
    • state_size attribute. This can be a single integer (single state) in which case it is the size of the recurrent state (which should be the same as the size of the cell output). This can also be a list/tuple of integers (one size per state).
    • output_size attribute. This can be a single integer or a TensorShape, which represent the shape of the output. For backward compatible reason, if this attribute is not available for the cell, the value will be inferred by the first element of the state_size.
    It is also possible for cell to be a list of RNN cell instances, in which cases the cells get stacked one after the other in the RNN, implementing an efficient stacked RNN.
  • return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence.
  • return_state: Boolean. Whether to return the last state in addition to the output.
  • go_backwards: Boolean (default False). If True, process the input sequence backwards and return the reversed sequence.
  • stateful: Boolean (default False). If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch.
  • unroll: Boolean (default False). If True, the network will be unrolled, else a symbolic loop will be used. Unrolling can speed-up a RNN, although it tends to be more memory-intensive. Unrolling is only suitable for short sequences.
  • input_dim: dimensionality of the input (integer). This argument (or alternatively, the keyword argument input_shape) is required when using this layer as the first layer in a model.
  • input_length: Length of input sequences, to be specified when it is constant. This argument is required if you are going to connect Flatten then Dense layers upstream (without it, the shape of the dense outputs cannot be computed). Note that if the recurrent layer is not the first layer in your model, you would need to specify the input length at the level of the first layer (e.g. via the input_shape argument)
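As a concrete reading of these arguments, input_shape=(10, 64) means each sample carries 10 timesteps of 64 features, with the batch dimension left out; it is equivalent to specifying input_length=10 together with input_dim=64. A small NumPy sketch of the tensor a batch would actually have:

```python
import numpy as np

# input_shape=(10, 64): 10 timesteps, 64 features per timestep.
# The batch dimension is omitted from input_shape, so a batch of 32
# samples fed into LSTM(32, input_shape=(10, 64)) has this shape:
batch = np.zeros((32, 10, 64))  # (batch_size, timesteps, input_dim)
timesteps, input_dim = batch.shape[1], batch.shape[2]
print((timesteps, input_dim))  # (10, 64)
```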


References
https://keras.io/layers/recurrent/
https://keras-cn.readthedocs.io/en/latest/layers/recurrent_layer/

Thursday, October 10, 2019

Variational Autoencoder: Intuition and Implementation

Two generative-model architectures currently stand side by side for data generation: the generative adversarial network (GAN) and the variational autoencoder (VAE). The two are trained in quite different ways. GANs are rooted in game theory; their goal is to find a Nash equilibrium between the discriminator network and the generator network. VAEs, on the other hand, are rooted in Bayesian inference: they model the underlying probability distribution of the data so that new data can be sampled from that distribution.

In this article, we look at the VAE model intuitively, along with its implementation in Keras.

Keras implementation of an LSTM neural network to classify and predict the MNIST dataset

This post discusses using an LSTM, a network commonly used to classify time-series data, to classify 2D data such as images of handwritten digits.

Sunday, October 6, 2019

RDKit commands

1. MolFromSmiles

from rdkit import Chem
mol = Chem.MolFromSmiles('C(C)CC')
print(mol)

Converts a SMILES string into a Mol object, which can then be passed as an argument to functions such as fingerprint generators:

from rdkit.Chem import MACCSkeys
fp1 = MACCSkeys.GenMACCSKeys(mol)

https://blog.csdn.net/u012325865/article/details/81784517

2. MolToSmiles

smi = Chem.MolToSmiles(mol)
print(smi)

Converts a Mol object back into a SMILES string.

3. Canonicalization
Chem.MolToSmiles(Chem.MolFromSmiles(smi), isomericSmiles=True, canonical=True)

In most cases the same structure can be written as many different SMILES strings. Canonicalization converts any of these into a single canonical SMILES. Suppose you want to check whether a structure already exists in a dataset: using canonical SMILES instead of 2-D graph structures reduces the problem to simple text matching. Compute the canonical SMILES for each compound in the dataset, convert the query structure to its canonical SMILES, and if that SMILES is not present, the structure is new.

https://ctr.fandom.com/wiki/Convert_a_SMILES_string_to_canonical_SMILES
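The lookup idea can be sketched without RDKit once the canonical forms exist. Assuming the strings below were produced beforehand with Chem.MolToSmiles(..., canonical=True) (the example molecules are illustrative), the duplicate check is plain set membership:

```python
# Canonical SMILES of compounds already in the dataset (assumed to have
# been generated with RDKit's canonical writer beforehand)
known = {"CCCC", "CCO", "c1ccccc1"}

def is_new_structure(canonical_smiles):
    # Canonicalization reduces structure lookup to exact text matching
    return canonical_smiles not in known

print(is_new_structure("CCCC"))  # False: already present in the dataset
print(is_new_structure("CCN"))   # True: a structure not in the dataset
```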

4.

Solving environment: | Found conflicts! Looking for incompatible packages.

If you run into the problem above when updating conda, run the following command:

$ conda update anaconda



Reference
http://showteeth.tech/posts/52735.html

Installing RDKit on Ubuntu 18.04

1. Create the rdkit environment
$ conda create -c rdkit -n my-rdkit-env rdkit

2. Go to the anaconda3 bin directory
$ cd anaconda3/bin

3. Install rdkit from conda-forge
$ conda install -c conda-forge rdkit

4. Install cmake, cairo, pillow, eigen, and pkg-config
$ conda install -y cmake cairo pillow eigen pkg-config

5. Install boost-cpp, boost, and py-boost
$ conda install -y boost-cpp boost py-boost
$ conda update -n base -c defaults conda

6. Install gxx_linux-64
$ conda install -y gxx_linux-64

7. Clone the rdkit repository into the home directory
$ cd
$ sudo apt install git
$ git clone https://github.com/rdkit/rdkit.git

8. Enter the rdkit directory
$ cd rdkit

9. Create the build directory
$ mkdir build

10. Enter the build directory
$ cd build

11. Run cmake (be sure the python3.7 path below matches your Python version)
$ cmake .. -DPy_ENABLE_SHARED=1 -DRDK_INSTALL_INTREE=ON -DRDK_INSTALL_STATIC_LIBS=OFF -DRDK_BUILD_CPP_TESTS=ON -DPYTHON_NUMPY_INCLUDE_PATH="$CONDA_PREFIX/lib/python3.7/site-packages/numpy/core/include" -DBOOST_ROOT="$CONDA_PREFIX"

12. Create an include/numpy directory in the my-rdkit-env environment
Otherwise the build fails with: fatal error: numpy/arrayobject.h: No such file or directory  #include <numpy/arrayobject.h>

$ which python3
/home/chiustin/anaconda3/envs/my-rdkit-env/bin/python3

$ cd /home/chiustin/anaconda3/envs/my-rdkit-env/include
$ mkdir numpy
$ cd numpy 
$ cp /usr/include/numpy/* .
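Instead of copying the headers by hand, the include directory that ships inside the installed numpy package can also be queried directly and passed to cmake via -DPYTHON_NUMPY_INCLUDE_PATH (a sketch; run it with the environment's Python):

```python
import os
import numpy

# numpy bundles its C headers (including arrayobject.h) inside the package;
# this prints the directory cmake needs for -DPYTHON_NUMPY_INCLUDE_PATH
include_dir = numpy.get_include()
print(include_dir)
print(os.path.isfile(os.path.join(include_dir, "numpy", "arrayobject.h")))
```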

13. Run make and make install
$ make
$ make install



Reference
http://www.rdkit.org/docs/Install.html