I did not know what PPL is.
Recently I knew probabilistic programing and found nice article in arxiv.
Click to access arxiv18-deep-ppl.pdf
A deep probabilistic programming language is a language for specifying both deep NN and probabilistic models.
Probabilistic programming creates systems that help make decisions in the face of uncertainty.
In this article, author describes pros and cons of Deep learning and probabilistic programming.
Advantages from probabilistic programming(PP) is that PP can give not prediction but also a probability. It means PP give probabilistic models to my understanding.
On the other hand, advantages from Deep Learning is feature extraction from large dataset. DL gives high accuracy.
Stan, PyMC is major package for PPL. If you have used tensorflow for DL, I recommend consider edward as one option.
http://edwardlib.org/
Edward does not support recent version of tensorflow. For new version of tensorflow, tensorflow-probability is recommended.
Today, I used edward for PP.
Following example is very simple case.
At first linear regression.
y = wx + b
Define data builder and define variable.
For PP, w and b is stochastic variable, define each variable with Normal().
%matplotlib inline import edward as ed import numpy as np import tensorflow as tf from edward.models import Normal from matplotlib import pyplot as plt def buildtodayta(N, w): D = len(w) x = np.random.normal(0., 2., size=(N,D)) y = np.dot(x, w) + np.random.normal(0., 0.01, size=N) return x, y N = 40 D = 10 w_true = np.random.randn(D)*0.4 xt, yt = buildtodayta(N, w_true) yt.shape, xt.shape X = tf.placeholder(tf.float32, [N,D]) w = Normal(loc=tf.zeros(D), scale=tf.ones(D)) b = Normal(loc=tf.zeros(1), scale=tf.ones(1)) y = Normal(loc=ed.dot(X, w)+b, scale=tf.ones(N))
Then define theta. Theta is set of latent variables. In the following code, qw and qb is theta.
qw = Normal(loc=tf.get_variable("qw/loc", [D]), scale=tf.nn.softplus(tf.get_variable("qw/scale", [D]))) qb = Normal(loc=tf.get_variable("qb/loc", [1]), scale=tf.nn.softplus(tf.get_variable("qb/scale", [1])))
Finally, to find optimal theta conduct minimization of Kullback-Leibler divergence of q(x|z) and p(x|z).
inference = ed.KLqp({w:qw, b:qb}, data={X:xt, y:yt}) inference.run(n_samples=5, n_iter=100) y_post = ed.copy(y, {w: qw, b: qb})
def visualise(X_data, y_data, w, b, n_samples=10): w_samples = w.sample(n_samples)[:, 0].eval() b_samples = b.sample(n_samples).eval() plt.scatter(X_data[:, 0], y_data) plt.ylim([-10, 10]) inputs = np.linspace(-8, 8, num=400) for ns in range(n_samples): output = inputs * w_samples[ns] + b_samples[ns] plt.plot(inputs, output)
After learning, I can sampling from qw and qb.
Let’s compare before and after.
Before learning, w and b is not optimized, regression lines are unreasonable.
visualise(xt, yt, w, b, n_samples=3)
After learning the regression lines catch the trend of the dataset.
visualise(xt, yt, qw, qb, n_samples=3)
Next, try to cosine curve.
%matplotlib inline import edward as ed import matplotlib.pyplot as plt import numpy as np import tensorflow as tf from edward.models import Normal plt.style.use('seaborn-pastel') def build_toy_dataset(N=50, noise_std=0.1): x = np.linspace(-14,14, num=N) y = np.cos(x) + np.random.normal(0, noise_std, size=N) x = x.astype(np.float32).reshape((N, 1)) y = y.astype(np.float32) return x, y plt.scatter(build_toy_dataset()[0], build_toy_dataset()[1], c="b")
Next, define DNN (2layers not so deeeeeeeeep ;-)).
def neural_network(x, W_0, W_1, b_0, b_1): h = tf.tanh(tf.matmul(x, W_0) + b_0) h = tf.matmul(h, W_1) + b_1 return tf.reshape(h, [-1]) ed.set_seed(794) N = 50 D = 1 x_train, y_train = build_toy_dataset(N) W_0 = Normal(loc=tf.zeros([D,30]), scale=tf.ones([D,30])) W_1 = Normal(loc=tf.zeros([30,1]), scale=tf.ones([30,1])) b_0 = Normal(loc=tf.zeros(30), scale=tf.ones(30)) b_1 = Normal(loc=tf.zeros(1), scale=tf.ones(1)) x = x_train y = Normal(loc = neural_network(x, W_0, W_1, b_0, b_1), scale=tf.ones(N) * 0.1) qW_0 = Normal(loc=tf.get_variable("qW_0/loc", [D,30]), scale=tf.nn.softplus(tf.get_variable("qW_0/scale", [D,30]))) qW_1 = Normal(loc=tf.get_variable("qW_1/loc", [30,1]), scale=tf.nn.softplus(tf.get_variable("qW_1/scale", [30,1]))) qb_0 = Normal(loc=tf.get_variable("qb_0/loc", [30]), scale=tf.nn.softplus(tf.get_variable("qb_0/scale", [30]))) qb_1 = Normal(loc=tf.get_variable("qb_1/loc", [1]), scale=tf.nn.softplus(tf.get_variable("qb_1/scale", [1])))
rs = np.random.RandomState(0) inputs = np.linspace(-9, 9, num=400, dtype=np.float32) x = tf.expand_dims(inputs, 1) mus = tf.stack([neural_network(x, qW_0.sample(), qW_1.sample(), qb_0.sample(), qb_1.sample()) for _ in range(10)])
sess = ed.get_session() tf.global_variables_initializer().run() outputs = mus.eval() fig = plt.figure(figsize=(10, 6)) ax = fig.add_subplot(111) ax.set_title('iteration: 0') ax.plot(x_train, y_train, 'ks', alpha=0.5, label=('x', 'y')) ax.plot(inputs, outputs[0].T, 'b', lw=2, alpha=0.5, label='prior draws') ax.plot(inputs, outputs[1:].T, 'y', lw=2, alpha=0.5) ax.set_xlim([-10, 10]) ax.set_ylim([-3,3]) ax.legend() plt.show()
inference = ed.KLqp({W_0: qW_0, b_0:qb_0, W_1:qW_1, b_1:qb_1}, data={y: y_train}) inference.run(n_iter=2000, n_samples=5)
outputs = mus.eval() fig = plt.figure(figsize=(10,6)) ax = fig.add_subplot(111) ax.set_title('iteration: 2000') ax.plot(x_train, y_train, 'ks', alpha=0.5, label=('x', 'y')) ax.plot(inputs, outputs[0].T, 'b', alpha=0.5, lw=2, label='post draws') ax.plot(inputs, outputs[1:].T, 'y', alpha=0.5, lw=2, label='post draws') ax.set_xlim([-14, 14]) ax.set_ylim([-3,3]) ax.legend() plt.show()