what is probabilistic programming?

I did not know what PPL is.
Recently I knew probabilistic programing and found nice article in arxiv.
A deep probabilistic programming language is a language for specifying both deep NN and probabilistic models.
Probabilistic programming creates systems that help make decisions in the face of uncertainty.

In this article, author describes pros and cons of Deep learning and probabilistic programming.
Advantages from probabilistic programming(PP) is that PP can give not prediction but also a probability. It means PP give probabilistic models to my understanding.
On the other hand, advantages from Deep Learning is feature extraction from large dataset. DL gives high accuracy.
Stan, PyMC is major package for PPL. If you have used tensorflow for DL, I recommend consider edward as one option.
Edward does not support recent version of tensorflow. For new version of tensorflow, tensorflow-probability is recommended.

Today, I used edward for PP.
Following example is very simple case.

At first linear regression.
y = wx + b
Define data builder and define variable.
For PP, w and b is stochastic variable, define each variable with Normal().

%matplotlib inline
import edward as ed
import numpy as np
import tensorflow as tf
from edward.models import Normal
from matplotlib import pyplot as plt
def buildtodayta(N, w):
    D = len(w)
    x = np.random.normal(0., 2., size=(N,D))
    y = np.dot(x, w) + np.random.normal(0., 0.01, size=N)
    return x, y
N = 40
D = 10
w_true = np.random.randn(D)*0.4
xt, yt = buildtodayta(N, w_true)
yt.shape, xt.shape

X = tf.placeholder(tf.float32, [N,D])
w = Normal(loc=tf.zeros(D), scale=tf.ones(D))
b = Normal(loc=tf.zeros(1), scale=tf.ones(1))
y = Normal(loc=ed.dot(X, w)+b, scale=tf.ones(N))

Then define theta. Theta is set of latent variables. In the following code, qw and qb is theta.

qw = Normal(loc=tf.get_variable("qw/loc", [D]), scale=tf.nn.softplus(tf.get_variable("qw/scale", [D])))
qb = Normal(loc=tf.get_variable("qb/loc", [1]), scale=tf.nn.softplus(tf.get_variable("qb/scale", [1])))

Finally, to find optimal theta conduct minimization of Kullback-Leibler divergence of q(x|z) and p(x|z).

inference = ed.KLqp({w:qw, b:qb}, data={X:xt, y:yt})
inference.run(n_samples=5, n_iter=100)
y_post = ed.copy(y, {w: qw, b: qb})
def visualise(X_data, y_data, w, b, n_samples=10):
  w_samples = w.sample(n_samples)[:, 0].eval()
  b_samples = b.sample(n_samples).eval()
  plt.scatter(X_data[:, 0], y_data)
  plt.ylim([-10, 10])
  inputs = np.linspace(-8, 8, num=400)
  for ns in range(n_samples):
    output = inputs * w_samples[ns] + b_samples[ns]
    plt.plot(inputs, output)

After learning, I can sampling from qw and qb.
Let’s compare before and after.

Before learning, w and b is not optimized, regression lines are unreasonable.

visualise(xt, yt, w, b, n_samples=3)

After learning the regression lines catch the trend of the dataset.

visualise(xt, yt, qw, qb, n_samples=3)

Next, try to cosine curve.

%matplotlib inline
import edward as ed
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from edward.models import Normal
def build_toy_dataset(N=50, noise_std=0.1):
    x = np.linspace(-14,14, num=N)
    y = np.cos(x) + np.random.normal(0, noise_std, size=N)
    x = x.astype(np.float32).reshape((N, 1))
    y = y.astype(np.float32)
    return x, y
plt.scatter(build_toy_dataset()[0], build_toy_dataset()[1], c="b")

Next, define DNN (2layers not so deeeeeeeeep ;-)).

def neural_network(x, W_0, W_1, b_0, b_1):
    h = tf.tanh(tf.matmul(x, W_0) + b_0)
    h = tf.matmul(h, W_1) + b_1
    return tf.reshape(h, [-1])
N = 50
D = 1
x_train, y_train = build_toy_dataset(N)
W_0 = Normal(loc=tf.zeros([D,30]), scale=tf.ones([D,30]))
W_1 = Normal(loc=tf.zeros([30,1]), scale=tf.ones([30,1]))
b_0 = Normal(loc=tf.zeros(30), scale=tf.ones(30))
b_1 = Normal(loc=tf.zeros(1), scale=tf.ones(1))
x = x_train
y = Normal(loc = neural_network(x, W_0, W_1, b_0, b_1), scale=tf.ones(N) * 0.1)
qW_0 = Normal(loc=tf.get_variable("qW_0/loc", [D,30]), scale=tf.nn.softplus(tf.get_variable("qW_0/scale", [D,30])))
qW_1 = Normal(loc=tf.get_variable("qW_1/loc", [30,1]), scale=tf.nn.softplus(tf.get_variable("qW_1/scale", [30,1])))
qb_0 = Normal(loc=tf.get_variable("qb_0/loc", [30]), scale=tf.nn.softplus(tf.get_variable("qb_0/scale", [30])))
qb_1 = Normal(loc=tf.get_variable("qb_1/loc", [1]), scale=tf.nn.softplus(tf.get_variable("qb_1/scale", [1])))

rs = np.random.RandomState(0)
inputs = np.linspace(-9, 9, num=400, dtype=np.float32)
x = tf.expand_dims(inputs, 1)
mus = tf.stack([neural_network(x, qW_0.sample(), qW_1.sample(), qb_0.sample(), qb_1.sample()) for _ in range(10)])
sess = ed.get_session()
outputs = mus.eval()

fig = plt.figure(figsize=(10, 6))
ax = fig.add_subplot(111)
ax.set_title('iteration: 0')
ax.plot(x_train, y_train, 'ks', alpha=0.5, label=('x', 'y'))
ax.plot(inputs, outputs[0].T, 'b', lw=2, alpha=0.5, label='prior draws')
ax.plot(inputs, outputs[1:].T, 'y', lw=2, alpha=0.5)
ax.set_xlim([-10, 10])

inference = ed.KLqp({W_0: qW_0, b_0:qb_0, W_1:qW_1, b_1:qb_1}, data={y: y_train})
inference.run(n_iter=2000, n_samples=5)
outputs = mus.eval()
fig = plt.figure(figsize=(10,6))
ax = fig.add_subplot(111)
ax.set_title('iteration: 2000')
ax.plot(x_train, y_train, 'ks', alpha=0.5, label=('x', 'y'))
ax.plot(inputs, outputs[0].T, 'b', alpha=0.5, lw=2, label='post draws')
ax.plot(inputs, outputs[1:].T, 'y', alpha=0.5, lw=2, label='post draws')
ax.set_xlim([-14, 14])


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.