# what is probabilistic programming?

I did not know what PPL is.
Recently I knew probabilistic programing and found nice article in arxiv.

Click to access arxiv18-deep-ppl.pdf

A deep probabilistic programming language is a language for specifying both deep NN and probabilistic models.
Probabilistic programming creates systems that help make decisions in the face of uncertainty.

In this article, author describes pros and cons of Deep learning and probabilistic programming.
Advantages from probabilistic programming(PP) is that PP can give not prediction but also a probability. It means PP give probabilistic models to my understanding.
On the other hand, advantages from Deep Learning is feature extraction from large dataset. DL gives high accuracy.
Stan, PyMC is major package for PPL. If you have used tensorflow for DL, I recommend consider edward as one option.
http://edwardlib.org/
Edward does not support recent version of tensorflow. For new version of tensorflow, tensorflow-probability is recommended.

Today, I used edward for PP.
Following example is very simple case.

At first linear regression.
y = wx + b
Define data builder and define variable.
For PP, w and b is stochastic variable, define each variable with Normal().

```%matplotlib inline
import edward as ed
import numpy as np
import tensorflow as tf
from edward.models import Normal
from matplotlib import pyplot as plt
def buildtodayta(N, w):
D = len(w)
x = np.random.normal(0., 2., size=(N,D))
y = np.dot(x, w) + np.random.normal(0., 0.01, size=N)
return x, y
N = 40
D = 10
w_true = np.random.randn(D)*0.4
xt, yt = buildtodayta(N, w_true)
yt.shape, xt.shape

X = tf.placeholder(tf.float32, [N,D])
w = Normal(loc=tf.zeros(D), scale=tf.ones(D))
b = Normal(loc=tf.zeros(1), scale=tf.ones(1))
y = Normal(loc=ed.dot(X, w)+b, scale=tf.ones(N))
```

Then define theta. Theta is set of latent variables. In the following code, qw and qb is theta.

```qw = Normal(loc=tf.get_variable("qw/loc", [D]), scale=tf.nn.softplus(tf.get_variable("qw/scale", [D])))
qb = Normal(loc=tf.get_variable("qb/loc", ), scale=tf.nn.softplus(tf.get_variable("qb/scale", )))
```

Finally, to find optimal theta conduct minimization of Kullback-Leibler divergence of q(x|z) and p(x|z).

```inference = ed.KLqp({w:qw, b:qb}, data={X:xt, y:yt})
inference.run(n_samples=5, n_iter=100)
y_post = ed.copy(y, {w: qw, b: qb})
```
```def visualise(X_data, y_data, w, b, n_samples=10):
w_samples = w.sample(n_samples)[:, 0].eval()
b_samples = b.sample(n_samples).eval()
plt.scatter(X_data[:, 0], y_data)
plt.ylim([-10, 10])
inputs = np.linspace(-8, 8, num=400)
for ns in range(n_samples):
output = inputs * w_samples[ns] + b_samples[ns]
plt.plot(inputs, output)
```

After learning, I can sampling from qw and qb.
Let’s compare before and after.

Before learning, w and b is not optimized, regression lines are unreasonable.

```visualise(xt, yt, w, b, n_samples=3)
``` After learning the regression lines catch the trend of the dataset.

```visualise(xt, yt, qw, qb, n_samples=3)
``` Next, try to cosine curve.

```%matplotlib inline
import edward as ed
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from edward.models import Normal
plt.style.use('seaborn-pastel')
def build_toy_dataset(N=50, noise_std=0.1):
x = np.linspace(-14,14, num=N)
y = np.cos(x) + np.random.normal(0, noise_std, size=N)
x = x.astype(np.float32).reshape((N, 1))
y = y.astype(np.float32)
return x, y
plt.scatter(build_toy_dataset(), build_toy_dataset(), c="b")
``` Next, define DNN (2layers not so deeeeeeeeep ;-)).

```def neural_network(x, W_0, W_1, b_0, b_1):
h = tf.tanh(tf.matmul(x, W_0) + b_0)
h = tf.matmul(h, W_1) + b_1
return tf.reshape(h, [-1])
ed.set_seed(794)
N = 50
D = 1
x_train, y_train = build_toy_dataset(N)
W_0 = Normal(loc=tf.zeros([D,30]), scale=tf.ones([D,30]))
W_1 = Normal(loc=tf.zeros([30,1]), scale=tf.ones([30,1]))
b_0 = Normal(loc=tf.zeros(30), scale=tf.ones(30))
b_1 = Normal(loc=tf.zeros(1), scale=tf.ones(1))
x = x_train
y = Normal(loc = neural_network(x, W_0, W_1, b_0, b_1), scale=tf.ones(N) * 0.1)
qW_0 = Normal(loc=tf.get_variable("qW_0/loc", [D,30]), scale=tf.nn.softplus(tf.get_variable("qW_0/scale", [D,30])))
qW_1 = Normal(loc=tf.get_variable("qW_1/loc", [30,1]), scale=tf.nn.softplus(tf.get_variable("qW_1/scale", [30,1])))
qb_0 = Normal(loc=tf.get_variable("qb_0/loc", ), scale=tf.nn.softplus(tf.get_variable("qb_0/scale", )))
qb_1 = Normal(loc=tf.get_variable("qb_1/loc", ), scale=tf.nn.softplus(tf.get_variable("qb_1/scale", )))

```
```rs = np.random.RandomState(0)
inputs = np.linspace(-9, 9, num=400, dtype=np.float32)
x = tf.expand_dims(inputs, 1)
mus = tf.stack([neural_network(x, qW_0.sample(), qW_1.sample(), qb_0.sample(), qb_1.sample()) for _ in range(10)])
```
```sess = ed.get_session()
tf.global_variables_initializer().run()
outputs = mus.eval()

fig = plt.figure(figsize=(10, 6))
ax.set_title('iteration: 0')
ax.plot(x_train, y_train, 'ks', alpha=0.5, label=('x', 'y'))
ax.plot(inputs, outputs.T, 'b', lw=2, alpha=0.5, label='prior draws')
ax.plot(inputs, outputs[1:].T, 'y', lw=2, alpha=0.5)
ax.set_xlim([-10, 10])
ax.set_ylim([-3,3])
ax.legend()
plt.show()
``` ```inference = ed.KLqp({W_0: qW_0, b_0:qb_0, W_1:qW_1, b_1:qb_1}, data={y: y_train})
inference.run(n_iter=2000, n_samples=5)
```
```outputs = mus.eval()
fig = plt.figure(figsize=(10,6))
ax.set_title('iteration: 2000')
ax.plot(x_train, y_train, 'ks', alpha=0.5, label=('x', 'y'))
ax.plot(inputs, outputs.T, 'b', alpha=0.5, lw=2, label='post draws')
ax.plot(inputs, outputs[1:].T, 'y', alpha=0.5, lw=2, label='post draws')
ax.set_xlim([-14, 14])
ax.set_ylim([-3,3])
ax.legend()
plt.show()
``` 