Discrete Bayesian Networks

Network Initialization

If all variables in the dataset are discrete, DiscreteBN is the recommended network type. To initialize a DiscreteBN object, use the following code:

import bamt.networks as networks

bn = networks.DiscreteBN()

Data Preprocessing

If the dataset contains float values (e.g. 1.0, 2.0, etc.), they should be converted to integers or discretized before using DiscreteBN. Before applying any structure or parametric learning, preprocess the data as follows:

import bamt.preprocessors as pp
import pandas as pd
from sklearn import preprocessing

data = pd.read_csv('data.csv')

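# LabelEncoder maps category labels to integers
encoder = preprocessing.LabelEncoder()
# KBinsDiscretizer splits continuous values into 5 ordinal bins;
# the 'quantile' strategy puts roughly the same number of rows in each bin
discretizer = preprocessing.KBinsDiscretizer(n_bins=5, encode='ordinal', strategy='quantile')

p = pp.Preprocessor([('encoder', encoder), ('discretizer', discretizer)])
discretized_data, est = p.apply(data)

info = p.info  # variable information, passed to bn.add_nodes() below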
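To make the two transformers less of a black box, here is a minimal pure-Python sketch of what they do to a single column. It is not BAMT's Preprocessor or scikit-learn's code; the helpers `label_encode`, `quantile` and `quantile_bins` are hypothetical. Label encoding assigns integers to categories in sorted order, and quantile binning (in the same spirit as `strategy='quantile'`) cuts a continuous column at its empirical quantiles so that each bin holds roughly the same number of rows.

```python
from bisect import bisect_right


def label_encode(values):
    """Hypothetical helper: map each distinct category to an integer, in sorted order."""
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values], classes


def quantile(sorted_vals, q):
    """Linearly interpolated q-quantile (0 <= q <= 1) of a sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * frac


def quantile_bins(values, n_bins):
    """Hypothetical helper: assign each value an ordinal bin id using quantile edges."""
    s = sorted(values)
    edges = [quantile(s, k / n_bins) for k in range(n_bins + 1)]
    inner = edges[1:-1]  # the outer edges are just the minimum and maximum
    return [bisect_right(inner, v) for v in values], edges


colors = ["red", "green", "blue", "green", "red"]
codes, classes = label_encode(colors)
print(classes, codes)  # ['blue', 'green', 'red'] [2, 1, 0, 1, 2]

heights = [1.2, 3.4, 2.2, 5.0, 4.1, 0.7, 3.9, 2.8, 4.6, 1.9]
bins, edges = quantile_bins(heights, n_bins=5)
print(bins)  # [0, 2, 1, 4, 3, 0, 3, 2, 4, 1]
```

With ten values and five bins, every bin ends up with exactly two rows, which is the property the quantile strategy aims for.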

Structure Learning

For structure learning of discrete BNs, use the bn.add_nodes() and bn.add_edges() methods.

from pgmpy.estimators import K2

bn.add_nodes(info) # add nodes from info obtained from preprocessing

bn.get_info() # to make sure that the network recognizes the variables as discrete

params = {
    # Initial nodes of the network, list of node names
    'init_nodes': [...],
    # Initial edges of the network, list of tuples (node1, node2)
    'init_edges': [...],
    # White list: restrict the search to these edges only, list of tuples (node1, node2)
    'white_list': [...],
    # Black list: edges the algorithm must not add, list of tuples (node1, node2)
    'bl_add': [...],
    # Allow the algorithm to remove edges defined by the user, bool
    'remove_init_edges': True
}

# Structure learning with the K2 score and the parameters defined above
bn.add_edges(discretized_data, scoring_function=('K2', K2), params=params)

bn.plot('foo.html') # Plot the network and save it to foo.html; the plot is NOT rendered inside a notebook
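The ('K2', K2) argument selects the K2 score, which rates a candidate parent set for each node by how well it explains the data; the search keeps changes that raise the score. Below is a minimal sketch of the local K2 score (Cooper and Herskovits) for one node, in plain Python with `math.lgamma`. The function `k2_local_score` and the toy data are hypothetical and are not part of BAMT or pgmpy. For a node with r states, parent configurations j and counts N_ijk, it computes the sum over j of lgamma(r) - lgamma(N_ij + r) + sum over k of lgamma(N_ijk + 1).

```python
from collections import Counter, defaultdict
from math import lgamma


def k2_local_score(rows, child, parents, states):
    """Hypothetical helper: log K2 score of `child` given the tuple `parents`.

    rows:   list of dicts mapping variable name to a discrete state
    states: dict mapping variable name to the list of its possible states
    """
    r = len(states[child])
    # N_ijk: how often the child takes state k under parent configuration j
    counts = defaultdict(Counter)
    for row in rows:
        config = tuple(row[p] for p in parents)
        counts[config][row[child]] += 1
    score = 0.0
    for config_counts in counts.values():
        n_ij = sum(config_counts.values())
        score += lgamma(r) - lgamma(n_ij + r)
        score += sum(lgamma(n + 1) for n in config_counts.values())
    # Unobserved parent configurations contribute lgamma(r) - lgamma(r) = 0,
    # and unobserved child states contribute lgamma(1) = 0, so skipping them is safe.
    return score


# Toy data in which B copies A most of the time
rows = (
    [{"A": 0, "B": 0}] * 8 + [{"A": 1, "B": 1}] * 8
    + [{"A": 0, "B": 1}] * 2 + [{"A": 1, "B": 0}] * 2
)
states = {"A": [0, 1], "B": [0, 1]}

without_edge = k2_local_score(rows, "B", (), states)
with_edge = k2_local_score(rows, "B", ("A",), states)
print(with_edge > without_edge)  # True: the edge A -> B is preferred
```

The total K2 score of a network is the sum of these local scores over all nodes, so a search procedure such as hill climbing only needs to rescore the nodes whose parent sets change when it tries adding, removing or reversing an edge.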

Parametric Learning

For parametric learning of discrete BNs, use the bn.fit_parameters() method.

bn.fit_parameters(data)
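For a discrete node, fitting parameters amounts to estimating a conditional probability table (CPT): for each combination of parent states, a distribution over the node's states. A minimal maximum-likelihood sketch that estimates a CPT by counting is shown below. The function `fit_cpt` and the data are hypothetical, and BAMT's own estimator may differ, for example in how it handles parent combinations that never occur in the data.

```python
from collections import Counter, defaultdict


def fit_cpt(rows, child, parents):
    """Hypothetical helper: maximum-likelihood estimate of P(child | parents).

    Returns {parent_configuration: {child_state: probability}}.
    """
    counts = defaultdict(Counter)
    for row in rows:
        config = tuple(row[p] for p in parents)
        counts[config][row[child]] += 1
    cpt = {}
    for config, child_counts in counts.items():
        total = sum(child_counts.values())
        # Relative frequency of each child state within this parent configuration
        cpt[config] = {s: n / total for s, n in sorted(child_counts.items())}
    return cpt


rows = [
    {"Rain": "yes", "WetGrass": "yes"},
    {"Rain": "yes", "WetGrass": "yes"},
    {"Rain": "yes", "WetGrass": "no"},
    {"Rain": "no", "WetGrass": "no"},
    {"Rain": "no", "WetGrass": "no"},
    {"Rain": "no", "WetGrass": "yes"},
    {"Rain": "no", "WetGrass": "no"},
]
cpt = fit_cpt(rows, "WetGrass", ("Rain",))
print(cpt[("yes",)])  # {'no': 0.3333333333333333, 'yes': 0.6666666666666666}
print(cpt[("no",)])   # {'no': 0.75, 'yes': 0.25}
```

Pure counting assigns zero probability to any state never seen under a given parent configuration; smoothing (adding pseudo-counts) is a common alternative when data are sparse.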

bn.get_info() # get information table about the network