Continuous Bayesian Networks
Network initialization
If all variables in the dataset are continuous, ContinuousBN is the recommended network class.
To initialize a ContinuousBN object, you can use the following code:
from bamt.networks.continuous_bn import ContinuousBN
bn = ContinuousBN(use_mixture=True)
ContinuousBN has an additional parameter, use_mixture.
It controls how the conditional distributions of continuous variables are represented.
If use_mixture is True, mixtures of Gaussian distributions are used for these conditional distributions.
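To make the term concrete, here is a minimal standard-library sketch of what a mixture of Gaussians is: a weighted sum of normal densities whose weights sum to one. It is only an illustration and not how ContinuousBN implements mixtures internally; the function names gaussian_pdf and mixture_pdf and all numeric values are made up for this example.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std**2) at point x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, means, stds):
    """Density of a Gaussian mixture: a weighted sum of component densities."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds))


# Hypothetical two-component mixture (weights must sum to 1).
weights = [0.3, 0.7]
means = [-2.0, 3.0]
stds = [0.5, 1.0]

density_at_mode = mixture_pdf(3.0, weights, means, stds)   # near the heavier component
density_between = mixture_pdf(0.5, weights, means, stds)   # in the gap between components
```

A single Gaussian cannot describe a two-peaked shape like this one, which is the kind of situation where a mixture representation may be useful.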
Data Preprocessing
If the dataset contains integer values that should be treated as continuous variables (e.g. 1, 2, and so on), convert them to float first.
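One way to do this conversion with pandas is sketched below. The DataFrame and its column names are hypothetical; in practice, convert only those integer columns that are meant to be continuous.

```python
import pandas as pd

# Hypothetical example data: "age" is stored as integers but is meant to be continuous.
df = pd.DataFrame(
    {
        "age": [23, 35, 41],
        "depth": [1.5, 2.0, 3.25],
    }
)

# Find integer-typed columns and cast them to float64.
int_cols = [col for col in df.columns if pd.api.types.is_integer_dtype(df[col])]
df = df.astype({col: "float64" for col in int_cols})
```

After the cast, `df.dtypes` should report float64 for every converted column.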
Before applying any structure or parametric learning, the data should be preprocessed as follows:
import bamt.Preprocessor as pp
import pandas as pd
from sklearn import preprocessing
data = pd.read_csv('data.csv')
encoder = preprocessing.LabelEncoder()
discretizer = preprocessing.KBinsDiscretizer(n_bins=5, encode='ordinal', strategy='quantile')
p = pp.Preprocessor([('encoder', encoder), ('discretizer', discretizer)])
discretized_data, est = p.apply(data)
info = p.info
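The discretizer configured above uses the quantile strategy with five bins and ordinal encoding, so each continuous value is replaced by a bin index from 0 to 4, with roughly the same number of samples in each bin. The standard-library sketch below illustrates that idea only; it is not scikit-learn's implementation, its bin edges may differ slightly from those of KBinsDiscretizer, and the function quantile_bins and the sample values are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_bins(values, n_bins=5):
    """Map each value to an ordinal bin index in 0..n_bins-1 using quantile cut points."""
    # n_bins - 1 interior cut points that split the sample into roughly equal-sized groups.
    cuts = quantiles(values, n=n_bins, method="inclusive")
    # The bin index is the number of cut points less than or equal to the value.
    return [bisect_right(cuts, v) for v in values]


# Hypothetical sample of one continuous variable.
values = [0.1, 0.4, 0.5, 1.2, 1.9, 2.3, 3.8, 4.1, 7.5, 9.0]
codes = quantile_bins(values)
```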
Structure Learning
For structure learning of continuous BNs, the bn.add_nodes() and bn.add_edges() methods are used.
from pgmpy.estimators import K2Score
bn.add_nodes(info) # add nodes from info obtained from preprocessing
bn.get_info() # to make sure that the network recognizes the variables as continuous
params = {
# Initial nodes of the network, list of node names
'init_nodes': [...],
# Initial edges of the network, list of tuples (node1, node2)
'init_edges': [...],
# Whitelist: the only edges the algorithm is allowed to learn, list of tuples (node1, node2)
'white_list': [...],
# Blacklist: edges the algorithm must not add, list of tuples (node1, node2)
'bl_add': [...],
# Allow the algorithm to remove edges defined by the user, bool
'remove_init_edges': True,
}
# Structure learning using K2Score and parameters defined above
bn.add_edges(discretized_data, scoring_function=('K2', K2Score), params=params)
bn.plot('foo.html') # save a visualization of the learned structure to an HTML file
Parametric Learning
For parametric learning of BNs, the bn.fit_parameters() method is used.
bn.fit_parameters(data)
bn.get_info() # get information table about the network