Preprocess module

bamt.preprocess.discretization.get_nodes_sign(data: DataFrame) → dict
Function to define the sign of each node (column)

neg – if the node has negative values

pos – if the node has only positive values

Parameters:

data (pd.DataFrame) – input dataset

Returns:

output dictionary where each key is a node name and each value is the sign of that node (neg or pos)

Return type:

dict
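A minimal pandas sketch of the rule described above. sign_of_columns is a hypothetical helper, not part of BAMT, and it assumes every column is numeric; how the real function treats non-numeric columns is not stated here. Note that a column containing zero but no negative values is labelled pos by this sketch.

```python
import pandas as pd


def sign_of_columns(data: pd.DataFrame) -> dict:
    """Hypothetical helper (not BAMT's implementation): 'neg' if a column
    contains any negative value, otherwise 'pos'. Assumes numeric columns."""
    return {col: "neg" if (data[col] < 0).any() else "pos" for col in data.columns}


df = pd.DataFrame({"temp": [-3.5, 0.0, 12.1], "depth": [1.2, 4.8, 9.0]})
print(sign_of_columns(df))  # {'temp': 'neg', 'depth': 'pos'}
```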

bamt.preprocess.discretization.get_nodes_type(data: DataFrame) → dict
Function to define the type of each node (column)

disc – discrete node

cont – continuous node

Parameters:

data (pd.DataFrame) – input dataset

Returns:

output dictionary where each key is a node name and each value is the node type (disc or cont)

Return type:

dict
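The docstring does not say how the type is decided. One plausible rule, assumed here and not confirmed by the source, is to look at the column dtype: floating-point columns are continuous, while integer, string, boolean and categorical columns are discrete. type_of_columns is a hypothetical helper used only to illustrate that assumption.

```python
import pandas as pd
from pandas.api.types import is_float_dtype


def type_of_columns(data: pd.DataFrame) -> dict:
    """Hypothetical dtype-based rule (an assumption, not BAMT's code):
    float columns are 'cont', every other column is 'disc'."""
    return {col: "cont" if is_float_dtype(data[col]) else "disc" for col in data.columns}


df = pd.DataFrame({
    "porosity": [0.12, 0.30, 0.25],          # float64 -> cont
    "n_wells": [3, 5, 2],                    # int64   -> disc
    "lithology": ["sand", "clay", "sand"],   # object  -> disc
})
print(type_of_columns(df))  # {'porosity': 'cont', 'n_wells': 'disc', 'lithology': 'disc'}
```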

bamt.preprocess.discretization.discretization(data: DataFrame, method: str, columns: list, bins: int = 5) → Tuple[DataFrame, KBinsDiscretizer]

Discretization of continuous parameters

Parameters:
  • data (pd.DataFrame) – input dataset

  • method (str) – discretization approach (equal_intervals, equal_frequency, kmeans)

  • columns (list) – names of the columns to discretize

  • bins (int, optional) – number of bins. Defaults to 5.

Returns:

pd.DataFrame – output dataset with discretized parameters

KBinsDiscretizer – fitted instance of the discretization class

Return type:

Tuple[pd.DataFrame, KBinsDiscretizer]
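A sketch of the same idea built directly on scikit-learn's KBinsDiscretizer with ordinal encoding, so each value is replaced by a bin index from 0 to bins − 1. The mapping of the method names onto KBinsDiscretizer strategies (equal_intervals → uniform, equal_frequency → quantile, kmeans → kmeans) and the helper name discretize are assumptions, not taken from BAMT's source.

```python
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer

# Assumed mapping from the documented method names to sklearn strategies.
STRATEGIES = {"equal_intervals": "uniform", "equal_frequency": "quantile", "kmeans": "kmeans"}


def discretize(data: pd.DataFrame, method: str, columns: list, bins: int = 5):
    """Hypothetical sketch: replace `columns` with ordinal bin indices."""
    if method not in STRATEGIES:
        raise ValueError(f"unsupported method: {method}")
    est = KBinsDiscretizer(n_bins=bins, encode="ordinal", strategy=STRATEGIES[method])
    codes = est.fit_transform(data[columns].to_numpy()).astype(int)
    out = data.copy()
    for i, col in enumerate(columns):
        out[col] = codes[:, i]  # other columns are left untouched
    return out, est


df = pd.DataFrame({"x": list(range(10)), "label": list("ababababab")})
d_df, est = discretize(df, "equal_intervals", ["x"], bins=5)
print(d_df["x"].tolist())  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```

Returning the fitted estimator alongside the data is what makes the inverse step possible later: the bin edges live on the estimator, not in the discretized DataFrame.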

bamt.preprocess.discretization.label_encoding(data, columns)
bamt.preprocess.discretization.onehot_encoding(data, columns)
bamt.preprocess.discretization.code_categories(data: DataFrame, method: str, columns: list) → Tuple[DataFrame, dict]

Encoding categorical parameters

Parameters:
  • data (pd.DataFrame) – input dataset

  • method (str) – method of encoding (label or onehot)

  • columns (list) – names of the categorical columns

Returns:

pd.DataFrame – output dataset with encoded parameters

dict – dictionary with values and codes

Return type:

Tuple[pd.DataFrame, dict]
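A minimal sketch of the label method in plain pandas. The exact layout of the returned "dictionary with values and codes" is not specified above; this sketch assumes a nested {column: {value: code}} mapping, and label_encode is a hypothetical helper rather than BAMT's own function.

```python
import pandas as pd


def label_encode(data: pd.DataFrame, columns: list):
    """Hypothetical sketch: give each distinct value an integer code and
    return the assumed {column: {value: code}} mapping for later decoding."""
    out = data.copy()
    mapping = {}
    for col in columns:
        values = sorted(data[col].unique())  # sorted so the codes are reproducible
        codes = {value: code for code, value in enumerate(values)}
        out[col] = data[col].map(codes)
        mapping[col] = codes
    return out, mapping


df = pd.DataFrame({"lithology": ["sand", "clay", "sand", "shale"]})
enc_df, mapping = label_encode(df, ["lithology"])
print(mapping)                       # {'lithology': {'clay': 0, 'sand': 1, 'shale': 2}}
print(enc_df["lithology"].tolist())  # [1, 0, 1, 2]
```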

bamt.preprocess.discretization.inverse_discretization(data: DataFrame, columns: list, discretizer: KBinsDiscretizer) → DataFrame

Inverse discretization of numeric parameters

Parameters:
  • data (pd.DataFrame) – input dataset with discrete values

  • columns (list) – columns for inverse discretization

  • discretizer (KBinsDiscretizer) – fitted instance of the discretization class, as returned by discretization

Returns:

output dataset with continuous values

Return type:

pd.DataFrame
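Inverse discretization cannot recover the original values exactly: with ordinal encoding, KBinsDiscretizer.inverse_transform returns the centre of each bin. The sketch below assumes inverse_discretization behaves the same way; undiscretize is a hypothetical helper.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer


def undiscretize(data: pd.DataFrame, columns: list, discretizer: KBinsDiscretizer) -> pd.DataFrame:
    """Hypothetical sketch: map bin indices back to the centres of their bins."""
    out = data.copy()
    restored = discretizer.inverse_transform(data[columns].to_numpy())
    for i, col in enumerate(columns):
        out[col] = restored[:, i]  # replaces the column, so it becomes float
    return out


# Fit an equal-width discretizer on a toy column: 5 bins over 0..9.
df = pd.DataFrame({"x": np.arange(10, dtype=float)})
est = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="uniform")
codes = pd.DataFrame({"x": est.fit_transform(df[["x"]].to_numpy()).ravel().astype(int)})

restored = undiscretize(codes, ["x"], est)
print(restored["x"].round(2).tolist())  # [0.9, 0.9, 2.7, 2.7, 4.5, 4.5, 6.3, 6.3, 8.1, 8.1]
```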

bamt.preprocess.discretization.decode(data: DataFrame, columns: list, encoder_dict: dict) → DataFrame

Decoding of categorical parameters to their initial labels

Parameters:
  • data (pd.DataFrame) – input dataset with encoded parameters

  • columns (list) – columns for decoding

  • encoder_dict (dict) – dictionary with values and codes, as returned by code_categories

Returns:

output dataset with decoded parameters

Return type:

pd.DataFrame
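Decoding is the reverse lookup of the encoding dictionary. This sketch assumes encoder_dict has the layout {column: {value: code}}, which the docstring does not confirm; decode_columns is a hypothetical helper.

```python
import pandas as pd


def decode_columns(data: pd.DataFrame, columns: list, encoder_dict: dict) -> pd.DataFrame:
    """Hypothetical sketch: invert an assumed {column: {value: code}} mapping."""
    out = data.copy()
    for col in columns:
        inverse = {code: value for value, code in encoder_dict[col].items()}
        out[col] = data[col].map(inverse)
    return out


encoded = pd.DataFrame({"lithology": [1, 0, 1, 2]})
mapping = {"lithology": {"clay": 0, "sand": 1, "shale": 2}}
print(decode_columns(encoded, ["lithology"], mapping)["lithology"].tolist())
# ['sand', 'clay', 'sand', 'shale']
```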

bamt.preprocess.graph.nodes_from_edges(edges: list)
Retrieves all nodes from the list of edges.

Arguments

edges : list

Returns

nodes : list

Effects

None
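A minimal sketch, assuming each edge is a (parent, child) pair. nodes_from_edges_sketch is a hypothetical helper; it keeps nodes in order of first appearance, which the real function may or may not do.

```python
def nodes_from_edges_sketch(edges: list) -> list:
    """Hypothetical sketch: unique nodes from (parent, child) pairs,
    in order of first appearance (dict keys preserve insertion order)."""
    return list(dict.fromkeys(node for edge in edges for node in edge))


edges = [("A", "B"), ("A", "C"), ("C", "D")]
print(nodes_from_edges_sketch(edges))  # ['A', 'B', 'C', 'D']
```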

bamt.preprocess.graph.edges_to_dict(edges: list)
Transfers the list of edges to the dictionary of parents.

Arguments

edges : list

Returns

parents_dict : dict

Effects

None
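A sketch of the parents dictionary, again assuming (parent, child) edge pairs. Whether the real function also lists root nodes with an empty parent list is not stated; this hypothetical helper does include them.

```python
def edges_to_parents_sketch(edges: list) -> dict:
    """Hypothetical sketch: map each node to the list of its parents.
    Assumes (parent, child) pairs; nodes without parents get []."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(parent, [])
        parents.setdefault(child, []).append(parent)
    return parents


edges = [("A", "B"), ("A", "C"), ("C", "D"), ("B", "D")]
print(edges_to_parents_sketch(edges))
# {'A': [], 'B': ['A'], 'C': ['A'], 'D': ['C', 'B']}
```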

bamt.preprocess.numpy_pandas.loc_to_DataFrame(data: array)

Function to convert an array to a DataFrame

Parameters:

data (np.array) – input array

Returns:

data (pd.DataFrame) – DataFrame with string columns for filtering

Return type:

pd.DataFrame
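A sketch of one reading of "string columns": column labels converted from integers to strings, so columns can be selected as df["0"]. Both that reading and the handling of 1-D input as a single column are assumptions; array_to_frame is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def array_to_frame(data: np.ndarray) -> pd.DataFrame:
    """Hypothetical sketch: wrap an array in a DataFrame whose column
    labels are the strings "0", "1", ... instead of integers."""
    if data.ndim == 1:  # assumption: treat a 1-D array as a single column
        data = data.reshape(-1, 1)
    df = pd.DataFrame(data)
    df.columns = df.columns.map(str)
    return df


arr = np.array([[1.5, 2.0], [3.1, 4.0]])
df = array_to_frame(arr)
print(list(df.columns))  # ['0', '1']
print(df["1"].tolist())  # [2.0, 4.0]
```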

bamt.preprocess.numpy_pandas.get_type_numpy(data: array)
Function to define the type of each column of an array

disc – discrete node

cont – continuous node

Parameters:

data (np.array) – input array

Returns:

output dictionary where each key is a node name and each value is the node type (disc or cont)

Return type:

dict

Notes: rows and columns may be confused, so check the orientation of the input array before calling the function.
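Because a NumPy array usually has a single dtype, a per-column type check has to look at values rather than dtypes. The sketch below uses a hypothetical heuristic (non-numeric or whole-number columns are disc, the rest cont) that is not necessarily BAMT's rule; column_types_sketch is a hypothetical helper. It iterates over data.T, which is exactly where the row/column orientation mentioned in the note matters: rows must be observations and columns must be variables.

```python
import numpy as np


def column_types_sketch(data: np.ndarray) -> dict:
    """Hypothetical heuristic: 'disc' if a column is non-numeric or holds only
    whole numbers, otherwise 'cont'. Keys are column indices."""
    types = {}
    for i, column in enumerate(data.T):  # data.T yields columns, not rows
        try:
            values = column.astype(float)
        except ValueError:  # e.g. strings such as "sand"
            types[i] = "disc"
            continue
        types[i] = "disc" if np.all(values == np.round(values)) else "cont"
    return types


arr = np.array([[0.5, 1, 3], [2.25, 0, 4], [1.75, 1, 5]])
print(column_types_sketch(arr))  # {0: 'cont', 1: 'disc', 2: 'disc'}
```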