Associate Data Source (Data Association Wizard)

Context

Open Data Source (Data Import Wizard) brings data into BayesiaLab to create a new Bayesian network, while Associate Data Source (Data Association Wizard) adds new data to a pre-existing network.

BayesiaLab can load data from flat text files (e.g., CSV, TXT) or connected databases.

Usage

  • The Data Association Wizard has six steps, most of which are identical to those of the Data Import Wizard.

  • To launch the Data Association Wizard for a data table in a

    • text file, select Main Menu > Data > Associate Data Source > Text File.

    • database, select Main Menu > Data > Associate Data Source > Database.

Workflow

Step 1 — Data Structure Definition

  • See Step 1 of the Data Import Wizard.

Step 2 — Definition of Variable Types

  • See Step 2 of the Data Import Wizard.

  • Additionally, clicking the Unmatched Columns button lists all the columns in the dataset that do not match any node in the network.

  • In the Unmatched Columns window, you can choose whether or not to use each of these unmatched columns from the new dataset.
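The Unmatched Columns check can be pictured as a set difference between the dataset's column names and the network's node names. The minimal Python sketch below is not BayesiaLab's implementation: it assumes matching by exact name and treats any column the user has not ticked as unused; `unmatched_columns` and `select_columns` are hypothetical names.

```python
def unmatched_columns(dataset_columns, network_nodes):
    """Return the dataset columns that have no node of the same name, in dataset order."""
    node_names = set(network_nodes)
    return [col for col in dataset_columns if col not in node_names]


def select_columns(unmatched, use):
    """Keep only the unmatched columns the user chose to use.

    `use` maps a column name to True/False; columns absent from `use`
    are treated as unused (an assumption of this sketch).
    """
    return [col for col in unmatched if use.get(col, False)]


# Example: "Region" and "Score" are not in the network; only "Region" is kept.
extra = unmatched_columns(["Age", "Income", "Region", "Score"], ["Age", "Income", "Churn"])
print(extra)                                    # ['Region', 'Score']
print(select_columns(extra, {"Region": True}))  # ['Region']
```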

Step 3 — Data Selection, Filtering, and Missing Value Processing

Step 4 — Node and Node State Association

  • This step links the variables in the dataset to the nodes of the network.

  • Consequently, this step depends on the three preceding steps, in particular on the variable types defined in Step 2.

  • Here, you define how the variables of the dataset to be associated are mapped to the nodes already in the network.

  • The following assignments are possible:

    • Discrete variable in the dataset → Discrete node in the network

    • Discrete variable in the dataset → Continuous node in the network

    • Continuous variable in the dataset → Continuous node in the network

  • If variables in the dataset have the same name and type as existing nodes in the network, BayesiaLab will automatically propose an association.
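The mapping rules above amount to a small compatibility table plus a name-and-type match for the automatic proposal. The following Python sketch is a hypothetical model of those two rules only (the `Var` class, `ALLOWED`, and both functions are illustrative, not part of BayesiaLab); it assumes types are plain strings.

```python
from dataclasses import dataclass

# The three permitted (dataset variable type, network node type) pairs listed above.
ALLOWED = {
    ("discrete", "discrete"),
    ("discrete", "continuous"),
    ("continuous", "continuous"),
}


@dataclass(frozen=True)
class Var:
    name: str
    kind: str  # "discrete" or "continuous"


def can_associate(variable: Var, node: Var) -> bool:
    """True if the variable's type may be mapped onto the node's type."""
    return (variable.kind, node.kind) in ALLOWED


def propose_associations(variables, nodes):
    """Propose variable -> node links when both the name and the type match."""
    nodes_by_name = {node.name: node for node in nodes}
    proposals = {}
    for variable in variables:
        node = nodes_by_name.get(variable.name)
        if node is not None and node.kind == variable.kind:
            proposals[variable.name] = node.name
    return proposals


dataset = [Var("Age", "continuous"), Var("Region", "discrete"), Var("Score", "discrete")]
network = [Var("Age", "continuous"), Var("Region", "discrete"), Var("Score", "continuous")]
print(propose_associations(dataset, network))  # {'Age': 'Age', 'Region': 'Region'}
print(can_associate(dataset[2], network[2]))   # True
```

In this example, "Score" is not proposed because its types differ, although discrete → continuous is one of the permitted pairs.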

Step 5 — Discretization and Aggregation

Step 6 — Data Association Report

Workflow Illustration

You can proceed in the same way for the continuous node N. You can also select and add several nodes at once.

Zone 3 contains the buttons for adding or removing associations.

Zone 4 lists the associations. It can also contain variables from the database that will be added to the network as new nodes. Double-clicking an association opens, where applicable, a dialog box for editing a discrete or continuous association. Some associations display a warning icon, which indicates unusual behavior in those associations.

Zone 6 contains three buttons. The first two buttons automatically extend the minimum and the maximum of each continuous node whose limits do not cover the range of the database. The third button automatically filters out each row that falls outside the network's limits.
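The effect of the three Zone 6 buttons can be approximated as follows. This is a minimal Python sketch, not BayesiaLab's implementation: it assumes the dataset is a list of rows keyed by column name, that each associated column shares its node's name, and that there are no missing values; `extend_limits` and `filter_rows` are hypothetical names.

```python
def extend_limits(limits, data, extend_min=True, extend_max=True):
    """Widen each continuous node's (min, max) so that it covers the data.

    limits: node name -> (min, max) currently declared in the network
    data:   node name -> list of values in the associated column
    """
    extended = {}
    for name, (lo, hi) in limits.items():
        values = data.get(name, [])
        if values and extend_min:
            lo = min(lo, min(values))
        if values and extend_max:
            hi = max(hi, max(values))
        extended[name] = (lo, hi)
    return extended


def filter_rows(rows, limits):
    """Drop every row holding a continuous value outside its node's limits.

    rows: list of dicts mapping column name -> value
    """
    def fits(row):
        return all(lo <= row[name] <= hi
                   for name, (lo, hi) in limits.items() if name in row)
    return [row for row in rows if fits(row)]


limits = {"Income": (0, 100)}
rows = [{"Income": -5}, {"Income": 50}, {"Income": 120}]
column = {"Income": [r["Income"] for r in rows]}
print(extend_limits(limits, column))                    # {'Income': (-5, 120)}
print(extend_limits(limits, column, extend_max=False))  # {'Income': (-5, 100)}
print(filter_rows(rows, limits))                        # [{'Income': 50}]
```

Extending keeps every row but changes the node's range; filtering keeps the network unchanged but discards data.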

Discrete Column Association

When you add or edit an association between a discrete column of the database and a discrete or continuous node, a dialog box appears.

Zone 3 contains the buttons for adding or removing state associations.

By default, database states that match a network state, an aggregate, or a state's long name are linked automatically.

If the database contains filtered values that are not declared in the network, you can merge them into the special state *, if it exists. In that case, this state is automatically declared as filtered for each node concerned.
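The default state linking and the merge into * can be sketched as a lookup table built from the node's state names, aggregates, and long names. This Python sketch is illustrative only: the dictionary layouts for aggregates and long names, the precedence order, and the function names `link_states` and `merge_filtered` are assumptions, not BayesiaLab's data model.

```python
def link_states(db_states, node_states, aggregates=None, long_names=None):
    """Propose database state -> network state links.

    aggregates: network state -> list of values it aggregates (hypothetical layout)
    long_names: network state -> its long name
    Exact state names win over aggregates, which win over long names
    (this precedence is an assumption of the sketch).
    """
    aggregates = aggregates or {}
    long_names = long_names or {}
    lookup = {state: state for state in node_states}
    for state, values in aggregates.items():
        for value in values:
            lookup.setdefault(value, state)
    for state, long_name in long_names.items():
        lookup.setdefault(long_name, state)
    links = {s: lookup[s] for s in db_states if s in lookup}
    unlinked = [s for s in db_states if s not in lookup]
    return links, unlinked


def merge_filtered(unlinked, filtered_values, node_states, star="*"):
    """Map unlinked filtered values onto the '*' state, if the node has one.

    A non-empty result also means '*' would be declared as the node's filtered state.
    """
    if star not in node_states:
        return {}
    return {value: star for value in unlinked if value in filtered_values}


states = ["Low", "High", "*"]
links, unlinked = link_states(
    ["Low", "L", "High value", "N/A", "?"], states,
    aggregates={"Low": ["L"]}, long_names={"High": "High value"},
)
print(links)                                      # {'Low': 'Low', 'L': 'Low', 'High value': 'High'}
print(unlinked)                                   # ['N/A', '?']
print(merge_filtered(unlinked, {"N/A"}, states))  # {'N/A': '*'}
```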

Continuous Column Association

When you add or edit an association between a continuous column of the database and a continuous node, a dialog box appears.

This dialog box appears only if the range of the database variable extends beyond the limits of the network node.

By default, the node's limits in the network are retained, and all values outside these limits are removed from the database. To keep these values, use the corresponding options.

If the database contains filtered values that are not declared in the network, you can merge them into the special state *, if it exists. In that case, this state is automatically declared as filtered for each node concerned.
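For a single continuous column, the rule for showing the dialog and its default outcome can be sketched in a few lines of Python. This is a hypothetical model, not BayesiaLab code: `needs_limit_dialog` and `resolve_column` are invented names, and the "keep" options are modeled here, as an assumption, by widening the node's limits to the data's range.

```python
def needs_limit_dialog(values, node_min, node_max):
    """True when the column's range extends beyond the node's declared limits."""
    return min(values) < node_min or max(values) > node_max


def resolve_column(values, node_min, node_max, keep_out_of_range=False):
    """Return (retained limits, kept values) for one continuous column.

    Default: keep the node's limits and drop out-of-range values (in practice,
    the rows holding them). keep_out_of_range=True stands in for the
    "keep" options, modeled here as widening the limits (an assumption).
    """
    if keep_out_of_range:
        return (min(node_min, min(values)), max(node_max, max(values))), list(values)
    kept = [v for v in values if node_min <= v <= node_max]
    return (node_min, node_max), kept


values = [5, 50, 120]
print(needs_limit_dialog(values, 0, 100))                      # True
print(resolve_column(values, 0, 100))                          # ((0, 100), [5, 50])
print(resolve_column(values, 0, 100, keep_out_of_range=True))  # ((0, 120), [5, 50, 120])
```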

Step 5: Discretization of the Continuous Variables and State Aggregation of the Discrete Variables

This step occurs only when some columns of the database are not linked to nodes of the network but are still included in the association. These columns become new nodes in the network: continuous columns must be discretized, and the states of discrete columns can be aggregated.

This step is the same as Step 4 of the Data Import Wizard.
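The rule in this step, required discretization for continuous columns and optional aggregation for discrete ones, can be sketched as below. Equal-width binning is used purely as an illustration of one discretization approach; it is not a claim about which method BayesiaLab applies or proposes by default, and `plan_new_nodes` and `equal_width_thresholds` are hypothetical names.

```python
def plan_new_nodes(columns):
    """columns: unlinked column name -> "continuous" or "discrete"."""
    return {
        name: "discretize (required)" if kind == "continuous" else "aggregate states (optional)"
        for name, kind in columns.items()
    }


def equal_width_thresholds(values, bins):
    """Interior thresholds of an equal-width discretization into `bins` intervals."""
    if bins < 1:
        raise ValueError("bins must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + width * i for i in range(1, bins)]


print(plan_new_nodes({"Score": "continuous", "Region": "discrete"}))
# {'Score': 'discretize (required)', 'Region': 'aggregate states (optional)'}
print(equal_width_thresholds([0, 10, 20, 40], 4))  # [10.0, 20.0, 30.0]
```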

Step 6: Data Association Report

  1. The modified nodes table:

    • For discrete nodes: where applicable, the correspondence between the states in the database and those in the network.

    • For continuous nodes: where applicable, the initial minimum of the data and the retained final minimum, as well as the initial maximum of the data and the retained final maximum.

  2. The hidden nodes table: lists the nodes that are in the network but have no associated data.

  3. The added nodes table: lists the variables from the database that were added to the network. This table is the same as in the import report.
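The three report tables can be derived from the associations made in the earlier steps, roughly as in the Python sketch below. The input layouts, the function name `build_report`, and the reading of "where applicable" as "only when the values differ" are assumptions for illustration, not BayesiaLab's report format.

```python
def build_report(network_nodes, associated, state_links, data_ranges, final_limits, added):
    """Assemble the three tables of the association report.

    network_nodes: nodes already in the network
    associated:    network nodes that received a dataset column
    state_links:   discrete node -> {database state: network state}
    data_ranges:   continuous node -> (data min, data max)
    final_limits:  continuous node -> (retained min, retained max)
    added:         dataset columns added to the network as new nodes
    """
    modified = {}
    for node, links in state_links.items():
        # List a correspondence only where the database state differs from the network state.
        differing = {db: net for db, net in links.items() if db != net}
        if differing:
            modified[node] = differing
    for node, (data_min, data_max) in data_ranges.items():
        final_min, final_max = final_limits[node]
        if (data_min, data_max) != (final_min, final_max):
            modified[node] = {"initial min": data_min, "final min": final_min,
                              "initial max": data_max, "final max": final_max}
    hidden = [node for node in network_nodes if node not in associated]
    return {"modified": modified, "hidden": hidden, "added": list(added)}


report = build_report(
    network_nodes=["Age", "Income", "Region", "Churn"],
    associated={"Age", "Income", "Region"},
    state_links={"Region": {"N": "North", "South": "South"}},
    data_ranges={"Age": (18, 65), "Income": (-5, 120)},
    final_limits={"Age": (18, 65), "Income": (0, 100)},
    added=["Score"],
)
print(report["hidden"])              # ['Churn']
print(report["modified"]["Region"])  # {'N': 'North'}
```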


Bayesia USA

info@bayesia.us

Bayesia S.A.S.

info@bayesia.com

Bayesia Singapore

info@bayesia.com.sg

Copyright © 2024 Bayesia S.A.S., Bayesia USA, LLC, and Bayesia Singapore Pte. Ltd. All Rights Reserved.