Fading

When a Bayes net is supposed to capture relationships between variables in a world that is constantly changing, it is useful to weight more recent cases more heavily than older ones.  An example might be an adaptive Bayes net that is constantly receiving new cases and doing inferences while it slowly changes to match a changing world.

Netica achieves this partial forgetting of the past by using fading.  You do regular learning from cases as they arrive, and every so often you select the nodes to be faded, choose Table Fade, and enter a degree from 0 to 1.  Netica will reduce the experience and smooth the probabilities of the selected nodes by an amount dictated by the degree, with 0 having no effect, and 1 creating uniform distributions with no experience (thereby undoing all previous learning).  Then when you continue to learn new cases, they will effectively be weighted more heavily than the cases learned before the fading.

Fading once with degree = 1 - d, and again with degree = 1 - f, is equivalent to a single fading with degree = 1 - df.  So the effects of multiple fadings accumulate as they should.  To be most accurate you would fade a very small amount after each case, but for all practical purposes you can just fade a larger amount after a batch of cases.
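This composition property can be checked numerically.  Below is a minimal Python sketch (the probability vector, experience number, and factors d and f are arbitrary values for illustration) using the update rule given at the end of this section:

    import numpy as np

    def fade(prob, exper, degree):
        # Update rule from the end of this section:
        #   prob' * exper' = prob * exper * (1 - degree) + degree
        unnorm = prob * exper * (1.0 - degree) + degree
        return unnorm / unnorm.sum(), unnorm.sum()

    prob, exper = np.array([0.7, 0.2, 0.1]), 20.0
    d, f = 0.9, 0.8

    # Fading with degree 1 - d, then again with degree 1 - f ...
    p1, e1 = fade(prob, exper, 1 - d)
    p2, e2 = fade(p1, e1, 1 - f)

    # ... matches a single fading with degree 1 - d*f.
    p3, e3 = fade(prob, exper, 1 - d * f)
    assert np.allclose(p2, p3) and np.isclose(e2, e3)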

If an occurrence time for each case is known, and the cases are learned sequentially through time, then the amount of fading to be done is: degree = 1 - r^Δt, where Δt is the amount of time since the last fading was done, and r is a positive number less than (but close to) 1 whose value depends on the units of time and on how quickly the environment is changing.  Different nodes may require different values of r.
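For example (a sketch; the retention rate r and the time stamps are assumed values for illustration):

    # Derive the fading degree from the time elapsed since the last fading.
    r = 0.99                # assumed retention rate per unit of time (e.g. per day)
    last_fade_time = 100.0  # time of the previous fading, in the same units
    now = 107.0

    dt = now - last_fade_time
    degree = 1.0 - r ** dt  # degree = 1 - r^Δt; here about 0.068

So a node that has not been faded for seven time units would be faded with degree of about 0.068 before learning the next case.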

During fading, each of the probabilities in the node’s conditional probability table (CPT) is modified as follows (where prob and exper are the old values of probability and experience, and prob' and exper' are the new values):

prob' = normalize (prob * exper * (1 - degree) + degree)

exper' is obtained as the normalization factor from above (remember that there is one experience number per vector of probabilities). So:

prob' * exper' = prob * exper * (1 - degree) + degree
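Putting these formulas together, a minimal Python sketch of the whole update for one row of a CPT might look like the following (the function name and sample values are assumptions for illustration, not Netica's API):

    import numpy as np

    def fade_row(prob, exper, degree):
        # prob:   one row of the CPT (a probability vector)
        # exper:  the single experience number attached to that row
        # degree: fading degree between 0 and 1
        unnorm = prob * exper * (1.0 - degree) + degree  # = prob' * exper'
        exper_new = unnorm.sum()        # exper' is the normalization factor
        prob_new = unnorm / exper_new   # prob' is renormalized to sum to 1
        return prob_new, exper_new

    prob, exper = np.array([0.7, 0.2, 0.1]), 20.0
    print(fade_row(prob, exper, 0.0))   # degree 0: row unchanged
    print(fade_row(prob, exper, 1.0))   # degree 1: uniform distribution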