Machine Reasoning for Decision Making by Information Entropy Minimum Principle
(An algorithmic approach for inductive reasoning, the way we think and judge)

Dr. Charles Kim (

Professor of Electrical Engineering and Computer Science

Howard University


Providing a quantitative threat level from diverse datasets and information requires an intelligent system that extracts the dominant contributors and keeps learning as new data is added to the datasets. Working on both existing and updated datasets, the machine reasoning system repeatedly extracts the dominant contributory attributes, generates rules that determine the outcome (True/False, Good/Bad, Threat/NoThreat) from those attributes, and produces the probability of each rule along with its margin of error.

The main theory behind dominant attribute discovery and decision rule extraction from datasets is the information entropy minimum principle. The “information measure” (I) is defined as proportional to the negative of the logarithm of probability (p), with k a constant: I = -k*ln(p). Information entropy (S) is defined as the expected value of information, summed over the possible outcomes: S = -k*Σ p*ln(p). In the entropy minimum state, all of the information has been extracted and no further information gain is possible, which corresponds to maximum certainty.
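
As an illustration of these two definitions, the following is a minimal Python sketch. The function names `information` and `entropy` are hypothetical, k defaults to 1, and entropy is read as the expected information summed over all outcomes of a discrete distribution.

```python
import math

def information(p, k=1.0):
    """Information measure I = -k * ln(p) for an event of probability p."""
    return -k * math.log(p)

def entropy(probs, k=1.0):
    """Expected information S = -k * sum(p * ln(p)) over a discrete distribution.

    Terms with p == 0 are skipped, since p * ln(p) tends to 0 as p tends to 0.
    """
    return k * sum(-p * math.log(p) for p in probs if p > 0)

uncertain = entropy([0.5, 0.5])  # two equally likely outcomes: ln(2), about 0.693
certain = entropy([1.0, 0.0])    # one outcome is certain: zero entropy
```

The second case shows the entropy minimum state: when one outcome is certain, the entropy is zero.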

The first step in determining the dominant attribute is to convert all analog-valued sample data to binary-valued data. This “binarization” is performed by calculating a threshold. The threshold value with the minimum conditional entropy gives the best separation of the two outcomes, Threat (T) and No-Threat (F). For a candidate threshold value x, the conditional entropy S(x) is defined from the conditional probabilities of the two outcomes, T and F, under two conditions, one for sample values lower than x (x-) and the other for sample values greater than x (x+), as

S(x) = -p(x-) [p(T|x-)ln(p(T|x-)) + p(F|x-)ln(p(F|x-))] - p(x+) [p(T|x+)ln(p(T|x+)) + p(F|x+)ln(p(F|x+))].
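
The threshold search can be sketched in Python as follows. This is an illustration under stated assumptions, not the implemented system: the candidate thresholds are taken as midpoints between consecutive distinct sample values, samples equal to the threshold are placed on the x+ side, and the names `conditional_entropy` and `best_threshold` are hypothetical.

```python
import math

def _plogp(p):
    """p * ln(p), with the limiting value 0 at p == 0."""
    return p * math.log(p) if p > 0 else 0.0

def conditional_entropy(values, outcomes, threshold):
    """S(x) for splitting `values` at `threshold`; outcomes are booleans (True = T)."""
    n = len(values)
    total = 0.0
    for upper in (False, True):  # False: x- side, True: x+ side
        group = [o for v, o in zip(values, outcomes) if (v >= threshold) == upper]
        if not group:
            continue  # an empty side contributes nothing
        p_side = len(group) / n
        p_t = sum(group) / len(group)  # p(T | side); p(F | side) = 1 - p_t
        total -= p_side * (_plogp(p_t) + _plogp(1 - p_t))
    return total

def best_threshold(values, outcomes):
    """Return the candidate threshold with minimum S(x).

    Assumption: candidates are midpoints between consecutive distinct values,
    so at least two distinct values are required.
    """
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return min(candidates, key=lambda x: conditional_entropy(values, outcomes, x))

values = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
outcomes = [False, False, False, True, True, True]
threshold = best_threshold(values, outcomes)  # 5.0 separates T from F exactly
```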

Binarized sample data is obtained by converting the analog values into binary values: 1 for sample values above the threshold and 0 for values below it. The same conditional entropy can then be used to find the dominant attribute, that is, the attribute that best correlates with the outcomes. The conditional entropy of the ith attribute, Si, for outcomes T or F given an attribute value of 0 or 1 is as follows:

Si = -pi(0) [pi(T|0)ln(pi(T|0)) + pi(F|0)ln(pi(F|0))] - pi(1) [pi(T|1)ln(pi(T|1)) + pi(F|1)ln(pi(F|1))].

After the conditional entropy is computed for all m attributes, the attribute Ak that produces the minimum conditional entropy is the best attribute for correlating the sample data with the outcomes. The decision rule Rk for attribute k is then drawn from the best (highest) conditional probability in the set of four: pk(T|1), pk(F|1), pk(T|0), and pk(F|0).
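
The attribute selection step might look like the sketch below. It assumes the binarized dataset is stored as a list of rows of 0/1 values with a parallel list of boolean outcomes (True = T); the names `attribute_entropy` and `dominant_attribute` are hypothetical, and ties are broken in favor of the lowest attribute index.

```python
import math

def _plogp(p):
    """p * ln(p), with the limiting value 0 at p == 0."""
    return p * math.log(p) if p > 0 else 0.0

def attribute_entropy(column, outcomes):
    """Conditional entropy Si of one binary attribute column against the outcomes."""
    n = len(column)
    s = 0.0
    for value in (0, 1):
        group = [o for a, o in zip(column, outcomes) if a == value]
        if group:
            p_t = sum(group) / len(group)  # p(T | value)
            s -= (len(group) / n) * (_plogp(p_t) + _plogp(1 - p_t))
    return s

def dominant_attribute(rows, outcomes):
    """Return (k, Sk) for the attribute with minimum conditional entropy."""
    m = len(rows[0])
    scores = [attribute_entropy([r[i] for r in rows], outcomes) for i in range(m)]
    k = min(range(m), key=scores.__getitem__)
    return k, scores[k]

rows = [[1, 0], [1, 1], [0, 0], [0, 1]]
outcomes = [True, True, False, False]
k, s_k = dominant_attribute(rows, outcomes)  # attribute 0 separates T from F exactly
```

In this toy dataset attribute 1 carries no information about the outcome, so its conditional entropy is ln(2), while attribute 0 reaches the minimum of zero.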

If, for example, pk(T|1) is the highest from the set, then the decision rule is formed as follows:  

Rk: IF (Ak = 1), THEN (T). 
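
Forming the rule from the four conditional probabilities can be sketched as follows. The function name `decision_rule` is hypothetical, and ties are resolved in the order p(T|1), p(F|1), p(T|0), p(F|0), which is an assumption of this sketch.

```python
def decision_rule(column, outcomes):
    """Return (attribute_value, outcome, probability) for the highest of
    p(T|1), p(F|1), p(T|0), and p(F|0); outcomes are booleans (True = T)."""
    best = None
    for value in (1, 0):
        group = [o for a, o in zip(column, outcomes) if a == value]
        if not group:
            continue  # no samples with this attribute value
        p_t = sum(group) / len(group)
        for outcome, p in ((True, p_t), (False, 1 - p_t)):
            if best is None or p > best[2]:
                best = (value, outcome, p)
    return best

column = [1, 1, 1, 0, 0, 0, 0]
outcomes = [True, True, True, True, False, False, False]
value, outcome, p = decision_rule(column, outcomes)
rule_text = f"IF (A = {value}), THEN ({'T' if outcome else 'F'})"
```

Here p(T|1) = 3/3 is the highest of the four probabilities, so the rule reads IF (A = 1), THEN (T).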

In this step, the probability (or certainty) of the decision rule itself is generated from the maximum-entropy-based Bayes estimate <p(O)> = (x + 1)/(n + 2), where x is the total number of samples satisfying the condition (T|1) and n is the total number of samples satisfying the attribute condition. The margin of error of this probability is obtained by e(O) = z*sqrt[<p(O)>*(1 - <p(O)>)/(n + 2)], where z is the z-score for the desired confidence level.
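
A short numerical sketch of the rule probability and its margin of error is given below. It assumes the margin of error takes the usual standard-error form with a square root, and it uses z = 1.96 (about 95% confidence) as a default; the function names and the sample counts are hypothetical.

```python
import math

def rule_probability(x, n):
    """Bayes estimate <p(O)> = (x + 1) / (n + 2)."""
    return (x + 1) / (n + 2)

def margin_of_error(p, n, z=1.96):
    """e(O) = z * sqrt(p * (1 - p) / (n + 2)); the square root is an assumption."""
    return z * math.sqrt(p * (1 - p) / (n + 2))

# Hypothetical counts: 20 samples satisfy the attribute condition, 18 of them are T.
p_rule = rule_probability(18, 20)   # 19/22, about 0.86
e_rule = margin_of_error(p_rule, 20)  # about 0.14
```

The estimate stays strictly between 0 and 1 even when every sample agrees with the rule, so a rule supported by only a few samples is not reported as certain.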

Usually, not all samples can be covered by a single decision rule. We therefore apply a step-wise approximation: after the first attribute and its corresponding decision rule are found, we remove all samples that match the decision from the binarized dataset and repeat the conditional entropy minimum process on the remaining samples.
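
The step-wise procedure can be sketched as a loop. This sketch assumes that a sample "matches the decision" when it satisfies both the attribute condition and the predicted outcome of the rule; other readings are possible. The names `extract_rules` and `max_rules` and the toy dataset are hypothetical.

```python
import math

def _plogp(p):
    return p * math.log(p) if p > 0 else 0.0

def _entropy(column, outcomes):
    """Conditional entropy of one binary attribute column."""
    n, s = len(column), 0.0
    for v in (0, 1):
        g = [o for a, o in zip(column, outcomes) if a == v]
        if g:
            p_t = sum(g) / len(g)
            s -= (len(g) / n) * (_plogp(p_t) + _plogp(1 - p_t))
    return s

def _best_rule(column, outcomes):
    """(attribute_value, outcome) with the highest conditional probability."""
    best = None
    for v in (1, 0):
        g = [o for a, o in zip(column, outcomes) if a == v]
        if g:
            p_t = sum(g) / len(g)
            for outcome, p in ((True, p_t), (False, 1 - p_t)):
                if best is None or p > best[2]:
                    best = (v, outcome, p)
    return best[0], best[1]

def extract_rules(rows, outcomes, max_rules=10):
    """Repeat: pick the minimum-entropy attribute, form its rule, drop matched samples."""
    rows, outcomes = list(rows), list(outcomes)
    rules = []
    while rows and len(rules) < max_rules:
        m = len(rows[0])
        k = min(range(m), key=lambda i: _entropy([r[i] for r in rows], outcomes))
        value, outcome = _best_rule([r[k] for r in rows], outcomes)
        rules.append((k, value, outcome))
        # Assumption: remove samples that satisfy the condition AND the predicted outcome.
        keep = [j for j, r in enumerate(rows)
                if not (r[k] == value and outcomes[j] == outcome)]
        rows = [rows[j] for j in keep]
        outcomes = [outcomes[j] for j in keep]
    return rules

rows = [[1, 0], [1, 0], [0, 1], [0, 0]]
outcomes = [True, True, True, False]
rules = extract_rules(rows, outcomes)
```

For this dataset the loop yields three rules in order: IF (A0 = 1) THEN (T), IF (A1 = 1) THEN (T), and IF (A0 = 0) THEN (F). Each pass removes at least one sample, so the loop ends once the dataset is exhausted or `max_rules` is reached.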

We tested the implemented machine reasoning system with a few example datasets, including Political Regime Characteristics and Transitions, the Behavioral Risk Factor Surveillance System of the CDC, and the Profiles of Individual Radicalization in the United States (PIRUS) of the National Consortium for the Study of Terrorism and Responses to Terrorism.


Application Areas: (1) Threat level determination in Irregular Warfare and Counterinsurgency with human terrain data; (2) Dominant behavior discovery in insider threat detection and monitoring, resiliency, and diagnostics; (3) Radicalization detection; (4) Machine learning of radiation effects to determine the parametric impact on the sensitivity of electronic devices.



Whitepaper: Development of an Automated Diagnostic Rule Generation System for Mental and Behavioral Disorders

Dissertation: An intelligent decision making system for detecting high impedance faults

Article: Classification of faults and switching events by inductive reasoning and expert system methodology

Article: A learning method for use in intelligent computer relays for high impedance faults

Article: High impedance fault detection using an adaptive element model

Article: Machine reasoning for determining radiation sensitivity of semiconductor devices (The 20th International Conference on Artificial Intelligence, July 30 - August 2, 2018.  Las Vegas, NV.)

Article: Identification of symptom parameters for failure anticipation by timed-event trend analysis