HotLink Menu in RightSide Column 
Last Updated 6272010 


Normalized Information DistanceThe Normalized Information Distance is a universal distance measurement for many different objects. Since the Normalized Information Distance is based on Kolmogorov complexity, the actual distance value cannot be computer, nevertheless, there are still many ways to take advantage of the concept. First of all, compression algorithms can be used to approximate Kolmogorov complexity, as long as the objects being compressed can have a string representation. Second, page count statistics obtained from the World Wide Web can be used for names and even abstract concepts. These practical applications of the normalized information distance are then applied to machine learning tasks to assist in tasks such as data mining. We will explore both the theoretical and practical implications of Normalized Information Distance. Real world applications include music clustering, bioinformatics and machine translation. Kolmogorov complexity measures the absolute information content of individual objects. In the tasks of clustering and data mining, it is also necessary to measure the absolute information distance between individual objects. Measuring absolute information distance is a universal requirement for determining a distance between two objects x and y. The distance between two string is defined as the minimum quantity of information required to translate x into y or vice versa. Unfortunately, because of the universal requirement that this information be based on Kolmogorov complexity, the distance is uncomputable. To get around this dilemma, several formulas and approaches have been developed that come close to approximating Kolmogorov complexity.
Continuing along the line that Kolmogorov Complexity is the length of the shortest program that outputs x on input y, has some interesting implications. One of the implications is that if x and y are concatenated, we still have two distinct objects, just concatenated. When we concatenate these programs, then, run them through a compressor, and compare the output of the concatenated result with the individual components, we now have a metric for the similarity between x and y. As x and y become more similary, the output of the concatenation of x and y, will tend to be the same as x and y compressed and unconcatenated. Normalized Compression DistanceThe Normalized Information Distance is impractical since it cannot be computed. To overcome this limitation, a practical form of the Normalized Information Distance was developed. This practical form is known as the Normalized Compression Distance. The Normalized Compression Distance utilizes the properties of data compressors (gzip, bzip, & PPMZ) to replace Kolmogorov complexity with a practical value. Making the proper substitutions gives us the following formula:
where:
PhylogeniesDNA sequences seem ideally suited for the Normalized Compression Distance technique. A DNA sequence is a finite string with a 4Letter alphabet: {A,C,G,T}. A hotly debated topic in biology is which 2 of the 3 placental mammalian groups: Rodents, Ferungulates, and Primates are most closely related. The test involves placing the mitochondrial gene sequences of several mammals in a standard data compressor such as PPMZ, and creating a distance matrix. The values in the distance matrix are then used to reconstruct a phylogeny tree giving an NCD classification tree. The results of this experiment are illustrated in the following diagram:
Our agent has now adopted the goal of driving to Bucharest, and is considering where to go from Arad. There are three roads out of Arad, one toward Sibu, one to Timisoara, and one to Zerind. None of these achieves the goal, so unless the agent is very familiar with the geography of Romania, it will not know which road to follow. In other words, the agent will not know which of its possible actions is best, because it does not yet know enough about the state that results from taking each action. If the agent has no additional information  i.e., if the environment is unknown in the sense defined in Section 2.3  then it has no choice but to try one of the actions at random. But suppose the agent has a map of Romania. The point of a map is to provide the agent with information about the states it might get itself into, and the actions it can take. The agent can use this information to consider subsequent stages of a hypothetical journey via each of the three towns, trying to find a journey that eventually gets to Bucharest. Once it has found a path on the map from Arad to Bucharest, it can achieve its goal by carrying out the driving actions that correspond to the legs of the journey. In general, an agent with several immediate options of unknown value can decide what to do first by examining future actions that eventually lead to states of known value. To be even more specific about what we mean by "examining future actions," we have to be more specific about properties of the environment. For now, we will assume that the environment is observable, so that the agent always knows the current state. For the agent driving in Romania, it's reasonable to suppose that each city on the map has a sign indicating its presence to arriving drivers. We will also assume the environment is discrete, so that at any given state there are only finitely many actions to choose from. This is true for navigating in Romania because each city is connected to a small number of other cities. We will assume the environment is known, so that the agent knows which states are reached by each action. Next, we assume our map is accurate. And finally, we assume that the environment is deterministic, so that each action has exactly one outcome. Under ideal conditions, this is true for the agent in Romania  it means that if it chooses to drive from Arad to Sibiu, it does end up in Sibiu. Under these assumptions, the solution to any problem is a fixed sequence of actions. The process of looking for a sequence of actions that reaches the goal is called a search. A search algorithm takes a problem as input and returns a solution in the form of an action sequence. Once a solution is found, the actions it recommends can be carried out. This is called the execution phase. Figure Agent1 gives us a simple formulate, search, execute design for the agent:
After formulating a goal and a problem to solve, the agent calls a search procedure to solve it. It then uses the solution to guide its actions, doing whatever the solution recommends as the next thing to do  typically the first action of the sequence  and then removing that step from the sequence. Once the solution has been executed, the agent will formulate a new goal. Notice that while the agent is executing the solution sequence it ignores its percepts when choosing an action because it knows in advance what they will be. An agent that carries out its plans with its eyes closed, so to speak, must be quite certain of what is going on. Control theorists call this an openloop system, because ignoring the percepts breaks the loop between agent and environment. We first describe the process of problem formulation, and then present various algorithms for the Search function. In order to continue with this guide, and continue the discussion on WellDefined Problems and Solutions: Press the Button below:

Big O Notation NP Completeness Intelligent Agents Formulating Problems Toy Problems Vacuum World 8 Queens The Knuth Sequence Real World Problems Route Finding Problems Touring Problems Traveling Salesperson Problem VLSI Layout Robot Navigation Automatic Assembly Sequencing Search Algorithm Infrastructure Child Node Creation Measuring Algorithm Performance BreadthFirst Search BreadthFirst Characteristics C++ BreadthFirst Search Code BreadthFirst Data Structure BreadthFirst Make Map BreadthFirst Make Neighbor BreadthFirst Point of Origin BreadthFirst Initial Listing BreadthFirst Get Child City C++ BreadthFirst Search Code UniformCost Search DepthFirst C++ Solution DepthFirst Data Structure DepthFirst Display Data DepthFirst Initial Listing DepthFirst Path Record DepthFirst Main() DepthLimited Search Bidirectional Search Comparing Strategies Greedy BestFirst A* Search Optimality of A* A* Search Data Structure A* Search Data Entry C++ Min Priority Queue C++ A* Headers & Prototypes C++ A* Search Main() 