An Empirical Study of Binary Search Trees

Stage 1: Create the BST class and implement and test 1,2,3(you may use my ppt delete and other algorithms I have shown you, but only if  you understand them!) and 4 by wednesday.  This will kick you into gear.

Stage 2: Implement 5 and 6 by friday.

Stage 3: Add and test IPL functions by wed and write the RandDelInsPair(int dType) method.

Stage 4: Add the main processing code, example given below, to generate the Asymmetric  and symmetric data by friday.

Stage 5: Write paper  and turn it in with the program by due date.

Problem Definition: In this project you are to first read An Empirical Study of Insertion and Deletion in Binary Search Trees by Jeffrey Eppinger.  This is a delightful paper well worth your reading.  You are to write a simulation program that collects data using dynamic insertion and deletions to a BST tree that if graphed in excel recreates the graphs for 128 and 256 nodes in figures 5 and 9 of this paper.    Each point plotted on the graphs should be determined by averaging at least 100 iterations(runs). Have your program generate these point values for each case and write its output to a file to be read by excel (or some other package) for the purpose of graphing. Graph each and turn in a report, with embedded graphs, comparing your results with those found in the paper. The graphs should be properly labeled. Do the above for both 128 and 256 node trees and use 100,000 insertion deletion pairs for each run. The main problem you need to consider is the selection of a node in the tree to delete. This must be very efficient(i.e. O(1)time ) and uniformly  random.  We will create a vector to hold the values that exist in the tree and update this vector every time  you update the tree in appropriate methods.  In writing the above you are to first create a tree class that contains both the tree and the array that holds the values.   Here is the .h file for the tree.  Note that in the following class I augmented the BST class with a vector that holds the values in the tree. This is not normally done in a tree class. This class is for research purposes and if used for a normal tree BST the code that modifies variables associated with the vector needs to be removed.  Probably the best way to do this is to create a new class that is derived from a normal BST class and in this new class add the vector(int) and associated methods that modify it.  We do not know about inheritance yet so I did not go that direction.  I suggest that you implement the methods in the order that I suggest below.  Test each method CAREFULLY before continuing to the next one.  A small error in Sdelete or Pdelete can create unbelievable havoc. Test by adding  a small number of nodes in a variety of ways and use inorder() to print the values. If something is wrong you can add support methods such as NumNodes() to give you the number of nodes in the tree.  For example I had an error where the number of nodes in the tree was not the same as the number of nodes in the vector and this helped me.

Extra Credit: If you get this running early you can add a third delete method that is even better that the symmetric delete.  It first requires that you add a _nodeCt to you Node object and update the two deletes to maintain this value. You can then either do a Sdelete or Pdelete on the node to be deleted depending on the size of the left and right subtrees.  If the left subtree is larger than the right subtree you can perform a Pdelete here otherwise perform a Sdelete.   This has a tendency to tend the tree toward a more balanced one faster.  Don’t do this unless the rest of the program is completed.  You will only get credit for this is the other two methods work correctly.

class BST
   struct treeNode;
   typedef treeNode* pTreeNode;
   vector<int> valueList;// holds the values presently in the tree. Normally not in a tree
   int _size;//number of values in tree and in vector;
   struct treeNode {
     int _key;
     pTreeNode _left;
     pTreeNode _right;
     treeNode(int key) : _key(key), _left(nullptr), _right(nullptr) {}
   pTreeNode _root;
   void insert(pTreeNode & , int );//1
   void inorder(pTreeNode tree);//2 Prints out the tree doing an inorder tree walk
   void SdeleteAux(pTreeNode &, int);//3
   void PdeleteAux(pTreeNode &, int);//5
   int  iplAux(pTreeNode & , int);// 7
   void DestroyTree(pTreeNode);//4
   BST(int);//1, constructs tree with random nodes. It calls insert() multiple times
   ~BST();// 4 ,Calls DestroyTree()
   void Insert(int value);//1, Inserts new node,adds it to the vector and increments _size
   void Inorder();//2, Inorder tree walk printout.
   void Sdelete(int);//3, successor delete from tree, does not modify size or vector
   void Pdelete(int); //5, predecessor delete from tree, does not modify size or vector // The following performs an deletion/insertion pair randomly 
 // Randomly selects value in vector , deletes it from tree and // inserts a new random value into vector and tree. Uses either Sdelete(1) or Pdelete(0)
   void RandDelInsPair(int dType);//6 If dType is 0 use Sdelete otherwise use Pdelete
   int IPL(); //7, Calls iplAux() to return the internal path length. Will discuss in class

To give you a better idea of how you can collect the data here is a snapshot of some code in main that collects data for the predecessor delete only case.  Don’t even type this in until all you methods above are working correctly. Capice!?  Note: I used these definitions.

#define TREESIZE 256
#define VALUESIZE 32000    // This is what you mod by to get a random number.
#define INSDELPAIRS 100000

BST t(TREESIZE);//create an empty tree with _size=0
 int iterations = 100;
 cout << "Data for predecessor delete only " << endl;
 vector<long long>iplData(40, 0);// 40 sample ipl values
 cout << "--------------------------------------------------" << endl;
 for (int ct = 0; ct < iterations; ct++) {//collect data 100 times to average
   if(ct%2==0)cout << ".";// A nice trick to see your program running
   BST t(TREESIZE);//create an random tree with TREESIZE nodes
   for (int i = 0; i < INSDELPAIRS; i++) {// INSDELPAIRS is 100,000
     if (i % 2500 == 0)iplData[i / 2500] += t.IPL();
     t.RandDelInsPair(1);//Im doing a Pdelete every time here
   // t gets destroyed here when it goes out of scope
 cout << endl;
 for (int i = 0; i < 40; i++)// print out the averaged values
    cout<<iplData[i] / iterations<<endl;

The format of your report should be as follows.

  1. Name and data and title at top
  2. Problem Description
  3. Problem Solution and analysis with embedded graph(combine all curves on one graph)
  4. Discuss the meaning of the graphs.
  5. Conclusion

Write a single very well documented program that generates the above data, with all iterations averaged, given input parameters of tree size, number of deletion/insertion pairs, number of iterations and whether one is using the symmetric(alternating) or asymmetric deletions in the algorithm. In other words do not write multiple versions of the program.

Turn In: Print your well-documented source code and attach to it the above report. You do not need to turn in a disk. This project cannot be whipped out!

Start on it NOW!!!! Remember: Keep Backups of all you DO!!

Comments are closed.