EBB problem 1: Remove duplicates from an array

How I program thee, let me count the ways. EBB.

PROBLEM: Write a function that, when sent an array (or vector<int>), returns a new array containing the unique values from the sent array. In other words, all of the duplicates have been removed.

This is a classic interview problem that is worth studying from several perspectives. This very educational problem can be programmed in a variety of ways, with complexities ranging from O(n²) and O(n log n) down to O(n). In these solutions we assume that the array sent to the function is not modified. The problem also comes in several flavors: dynamic memory array vs. vector<int>, order requirements, value restrictions, etc. Here are a few possible solutions to this problem. There are others, although most are probably variations of those below.

Flavor: The values in the returned array are not necessarily in the same order as they occurred in the original.

// RemoveDups2.cpp : Remove dups in O(n log n) time using vectors
// Be sure to include #include <vector> and #include <algorithm>
vector<int> RemoveDups2(const vector<int>& M) {
   vector<int> L = M;                            // work on a copy so M is unchanged
   sort(L.begin(), L.end());                     // sorting makes duplicates adjacent
   auto last = std::unique(L.begin(), L.end());  // moves the unique values to the front
   L.erase(last, L.end());                       // drop the leftover tail
   return L;
}
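As a quick check of that flavor, here is a minimal driver (not part of the original post) that assumes the RemoveDups2 definition above is in the same translation unit; notice that the result comes back sorted rather than in order of first appearance.

#include <iostream>
#include <vector>
using namespace std;

vector<int> RemoveDups2(const vector<int>& M);   // the sorted version defined above

int main() {
   vector<int> M = { 3, 1, 3, 7, 1, 9, 7 };
   vector<int> U = RemoveDups2(M);
   for (int v : U) cout << v << ' ';             // prints: 1 3 7 9
   cout << '\n';
   return 0;
}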

Flavor: The range of values is restricted, e.g., 0 to 100.

// Here we use a bitset to mark repeats when the data values
// are restricted to the range 0 to 100. We also keep the same
// order in which the values occurred in M. Runs in O(n) time.
// Requires #include <bitset>
vector<int> RemoveDups2(const vector<int>& M) {
   vector<int> L;                      // the result will be placed here
   bitset<101> seen(0);                // seen[v] == 1 once value v has been copied
   for (int i = 0; i < M.size(); i++) {
      if (!seen[M[i]]) {               // not seen yet: copy over and mark
         L.push_back(M[i]);
         seen[M[i]] = 1;
      }
   }
   return L;
}

Flavor: The values in the returned array are in the same order in which they occurred in the original.

// Here is the usual O(n²) algorithm that is often the first
// attempt by a student; inefficient.
// Here we use an unused value to mark repeats in a copy of the original array.
// If the values are all greater than or equal to 0 we can use -1 as the marker.
// NOTE: If M is passed by value instead of by const reference, the copy is not required.
// This solution also keeps the original order.
vector<int> RemoveDups2(const vector<int>& M) {
   vector<int> L = M;                  // copy M so we can modify the copy
   vector<int> R;
   for (int i = 0; i < L.size(); i++)
      for (int j = i + 1; j < L.size(); j++) {   // mark out copies
         if (L[j] == L[i] && L[j] != -1) L[j] = -1;
      }
   for (int i = 0; i < L.size(); i++) {
      if (L[i] != -1) R.push_back(L[i]);
   }
   return R;
}
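Following up on the NOTE above: if M is passed by value, the function's own copy can be marked directly and the extra vector L disappears. This is only a sketch of that variation, with the same assumption that -1 never occurs as a data value.

vector<int> RemoveDups2(vector<int> M) {  // M is already a copy we may modify
   vector<int> R;
   for (int i = 0; i < M.size(); i++)
      for (int j = i + 1; j < M.size(); j++) {   // mark out later copies
         if (M[j] == M[i] && M[j] != -1) M[j] = -1;
      }
   for (int i = 0; i < M.size(); i++) {
      if (M[i] != -1) R.push_back(M[i]);
   }
   return R;
}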
// Here we keep the original order and search only the list of
// unique items found so far to see if the value has been seen.
// Its worst case complexity is O(n²), but the average case
// depends on the number of unique values that occur in M,
// since we search Ans and not M.
vector<int> RemoveDups2(const vector<int>& M) {
   vector<int> Ans;                    // the result will be placed here
   if (M.empty()) return Ans;          // guard: M[0] below requires a non-empty input
   bool notthere;
   Ans.push_back(M[0]);
   for (int i = 1; i < M.size(); i++) {
      // if M[i] is not already in Ans, add it
      notthere = true;
      // this loop depends on the number of unique values seen so far
      for (int j = 0; j < Ans.size(); j++) {
         if (M[i] == Ans[j]) {
            notthere = false;
            break;
         }
      }
      if (notthere) Ans.push_back(M[i]);
   }
   return Ans;
}
/////////////////////////////////////////////////////////
// In this case we will use a hash table to attack the
// problem. Consequently there is no restriction on the
// range of values in M. O(n) average-case complexity.
// We use STL's unordered_set. This is probably one of the
// best solutions using the STL support.
// Be sure to include #include <unordered_set>
vector<int> RemoveDups2(const vector<int>& M) {
   unordered_set<int> uniq;
   vector<int> Ans;
   for (int i = 0; i < M.size(); i++) {
      if (uniq.count(M[i]) == 0) {     // not in set
         Ans.push_back(M[i]);
         uniq.insert(M[i]);
      }
   }
   return Ans;
}
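The introduction also mentions the dynamic-memory-array flavor of the problem. As a sketch of what that interface could look like, the same unordered_set idea carries over directly; the function name, the size out-parameter, and the caller's duty to delete[] the result are assumptions here, not part of the original post.

// Sketch: returns a new[]-allocated array holding the unique values of M
// in order of first appearance; outSize receives its length.
// The caller owns the returned pointer and must delete[] it.
// Requires #include <unordered_set>
int* RemoveDupsArray(const int* M, int n, int& outSize) {
   unordered_set<int> seen;
   int* result = new int[n];           // at most n unique values
   outSize = 0;
   for (int i = 0; i < n; i++) {
      if (seen.count(M[i]) == 0) {     // first time we see this value
         result[outSize++] = M[i];
         seen.insert(M[i]);
      }
   }
   return result;
}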
