My professor in ‘project in algorithms’ class created this problem just for this assignment. I implemented an algorithm that finds majority element in private array using “group counts” in C.
The problem is to determine whether 0 or 1 is the majority element of a private array who we don’t have an direct access to. We can only use “group counts” method, which, given 4 indices in the array, returns one out of three values. 0: There are 2 0’s and 2 1’s among the 4 indices 2: There are (1 0 and 3 1’s) or (3 0’s and 1 1) among the 4 indices 4: There are 4 0’s or 4 1’s amond the 4 indices.
My algorithm determines which is the majority element for any private array of any given length bigger than or equal to 10.
The efficiency of an obvious solution will be 0.5N where N is the length of an array. I was able to lower it to 0.45166n.
Efficiency:
- The special feature for our team is that we first divide the private array into small arrays of size 4, and take specialized approaches to gain information efficiently depending on their QCOUNT return value.
- Especially we take advantage of the fact that groups that return 4 are consisted of the same value. It lets us learn values of up to 32 indices with one QCOUNT call later.
General Algorithm:
We divide the private array into small arrays of size 4, and get the return value of QCOUNT from each. We sort those small arrays into 3 buckets, according to their return values. We have three buckets: bucket 0, bucket 2, bucket 4. This method is called inspect4(). Doing this will require (n/4) QCOUNT calls.
- Bucket 0: (p=⅜)
- Arrays that are sorted in here are discarded.
- We increment a_total and b_total by 2.
- Bucket 2: (p=½)
- We can choose any index and say that it’s value is “a”. We call this index a_index. Call the other value b.
- Small arrays in the bucket have values (a, a, a, b) or (a, b, b, b).
- We check each small array. If the indices of that small array is x1, x2, x3, x4, then we call QCOUNT( [ a_index, x1, x2, x3 ] ). Since there are (n/2) indices in (bucket 2) and we call Qcount for every 4 indices, this adds (n/2)(¼) = (n/8) QCOUNT calls. We call this method inspect3a(). There are three different cases:
- Case 0: (p=⅜)
- It means x1, x2, x3 have values a, b, b. It also determines the value of x4 automatically b.
- a_total++;
- b_total += 3;
- Case 4: (p=⅛)
- It means x1, x2, x3 have values a, a, a. It also determines the value of x4 automatically b.
- a_total += 3;
- b_total++;
- Case 2: (p=½)
- It means x1, x2, x3 are either (a, a, b) or (b, b, b). We do not know which case it is, but we do know that x4 is a.
- a_total++;
- We need an index with value b in order to determine whether x1, x2, x3 are (a, a, b) or (b, b, b).
- We call QCOUNT( [ b_index, x1, x2, x3 ] ).
- If it returns 0, then x1, x2, x3 are (a, a, b)
- Counter_a += 2;
- Counter_b++;
- If it returns 4, then x1, x2, x3 are (b, b, b)
- Counter_b += 3;
- We call this method inspect3b()
- This adds (n/2)(½)(¾)(⅓) = (n/16) QCOUNT calls
- Bucket 4: (p=⅛)
- All the indices in these small arrays have the same value. We only need to know the value of one of those indices in order to figure out the values of the whole array.
- We choose one index out of one array. For example, if there are four small arrays [w1, w2, w3, w4] [x1, x2, x3, x4 ] [ y1, y2, y3, y4 ][ z1, z2, z3, z4 ], then we call QCOUNT ( [ w1, x1, y1, z1 ] ) by calling inspect4() again. This adds (n/8)(1⁄16) = n/128 QCOUNT calls
- Case 0: (p=⅜)
- Discard these indices
- A_total += 8
- B_total += 8
- Case 2: (p=½)
- W1, x1, y1, z1 has either (a, a, a, b) or ( a, b, b, b ) values.
- We use the same technique as bucket 2 in order to figure this out, by calling inspect3a() and inspect3b(). The only difference is that the amount we increase a_total and b_total should be multiplied by 4. We add (n/8)(½)(1⁄16)+(n/8)(½)(½)(¾)(1⁄12) = (3⁄512)n QCOUNT calls
- Case 4: (p=⅛)
- At this point, there are (1⁄64)n indices left to check and those indices are be grouped into a bigger group of 16 indices.
- We determine values of those bigger groups by calling inspect2() method, which can determine the values of two groups with one QCOUNT call. The only difference from the initial scan is that the amount we increase a_total and b_total should be multiplied by 16. We call QCOUNT once for every 32 indices, and therefore we add (1⁄2048)n QCOUNT calls.
- CAUTION: There can be one group of 16 left at the end. Use inspect1() method to handle this case.
- There can be up to 3 remaining small arrays, total of 12 remaining indices. Their values can be checked by calling inspect2() method and/or inspect1() method. On average (0~3 remainders), 1 QCOUNT call will be added.
- There can be up to 3 remaining indices that weren’t used in initial bucket sorting. This can be checked by using inspect2() method and/or inspect1() method. On average (0~3 remainders), 1 QCOUNT call will be added.
- At any point of the algorithm, if we had found which is majority element, we return the result without calling any more QCOUNT method.
- The average case efficiency is 0.45166n + 2
Methods:
- inspect4()
- Explained above. Calls QCOUNT with 4 undetermined indices.
- inspect3a()
- Explained above. Calls QCOUNT with 1 known index and 3 undetermined indices out of 4 indices that returned 2 from inspect4().
- Even though we are not including 1 index out of the group of 4 indices in the array we call QCOUNT with, depending on the return value of the QCOUNT call, we learn the value of the last index automatically.
- Since we learn here a value of a specific index, we record that information for later use. (We need to know 3 indices of value a in order to call inspect2() and inspect1(). In addition, we need to know a specific index with value b in order to call inspect3b().)
- inspect3b()
- Explained above. Called upon undetermined indices that returned 2 in inspect3a().
- inspect2()
- We keep two counter value for a and b.
- We inspect two indices with one qcount call.
- If it returns 4, then the two new indices have value a.
- We increment a_total by 2.
- If it returns 2, then the two new indices are a and b.
- We increment both a_total and b_total by 1.
- If it returns 0, then the two new indices have value b.
- We increment b_total by 2.
- We record that these two indices have value b for later use.
- inspect1()
- We inspect one index with one qcount call.
- If it returns 4 or -99, then we know the new index is a.
- a_total ++;
- If it returns 2, then the new index is b.
- b_total ++;
- We record that this index has value b for later use.
Helper functions that ensure correctness of the program in edge cases:
- get_a()
- Called when we need to pick on index to say that it has value a.
- Making sure we can call inspect3a() and subsequent functions.
- Picks an index from any of all the possibilities from first scan:
- (Case0, case2, case4, and remainders.)
- It chooses one depending on availability and efficiency.
- ensure_a_indices_count_3()
- Called when we need to know 3 indices of value a in order to proceed.
- Making sure that we can call inspect2() and inspect1().
- If we do not already know those indices,
- If we know 3 indices of b, we call swap_ab() which swaps all the information the algorithm has learned about a and b. We can continue without needing to call any more QCOUNT
- Otherwise, using minimun 1 maximum 5 QCOUNT calls, chooses 3 indices of same value out of first 5 indices. The existence of 3 indices of same value out of 5 indices is guaranteed. Then we use one more QCOUNT call to see if they are value a. If not, we call swap_ab() in order to make their value a.
- inspect_the_rest()
- Called when the algorithm had not found an index of value b.
- Inspects the rest of indices without ever calling functions that need to know an index of value b.
- find_b_index()
- Called when the majority element is b but we do not know any index of value b.
- We cannot call inspect3b() in this case.
- Inspect 2 indices with one QCOUNT call until we find an index of value b.
- scan_case4_remainders()
- Calls inspect2() and inspect1() on the remainders from bucket 4 that couldn’t be grouped into bigger group
- handle_remainder_from_first_scan()
- Calls inspect2() and inspect1 on the remainders from initial scanning. There can be up to 3 remainders that couldn’t be group into a group of 4 indices.
- initialize_variables()
- Initialize all global variable so that each mysub() call will return correct result.
Average case efficiency from algorithmic theory:
- Initial scan : n/4
- We scan n indices, 4 at a time
- Scan_case2 : n/8
- We scan (n/2) indices, 4 at a time
- Scan_case22 : n/16
- We scan (3n/16) indices, 3 at a time
- Scan_case4 : n/128
- We scan (n/8) indices, 16 at a time
- Scan_case42: n/256
- We scan (n/16) indices, 16 at a time
- Scan_case422 : n/512
- We scan (3n/128) indices, 12 at a time
- Scan_case44 : n/2048
- We scan (n/64) indices, 32 at a time
- 2 additional QCOUNT calls on average.
- From the remainder from first scan, there can be 0 ~ 3 remainders.
- 0 remainders: 0 QCOUNT calls
- 1 remainders: 1 QCOUNT calls ( inspect1() )
- 2 remainders: 1 QCOUNT calls ( inspect2() )
- 3 remainders: 2 QCOUNT calls ( inspect2() + inspect1() )
- Therefore, 1 QCOUNT call on average
- From the remainder from case4, there can be 0 ~ 3 groups left.
- This adds 1 QCOUNT call on average, using the same logic
- f( n ) = ( 925 / 2048 )n + 2 = 0.45166n + 2
- Example values of f:
- f( 20 ) = 11.03
- f( 200 ) = 92.33
- f( 2000 ) = 905.32