Given an array a that contains only numbers in the range from 1 to a.length, find the first duplicate number for which the second occurrence has the minimal index. In other words, if there are more than 1 duplicated numbers, return the number for which the second occurrence has a smaller index than the second occurrence of the other number does. If there are no such elements, return -1.
We’re also given the constraints that our solution should be O(n) time and O(1) space.
A first attempt
Every statement in a problem like this typically has some meaning. So the knowledge that our input array
a “contains only numbers in the range from 1 to a.length” is a clue. Positive integers, that can’t exceed the size of the array. Clearly, we’re expected to use the array values as array indexes somehow.
One solution would be to walk the array, keeping track of whether we’ve seen each value in a seperate hash or array. That solution would look something like this:
def first_duplicate(a) counts =  a.each do |value| return value if counts[value] counts[value] = true end -1 end
The good news is that this solution 1) works and 2) is O(n) runtime (we only traverse the array once). Unfortunately, it’s also O(n) space so we know it could be more efficient. Let’s try a slightly different approach.
Refining the algorithm
Up until now, we’ve assumed that knowledge about whether a value has been “seen” by our pointer has to be stored in some other variable. We need the left most duplicate, so we can’t sort the array first which might make things easier.
Coming back to our first clue in the description of the challenge, we know that all our input numbers are positive. So, we actually have an extra piece of data in the original array we can play with: it’s sign. We can walk the array, looking up the value at a[value] for each one, and changing it to it’s negative counterpart. If, when we examine a value, we find it’s already negative, we know that we’ve already seen it and it’s therefore a duplicate.
So, let’s try again:
def first_duplicate(a) a.each do |value| return value.abs if a[value.abs - 1] < 0 a[value.abs - 1] = -a[value.abs - 1] end -1 end
Performance and memory optimized!
Sprinkle some design on it
In real life, if we needed this algorithm we’d probably want a better interface. A more Ruby way of implementing this might be to monkey patch the
class Array def first_duplicate # implementation end end
Then we can call
a.first_duplicate instead. Note that if you’re going to do this, you’ll want to use something like the first implementation of the code, since the more memory efficient version is destructive.
But monkey patching in this way is potentially dangerous. A different approach we could take is to create a module and include that in any classes we want to be able to use it.
module Duplicable def first_duplicate # implementation end end # now include it in other classes class Array include Duplicable end
This approach has the advantage of 1) containing our logic in its own module and 2) making it reusable. We could include this module in as many classes as we want, as long as they quack
Any other approaches?
How would you solve this? Let me know if I missed anything or how you might take it in a different direction.
(Originally appeared on Medium).
I’m a Sr. Software Engineer with Voom/Airbus from Seattle. I love building software with an eye for quality and writing about the process.