0 = 1, 1 = 3, 2 = 5, 100 = 201. What is going on? Ruby has object_ids on objects but they are shorter on small integers. Fire up irb and follow along:
> obj = Object.new
=> #<Object:0x100362a30>
> obj.object_id
=> 2149258520
The value of object_id is the normal looking ids I’ve seen in the past. But then I did this:
0.object_id
=> 1
> 1.object_id
=> 3
> 2.object_id
=> 5
> 100.object_id
=> 201
Ok, why the values? Why is 0’s id 1 and 100’s id 201? It looks like it’s doubling the value and adding one but that seems a little random and probably not what it’s actually doing. So I read a bit by Caleb Tennis on Oreilly and found a tip off that it’s related to binary:
Thus 0×0101 (5) becomes 0×1011 (11).
Notice that Caleb is noting that “101” is shifting left and adding 1 at the lowest bit. So then this should work:
x = 0
puts x.object_id == ( (x << 1) + 1 )
true
=> nil
And it does. We can set a bigger number and run it again and it still works.
x = 65536
puts x.object_id == ( (x << 1) + 1 )
true
=> nil
I will call this predictable behavior later. x.object_id is 131073 on both sides because we know how object_id is generated:
x = 65536
=> 65536
> puts x.object_id
131073
> puts ( (x << 1) + 1 )
131073
So what is going on? Well first, let’s print out the binary. Taken from icfun.blogspot.com, we’ll define a number to binary string method in irb.
def dec2bin(number)
number = Integer(number);
if(number == 0)
return 0;
end
ret_bin = "";
## Untill val is zero, convert it into binary format
while(number != 0)
ret_bin = String(number % 2) + ret_bin;
number = number / 2;
end
return ret_bin;
end
We can test that it works with any integer. Let’s test with the number 5 like this: puts dec2bin(5)
. Prints out 101. If we shift left by one bit like this: dec2bin((5 << 1))
then we’ll get 1010. Add 1 bit and we get the binary of eleven: 1011. We can see if this is eleven with dec2bin(11) and we get “1011” which is what we expected. It makes sense.
So wtf. Why know this stuff? Well, when doing object comparisons, I always expected some garbage or randomness and length similar to a hash. Like when doing (Object.new).object_id
but this isn’t dependable or consistent when getting the object_id of a Fixnum. And even that is not consistent. The examples above are using small numbers. When we try this:
x = 4; puts x.object_id == ( (x << 1) + 1 )
true
It works fine. 4 is small. Let’s call this “predictable”. We can bit shift and add 1 to get the same value as ruby does with object_id.
But a big enough number like 5000000000000000000 is false. So what’s the inflection point? Maybe the inflection point is 2^32 because it’s memory related:
x = 4294967296; puts x.object_id == ( (x << 1) + 1 )
Nope. Strange.
So I played around manually and started finding that 4 quintillion is true but 5 quintillion is false. Weird. If it’s memory related, it’s no number I recognize. So we know that there’s some inflection point between 4 quintillion and 5 quintillion. I’m not about to figure this thing out by hand. Let’s write a program to find our magic number.
Note: the current version (if any) is on github.
# find the inflection point of how ruby calculates object_ids predictably
# for example:
# x = 4; x.object_id == ( (x << 1) + 1 )
# => true
# however,
# x = 5000000000000000000; x.object_id == ( (x << 1) + 1 )
# => false
# answer = 4611686018427387903
# we'll start with a number that's close
starting_number = 4000000000000000000
# this will be the number we'll try with
current_number = starting_number
# our decimal position that will be used for the loop
# we don't need to start with 4, which is starting_nmber[0..0]
index = 1
digit_at_index = current_number.to_s[index..index].to_i
# digit state
digit_second = 0
digit_third = 0
# vector for current_number
# true = up, false = down
direction = true
last_direction = direction
changed_directions = 0
# have we exhausted all digits for the current rank/position
# if so, we move on to the next position
digit_done = false
jump = 5
# is our result the same as shift left plus one?
def predictable?(number)
number.object_id == ( (number << 1) + 1 )
end
# go until we've iterated along the length of starting_number
while (index <= starting_number.to_s.length - 1)
digit_done = false
digit_at_index = current_number.to_s[index..index].to_i
# shift our cheap history variables
digit_third = digit_second
digit_second = digit_at_index
# this tests whether we went over our solution
if predictable?(current_number.to_i)
# if true, try incrementing but only if we can later
direction = true
else
# if false, number is too high
direction = false
end
#puts "index: #{index}"
if last_direction != direction
jump -= jump / 2
changed_directions += 1
else
jump -= 1 unless jump == 1
end
last_direction = direction
# split the distance
# if we start with 0, this becomes 5 if going up
# if we start with 5, this becomes 3 if going down
# it's a half to target number
# increase
if direction
digit_at_index += jump unless digit_at_index == 9
#puts digit_at_index
if digit_at_index == 9 && digit_second == 9
digit_done = true
else
digit_done = false
end
# decrease
else
digit_at_index -= jump unless digit_at_index == 0
if digit_at_index == 0 && digit_second == 0
digit_done = true
else
digit_done = false
end
end
# if we flip-flop back and forth between finding our number is too high
# or too low then our number is probably in the middle
# 4 swaps will leave us with the lower number
# which then the next digit will need to increase
# TODO: if this flops the wrong way, the algorithm breaks.
if changed_directions == 4
digit_done = true
digit_at_index = digit_second
end
# substitute our done digit in place
current_number_string = current_number.to_s
current_number_array = current_number_string.chars.to_a
current_number_array[index] = digit_at_index
current_number = current_number_array.join.to_i
# move on to the next digit
if digit_done
index += 1
digit_at_index = current_number.to_s[index..index].to_i
digit_ceiling = 10
digit_floor = 0
jump = 5
changed_directions = 0
direction = true
digit_second = 0
digit_third = 0
end
puts current_number
# bug avoid
if current_number.to_s.length != starting_number.to_s.length
exit
end
# not needed but useful for watching how the algorithm works
sleep 0.1
end
When we run this little number searcher we find our overflow (or inflection) point. It will run for a while, finding each digit like this:
4400000000000000000
4700000000000000000
4500000000000000000
.
.
.
4611686018427387900
4611686018427387904
4611686018427387902
4611686018427387903
4611686018427387904
4611686018427387903
4611686018427387903
Eventually we’ll have a final value of 4611686018427387903 which we can test like this:
4611686018427387903.object_id => 9223372036854775807
4611686018427387904.object_id => 2152612560
You can see it overflow making 9223372036854775807 the largest object_id in Ruby.
This program is not very efficient or well written. There’s a better way to find a number but I decided to go with a digit-by-digit algorithm. It was not very easy and quick to write. I nearly scrapped it and did it the right way but managed to get it working around midnight one night. I hope it makes for a good (or anti-pattern) example.