Simple XOR: In ruby using a test driven approach
XOR works on bits. So what is a bit? Put simply, it is a 0
or a 1
. The XOR
operation compares 2 bits and if they are the same it returns a 0
. If they are different it returns a 1
. Take a look at this this figure below to get a better idea:
I decided to arm my brain with some cryptography knowledge, and XOR
is one of the first methods I am learning about. I should mention that it is noted XOR is not the best way to encipher your data. Not even close actually. So please don’t use anything you read here for anything more than an experiment.
With that disclaimer out of the way I am comfortable focusing on using a test driven approach to learn.
At any point if you are stuck are not able to follow along, take a look at the source code hosted on github. I will be implementing the methods in ruby
, and testing with rspec
. We start by creating a file xorer.rb
and adding a describe
block:
When we run rspec we see the following error:
This is because we have not yet defined an Xorer
module. So above the describe
block let’s define it:
When we run rspec this time, we see the following:
It is time to add our next test, but what should it be? Basically, I want to take a key and XOR against some plain text. So given these two strings I need to start by converting them into there binary representation. We will need to take it character by character, first turning the character into a byte or character code.
So given the character a
, the byte should be 97
. That will be our first test.
Running rspec now we should see the following:
The key to this error is the undefined method 'char_to_byte' for Xorer:Module
. Which points out the obvious, we are missing a method named char_to_byte
in our Xorer
module. So let’s add this:
When we run rspec this time, it fails with:
Basically, that it was expecting the char_to_byte
method to return 97
but instead nothing or nil
was returned. So let’s fulfill our expectation and return 97
.
…and when we run our test, it passes!
The problem is that while it passes this test, what if we passed the character b
to the char_to_byte
method? Let’s change our test to the following:
…and when we run our test, it passes! Oops… no, actually it doesn’t. It fails with:
So why did I waste your time? Well, I didn’t really waste your time. This process of TDD begins with taking a small idea and asking yourself: Do I have a test for that? If the answer is No, well the next step is to Write a test. Once you have your test, the next question you ask is: Does the test pass? If the answer is No, well the next step is to Write just enough code for the test to pass.
Then you ask again: Does the test pass? At the time when our code_to_byte
method was returning 97
and our test was expecting a 97
, we could answer yes to that question. We had written just enough code for the test to pass. So the next question would be: Do you need to refactor? Well since we know that hard coding 97
as a return value is not really changing the character to a byte, our answer is Yes.
So our next step is to Refactor the code. Let’s change our test back to pass a
, and expect the result to be 97
. Then we change the char_to_byte
method to the following:
We ask ourself Does the test pass? and to answer that question we run rspec and get the following:
Great! It passes again. So what is the ord
method we’re calling on char
? From the ruby docs we see that ord
returns the Integer ordinal of a one-character string. Which is exactly what we’re trying to do. Or is it? Well, kind of, but not exactly.
We know that a bit
is a 0
or a 1
. An On
or Off
. One of two possible values. A byte is 2 to the power of 8. Which gives us a possible 256 values, 0 to 255 (since computers start counting at 0). 97
is the ascii value of the character a
and it does fall into this range.
The problem arrises if the plain text has unicode values. For instance, given '六'.ord
(which is Chinese for “Six”) will return the value: 20845
, which is two bytes long. To keep things simple I am assuming that we will only be dealing with English characters.
So we ask ourself again Do you need to refactor? I am happy with our implementation of this feature so the answer is No. Moving on it is now time to Select a new feature to implement.
We now need to turn the byte into a binary value to work so that we can XOR it. So let’s add the following:
If you cannot count in binary you can take my word that the binary representation of 97
is 1100001
. Since we’re learning an explanation would be better.
So computers only work with 0
’s and 1
’s and 97 is a byte (value between 0 and 255 or 2 to the power of 8). Take a look at the figure below:
A bit is considered On
if it has a value of 1
and Off
if it has a value of 0
. If it is Off
we do not count it. With all the bits Off
(00000000
) the value is 0
. With all the bits On
(11111111
) the value is 255.
In the case of ‘1100001’ we only have 7 digits so the left most bit would be a 0
. So we have a 1
in the 1
column, a 1
in the 32
column, and a 1
in the 64
column.
With that out of the way let’s run our test:
We see that we have not yet defined the byte_to_binary_string
method. So in our Xorer
module let’s add that:
The Fixnum
class has a to_s
method that takes a radix base between 2 and 36. If we run our test now its passing:
We don’t need to refactor this so let’s work on our XOR feature. We need a test for it. I’m going to pop the training wheels off and step it up a notch. Assume that after adding a test I run rspec and based on the error message, add the least amount of code to make it pass.
Select a new feature to implement: xor
Do you have a test for that?
Does the test pass?
Write just enough code for the test to pass
Does the test pass?
Do you need to refactor? Based on the requirements of this test No
, we don’t need to refactor. So let’s work on our next feature.
Our xor
method works fine when given binary values like 0
in this case. What happens if we pass in binary strings like ‘1100001’ & ‘1100010’? Let’s write another test for that, but first what is the value we’re expecting supposed to be? Let’s actually XOR this by hand:
So we’re expecting the value: ‘0000011’.
Does the test pass?
Write just enough code for the test to pass
Ok, so that method just got a bit more complicated… So what are we doing here? First we need to handle either a string or a number. To do that we can just turn the values into strings with the to_s
method. Next we are calling the scan
method on the string and passing in the argument /./
. This is a regular expression. Regular expressions are outside of what I want to cover, but in this case take a deep breath and accept this explanation.
The /
’s use enclose the regular expression the way '
’s enclose a string. So the heart of this particular regular expression is the .
, which says match any character
.
The scan
method iterates through the string, matching the given pattern. For each match, a result is generated and either added to the resulting array.
Next we call the map
method on the array. map
takes a block and transforms each value in the array. In this case we are taking each digit in the string and calling the to_i
method on it. This allows us to do an ^
(XOR) on the values.
So now we create an array to hold the XOR results. We run a loop from 0 to val_1
’s size, minus 1
, using i
as the index. Finally we join the xord
array and return the resulting string.
Does the test pass?
Unfortunately when we run this our previous test fails. This is because we are now returning a string instead of an integer. Since we changed the api we need to update the failing test. So update the failing test to:
Does the test pass?
Do you need to refactor? No, this looks good. So, on to the next feature. Next we have to go the other way. From a binary string to a byte and from a byte to a char. Let’s start with converting a binary string into a byte.
Do you have a test for that?
Does the test pass?
Write just enough code for the test to pass
So what is going on here? We have a binary string and start by calling reverse
on it. So the name implies, it reverses the characters in the string. More on why we’re doing that in a moment. Next we do the same scan
& map
that we used before to create an array of binary numbers.
Next we create a count
variable and set it to 1
. Our algorithm is simple, we multiply the bit by count
and add that value to sum
. Then we multiply the value of count
by 2
and that becomes the new value of count
.
Since we’re dealing with our binary string in reverse this makes perfect sense. count
starts at 1
, then becomes 2
, then 4
, and so on…
Finally we return the sum
, which is now our byte or character integer.
Does the test pass?
Do you need to refactor? No, our test is passing and this works fine for now.
Select a new feature to implement: convert a byte to a char
Do you have a test for that?
Does the test pass?
Write just enough code for the test to pass
Here we’re using the chr
method which returns a string containing the character representation.
Does the test pass?
Do you need to refactor? No, this is actually a good stopping point for now.
In my next post I will add the ability to read in a ‘key’ (string of characters) and a file containing ‘plain text’ from the command line. We will XOR
the ‘key’ against the ‘plain text’ and output the resulting ‘cipher text’ to a file. We will run into some problems along the way and add some helper methods. Then we will look at pack
and unpack
as potential ways of working with the data.
Hope this was somewhat helpful, and if you have any comments or feedback please let me know. You can view the source code hosted on github.
See part 2 of our simple xorer.