Custom Source Control: Refactoring
This is a continuation of my last post on implementing a “custom source control” application in ruby. In that post we built a custom source control application based on “Source Control Made Easy” by Jim Weirich.
Before starting you may want to checkout the repository for the application on github. You can then checkout the part-1-complete release. Or if you’d rather just down a zip file containing the source for custom source control: part-1-complete. Let’s refactor…
Recall our initial stories from the previous post:
Initialize a new repository
As a user
I want the ability to initialize a repository
So that I can begin adding snapshots to it
Create snapshots of my work
As a user
I want to create snapshots
So that I can save snapshots of my work
Checkout previous snapshots
As a user
I want to checkout a previous snapshot
So that I can fix issues or get back to a working state
I also left you with a list of some stuff I wanted to address in future posts. Lets take another look at what I will be working on now:
- Separating the tests from the actual implementation code.
- DRY’ing out our code.
So without further ado…
Separating the tests from the actual implementation code.
The first thing that we’ll want to do is separate the tests from the application code. Start out by changing directories into the custom_source_control directory.
cd custom_source_control
Next lets create a directory for our code and another for our tests (which going forward I may refer to them as specs).
mkdir lib
mv custom_source_control.rb lib
mkdir specs
mv test_dir specs
Our directory structure should now look like the following:
├── lib/
│ └── custom_source_control.rb*
└── specs/
└── test_dir/
├── test_file_1.txt
└── test_file_2.txt
Now we’re going to create a Rake file. Rake is a build system that we will use from amongst other things, running our specs. I’d like to mention that Rake is another tool brought to us by Jim Weirich.
require 'rake/testtask'
Rake::TestTask.new do |t|
t.libs << ['lib', 'specs']
t.test_files = FileList['specs/*_spec.rb']
t.verbose = true
end
So what are we doing here? Basically we are making it easy to run our specs. So we’ll be able to do: rake test
for running our tests, and something else to run the application. We’re now going to create a spec helper file that will require our application code as well as the minitest library.
touch specs/helper.rb
require 'minitest/autorun'
require 'custom_source_control'
Now let’s move everything in lib/custom_source_control.rb
from the require 'minitest/autotest'
line down, to a new file specs/custom_source_control_spec.rb
. At the top of the file we need to change the require 'minitest/autotest'
to require 'helper'
. We also need to change the before
block so that as part of the testing, we change directory to the test_dir
directory.
require 'helper'
describe CustomSourceControl do
...
before do
repo_dir = File.join(File.dirname(__FILE__), 'test_dir')
Dir.chdir repo_dir
@csc = CustomSourceControl.new
@csc.initialize_repository
end
...
end
At this point we should be able to run our tests with rake test
, and everything should be passing.
rake test
# Running tests:
..............
Finished tests in 0.107780s, 129.8942 tests/s, 194.8413 assertions/s.
14 tests, 21 assertions, 0 failures, 0 errors, 0 skips
DRY’ing out our code.
Next up on our list… DRY or Don’t repeat yourself, Basically where we find we have code doing a very similar job is a potential candidate. Doing a visual scan of the code and I notice a few control_file_exists?
methods. These methods serve as a convenient way to check for the existence of some of the mandatory repository files. One way to DRY this functionality up is to use the define_method
method.
So I start out by refactoring the manifest_exists?
and metadata_exists?
methods into:
['manifest', 'metadata'].each do |ctl_file|
define_method("#{ctl_file}_exists?") do
File.exists? File.join('.esc', "__#{ctl_file}__")
end
end
What we’re doing is creating an array taking the prefix of previous methods and iterating through that array. With each element we’re passing it in as an argument to the define_method
method. The method body should be pretty straightforward; we’re doing a File.join
on the repository directory, and the control files. Let’s run our tests to make sure everything still works.
# Running tests:
..............
Finished tests in 0.032827s, 426.4782 tests/s, 639.7173 assertions/s.
14 tests, 21 assertions, 0 failures, 0 errors, 0 skips
Why stop there? Let’s add head_exists?
.
['__manifest__', '__metadata__', 'HEAD'].each do |ctl_file|
define_method("#{ctl_file.downcase.gsub('_', '')}_exists?") do
File.exists? File.join('.esc', ctl_file)
end
end
So in refactoring head_exists?
we had to turn the array of method names into an array of filenames. This would create methods __manifest___exists?
, __metadata___exists?
& HEAD_exists?
. Which is not what we want. So we simply add a call to the String#downcase
method which properly creates the head_exists?
method. We also add a call to String.gsub
to handle removing the _
’s. The last thing is to change the File.join
and remove the _
’s as well.
(retest)
# Running tests:
..............
Finished tests in 0.036157s, 387.2003 tests/s, 580.8004 assertions/s.
14 tests, 21 assertions, 0 failures, 0 errors, 0 skips
The next bit of code has to do with the cwd_hashes
method.
def cwd_hashes
sha1 = OpenSSL::Digest::SHA1.new
hashes = {}
cwd_files.each do |file|
hashes[file] = sha1.hexdigest(File.read(file))
sha1.reset
end
hashes
end
def hash_for_file(file = nil)
sha1 = OpenSSL::Digest::SHA1.new
sha1.hexdigest(File.read(file))
end
It instantiates a new SHA1 object, then iterates through the current working directories files and hashes them, assigning the hash to… well our hashes hash. I am tempted to refactor the code to something like the following:
def cwd_hashes
cwd_files.inject({}) { |hash, file| hash[file] = hash_for_file(file); hash }
end
Which passes our tests, but I wonder if we take a performance hit since we’re now creating multiple instances of a SHA1 object. It would be nice if we could benchmark this somehow… Well we can! First let’s add the following code:
def cwd_hashes
cwd_files.inject({}) { |hash, file| hash[file] = hash_for_file(file); hash }
end
alias_method :refactor_cwd_hashes, :cwd_hashes
def initial_cwd_hashes
sha1 = OpenSSL::Digest::SHA1.new
hashes = {}
cwd_files.each do |file|
hashes[file] = sha1.hexdigest(File.read(file))
sha1.reset
end
hashes
end
Add the following to to the bottom of the file:
def benchmark_cwd_hashes
repo_dir = File.join(File.dirname(__FILE__), '../', 'specs', 'test_dir')
Dir.chdir repo_dir
10000.times do |idx|
File.open("file_#{idx}", 'w') do |f|
f.write "I am file #{idx}"
end
end
@csc = CustomSourceControl.new
@csc.initialize_repository
@csc.snapshot
require 'benchmark'
Benchmark.bmbm do |x|
x.report("initial:") { @csc.initial_cwd_hashes }
x.report("refactor:") { @csc.refactor_cwd_hashes }
end
if Dir.exists? '.esc'
FileUtils.rm_rf '.esc'
FileUtils.rm Dir.glob('file_*')
end
end
if __FILE__ == $0
benchmark_cwd_hashes if ARGV[0] == '--benchmark'
end
Everything wrapping the benchmark stuff is the same setup code we’re using in our tests.
Next let’s add the following to our rake file:
task :benchmark do
system('ruby', "-Ilib", 'lib/custom_source_control.rb', '--benchmark', out: $stdout, err: :out)
end
Run it with:
rake benchmark
It should output results similar too:
Rehearsal ---------------------------------------------
initial: 0.140000 0.130000 0.270000 ( 0.265694)
refactor: 0.160000 0.120000 0.280000 ( 0.279160)
------------------------------------ total: 0.550000sec
user system total real
initial: 0.120000 0.120000 0.240000 ( 0.239919)
refactor: 0.170000 0.120000 0.290000 ( 0.284348)
As you can see, not much of a difference.
The next pieces of code are the hash_and_copy_manifest
& hash_and_copy_metadata
methods.
['manifest', 'metadata'].each do |method|
define_method("hash_and_copy_#{method}") do
file = File.join('.esc', "__#{method}__")
hash = hash_for_file(file)
FileUtils.cp(file, File.join('.esc', hash))
hash
end
end
(retest)
# Running tests:
..............
Finished tests in 0.035168s, 398.0892 tests/s, 597.1338 assertions/s.
14 tests, 21 assertions, 0 failures, 0 errors, 0 skips
There is not a whole lot going on in the initialize_repository
method, but in snapshot
we can make a few changes.
def snapshot
File.new(File.join('.esc', '__metadata__'), 'w')
File.new(File.join('.esc', '__manifest__'), 'w')
write_manifest
...
write_metadata manifest_hash
...
end
The creation of the metadata & manifest files doesn’t need to happen because we do a File.open
and write to them within the write_manifest
& write_metadata
methods.
# Running tests:
..............
Finished tests in 0.053743s, 260.4990 tests/s, 390.7486 assertions/s.
14 tests, 21 assertions, 0 failures, 0 errors, 0 skips
The next thing to fix is the fact that metadata timestamp was hardcoded to 2014-03-07 23:59:59 -0800
. We can remove that line of code, but the tests will fail again. So what can we do? Stub the Time.now
method. If you’re not on ruby 2 or higher, you are going to need to take a few extra steps. Create a Gemfile
file and add the following:
source 'https://rubygems.org'
gem 'minitest'
bundle install
Now we update helper.rb
with the following:
require 'minitest/autorun'
require 'minitest/mock'
require 'custom_source_control'
in custom_source_control.rb
:
def write_metadata(manifest_hash = nil)
File.open(File.join('.esc', '__metadata__'), 'w') do |file|
file.puts "Snapshot Manifest: #{manifest_hash}"
file.puts "Snapshot Parent: #{(head_contents.empty?) ? 'root' : head_contents}"
file.puts "Snapshot Taken: #{Time.now}"
end
end
in custom_source_control_spec.rb
, wrap the stuff inside our before
blocks in:
before do
Time.stub :now, '2014-03-07 23:59:59 -0800' do
...
end
end
(retest)
# Running:
..............
Finished in 0.046209s, 302.9713 runs/s, 454.4569 assertions/s.
14 runs, 21 assertions, 0 failures, 0 errors, 0 skips
I left off in the last post with a question: How do I use this thing now that it’s built?. I left off with a suggestion on how to make the script runable. Now I’d like to update the script and add this functionality. Open up custom_source_control.rb
and add the following to the very bottom.
if __FILE__ == $0
unless ['--benchmark', '--checkout', '--initialize', '--snapshot'].include? ARGV[0]
puts "#{ARGV[0]} is not a subcommand."
exit 1
end
csc = CustomSourceControl.new
case ARGV[0]
when '--benchmark'
benchmark_cwd_hashes
when '--checkout'
if ARGV[1]
csc.checkout ARGV[1]
else
puts "'checkout' subcommand takes a second argument, SHA1 of the metadata file to checkout."
exit 1
end
when '--initialize'
csc.initialize_repository
when '--snapshot'
csc.snapshot
end
end
Lastly, what I’ve done is symlink the file into my /usr/local/bin
directory as csc
ln -s $HOME/developer/projects/custom_source_control/lib/custom_source_control.rb /usr/local/bin/csc
export PATH=$PATH:/usr/local/bin
Still, more to do:
- We don’t really clean up after ourselves so that functionality needs to be added.
- Getting the checkout hash is also a manual process so that
csc log
functionality we talked about would come in handy. - We are not handling any types of errors mind you
- Testing for more edge cases, and fixing any bugs we find.
- Adding code coverage.
- Adding the ability to handle command line arguments with
OptionParser
. - Adding tests and functionality to diff checkins.
- Adding tests and functionality to list the history (metadata file hashes from head all the way back to root)
- Possibly turning this into a gem.
…so, again, its not quite production ready.
You can double check your work with the project I have hosted on github