Thursday, January 31, 2013

Versioning of objects in S3



Hi!  


Today I want to share what I learned about versioning objects in S3. I heard about this feature long ago but never looked into it. Even now I don't know a great deal, so I will only explain how to enable versioning for a bucket and how to get different versions of an object.

As far as I can tell, this feature is not in great demand, and I found no free, ready-to-use utilities for working with versions. There are libraries that support it, though. Something like "If you need it, build it yourself".

To use versioning, we first need to switch it on for the bucket. There are two ways to do this, and both go through the API:

· a direct REST/SOAP request
· a library
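For the first option, here is a rough sketch of what the REST request looks like. It only builds the request and does not send it: a real call also has to be signed with your AWS credentials, and that part is left out here. The helper name `build_enable_versioning_request` is my own, not something from a library.

```ruby
require 'net/http'

# Hypothetical helper: builds (but does not send) the REST request that
# switches versioning on for a bucket. Request signing is deliberately omitted.
def build_enable_versioning_request(bucket)
  body = <<~XML
    <VersioningConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
      <Status>Enabled</Status>
    </VersioningConfiguration>
  XML

  # The "versioning" subresource of the bucket is addressed with a PUT.
  request = Net::HTTP::Put.new('/?versioning')
  request['Host'] = "#{bucket}.s3.amazonaws.com"
  request['Content-Type'] = 'application/xml'
  request.body = body
  request
end

request = build_enable_versioning_request('epamcccctesting')
puts "#{request.method} #{request.path}"
puts request.body
```

A library does the same thing for you and takes care of the signature, which is why I go that way below.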

I will use the aws-sdk Ruby library, so let's install it:

$ gem install aws-sdk

Then open the Ruby console:

$ irb

Next, authorize and enable versioning for the bucket:

require 'aws-sdk'

s3 = AWS::S3.new(
    :access_key_id => ENV['AMAZON_ACCESS_KEY_ID'],
    :secret_access_key => ENV['AMAZON_SECRET_ACCESS_KEY']
)
my_bucket = s3.buckets['epamcccctesting']
my_bucket.enable_versioning

I assume you are familiar with the AMAZON_ACCESS_KEY_ID and AMAZON_SECRET_ACCESS_KEY environment variables, so I will not dwell on them.

After that, the console shows that versioning is enabled.

But how does versioning work? Quite simply. When you upload a file under a key that already exists, AWS does not replace the object but stores the upload as a new version. The ID of that version comes back in the x-amz-version-id response header, or through the library.

The version IDs look like this:

x-amz-version-id: mHYT.SyFXgHoG6xCy5yQVk6n6riJct4u
x-amz-version-id: .KSpevNIkZSgBoCz4vU3iTBttGWXWqIc

After that, you can fetch a particular version of the file with a GET request by passing versionId as a query parameter. Without it, you get the latest version of the file.
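As a small illustration, this is how such a URL could be put together in Ruby. It is only a sketch: the helper name `versioned_object_url` is mine, and I assume virtual-hosted-style addressing, a publicly readable object and a simple key without special characters.

```ruby
require 'uri'

# Hypothetical helper: builds the URL for one version of an object.
# Assumes a public object and a key that needs no escaping.
def versioned_object_url(bucket, key, version_id = nil)
  # versionId is optional; without it S3 serves the latest version.
  query = version_id ? URI.encode_www_form('versionId' => version_id) : nil
  URI::HTTPS.build(host: "#{bucket}.s3.amazonaws.com", path: "/#{key}", query: query).to_s
end

puts versioned_object_url('epamcccctesting', 'file')
puts versioned_object_url('epamcccctesting', 'file', 'mHYT.SyFXgHoG6xCy5yQVk6n6riJct4u')
```

For private objects the request has to be signed, so in practice you would let the library build it.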

Example: I uploaded three versions of a file to the bucket and received a different version ID in the headers each time. You can get these versions by clicking these links:


Everything is pretty easy. To delete a particular version of the file, use the versionId parameter as well.
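To make the behaviour easier to picture, here is a toy in-memory model of a bucket with versioning switched on. It is not the S3 API and not how AWS implements it; the class `ToyVersionedBucket` and its methods are invented for illustration, the version IDs are just random strings, and everything else S3 does (delete markers, permissions and so on) is ignored.

```ruby
require 'securerandom'

# Toy in-memory model of a versioning-enabled bucket (not the S3 API).
class ToyVersionedBucket
  Version = Struct.new(:version_id, :data)

  def initialize
    # Each key maps to its list of versions, oldest first.
    @objects = Hash.new { |hash, key| hash[key] = [] }
  end

  # Storing under an existing key appends a version instead of overwriting.
  def put(key, data)
    version = Version.new(SecureRandom.alphanumeric(32), data)
    @objects[key] << version
    version.version_id
  end

  # Without a version ID the latest version is returned.
  def get(key, version_id = nil)
    versions = @objects.fetch(key, [])
    version = version_id ? versions.find { |v| v.version_id == version_id } : versions.last
    version && version.data
  end

  # Removes exactly one version; the others stay untouched.
  def delete_version(key, version_id)
    @objects.fetch(key, []).reject! { |v| v.version_id == version_id }
  end
end

bucket = ToyVersionedBucket.new
first  = bucket.put('file', 'v1')
second = bucket.put('file', 'v2')
puts bucket.get('file')          # latest version
puts bucket.get('file', first)   # a specific older version
bucket.delete_version('file', second)
puts bucket.get('file')          # the remaining version is the latest again
```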

So the functionality is there, but it is unclear to me why it is still not widely used and why there is no standard, easy-to-use CLI implementation. It would be convenient for storing backups, for example, and you can easily find a dozen other cases where file versioning would be a simple and convenient solution.

Maybe you use versioning somewhere? Can you share anything interesting?

UPD: For example, here is how to display the date and size of each object version:

my_bucket = s3.buckets['epamcccctesting']
file = my_bucket.objects['file']

# Print the modification time, version ID and size of every version
file.versions.each do |version|
  head = version.head  # fetch the metadata once per version
  puts "#{head[:last_modified]}   #{version.version_id}    #{head[:content_length]} bytes"
end

2012-12-20 16:15:11 +0200   NQc0gba0nv6znIfSHRaxR0fT3I.ZaUQ5    4 bytes
2012-12-20 16:14:52 +0200   s73raBjbDF2pZpQT9o4qPu4Yn0piy1wL    3 bytes
2012-12-20 16:13:59 +0200   6Txnrqbcb4LaXo2MGYP9gn61Em0UIrUq    2 bytes

More recently, it has also become possible to enable versioning via the console.
