Amazon introduced the concept of S3 object storage to a wide-range of users. Their interface is the defacto-standard to store files in web applications. Nowadays, it is used by other vendors as well. Ceph and RiakCS are some examples, where the same interface is available. This blog post will setup s3cmd with Ceph radosgw.
It is used as an interface for distributed storage due to the fact that the only thing you need to put and retrieve files is http. It is very handy to have a clear interface to manage files across different machines. For example you may want to backup your local files to a s3 cluster. This requires a toolset that enables you to easily store and retrieve s3 files from command line. s3cmd is a great tool for just that. Out of the box it works with Amazon S3, but can be easily configured with radosgw, which is commonly used in conjunction with OpenStack.
Before we start implementing the base set, you need to install
s3cmd on you machine.
For Ubuntu you need to install the s3 repo before installing the package
# install repo key wget -O- -q http://s3tools.org/repo/deb-all/stable/s3tools.key | sudo apt-key add - # install repo sudo wget -O/etc/apt/sources.list.d/s3tools.list http://s3tools.org/repo/deb-all/stable/s3tools.list # update package manager sudo apt-get update # install s3cmd sudo apt-get install s3cmd
On Mac you may use brew to install s3cmd:
# install s3cmd brew install s3cmd
S3 uses a different terminology, compared to posix filesystems:
- bucket: kind of directory
- object: similar to a file
Although Amazon S3 allows you to create a
directory within a bucket, it will not be a real directory. In S3, a directory is more like a file prefix. Take this in consideration when you upload huge amounts of data.
In general it is very efficient to use prefixes for searches instead of postfixes. E.g. if you look for file types, you may store your files as
txt.file instead of
Basic commands for
# S3cmd help s3cmd --help # Display all buckets s3cmd ls # Create new bucket s3cmd mb s3://YOURBUCKET # Upload new file s3cmd put test.txt s3://YOURBUCKET # Download file s3cmd get s3://YOURBUCKET/test.txt test.txt
Configuration for radosgw
s3cmd is optimized for Amazon S3, you need to apply a few changes to convince
s3cmd to talk with
radosgw. At first we create a new configuration with s3cmd via interactive mode:
s3cmd --configure -c s3test.cfg Enter new values or accept defaults in brackets with Enter. Refer to user manual for detailed description of all options. Access key and Secret key are your identifiers for Amazon S3 Access Key: key Secret Key: secret Encryption password is used to protect your files from reading by unauthorized persons while in transfer to S3 Encryption password: Path to GPG program [/usr/local/bin/gpg]: When using secure HTTPS protocol all communication with Amazon S3 servers is protected from 3rd party eavesdropping. This method is slower than plain HTTP and can't be used if you're behind a proxy Use HTTPS protocol [No]: yes New settings: Access Key: key Secret Key: secret Encryption password: Path to GPG program: /usr/local/bin/gpg Use HTTPS protocol: True HTTP Proxy server name: HTTP Proxy server port: 0 Test access with supplied credentials? [Y/n] n Save settings? [y/N] y Configuration saved to 's3test.cfg'
The generated configuration file will look similar to:
[default] access_key = abc bucket_location = US cloudfront_host = cloudfront.amazonaws.com cloudfront_resource = /2010-07-15/distribution default_mime_type = binary/octet-stream delete_removed = False dry_run = False encoding = UTF-8 encrypt = False follow_symlinks = False force = False get_continue = False gpg_command = /usr/local/bin/gpg gpg_decrypt = %(gpg_command)s -d --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s gpg_encrypt = %(gpg_command)s -c --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s gpg_passphrase = guess_mime_type = True host_base = s3.amazonaws.com host_bucket = %(bucket)s.s3.amazonaws.com human_readable_sizes = False list_md5 = False log_target_prefix = preserve_attrs = True progress_meter = True proxy_host = proxy_port = 0 recursive = False recv_chunk = 4096 reduced_redundancy = False secret_key = secret send_chunk = 4096 simpledb_host = sdb.amazonaws.com skip_existing = False socket_timeout = 300 urlencoding_mode = normal use_https = True verbosity = WARNING
Afterwards you may try to access your bucket with
s3cmd -c s3test.cfg ls ERROR: S3 error: 403 (InvalidAccessKeyId): The AWS Access Key Id you provided does not exist in our records.
We need to adapt a few lines to configure s3cmd for radosgw. Currently
s3cmd does not know the radosgw hostname. Replace
host_bucket with the hostname of the radosgw host:
host_base = s3.amazonaws.com host_bucket = %(bucket)s.s3.amazonaws.com
host_base = radosgw.example.com host_bucket = %(bucket)s.radosgw.example.com
Now you should be able to run
s3cmd -c s3test.cfg ls 2014-09-06 12:05 s3://YOURBUCKET
Sidenote: I had a few issues using lowercase bucket names with
radosgw. Uppercase buckets worked like a charm.
Encryption with gpg
If we talk about backups, we never want to store them unencrypted.
s3cmd supports gpg. The cool thing of using s3cmd in conjunction with gpg is the fact that it works completely transparent. Once configured, all files are automatically encrypted during upload and decrypted whenever you download a file.
To activate encryption change the
s3cmd configuration file:
encrypt = True gpg_command = gpg gpg_decrypt = %(gpg_command)s -d --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s gpg_encrypt = %(gpg_command)s -c --cipher-algo AES256 --force-mdc --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s gpg_passphrase = longsecurepassphrase
At first you have to enable encryption with
encrypt = True. Although the default configuration uses
gpg_command = /usr/local/bin/gpg, I prefer to use
gpg_command = gpg. Now, you have to set
gpg in your system path, but on the other hand, the configuration works on Linux and Mac without any change. Additionally you want to ensure that gpg uses the right cipher. I prefer to set the cipher explicitly:
--cipher-algo AES256. Although default with AES256, the option
--force-mdc forces the use of encryption with a modification detection code.
s3cmd works with
radosgw and encryption.
$> cat > test.txt samplecontent $> s3cmd put test.txt s3://YOURBUCKET test.txt -> s3://YOURBUCKET/test.txt [1 of 1] 4 of 4 100% in 0s 7.51 B/s done $> s3cmd get s3://YOURBUCKET/test.txt test2.txt s3://YOURBUCKET/test.txt -> test2.txt [1 of 1] 4 of 4 100% in 0s 5.74 B/s done $> s3cmd del s3://YOURBUCKET/test.txt File s3://YOURBUCKET/test.txt deleted $> cat test2.txt samplecontent