Controlling downloads with Net::HTTP::Stats

Downloading a file is a source of problems: the host can be down, the bandwidth too low, the file too big, the download too slow… This matters for a program that downloads many files. Furthermore, it’s often useful to tell users how much of a file has been downloaded and when the download will finish.

That’s why I wrote a small module named Net::HTTP::Stats. It counts the number of bytes read and keeps a few values up to date: the transfer rate, the estimated time left, the percentage of bytes read and so on. I used it in a web bot that downloads a great many web pages, and it works really well.
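
To make the arithmetic behind such values concrete, here is a minimal sketch. It is not the module's actual code: the DownloadStats class, its method names and the injectable clock are all assumptions made for illustration, and the rate is assumed to be a simple average since the start of the download.

```ruby
# Hypothetical helper for illustration only; NOT the Net::HTTP::Stats implementation.
class DownloadStats
  attr_reader :bytes_read

  # total_bytes: expected body size (e.g. from Content-Length), or nil if unknown.
  # clock: callable returning the current time in seconds (injectable for testing).
  def initialize(total_bytes, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @total_bytes = total_bytes
    @clock = clock
    @started_at = clock.call
    @bytes_read = 0
  end

  # Record one chunk of body data.
  def add(chunk)
    @bytes_read += chunk.bytesize
  end

  # Seconds spent since the stats object was created.
  def elapsed
    @clock.call - @started_at
  end

  # Average rate in bytes per second since the start.
  def rate
    elapsed.zero? ? 0.0 : @bytes_read / elapsed.to_f
  end

  # Percentage of the body already read, or nil when the total size is unknown.
  def percent
    return nil unless @total_bytes && @total_bytes > 0
    (100.0 * @bytes_read / @total_bytes).round
  end

  # Estimated seconds remaining, or nil when it cannot be estimated yet.
  def left_time
    return nil unless @total_bytes && rate > 0
    (@total_bytes - @bytes_read) / rate
  end
end

# Usage with a fake clock: 250 of 1000 bytes read after 5 seconds.
now = 0.0
stats = DownloadStats.new(1000, clock: -> { now })
stats.add('x' * 250)
now = 5.0
puts "#{stats.percent}% downloaded at #{stats.rate.to_i} B/s, remaining #{stats.left_time.to_i} seconds"
```

Passing the clock in as a lambda keeps the sketch testable without a real download; a real implementation may well smooth the rate over a recent window instead of averaging from the start.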

The following code downloads Ruby and prints the download percentage, the rate and the estimated remaining time.

require 'net/http'
require 'net/http/stats'
content = ''
uri = URI.parse('http://ftp.ruby-lang.org/pub/ruby/1.8/ruby-1.8.6.tar.gz')
response = Net::HTTP.get_response_with_stats(uri) do |resp, bytes|
  content << bytes
  puts "#{resp.bytes_percent}% downloaded at #{(resp.bytes_rate / 1024).to_i} KB/s, remaining #{resp.left_time.to_i} seconds"
end

This script will print something like:

13% downloaded at 59 KB/s, remaining 65 seconds

The Net::HTTP::Stats module adds the get_response_with_stats method. It works like get_response, except that the block receives a second argument containing the bytes just read. These bytes can’t be fetched through response.read_body, because the body has already been read to compute the stats.

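With these stats it’s easy to write rules that interrupt a download, for example when it takes too long or the file is too big. That’s why I wrote a second method called get_response_with_rules. Rules are stored in a hash. As with get_response_with_stats, the block takes the same two arguments.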

rules = {
  # Interrupt the download if the elapsed time exceeds this value (in seconds).
  :max_time => 5 * 60,
  # Interrupt the download if the estimated remaining time exceeds this value (in seconds).
  :max_left_time => 5 * 60,
  # Interrupt the download if the body is larger than this value (in bytes).
  :max_size => 50 * 1024 * 1024,
  # Wait this long before checking max_time and max_left_time (in seconds).
  # Something between 5 and 20 seconds should work well.
  :min_time => 15
}
content = ''
uri = URI.parse('http://ftp.ruby-lang.org/pub/ruby/1.8/ruby-1.8.6.tar.gz')
response = Net::HTTP.get_response_with_rules(uri, rules) do |resp, bytes|
  content << bytes
end
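
For readers wondering how such rules might be evaluated, here is a rough sketch written only from the comments in the hash above. The violated_rule function is hypothetical and is not part of Net::HTTP::Stats. It assumes that :max_size is compared with the bytes read so far, and that :min_time delays only the two time-based checks, presumably because early estimates are unreliable.

```ruby
# Hypothetical check for illustration; NOT the module's implementation.
# Returns the name of the first violated rule, or nil when the download may continue.
def violated_rule(rules, elapsed:, left_time:, bytes_read:)
  # Assumption: the size limit applies from the first byte.
  return :max_size if rules[:max_size] && bytes_read > rules[:max_size]

  # Time-based rules are skipped until :min_time seconds have passed.
  return nil if elapsed < rules.fetch(:min_time, 0)

  return :max_time if rules[:max_time] && elapsed > rules[:max_time]
  return :max_left_time if rules[:max_left_time] && left_time && left_time > rules[:max_left_time]

  nil
end

rules = {
  :max_time => 5 * 60,
  :max_left_time => 5 * 60,
  :max_size => 50 * 1024 * 1024,
  :min_time => 15
}

# Too early to judge the estimate: the download continues.
p violated_rule(rules, elapsed: 5, left_time: 900, bytes_read: 1024)
# After :min_time, a 900-second estimate exceeds :max_left_time.
p violated_rule(rules, elapsed: 20, left_time: 900, bytes_read: 1024)
# 60 MB already read exceeds the 50 MB :max_size.
p violated_rule(rules, elapsed: 20, left_time: 60, bytes_read: 60 * 1024 * 1024)
```

Returning the rule name rather than a boolean makes it possible to report why a download was stopped. How the real module signals an interruption is not covered here.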

To use Net::HTTP::Stats, copy the “lib” folder into your project or into any location listed in your $LOAD_PATH. Any feedback is welcome.
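
If the folder is not already on the load path, one common way to add it looks like this. The location is an assumption: the sketch supposes that “lib” was copied next to the script being run.

```ruby
# Assumption: the module's "lib" folder sits next to this script.
lib_dir = File.expand_path('lib', __dir__)

# Put it at the front of the load path, unless it is already listed.
$LOAD_PATH.unshift(lib_dir) unless $LOAD_PATH.include?(lib_dir)

# From here on, the requires used in the examples above can be resolved:
# require 'net/http'
# require 'net/http/stats'
```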

Download Net::HTTP::Stats.

