Why HTTP Streaming?
Posted by fxn April 18, 2011 @ 03:23 PM
Rails 3.1 is going to support HTTP streaming, also known as chunked responses. This post explains what it's all about.
What Is HTTP Streaming?
Ordinary dynamic HTTP responses need a Content-Length header. Their timeline looks like this:
HTTP request -> dynamic content generation -> HTTP response
Those are three serial steps because normally you need to generate the content in order to be able to know its size, and thus fill the Content-Length header of the response.
HTTP provides an alternative to this schema to be able to flush data as it is produced, known as chunked transfer encoding. That's what we are referring to as streaming in recent commits.
Streamed responses have no Content-Length header. Rather, they have a Transfer-Encoding header with a value of "chunked", and a body consisting of a series of chunks you write to the socket, each preceded by its individual size. Modulo details.
This is an example taken from Wikipedia:
HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked

25
This is the data in the first chunk
1C
and this is the second one
3
con
8
sequence
0
Point is, you are able to flush chunks to the socket as soon as you have them, no need to wait for the whole thing to be generated.
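To make the framing concrete, here is a minimal Python sketch that rebuilds the body of the Wikipedia example above. It is an illustration, not how Rails or any web server implements it: the function names are made up for this post, and chunk extensions and trailers are ignored. It assumes the first two chunks end with a CRLF that counts as part of their data, which is what makes the sizes come out as 25 and 1C.

```python
def encode_chunk(data: bytes) -> bytes:
    """Frame one chunk: size in hexadecimal, CRLF, the data, CRLF."""
    return f"{len(data):X}".encode("ascii") + b"\r\n" + data + b"\r\n"


def encode_chunked_body(chunks) -> bytes:
    """Frame every non-empty chunk, then append the zero-size terminator."""
    framed = b"".join(encode_chunk(c) for c in chunks if c)
    return framed + b"0\r\n\r\n"


def decode_chunked_body(body: bytes) -> bytes:
    """Reassemble the payload (hypothetical helper, no trailer support)."""
    payload, pos = b"", 0
    while True:
        eol = body.index(b"\r\n", pos)
        # The size line is base 16; drop any ";extension" that follows it.
        size = int(body[pos:eol].split(b";")[0], 16)
        if size == 0:
            return payload
        start = eol + 2
        payload += body[start:start + size]
        pos = start + size + 2  # skip the CRLF that closes the chunk


# Assumption: the first two chunks carry a trailing CRLF as data.
wikipedia_chunks = [
    b"This is the data in the first chunk\r\n",
    b"and this is the second one\r\n",
    b"con",
    b"sequence",
]
body = encode_chunked_body(wikipedia_chunks)
```

The first chunk has 35 characters plus the two bytes of its CRLF, 37 bytes in total, and 37 is 25 in base 16. A real server would write each framed chunk to the socket as soon as it is available rather than joining them first.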
When Do Web Browsers Fetch Assets?
Web browsers parse documents as their content is received. When they find a referenced asset, think an image, stylesheet, or script, they fire a request to fetch it. That happens in parallel while the document is being received and processed, no matter whether the content comes chunked or not.
Browsers limit the number of concurrent requests they are allowed to make: there is a global limit (typically 30 or more) and a per-domain limit (nowadays typically 4 or 6). Within those limits, requests for assets happen as the content is parsed.
Modern clients do not even block on JavaScript files as old ones did; they implement scanners that look ahead for asset nodes and request them. For example, this is the preload scanner of WebKit.
Trivia: While investigating this I discovered by accident that if the MIME type is unclear, for example "text/html" without an explicit charset, then web browsers buffer 1 KB of data without firing any asset requests, so that they can peek at the content and make an educated guess.
So What's The Benefit Of Streaming?
Streaming doesn't cut latency, nor does it cut the time a dynamic response needs to be generated. But since the application sends content right away instead of waiting for the whole response to be rendered, the client is able to request assets sooner. In particular, if you flush the head of an HTML document, CSS and JavaScript files are going to be fetched in parallel while the server works on generating the rest of the content. The consequence is that pages load faster.
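The effect can be sketched with plain Python generators. This is a toy model under stated assumptions, not Rails code: `render_page`, `buffered`, and `time_to_first_piece` are hypothetical names, the page content is made up, and a 50 ms sleep stands in for slow content generation.

```python
import time


def render_page(body_delay: float = 0.05):
    """Yield a made-up document in two pieces, head first."""
    # The head is cheap to produce, so it can be flushed right away.
    yield '<html><head><link rel="stylesheet" href="/app.css"></head>'
    # Stand-in for slow work such as database queries.
    time.sleep(body_delay)
    yield "<body><h1>Report</h1></body></html>"


def buffered(pieces):
    """Wait for every piece, then hand over the page in one go."""
    yield "".join(pieces)


def time_to_first_piece(pieces) -> float:
    """Seconds until the client could start parsing something."""
    start = time.monotonic()
    next(pieces)
    return time.monotonic() - start


streamed_wait = time_to_first_piece(render_page())
buffered_wait = time_to_first_piece(buffered(render_page()))
```

Both variants spend the same total time generating the page. What changes is when the first bytes become available: in the streamed variant the stylesheet reference is visible before the slow part starts, in the buffered one only after it ends.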
Followup
Streaming is still being polished for Rails 3.1. Expect another post in the future covering its practical aspects in Ruby on Rails applications.
Thanks
Tony Gentilcore provided his insider's guidance into this, thank you very much Tony! Also, thanks a lot to the Browserscope project for their really useful tables.

So would this basically have the same result as using different asset domains, but this time serving it all from a single domain?
As in, loading js files from js.domain.com, images from i.domain.com and the main page from www.domain.com?
@Jean, no. Using different hosts lets you bypass the per-domain limit on parallel asset downloads. Streaming the output lets you send the head (which could take, for example, 50ms to generate) to the client as soon as it is done instead of waiting for the rest of the page to be generated (which could be, for example, 500ms). The browser could then start downloading the assets half a second earlier than if the server buffered the whole page and sent it in 1 piece.
@Jean,
Not really, the different asset domains is a trick to circumvent hitting the limit on “simultaneous requests per domain” Xavier mentioned in the post.
By having your assets in different domains you load more than 4 assets at a time.
The idea behind streaming is also about having more parallel requests, in this case you will start requesting the assets before the webpage has finished loading.
Will this conflict with the If-None-Match handling that current rails branches use? If rails starts sending the response before it’s complete, presumably it won’t be able to perform the etag checksum and avoid sending a stale response.
It is good to enforce that by using this feature, Rails won’t be able to deal with errors the same way it works currently, since it has already sent some data before the error has happened…
I meant: “It is good to mention…”
Also note that this feature will work only with ruby 1.9 .
So how’s this going to work with layouts? Isn’t the block content rendered first, then the layout template? That means the only content you’ll be streaming that hasn’t been rendered yet will really be the bottom of the layout, right?
For smooth parallel JavaScript loading I recommend http://headjs.com/. I use it in production and I am very happy with that.
Indeed, streaming comes with its own restrictions. And it has a switch, you will be able to stream some actions and have regular responses for others.
It is implemented using fibers. Conceptually, content is generated in a more linear way, top-down rather than inside-out. That’s new, but it is necessary to be able to flush the head before you render the template. For example, template ivars for the layout are out (that doesn’t affect controller ivars).
Streaming is still being implemented. When it is more stable, the followup post I announced will cover these practical aspects.
Should “MIME time” be “MIME type” ?
@Guoliang thanks, fixed :).
thanks a lot
Is it me, or does this HTTP streaming work exactly against one of the things that made the Rails 2 release so great: the Rack interface?
Is my understanding correct that if you’re using chunked responses along with compression, you lose most of the benefit of chunked responses, since the web server will wait for the entire response in order to compress it? If that is indeed the case, won’t most people see very little benefit in their app? Or is it the case that the real benefit of chunked responses is in the delivery of larger binary-type responses?
@Jared my understanding is that web servers compress on the fly. All parts should play well together in the final solution and recommended server setups.
Should the size of the “This is the data” chunk be 35? And 1A for “and this is the second one”?
@r chunk sizes use base 16. The exact number, byte more, byte less, depends on the newlines.
From what I remember “Chunked” is the default mode of PHP, and it has worked ok there.
Nice work.
It is worth mentioning that the previously good practice of putting JS tags just before </body> will become a bad one.
We include JS just before </body> so that old browsers do not wait for a script to be downloaded before fetching other assets (they won’t download anything in parallel while they are downloading a .js file).
With this new approach, I guess we had better put them in <head>, because they will have a chance to be downloaded before the content starts being generated, so there will be less latency between the start of content generation and the domready event.