<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Till Death Do Us Part: Tomcat, Weak ETags, and JavaScript/CSS Caching</title>
    <link>http://blog.bcarlso.net/articles/2007/10/19/tomcat-weak-etags-and-javascript-css-caching</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Thoughts around my marriage to technology</description>
    <item>
      <title>Tomcat, Weak ETags, and JavaScript/CSS Caching</title>
      <description>Being known as the HTTP/REST guy in the company, I was pulled into an interesting conversation about CSS/JavaScript caching issues this week and was fortunate enough to learn a couple of things on the way. It seems to be a common (anti)pattern in the Java world to consistently fight JavaScript browser caching issues by simply adding a query parameter to your script tags:

&lt;pre&gt;
&amp;lt;script type="text/javascript"
        src="myscript.js?version=&amp;lt;%= Application.VERSION %&amp;gt;"&amp;gt;
&amp;lt;/script&amp;gt;
&lt;/pre&gt;

I've seen (and used) a number of variations on the same theme, including going so far as to create a JSP custom tag for more advanced schemes.

The obvious solution is to use the &lt;i&gt;Last-Modified&lt;/i&gt; header with the timestamp of the file. According to the &lt;a href="http://www.ietf.org/rfc/rfc2616.txt"&gt;spec&lt;/a&gt;, the client can utilize this value to create a conditional GET request by adding the &lt;i&gt;If-Modified-Since&lt;/i&gt; header. If we look at all of the major browsers, they dutifully follow this pattern by sending the If-Modified-Since header the next time the resource is requested.

The "workflow" goes something like this:

&lt;pre&gt;
GET /some-resource.html

HTTP/1.1 200 OK
Last-Modified: Wed, 26 Sep 2007 04:58:08 GMT

&amp;lt;html&amp;gt;
   &amp;lt;head&amp;gt;&amp;lt;title&amp;gt;Some Resource&amp;lt;/title&amp;gt;&amp;lt;/head&amp;gt;
   &amp;lt;body&amp;gt;&amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;
&lt;/pre&gt;

The next time the browser asks for the file and the file remains unchanged:

&lt;pre&gt;
GET /some-resource.html
If-Modified-Since: Wed, 26 Sep 2007 04:58:08 GMT

HTTP/1.1 304 Not Modified
Last-Modified: Wed, 26 Sep 2007 04:58:08 GMT
&lt;/pre&gt;

Note the use of the 304 status code and no message body. This is an indication to the client that it is free to use the cached version of the resource.

What if the file has changed? This is as simple as returning the content with an updated Last-Modified date as seen below:

&lt;pre&gt;
GET /some-resource.html
If-Modified-Since: Wed, 26 Sep 2007 04:58:08 GMT

HTTP/1.1 200 OK
Last-Modified: Thu, 27 Sep 2007 05:00:00 GMT

&amp;lt;html&amp;gt;
   &amp;lt;head&amp;gt;&amp;lt;title&amp;gt;Some Updated Resource&amp;lt;/title&amp;gt;&amp;lt;/head&amp;gt;
   &amp;lt;body&amp;gt;&amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;
&lt;/pre&gt;

Now that the server has returned the updated resource, the client should update its cache with the latest version and Last-Modified information.
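
To make the server's side of that exchange concrete, here is a minimal Java sketch of the decision. The class and method names are made up for illustration (they are not part of Tomcat or the Servlet API), and it only looks at the If-Modified-Since header.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration; not part of Tomcat or the Servlet API.
final class ConditionalGet {

    // Returns 304 when the resource has not changed since the client's copy, otherwise 200.
    static int statusFor(String ifModifiedSince, Instant lastModified) {
        if (ifModifiedSince == null) {
            return 200; // first request: no validator sent, so return the full body
        }
        Instant clientCopy = ZonedDateTime
                .parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME)
                .toInstant();
        // HTTP dates have one-second resolution, so drop any sub-second part
        Instant serverCopy = Instant.ofEpochSecond(lastModified.getEpochSecond());
        return serverCopy.isAfter(clientCopy) ? 200 : 304;
    }

    public static void main(String[] args) {
        String header = "Wed, 26 Sep 2007 04:58:08 GMT";
        Instant unchanged = Instant.parse("2007-09-26T04:58:08Z");
        Instant updated = Instant.parse("2007-09-27T05:00:00Z");
        System.out.println(statusFor(null, unchanged));   // no header sent
        System.out.println(statusFor(header, unchanged)); // file untouched
        System.out.println(statusFor(header, updated));   // file changed
    }
}
```

Fed the dates from the examples above, the three calls yield 200, 304, and 200 respectively.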

Easy, huh? Well, you'd think so... This works fine in most browsers, but unfortunately it doesn't work quite as you would expect in IE. Using &lt;a href="http://www.fiddlertool.com/fiddler/"&gt;Fiddler&lt;/a&gt; you can track what's actually going on and see that IE ignores the 200 + content returned via the conditional GET and takes the version of the resource from cache anyway!

This behavior in IE is, in my experience, the cause of many of our caching woes. Fortunately, there is a lesser-known cousin to Last-Modified that IE supports pretty well. It's an HTTP header known as &lt;i&gt;ETag&lt;/i&gt; (entity tag).

ETags are also used to identify whether a resource has changed, and can be created a number of ways, including taking a hash of the response body or serializing the Last-Modified timestamp.
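
As a rough illustration of the hash-of-the-body option, the sketch below builds a quoted tag from a SHA-256 digest of the response bytes. The class and method names are hypothetical, and the choice of SHA-256 is my assumption; any digest would serve the same purpose.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; SHA-256 is an assumed choice of digest.
final class DigestETag {

    // Returns the hex digest of the body wrapped in double quotes, as an ETag value.
    static String strongETag(byte[] body) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : sha.digest(body)) {
                hex.append(String.format("%02x", b)); // two hex digits per byte
            }
            return "\"" + hex + "\"";
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] original = "Some Resource".getBytes(StandardCharsets.UTF_8);
        byte[] updated = "Some Updated Resource".getBytes(StandardCharsets.UTF_8);
        System.out.println("ETag: " + strongETag(original));
        System.out.println("ETag: " + strongETag(updated));
    }
}
```

The trade-off is that the server has to read the whole body to compute the tag, whereas a timestamp-based tag only needs file metadata.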

The same workflow is used for ETag processing, but with a couple of different headers:

&lt;pre&gt;
GET /some-resource.html

HTTP/1.1 200 OK
ETag: "1234567890"

&amp;lt;html&amp;gt;
   &amp;lt;head&amp;gt;&amp;lt;title&amp;gt;Some Resource&amp;lt;/title&amp;gt;&amp;lt;/head&amp;gt;
   &amp;lt;body&amp;gt;&amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;
&lt;/pre&gt;

&lt;pre&gt;
GET /some-resource.html
If-None-Match: "1234567890"

HTTP/1.1 304 Not Modified
ETag: "1234567890"
&lt;/pre&gt;

&lt;pre&gt;
GET /some-resource.html
If-None-Match: "1234567890"

HTTP/1.1 200 OK
ETag: "0987654321"

&amp;lt;html&amp;gt;
   &amp;lt;head&amp;gt;&amp;lt;title&amp;gt;Some Updated Resource&amp;lt;/title&amp;gt;&amp;lt;/head&amp;gt;
   &amp;lt;body&amp;gt;&amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;
&lt;/pre&gt;
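
A server-side check for these exchanges might look something like the following Java sketch. The names are invented for illustration, it ignores the W/ prefix when comparing (the weak comparison that the spec allows for GET), and splitting the header on commas is a simplification that would break on tags containing commas.

```java
// Hypothetical helper for illustration; not part of Tomcat or the Servlet API.
final class IfNoneMatch {

    // Strips the optional W/ prefix so tags compare by their quoted opaque value.
    static String opaque(String tag) {
        String t = tag.trim();
        return t.startsWith("W/") ? t.substring(2) : t;
    }

    // Returns 304 when any tag sent by the client matches the current one, otherwise 200.
    static int statusFor(String ifNoneMatch, String currentETag) {
        if (ifNoneMatch == null) {
            return 200; // first request: nothing to compare against
        }
        // Simplification: assumes no commas inside the quoted tags
        for (String candidate : ifNoneMatch.split(",")) {
            if (candidate.trim().equals("*") || opaque(candidate).equals(opaque(currentETag))) {
                return 304;
            }
        }
        return 200;
    }

    public static void main(String[] args) {
        System.out.println(statusFor(null, "\"1234567890\""));
        System.out.println(statusFor("\"1234567890\"", "\"1234567890\""));
        System.out.println(statusFor("\"1234567890\"", "\"0987654321\""));
    }
}
```

With the ETag values from the examples, the three calls yield 200, 304, and 200 respectively.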

Notice, same workflow, different headers. The difference in this case is that IE handles the 200 as expected, replacing the cached version with the new content and updating the ETag metadata in the cache for this resource. So to properly handle caching in IE all we have to do is set the ETag for JavaScript files. But how do we do that...

Well, the title of the post mentioned Tomcat, and this is where we actually talk about it. As it turns out, there's a "dark side" to ETag processing: something called a &lt;i&gt;Weak ETag&lt;/i&gt;. Weak ETags are prefixed with "W/" and, using our example above, would look like this:

&lt;pre&gt;
ETag: W/"1234567890"
&lt;/pre&gt;

The notion of a "Weak" ETag, as stated in the &lt;a href="http://www.ietf.org/rfc/rfc2616.txt"&gt;spec&lt;/a&gt;, is:

&lt;blockquote&gt;
a weak value changes whenever the meaning of an entity changes
&lt;/blockquote&gt;

Here's how I interpret it: let's say that you're downloading a Java source file via HTTP. You could take a hash of the program, excluding comments and whitespace, and return this as a Weak ETag. Subsequent updates to the comments or formatting of the document would not change the actual "meaning" of the returned result. If the code itself changed, however, a new Weak ETag would be generated and returned. Weak ETags are not, as far as I can tell, very well supported by browsers.
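
A crude Java sketch of that idea follows. Everything in it is an assumption for illustration: the names are invented, "meaning" is approximated by throwing away all whitespace (comment stripping is left out), and String.hashCode() stands in for a real digest.

```java
// Hypothetical helper for illustration only.
final class WeakETag {

    // Builds a weak tag from the source text with all whitespace removed.
    static String weakETag(String source) {
        // Assumption: formatting-only edits are the only "meaningless" changes we ignore
        String normalized = source.replaceAll("\\s+", "");
        // String.hashCode() is a stand-in; a real implementation would use a proper digest
        return "W/\"" + Integer.toHexString(normalized.hashCode()) + "\"";
    }

    public static void main(String[] args) {
        String compact = "int x=1;";
        String reformatted = "int x = 1;\n";
        String changed = "int x = 2;";
        System.out.println(weakETag(compact).equals(weakETag(reformatted))); // same meaning
        System.out.println(weakETag(compact).equals(weakETag(changed)));     // code changed
    }
}
```

Reformatting leaves the tag alone, while changing the code produces a new one, which is the behavior the spec's definition describes.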

The problem is that Tomcat shows loyalties to the dark side when it comes to static content. Tomcat's FileDirContext class does not populate the ETag for static content, leaving the decision about an ETag to DefaultServlet. DefaultServlet simply generates a Weak ETag (by concatenating the content length and the last modified time in milliseconds), sending it back to the browser to basically be ignored.

In my quest to figure out how to prevent these cache problems I turned to Google and the Tomcat source for help. I was hoping to find a configuration setting to prevent the Weak ETag behavior I was seeing for static content but turned up nothing. Instead I found a little gem hiding in the &lt;i&gt;context.xml&lt;/i&gt; configuration file.

&lt;b&gt;The Resources Element&lt;/b&gt;

As it turns out, you can configure your own context for serving static content using the &lt;i&gt;Resources&lt;/i&gt; element in context.xml. It looks like this:

&lt;pre&gt;
&amp;lt;Context&amp;gt;
     &amp;lt;Resources className="org.example.StrongETagDirContext" /&amp;gt;
     ...
&amp;lt;/Context&amp;gt;
&lt;/pre&gt;

I extended the FileDirContext class and overrode the &lt;i&gt;getAttributes()&lt;/i&gt; method:

&lt;pre&gt;
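public Attributes getAttributes() {
   ResourceAttributes r = (ResourceAttributes) super.getAttributes();

   // Same ingredients as Tomcat's default tag: content length and last-modified time
   long cl = r.getContentLength();
   long lmt = r.getLastModifiedTime();

   // Quoted, with no W/ prefix, so clients treat it as a strong ETag
   String strongETag = String.format("\"%s-%s\"", cl, lmt);
   r.setETag( strongETag );
   return r;
}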
&lt;/pre&gt;

This associates a strong ETag (instead of Tomcat's default Weak ETag) with each static resource served up. Now we have a conditional GET request that behaves well in all browsers, and we can get rid of those hacks we've been using forever. I'm not sure Tomcat is doing the right thing by using Weak ETags by default, and I'll probably post some of these comments to the Tomcat mailing list for consideration, but for now I've got caching behaving as I would expect.

</description>
      <pubDate>Fri, 19 Oct 2007 22:49:00 -0700</pubDate>
      <guid isPermaLink="false">urn:uuid:81add26d-9c8d-4073-bc9f-978bea586fc2</guid>
      <author>Brandon Carlson</author>
      <link>http://blog.bcarlso.net/articles/2007/10/19/tomcat-weak-etags-and-javascript-css-caching</link>
      <category>tomcat</category>
      <category>caching</category>
    </item>
    <item>
      <title>"Tomcat, Weak ETags, and JavaScript/CSS Caching" by Brandon</title>
      <description>@ptys Good point.

From the same spec: "Entity tags are used for comparing two or more entities from the same requested resource".
&lt;blockquote&gt;
It's very likely for two entirely different files
to be of same size and modified time.
&lt;/blockquote&gt;

True. But two entirely different files to be of the same size and modified time for a given resource? I think that's a bit less likely. I chose to usurp TC's ETag implementation as I deemed it "Good Enough".
&lt;br /&gt;

Thanks for the clarification.
</description>
      <pubDate>Mon, 05 Jan 2009 16:46:14 -0700</pubDate>
      <guid isPermaLink="false">urn:uuid:c9333b89-ba3b-45c2-9a90-f0ccdba7f7e9</guid>
      <link>http://blog.bcarlso.net/articles/2007/10/19/tomcat-weak-etags-and-javascript-css-caching#comment-20</link>
    </item>
    <item>
      <title>"Tomcat, Weak ETags, and JavaScript/CSS Caching" by ptys</title>
      <description>...A "strong entity tag" MAY be shared by two entities of a resource only if they are equivalent by octet equality....

The way the etag is calculated in your example represents a weak etag, and what tomcat does by default. It's very likely for two entirely different files to be of same size and modified time.

You need to use some digest algorithm to claim a strong etag.</description>
      <pubDate>Mon, 05 Jan 2009 14:24:22 -0700</pubDate>
      <guid isPermaLink="false">urn:uuid:9750c9b6-6685-42d3-8d96-dabf7b6244fa</guid>
      <link>http://blog.bcarlso.net/articles/2007/10/19/tomcat-weak-etags-and-javascript-css-caching#comment-19</link>
    </item>
    <item>
      <title>"Tomcat, Weak ETags, and JavaScript/CSS Caching" by mmo</title>
      <description>Great tips. Just started looking into Rails and it has many great applications</description>
      <pubDate>Thu, 18 Sep 2008 20:36:20 -0700</pubDate>
      <guid isPermaLink="false">urn:uuid:645fec99-c642-41d6-b2f8-f7bd2501b015</guid>
      <link>http://blog.bcarlso.net/articles/2007/10/19/tomcat-weak-etags-and-javascript-css-caching#comment-18</link>
    </item>
    <item>
      <title>"Tomcat, Weak ETags, and JavaScript/CSS Caching" by Brandon</title>
      <description>Patrik,

We were running TC 5.x.</description>
      <pubDate>Sun, 15 Jun 2008 08:49:27 -0700</pubDate>
      <guid isPermaLink="false">urn:uuid:1586436d-0dec-40d0-bcd9-013a125ae3ce</guid>
      <link>http://blog.bcarlso.net/articles/2007/10/19/tomcat-weak-etags-and-javascript-css-caching#comment-13</link>
    </item>
    <item>
      <title>"Tomcat, Weak ETags, and JavaScript/CSS Caching" by Patrik</title>
      <description>Interesting post! Which version of Tomcat are you running on? - I tried the code myself but there is no method called getAttributes() . At least none that doesn't take any arguments.</description>
      <pubDate>Wed, 11 Jun 2008 08:36:54 -0700</pubDate>
      <guid isPermaLink="false">urn:uuid:1d4a9ab7-804b-44c7-97f9-8a683ca1421b</guid>
      <link>http://blog.bcarlso.net/articles/2007/10/19/tomcat-weak-etags-and-javascript-css-caching#comment-12</link>
    </item>
  </channel>
</rss>
