Monday, 20 August 2012
The Problem: JavaScript Concurrency
There are a number of bottlenecks preventing interesting applications from being ported (say, from server-heavy implementations) to client-side JavaScript. Some of these include browser compatibility, static typing, accessibility, and performance. Fortunately, the last of these is quickly becoming a thing of the past as browser vendors rapidly improve the speed of their JavaScript engines.
One thing that has remained a hindrance for JavaScript is the language itself. JavaScript is a single-threaded environment, meaning multiple scripts cannot run at the same time. As an example, imagine a site that needs to handle UI events, query and process large amounts of API data, and manipulate the DOM. Pretty common, right? Unfortunately, none of that can happen simultaneously due to limitations in browsers' JavaScript runtimes. Script execution happens within a single thread.
Developers mimic 'concurrency' by using techniques like setTimeout(), setInterval(), XMLHttpRequest, and event handlers. Yes, all of these features run asynchronously, but non-blocking doesn't necessarily mean concurrent. Asynchronous events are processed only after the currently executing script has yielded. The good news is that HTML5 gives us something better than these hacks!
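To make the difference between asynchronous and concurrent concrete, here is a minimal sketch. The 50 ms busy loop is just an assumed stand-in for any long-running task; the point is that a timer requested with a 0 ms delay still cannot fire until the running script finishes.

```javascript
// Records the order in which the two pieces of code actually run.
var order = [];

// Ask for the callback to run "immediately" (0 ms delay).
setTimeout(function () {
  order.push('timeout callback');
  console.log(order.join(' -> '));
}, 0);

// Simulate a long-running task: busy-wait for about 50 ms.
var start = Date.now();
while (Date.now() - start < 50) {
  // The single thread is occupied; no timer or event handler can run here.
}
order.push('long task finished');
```

The log reads 'long task finished -> timeout callback': the callback waited for the loop, even though its timer had expired long before.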
Introducing Web Workers: Bring Threading to JavaScript
The Web Workers specification defines an API for spawning background scripts in your web application. Web Workers allow you to do things like fire up long-running scripts to handle computationally intensive tasks without blocking the UI or the other scripts that handle user interactions. They're going to help put an end to that nasty 'unresponsive script' dialog that we've all come to love:
Figure: Common unresponsive script dialog.
Workers use thread-like message passing to achieve parallelism. They're perfect for keeping your UI fast and responsive for users.
Types of Web Workers
It's worth noting that the specification discusses two kinds of Web Workers: Dedicated Workers and Shared Workers. This article will only cover dedicated workers, and I'll refer to them as 'web workers' or 'workers' throughout.
Getting Started
Web Workers run in an isolated thread. As a result, the code that they execute needs to be contained in a separate file. But before we do that, the first thing to do is create a new Worker object in your main page. The constructor takes the name of the worker script:
var worker = new Worker('task.js');
If the specified file exists, the browser will spawn a new worker thread and download the file asynchronously. The worker will not begin until the file has been completely downloaded and executed. If the path to your worker returns a 404, the worker will fail silently.
After creating the worker, start it by calling the postMessage() method:
worker.postMessage(); // Start the worker.
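Because a bad script path can fail quietly, it may help to attach an error listener before sending the first message. The following is a minimal sketch, not a definitive pattern: startWorker is a hypothetical helper, and the constructor is passed in as a parameter only so the sketch can be exercised outside a browser. Whether a failed download surfaces as an error event depends on the browser.

```javascript
// Hypothetical helper: creates a worker, wires up error logging,
// and sends a first message.
// WorkerCtor is a parameter only so the sketch can run without a browser;
// in a page you would pass the global Worker constructor.
function startWorker(WorkerCtor, scriptUrl, firstMessage) {
  if (typeof WorkerCtor !== 'function') {
    return null; // Web Workers are not supported in this environment.
  }
  var worker = new WorkerCtor(scriptUrl);
  worker.addEventListener('error', function (e) {
    console.error('Worker error: ' + e.message);
  }, false);
  worker.postMessage(firstMessage);
  return worker;
}
```

In a page this would be called as startWorker(window.Worker, 'task.js', 'go'). A null return value signals that the browser has no Worker support, so the page can fall back to running the task inline.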
Communicating with a Worker via Message Passing
Communication between a worker and its parent page is done using an event model and the postMessage() method. Depending on your browser and version, postMessage() can accept either a string or a JSON object as its single argument. The latest versions of modern browsers support passing a JSON object.
Below is an example of using a string to pass 'Hello World' to a worker in doWork.js. The worker simply returns the message that is passed to it.
Main script:
var worker = new Worker('doWork.js');
worker.addEventListener('message', function(e) {
  console.log('Worker said: ', e.data);
}, false);
worker.postMessage('Hello World'); // Send data to our worker.
doWork.js (the worker):
self.addEventListener('message', function(e) {
  self.postMessage(e.data);
}, false);
When postMessage() is called from the main page, our worker handles that message by defining an onmessage handler for the message event. The message payload (in this case 'Hello World') is accessible in Event.data. Although this particular example isn't very exciting, it demonstrates that postMessage() is also your means for passing data back to the main thread. Convenient!
Messages passed between the main page and workers are copied, not shared. For instance, in the next example the 'msg' property of the JSON message is accessible in both locations. It appears that the object is being passed directly to the worker even though it's running in a separate, dedicated space. In reality, the object is serialized as it's handed to the worker and then deserialized on the other end. The page and worker do not share the same instance, so a duplicate is created on each pass. Most browsers implement this feature by automatically JSON encoding/decoding the value on either end.
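The effect of copying can be simulated without a worker at all. This is only a sketch: copyMessage is a hypothetical stand-in for the page/worker boundary, and it assumes the JSON encode/decode behaviour described above.

```javascript
// Stand-in for what happens when a message crosses the page/worker boundary,
// assuming the value is JSON-encoded on one side and decoded on the other.
function copyMessage(message) {
  return JSON.parse(JSON.stringify(message));
}

var original = { cmd: 'start', msg: 'Hi' };
var received = copyMessage(original);

// The receiver changes its copy...
received.msg = 'changed by worker';

// ...but the sender's object is untouched.
console.log(original.msg);          // 'Hi'
console.log(received === original); // false
```

The practical consequence is that a worker can never mutate the page's data in place; any result has to be sent back with another postMessage() call.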
The following is a more complex example that passes messages using JSON objects.
Main script:
doWork2.js:
self.addEventListener('message', function(e) {
  var data = e.data;
  switch (data.cmd) {
    case 'start':
      self.postMessage('WORKER STARTED: ' + data.msg);
      break;
    case 'stop':
      self.postMessage('WORKER STOPPED: ' + data.msg + '. (buttons will no longer work)');
      self.close(); // Terminates the worker.
      break;
    default:
      self.postMessage('Unknown command: ' + data.msg);
  }
}, false);
Note: There are two ways to stop a worker: by calling worker.terminate() from the main page or by calling self.close() inside of the worker itself.
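For illustration, a page-side counterpart to doWork2.js might look like the sketch below. It is an assumption, not the article's own main script: sendCommand is a hypothetical helper, and the 'Hi' and 'Bye' strings are made up.

```javascript
// Hypothetical helper: wraps a command and its text in the
// { cmd, msg } object that doWork2.js expects.
function sendCommand(worker, cmd, msg) {
  worker.postMessage({ cmd: cmd, msg: msg });
}

// Only runs in a browser page, where window and Worker both exist.
if (typeof window !== 'undefined' && typeof Worker === 'function') {
  var worker = new Worker('doWork2.js');
  worker.addEventListener('message', function (e) {
    console.log(e.data);
  }, false);

  sendCommand(worker, 'start', 'Hi'); // Worker replies 'WORKER STARTED: Hi'
  sendCommand(worker, 'stop', 'Bye'); // Worker replies, then calls self.close()
}
```

Once the 'stop' command has been handled, further messages go unanswered because the worker has closed itself.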
The Worker Environment
Worker Scope
In the context of a worker, both self and this reference the global scope for the worker. Thus, the previous example could also be written as:
addEventListener('message', function(e) {
  var data = e.data;
  switch (data.cmd) {
    case 'start':
      postMessage('WORKER STARTED: ' + data.msg);
      break;
    case 'stop':
    ...
}, false);
Alternatively, you could set the onmessage event handler directly (though addEventListener is always encouraged by JavaScript ninjas).
onmessage = function(e) {
  var data = e.data;
  ...
};
Features Available to Workers
Due to their multi-threaded behavior, web workers only have access to a subset of JavaScript's features:
The navigator object
The location object (read-only)
XMLHttpRequest
setTimeout()/clearTimeout() and setInterval()/clearInterval()
The Application Cache
Importing external scripts using the importScripts() method
Spawning other web workers
Workers do NOT have access to:
The DOM (it's not thread-safe)
The window object
The document object
The parent object
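Given these restrictions, a common division of labour is for the worker to compute and for the page to render. The sketch below assumes a hypothetical worker file (call it sum.js) and a message shape of { from, to }; neither comes from the specification.

```javascript
// Worker side (hypothetical file sum.js): pure computation, no DOM access.
function sumRange(from, to) {
  var total = 0;
  for (var i = from; i <= to; i++) {
    total += i;
  }
  return total;
}

// importScripts only exists inside a worker, so it doubles as a scope check.
if (typeof importScripts === 'function') {
  self.addEventListener('message', function (e) {
    // e.data is assumed to look like { from: 1, to: 100 }.
    self.postMessage(sumRange(e.data.from, e.data.to));
  }, false);
}
```

The page would listen for the worker's message event and write e.data into the document itself, since only the page can touch the DOM.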
Loading External Scripts
You can load external script files or libraries into a worker with the importScripts() function. The method takes zero or more strings representing the filenames for the resources to import.
This example loads script1.js and script2.js into the worker:
worker.js:
importScripts('script1.js');
importScripts('script2.js');
Which can also be written as a single import statement:
importScripts('script1.js', 'script2.js');
Subworkers
Workers have the ability to spawn child workers. This is great for further breaking up large tasks at runtime. However, subworkers come with a few caveats:
Subworkers must be hosted within the same origin as the parent page.
URIs within subworkers are resolved relative to their parent worker's location (as opposed to the main page).
Keep in mind most browsers spawn separate processes for each worker. Before you go spawning a worker farm, be cautious about hogging too many of the user's system resources. One reason for this is that messages passed between main pages and workers are copied, not shared. See Communicating with a Worker via Message Passing.
For a sample of how to spawn a subworker, see the example in the specification.
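As a rough sketch of the idea (not the specification's example), a parent worker could split a range of work into chunks and hand each chunk to its own subworker. The splitRange helper, the 'subworker.js' filename, and the { from, to } message shape are all hypothetical.

```javascript
// Hypothetical helper: splits the inclusive range [from, to]
// into at most `parts` chunks of roughly equal size.
function splitRange(from, to, parts) {
  var size = Math.ceil((to - from + 1) / parts);
  var chunks = [];
  for (var start = from; start <= to; start += size) {
    chunks.push({ from: start, to: Math.min(start + size - 1, to) });
  }
  return chunks;
}

// Inside a parent worker, each chunk could be handed to its own subworker.
if (typeof importScripts === 'function' && typeof Worker === 'function') {
  splitRange(1, 1000, 4).forEach(function (chunk) {
    // 'subworker.js' is resolved relative to this worker's location.
    var sub = new Worker('subworker.js');
    sub.addEventListener('message', function (e) {
      self.postMessage(e.data); // Forward each partial result to the page.
    }, false);
    sub.postMessage(chunk);
  });
}
```

Keeping the number of chunks small matters here, since every subworker costs the user additional system resources.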
Inline Workers
What if you want to create your worker script on the fly, or create a self-contained page without having to create separate worker files? With the new BlobBuilder interface, you can "inline" your worker in the same HTML file as your main logic by creating a BlobBuilder and appending the worker code as a string:
// Prefixed in Webkit, Chrome 12, and FF6: window.WebKitBlobBuilder, window.MozBlobBuilder
var bb = new BlobBuilder();
bb.append("onmessage = function(e) { postMessage('msg from worker'); }");
// Obtain a blob URL reference to our worker 'file'.
// Note: window.webkitURL.createObjectURL() in Chrome 10+.
var blobURL = window.URL.createObjectURL(bb.getBlob());
var worker = new Worker(blobURL);
worker.onmessage = function(e) {
  // e.data == 'msg from worker'
};
worker.postMessage(); // Start the worker.
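BlobBuilder has since been removed from browsers in favour of the Blob constructor. A minimal sketch of the same inline-worker idea with that constructor follows; the workerSource variable name and the 'start' message are assumptions made for illustration.

```javascript
var workerSource = "onmessage = function(e) { postMessage('msg from worker'); }";

// The Blob constructor takes an array of parts plus an optional MIME type.
var blob = new Blob([workerSource], { type: 'text/javascript' });

// As before, a blob URL stands in for the worker's 'file'.
var blobURL = URL.createObjectURL(blob);

// Only runs in a browser page, where window and Worker both exist.
if (typeof window !== 'undefined' && typeof Worker === 'function') {
  var worker = new Worker(blobURL);
  worker.onmessage = function (e) {
    // e.data == 'msg from worker'
  };
  worker.postMessage('start');
}
```

The rest of the technique is unchanged: the blob URL is passed to the Worker constructor exactly as a normal script path would be.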
Blob URLs
The magic comes with the call to window.URL.createObjectURL(). This method creates a simple URL string which can be used to reference data stored in a DOM File or Blob object. For example:
blob:http://localhost/c745ef73-ece9-46da-8f66-ebes574789b1
Blob URLs are unique and last for the lifetime of your application (e.g. until the document is unloaded). If you're creating many Blob URLs, it's a good idea to release references that are no longer needed. You can explicitly release a Blob URL by passing it to window.URL.revokeObjectURL():
window.URL.revokeObjectURL(blobURL); // window.webkitURL.revokeObjectURL() in Chrome 10+.
In Chrome, there's a nice page to view all of the created blob URLs: chrome://blob-internals/.
Full Example
Taking this one step further, we can get clever with how the worker's JS code is inlined in our page. This technique uses a