
API-Driven Development

When many people hear “API,” they think of the interfaces that web services such as Twitter and Facebook provide to third-party developers.

What is an API? API stands for Application Programming Interface. APIs serve as interfaces between different software programs and facilitate their interaction, similar to the way user interfaces facilitate interaction between humans and computers. The key thing to understand is that APIs don’t have to be an interface between your application and the outside world. An API can be incredibly useful for applications running under the same roof.

At Data Realization, we strongly encourage building simple, flexible APIs around data. What does this mean? Consider charts and graphs. You can grab a free charting widget online, and there are typically two ways of delivering data to it: embedding the data in the HTML where the widget is instantiated, or serving it from a specific URL. Either way, the chart typically grabs the data from wherever you point it, displays it, and sometimes lets you interact with it (e.g., mousing over a point to show its value). That can be sufficient if all you need to do with your data is plot it using a free chart widget, but not for more advanced analytics.

With advanced analytics and data visualization, you’re creating an application that runs on a client computer (often in a browser) and has a conversation with a server. The conversation takes place via an API: the client tells the server what data it wants, and the server responds with that data. The server becomes a simple data provider, focusing on security/authentication and scaling/caching while ignoring presentation.
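A minimal sketch of the client's side of that conversation, assuming a hypothetical endpoint `https://example.com/api/data` and made-up parameter names (`metric`, `start`, `end`, `resolution`). The point is that the request itself spells out exactly which data the client wants, so the server only has to answer it.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real URL and parameter names depend on your API.
API_BASE = "https://example.com/api/data"


def build_request_url(metric: str, start: int, end: int, resolution: int) -> str:
    """Describe, in the query string, exactly which slice of data the client wants."""
    params = {
        "metric": metric,          # which series to fetch
        "start": start,            # Unix timestamp in seconds, inclusive
        "end": end,                # Unix timestamp in seconds, exclusive
        "resolution": resolution,  # desired bucket size in seconds
    }
    return f"{API_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    # "I want a year's worth of data at one-week resolution."
    print(build_request_url("visits", 1262304000, 1293840000, 604800))
```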

Let’s say you have an interactive chart that lets you view a year of data or zoom in to view an hour of data. You don’t want to load a year’s worth of data at one-minute resolution. Instead, the front-end application can say to the server, “I want a year’s worth of data at one-week resolution.” If the user zooms in, the application makes a new request: “I want a month’s worth of data at one-day resolution.” On the back end, preparing for this kind of use case is trivial: create roll-up tables in the database that store the data at several resolutions. The API then takes the requested resolution and finds the most sensible table to pull from.
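One way to sketch both halves of that back end, assuming hypothetical table names and bucket sizes expressed in seconds. `rollup` averages raw `(timestamp, value)` samples into fixed-size buckets. Averaging is only one choice; sums, minimums, or maximums may suit other metrics better. `pick_table` chooses the coarsest stored resolution that is no coarser than the one requested.

```python
from collections import defaultdict

# Hypothetical roll-up tables, keyed by bucket size in seconds.
ROLLUP_TABLES = {
    60: "data_1min",
    3600: "data_1hour",
    86400: "data_1day",
    604800: "data_1week",
}


def rollup(points, bucket_seconds):
    """Aggregate (timestamp, value) pairs into fixed-size buckets by averaging.

    Returns a list of (bucket_start, mean_value) sorted by bucket_start.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in points:
        bucket = ts - ts % bucket_seconds  # align the timestamp to its bucket start
        sums[bucket] += value
        counts[bucket] += 1
    return [(b, sums[b] / counts[b]) for b in sorted(sums)]


def pick_table(requested_seconds):
    """Pick the coarsest roll-up that is not coarser than the requested resolution.

    Falls back to the finest table when the request is finer than anything stored.
    """
    candidates = [r for r in ROLLUP_TABLES if r <= requested_seconds]
    best = max(candidates) if candidates else min(ROLLUP_TABLES)
    return ROLLUP_TABLES[best]
```

With this setup, a year requested at one-week resolution comes back as roughly 52 points instead of 525,600 one-minute samples. A request for two-hour buckets lands on the hourly table, which the API could then aggregate further.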

For the end user, the result is an application in which they can get their hands on the data and zoom in, with animations that give the data additional context.

APIs for data are easy (in most cases).

Database records are easily serialized (JSON, XML, etc.). Most data is immutable, so it is easy to cache and pre-cache. The conversational approach is easier than it sounds: request parameters are validated and translated directly into database queries. Providing data through an API also encourages modularity and separates data processing from presentation.
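Here is a sketch of the validate, translate, serialize, cache path, assuming the request parameters arrive as a dict of strings, a hypothetical whitelist of metrics, and roll-up tables with the made-up columns `metric`, `ts`, and `value`. It uses SQLite only so the example runs on its own. The cache is safe here only because the underlying data is assumed never to change.

```python
import json
import sqlite3
from functools import lru_cache

# Hypothetical whitelists: only these metrics and roll-up tables can ever be queried.
ALLOWED_METRICS = {"visits", "signups"}
ROLLUP_TABLES = {60: "data_1min", 3600: "data_1hour", 86400: "data_1day", 604800: "data_1week"}


def validate(params):
    """Turn raw query-string values into typed, trusted arguments, or raise ValueError."""
    metric = params.get("metric")
    if metric not in ALLOWED_METRICS:
        raise ValueError(f"unknown metric: {metric!r}")
    try:
        start, end, resolution = (int(params[k]) for k in ("start", "end", "resolution"))
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"missing or non-numeric parameter: {exc}") from exc
    if end <= start:
        raise ValueError("end must be after start")
    # For brevity, only exact roll-up resolutions are accepted here.
    if resolution not in ROLLUP_TABLES:
        raise ValueError(f"unsupported resolution: {resolution}")
    return metric, start, end, resolution


def to_query(metric, start, end, resolution):
    """Translate validated arguments into a parameterized SQL query."""
    table = ROLLUP_TABLES[resolution]  # table name comes from the whitelist, never from user input
    sql = f"SELECT ts, value FROM {table} WHERE metric = ? AND ts >= ? AND ts < ? ORDER BY ts"
    return sql, (metric, start, end)


def make_api(conn):
    """Return a request handler bound to one database connection."""

    @lru_cache(maxsize=1024)
    def fetch(metric, start, end, resolution):
        # Keyed on the validated arguments; only correct because the data is immutable.
        sql, args = to_query(metric, start, end, resolution)
        rows = conn.execute(sql, args).fetchall()
        return json.dumps([{"ts": ts, "value": value} for ts, value in rows])

    def handle(params):
        return fetch(*validate(params))

    return handle


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data_1day (metric TEXT, ts INTEGER, value REAL)")
    conn.executemany(
        "INSERT INTO data_1day VALUES (?, ?, ?)",
        [("visits", 0, 10.0), ("visits", 86400, 12.5), ("signups", 0, 1.0)],
    )
    api = make_api(conn)
    print(api({"metric": "visits", "start": "0", "end": "172800", "resolution": "86400"}))
```

Because the server only validates, queries, and serializes, the same handler can feed a line graph, a heat map, or a table without knowing anything about how the data will be drawn.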

What’s the Question?

Data visualization isn’t about the tools. It is about learning. The tools should not be selected or designed until there is a clear understanding of what questions are going to be answered.

One of the most common visualization tools is the line graph. It’s simple, intuitive, and helps answer the question, “How did this value change over time?” Typically, the value is pulled straight from the database or logs.

Plotting a value over time on a line graph is the engineering approach: examine the input format and find a matching output format. When you start from a user’s perspective, you’ll hear phrases like “the relationship between,” “the rate of,” and “how does X affect Y?” These are not as easy to drag and drop into a graphing library, but the results are much more profound.
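As a small illustration of moving from “the value over time” to “the rate of,” here is a sketch that differences consecutive samples of a cumulative counter (for example, a hypothetical running total of signups) into a per-second rate. The `(timestamp_seconds, value)` input format and the function name are assumptions.

```python
def rate_of_change(samples):
    """Convert sorted (timestamp_seconds, value) samples into per-second rates.

    Each rate covers the interval between two neighboring samples and is
    reported at the later timestamp, so it can share the original time axis.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t1 == t0:
            continue  # skip duplicate timestamps instead of dividing by zero
        rates.append((t1, (v1 - v0) / (t1 - t0)))
    return rates


if __name__ == "__main__":
    # A counter that grows for a minute, then stalls.
    print(rate_of_change([(0, 100), (60, 160), (120, 160)]))
```

Plotted directly, the counter is just a line that rises and then flattens; the derived series makes the stall explicit as a drop to zero.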

The best answer isn’t always a number. The answer can be a shape, a gradient, or a trend.

It all depends on the question.

Heat Maps in Flash

Heat Map rendered in Flash

Three years ago, I rendered a large heat map representing almost 100,000 Digg users and the 300,000 friendships between them. I used PHP with the GD2 library to draw the image, and rendering took quite a while. Because it was so difficult to redraw, the heat map was never updated. It would have been great if it reflected current information; instead, it lives on as a snapshot taken in 2007.

Since then, I’ve wondered about ways of rendering heat maps on the fly, ideally using the visitor’s CPU. Finally, in November, I was able to hack together a highly optimized Flash application that produces pretty awesome-looking heat maps (if I do say so myself) within a second or two.
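The post doesn't describe how the Flash application works, so the following is not the author's code, just a generic sketch of one common way to build a density heat map. Each point adds a linear falloff within a fixed radius to a grid, the grid is normalized to [0, 1], and each intensity is mapped onto a color ramp. The function names, the falloff kernel, and the blue, green, yellow, red ramp are all assumptions.

```python
import math

# Assumed color ramp, from cold to hot: blue, green, yellow, red.
RAMP = [(0, 0, 255), (0, 255, 0), (255, 255, 0), (255, 0, 0)]


def heat_map(points, width, height, radius):
    """Accumulate a normalized density grid (rows of floats in [0, 1]).

    Each (x, y) point adds 1 - d / radius to every pixel within `radius`.
    """
    grid = [[0.0] * width for _ in range(height)]
    for px, py in points:
        # Only visit pixels inside the point's bounding box, clipped to the grid.
        for y in range(max(0, int(py - radius)), min(height, int(py + radius) + 1)):
            for x in range(max(0, int(px - radius)), min(width, int(px + radius) + 1)):
                d = math.hypot(x - px, y - py)
                if d < radius:
                    grid[y][x] += 1.0 - d / radius
    peak = max(max(row) for row in grid) or 1.0  # avoid dividing by zero on an empty map
    return [[v / peak for v in row] for row in grid]


def to_color(t):
    """Map an intensity in [0, 1] to an (r, g, b) tuple by interpolating along RAMP."""
    t = min(max(t, 0.0), 1.0)
    pos = t * (len(RAMP) - 1)
    i = min(int(pos), len(RAMP) - 2)  # index of the lower color stop
    frac = pos - i
    (r0, g0, b0), (r1, g1, b1) = RAMP[i], RAMP[i + 1]
    return (round(r0 + (r1 - r0) * frac), round(g0 + (g1 - g0) * frac), round(b0 + (b1 - b0) * frac))


if __name__ == "__main__":
    grid = heat_map([(5, 5), (6, 5), (20, 10)], width=32, height=16, radius=4)
    pixels = [[to_color(v) for v in row] for row in grid]
    print(pixels[5][5])
```

Pure Python like this would be slow for 100,000 points at a useful resolution; it only shows the accumulate, normalize, and colorize pipeline that a faster client-side renderer would also have to perform.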

Check it out