7 January 2009
Spatial Transformations of Maps

Dan Keim, Stephen North, Christian Panse, Mike Sips

Cartogram of Proportion of Endangered Species in the U.S.

Many data sources are geographic: census statistics, public health records, environmental data and business transactions. When data has geographic coordinates, those coordinates are often a key to understanding trends, clusters and other patterns. Cartographic software and GIS (Geographic Information Systems) offer practical methods of exploring geo-related data.

Conventional maps only show data in relation to land area, not population or data set size. For visual data mining, we are interested in map transformations that show data in proportion to the size of the data set under study.

Spatial transformation is in fact part of any map: a section of the earth's spherical surface must be projected onto a 2-D page or screen. This implies some nonlinear distortion, though usually the goal is to minimize its effects when showing features such as land area, mountains, rivers and cities. To visualize geospatial statistical data sets, on the other hand, cartographers have proposed intentionally distorting individual map regions (such as states, provinces, and counties) so that their areas are proportional to an input parameter such as population, wealth or occurrence of disease. These spatially transformed maps are called cartograms. In a cartogram, the size of map regions depends on the data under study, not raw land area. This is clearly relevant to visual data mining with maps.

The challenge in making cartograms automatically is to preserve the shapes of the input map (individual regions as well as the overall map) while making each region's area close to its statistical target and not changing the connectivity between regions. This may be formulated as a nonlinear optimization problem, where the objective is to minimize some function of the total shape and area error, subject to constraints that preserve the input map's topology. Approaching the solution by general optimization techniques, though, has been unsuccessful due to the computational complexity of this problem. (In fact, the problem is infeasible in general: a "perfect" solution is not even possible, so some constraints must be relaxed to get an approximate answer.)
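
As a rough illustration of the area part of that objective, the Python sketch below computes each region's target area (its share of the total map area, in proportion to its statistic) and the mean relative area error. The function names and the choice of mean relative error are assumptions made for this example, not the error function used in the work described here. Shape error and topology constraints are left out entirely.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def polygon_area(ring: List[Point]) -> float:
    """Unsigned area of a simple polygon (shoelace formula)."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def target_areas(regions: Dict[str, List[Point]],
                 values: Dict[str, float]) -> Dict[str, float]:
    """Split the total map area among regions in proportion to their values."""
    total_area = sum(polygon_area(ring) for ring in regions.values())
    total_value = sum(values[name] for name in regions)
    return {name: total_area * values[name] / total_value for name in regions}


def mean_relative_area_error(regions: Dict[str, List[Point]],
                             values: Dict[str, float]) -> float:
    """Average of |actual - target| / target over all regions (hypothetical metric)."""
    targets = target_areas(regions, values)
    errors = [abs(polygon_area(regions[name]) - targets[name]) / targets[name]
              for name in regions]
    return sum(errors) / len(errors)
```

For two adjacent unit squares with values 1 and 3, the targets are 0.5 and 1.5, so the mean relative area error of the undistorted map is 2/3. A cartogram algorithm would try to drive this number toward zero while keeping shapes recognizable.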

USA relief map
M-Carto relief map

In 2001-02, Keim, Panse and North developed the CartoDraw heuristic. This method stretches or shrinks parts of a map with respect to scanlines placed through some of the map's regions. The heuristic is effective because candidate adjustments are inexpensive to compute, and a high enough proportion of them actually improve the solution without violating topological constraints. Scanlines can be generated automatically or placed interactively; the interactive option permits some manual control over the optimization. CartoDraw is scalable enough to make a cartogram of the 3000 counties of the United States. (This scale is well beyond previous techniques that explicitly compute shape error, though admittedly, with 3000 regions, the output starts to resemble the continuous case that can be dealt with by simpler "rubber-sheet" models that ignore shapes.)
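
To give a feel for why such candidate adjustments are cheap to evaluate, here is a deliberately simplified Python sketch. It is not the CartoDraw algorithm: it assumes a vertical strip `a <= x <= b` standing in for the neighborhood of a scanline, stretches or shrinks that strip horizontally, and keeps the result only if the region's area moves closer to its target. The names `stretch_x` and `accept_if_better` are hypothetical, and the sketch performs no topology check, which a real implementation would still need.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(ring: List[Point]) -> float:
    """Unsigned area of a simple polygon (shoelace formula)."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def stretch_x(ring: List[Point], a: float, b: float, factor: float) -> List[Point]:
    """Scale the strip a <= x <= b horizontally by `factor` (must be > 0).

    Vertices left of the strip stay put and vertices right of it are shifted
    by the strip's change in width, so x-order of vertices is preserved.
    """
    if factor <= 0 or b <= a:
        raise ValueError("need factor > 0 and a < b")
    shift = (factor - 1.0) * (b - a)
    moved = []
    for x, y in ring:
        if x <= a:
            new_x = x
        elif x >= b:
            new_x = x + shift
        else:
            new_x = a + factor * (x - a)
        moved.append((new_x, y))
    return moved


def accept_if_better(ring: List[Point], target: float,
                     a: float, b: float, factor: float) -> List[Point]:
    """Return the stretched ring if it reduces area error, else the original."""
    candidate = stretch_x(ring, a, b, factor)
    old_error = abs(polygon_area(ring) - target)
    new_error = abs(polygon_area(candidate) - target)
    return candidate if new_error < old_error else ring
```

Each candidate costs one pass over the vertices plus an area computation, so many candidates can be tried and the unhelpful ones simply discarded.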

Note that rectangular cartograms are a valuable alternative to classical shape-preserving cartograms. They make it easier to visually compare areas, and they avoid visual noise by drawing each region cleanly. They may relax some of the topological properties of the original map, allowing some adjacent regions to be separate. Dan Keim's group recently contributed a rectangular cartogram heuristic, RecMap, to the CartoView system. (An impressive topology-preserving approach proposed by van Kreveld and Speckmann demonstrates the difficulty of this problem and the value of fast heuristics.)


PixelMaps - Revealing Clusters in Dense Point Sets

Raw NY State Year 2000 Household Median Income
PixelMap of NY State Year 2000 Household Median Income

Many real-world data sets are much too large to show completely. Worse, they are highly non-uniform, with interesting patterns concentrated in very dense areas. Some of our recent work addresses problems of non-uniformity and scale in geographic data sets. Occlusion of data items due to overplotting is a significant problem. One common way to avoid it is to aggregate items. For example, when showing household income, we can aggregate households up to zip-5 (five-digit postal code) regions. The drawback is that interesting small clusters and outliers are lost. For example, a small low- or high-income cluster may not be visible at the postal code level.

Daniel Keim, Christian Panse, and Mike Sips (of the University of Konstanz), with Stephen North, proposed PixelMaps as a way of overcoming this difficulty. The PixelMap heuristic is based on clustering. Each point is assigned to a unique pixel, so data is not lost. Pixel placement is adjusted locally, but in a way that not only respects but intentionally "pulls out" clusters. For example, if there is a small cluster of outbreaks of disease within a large data set, we can form those points into a distinct, spatially coherent cluster so the cluster is more noticeable and not lost in the noise. The PixelMap heuristic allows setting the tradeoffs between absolute position preservation of related data points, relative position preservation, and clustering factors (size and affinity).
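
The property that every point gets its own pixel can be illustrated with a small Python sketch. This is not the PixelMap heuristic, which is clustering-based and balances several tradeoffs. It is a naive greedy placement, written under the assumption that points are already given as integer "home" pixel coordinates inside the grid, and the function name `place_points` is hypothetical. Each point takes its home pixel if free, otherwise a nearby free pixel found by searching square rings of growing radius.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]


def place_points(points: List[Pixel], width: int, height: int) -> Dict[int, Pixel]:
    """Assign each point (by index) a distinct pixel near its home pixel."""
    if len(points) > width * height:
        raise ValueError("more points than pixels")
    taken = set()
    placement: Dict[int, Pixel] = {}
    for idx, (px, py) in enumerate(points):
        if not (0 <= px < width and 0 <= py < height):
            raise ValueError("home pixel outside the grid")
        # Search square rings around the home pixel, nearest ring first.
        for radius in range(max(width, height)):
            ring = [
                (x, y)
                for x in range(px - radius, px + radius + 1)
                for y in range(py - radius, py + radius + 1)
                if max(abs(x - px), abs(y - py)) == radius
                and 0 <= x < width and 0 <= y < height
                and (x, y) not in taken
            ]
            if ring:
                # Within the ring, prefer the pixel closest to home.
                best = min(ring, key=lambda p: (p[0] - px) ** 2 + (p[1] - py) ** 2)
                taken.add(best)
                placement[idx] = best
                break
    return placement
```

With this kind of placement, overplotted points spread into the surrounding free pixels instead of hiding one another, so dense areas grow on screen. What the sketch lacks is exactly what the heuristic described above adds: control over which points stay together as visible clusters.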

The above examples show the year 2000 census block-level median household income in New York State. Note how it is difficult to find income patterns in the raw data (left). In the PixelMap view (right), one can view and compare patterns in the entire state. Income clusters on the east side of Central Park and Long Island can be identified and compared with others in Syracuse and Buffalo.

Copyright © 2009 AT&T