Using Gestalt Principles for Natural Interactions

Carolann Bonner

Gestalt is a term used in psychology which expresses the idea that the whole of something is more important to our understanding than the individual parts. The Gestalt principles describe the way our mind interprets visual elements.

The principles I find most helpful day-to-day are:

  • Similarity
  • Enclosure
  • Continuation
  • Closure
  • Proximity
  • Figure-Ground

We’ll look at a few examples of each principle and break down how it informs the way you interact with an interface.


Similarity

Perceiving objects that are similar to be part of a group or pattern

You can see similarity being used in Van Gogh’s “Starry Night”. We are able to distinguish the stars from the night sky because of two contrasting attributes:

  • The circular orbs that we perceive to be stars are all the same color, yellow.
  • The brush strokes making up the stars all move in the same circular direction.

Van Gogh Starry Night

This tells us that all of the elements with those two attributes are the same. It also tells us that they are separate from the night sky.

Let’s take a look at an example of similarity used in an interface. In this example from Tumblr, we perceive each icon as a link that creates a blog post.

Tumblr Post Icons

What are the similarities?

  • Each option is represented by an icon with text beneath.
  • Each icon and text beneath the icon are the same size.
  • The icons are evenly distributed in the space, each given equal treatment.

What does it tell us about the process of creating a blog post?

  • As users, we know that any one of these icons represents a means to a similar end: creating a new blog post.
  • We know exactly where to go, and which UI elements to look for, when we need to create a new blog post.


Enclosure

Things that appear to have a boundary around them are perceived to be grouped, and therefore related.

This example of a Facebook post has three instances of enclosure that make the interface clear.

Facebook Post

The first enclosure is the post as a whole (highlighted in the screenshot below). Each Facebook post is enclosed in a rectangle with a white background and a thin gray border that distinguish it from the light gray page background.

Facebook Post

The second enclosure is the representation of the link within the post. The photo followed by the title and description of the link appear to be grouped together, and therefore related. This shows me how the information is organized and separated.

Facebook Post

The third enclosure is the area at the bottom of the post. Everything related to social interactions is enclosed inside of the light-blue background. As a user, this allows me to know exactly how and where to interact with this post.

Facebook Post

These enclosures provide affordances that allow me to group and interpret the information accurately.


Continuation

The eye creates momentum as it is compelled to move through one object and continue to another.

Here’s a screenshot of Google Maps walking directions. Rather than a series of blue dots, we perceive this as a single line.

Google maps walking directions

We also understand we are to physically walk in the direction of this “line”. Nothing in the interface explicitly tells us that the dotted line indicates direction. A small icon of a person walking and the blue dots create the idea of momentum and direction.

Another common application of continuation is the timeline of a media player.

Rdio audio player

This line represents the duration of the track. As the track plays, the color of the line changes.

The second color is perceived as a second line. As that line grows, we perceive the passage of time.

We don’t expect the second line to continue past the end point of the first line, so we understand that when it reaches that point, the track has played to completion.
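The relationship between the two lines can be sketched in a few lines of code. This is only an illustration, not how any particular player works: the function names and the text-based rendering are hypothetical. The point it shows is that the "played" line is clamped so it can never extend past the line that represents the full duration.

```python
def played_fraction(elapsed_seconds: float, duration_seconds: float) -> float:
    """Return the share of the track line to fill, always between 0.0 and 1.0."""
    if duration_seconds <= 0:
        return 0.0
    # Clamp so the second line never extends past the first.
    return max(0.0, min(1.0, elapsed_seconds / duration_seconds))


def render_timeline(elapsed_seconds: float, duration_seconds: float, width: int = 20) -> str:
    """Draw the timeline as text: '#' for the played part, '-' for the rest."""
    filled = round(played_fraction(elapsed_seconds, duration_seconds) * width)
    return "#" * filled + "-" * (width - filled)


print(render_timeline(60, 120, width=10))  # #####-----
```

Halfway through a two-minute track, half of the line is filled. Passing an elapsed time longer than the duration still produces a line that is exactly full, which matches what we expect to see.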

The interface does not need to offer hints in the form of visual “nouns” (e.g. an arrow indicating duration or time) because the visual “verbs” (e.g. the animation/interaction when the track is playing) teach users very quickly when the track begins and when to expect it to finish.


Closure

When an object is incomplete, but enough of the object is indicated, the mind perceives the object to be whole by mentally filling in the missing information.

Take a look at the Notifications icon in Twitter’s interface. When you have a notification, a number enclosed in a square is placed over the icon.

Twitter Notification Icon with Notification

There is enough of the bell visible for our mind to still read this icon as the bell.

Let’s look at another example of closure being used to complete an interaction. In the Urban Outfitters online store, notice what happens when an item is added to my “shopping cart”.

Urban Outfitters Online Shopping Example

In this interaction, once the “Add to Bag” button is clicked, a few things happen:

  • The text inside of the button changes to “Added!”.
  • A number appears next to the shopping cart icon in the navigation.
  • A modal slides down from the shopping cart icon which confirms, again, the item has been added to my shopping cart.

The fact that the item has been added to the shopping cart is implied through the interface. We didn’t have to go to the shopping cart page to see the items in it, and we did not need a drag-and-drop interaction (which is often more work for the user) to get this reassurance. We receive enough visual feedback in the interface to assume that the item has been added.

Proximity (or Grouping)

When elements are close together, we perceive them to be part of a group.

Let’s take a look at the layout of Twitter’s profile information:

Twitter example

The avatar, cover photo, display name, and user name are placed close together. Because they are close together spatially, we read this information as a group, and therefore as related.

The stats associated with the Twitter account are located a few pixels below the grouping of personal information.

Twitter example

The pink line in this screenshot highlights the negative space that separates the two groupings and forms the boundary between them.

We can look at another example from Twitter that utilizes the same principle:

Twitter example

The elements that allow you to interact with this tweet are close together and are located farther down, vertically, than the rest of the content and elements in the enclosure.

Twitter example

The highlighted areas expose the groupings created by the layout. You can see how the proximity of each number to its interaction icon indicates the relationship between the number and the icon.
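One way to make the idea concrete is to treat proximity as a rule: elements separated by a small gap belong together, and a larger gap starts a new group. The sketch below is a simplification under stated assumptions. The function name, the pixel offsets, and the 20-pixel threshold are all hypothetical, and real perception is not a hard cutoff.

```python
def group_by_proximity(positions, max_gap):
    """Split positions into groups wherever the gap to the previous one exceeds max_gap."""
    groups = []
    for position in sorted(positions):
        if groups and position - groups[-1][-1] <= max_gap:
            # Close enough to the previous element: same group.
            groups[-1].append(position)
        else:
            # The negative space is large enough to read as a boundary.
            groups.append([position])
    return groups


# Hypothetical vertical offsets (in pixels) of elements in a layout.
print(group_by_proximity([0, 12, 24, 80, 92], max_gap=20))
# [[0, 12, 24], [80, 92]]
```

The first three elements sit 12 pixels apart and read as one group. The 56-pixel gap before the fourth element is what separates the second group from the first.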

A note about white space

You may hear designers say things like, “we need more white space” from time to time. White space is another term for “negative space”.

In many cases (not all), white space is used as a sort of enclosure (see the enclosure principle above). The negative space acts as an invisible border. By doing this, it defines a region of proximity, thereby adding meaning to something that might otherwise look too “busy” or cluttered to make sense of.

The really cool thing is that this meaning is created without having to add lines, colors, or other visual elements. The areas where visual elements are absent (the white or negative space) create just as much meaning as the presence of visual elements (the positive space).

This is the same idea as grouping digits of a phone number. It’s easier to read and remember if the numbers are grouped. The grouping is visualized by adding negative space between the numbers.

E.g. 555-555-5555 vs 5555555555
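The same grouping can be expressed as a small function. This is a minimal sketch: the function name and the default 3-3-4 grouping are assumptions chosen to match the example above, not a general phone number formatter.

```python
def group_digits(digits: str, group_sizes=(3, 3, 4), separator: str = "-") -> str:
    """Break a run of digits into smaller groups joined by a separator."""
    if len(digits) != sum(group_sizes):
        raise ValueError("digit count does not match the requested grouping")
    groups = []
    start = 0
    for size in group_sizes:
        groups.append(digits[start:start + size])
        start += size
    return separator.join(groups)


print(group_digits("5555555555"))  # 555-555-5555
```

Nothing about the digits changes. Only the space between them does, and that is enough to make the number easier to read.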

Figure-Ground

Perceiving certain objects as being in the foreground and other objects as being in the background.

A common example of figure-ground is the interaction of opening up a modal.

New York Times Modal

In the New York Times example above, the figure-ground relationship is manipulated by:

  • a white, semi-transparent background that softens the appearance of the original content you were focused on.
  • a border and subtle drop shadow around the box containing the log in fields.

The figure-ground relationship allows us to understand this interaction. You perceive the modal to be in the foreground and the New York Times home page to be in the background. This tells us that we have not left the page we were on, because we can still see it “beneath” the semi-transparent white background. However, the context has changed: the original page now appears to have moved to the background, and new elements are in the foreground.
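To see why a see-through white layer pushes content into the background, it helps to look at the arithmetic of blending two colors. The sketch below uses standard alpha blending. The function name and the 0.8 opacity are assumptions for illustration; the actual values used on the New York Times site are not known here.

```python
def soften(background_rgb, opacity, overlay_rgb=(255, 255, 255)):
    """Blend an overlay color over a background color at the given opacity (0.0 to 1.0)."""
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0.0 and 1.0")
    return tuple(
        round(opacity * overlay + (1 - opacity) * background)
        for overlay, background in zip(overlay_rgb, background_rgb)
    )


# Black text under a white overlay at an assumed 80% opacity.
print(soften((0, 0, 0), 0.8))  # (204, 204, 204)
```

Black text becomes light gray, so its contrast with the white page drops sharply. The content is still visible, but it no longer competes with the modal for attention.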

A note about minimal styles

Notice the minimal visual styles applied to the interface in our New York Times example. The border around the login modal is about 1 pixel wide, and the drop shadow has a very limited spread and a light color so as not to create too much contrast.

When creating something to be minimal (which is different from Minimalism), we want to know: what is the least amount of detail that can be added to create the necessary impact?

Dieter Rams’ final principle for good design states that good design “is as little design as possible”. Understanding how Gestalt principles are applied allows us to create the essential meaning in our products without excess design, styles, or steps.


An interface should be more than a collection of isolated interactions. Our minds want to perceive that smaller interactions are related to each other and work together to complete a larger task.

If we’re not able to perceive this, the disconnect leaves room for confusion. People need to see that everything is somehow integrated into the larger goal at hand.

You can use these principles to build a more intuitive interface, identify problems, and find solutions in an existing interface.