Overview
If there is one experience to get right in an AR app - it’s the camera experience.
From the get-go, we knew we had two camera experiences to design:
“See What I See” (SWIS) - our remote assistance experience. This served our most common use case and was our bread and butter. It had to work.
Our discovery experience - a more open-ended camera experience, which carried all the promises of AR.
Camera Experiences on Mobile Devices
As both experiences depend on the camera, there were common constraints to take into account.
Takeaways
Define modular IA and navigation as much as possible to accommodate the many possible nav and feature combinations that inherently come with low-code authoring and workspace configuration (a rough sketch of what such a configuration could look like follows this list)
Create flexible navigation and UI patterns that can absorb future content types (and their intrinsic interaction requirements)
Ensure discoverability while keeping persistent elements to a minimum (e.g. expanding menus)
In camera experiences, enable users to adjust the relative, proportional size of their viewport or camera feed in real time to serve their current needs (e.g. Remote Expert A wants to see a tech’s camera feed as large as possible while providing assistance, whereas Remote Expert B wants to see the camera feed while simultaneously referencing that machine’s service history)
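To make the first takeaway a little more concrete, here is a minimal sketch of how a workspace-level feature configuration could drive a modular navigation. The feature flags and nav items are illustrative assumptions, not our actual schema.

```typescript
// Hypothetical shape of a workspace-level feature configuration.
// Feature and nav names are illustrative, not our actual schema.
interface WorkspaceConfig {
  features: {
    remoteAssist: boolean;  // "See What I See" calling
    discovery: boolean;     // AR self-assist / discovery
    barcodeScan: boolean;   // e.g. customers who only want barcode scanning
    documents: boolean;     // 2D manuals, service history, etc.
  };
}

interface NavItem {
  id: string;
  label: string;
}

// Derive the navigation from whatever the workspace enables, so the IA stays
// modular instead of hard-coding one nav layout per feature combination.
function buildNav(config: WorkspaceConfig): NavItem[] {
  const nav: NavItem[] = [];
  if (config.features.remoteAssist) nav.push({ id: "swis", label: "Remote Assist" });
  if (config.features.discovery) nav.push({ id: "discover", label: "Discover" });
  if (config.features.barcodeScan && !config.features.discovery) {
    // Barcode-only customers still need a camera entry point.
    nav.push({ id: "scan", label: "Scan" });
  }
  if (config.features.documents) nav.push({ id: "docs", label: "Documents" });
  return nav;
}
```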
While we did explore options that combined these two experiences, we ultimately decided they served different enough use cases (most users came for one or the other) and had significantly divergent constraints. Remote assistance is primarily focused on communication and creation (e.g. live annotations), while self-assist/discovery is centered around consumption and AR interactions. That was enough to justify separating them and treating them as distinct experiences.
“See What I See” (SWIS) - Video Remote Assistance
What is the anatomy of remote assistance in our use cases? It is an inherently 2+ person, collaborative experience meant to solve a specific issue or to provide support.
Users coming to our product for SWIS were a distinct group, and they were coming to do something specific:
They know what they need or want
They need easy access to 2-3 specific actions and become frustrated when those aren’t immediately accessible
They rely on a clearly defined set of actions that are all of roughly equal importance
Our goal was to facilitate communication between experts and learners, minimize UI distractions and provide quick access to critical actions.
The drawer approach gave users the flexibility they needed and gave us a paradigm that could accommodate all required actions today, while still leaving room for more in the future.
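As a rough illustration of that paradigm - the action names below are hypothetical placeholders, not our actual feature set - the drawer can be driven by a simple list of registered actions, with the few critical ones pinned outside it:

```typescript
// Hypothetical, data-driven action set for the SWIS call screen. The drawer
// renders whatever is registered, so new actions can be added later without
// redesigning the navigation.
interface DrawerAction {
  id: string;
  label: string;
  pinned: boolean;      // pinned actions stay persistently visible; the rest live in the drawer
  onSelect: () => void;
}

const swisActions: DrawerAction[] = [
  { id: "annotate", label: "Annotate", pinned: true, onSelect: () => { /* start live annotation */ } },
  { id: "mute", label: "Mute", pinned: true, onSelect: () => { /* toggle mic */ } },
  { id: "flashlight", label: "Flashlight", pinned: false, onSelect: () => { /* toggle torch */ } },
  { id: "capture", label: "Capture", pinned: false, onSelect: () => { /* take a snapshot */ } },
];

const persistentButtons = swisActions.filter(a => a.pinned);
const drawerItems = swisActions.filter(a => !a.pinned);
```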
Self-Assist/Discover
Our discovery experience, on the other hand, has minimal scope today - but it carries real potential and big plans.
In our ideal discovery experience a user identifies an object via marker or marker-less recognition and is then served contextual resources and actions.
This experience acts as the window into an augmented world and can be quite open-ended. Without relevant prompts, users don’t know what to do. Responsiveness is critical, as is understanding context as quickly as possible in order to provide tailored direction. We also need to think through how we give users control over what they’re viewing. This gets tricky when considering simultaneous access to 2D and 3D content, since we’re dealing with everything from viewing manuals in 2D to interacting with and manipulating 3D models/objects overlaid on the environment in AR.
Some Interesting Decisions
As we approached our Discovery designs, we tried splitting up the experience in a few ways: by category of action, by user role, by specific stage or phase (dependent on user actions), etc.
1. Splitting Pre/Post-Identification
One issue we faced with almost every iteration was that providing users with too many options was just as confusing as providing only the absolutely critical ones. Instead of trying to come up with a navigational paradigm that kept such a diverse range of tools accessible at all times, we explored splitting the nav into two distinct stages: 1) pre-identification and 2) post-identification.
Users could scan the area and, once a marker or object was identified, they would then be presented with context-specific content and actions. Based on the configurable feature sets defined at the workspace level (e.g. some customers only want to support scanning barcodes), we could also use this split to help us create a modular navigation. This would allow us to simplify the initial experience and provide progressively more targeted interactions and content.
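A minimal sketch of that two-stage flow, assuming illustrative target kinds and action IDs (these are not our actual identifiers):

```typescript
// Minimal sketch of the two-stage Discovery flow: context-specific actions
// only appear after something has been identified.
type TargetKind = "marker" | "object" | "barcode";

interface IdentifiedTarget {
  id: string;
  kind: TargetKind;
}

type DiscoveryState =
  | { stage: "pre-identification" }                              // still scanning
  | { stage: "post-identification"; target: IdentifiedTarget };  // context is known

function availableActions(state: DiscoveryState): string[] {
  if (state.stage === "pre-identification") {
    // Keep the initial experience simple: scan, plus a couple of escape hatches.
    return ["scan", "enter-id-manually", "browse-content"];
  }
  // Once we know what the user is looking at, surface targeted actions.
  switch (state.target.kind) {
    case "barcode":
      return ["view-asset-record", "start-remote-assist"];
    default: // markers and recognized objects
      return ["view-3d-model", "view-manual", "view-service-history", "start-remote-assist"];
  }
}
```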
2. Removing Target-Specific Scan Modes
Another critical decision was how our scan functionality worked. In a lot of AR apps today, users are prompted to pre-select what they are scanning for (e.g. Google Lens). From what we had found through exploration and prototyping, separating modes didn’t actually make the experience more intuitive for the user, but it did improve detection accuracy. It was also the approach Engineering specifically wanted.
We designed with this as a constraint for a long time before we realized that it wasn’t really making the experience better for our users. Improved accuracy didn’t matter if they didn’t even know they should start scanning. We dropped the target-specific scan options and instead tried having scan mode on by default when a user entered this experience. This helped more users find and interact with content in AR much faster and with less confusion. In the end, though, we ended up splitting the scan modes anyway as a compromise with Engineering. It was, after all, an understandable tradeoff given the engineering impact and scope of implementation.
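The resulting behavior might look something like the sketch below - scanning starts automatically on entry while the detector still runs one mode at a time. The mode names and the default are assumptions.

```typescript
// Sketch of the compromise: scanning starts automatically when the user enters
// Discovery, but the detector still runs one mode at a time for accuracy.
type ScanMode = "object" | "marker" | "barcode";

interface ScanSession {
  active: boolean;
  mode: ScanMode;
}

// Scan is on by default, so users never have to figure out that they should
// start scanning; the default mode would come from the workspace configuration.
function enterDiscovery(defaultMode: ScanMode = "object"): ScanSession {
  return { active: true, mode: defaultMode };
}

// Switching modes is a lightweight in-context action rather than an up-front choice.
function switchMode(session: ScanSession, mode: ScanMode): ScanSession {
  return { ...session, mode };
}
```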
3. Context-Specific Interactions and Display
One thing (of many) I learned from working on this low-code platform is that at the end of the day, you need to define every type of content a user could interact with and then design the interaction experience per content type, per device type, and even per situation in some contexts. This was true at least for creating the initial bucket of building blocks to support the low-code authoring.
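As a rough illustration of what those building blocks might look like in code - the content types, device types, and capabilities below are assumptions, not the platform's actual schema:

```typescript
// Illustrative sketch of defining display and interaction behavior per content
// type and per device type.
type DeviceType = "phone" | "tablet" | "headset";

type ContentItem =
  | { kind: "document"; format: "pdf" | "video" }
  | { kind: "iot-value"; unit: string }
  | { kind: "model-3d"; anchored: boolean };

interface InteractionSpec {
  presentation: "2d-panel" | "ar-overlay";
  gestures: string[];
}

function interactionFor(content: ContentItem, device: DeviceType): InteractionSpec {
  switch (content.kind) {
    case "document":
      // 2D content opens in a flat panel on handhelds; a headset pins it in space.
      return {
        presentation: device === "headset" ? "ar-overlay" : "2d-panel",
        gestures: ["scroll", "zoom"],
      };
    case "iot-value":
      return { presentation: "ar-overlay", gestures: ["tap-to-expand"] };
    default: // 3D models anchored in the environment
      return { presentation: "ar-overlay", gestures: ["rotate", "scale", "reposition"] };
  }
}
```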
An interesting example of this is an alarm. Let’s say I’m looking down a line of machines. One machine is a little warmer than usual, and I want to see an alert that lets me know this machine needs service. Let’s say this warning can show up as a piece of AR, IoT-triggered content floating next to the relevant machine. But what if, meanwhile, another machine elsewhere in the warehouse - outside of my field of view - starts overheating to extreme temperatures? I can’t just display a piece of AR content pinned to that machine; the user might not see it. In an urgent situation, this alarm should manifest as a push notification instead - so the user can’t miss it.
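The routing logic behind that example could be as simple as the sketch below. The severity levels, channel names, and field-of-view check are illustrative.

```typescript
// Sketch of the routing logic for the alarm example above.
type AlertChannel = "ar-pinned-content" | "push-notification";

interface MachineAlert {
  machineId: string;
  severity: "warning" | "critical";
  machineInView: boolean; // is the machine currently in the camera's field of view?
}

function routeAlert(alert: MachineAlert): AlertChannel {
  // Urgent alerts, or alerts for machines the user can't currently see, must not
  // rely on spatially pinned AR content the user might never look at.
  if (alert.severity === "critical" || !alert.machineInView) {
    return "push-notification";
  }
  return "ar-pinned-content";
}
```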
You end up having to define the visual and interaction experience for each type of content, while also taking into account safety and compliance along with the usual suspects: responsiveness, internationalization, localization, accessibility, etc. With every introduction of a new configuration, many new scenarios on the UX/UI side need to be thought through. Much of this can be configured on the backend, constantly creating constraints which provide helpful - although ceaseless - checks and balances on the designs, based on what can and can’t be configured.