Podcasts could be much more interesting to both the audience and the speakers if they had a visual aid that mapped the flow of the conversation and the points the speakers flag to return to later.

Of the podcasts I tune into, the most engaging ones are structured differently.

You can recognize one by its distinctive style and structure:

  • It’s barely post-produced

  • It’s long-form, usually at least an hour or two

  • It’s a conversation, not an interview

  • It’s a debate, not a fight

Try getting a sense of what I mean by exploring these podcasts

These podcasters engage in some form of Socratic dialogue, which makes learning and staying engaged easy.

However, it’s easy to get lost in conversation.
It’s very fluid, which seems to be a tradeoff for the Socratic structure.

And it’s not just me.
Both the host and the guest periodically find themselves lost.

You’ll hear phrases like “let’s put a pin in it for now”, “let’s revisit this”, and “I’ll get to this point soon, so let’s hold off for now”.

I think these are the symptoms of many ideas and their relationships causing us to hit a sort of cognitive ceiling. It’s difficult to hold it all in your head.

Stephen C. Levinson articulates this well in the abstract of “Turn-taking in Human Communication - Origins and Implications for Language Processing” (Trends in Cognitive Sciences, 2015):

“Most language usage is interactive, involving rapid turn-taking. The turn-taking system has a number of striking properties: turns are short and responses are remarkably rapid, but turns are of varying length and often of very complex construction such that the underlying cognitive processing is highly compressed.

The bulk of language usage is conversational, involving rapid exchange of turns. New information about the turn-taking system shows that this transition between speakers is generally more than threefold faster than language encoding.

To maintain this pace of switching, participants must predict the content and timing of the incoming turn and begin language encoding as soon as possible, even while still processing the incoming turn.

This intensive cognitive processing has been largely ignored by the language sciences because psycholinguistics has studied language production and comprehension separately from dialog.”

We need a solution that reduces the cognitive load of all the parties involved while not eliminating the richness that is created by the complexity of ideas.

And I think we can draw inspiration from messaging interfaces to form a framework for what a solution could look like.

The podcast’s flow of conversation mustn’t be interrupted.

Yes, an interruption makes for an awkward show, but it also cuts off trains of thought that would otherwise have been fleshed out.

So the solution needs to be mostly automated, retaining both text and context and thereby preserving as much “meaning” as possible.

I'm imagining this as a messaging interface where the words are transcribed live on screen, visually denoting which words come from which speaker.

Think about your traditional messaging format.

Live transcription software that maps words to the speaker who said them isn't of much help to the podcasters. They just said the words.

It’s mostly for the audience if they want to go back to some spot in the conversation.
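To make the messaging-style transcript concrete, here is a minimal sketch of how speaker-attributed turns could be stored and searched. Everything in it is my own assumption for illustration: the `Utterance` fields, the `render_transcript` and `find_turns` helpers, and the sample dialogue are hypothetical, and the actual speech-to-text and speaker-detection steps are left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """One transcribed turn, attributed to the speaker who said it."""
    speaker: str    # hypothetical speaker label, e.g. "Host"
    start_s: float  # seconds from the start of the recording
    text: str


def render_transcript(utterances: list[Utterance]) -> str:
    """Format turns like a chat log: one line per turn, with time and speaker."""
    lines = []
    for u in utterances:
        minutes, seconds = divmod(int(u.start_s), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {u.speaker}: {u.text}")
    return "\n".join(lines)


def find_turns(utterances: list[Utterance], phrase: str) -> list[Utterance]:
    """Return every turn mentioning a phrase, so a listener can jump back to it."""
    needle = phrase.lower()
    return [u for u in utterances if needle in u.text.lower()]


# Made-up sample dialogue, purely for illustration.
transcript = [
    Utterance("Host", 0.0, "Let's start with how you got into research."),
    Utterance("Guest", 4.5, "It began with a question about memory."),
    Utterance("Host", 65.0, "Let's put a pin in memory for now."),
]
print(render_transcript(transcript))
```

A listener-facing search like `find_turns(transcript, "memory")` is one way the audience could get back to some spot in the conversation.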

Next, it's common for speakers to want to return to a part of the conversation.
Remember our phrases like “let’s put a pin in it”?

We need functionality that allows the participants to pin, or denote, a single thought or collection of thoughts.

Something like "I have something to say on that last statement, but I think we should come back to it later".

That pin should be short, only 3-4 words, not unlike a sticky note you would write for yourself.

The point of the pin is to act as a reminder to return to the conversation and context that led up to that point, so it can be expanded on later in the podcast.

Just enough context to re-prompt the speaker to expand on it later in the conversation. We want to retain both the “meaning” of the thought in that particular moment and have an external memory that reduces cognitive load.
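As a rough sketch of what a pin could hold, the snippet below pairs a short sticky-note label with the last few turns that led up to it. The names (`Pin`, `create_pin`, `MAX_LABEL_WORDS`) and the choice to keep three turns of context are assumptions of mine, not a settled design, and treating the "3-4 words" guideline as a hard cap is just one option.

```python
from dataclasses import dataclass, field

# Assumption: enforce the "3-4 words" guideline as a hard upper limit.
MAX_LABEL_WORDS = 4


@dataclass
class Pin:
    label: str       # the sticky-note text
    created_by: str  # speaker who dropped the pin
    context: list[str] = field(default_factory=list)  # turns leading up to the pin


def create_pin(label: str, created_by: str, recent_turns: list[str],
               context_size: int = 3) -> Pin:
    """Create a pin with a short label plus the last few turns as context."""
    words = label.split()
    if not words or len(words) > MAX_LABEL_WORDS:
        raise ValueError(f"pin label must be 1 to {MAX_LABEL_WORDS} words")
    # Guard against context_size == 0, since a [-0:] slice would return every turn.
    context = recent_turns[-context_size:] if context_size > 0 else []
    return Pin(label=" ".join(words), created_by=created_by, context=context)


# Made-up turns, purely for illustration.
recent_turns = [
    "Host: How did the project start?",
    "Guest: With a question about memory.",
    "Host: Memory or attention?",
    "Guest: Both, but let's come back to that.",
]
pin = create_pin("memory vs attention", "Guest", recent_turns)
print(pin.label)
print(pin.context)
```

The label is what the speakers see at a glance, while the stored context is what re-prompts them when they return to it.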

As a conversation progresses, focus shifts, and "pins" are created, the speakers will have increasing difficulty holding the whole conversation in their heads. They’ll forget important points they meant to revisit.

This creates loops that are never closed.

At some point, it'll make sense for a speaker to revisit that "pin".

This happens in a couple of situations. One is when the conversation is ending and you want to close all the "open loops" that the "pins" represent in a speaker’s mind. The other is when a seemingly unrelated piece of the conversation triggers a speaker to connect the current thoughts back to a "pin".

Thus, the "pin" acts as a reminder but also a cognitive hook to hang pieces of the conversation on.
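The "open loops" idea can be sketched as a small board that remembers which pins have been revisited and which are still waiting. This is only an illustration under my own assumptions: `PinBoard`, `revisit`, and `open_loops` are hypothetical names, and pins are matched by their exact label.

```python
from dataclasses import dataclass, field


@dataclass
class Pin:
    label: str
    revisited: bool = False


@dataclass
class PinBoard:
    """Keeps every pin from the conversation and tracks which loops are still open."""
    pins: list[Pin] = field(default_factory=list)

    def add(self, label: str) -> Pin:
        pin = Pin(label)
        self.pins.append(pin)
        return pin

    def revisit(self, label: str) -> Pin:
        """Mark the first pin with this label as revisited, closing that loop."""
        for pin in self.pins:
            if pin.label == label:
                pin.revisited = True
                return pin
        raise KeyError(label)

    def open_loops(self) -> list[str]:
        """Labels of pins nobody has returned to yet, in creation order."""
        return [pin.label for pin in self.pins if not pin.revisited]


board = PinBoard()
board.add("memory vs attention")
board.add("funding story")
board.revisit("memory vs attention")
print(board.open_loops())
```

Near the end of an episode, whatever `open_loops()` still returns is the list of threads the speakers may want to close before wrapping up.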

The result is a lateral move that lowers cognitive load while retaining the full complexity and “meaning” of the conversation.

Think of it as a visual-spatial aid.
Unlike note-taking, it won’t interrupt the natural flow.
It acts like a mediator connecting the conversational dots.

UX designers would say a feature like this reduces cognitive friction. If done right, it’ll allow the conversation to be richer, more valuable, and more enjoyable for everyone.

This new idea is closely related to Crawford & Dombkowski's "Nonlinear conversation medium", but instead of augmenting written dialogue like a text message conversation, it augments auditory conversation.

Traditional messaging systems have a complete memory of the conversation. But most of them carry written conversation, which is represented procedurally, one thought after the other.

The ideal system would provide a visual aid that takes us from procedural to nonlinear, like a natural verbal conversation.

Procedural conversational design is linear, top to bottom, like our messaging systems.
The new form could juxtapose “pins” alongside that conversation, using a different spatial dimension: the space to the left and right of the main flow.

The juxtaposition allows you to see the flowing conversation while also seeing all the pins created so far. The pins stay in view, giving the speakers a kind of object permanence that reduces cognitive load.

And when a pin is revisited, it will be emphasized or visually emboldened to communicate to the speaker and audience that we are currently revisiting that pin.

This lets us figure out where “we” are in the conversation.
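Here is a plain-text sketch of that layout: the conversation runs top to bottom in the left column, the pins sit to its right, and the pin currently being revisited is emphasized. The function name `render_side_by_side`, the `>> ... <<` emphasis style, and the fixed column width are all assumptions made for illustration; a real interface would presumably be graphical.

```python
from itertools import zip_longest
from typing import Optional


def render_side_by_side(turns: list[str], pins: list[str],
                        active_pin: Optional[str] = None, width: int = 40) -> str:
    """Lay out the main flow on the left and the pins on the right.

    The pin being revisited is upper-cased and wrapped in markers so both
    the speakers and the audience can see where "we" are.
    """
    right = [
        f">> {pin.upper()} <<" if pin == active_pin else f"   {pin}"
        for pin in pins
    ]
    rows = []
    for left, pin in zip_longest(turns, right, fillvalue=""):
        cell = left[:width].ljust(width)  # truncate or pad to a fixed column width
        rows.append(f"{cell} | {pin}".rstrip())
    return "\n".join(rows)


# Made-up turns and pins, purely for illustration.
turns = [
    "Host: So where did the money come from?",
    "Guest: That's the funding story I owe you.",
    "Host: Let's hear it.",
]
pins = ["memory vs attention", "funding story"]
print(render_side_by_side(turns, pins, active_pin="funding story"))
```

Because the pins column is always drawn in full, every pin stays visible no matter how far the left column has scrolled on.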

While this idea is loosely developed, I think it has legs.
I pulled inspiration from many places but heavily from here

I’d love to hear your feedback, constructive criticisms, and requests for clarifications.

