It seems like every day a new social audio platform emerges. Such is the power of social media’s latest darling: everybody wants to get in on the action.
The hype that started with Clubhouse quickly spurred the growth of audio-only or audio-centric platforms, from Big Tech’s Facebook and Twitter to Spotify and even Slack, along with dozens of social audio startups hoping to find their place in the spotlight.
Is this hype warranted, though? Are people really that into social audio, and will the hype hold up in the months to come?
As an audio freak, I find the answer is a bit complicated.
The good side
It’s fairly easy to see why social media users were instantly enamored with audio, and will likely remain in love with it.
For starters, it emerged at a most opportune time. During lockdowns and social distancing, the audio-only approach to social media offered a welcome break from screen fatigue, which only grew worse while we were tied to our couches.
Arguably, the biggest appeal here is the way conversations are held: through genuine speech.
When I say genuine, I mean the voices of actual people. Speech conveys ideas and plays a powerful role in communication. As a form of social behavior, speech also forms relationships. Through the way we speak, we signal, and construct in our minds, the relative status of speakers and a certain level of rapport.
It’s one thing to read a comment from a random username, or even from a real person. Hearing their voice allows for a deeper connection because you have something to tie to their online persona. You get the feeling that you know the people you follow on social media better.
Very few things (if anything) can replace or mimic the intimacy of the spoken word. Everything we say is said in a certain way: with a particular tone of voice, pace, and level of loudness. The speaker is, in a sense, inside the listener’s head, so when they talk about something in an audio-only environment, they truly have their audience’s attention.
What adds flavor is the fact that every person has a characteristic speaking pattern. This includes elements such as directness or indirectness, pacing, pausing, emphasis, and vocabulary, as well as the use of figures of speech, jokes, slang, and so on.
In other words, each one of us has a set of culturally learned signals by which we do three things: communicate what we think, interpret the meaning of other people’s responses, and evaluate one another as people.
That is the beauty of social audio: all the nuances of the way we talk.
It might seem otherwise, but communication isn’t as simple as just saying what you mean. How you say it is crucial, and it differs from one person to the next, as everyone behaves differently in social settings. How we talk and listen matters.
Also, long-form conversations are hard to pull off via text, as are group conversations. The intimacy of our voices makes audio social media that much more appealing, especially in times of social distancing.
There’s also the benefit of comprehension, since some nuances can only be conveyed through a speaker’s inflections and intonation. Building the immediacy and raw emotion of the medium into the core experience is driving people to connect by voice again, effectively bringing things full circle to the original promise of cellphones.
One story that really resonated with me was that of Israelis and Palestinians bonding in a Clubhouse room during the most recent period of turmoil. A private chat quickly turned into a meaningful, emotional exchange in which both sides spoke openly from their own perspectives.
As someone smack in the middle of this seemingly never-ending conflict, it’s easy to see why this story hit close to home for me. Audio, in general, has that kind of influence: it can reach you in a deeply personal and liberating way.
In my case, that impact was only amplified. Social audio works incredibly well in that regard thanks to the medium’s intimate, immersive nature, which lends itself to storytelling. Social media’s latest craze has proven that it can engage people in deeper ways, perhaps even more than digital media traditionally has.
The bad and potentially ugly side
Generally speaking, the only con of social audio I can think of is a big one: live moderation.
Moderating for misinformation, hate speech, and harassment is a task that most platforms perform inadequately, at best. Add in the real-time element and the job becomes significantly harder.
Rooms with hundreds or thousands of people listening in, where anyone can come and go as they please, are tough to police.
Right now, Clubhouse handles it by empowering its moderators to mute and flat-out remove other speakers if any form of abuse surfaces. Discord relies on server-level community moderators who control who stays, who is blocked, and which topics are and aren’t okay to talk about, with its trust and safety team as a backup.
Others, like Twitter, review audio recordings when a room or user is flagged.
The bottom line is that details on proactive measures are virtually non-existent, because the current tools designed for content moderation are mostly built around text. This makes identifying problematic audio content far more difficult and cumbersome, since it involves transcribing conversations first and then examining the transcripts.
Ideally, there would be tools with advanced natural language processing that can identify such content instantly and automatically, without users having to flag it. That is obviously easier said than done, as the lack of information on the approaches of Facebook and Twitter, two platforms with less-than-stellar content moderation records, suggests.
Live moderation is really a separate topic that deserves a much more thorough discussion. Still, the fact that these two giants haven’t cracked the technology yet speaks volumes about how hard the problem is to solve. I want to believe there is both the time and the will to figure out how to moderate voice at scale before a bad reputation catches up with the format.
Where does social audio go from here?
In my mind, social audio has a quantity over quality problem.
There are just too many platforms offering the same USP, and there aren’t that many conversations to go around. In a way, social audio will be at a crossroads in the next few months as audiences settle on the platforms that matter to them.
In the same way that pandemic-induced boredom spurred massive interest, we shouldn’t underestimate the novelty effect, which is slowly wearing off. It’s been almost ten months since Clubhouse exploded on iOS devices, and it is no longer a shiny new thing, nor is it that unique a platform.
If I had to make a bet, I’d bet that social audio will work best for niche discussions.
I can totally see communities like the Voicebot Community Discord staying relevant for a long time, because it focuses on the voice AI industry, a specific segment whose members know what value to expect.
In the same way, Spotify Greenroom can act as a discussion board for fans of a certain artist or genre, host debates about podcasts and specific episodes, and so on. I see this happening more on Spotify than on, say, Twitter Spaces, because Spotify’s audience gathers around more specific interests.
On a broader level, Spotify can use these rooms to get live feedback on its newest features and updates, or to host live promotions and discussions with artists.
There are numerous ways audio conversations can be seamlessly leveraged. I just don’t think we’ll have a single, centralized platform for them the way we have Twitter for quick bursts of information.
As it grows and evolves, social audio will create a few unique opportunities.
For the most part, these will be reserved for new content creators.
Besides transforming the experience for audiences and giving them a more active role, social audio is enabling more people to become creators. From niche industry experts to regular Joes and Janes, many are literally and figuratively discovering their voices through their communities, via a format that is less intimidating than, for instance, starting and managing a podcast.
Influencer marketing will likely follow as more brands embrace this social format, possibly more so on the B2B side. The format’s accessibility makes it easier for business leaders to create content than on, say, YouTube or even LinkedIn.
It also makes hosting online summits and panel discussions within their business communities easier. Through less formal conversations on various subjects, social audio can convincingly replicate the experience of a conference and its value proposition. The added bonus is the ability to connect executives and thought leaders directly with the right decision makers.
For B2B brands, this means they can have their employees and brand evangelists actively engage on social audio, as well as partner with others to promote their messaging by sponsoring a room or paying to host it with industry authorities.
Then, there is the eventual introduction of voice tech into this space. I reached out to Bradley Metrock, CEO of Project Voice and one of the leading voices in the audio and voice landscape, to get his two cents on social audio. Here’s what he said:
“Like a multitude of things born out of the pandemic, such as wiping down elevator buttons and working from home, social audio’s purpose in a post-pandemic world will have to be defined and simply can’t be assumed.
“The intersection between social audio and voice technology is an interesting one. If we’re having audio-only conversations with each other in real time, there’s no reason those can’t be interconnected with voice assistants and conversational AI. To me, this is one of the most exciting spaces that is yet to be explored.”
I agree with Bradley; the combination makes perfect sense. For instance, an upgrade to Spotify’s voice control could improve the discoverability and accessibility of social audio content, much in the same way users now rely on built-in voice commands to play music and podcasts.
We’ll see who makes the first tangible move and gains a competitive edge, something that will arguably be much needed in a sea of Clubhouse clones and big-name platforms.
There is no doubt that social audio is slowly bringing about a major change in social media. Still, it’s too early to tell how much of this virtual socialization will stick once the novelty effect fully wears off.
Right now, I’m waiting to see which platforms emerge as the go-to places for specific conversations. It will take some time for platforms to learn what creators and listeners prefer from their live audio experiences.
However, it’s clear that one obvious path to integrating social media content into daily life is through the eardrums, at least for the time being.
The proof that audio is the next frontier is not just in social media; it’s in media in general. Netflix’s audio-only background playback on Android and YouTube’s audio-only ads for people listening to music and podcasts in the background on desktop clearly indicate that being as audio-centric as possible is an increasingly important part of the broader media ecosystem.
Furthermore, a user-driven audio platform poses a significant threat to the digital publishing industry, especially to publishers who have invested resources in developing original audio content. They need to consider the impact of such user-generated content and act more deliberately on their audio strategy.
On the other hand, this can and should be the impetus for other publishers who are still unsure about their approach to audio. In a media environment where players include Big Tech and popular audio platforms such as Clubhouse, the balance of power can shift quickly and irreversibly.