Voice in a Multicultural DC/Warehouse
A sound solution for building productivity and performance
By Tom Upshur
PITTSBURGH, PA (Sept. 25) - Today's distribution center (DC) and warehouse labor landscape is infinitely different from that of
previous generations. Nowhere is change more evident than in the proliferation of different native
languages being spoken within an individual distribution center. With the continuing globalization of
business, multinational companies face a variety of challenges in managing their supply chain across
multiple languages and countries.
According to a March 2009 report from the U.S. Bureau of Labor Statistics, in 2008, 24.1 million
persons, or 15.6 percent of the U.S. civilian labor force age 16 and over, were foreign-born. The study
found that foreign-born workers are more likely than their native-born counterparts to be employed in
production, transportation and material-moving occupations (16.4 versus 11.5 percent). This reality
has enormous implications for the world of logistics, validating what most distribution center and
logistics leaders have known for some time.
Technology has come a long way in improving productivity and performance in the DC/warehouse.
While many different types of technology are applied, one solution in particular has a special
role to play in addressing the multi-language reality of people working together: voice.
This article will examine the merits of voice in the DC for supporting multiple native languages.
Voice, Defined
But first, what exactly is voice, and how does it work? In a voice-directed DC/warehouse, employee
assignments are sent via Wi-Fi from a warehouse management system (WMS) to a lightweight,
battery-powered computing device worn or held by the worker. Once received by the device, the work
assignments are converted into a series of discrete verbal commands, which the worker hears through
a headset. The instructions direct the employee to an aisle/section and slot location. Once there, the
employee confirms he or she is at the proper location and completes the task by speaking into the
headset. The worker's words are recognized by the speech recognition software running on the device,
which translates the spoken response into data and sends those data back to the WMS. The WMS
issues the next assignment and the process repeats itself.
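The cycle described above can be sketched in a few lines of Python. This is only an illustrative outline, not any vendor's implementation: the Assignment fields, the prompt wording and the speak/listen callbacks are all hypothetical stand-ins for the device's text-to-speech and speech recognition components.

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    """One work assignment as it might arrive from the WMS (hypothetical fields)."""
    aisle: str
    slot: str
    quantity: int


def prompts_for(assignment):
    """Break one assignment into a series of discrete spoken prompts."""
    return [
        f"Go to aisle {assignment.aisle}",
        f"Slot {assignment.slot}",
        f"Pick {assignment.quantity}",
    ]


def run_assignment(assignment, speak, listen):
    """Speak each prompt, collect the worker's reply, and return data for the WMS.

    `speak` and `listen` are assumed callbacks standing in for the headset's
    speech-out and speech-in components.
    """
    responses = []
    for prompt in prompts_for(assignment):
        speak(prompt)               # worker hears the instruction
        responses.append(listen())  # worker confirms by speaking
    return {"aisle": assignment.aisle, "slot": assignment.slot, "responses": responses}
```

In a real deployment the returned data would be sent back to the WMS, which would then issue the next assignment.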
By replacing labor-intensive, error-prone systems such as paper, screens, scanners and keyboards for
data entry, voice has been shown to increase productivity, accuracy and job satisfaction — all
resulting in improved business performance. Voice helps front-line employees succeed at their jobs by
leveraging the most human approach to communication, a two-way dialogue. Literally talking their
way through their daily tasks, workers achieve greater productivity, accuracy and safety — the keys
to improved job satisfaction.
Voice in Multiple Languages
While it would be technically possible to have as many as 10 or more languages running
simultaneously in a given DC/warehouse, those companies with extremely multicultural workforces
often tend to limit the number of languages spoken on the voice system to three or four to avoid
encouraging cliques based on country of origin.
In most cases, U.S. companies choose to provide the instructions in English and allow their people to
answer back in their native tongue. But if management wanted to, instruction could come in an
employee's "mother tongue," and the response back could be in the mother tongue, too. Or, every bit
of two-way dialogue could be in English. It is all a matter of how the company wishes to conduct its
business.
Leveling the Playing Field
Today voice is used daily by hundreds of thousands of workers around the globe to drive business
results. Closer to home, as borne out by the above U.S. Bureau of Labor Statistics report, many
distribution operations are challenged by a multi-language workforce. Those using voice have found a
strong benefit for their multicultural workforce: it gives all of their workers a level playing field.
According to Robby Dhesi, director of distribution with Fox Racing Inc., a leading sport apparel
manufacturer based in Morgan Hill, Calif.: "Regardless of today's tough economy, it's important to
consider quality of work life. For our non-English speaking DC workers, using voice helps to improve
their quality of work life, because being able to answer back to the system in the language they are
most comfortable with makes them feel more in control of their job; thus, it is easier to perform well."
Fox Racing worked with HighJump Software and its authorized business partner Vitech Business
Group, Inc. to optimize its warehouse management system (WMS), and with my company, Vocollect,
to offer a voice solution.
For organizations with a multicultural workforce, voice provides distinct competitive advantages:
- Better productivity, because workers adapt quickly and comfortably to the new system. And in
  the case of a speaker-dependent system, the system itself adapts to them.
- Less training cost and time: DC/warehouse workers, including casual workers, can be trained
  in less than an hour and be working productively within a day. This is particularly critical for
  seasonal workers and retail distribution, where high levels of temporary staff are utilized.
- Familiar, two-way communication using natural language patterns improves efficiency, because
  voice helps all workers perform their best, regardless of what language they speak.
- Workers feel that their company leadership has positioned them to best succeed in their
  position. This boosts their personal job satisfaction and helps the organization achieve
  longer-term employee retention.
Even new immigrants who speak minimal English have every opportunity to
reap the same rewards in company incentive programs, because language isn't a barrier. The mobile
device is automatically set to recognize individual accents and regional dialects. Workers can "train" the
software to understand their chosen vocabulary and preferred colloquialisms, which provides a
more natural work environment.
Obviously, it takes quite sophisticated management software to allow a company to manage multiple
languages in the DC. For example, Vocollect's fully automated management software has loaded on it
the voice templates of the correct text-to-speech (TTS) engines (speech-out) and the task (e.g.,
picking, replenishment, put-away), as well as each individual worker's voice templates. The individual
worker does a one-time recording (training) of voice templates. So whether the voice system is in
Peoria, Paris or Tokyo, the voice application software supports the process.
Says Dhesi: "Many times you have a supervisor who oversees both English-speaking and
non-English-speaking teammates. With the newest versions of management software, supervisors can actually see
the dialogue words on the screen so they can follow what the system is telling the employee to do and
how he or she is responding for training purposes."
Smith Drug Company, headquartered in Spartanburg, S.C., is a full-line, full-service distributor of
pharmaceuticals and over-the-counter merchandise, serving customers in more than 15 states. In the
company's main DC are workers of Russian, Asian and Latin American descent, in addition to those
who are American-born. Smith Drug allows workers on the voice system to receive the instructions in
English, and they can respond back in their preferred native language. Says Director of Information
Systems Randy McConnell: "We haven't had any performance challenges because of language skills
with our 12 or so foreign-born workers. The voice system helps them be just as effective at picking as
anyone else."
Continues McConnell: "In our DC operations, in the hiring process, nationality is no longer an issue;
if their background fits the job, they can be a strong performer here, whether or not they can speak
good English, as long as they understand basic warehousing terms in English. The voice system
enables us to hire people who might otherwise have employment challenges simply because of a lack
of proficiency in English."
In fact, says McConnell, there is one American-born worker on his team who is in the process of
learning Spanish, so she uses a lot of Spanish words in the voice system to improve her
Spanish.
Voice, Deconstructed
In order to facilitate multiple languages within the voice system, two aspects of language must be
considered: the user recognizing the system's speech ("speech-out") and the system recognizing the
user's speech ("speech-in"). Speech-out can be performed either by recorded speech or a TTS (text-to-speech)
engine. The TTS engines are language-specific, so an important consideration when
evaluating voice suppliers is to make certain they offer the speech-out languages needed for a
particular DC operation. In addition, for an application to perform speech-out in a language, the
prompts in the application/task must be translated to the language.
The speech-in capability of a voice system is typically structured in either speaker-dependent or
speaker-independent modes. Both include all the standard vocabulary in a distribution environment,
such as "pick," "go to aisle…" and the like.
In a highly multicultural environment such as those we see today across the European Union, where
three or more native languages are commonly spoken in warehouses via a voice system, many
companies have found that a speaker-dependent voice system is best suited to account for the many languages,
dialects, accents and colloquialisms that people use in the course of their work.
A typical warehouse application requires a vocabulary of fewer than 100 words and sometimes as few
as 40 words. A small-vocabulary recognizer, such as that needed for a DC/warehouse, can utilize a
computer's perfect memory to improve performance dramatically for every user, even those with
unusual speech patterns or strong accents. It can store each user's unique voice patterns for every
word the recognizer will be required to understand. Although Employee #123 will not say "one" in
exactly the same way every time, if the recognizer knows that Employee #123 is speaking, and has
access to her personal voice patterns, it will be able to transcribe her speech much more accurately
than if it tried to compare her speech to all ways of saying "one" or to an average of how most people
say it.
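A toy example makes the point concrete. The sketch below assumes, purely for illustration, that each spoken word has already been reduced to a short list of numbers (a "feature vector") and that recognition means finding the closest stored pattern. Real recognizers use far richer acoustic models; the employee IDs and numbers here are invented.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(user_templates, features):
    """Return the word whose stored pattern for THIS user is closest to `features`."""
    return min(user_templates, key=lambda word: distance(user_templates[word], features))


# Hypothetical per-user voice patterns for a two-word vocabulary.
templates = {
    "employee_123": {"one": [0.9, 0.1], "two": [0.2, 0.8]},
    "employee_456": {"one": [0.4, 0.6], "two": [0.7, 0.3]},
}

heard = [0.8, 0.2]  # the same sound, as captured by the headset
word_for_123 = recognize(templates["employee_123"], heard)
word_for_456 = recognize(templates["employee_456"], heard)
```

With these made-up numbers, the same sound is closest to "one" for the first employee and to "two" for the second, which is why knowing who is speaking matters.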
Thus, the system's accuracy depends on its knowing who is talking to it. A speaker-independent
system does not make use of that knowledge and is therefore inherently less accurate and much more
susceptible to background noise. Further, it often requires the use of "anchor words" — additional
words that serve as cues to the voice system to begin or conclude a dialogue. Every word that is added
to the mix takes time, and, of course, time is money.
The process of allowing a speech recognizer to "practice" with a user's speech patterns is called
training. Speaker-dependent recognizers are, therefore, sometimes called trained systems, and
speaker-independent recognizers are referred to as untrained systems.
A speaker-dependent recognizer can typically be trained in 15 to 45 minutes. As DC workers
collectively say these words thousands of times per day, the increased recognition accuracy
of a speaker-dependent system over a speaker-independent system results in higher productivity. This
higher productivity rapidly pays back the initial time spent to train the recognizer. And the increase in
productivity is highest for non-native speakers because, for them, recognition accuracy on
speaker-independent systems is often much worse than it is for native speakers.
A speaker-independent system cannot just accept "uno" in place of the word "one." Users must
conform to the system's expectations of the speech patterns of the language they are using. For the
workforce on a factory or warehouse floor where a wide variety of accents and languages are the
norm, this may not be practical. In a speaker-dependent voice system, when prompted with "one," the
worker is free to train the word as "one," "uno," "um," "eins," or any other preferred word for one.
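Conceptually, training lets each worker attach a preferred spoken word to the system's prompt word. The sketch below is a simplification under stated assumptions: it works on text rather than audio, and the WorkerVocabulary class and its methods are hypothetical names, not part of any vendor's product.

```python
class WorkerVocabulary:
    """Maps one worker's trained spoken words back to the system's prompt words."""

    def __init__(self):
        self._spoken_to_prompt = {}

    def train(self, prompt_word, spoken_word):
        """Record that this worker says `spoken_word` to mean `prompt_word`."""
        self._spoken_to_prompt[spoken_word.lower()] = prompt_word

    def interpret(self, spoken_word):
        """Return the prompt word for a trained response, or None if untrained."""
        return self._spoken_to_prompt.get(spoken_word.lower())


vocabulary = WorkerVocabulary()
vocabulary.train("one", "uno")  # this worker chose Spanish for "one"
```

After training, the worker's "uno" is reported to the WMS as "one," while a word the worker never trained is simply not recognized.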
Adaptive recognition further increases recognition accuracy, and thus productivity, by automatically
adjusting to the way the user speaks. When a user starts using any new system, he or she may be
tentative and unsure, and this may affect speaking patterns. This may be particularly true for
non-native speakers. As they become more familiar with the system, their speaking patterns may change,
becoming quicker and more natural. Some voice suppliers, such as Vocollect, incorporate an adaptive
recognition feature that automatically senses and adjusts to these changes while the user uses the
system, further improving recognition accuracy and productivity.
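One simple way to picture adaptation is a stored voice pattern that drifts a little toward each new utterance the system hears. The sketch below uses an exponential moving average over a list of numbers representing a word. This is an assumed, generic technique chosen for illustration; it is not a description of how Vocollect or any other supplier implements adaptive recognition, and the rate value is arbitrary.

```python
def adapt(template, observed, rate=0.1):
    """Nudge a stored voice pattern toward the latest observed utterance.

    `template` and `observed` are equal-length lists of numbers; `rate`
    (between 0 and 1) controls how quickly the stored pattern follows
    changes in the way the user speaks.
    """
    return [(1 - rate) * t + rate * o for t, o in zip(template, observed)]


# Hypothetical pattern recorded on day one, then a quicker, more natural utterance.
stored = [0.0, 1.0]
updated = adapt(stored, [1.0, 1.0], rate=0.5)
```

A small rate keeps the pattern stable against one-off noise, while a larger rate follows changes in speech more quickly.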
Having to continually repeat dialogue unnecessarily because of poor recognition accuracy slows down
productivity all across the line, and that can spell significant financial cost for supply chain
businesses. When speech recognition accuracy is maximized, there are significantly fewer repeats.
This helps to ensure the company achieves the strongest level of business performance from the voice
system.
Only the Beginning
The full potential of voice across the supply chain and beyond is yet to be realized. Vocollect, for
example, has already made strong inroads into utilizing voice in healthcare settings through its
Vocollect Healthcare Systems. Its voice-assisted care solution is used by nurse aides for the
management and documentation of patient care in skilled nursing facilities (SNFs). Organizations
benefit from reduced paper processing, lower operating costs, increased reimbursements and
improved quality of care.
Considering the many other types of mobile workers there are in businesses beyond the supply chain,
one can picture that voice-directed efficiencies have a bright future in helping unleash higher business
performance. The proven capacity of voice to operate successfully in a multi-lingual environment is a
critical component of the ability to bring voice into other realms of business operation.
About the Author: Tom Upshur is vice president of product management and marketing with
Vocollect, Inc., a provider of voice solutions for mobile workers. For more information, visit: www.vocollect.com.
About Vocollect
Vocollect delivers proven gains in productivity, accuracy, safety and job satisfaction to
companies seeking to improve their supply chains. The Vocollect Voice portfolio features a
complete range of hardware, software and services built for voice performance, reliability and
adaptability in the global supply chain and logistics marketplace. Every day hundreds of
thousands of people on six continents rely on Vocollect and its worldwide network of more
than 100 certified supply chain resellers and channel partners to improve work. For more
information, visit: http://www.vocollect.com.
Susan Muttart
Vocollect
scmuttart@vocollect.com
412-349-2543
About Vitech Business Group, Inc.
At Vitech, we use our knowledge, experience and passion to achieve exceptional levels of
supply chain performance for our clients. We are focused on providing our clients
comprehensive solutions that solve their supply chain challenges and enable their operation
to perform to its full potential. Vitech forms strategic partnerships with the industry’s leading
software, hardware and service providers to deliver complete supply chain solutions.
Commitment to our customers’ success ensures they achieve the best investment-to-return
ratio in the industry. To learn more about Vitech, visit: http://www.Vitechgroup.com
COMPANY CONTACT:
William Ryan
Vitech Business Group, Inc.
info@vitechgroup.com
360.647.1622