Voice System Technologies & Architecture

By Roger Byford


Overview

This white paper reviews some of the major technology and architecture decisions designers must make in creating voice-powered solutions for warehousing and industrial applications. We assume in all cases that the voice system includes wearable voice computers, communicating with their operators via microphone-equipped headsets, and with remote information systems via a radio frequency local area network (RF LAN).

We start by reviewing information flow from the voice system to other systems. All voice-powered systems must communicate with customer-owned software packages to transfer information. This section concentrates on warehouse applications, for which the relevant software package is the warehouse management system (WMS) or its equivalent. Next we discuss the following technologies, each critical to the success of any application:

• Speech recognition, which enables a voice computer to convert its operator’s speech into text.
• Speech synthesis, which enables a voice computer to talk to its operator.
• RF LAN technology, which allows a wearable voice computer to communicate in real time with remote information systems.

Finally, we review thick client and thin client architectures. In a thick client architecture most processing takes place on the wearable computer. In a thin client architecture much of the processing takes place on a central server.

Information Flow

In this section we concentrate on warehouse operations in general, and order selection in particular. We use the abbreviation WMS (warehouse management system) to refer to whatever software system (or systems) provides the information necessary to perform order selection. This may be a true WMS or an order entry system working in conjunction with a simple stock locator program, or some other combination of programs. We use order selection as our application example because it is the most commonly implemented today. The ideas that follow, however, can readily be extended to other warehouse applications.

The information a voice directed order selection system requires falls into two categories:

• What products are to be selected – the information contained on a paper pick list, and
• How the selection process should be performed – priorities, splitting and combining of orders to make up effective pick trips, assignment of some orders to particular operators, etc. This information determines what pick trip will be assigned to an operator who notifies the system that he or she is ready to work.

All WMSs can provide the first category of information, but not all can provide the second. In many cases, where a WMS prints a stack of pick lists at the beginning of a shift, the second category is managed by distribution center staff, who make decisions about priorities and work assignments, and perform operations such as splitting and combining orders. Although in theory it would be possible to continue to perform this work manually after installing a voice directed selection system, in practice that would negate many of the benefits. The solution is to add a “middleware” component (i.e., a software package that joins two systems that cannot communicate directly) to the system. This software package must accept the pick list information from the WMS, and must be able to make the decisions (perhaps with human assistance and/or override capability) that will allow it to provide the second category of information to the operators’ voice terminals.
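The decision-making role described above can be sketched in a few lines. The following Python sketch is hypothetical: the `Order` and `OrderLine` types, the `build_pick_trips` function, the priority convention (lower number is more urgent) and the fixed trip size are illustrative assumptions, not Pick Manager's design. It combines small orders and splits large ones into pick trips, then hands the next trip to whichever operator reports ready.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrderLine:
    order_id: str
    slot: str      # pick location, e.g. an aisle/slot code
    quantity: int


@dataclass
class Order:
    order_id: str
    priority: int  # assumption: lower number = more urgent
    lines: list[OrderLine]


def build_pick_trips(orders: list[Order], max_lines_per_trip: int) -> list[list[OrderLine]]:
    """Fill trips in priority order: small orders are combined, large ones split."""
    trips: list[list[OrderLine]] = []
    current: list[OrderLine] = []
    for order in sorted(orders, key=lambda o: o.priority):  # stable sort keeps input order on ties
        for line in order.lines:
            current.append(line)
            if len(current) == max_lines_per_trip:
                trips.append(current)
                current = []
    if current:
        trips.append(current)
    # Visit slots in sorted order within each trip (a crude stand-in for a walking sequence).
    return [sorted(trip, key=lambda line: line.slot) for trip in trips]


class TripDispatcher:
    """Hands the next queued trip to an operator who reports being ready."""

    def __init__(self, trips: list[list[OrderLine]]) -> None:
        self._queue = deque(trips)
        self.assignments: dict[str, list[OrderLine]] = {}

    def operator_ready(self, operator_id: str) -> Optional[list[OrderLine]]:
        if not self._queue:
            return None  # no work left to assign
        trip = self._queue.popleft()
        self.assignments[operator_id] = trip
        return trip
```

For example, with `max_lines_per_trip=4`, a three-line urgent order and a two-line routine order yield one four-line trip mixing both orders and a one-line trip holding the remainder of the routine order. A real middleware package would also honour operator-specific assignments and supervisor overrides, as the text notes.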

We can refer to WMSs that can provide both categories of information as supporting real-time selection. Those that cannot are often called batch systems.

Similar issues arise when considering a voice directed system for other warehousing operations. Can the WMS directly provide all the information required, or must another piece of software be added?

A second issue is the transfer, from the voice system to the WMS, of information concerning what products have been selected. For a batch WMS using paper pick lists, this information is usually keyed in manually on an exception basis (only items that were not selected as expected). Depending on the volume of information to be entered, and therefore the clerical hours required, it may make sense to continue to enter this information manually, from printouts created by the middleware package. Alternatively, depending on the capabilities of the WMS, the middleware package can either generate data files to be imported by the WMS, or transfer the information directly.
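Exception-based reporting is simple to express. The sketch below is a hypothetical illustration: the record fields and the CSV layout are assumptions, since the import format a WMS accepts varies by product. It writes only the picks that did not go as planned, which is the same information a clerk would otherwise key in.

```python
import csv
import io
from typing import Iterable, TextIO


def write_exception_file(picks: Iterable[dict], out: TextIO) -> int:
    """Write a CSV of picks whose picked quantity differs from the ordered quantity.

    Returns the number of exception records written (header excluded).
    """
    writer = csv.writer(out)
    writer.writerow(["order_id", "slot", "ordered", "picked"])
    written = 0
    for pick in picks:
        if pick["picked"] != pick["ordered"]:  # shorts and overpicks only
            writer.writerow([pick["order_id"], pick["slot"], pick["ordered"], pick["picked"]])
            written += 1
    return written


if __name__ == "__main__":
    buffer = io.StringIO()
    count = write_exception_file(
        [
            {"order_id": "1001", "slot": "A-01", "ordered": 4, "picked": 4},
            {"order_id": "1001", "slot": "B-07", "ordered": 2, "picked": 0},
        ],
        buffer,
    )
    print(count)  # 1
    print(buffer.getvalue())
```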

A prospective customer for a voice directed warehouse operations system must therefore consider its own WMS as well as the voice system. Questions to be answered (by some combination of the customer, the WMS provider, and the voice system provider) include the following:

• Does the WMS support real-time order selection?
• If so, have the voice system and WMS vendors already developed a direct link between their systems?
• If not, what will be the terms and technological means under which they will connect? From the technology viewpoint, several connection mechanisms are possible, including a direct link (generally using sockets), terminal emulation (in which the voice terminals pretend to be screen/keyboard terminals), and file transfers (mostly used for rapid prototyping). Of these, by far the most flexible and reliable is the direct link.
• If a middleware component is required, is the voice system provider’s offering flexible enough to meet the facility’s requirements? And does it meet the standard criteria of demonstrated reliability, capability, etc. for systems of the size and scope being contemplated?
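A direct link over sockets can be sketched as follows. The message types (`READY`, `ASSIGN`), the field names and the newline-delimited JSON framing are hypothetical choices made for illustration; a real voice-to-WMS link uses whatever protocol the two vendors agree on. `socket.socketpair()` stands in for the network connection so the example runs on one machine.

```python
import json
import socket


def send_message(sock: socket.socket, message: dict) -> None:
    """Send one message as a single line of JSON (newline-delimited framing)."""
    sock.sendall((json.dumps(message) + "\n").encode("utf-8"))


def receive_message(reader) -> dict:
    """Read one newline-terminated JSON message from a text-mode socket file."""
    line = reader.readline()
    if not line:
        raise ConnectionError("peer closed the connection")
    return json.loads(line)


def demo_exchange() -> dict:
    """Terminal announces it is ready; the WMS side answers with a pick assignment."""
    terminal, wms = socket.socketpair()  # connected pair standing in for the RF LAN link
    with terminal, wms:
        with terminal.makefile("r", encoding="utf-8") as terminal_in, \
             wms.makefile("r", encoding="utf-8") as wms_in:
            send_message(terminal, {"type": "READY", "operator": "op-17"})
            request = receive_message(wms_in)
            send_message(wms, {
                "type": "ASSIGN",
                "operator": request["operator"],
                "picks": [{"slot": "A-12-3", "quantity": 4}],
            })
            return receive_message(terminal_in)


if __name__ == "__main__":
    print(demo_exchange())
```

Some form of framing is needed because TCP delivers a byte stream rather than discrete messages; without a delimiter or length prefix, the receiver cannot tell where one message ends and the next begins.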

Vocollect’s Design Decisions

Vocollect offers a variety of direct-to-WMS connection schemes, including sockets, terminal emulation, and file transfers. In addition, we offer a capable and very flexible middleware package called Pick Manager, which currently runs on a Microsoft® Windows NT®/SQL Server platform. Pick Manager is in use today at multiple sites, supporting in some cases as many as 200 wearable devices.

Technologies


Speech Recognition

Effective speech recognition (transcribing human speech into text) is critical to the success of any voice powered industrial system. The critical measure of a speech recognizer’s performance is accuracy – does it correctly transcribe what it hears? And getting a computer to perform speech recognition as accurately as a human does under all circumstances remains an unsolved problem.

So if we want a computer to achieve high recognition accuracy, we must simplify the problem. Fortunately, there are a number of ways designers can do that while still meeting the needs of the users. The simplifications that designers choose are based on the intended application of the system. In almost all cases, however, they are making trade-offs between constraining the problem and accuracy.

Another possible design trade-off is accuracy versus time. We make the assumption here that the designer must create a system that can, in effect, hold a real-time conversation with its users. Processing for five minutes to transcribe two seconds of speech is acceptable only in a very limited number of applications.

In what follows we use the word “understand” to mean, “be able to transcribe from speech into text.”

Large Vocabulary and Small Vocabulary

The words that a speech recognition system is expected to transcribe comprise its vocabulary. Human beings have a very large vocabulary. We can understand many thousands of words. A speech recognizer capable of taking dictation on a computer must also have a vocabulary of thousands of words. Such a recognizer is called a large vocabulary system. At the other end of the spectrum is a recognizer designed only to tell whether a user has responded to a question by saying “yes” or “no” – clearly a small vocabulary issue. The distinction between small vocabulary and large is arbitrary, not rigid, but one thousand words is a reasonable dividing line. Systems with vocabularies of between a few hundred and a couple of thousand words are sometimes described as medium vocabulary.

In an ideal world, designers would create only large vocabulary speech recognition systems. But there are two trade-offs in doing so. Most importantly, it is much more difficult to create a high accuracy large vocabulary speech recognizer than to create a small vocabulary one. Correctly recognizing one word from a vocabulary of fifty thousand or more is far harder than distinguishing “yes” from “no.” Second, a large vocabulary recognizer generally requires much more computing horsepower (and memory) than does a small vocabulary recognizer. So, for example, it may be difficult to incorporate a large vocabulary recognizer into a portable or wearable device.

Fortunately, industrial speech recognition systems do not generally require large vocabularies. A typical warehouse application requires a vocabulary of fewer than one hundred words, while an inspection application may require up to about one thousand.

One other vocabulary issue is often important to industrial users. A small vocabulary may be perfectly acceptable, but a fixed small vocabulary is not. System design, and creating a positive experience for the system’s operators, is much harder if the speech recognizer places constraints on choice of words: “You can’t use this word – you must use that one.”

Continuous and Discrete

Human beings easily recognize speech in which individual words are run together, with no audible gaps between them, to form a phrase or sentence. To do this we must not only understand the individual words, but also decide where the boundaries between them lie. Sometimes we must use very high-level knowledge to perform this task. Consider these two sentences:

“Six teen idols were cavorting on stage.”

“Sixteen idols were cavorting on stage.”

Only a deep understanding of the generally accepted use of the word “idol” allows us to understand that the first sentence is much more likely to be correct than the second.

Early speech recognizers functioned only with discrete speech: users had to pause perceptibly between words. Today this constraint is generally applied only in very inexpensive recognizers (e.g., for toys), or to make very hard problems easier (e.g., recognizing one of many thousands of company names to provide stock quotes). In either case, the system must make it clear to the user that a single word is called for: “Say the name of the company.”

For industrial applications, a discrete recognizer is not acceptable. Having to pause between words (for example, when entering a sequence of digits) is slow and frustrating. All speech recognizers for industrial applications should be capable of understanding continuous speech. Although this makes the recognizer’s task more difficult (i.e., makes it harder to keep accuracy high) it is possible today to create very high accuracy continuous speech recognizers for industrial applications.

Speaker Dependent and Speaker Independent

It is easier to understand someone’s speech if the listener knows who is speaking and is used to hearing him or her talk – particularly if the speaker has an unusual speech pattern or a strong accent. That statement is even truer for computer-based speech recognizers than for people, and for some applications, we can make use of this fact to improve recognition accuracy dramatically.

In some applications, such as an automated telephone attendant, it may not be possible to require users to identify themselves to the system, or to give the system time to practice listening to them. In the warehouse, we can do both. In fact, for a small vocabulary recognizer we can make use of a computer’s perfect memory to improve performance dramatically for every user, even those with unusual speech patterns or strong accents. We can store each user’s voice patterns for every word the recognizer will be required to understand. Although Judy will not say “one” in exactly the same way every time, if the recognizer knows that Judy is speaking, and has access to her personal voice patterns, it will be able to transcribe her speech much more accurately than if it tried to compare her speech to all ways of saying “one,” or to an average of how most people say it. Such a system is referred to as speaker dependent, meaning that it depends for its accuracy on knowing who is talking to it. A speaker independent system does not make use of that knowledge, and is therefore inherently less accurate.

The process of allowing a speech recognizer to “practice” with a user is called training. Speaker dependent recognizers are therefore sometimes called trained systems, and speaker independent recognizers are referred to as untrained.

For a small vocabulary recognizer the training process generally consists of having the user speak to the system (one or more times) all of the words in the recognizer’s vocabulary. This is sometimes referred to as enrollment training. For a large vocabulary system speaking every word during training is not practical. Speaker dependent large vocabulary systems generally use an adaptation process, in which the user reads known passages of speech to the system, and the system draws conclusions about the user’s speech patterns.
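Enrollment training for a small vocabulary can be illustrated with classic template matching. The sketch below stores each user's own examples of each word and recognizes new speech by dynamic time warping (DTW), a standard technique for small vocabulary, speaker dependent recognition; it is not a description of any particular product's recognizer. Real systems compare sequences of multi-dimensional acoustic feature vectors; the single number per frame used here is a simplifying assumption, and the class and method names are hypothetical.

```python
import math
from typing import Optional


def dtw_distance(a: list[float], b: list[float]) -> float:
    """Dynamic time warping distance between two feature sequences."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or advance both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


class SpeakerDependentRecognizer:
    def __init__(self) -> None:
        # user -> word -> list of enrollment templates (feature sequences)
        self._templates: dict[str, dict[str, list[list[float]]]] = {}

    def enroll(self, user: str, word: str, features: list[float]) -> None:
        """Store one spoken example of a vocabulary word for this user."""
        self._templates.setdefault(user, {}).setdefault(word, []).append(features)

    def recognize(self, user: str, features: list[float]) -> Optional[str]:
        """Return the enrolled word whose template is closest to the input."""
        best_word, best_distance = None, math.inf
        for word, templates in self._templates[user].items():  # only this user's voice patterns
            for template in templates:
                distance = dtw_distance(features, template)
                if distance < best_distance:
                    best_word, best_distance = word, distance
        return best_word
```

Because the recognizer only compares against what the user actually said during enrollment, a user who enrolls the word “one” by saying “uno” is recognized just as well, which is the property this section attributes to trained systems.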

Note that a speaker dependent system must allow for storing users’ voice patterns, and, for a system involving multiple wearable computers, for retrieving them on demand so that any user can log on to any of the computers. Today this is easy to do, so we do not consider it a significant design trade-off.

How does a designer choose between creating a speaker dependent and a speaker independent system? The trade-off is between the time the user must take to train the system and the time he or she will gain from the increased accuracy that training brings. For a small vocabulary industrial speech recognizer, this trade-off calculation is very clear. It is generally accepted that a trained recognizer will typically have at least twice the accuracy (half the error rate) of an untrained one. If training the system requires an investment of about fifteen minutes (typical for a warehouse application), and the user will operate the system for perhaps two thousand hours in the course of a year, even a tiny improvement in performance as a result of training will pay for itself very quickly in the form of increased productivity through increased accuracy.
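The payback argument is simple arithmetic. The sketch below uses the paragraph's figures (fifteen minutes of training, two thousand hours of use per year); the function names are hypothetical, and the 1% productivity gain in the usage note is a hypothetical value chosen only to show the calculation.

```python
def training_overhead_fraction(training_minutes: float, annual_hours: float) -> float:
    """Training time as a fraction of a year's operating time."""
    return (training_minutes / 60.0) / annual_hours


def break_even_hours(training_minutes: float, productivity_gain: float) -> float:
    """Operating hours after which time saved (gain x hours) equals time spent training."""
    return (training_minutes / 60.0) / productivity_gain


if __name__ == "__main__":
    print(f"{training_overhead_fraction(15, 2000):.4%}")  # 0.0125%
    print(break_even_hours(15, 0.01))  # hours to recover training at a hypothetical 1% gain
```

Training thus consumes about one eight-thousandth of a year's operating time. At a hypothetical 1% productivity gain, it is recovered after about 25 hours of work.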

A major advantage of a trained recognizer for industrial applications is that a trained system does not care about unusual speech patterns, accents, or even language. It is simply comparing the speech patterns it recorded during the training process with the ones it hears during use. A speaker independent system, however, just cannot accept (for example) “uno” as another form of the word “one.” Users must conform to the system’s expectations of their speech patterns. For the workforce on a factory or warehouse floor, where a wide variety of accents and even languages is the norm, this may not be practical.

Casual or Full-Time Users

Another set of design decisions for speech recognizers revolves around casual versus full-time users. For casual users, the speed of data entry is less important than coping with extraneous speech (speech the user wants the recognizer to ignore), while for full-time users the reverse is true.

A speech application designed for casual use may, for example, require the user to start and end each utterance with a specific word (“ready, 1, 2, 3, enter”), and will reject any user speech that is not in exactly the right format. The same application designed for full-time use would expect an utterance of the form “1, 2, 3,” with the trade-off that it might be easier to interpret an extraneous utterance as a sequence of digits. In industrial and warehousing applications, where the user may be entering hundreds of digit strings per hour for ten or more hours per day, the 40% reduction in the amount of speech required to enter the data is vastly more important than a modest improvement in the ability to reject extraneous utterances.
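The two utterance formats in the example can be compared directly. In the sketch below, the function names and the use of single-character digit tokens are hypothetical. It shows a strict casual-mode format that requires the wrapper words, a full-time format that accepts a bare digit string, and the proportion of speech saved.

```python
from typing import Optional

DIGITS = set("0123456789")


def parse_casual(words: list[str]) -> Optional[list[str]]:
    """Accept only 'ready <digits...> enter'; reject anything else."""
    if len(words) >= 3 and words[0] == "ready" and words[-1] == "enter" \
            and all(w in DIGITS for w in words[1:-1]):
        return words[1:-1]
    return None


def parse_full_time(words: list[str]) -> Optional[list[str]]:
    """Accept a bare digit string, e.g. ['1', '2', '3']."""
    if words and all(w in DIGITS for w in words):
        return words
    return None


def speech_saved(digit_count: int) -> float:
    """Fraction of spoken words saved by dropping the two wrapper words."""
    return 2 / (digit_count + 2)
```

For the three-digit example the saving is 2/5, the 40% quoted in the text. Because the two wrapper words are a fixed cost, the proportion is smaller for longer strings (2/7, about 29%, for five digits).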

A discrete speech recognizer may also be appropriate for casual users (see the discussion of discrete and continuous recognizers above). The fact that each word must be surrounded by silence allows the recognizer to reject any utterance that consists of multiple words. Again, however, for any full-time user, the inability to enter a sequence of words spoken continuously would be a productivity killer and a source of unbearable frustration.

A speaker independent recognizer may also be more appropriate for casual users, if it is not possible for them to devote the fifteen minutes or so required to train a speaker dependent recognizer. But for both casual and full-time users, since a speaker independent recognizer must correctly detect and process a wide range of accents, dialects, and speech properties, it is naturally more prone to misinterpreting extraneous speech (or other sounds) as words that should be recognized.

An option that can provide very nearly “best of both worlds” performance is to allow the operator to change the mode of the recognizer with simple, intuitive phrases. In Vocollect's case, for example, our recognizer typically operates in full-time user mode, minimizing the amount of user time and speech required to enter data. By simply saying, “Talkman, sleep,” however, the user can put the Talkman into a special casual user mode, in which the only phrase that returns it to normal operation is “Talkman, wake up,” with that phrase preceded and followed by brief silences. While it is “asleep” the recognizer is almost totally immune to any accidental activation from extraneous speech or other outside noise.
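The sleep and wake behaviour amounts to a two-state filter placed between the recognizer and the application. In the sketch below, the lowercase phrase strings, the `silence_before`/`silence_after` flags and the class name are assumptions for illustration, not Talkman's internal interface.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    AWAKE = "awake"
    ASLEEP = "asleep"


class SleepWakeFilter:
    """Passes recognized phrases to the application only while awake."""

    SLEEP_PHRASE = "talkman sleep"
    WAKE_PHRASE = "talkman wake up"

    def __init__(self) -> None:
        self.mode = Mode.AWAKE

    def handle(self, phrase: str, silence_before: bool = True,
               silence_after: bool = True) -> Optional[str]:
        """Return the phrase for the application, or None if it is consumed or ignored."""
        if self.mode is Mode.AWAKE:
            if phrase == self.SLEEP_PHRASE:
                self.mode = Mode.ASLEEP
                return None
            return phrase
        # Asleep: everything is ignored except the wake phrase spoken in isolation.
        if phrase == self.WAKE_PHRASE and silence_before and silence_after:
            self.mode = Mode.AWAKE
        return None
```

Requiring silence around the wake phrase makes waking slightly slower, which costs little because it happens rarely, while sharply reducing the chance that conversation wakes the recognizer by accident.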

Vocollect’s Design Decisions

Vocollect’s products use a continuous, small (and variable) vocabulary, speaker dependent recognizer. Users can speak naturally, without pauses, because the recognizer is continuous. The recognizer is small vocabulary because that offers higher performance, and industrial applications do not require a large vocabulary. The vocabulary is variable, and can therefore be modified for each application (or even by each user). And the recognizer is speaker dependent because the small investment in training a speaker dependent recognizer is paid for many times over by the improved productivity, and the user satisfaction created by the increased accuracy (and extraneous speech rejection) that training the recognizer provides.

Finally, our products and applications are clearly designed and built for full-time, not casual, users. We strongly emphasize productivity and ease of use (reduction in the amount of speech required of the user). While it is possible, for example, to create Talkman dialogues that require specific words to start and end each utterance, we very rarely recommend doing so. At the same time, our speech recognizer uses numerous techniques to reject both nonspeech and extraneous speech sounds with high confidence. And the ability to put the recognizer to sleep (see above) and wake it up with simple phrases provides a simple, intuitive and virtually bulletproof mechanism for permitting sidebar conversations. For the rare occasions when these techniques fail (and for the much more common ones when the user misspeaks!), a combination of built-in Talkman features and dialogue design techniques permits easy editing of erroneously entered data.

Speech Synthesis

Just as speech recognizers make human speech intelligible and meaningful to computers, speech synthesis technology permits computers to speak to humans. There are two distinct speech synthesis techniques available to systems designers.

Digitized Speech (also called Record and Playback)

Digitized speech is what we hear when an automated attendant or answering machine speaks to us over the telephone. The computer is essentially acting like a tape recorder. At some time in the past a human spoke into a microphone, and the speech was converted to numbers (digitized) so the computer could store it. On demand, the computer recovers the digitized speech samples and reconstitutes them into sound.

Digitized speech can be of very high quality. However, someone must record, and the computer must store, every word or phrase the computer will have to speak during operation. This may present a storage problem for large applications, and it invariably presents a maintenance concern. If the application is to be modified, is the original speaker still available to record new words and phrases? And if not, will multiple voices be acceptable, or must the entire application be re-recorded? Also, creating high quality voice recordings generally requires both sophisticated equipment and, perhaps more difficult, a professional announcer.

A significant limitation of digitized speech in some applications is that the computer can only speak phrases that have been prerecorded (or that can be created through concatenation). It is therefore functionally impossible to create an application in which the computer speaks, for example, product descriptions, or in which the computer can speak to its operator unpredicted text messages that are sent to it from another machine (e.g., a supervisor typing in a message to be spoken to an operator).
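The limitation of record-and-playback prompts can be shown with a toy clip library. The sketch is hypothetical: the file names, the vocabulary and the function names are illustrative. A prompt can be concatenated only from words that someone has already recorded, so an unanticipated product description or supervisor message cannot be spoken.

```python
# Hypothetical library of prerecorded clips: word -> audio file
RECORDED_CLIPS = {word: f"{word}.wav" for word in ["aisle", "slot", "pick", "go", "to"]}
RECORDED_CLIPS.update({str(d): f"{d}.wav" for d in range(10)})


def missing_recordings(words: list[str]) -> list[str]:
    """Words in a prompt for which no recording exists."""
    return [w for w in words if w not in RECORDED_CLIPS]


def build_prompt(words: list[str]) -> list[str]:
    """Return the clips to play in order, or fail if any word was never recorded."""
    missing = missing_recordings(words)
    if missing:
        raise ValueError(f"no recording for: {', '.join(missing)}")
    return [RECORDED_CLIPS[w] for w in words]
```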

Text-to-Speech

A computer with text-to-speech (TTS) software can convert computer text directly into spoken sounds. TTS removes all the constraints and maintenance headaches of digitized speech, as the computer can speak any text presented to it (e.g., this document) with no prior knowledge, and it is not necessary to have anyone record or maintain speech phrases. A computer using TTS, however, does sound like a computer speaking. It is clearly not human. In some applications this may present a problem. It is not always easy to understand someone with a new accent the first time you hear him or her. But as listeners we humans are extremely adaptive. Give us a few minutes and we can easily decipher even very strong accents. The mild accent of a computer using TTS is very easy to understand, especially for industrial applications, in which users typically hear very similar phrases many times each day.

Vocollect’s Design Decisions

Vocollect offers both text-to-speech and digitized speech in its products. We strongly believe, however, that the advantages of TTS far outweigh the slight loss in speech quality. We therefore recommend that our customers employ the TTS option, and today every one of our many customers does so.

Radio Frequency Local Area Networks (RF LANs)

RF LANs allow portable or wearable computers to communicate wirelessly, at high speed, and in real time with remote information systems. An RF LAN is the wireless equivalent of an Ethernet wired computer network.

RF LANs operate like miniature cellular telephone systems. Throughout a large facility, multiple access points (like cell phone towers) are mounted. As a portable computer equipped with an RF network card moves around the facility, it is automatically handed off from one access point to another, just as a cell phone in a moving automobile is transparently handed off from one cell tower to the next.

For many years there were multiple competing RF LAN technologies, and devices from different manufacturers could not communicate with one another. Today there is a single emerging standard, and one other technology that may be considered for some applications.

802.11b

The new technology standard is known as 802.11b (pronounced “eight oh two dot eleven bee”). The number 802.11 identifies the Institute of Electrical and Electronics Engineers (IEEE) working group that created the standard, and the “b” identifies the particular amendment.

Radios using the 802.11b standard operate in the 2.4 GHz band, and use direct sequence spread spectrum (DSSS) technology to achieve high data bandwidth. Each access point offers a theoretical maximum bandwidth of 11 megabits per second (Mbps). Since all portable devices communicating with a single access point must share this bandwidth, congestion may occur under some circumstances. Up to three access points, each on a different non-interfering channel, can, however, be configured to cover a single area. We believe that 802.11b will remain the standard of choice for industrial RF LANs for at least the next three to five years.
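The effect of sharing can be estimated with back-of-the-envelope arithmetic. The sketch assumes devices spread evenly across co-located access points and uses the theoretical 11 Mbps figure; real throughput is substantially lower because of protocol overhead. Channels 1, 6 and 11 are the non-overlapping set commonly used in North America; other regulatory domains may differ. The function names are hypothetical.

```python
NON_OVERLAPPING_CHANNELS = (1, 6, 11)  # typical North American 802.11b channel plan
RAW_RATE_MBPS = 11.0                    # 802.11b theoretical maximum per access point


def channel_for(ap_index: int) -> int:
    """Assign co-located access points to channels round-robin."""
    return NON_OVERLAPPING_CHANNELS[ap_index % len(NON_OVERLAPPING_CHANNELS)]


def theoretical_share_mbps(devices: int, access_points: int) -> float:
    """Upper bound on per-device bandwidth when all devices transmit at once."""
    # A fourth co-located access point would reuse a channel and add no capacity.
    usable_aps = min(access_points, len(NON_OVERLAPPING_CHANNELS))
    return RAW_RATE_MBPS * usable_aps / devices
```

For example, 30 devices sharing a single access point have at most about 0.37 Mbps each when all are busy (11/30), while three access points on channels 1, 6 and 11 raise that bound to about 1.1 Mbps.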

Other Standards

Another RF communications standard that is receiving a lot of press is Bluetooth™. Bluetooth radios are designed to offer moderate-bandwidth, very short-range communications for personal area (or perhaps home) networks. Bluetooth is in effect a replacement for the much-sold but little-used infrared links incorporated into many computers and some printers.

In the industrial world, Bluetooth may eventually lead to wireless personal peripherals – for example, a wireless headset communicating with a belt-worn voice computer. But two major issues must be resolved before such devices become reality. The first is that Bluetooth and 802.11b operate in the same frequency band, and they do interfere with one another. So a wearable computer that incorporates both 802.11b and Bluetooth radios will need to be designed to avoid this interference. An IEEE committee is working on the interference issue, but has not yet released any recommendations.

The second concern with wireless personal peripherals is that although they have the desirable effect of eliminating wires, they replace those wires with additional batteries. A wireless headset must have a battery. And living with an extra battery may be more difficult overall than living with a wire. So it is far from clear today that wireless personal peripherals, whether driven by Bluetooth or any other technology, will be attractive for industrial applications.

Vocollect’s Design Decisions

Vocollect’s products support any RF network cards that come in the standard PC card form factor, and for which the required software drivers are available. In practice, this means all vendors’ networks. We generally recommend assembling a single-vendor 802.11b solution, with access points and the portable device PC cards coming from the same manufacturer.

With respect to Bluetooth, we have been monitoring the technology for some time, but, for the reasons listed above, do not expect it to appear in our products any time soon.

System Architecture: Thick Client or Thin Client?

The intelligence (data processing) in an application involving users who are remote from the main computer system is always distributed between the user’s computer (the client) and the remote system (the server). The term “thin client” is used to describe a client device that does little data processing – a traditional wired or RF data terminal is the thinnest possible client (usually called a dumb terminal). A thick client does a great deal of processing locally, and uses the remote server primarily as a data storage device.

In an industrial voice system there are two thick or thin decisions designers must make: where the speech processing takes place, and how much operating logic resides in the client.

Speech Processing

All available industrial voice systems perform speech synthesis on the client. For speech recognition, however, the designers have made different choices. In a system using a server-based speech recognizer, the speech signal from the wearable computer’s microphone is pre-processed on the wearable device, and then transmitted over the RF network to a server that performs the speech recognition work for many wearable devices simultaneously. In a system using a client-based speech recognizer, all of that work takes place in the terminal, with no server and no data transmission. In theory the greater computing power available on the server allows it to run more powerful speech recognition algorithms than could the wearable client. In practice, given the great advances over the past few years in the computing power that can be built into a wearable device, and given the constrained nature of the speech recognition problem for industrial systems (see the Speech Recognition discussion earlier in this document), there is no real advantage today in a server-based recognizer. There are some significant disadvantages, however.

First is scaling. How many wearable devices can a single speech recognition server support? Even if the server is ten times more powerful than a wearable device could be, that still suggests the need for multiple servers to support the one hundred or more wearable devices often in use in a warehouse. Can the system support using multiple servers? Is it practical to support a large “server farm” in the typical warehouse environment?
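The scaling question reduces to a ceiling division. The sketch assumes, as a simplification, that recognizing one device's speech costs a server about as much computing as it would cost the wearable, so a server ten times as powerful handles about ten devices at peak load. The function name is hypothetical, and the `spares` parameter counts standby servers kept for fail-over.

```python
import math


def servers_needed(devices: int, devices_per_server: int, spares: int = 0) -> int:
    """Recognition servers required at peak load, plus standby units for fail-over."""
    return math.ceil(devices / devices_per_server) + spares
```

With 100 devices and ten devices per server this gives 10 servers, or 11 with one standby; a 101st device requires an additional server.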

Second is processing delay, or latency. Users are very sensitive to delays in response from a speech recognizer. What will happen when multiple wearable terminals send data to the server at the same time? The RF network compounds the latency issue. RF data networks, unlike telephone networks, are not designed to minimize delay times; they are designed to guarantee data delivery. With multiple wearable terminals in a single zone, there are delays in transmitting data, and this problem is compounded because transmitting even pre-processed speech adds dramatically to the amount of data that must be moved over the network. This issue is particularly important for high-speed piece-pick operations. Vocollect once received negative reactions from customers to response delays of only about two thirds of a second (and as a result we reduced those delays to less than one quarter of a second). A server-based system simply cannot respond in this kind of time frame. The much-vaunted “subsecond” response of RF networks is about an order of magnitude too slow.

Third is network coverage. The coverage of industrial RF LANs today is generally excellent. In fact, one leading vendor guarantees continuous connectivity. However, temporary issues, such as seasonal movement of products within a warehouse or failure of part of the network backbone (access point, hub, wiring, power, etc.) may create coverage dead spots. A wearable computer using a server-based speech recognizer will be useless under these conditions.

A final concern for a server-based speech recognizer is that the server becomes a single point of failure for the entire system. The server(s) must therefore be treated as a mission-critical device, with full redundancy and complete hot fail-over capability. Such systems are not inexpensive.

Vocollect’s Design Decisions

All Vocollect products perform all speech processing on the wearable device. They are therefore thick client designs. We devote considerable research and development effort to optimizing our extremely sophisticated speech recognition algorithms to run very effectively on our wearable devices. And we do this because we firmly believe that the many benefits of the thick client speech processing architecture make that effort well worthwhile.

Operating Logic

Consider an order selection system for a warehouse. At the thin client extreme, the wearable device must communicate with the server (the warehouse or picking management system) at each step in every pick operation, such as direction to slot, verification of location, determination of pick quantity, and verification. At the thick client extreme the warehouse management system (WMS) could transmit a complete pick list to the wearable device, and (perhaps an hour later) the wearable device would report back that all items had been picked. The design trade-off is between real-time information and control (thin client benefits), and guaranteeing rapid response to the user regardless of RF network performance and server load (thick client benefit).

The considerations for rapid response to the user are similar to those discussed above for server-based and client-based speech recognition. A thin client system requires perfect RF network coverage, a network that is not too heavily loaded, and an always-responsive server. These conditions are not easy to guarantee. A thick client system may require more one-time software work to link the client devices to the WMS. Once the interface is implemented, however, the system can function perfectly from the users’ point of view, even if the network’s coverage and the server’s response times are far from perfect.

With respect to real-time information and control, the primary operating requirement for an order selection system is generally to know immediately that product has been removed from a location (or that a location has become empty). A thin client system accomplishes this automatically. A thick client system can readily do so with a minor modification. When an operator wants to start work, the WMS transmits a complete pick list to the operator’s wearable device. Each time the operator completes an operation, the wearable device transmits the pick data back to the WMS. In a well-designed system, this data transmission occurs in the background while the operator continues to work. If there happens to be a dead spot in the RF network coverage, the terminal simply batches up the pick data records until it can transmit them.
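The batching behavior described here is a store-and-forward queue. The Python sketch below is a minimal illustration, not Vocollect’s implementation. `PickRecord` and `PickReporter` are hypothetical names. The `send` callable stands in for whatever RF/WMS transport a real device uses and is assumed to return `True` only when the WMS acknowledges a record. For clarity, `flush()` runs synchronously here, whereas a real device would call it from a background task so the operator never waits.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque


@dataclass(frozen=True)
class PickRecord:
    """One completed pick (fields are illustrative, not a real WMS schema)."""
    slot: str
    quantity: int


class PickReporter:
    """Reports completed picks to the WMS, batching them while the link is down."""

    def __init__(self, send: Callable[[PickRecord], bool]) -> None:
        # `send` is a hypothetical transport hook: True means the WMS
        # acknowledged the record, False means it could not be delivered.
        self._send = send
        self._pending: Deque[PickRecord] = deque()

    @property
    def backlog(self) -> int:
        """Number of records waiting for RF coverage."""
        return len(self._pending)

    def record_pick(self, record: PickRecord) -> None:
        """Queue a completed pick and try to deliver everything queued so far."""
        self._pending.append(record)
        self.flush()

    def flush(self) -> int:
        """Send queued records oldest first; stop at the first failure."""
        sent = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break  # still in a dead spot: keep the record for a later retry
            self._pending.popleft()
            sent += 1
        return sent
```

Calling `flush()` again once coverage returns delivers the backlog in the order the picks were made, so the WMS still learns about every emptied location, only later.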

A secondary reason for wanting real-time information and control in an order selection system is to allow the WMS to modify a pick trip while it is in progress. In theory this would allow the WMS to have the operator bypass a slot if that slot were known to be empty. In practice, Vocollect does not know of a WMS that modifies pick trips on the fly.

It is possible to compromise between the thin client and thick client modes of operation. The WMS might send two or three pick records to the wearable device at the beginning of a pick trip, and then send one more each time the operator reports a pick completion. This design guarantees good response time for the operator, because information for the next pick operation is always on hand in the wearable device. It does not entirely overcome the RF dead spot issue, however: if the operator stays in a dead spot long enough to complete all the buffered picks, the wearable device has no further work to offer until coverage returns.
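The compromise amounts to a small sliding window of picks held on the wearable. The Python sketch below is a hypothetical model. The WMS is represented by a plain iterator of slot identifiers, each refill stands in for an RF request, and the `connected` flag simulates coverage. The class name `PickWindow` is illustrative. Topping the window back up to full size once coverage returns is an assumption beyond what the text describes.

```python
from collections import deque
from typing import Deque, Iterator, Optional


class PickWindow:
    """Keeps up to `size` upcoming picks on hand on the wearable device."""

    def __init__(self, source: Iterator[str], size: int = 3) -> None:
        if size < 1:
            raise ValueError("window size must be at least 1")
        self._source = source  # hypothetical stand-in for the WMS
        self._size = size
        self._window: Deque[str] = deque()
        self.top_up(connected=True)  # initial download at the start of the trip

    def top_up(self, connected: bool) -> None:
        """Request picks from the WMS until the window is full again."""
        if not connected:
            return  # dead spot: work only from what is already on hand
        while len(self._window) < self._size:
            nxt = next(self._source, None)
            if nxt is None:
                break  # the pick trip has no more picks
            self._window.append(nxt)

    def next_pick(self) -> Optional[str]:
        """The pick to announce to the operator, or None if nothing is on hand."""
        return self._window[0] if self._window else None

    def complete_pick(self, connected: bool) -> None:
        """Drop the finished pick and, if possible, fetch a replacement."""
        if not self._window:
            raise RuntimeError("no pick on hand to complete")
        self._window.popleft()
        self.top_up(connected)
```

With a window of three, the device works through the picks already on hand while in a dead spot and then has nothing left to announce. This is the residual weakness of the compromise.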

Vocollect’s Design Decisions

Vocollect has opted for a thick client design in all our products. However, the wearable device software can readily be configured to function as a thin client, or even as a dumb terminal. For order selection systems, we recommend to our customers that, whenever possible, they operate our equipment in the modified thick client mode described above: the WMS sends a complete pick trip to the wearable terminal, and the wearable reports back (invisibly to the operator) as pick operations are completed. We believe this mode of operation offers the best trade-off in overall system design, guaranteeing very rapid response to the wearable device operator while providing real-time information to the people and software managing the warehouse.

Summary

There is a variety of complex technology and architecture decisions that any creator of voice-powered systems for industrial applications must consider. Vocollect’s rationale in making these decisions has been to promote those options that our experience tells us offer maximum benefit in real-world environments, while also offering maximum flexibility to meet specific customer needs. We believe our choices have been vindicated by our market leadership position, and by the 100% installation success rate we have achieved in a broad range of applications and operating environments.

Copyright© 2002 Vocollect, Inc. All rights reserved.

Voice System Technologies and Architecture, version 1.2

Published by Vocollect
703 Rodi Road
Pittsburgh, PA 15235
t) 412.829.8145
f) 412.829.0972

Printed in the United States of America January 2002

Talkman® is a registered trademark of Vocollect, Inc.

Microsoft and Windows are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The Bluetooth name and the Bluetooth trademarks are owned by Bluetooth SIG, Inc. All other trademarks are the property of their respective owners.

The information in this paper has been carefully checked and is believed to be accurate. However, Vocollect assumes no responsibility for any inaccuracies that may be contained in this guide. In no event will Vocollect be liable for direct, indirect, special, exemplary, incidental, or consequential damages resulting from any defect or omission in this paper, even if advised of the possibility of such damages.

In the interest of product development, Vocollect reserves the right to make improvements in this guide and the products it describes at any time, without notice or obligation.



