Speech recording and transcription – Towards increased and smarter deployment

Contributed by Private Transcription Solutions (PrivaTrans) (http://www.privatrans.com/), AlluriTech, Inc. (http://www.alluritech.com/). Contributed articles do not necessarily reflect the viewpoint of ÆGIS e-journal.

Wouldn’t it be nice if all of our important thoughts and spoken words were automatically recorded in textual form and stored as both audio and text files on a memory device that we wear on our bodies? And what if this information were well organized, summarized, wirelessly communicated, mapped into tasks, instantly retrievable and acted upon in a timely manner, to great benefit? Sound far off and futuristic? Think again: a revolution in speech recording and transcription (SRT) and speech content management (SCM) is in the making.

Wherever the stakes are high you will find that SRT is taking place. Doctors, lawyers, qualitative researchers, investigators, and journalists all deploy SRT to great advantage. In fact, they would hardly be in business without it. But what about all the other professionals out there in the business world who are not making use of SRT? Don’t they stand to gain some competitive advantage or indeed define entire new competitive market spaces through the use of SRT?

While those of you already working daily with SRT may find that this article constitutes “preaching to the choir”, the hope is that presenting its main messages in as many channels as possible will give the vast majority of the population not currently making use of SRT a strong new calling. Increased use of SRT means both challenge and opportunity for business security professionals. While the government sector is much more up to speed on SRT, the commercial sector, for which this article is primarily intended, is poised for an explosion in usage – and this leaves nobody unaffected.

So, what in simple terms are we proposing? Basically, on an individual basis, we are proposing increased recording of speech throughout each day, transcription of that speech, and organization and usage of the original recordings and their textual permutations in a way that assists users in more effectively reaching their goals and contributing to the success of their organizations.

Transcription is accomplished through the deployment of internal and/or external, outsourced transcription services providers (TSPs), in some cases aided by computer speech recognition (speech-to-text).

Some of the reasons why only a few of the many who could benefit from SRT are doing so are as follows:

• Speech recording technology and its capabilities are not widely known, as the technology is relatively new and not yet specifically marketed on a broad scale.

• The quality of available speech recording technology varies, with the more visible, widely available, low-end products providing only incomplete and discouraging results.

• Speech-to-text technology requires more computing power and higher quality audio hardware and software, normally limited to larger, less portable systems, and even with best-in-class hardware and software the speech-to-text results can leave much to be desired.

• Wireless communications with sufficient ubiquity and transfer speed required for compressed digital audio file transfer from anywhere have been lacking until recently.

• The human-intensive work of speech transcription requires human resources (skills and man-hours) that potential beneficiaries perceive they cannot afford, whether deployed internally or outsourced.

• SRT brings with it its share of privacy, confidentiality, legal, information integrity and security issues.

• Potential beneficiaries simply have not become sufficiently aware of the capabilities and benefits of SRT, and have thus not built it into their strategies or integrated it into their operations.

• Potential beneficiaries have lacked awareness of and know-how in the domains of practice known as speech content management (SCM) and speech knowledge management (SKM).

So where do we stand today? The premise of this article is that we stand on the brink of an SRT revolution – driven by advances in technology and by compelling examples of high return on investment, coupled with a recognized need to better manage information and knowledge in order to survive, compete and win.


The competitive and security issues concerning businesses today are largely in the domain of intangible, intellectual property assets. There is an ongoing desire to better “capture” the data and knowledge fleeting through far-from-perfect human memory. Accurate, swift and complete codification of this information, storage of it in more reliable digital systems memory and subsequent organization, sharing and presentation with the assistance of advanced technologies is for many a convincingly worthwhile goal.

Companies are increasing their emphasis on information-intensive operations conducted far beyond the corporate office walls with missions to proactively seek out, discover, capture, analyze and act upon a host of essential and critical facts and insights from the business environment, a process described by some as business intelligence (BI) or competitive intelligence (CI).

A goal of knowledge management (KM) as stated in the early days of its discussion has been to fully explore and exploit the interfacing and transformations between tacit knowledge and explicit knowledge, soft and hard, cerebral and textual, that which is known and that which can be described, shared and acted upon to great advantage. SRT comprises some of the most important operational activities for realization of the objectives of strategic BI/CI and KM domains of practice at the interface where these programs meet.

Other industry terms describing domains of practice relevant to this discussion are Digital Asset Management (DAM) and Digital Content Management (DCM). SRT is a key component of BI/CI, KM, DAM and DCM efforts. From permutations of the terms audio, speech, content, asset, knowledge and management we can define further terms, focused more on the program, process and management levels, which we expect will be used increasingly in the literature. These are, namely, Audio Content Management (ACM), Speech Content Management (SCM), Audio Asset Management (AAM), Speech Asset Management (SAM), Audio Knowledge Management (AKM) and Speech Knowledge Management (SKM). For the purposes of this article, we will place preference upon the terms Speech Content Management (SCM) and Speech Knowledge Management (SKM).


We can think of SRT users as primary users and secondary users, with primary users those recording the speech to be transcribed (those with microphones) and secondary users those who otherwise use and benefit from the results of the recording and transcription. Primary SRT users can be broadly categorized as strategic thinkers and operational practitioners. Strategic thinkers are typically managers, entrepreneurs, strategy advisors and business leaders who require that their visionary thoughts, ideas, inspirations and insights be recorded and transformed to textual format and then processed to maximum strategic effect. Operational practitioners might consist of human source intelligence collection specialists, investigators, reporters, researchers, etc. who need a factual and insightful record of the speech transpiring in first-hand interaction with a variety of human sources. They might also consist of technicians, industrial workers, sales representatives, real estate brokers, creative professionals and various other knowledge workers who need to record observations, data and ideas while conducting day-to-day operations. Secondary users can be literally anybody inside or outside the organization with whom the organization deems the content should be shared, and associated with these secondary users is the non-trivial task of building and operating an effective SCM/SKM system complete with encryption, access rights management and audit trail functionality.


Speech content can be described in the following three broad type categories:

• Factual: facts, data, observations (answers to the questions who? what? where? when? why? how?).

• Insightful: insights, ideas, inspirations, visionary thoughts.

• Tasking: tasks (categorized descriptions of action items, things to get done, with delegations and due dates often as a result of facts and insights).

You are at a conference and your ears are met with a barrage of factual information such as contact names, numbers, details of new initiatives and relationships, etc. and without SRT much would go in one ear and out the other, passing through a pitifully overworked, information-overloaded, incapable space, never to be remembered nor acted upon. You are walking along the beach and have a sudden flash of inspiration in the form of a new idea for a new business and SRT assists you in getting it on track for realization. In most cases you conceive tasks for yourself or others to complete in order to act upon the facts and insights you gain while in the field. Most of the better inspirations come not while sitting in front of a desktop computer but rather while out and about doing all the other things people do during both night and day, whether that be driving to work in a car, having lunch, sunbathing, walking on the beach, on a date, swimming, working out in the gym, fishing, meditating, sleeping…. Inspiration comes in a flash – anywhere or anytime – and SRT helps you exploit it to the maximum.

A competitive edge can clearly be gained by recording (original or dictated summaries thereof) and transcribing ideas, tasks, event/scheduling data, business calls, business meetings, negotiations, agreements, interviews with media and analysts, investigative/research interviews, brainstorming sessions, discussion groups, user group meetings, conferences, exhibitions, demonstrations, web casts, lectures, news broadcasts, newspaper, magazine, and newsletter articles, and observed data such as descriptions of persons, license plate numbers, movement of persons and vehicles, distances and directions of travel, locations, departure and arrival times, et cetera.


Whereas historically we have been limited to the options of analog tape recorders and wireless microphones, today we have mobile/wearable computer systems equipped with high quality recording hardware, microphones, software, and wireless connectivity. These mobile systems find different applications depending upon their size, portability, wearability, and other functions. Notebook computers have computing power but lack the portability and wearability demanded in many practical speech capture scenarios. Personal Digital Assistants (PDAs) are coming of age with improved recording capabilities, larger memories and wireless communications capabilities. Wearable computers, designed specifically for hands-free use, have sufficient computing power, storage capacity, input device options and networking/communications options, and represent compelling options for the most demanding in-field SRT tasks.

While in the field, we can either record speech directly to our mobile systems or to a remote server via a wired (LAN, PSTN) or wireless (802.11, RF, GSM, CDMA) communications network. The recorded speech can then be routed to human transcribers or a speech recognition application, or both, for processing.

A voice mailbox system enabling recording of messages over a Public Switched Telephone Network (PSTN) line or mobile wireless telephony connection and automatic routing of messages to transcribers via the internet requires an internet-connected computer, a voice-capable modem on a PSTN line and affordable software. In this way users can dial into their own private message box at any time from a mobile telephone and leave messages, including processing instructions, to be routed as jobs to internal or external TSPs. Recording in this manner to a remote system is feasible where, as a matter of security and/or convenience, no copy of the recording should be kept on one’s person, and where advanced audio quality levels are not necessary. In other cases, the preferred solution is to record at the very location where the speech is happening and use the communications channels only to transmit the compressed files as needed. In some situations it may be desirable to capture audio in stereo or even in 16-bit, CD-quality form, and neither the PSTN nor mobile wireless telephony is up to the task.
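To put a rough number on why a telephone channel falls short of CD quality, the sketch below compares raw, uncompressed bitrates. The figures are assumptions that do not come from this article: CD audio is taken as 44.1 kHz, 16-bit stereo, and the digital PSTN voice channel as 8 kHz, 8-bit mono (the G.711 format). The function name is purely illustrative.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Raw (uncompressed) PCM bitrate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# Assumption: CD-quality stereo is 44.1 kHz sampling, 16 bits, 2 channels.
cd_stereo_bps = pcm_bitrate_bps(44_100, 16, 2)

# Assumption: a digital PSTN voice channel is 8 kHz sampling, 8 bits, mono.
pstn_bps = pcm_bitrate_bps(8_000, 8, 1)

print(f"CD-quality stereo:  {cd_stereo_bps / 1000:.1f} kbit/s")
print(f"PSTN voice channel: {pstn_bps / 1000:.1f} kbit/s")
print(f"Ratio: roughly {cd_stereo_bps / pstn_bps:.0f} to 1")
```

Under these assumptions the telephone channel carries only a small fraction of the data that CD-quality stereo requires, which is consistent with recording locally and transmitting compressed files afterwards.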

Wireless microphone-transmitter units communicating to receiver-recorder units over various frequencies still enjoy active usage in the clandestine world, and may be used legally in some instances by business users in conjunction with setups already discussed. Thanks to advances in alternative, commercially standardized technology, this is not the only game in town anymore, which is certainly a relief to many experienced practitioners!

Just a few years ago, before the wide availability of quality digital speech recorders, a standard stereo cassette recorder was one of the best and least expensive solutions. For undercover reporting operations, some standard stereo microphones could easily be disassembled to separate the two microphones and mount one on each shoulder by sewing them into the shoulder pads of a business suit. The noise and echo defeating characteristics of these distance-separated stereo microphones at near mouth and ear level produce results superior to many alternatives. Using standard 90 minute (45 minutes per side) or 120 minute (60 minutes per side) tapes with long-play (2X) and auto-reverse capabilities, up to some 4 hours of recording time could be accomplished. Powered by two AA-sized alkaline batteries, the unit fits comfortably enough for some in a suit jacket pocket. Some of the biggest problems with this legacy system are:

• Individual recordings cannot be immediately produced as individual digital files, labeled, annotated, and categorized.

• Dictated notes cannot be inserted between recordings.

• Digitization requires interfacing to a separate computer and is time consuming.

• The digitized audio files then need to be broken into smaller files corresponding to particular interviews and dictated notes, labeled, categorized, and organized, which is quite human resource intensive.

• Recordings cannot be immediately transmitted from in the field via wireless communications networks to TSPs for processing.

• The device is several times larger than digital devices.

• The device generates more noise than digital devices, and an end-of-tape “click” can raise eyebrows!

• Record time for each tape is more limited than with digital devices.

• Batteries can and do run out of juice, albeit not before recording a lot of material at increasingly slow speeds.

Today, there are digital speech recording devices on the market with vastly improved features. For example, one device that sells for about $260 fits very comfortably (4 x 1 x 0.5 inches) in your pocket and records 64 hours of quality speech on a 128 MB memory chip. Standard 2.5 mm female connectors enable connectivity of small pen microphones, boom microphones, and other professional microphones that can pick up audio from hundreds of feet away. Super-sensitive voice activation functions mean recording starts within 1/10th of a second after speech is detected. Alarm circuitry enables presetting record start and stop times. PSTN and mobile telephone accessories enable the recording of phone conversations. The device uses standard, re-usable MMC memory cards and interfaces via USB to mobile or desktop computers, where special software facilitates the management of the speech content.
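The figure of 64 hours on a 128 MB chip implies a heavily compressed speech format. A minimal sketch of the arithmetic follows, assuming 1 MB means 2^20 bytes and that the whole card holds audio; neither assumption is stated in the product description, and the function name is illustrative.

```python
def implied_bitrate_bps(capacity_mb: float, hours: float) -> float:
    """Average bitrate needed to fit `hours` of audio into `capacity_mb`.

    Assumes 1 MB = 1024 * 1024 bytes and no space lost to file overhead.
    """
    capacity_bits = capacity_mb * 1024 * 1024 * 8
    seconds = hours * 3600
    return capacity_bits / seconds


rate = implied_bitrate_bps(128, 64)
mb_per_hour = 128 / 64

print(f"Implied average bitrate: {rate:.0f} bit/s")
print(f"Storage per hour of speech: {mb_per_hour:.1f} MB")
```

The result is a little under 5 kbit/s, or about 2 MB per hour of speech, which gives a feel for the file sizes that need to be moved over a wireless link.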

Another producer offers a full range of digital speech recorders and hybrid products which feature various combinations of speech recorder, stereo MP3 playback, FM tuner, and digital camera in small form factors, powered by two AAA batteries. One such device of dimensions 4.80 x 1.22 x 0.86 inches, weight 2.18 ounces, can capture 36 hours of speech and 1000 640×480 pixel images (faces, business cards, documents…).

There are other devices on the market which also can fit in a pocket, although not quite as comfortably, and feature standard 20 GB or 30 GB notebook hard disk drives with minimal accompanying electronics, enabling high quality stereo recording for many more hours than you might practically need. Some devices in this class are specifically designed to record not only audio but broadcast quality color video.

And then there are PDAs, which are rapidly becoming for many the speech recording and hybrid function device of choice.

In somewhat larger form factors are wearable computers, the likes of those from Xybernaut (http://www.xybernaut.com/), Charmed Technology (http://www.charmed.com/) and others, which feature standard 10-30+ GB notebook hard disk drives, powerful processors, plenty of RAM, standard interfaces such as USB, serial and PCMCIA, and sound hardware, and which run standard operating systems such as Windows and Linux. These systems can be worn on a special belt, in a backpack or strapped in a special harness elsewhere on the body. A high quality headset such as the Plantronics (http://www.plantronics.com/) DSP-500 Headset, featuring a highly sensitive, noise-canceling microphone and stereo headphone speakers, can be used along with specialized recording/dictation software to produce high-quality speech recordings while at a desktop or in the field. There are other, more discreet earphones and microphones, including some that can record inaudible speech from movement of the facial and head structures.

Due to the adequate power of these systems, computer speech-to-text can also be accomplished in the field. Wearable computer systems distinguish themselves from PDAs, notebooks, tablets, and other systems in that they are worn on the body, leaving the user’s hands free to accomplish normal everyday tasks, and, with head-mounted displays (HMDs) and other interfaces, the computer is on at all times and in constant communication with the user. HMDs, including some very discreet displays consisting of a square centimeter built into otherwise normal eyeglasses, can be used for SRT status and other communications. Equipped with top-class wireless communications, these systems provide a complete solution for the most advanced SRT users. These systems will continue to become both smaller and more powerful with time, making them the systems of preference for the most demanding SRT applications.

Communications: GSM and CDMA wireless

Using wearable, tablet, or notebook computers equipped with PCMCIA cards such as the GSM/GPRS or CDMA 1x versions of the Sierra Wireless Aircard, one can today accomplish transfer speeds of up to 40 Kbps or 156 Kbps, respectively, over standard mobile wireless networks offered by the likes of (in the US) AT&T Wireless, Cingular, and T-Mobile (GSM), or Verizon and Sprint (CDMA). Some PDAs are also now being equipped with small form factor GPRS modems. Note: General Packet Radio Service (GPRS) is a 2.5G technology offered by operators of GSM networks worldwide (GSM originally stood for Groupe Spécial Mobile and now stands for Global System for Mobile Communications), and 1x is a 2.5G technology offered by CDMA operators (largely in the US). The forthcoming 3G technologies WCDMA (Europe and elsewhere) and CDMA-2000 (USA) will provide improvements in transfer speed, ranging from 144 Kbps to 2 Mbps (depending on the user’s speed of movement, from over 120 km/hour down to under 10 km/hour).

At present, the CDMA network wins out over the GSM network in the USA for transmission of compressed audio files over wireless internet with CDMA offering significantly faster transfer speeds (up to 156 Kbps) and unlimited data plans (at the time of this writing, unlimited data transfer plans are offered by Verizon at $99 per month and Sprint at $80 per month), though T-Mobile offers stand-alone unlimited data transfer plans at $29.99.
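To make the difference between the two networks concrete, here is a rough sketch of upload times for one hour of compressed speech. It assumes about 2 MB per hour of speech (derived from the recorder described earlier that stores 64 hours on 128 MB) and that the quoted peak speeds are actually achieved; real-world throughput is usually lower, which the hypothetical `efficiency` parameter lets you model.

```python
def transfer_seconds(file_bytes: int, link_kbps: float, efficiency: float = 1.0) -> float:
    """Seconds needed to send `file_bytes` over a link of `link_kbps`.

    `efficiency` (0 < efficiency <= 1) is an illustrative knob for protocol
    overhead and network load; 1.0 means the full peak rate is available.
    """
    if link_kbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_kbps must be positive and efficiency in (0, 1]")
    return file_bytes * 8 / (link_kbps * 1000 * efficiency)


# Assumption: one hour of compressed speech is about 2 MB.
ONE_HOUR_SPEECH_BYTES = 2 * 1024 * 1024

for name, kbps in [("GPRS (GSM)", 40), ("CDMA 1x", 156)]:
    seconds = transfer_seconds(ONE_HOUR_SPEECH_BYTES, kbps)
    print(f"{name:11s} {kbps:4d} Kbps: {seconds / 60:.1f} minutes")
```

At peak rates this works out to about 7 minutes over GPRS and under 2 minutes over CDMA 1x for each hour of recorded speech.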

Communications: 802.11 and Bluetooth wireless

Systems can also be equipped with 802.11 “WiFi” PCMCIA cards or integrated devices for wireless transfer at up to 54 Mbps (in the case of 802.11g). So whether at the office, at home, at an industrial work site or at one of the mushrooming wireless hotspots, 802.11 is now becoming the preferred way to rapidly upload and download compressed speech files. Bluetooth enables device-to-device communication over shorter distances.


One of the most promising technological developments is Voice-over-IP (VoIP) for wireless networks (note: IP = internet protocol). One player of note, TeleSym (http://www.telesym.com/), of Bellevue, Washington, USA, is developing next-generation technology for wireless IP telephony. The company claims that its SymPhone software delivers high-quality, cost-effective voice communications over wireless enterprise networks and has a publicly stated goal to become “the worldwide leader in advanced software solutions for wireless voice-over-IP.” TeleSym, with investment of the Intel Communications Fund over $150 million, claims an extensive portfolio of intellectual property (patents pending) and software assets (audio compression algorithms, latency management software, VoIP software ‘engines’, protocol stacks, et cetera) that underlie its VoIP solutions, and which are aimed at licensing for integration in software and hardware products.

And you can hardly speak of VoIP without mentioning Cisco (http://www.cisco.com/), the world leader in internetworking products, which announced in April of 2003 that it would start shipping its Cisco Wireless IP Phone 7920 in June of 2003. Also, in April of 2003, rival SpectraLink (http://www.spectralink.com/) announced its NetLink Wireless Telephone portfolio with prices starting at $400.

Basically, wireless VoIP will turn wearable computers, PDAs, notebooks, and desktops into wireless super-telephones. Technological advances are expected to overcome the quality of service problems known to date with IP: jitter, delays, echo, and the like. What all this wireless VoIP talk really boils down to is that we will have cost-effective, if not totally free, feature-rich alternatives to the wireline telephone companies of the past (and that, with no stretch of the imagination, is reason for most of us to celebrate grandly), and the ability to manage speech content with vastly expanded degrees of freedom.

Wireless video over IP, wireless IP cameras, GPS, and other technologies

At the risk of getting off topic, it is perhaps worth mentioning that wireless video and voice over IP open up entire new fields of applications. One company, TABLETMedia (http://www.tabletmedia.com/), is now shipping its iFON product, which turns a PDA into a wireless video and audio device. Air Broadband Communications (http://www.airbb.com/) announced on May 1, 2003 that it will integrate its Soft-Roaming Wireless Access technology with the iFON, which will enable seamless roaming for voice and data over 802.11 wireless networks. Basically, users will be able to maintain secure voice connections as they roam across various subnets and access points, opening the doors to applications such as video conferencing, video surveillance monitoring and control, distance learning and remote expert guidance. Add to the picture 802.11 wireless web cameras (D-Link has one now, with Axis, Canon and other heavy-duty web camera producers planning to introduce theirs soon), increasingly miniaturized and wearable digital video cameras, and global positioning system (GPS) technology, and there is indeed reason for great excitement, and need for well thought out integration and management.

Server side hardware and software

Our overview of enabling technology would not be complete without mention of server-side hardware and software, including automatic speech recognition, speech-to-text, text-to-speech, VoiceXML, automatic call distribution, storage systems, search and retrieval solutions, 128-bit encryption, etc., but this is all largely beyond the scope of this introductory article. We will mention, however, that there has been vast improvement in automated speech-to-text, that the highest accuracy demands a higher price for speech-to-text server implementations, and that specialized acoustic models need to be implemented to handle multiple input sources such as digital voice recorder, PDA, wearable/notebook/desktop, PSTN phone, and mobile phone, as well as the different acoustics of various locations (at a party, in a conference room, in a car, outdoors, etc.).


And how might the recorded speech be transcribed and otherwise processed? Although tape recordings may still be transcribed by some people using a variable speed tape playback device, the preference of modern transcriptionists is a digital audio file on a computer equipped with transcription software, speakers/headphones and input devices such as foot pedals, mice, and, of course, a keyboard. Basically, there will be some specially developed recording/dictation software which can accomplish the basic functions record, stop, rewind, insert, overwrite, pause, fast forward and end recording. The software will support a wide range of encoding and compression formats and encryption for desired recording quality, smaller file size and secure transmission. It will also accomplish transfer of the recorded files via a wired LAN, wireless network, internet, e-mail, or internet FTP upload. The functions of the dictation process can be activated by mouse actions, function keys, foot pedal movements, mechanical buttons, voice commands, or even rapid eyebrow movement.

Ah, but if only it were all so simple! Process needs to be well thought-out in the recording/dictation and job uploading stage if the overall result is to be efficient and advantageous. Written or recorded data can be attached to digital speech files to facilitate proper routing and action. The recorded speech files routed to transcriptionists might include specifications of important parameters including required turnaround, accuracy, routing, privacy measures, and tasking and event/scheduling information. In addition, they may be categorized as ideas, tasks, event/scheduling, interviews/meetings, interview/meeting summaries, calls, call summaries, observations/data, et cetera, and these with further coding for various businesses and projects with which the primary user may be involved.
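One way to picture such a job specification is as a small structured record that travels with each audio file. The sketch below is purely illustrative: the class, field and category names are hypothetical and do not describe any particular dictation product. The parameters and categories are drawn from those listed in the paragraph above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    """Content categories a primary user might assign to a recording."""
    IDEA = "idea"
    TASK = "task"
    EVENT = "event/scheduling"
    INTERVIEW_OR_MEETING = "interview/meeting"
    INTERVIEW_OR_MEETING_SUMMARY = "interview/meeting summary"
    CALL = "call"
    CALL_SUMMARY = "call summary"
    OBSERVATION = "observation/data"


@dataclass
class TranscriptionJob:
    """Hypothetical job specification attached to one recorded speech file."""
    audio_file: str
    category: Category
    turnaround_hours: int          # required turnaround
    min_accuracy_pct: float        # required transcript accuracy
    recipients: list = field(default_factory=list)  # routing of results
    encrypt_in_transit: bool = True                 # privacy measure
    project_code: str = ""         # further coding by business or project

    def validate(self) -> None:
        """Reject specifications a transcriptionist could not act on."""
        if self.turnaround_hours <= 0:
            raise ValueError("turnaround_hours must be positive")
        if not 0 < self.min_accuracy_pct <= 100:
            raise ValueError("min_accuracy_pct must be in (0, 100]")
        if not self.recipients:
            raise ValueError("at least one recipient is required")


job = TranscriptionJob(
    audio_file="beach_idea_001.wav",      # illustrative file name
    category=Category.IDEA,
    turnaround_hours=12,
    min_accuracy_pct=98.0,
    recipients=["assistant@example.com"],
    project_code="CO-BRANDING",
)
job.validate()
print(job.category.value, job.turnaround_hours, job.recipients)
```

Validating the specification at upload time, before the job is routed, is one way to catch missing routing or turnaround details while the primary user can still supply them.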

Some special file formats allow text annotations to attach to and accompany the audio file throughout transmission to facilitate processing. In cases where such information cannot be attached, such as recording over the PSTN to a recording/dictation server, this information may be provided through interaction with a server-based menu system, or by including it all in audio form at the start of the audio file, to be extracted by transcriptionists as a first step in processing prior to transcription of the actual messages.


As a rule, in the absence of advanced technologies and many hours of experience in the transcription business, you can expect to spend some 10 hours transcribing for every one hour of recorded speech. A well trained transcriptionist may insert descriptions of soft insights and context from the audio recording that would otherwise be lost if only the speech were transcribed; for example, “sure” said in an agreeable tone as opposed to a disbelieving tone. Transcriptions will also often feature index numbers which map back into the audio recording so that particular portions of the work can be rapidly located and listened to in the original audio form. Inaudible and otherwise unintelligible words must also be flagged for review by other transcription specialists and perhaps the client, who can apply contextual knowledge to better determine elusive words.
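The 10-to-1 rule of thumb makes it easy to estimate the workload a batch of recordings represents. The sketch below is a back-of-the-envelope calculation only; the team size and the eight-hour working day are illustrative assumptions, and the ratio will be lower for experienced, well-equipped transcriptionists.

```python
def transcription_hours(audio_hours: float, ratio: float = 10.0) -> float:
    """Estimated transcription effort for `audio_hours` of recorded speech.

    `ratio` is hours of work per hour of audio (10 is the rule of thumb
    for inexperienced transcribers without advanced tools).
    """
    if audio_hours < 0 or ratio <= 0:
        raise ValueError("audio_hours must be >= 0 and ratio must be > 0")
    return audio_hours * ratio


def working_days(effort_hours: float, transcribers: int = 1,
                 hours_per_day: float = 8.0) -> float:
    """Calendar working days needed if the effort is split evenly."""
    if transcribers <= 0 or hours_per_day <= 0:
        raise ValueError("transcribers and hours_per_day must be positive")
    return effort_hours / (transcribers * hours_per_day)


effort = transcription_hours(20)            # e.g. 20 hours of recordings
days = working_days(effort, transcribers=5)

print(f"Effort: {effort:.0f} hours; with 5 transcribers: {days:.1f} working days")
```

For 20 hours of audio this gives 200 hours of effort, or five working days for a team of five, which illustrates why turnaround requirements and staffing need to be considered together.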

The products of the transcription process can be classified into the following four broad type categories:

• Human transcription (highly effective, quality depends on quality of recorded speech and skill of transcriber).

• Computer transcription/speech-to-text (fast but error-prone).

• Human transcription aided by computer transcription/speech-to-text (can speed transcription time).

• Human transcription summary (highly effective, quality depends on skill of transcriber/summarizer, sometimes delivered alone, other times with a complete word-for-word transcript depending on client requirements).


• A corporate marketing specialist is at a conference and records presentations and question/answer sessions, transmitting concise, hand-typed, time-sensitive alerts to corporate users via email over the mobile wireless network. Summaries of conversations conducted during breaks and other networking opportunities are recorded and the compressed, digital audio files are transmitted via 156 Kbps mobile wireless CDMA Internet connection to internal corporate transcription specialists for immediate processing. Computer generated transcriptions of the conversation summaries are immediately delivered with the human-transcribed summaries following within six hours to multiple need-to-know recipients as specified previously in the job specification. Upon returning the next day to a local office with a broadband internet connection, the audio files containing the recorded speech of the conference presentations are uploaded to internal corporate transcription specialists for turnaround within 48 hours. The summaries and transcripts of the conference papers are stored along with PDF files of the original published papers within the corporate digital content management system along with question/answer session transcripts and conversation summaries, all with appropriate access rights in place.

• An investigative reporter for a niche electronic media company spends over 6 days and nights at a major trade show interviewing 46 sources on the floor during exhibition hours, attending 6 press conferences and participating in 5 parties and informal networking events, generating 20 hours of digital audio recordings in 98 separate files, 3.5 hours of video recordings and 167 digital still images, plus a collection of hand-written notes. Compressed audio files and digital images (including images of handwritten notes) are uploaded at a local 802.11g hotspot via broadband internet connection to headquarters where they are routed to internal transcription, editing and production specialists for immediate processing. Stories are written, reviewed, corrected, authorized for publication and published electronically within hours of discovery as opposed to days and weeks typical of print media operations.

• An executive is walking on the beach at sunset on a Sunday evening when sudden inspiration comes in the form of new ideas to boost sales performance through the building of a synergistic relationship with a key industry player and the launching of a co-branding campaign. He records the ideas to a digital speech recorder and upon return to his car uploads the files via the CDMA 1x network to a dedicated outsourced TSP, which works through the night to have a complete transcript and summary of the notes in his assistant’s email inbox by 7:00 on Monday morning. The assistant updates the executive’s and subordinates’ delegated tasks and meeting appointments in Microsoft Outlook by 7:30 and all are present for a strategic brainstorming meeting scheduled for 8:00.

• A corporate salesman visiting the sites of several corporate prospects per day over a period of three days to give presentations and influence sales while also interacting via cell phone with several other prospects and customers uses the time between meetings while traveling by rental car from one location to the next to summarize important facts, record insights and formulate tasks. The recorded speech files are transmitted via a mobile wireless network to headquarters where SCM specialists transcribe the messages, route results to relevant parties and update tasking and calendaring/scheduling sections in Microsoft Outlook. Some of the dictated notes are sent as alerts and reports to the Competitive Intelligence unit. The salesman accesses his Microsoft Outlook via a wireless virtual private network (VPN) connection and keeps up to date on all tasks and meetings.


To outsource or not to outsource? That is the question. Outsourcing work to transcription services providers (TSPs) can convincingly offer a lower-cost solution, holding quality of service and speed of turnaround constant. The key issue is privacy. Encryption should be deployed to ensure the privacy of recorded speech files in transfer. In the healthcare industry, uploaded files are encrypted as required by the Health Insurance Portability and Accountability Act of 1996 (HIPAA) because, clearly, patient data must be kept private. Encryption of 128 bits or more, detailed audit trails and other security features are standard with professional solutions, but this is far from the entire picture. The largest threat to the privacy of organizations’ recorded and transcribed speech comes from the human beings who come in contact with it anywhere along the way from recording to dissemination of the final products. These potential threats must be assessed both among the people working for outsourced TSPs and among internal transcription staff, other SCM workers and any other persons internal to the organization who come in contact with the speech content. Some TSPs deploy independent contractors, others full-time employees. Some limit their workers to the home country, others exploit talent living abroad. Some professional outsourced TSPs claim special know-how in managing highly efficient, self-motivated transcription specialists who typically work from home or any location of their choosing.
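
The audit trail mentioned above can be illustrated with a minimal Python sketch. It assumes a hash-chained log, which is one common way to make a trail tamper-evident; the article does not say how any particular vendor implements its audit trail, and the function names, actor labels and file identifiers below are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder predecessor hash for the first entry


def _digest(body):
    """Hash an entry body deterministically (keys are sorted before encoding)."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_entry(trail, actor, action, file_id):
    """Append a record of who did what to which speech file."""
    prev_hash = trail[-1]["hash"] if trail else GENESIS
    body = {"actor": actor, "action": action, "file_id": file_id, "prev": prev_hash}
    trail.append({**body, "hash": _digest(body)})


def verify(trail):
    """Return True only if every entry links to its predecessor and matches its hash."""
    prev_hash = GENESIS
    for entry in trail:
        body = {k: entry[k] for k in ("actor", "action", "file_id", "prev")}
        if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical life cycle of one recorded speech file
trail = []
append_entry(trail, "recorder-17", "upload", "audio-0042")
append_entry(trail, "transcriber-03", "open", "audio-0042")
append_entry(trail, "editor-01", "deliver", "audio-0042")
```

Calling verify(trail) on the untouched list returns True; changing any field of an earlier entry, or removing an entry from the front or middle, makes it return False. Entries dropped from the end are not caught by this sketch, and, as the paragraph above stresses, a log of this kind records human access but does not by itself prevent misuse.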

Professional work that can be performed anywhere the workers have a computer and internet access is increasingly in high demand by a special breed of workers. These workers do not thrive on waking to alarm clocks each day, commuting long distances in traffic, punching time clocks, or being subjected to cramped working quarters, corporate politics, unhealthy food, foul air, keystroke monitoring and constant video surveillance. This breed of professionals disdains the fixed working hours of most normal jobs, which would prevent them from running their own businesses, subcontracting, doing part-time work, etc. for supplementary income and would put a limit on their earnings. They prefer to work from the comfort of their own homes, at the beach, on a boat, in a cafe, on a mountaintop – anywhere in the world where they wish to roam and at the time of their choosing. Some may be housewives and others may be cultural tourists exploring the third world. Some TSPs strive to work with team members located in regions within a country where the cost of living is lower or, especially in the case of the English language, with team members living in other countries where the cost of living is considerably lower. TSPs are therefore keenly in tune with their team members’ motivation, and in return for the unique benefits of this type of work the team members perform to high quality standards at performance-based rates which allow the TSPs to profit, invest in technical infrastructure and marketing, and expand their businesses.

There are some very large TSPs that are focused on the healthcare market. Some examples include:

• MedQuist (NASDAQ: MEDQ): 70% owned by Philips Electronics (Netherlands). Claims to deploy some 10,000 transcriptionists and serve 3,000 hospitals, physician groups, and other health care organizations nationwide. 2002 revenues of $486.2 million.

• Transcend Services (NASDAQ: TRCR): Utilizes Internet-based technology to convert doctors’ digital voice recordings into high quality electronic medical record documents. Claims to accept digital files from 30,000 registered physicians into its new $1.5M+ Atlanta-based hub for workload balancing and distribution and from there distributes the files, based on priority and difficulty, to a network of 250+ highly skilled, company-employed, U.S.-based medical language specialists. 2002 revenues of $12.2 million.

• Total eMed: Private company founded in 1998 and based in Franklin, TN, USA. Leading provider of outsourced electronic medical transcription and one of the first to utilize a completely web-based system connecting physicians to experienced medical transcriptionists on a highly secure VPN. Total eMed uses integrated voice, text and data to connect physicians and medical transcriptionists. Employs over 500 staff to support its client base. Teamed with IDX Systems Corporation (NASDAQ: IDXC), an information technology systems integrator focused on the medical industry with 2002 revenues of some $460.1 million.

There are scores of small TSPs focused on the legal marketplace, which can be found by tapping into scores of PSTN and Internet listings. SRT is, of course, a well-established practice in the media and public events industry.

Some TSPs provide consulting and integration services in addition to transcription services, assisting in the strategic development of new capabilities, recommending technology products, providing training and delivering a range of other advanced services. In the role of Speech Content Management (SCM)/Speech Knowledge Management (SKM) consultant and systems integrator, some of these TSPs may, according to a client needs assessment, assist a company in developing a complete SCM/SKM program, increasing deployment of SRT in the organization and establishing an internal TSP function for all or some portion of the transcription work. The costs for SCM program infrastructure can range from very low to very high depending on many factors, including the size of the organization and the depth of functionality.

Business sectors outside the medical, legal and media sectors represent especially large, untapped opportunities for innovative new TSPs and other SCM/SKM specialists.

We are rapidly entering an age of ubiquitous, wearable computing and communications with human speech playing an integral role. It is clear that one of the most challenging aspects is the privacy and confidentiality of information. Some TSPs offer elaborate systems deploying innovative policies, procedures, technologies and controls to ensure maximal privacy and confidentiality of clients’ content, taking great care to optimally match transcription personnel to client content with privacy always in the forefront. In many cases, potential clients will need to be convinced that the TSP personnel who work with client content do not have client identification information and TSP administrative personnel do not have access to client content in order for the TSP to win their business.
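
The separation described above, in which transcription personnel never see who the client is and administrative personnel never see the content, can be sketched in Python as follows. This is only an illustration under stated assumptions: the JobRegistry class, its method names and the sample client and file names are hypothetical and do not describe any particular TSP's system.

```python
import secrets


class JobRegistry:
    """Keeps client identity and speech content in two separate stores."""

    def __init__(self):
        self._clients = {}  # job_id -> client name (administrative staff only)
        self._content = {}  # job_id -> audio reference (transcriptionists only)

    def submit(self, client_name, audio_ref):
        """Register a job under a random ID that reveals nothing about the client."""
        job_id = secrets.token_hex(8)
        self._clients[job_id] = client_name
        self._content[job_id] = audio_ref
        return job_id

    def transcriber_view(self, job_id):
        """What a transcriptionist sees: the content, with no client identification."""
        return {"job_id": job_id, "audio_ref": self._content[job_id]}

    def admin_view(self, job_id):
        """What administrative staff see: the client, with no content."""
        return {"job_id": job_id, "client": self._clients[job_id]}


# Hypothetical client and file name
registry = JobRegistry()
job = registry.submit("Example Corp", "audio-0042.wav")
```

Here registry.transcriber_view(job) carries only the random job ID and the audio reference, while registry.admin_view(job) carries only the job ID and the client name. A real deployment would also have to deal with names spoken inside the recording itself, which an opaque job ID cannot hide.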

In light of milestone developments in technology, advances in thinking about information management and examples of early adopter success, companies in scores of industries which have to date largely ignored SRT will significantly increase their deployment of SRT and put serious resources behind the development of cutting-edge SCM/SKM programs, thereby gaining advantage in the ever-intensifying race for competitive edge.
