US20040138894A1 - Speech transcription tool for efficient speech transcription - Google Patents

Speech transcription tool for efficient speech transcription

Info

Publication number
US20040138894A1
Authority
US
United States
Prior art keywords
transcription
text
user
annotation information
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US10/685,445
Inventor
Daniel Kiecza
Francis Kubala
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Raytheon BBN Technologies Corp
Original Assignee
BBNT Solutions LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BBNT Solutions LLC filed Critical BBNT Solutions LLC
Priority to US10/685,445
Assigned to BBNT SOLUTIONS LLC reassignment BBNT SOLUTIONS LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: KUBALA, FRANCIS, KIECZA, DANIEL
Publication of US20040138894A1
Assigned to BBNT SOLUTIONS LLC reassignment BBNT SOLUTIONS LLC CORRECTED COVER SHEET TO CORRECT ASSIGNEE ADDRESS, PREVIOUSLY RECORDED AT REEL/FRAME 014608/0723 (ASSIGNMENT OF ASSIGNOR'S INTEREST) Assignors: KUBALA, FRANCIS G., KIECZA, DANIEL
Assigned to BBN TECHNOLOGIES CORP. reassignment BBN TECHNOLOGIES CORP. MERGER (SEE DOCUMENT FOR DETAILS). Assignors: BBNT SOLUTIONS LLC
Assigned to BBN TECHNOLOGIES CORP. (AS SUCCESSOR BY MERGER TO BBNT SOLUTIONS LLC) reassignment BBN TECHNOLOGIES CORP. (AS SUCCESSOR BY MERGER TO BBNT SOLUTIONS LLC) RELEASE OF SECURITY INTEREST Assignors: BANK OF AMERICA, N.A. (SUCCESSOR BY MERGER TO FLEET NATIONAL BANK)

Links

Images

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/28: Constructional details of speech recognition systems
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/28: Constructional details of speech recognition systems
    • G10L15/32: Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems

Definitions

  • the present invention relates generally to speech processing and, more particularly, to the transcription of speech.
  • Speech has not traditionally been valued as an archival information source. As effective as the spoken word is for communicating, archiving spoken segments in a useful and easily retrievable manner has long been a difficult proposition. Although the act of recording audio is not difficult, automatically transcribing and indexing speech in an intelligent and useful manner can be difficult.
  • Automatic transcription systems are generally based on a language model.
  • the language model is trained on a speech signal and on a corresponding transcription of the speech.
  • the model will “learn” how the speech signal corresponds to the transcription.
  • the training transcriptions of the speech are derived through a manual transcription process in which a user listens to the training audio and types in the text corresponding to the audio.
  • Systems and methods consistent with the principles of this invention provide a transcription tool that allows a user to efficiently transcribe segments of speech to generate a structured and annotated transcription.
  • the speech transcription tool includes control logic, an input device, and a graphical user interface.
  • the control logic is configured to play back portions of an audio stream and the input device receives text from a user defining a transcription of the portions of the audio stream and receives annotation information from the user further defining the text.
  • the graphical user interface includes a first section that displays a graphical representation of a waveform corresponding to the audio stream and a second section that displays the text and representations of the annotation information for the text.
  • a second aspect consistent with the invention is directed to a method that comprises receiving an audio stream containing speech data, receiving text from a user defining a transcription of the speech data, receiving annotation information from the user further defining the text, displaying the text, and displaying symbolic representations of the annotation information with the text.
  • a third aspect consistent with the invention is directed to a computing device for transcribing an audio file that includes speech.
  • the computing device includes an audio output device, a processor, and a computer memory.
  • the computer memory is coupled to the processor and contains programming instructions that when executed by the processor cause the processor to play a current segment of the audio file through the audio output device, receive transcription information for the speech segments played through the audio output device, receive annotation information relating to the transcription information, and display the transcription information in an output section of a graphical user interface. Additionally, the processor displays the annotation information as graphical icons in the output section of the graphical user interface.
  • FIG. 1 is a diagram illustrating an exemplary system in which concepts consistent with the invention may be implemented
  • FIG. 2 is a block diagram of a transcription tool consistent with the present invention
  • FIG. 3 is an exemplary diagram of an interface that may be presented to the user of the transcription tool shown in FIG. 2;
  • FIG. 4 is a flow chart illustrating exemplary operation of the transcription tool shown in FIG. 2;
  • FIG. 5 is a diagram illustrating user selection of a speaker turn
  • FIG. 6 is an exemplary diagram of an interface including a pop-up box for further defining annotations.
  • a speech transcription tool assists a user in transcribing speech.
  • the speech transcription tool allows the user to transcribe and annotate speech in intuitive ways.
  • the transcription tool presents an integrated view to the user that includes a view of the audio waveform, a view of the text input by the user, and a view of the structured version of the transcribed text.
  • the view of the text input by the user may include graphical icons that represent annotation information that relates to the transcribed text.
  • FIG. 1 is a diagram illustrating an exemplary system 100 in which concepts consistent with the invention may be implemented.
  • System 100 includes a computing device 101 that has a computer-readable medium, such as a random access memory 109 , coupled to a processor 108 .
  • Computing device 101 may also include a number of additional external or internal devices.
  • An external input device 120 and an external output device 121 are shown in FIG. 1.
  • the input device 120 may include, without limitation, a mouse, a CD-ROM, or a keyboard.
  • the output device may include, without limitation, a display or an audio output device, such as a speaker.
  • A keyboard, in particular, may be used by the user of system 100 when transcribing a speech segment that is played back from an output device, such as a speaker.
  • a foot pedal may be used for audio playback control.
  • computing device 101 may be any type of computing platform, and may be connected to a network 102 .
  • Computing device 101 is exemplary only. Concepts consistent with the present invention can be implemented on any computing device, whether or not connected to a network.
  • Processor 108 executes program instructions stored in memory 109 .
  • Processor 108 can include any of a number of well-known computer processors, such as processors from Intel Corporation, of Santa Clara, Calif.
  • Memory 109 contains an application program.
  • the application program may implement a transcription tool 115 described below.
  • Transcription tool 115 plays audio segments to a user. The user transcribes speech in the audio and enters annotation information that further describes the transcription into transcription tool 115 .
  • FIG. 2 is a block diagram illustrating software elements of transcription tool 115 .
  • Users of transcription tool 115 (i.e., transcribers) interact with transcription tool 115 through user input component 203 and graphical user interface (GUI) 204.
  • Control logic 202 coordinates the operation of graphical user interface 204 and user input component 203 to perform transcription in a manner consistent with the present invention.
  • Control logic 202 may additionally handle the playback of the input audio to the user.
  • User input component 203 processes information received from the user.
  • a user may input information through a number of different hardware input devices.
  • A keyboard, for example, is an input device that the user may use in entering text corresponding to speech.
  • Other devices, such as a foot pedal or a mouse, may be used to control the operation of transcription tool 115.
  • FIG. 3 is an exemplary diagram of an interface 300 that may be presented to the user via graphical user interface 204 .
  • Interface 300 includes waveform section 301 , transcription section 302 , and structured representation section 303 .
  • Interface 300 may include selectable menu options 304 and window control buttons 305 .
  • Through menu options 304, a user may initiate functions of transcription tool 115, such as opening an audio file for transcription, saving a transcription, and setting program options.
  • Waveform section 301 graphically illustrates the time-domain waveform of the audio stream that is being processed.
  • The exemplary waveform shown in FIG. 3, waveform 310, includes a number of quiet segments 311 and audible segments 312.
  • Audible segments 312 may include, for example, speech, music, other sounds, or combinations thereof.
  • Concurrently with the display of audio waveform 310, transcription tool 115 may play the audio signal to the user.
  • Transcription tool 115 may visually mark the portion of waveform 310 that is currently being played. For example, as shown in FIG. 3, an arrow 316 may point to the current playback position in audio waveform 310 . The user may move the arrow using a mouse or keyboard commands to quickly adjust the current playback position.
  • Sections of waveform 310 may be labeled as corresponding to different segments of an audio stream.
  • the segments may be defined hierarchically. In one implementation consistent with the invention, these different segments may include “turns,” “sections,” and “episodes.” Additional segments, such as a “gap” segment that defines a period of non-speech such as silence, music, noise, etc., may also be used.
  • a turn may refer to a section of the audio in which a single speaker is speaking (i.e., a “speaker turn”).
  • a section may refer to a number of speaker turns that relate to a particular topic.
  • An episode may refer to a group of sections that each have something in common, such as being of the same broadcast.
  • The turn, section, and episode segments for an audio stream may be illustrated in waveform section 301 using graphical markers.
  • In FIG. 3, these three segments, as well as the gap segment, are illustrated as turns 320, sections 321, episodes 322, and gaps 323.
  • One type of audio stream that can be confidently divided into turns, sections, and episodes is a news broadcast.
  • the whole news broadcast (e.g., a 30 minute broadcast) may correspond to a single episode.
  • An episode may include multiple sections that each correspond to a different news story. Each section may have one or more speaker turns.
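The episode/section/turn hierarchy described above can be sketched as a simple data model. This is purely illustrative: the patent defines no data structures, and all class and field names here are assumptions.

```python
from dataclasses import dataclass, field

# Illustrative data model (not from the patent) for the segment
# hierarchy: an episode contains sections, and a section contains
# speaker turns. Times are in seconds.

@dataclass
class Turn:
    speaker: str
    start: float
    end: float
    text: str = ""

@dataclass
class Section:
    topic: str
    turns: list = field(default_factory=list)

@dataclass
class Episode:
    title: str
    sections: list = field(default_factory=list)

# A news broadcast as a single episode: one story (section) with one
# speaker turn.
episode = Episode("Evening News")
story = Section(topic="weather")
story.turns.append(Turn(speaker="anchor", start=0.0, end=12.5,
                        text="Good evening."))
episode.sections.append(story)
```

A "gap" segment could be modeled the same way, as a sibling of `Turn` with no speaker or text.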
  • Transcription section 302 displays transcribed text received by control logic 202 from user input component 203 .
  • the text may be represented in Unicode so that the transcription tool can handle left-to-right, right-to-left, and bi-directional scripts, such as English, Chinese, and Arabic.
  • the text will be typed by the user as the user listens to the audio waveform 310 .
  • the user may input additional information relating to the text. This additional information is received by control logic 202 and stored as annotation information for the text.
  • Annotations may include, for example, an indication that a certain noun corresponds to a person's name or to a location.
  • the annotation information may be displayed in transcription section 302 .
  • Annotations 313 and 314, each of which defines a word or series of words, are shown in FIG. 3. More particularly, annotations 313 define names of persons and annotations 314 define location names. Annotations may additionally be nested. For example, in the phrase “CNN News,” “CNN” may be annotated as “spelled” and the complete phrase “CNN News” may be a “name” annotation.
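The nesting behavior in the “CNN News” example can be illustrated with character-offset spans over the transcribed text. The span representation below is an assumption for illustration only; the patent does not specify how annotations are stored.

```python
# Nested annotations represented as character-offset spans over the
# transcribed text. In "CNN News", "CNN" carries a "spelled" annotation
# nested inside a "name" annotation covering the whole phrase.

text = "CNN News"
annotations = [
    {"type": "name",    "start": 0, "end": 8},  # "CNN News"
    {"type": "spelled", "start": 0, "end": 3},  # "CNN"
]

def nests_inside(inner, outer):
    """True if `inner` lies entirely within `outer` (proper nesting)."""
    return outer["start"] <= inner["start"] and inner["end"] <= outer["end"]

# The "spelled" span nests inside the "name" span, but not vice versa.
assert nests_inside(annotations[1], annotations[0])
assert not nests_inside(annotations[0], annotations[1])
```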
  • Structured representation section 303 displays the transcribed text in a hierarchical tree structure.
  • the hierarchical structure may be based on the relationships of segments 320 - 322 .
  • An episode entry 330 (e.g., a folder icon) may include one or more section entries 331, which may include one or more turn and/or gap entries 332.
  • the turn entries are at the base level in the hierarchy and contain the actual transcription text.
  • One of turn entries 332 is highlighted in FIG. 3, which indicates that this is the currently active turn.
  • Turn entry 332, in addition to the transcribed text, may include annotation information that was input by the user, such as the name of the speaker (“Riaz Ahmad Khan”) and the sex of the speaker (“male”). The sex of the speaker may alternatively be determined automatically based on acoustic processing techniques applied to the speaker turn.
  • Section entries 331 may include a general description of the topic(s) discussed in the turns corresponding to a section. The topic description may be determined automatically based on the speaker turn transcriptions.
  • transcription tool 115 may save the transcription as an output file.
  • the output file may be based on the information in structured representation section 303 . That is, the output file may include the transcribed text as well as meta-data that encapsulates the annotation information, including indications of the hierarchical segments.
  • the output file may be an extensible markup language (XML) document.
  • FIG. 4 is a flow chart illustrating exemplary operation of transcription tool 115 consistent with an aspect of the invention.
  • the user may first load an audio waveform into transcription tool 115 . This may be accomplished through the “file” menu.
  • FIG. 5 is a diagram illustrating a waveform in which the user has highlighted a portion 501 (shown as a simple rectangle in FIG. 5) that corresponds to a speaker turn.
  • the highlighted portion 501 may include buffer areas 502 that the user aligns to the edge of the speaker turn.
  • Transcription tool 115 may adjust the graphical marker 520 that defines the speaker turn as the user varies the highlighted portion with the mouse.
  • the user may press a predefined key combination, such as CTRL-T, that causes control logic 202 to store the speaker turn.
  • Other user actions, such as a mouse click instead of a keyboard combination, may be used to inform control logic 202 of a speaker turn.
  • the user may load a saved version of the waveform in which segments have already been defined. In this situation, the user may not have to re-define the segments.
  • control logic 202 may automatically define sections and/or episodes based on the transcribed context of the speaker turns.
  • Control logic 202 may, for example, determine that speaker turns are similar based on the text of the speaker turns. Speaker turns discussing the same topic will tend to use similar words and may, thus, be compared for similarity based on the frequency of occurrence of words in the speaker turn.
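The similarity comparison described above could, for example, be realized as cosine similarity over word-frequency vectors. The patent names no specific measure, so the following is a hedged sketch under that assumption:

```python
from collections import Counter
import math

# Compare two speaker turns by the frequency of the words they contain.
# Cosine similarity over raw word counts is an illustrative choice; the
# patent does not prescribe a particular similarity measure.

def turn_similarity(turn_a: str, turn_b: str) -> float:
    a = Counter(turn_a.lower().split())
    b = Counter(turn_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Turns on the same topic share vocabulary and therefore score higher.
same_topic = turn_similarity("rain is expected tonight",
                             "heavy rain is expected tomorrow")
off_topic = turn_similarity("rain is expected tonight",
                            "the markets closed higher today")
assert same_topic > off_topic
```

Turns whose pairwise similarity exceeds a threshold could then be grouped into a single section.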
  • control logic 202 may use automated speech and language processing functions to initially classify the audio based on an audio type, such as speech, music, or silence.
  • One such technique for automatically identifying segments in an audio stream, such as speaker turns, appropriate for transcription is discussed in the application cited in the “Related Application” section of this document. Portions of the audio that contain only music may be noted on interface 300 so that the user does not need to bother listening to these portions.
  • transcription tool 115 may update structured representation section 303 to indicate the defined segments. The user may listen to audio before creating segments to determine where turn boundaries should be.
  • the user may begin playback of a particular one of the segments, such as a speaker turn (Act 402 ).
  • the user may control which of the speaker turns is the active speaker turn via mouse or keyboard commands.
  • a user may point to a particular speaker turn 320 to select the corresponding section of waveform 310 for playback.
  • the user may adjust the current playback position using predefined keyboard commands.
  • the key combination CTRL-→ may cause control logic 202 to select the next speaker turn as the active speaker turn;
  • the key combination CTRL-← may cause control logic 202 to select the previous speaker turn as the active speaker turn; and
  • the key combinations SHIFT-CTRL-←/→ may move the current active location, as indicated by arrow 316, to the left/right in predetermined increments (e.g., 0.8 second increments).
  • While transcription tool 115 is playing back audio, the user may transcribe speech in the audio by entering (e.g., typing) the text into user input component 203 (Act 403 ).
  • Control logic 202 displays the text in transcription section 302 and may simultaneously update structured representation section 303 .
  • the user may enter annotation information for a particular word or sequence of words (Acts 404 and 405 ).
  • the annotation information is entered by a user through keyboard shortcuts.
  • the user may input a key combination such as CTRL-N.
  • This key combination informs control logic 202 that the succeeding text corresponds to a name.
  • pressing CTRL-N may bring up a selection box that allows the user to further define the name that is to be annotated, such as the name of a person or the name of a location.
  • FIG. 6 is an exemplary diagram of an interface 600 for transcription tool 115 that includes a pop-up box 601 that allows the user to further define name annotations.
  • Control logic 202 may display pop-up box 601 in response to the keyboard combination (e.g., CTRL-N) for a name object.
  • control logic 202 may generate an appropriate name icon surrounding text typed by the user, such as name icons 313 or 314 (FIG. 3).
  • The user may again press CTRL-N to turn off name annotation and revert back to normal text transcription.
  • Annotations other than name annotations may also be entered by the user.
  • the user may mark an unintelligible section of speech with a “skip” marker that is toggled on/off via the key combination CTRL-K.
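The toggle behavior of shortcuts such as CTRL-N and CTRL-K might be handled by a small state machine like the sketch below. The `Annotator` class and its method names are illustrative assumptions, not part of the patent.

```python
# Illustrative toggle handling for annotation shortcuts: pressing a
# shortcut turns its annotation type on; pressing it again reverts to
# plain transcription. The shortcut-to-type table mirrors the shortcuts
# mentioned in the text (CTRL-N for names, CTRL-K for skip markers).

class Annotator:
    SHORTCUTS = {"CTRL-N": "name", "CTRL-K": "skip"}

    def __init__(self):
        self.active = set()  # annotation types currently toggled on

    def press(self, combo: str) -> None:
        kind = self.SHORTCUTS.get(combo)
        if kind is None:
            return  # not an annotation shortcut; ignore
        # Toggle the annotation type on or off.
        self.active.symmetric_difference_update({kind})

annotator = Annotator()
annotator.press("CTRL-N")
assert "name" in annotator.active       # succeeding text is a name
annotator.press("CTRL-N")
assert "name" not in annotator.active   # back to normal transcription
```

A configuration file (discussed later in the document) could populate the `SHORTCUTS` table instead of hard-coding it.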
  • transcription tool 115 may output the transcription entered by the user to a file (Act 407 ).
  • the user selects the output file to write to using the “File” menu on interface 300 .
  • the output file may be an XML document that includes the information in structured representation section 303 .
  • the output file may include the transcribed text, the annotation information, the segmentation information, and other information, such as time codes that correlate the transcription with the original audio.
  • The output file, in addition to being an XML document, may be generated using Unicode to represent the characters.
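As a concrete illustration of such an output file, the sketch below serializes one speaker turn with segment, annotation, and time-code meta-data to UTF-8 XML. The element and attribute names are assumptions; the patent does not define a schema.

```python
import xml.etree.ElementTree as ET

# Build a small XML transcription document: an episode containing one
# section and one speaker turn, with time codes and an inline
# annotation. All tag and attribute names are illustrative.

episode = ET.Element("episode")
section = ET.SubElement(episode, "section", topic="news story")
turn = ET.SubElement(section, "turn", speaker="Riaz Ahmad Khan",
                     sex="male", start="12.5", end="47.8")
turn.text = "good evening from "
location = ET.SubElement(turn, "annotation", type="location")
location.text = "Islamabad"

# Serialize using UTF-8, so any Unicode script in the transcription
# survives the round trip.
xml_bytes = ET.tostring(episode, encoding="utf-8")
```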
  • the Unicode standard is a character encoding standard that represents characters using a unique number for every character, regardless of the computing platform or language.
  • The Unicode standard is maintained by the Unicode Consortium.
  • control logic 202 may allow users to highlight text in transcription section 302 and then select the annotation information to apply to the highlighted text.
  • transcription tool 115 may read a configuration file.
  • the configuration file may define functionality for a number of operational aspects of the transcription tool.
  • the configuration file may define the names and the relationships (e.g., hierarchy) between the segments, the possible annotation information, and the keyboard shortcuts that are used to enter the annotation information.
  • transcription tool 115 can be customized for a particular transcription task.
  • the structure of the configuration file is defined through an XML schema definition.
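A configuration file of the kind described might look like the following sketch, which defines the segment hierarchy, the allowed annotations, and their keyboard shortcuts. The tag and attribute names are assumptions; only the idea of an XML-schema-governed configuration comes from the text.

```python
import xml.etree.ElementTree as ET

# Parse a hypothetical configuration file that names the segment
# hierarchy and maps keyboard shortcuts to annotation types.

CONFIG_XML = """\
<config>
  <segments>
    <segment name="episode">
      <segment name="section">
        <segment name="turn"/>
      </segment>
    </segment>
  </segments>
  <annotations>
    <annotation name="name" shortcut="CTRL-N"/>
    <annotation name="skip" shortcut="CTRL-K"/>
  </annotations>
</config>
"""

config = ET.fromstring(CONFIG_XML)

# Shortcut table the tool could consult on each key press.
shortcuts = {a.get("shortcut"): a.get("name")
             for a in config.iter("annotation")}

# Segment names in hierarchy order (outermost first).
hierarchy = [s.get("name") for s in config.iter("segment")]
```

Editing such a file would let a site add annotation types or rename segments without changing the tool itself, which is the customization the text describes.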
  • the transcription tool described herein allows users to efficiently create rich transcriptions of an audio stream.
  • the user may easily annotate the spoken words and segment the audio stream into useful segments.
  • the categories of allowed annotation information and the possible segments can be easily modified by the user by changing a configuration file.
  • the software may more generally be implemented as any type of logic.
  • This logic may include hardware, such as an application specific integrated circuit or a field programmable gate array, software, or a combination of hardware and software.

Abstract

A transcription tool [115] includes a graphical user interface [204] that displays the waveform of an input audio signal to a user. The user may define speaker turn segments using the displayed waveform. The graphical user interface further displays a transcription section [302] that includes a textual representation of speech that was transcribed by the user and a graphical representation of annotation information [314] relating to the transcribed text. The user may enter the annotation information on-the-fly while transcribing the text using predefined keyboard shortcut commands or other mechanisms. The graphical user interface may further display a structured representation section [303] that may present the transcribed text as a hierarchical tree structure.

Description

    RELATED APPLICATION
  • This application is related to the concurrently-filed U.S. application (Docket No. 02-4040), Ser. No. ______, titled “Fast Transcription of Speech,” which is incorporated herein by reference. [0001]
  • This application claims priority under 35 U.S.C. § 119 based on U.S. Provisional Application No. 60/419,214 filed Oct. 17, 2002, the disclosure of which is incorporated herein by reference.[0002]
  • GOVERNMENT CONTRACT
  • [0003] The U.S. Government has a paid-up license in this invention and the right in limited circumstances to require the patent owner to license others on reasonable terms as provided for by the terms of contract No. 1999*S018900*000 awarded by the Federal Broadcast Information Service (FBIS).
  • BACKGROUND OF THE INVENTION
  • A. Field of the Invention [0004]
  • The present invention relates generally to speech processing and, more particularly, to the transcription of speech. [0005]
  • B. Description of Related Art [0006]
  • Speech has not traditionally been valued as an archival information source. As effective as the spoken word is for communicating, archiving spoken segments in a useful and easily retrievable manner has long been a difficult proposition. Although the act of recording audio is not difficult, automatically transcribing and indexing speech in an intelligent and useful manner can be difficult. [0007]
  • Automatic transcription systems are generally based on a language model. The language model is trained on a speech signal and on a corresponding transcription of the speech. The model will “learn” how the speech signal corresponds to the transcription. Typically, the training transcriptions of the speech are derived through a manual transcription process in which a user listens to the training audio and types in the text corresponding to the audio. [0008]
  • Manually transcribing speech can be a time consuming and, thus, expensive task. Conventionally, generating one hour of transcribed training data requires up to 40 hours of a skilled transcriber's time. Accordingly, in situations in which a lot of training data is required, or in which a number of different languages are to be modeled, the cost of obtaining the training data can be prohibitive. [0009]
  • Thus, there is a need in the art to be able to cost-effectively transcribe speech. [0010]
  • SUMMARY OF THE INVENTION
  • Systems and methods consistent with the principles of this invention provide a transcription tool that allows a user to efficiently transcribe segments of speech to generate a structured and annotated transcription. [0011]
  • One aspect consistent with the invention is directed to a speech transcription tool. The speech transcription tool includes control logic, an input device, and a graphical user interface. The control logic is configured to play back portions of an audio stream and the input device receives text from a user defining a transcription of the portions of the audio stream and receives annotation information from the user further defining the text. The graphical user interface includes a first section that displays a graphical representation of a waveform corresponding to the audio stream and a second section that displays the text and representations of the annotation information for the text. [0012]
  • A second aspect consistent with the invention is directed to a method that comprises receiving an audio stream containing speech data, receiving text from a user defining a transcription of the speech data, receiving annotation information from the user further defining the text, displaying the text, and displaying symbolic representations of the annotation information with the text. [0013]
  • A third aspect consistent with the invention is directed to a computing device for transcribing an audio file that includes speech. The computing device includes an audio output device, a processor, and a computer memory. The computer memory is coupled to the processor and contains programming instructions that when executed by the processor cause the processor to play a current segment of the audio file through the audio output device, receive transcription information for the speech segments played through the audio output device, receive annotation information relating to the transcription information, and display the transcription information in an output section of a graphical user interface. Additionally, the processor displays the annotation information as graphical icons in the output section of the graphical user interface.[0014]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate the invention and, together with the description, explain the invention. In the drawings, [0015]
  • FIG. 1 is a diagram illustrating an exemplary system in which concepts consistent with the invention may be implemented; [0016]
  • FIG. 2 is a block diagram of a transcription tool consistent with the present invention; [0017]
  • FIG. 3 is an exemplary diagram of an interface that may be presented to the user of the transcription tool shown in FIG. 2; [0018]
  • FIG. 4 is a flow chart illustrating exemplary operation of the transcription tool shown in FIG. 2; [0019]
  • FIG. 5 is a diagram illustrating user selection of a speaker turn; and [0020]
  • FIG. 6 is an exemplary diagram of an interface including a pop-up box for further defining annotations. [0021]
  • DETAILED DESCRIPTION
  • The following detailed description of the invention refers to the accompanying drawings. The same reference numbers may be used in different drawings to identify the same or similar elements. Also, the following detailed description does not limit the invention. Instead, the scope of the invention is defined by the appended claims and equivalents of the claim limitations. [0022]
  • A speech transcription tool assists a user in transcribing speech. The speech transcription tool allows the user to transcribe and annotate speech in intuitive ways. The transcription tool presents an integrated view to the user that includes a view of the audio waveform, a view of the text input by the user, and a view of the structured version of the transcribed text. The view of the text input by the user may include graphical icons that represent annotation information that relates to the transcribed text. [0023]
  • System Overview
  • [0024] Speech transcription, as described herein, may be performed on one or more processing devices or networks of processing devices. FIG. 1 is a diagram illustrating an exemplary system 100 in which concepts consistent with the invention may be implemented. System 100 includes a computing device 101 that has a computer-readable medium, such as a random access memory 109, coupled to a processor 108. Computing device 101 may also include a number of additional external or internal devices. An external input device 120 and an external output device 121 are shown in FIG. 1. The input device 120 may include, without limitation, a mouse, a CD-ROM, or a keyboard. The output device may include, without limitation, a display or an audio output device, such as a speaker. A keyboard, in particular, may be used by the user of system 100 when transcribing a speech segment that is played back from an output device, such as a speaker. A foot pedal may be used for audio playback control.
  • [0025] In general, computing device 101 may be any type of computing platform, and may be connected to a network 102. Computing device 101 is exemplary only. Concepts consistent with the present invention can be implemented on any computing device, whether or not connected to a network.
  • [0026] Processor 108 executes program instructions stored in memory 109. Processor 108 can include any of a number of well-known computer processors, such as processors from Intel Corporation, of Santa Clara, Calif.
  • [0027] Memory 109 contains an application program. In particular, the application program may implement a transcription tool 115 described below. Transcription tool 115 plays audio segments to a user. The user transcribes speech in the audio and enters annotation information that further describes the transcription into transcription tool 115.
  • Transcription Tool
  • [0028] FIG. 2 is a block diagram illustrating software elements of transcription tool 115. Users of transcription tool 115 (i.e., transcribers) interact with transcription tool 115 through user input component 203 and graphical user interface (GUI) 204. Control logic 202 coordinates the operation of graphical user interface 204 and user input component 203 to perform transcription in a manner consistent with the present invention. Control logic 202 may additionally handle the playback of the input audio to the user.
  • [0029] User input component 203 processes information received from the user. A user may input information through a number of different hardware input devices. A keyboard, for example, is an input device that the user may use in entering text corresponding to speech. Other devices, such as a foot pedal or a mouse, may be used to control the operation of transcription tool 115.
  • [0030] Graphical user interface 204 displays the graphical interface through which the user interacts with transcription tool 115. FIG. 3 is an exemplary diagram of an interface 300 that may be presented to the user via graphical user interface 204. Interface 300 includes waveform section 301, transcription section 302, and structured representation section 303. Interface 300 may include selectable menu options 304 and window control buttons 305. Through menu options 304, a user may initiate functions of transcription tool 115, such as opening an audio file for transcription, saving a transcription, and setting program options.
  • [0031] Waveform section 301 graphically illustrates the time-domain waveform of the audio stream that is being processed. The exemplary waveform shown in FIG. 3, waveform 310, includes a number of quiet segments 311 and audible segments 312. Audible segments 312 may include, for example, speech, music, other sounds, or combinations thereof.
  • Concurrently with the display of [0032] audio waveform 310, transcription tool 115 may play the audio signal to the user. Transcription tool 115 may visually mark the portion of waveform 310 that is currently being played. For example, as shown in FIG. 3, an arrow 316 may point to the current playback position in audio waveform 310. The user may move the arrow using a mouse or keyboard commands to quickly adjust the current playback position.
  • Sections of [0033] waveform 310 may be labeled as corresponding to different segments of an audio stream. The segments may be defined hierarchically. In one implementation consistent with the invention, these different segments may include “turns,” “sections,” and “episodes.” Additional segments, such as a “gap” segment that defines a period of non-speech such as silence, music, noise, etc., may also be used. A turn may refer to a section of the audio in which a single speaker is speaking (i.e., a “speaker turn”). A section may refer to a number of speaker turns that relate to a particular topic. An episode may refer to a group of sections that each have something in common, such as being of the same broadcast. The turn, section, and episode segments for an audio stream may be illustrated in waveform section 301 using graphical markers. In FIG. 3, these three segments, as well as the gap segment, are illustrated as turns 320, sections 321, episodes 322, and gaps 323.
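The turn/section/episode hierarchy described above can be sketched as a simple data model. This is an illustrative sketch only, not part of the patent; the class and field names are assumptions:

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical data model for the segment hierarchy described above:
# an episode contains sections, a section contains speaker turns, and
# each turn spans a time range in the audio stream.

@dataclass
class Turn:
    start: float            # seconds into the audio stream
    end: float
    speaker: str = ""
    text: str = ""

@dataclass
class Section:
    topic: str = ""
    turns: List[Turn] = field(default_factory=list)

@dataclass
class Episode:
    title: str = ""
    sections: List[Section] = field(default_factory=list)

    def duration(self) -> float:
        """Total time spanned by all turns in the episode."""
        turns = [t for s in self.sections for t in s.turns]
        if not turns:
            return 0.0
        return max(t.end for t in turns) - min(t.start for t in turns)
```

A "gap" segment could be modeled the same way as a `Turn` with an empty speaker, or as a separate class, depending on how the tool chooses to represent non-speech regions.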
  • One type of audio stream that can be confidently divided into turns, sections, and episodes is a news broadcast. The whole news broadcast (e.g., a 30 minute broadcast) may correspond to a single episode. An episode may include multiple sections, each of which corresponds to a different news story. Each section may have one or more speaker turns. [0034]
  • [0035] Transcription section 302 displays transcribed text received by control logic 202 from user input component 203. The text may be represented in Unicode so that the transcription tool can handle left-to-right, right-to-left, and bi-directional scripts, such as English, Chinese, and Arabic. Typically, the text will be typed by the user as the user listens to the audio waveform 310. In addition to merely typing the text of the transcription, the user may input additional information relating to the text. This additional information is received by control logic 202 and stored as annotation information for the text. Annotations may include, for example, an indication that a certain noun corresponds to a person's name or to a location. The annotation information may be displayed in transcription section 302. Annotations 313 and 314, each of which defines a word or series of words, are shown in FIG. 3. More particularly, annotations 313 define names of persons and annotations 314 define location names. Annotations may additionally be nested. For example, in the phrase “CNN News,” “CNN” may be annotated as “spelled” and the complete phrase “CNN News” may be a “name” annotation.
  • [0036] Structured representation section 303 displays the transcribed text in a hierarchical tree structure. The hierarchical structure may be based on the relationships of segments 320-322. Thus, in FIG. 3, for example, an episode entry 330 (e.g., a folder icon) is at the highest level. An episode may include one or more section entries 331, which may include one or more turn and/or gap entries 332. The turn entries are at the base level in the hierarchy and contain the actual transcription text. One of turn entries 332 is highlighted in FIG. 3, which indicates that this is the currently active turn. Turn entry 332, in addition to the transcribed text, may include annotation information that was input by the user, such as the name of the speaker (“Riaz Ahmad Khan”) and the sex of the speaker (“male”). The sex of the speaker may alternatively be determined automatically based on acoustic processing techniques applied to the speaker turn. Section entries 331 may include a general description of the topic(s) discussed in the turns corresponding to a section. The topic description may be determined automatically based on the speaker turn transcriptions.
  • When the user finishes a transcription, [0037] transcription tool 115 may save the transcription as an output file. The output file may be based on the information in structured representation section 303. That is, the output file may include the transcribed text as well as meta-data that encapsulates the annotation information, including indications of the hierarchical segments. In one implementation, the output file may be an extensible markup language (XML) document.
  • FIG. 4 is a flow chart illustrating exemplary operation of [0038] transcription tool 115 consistent with an aspect of the invention. Before transcribing speech, the user may first load an audio waveform into transcription tool 115. This may be accomplished through the “file” menu.
  • With the waveform loaded, the user may define segments in the waveform, such as turn, section, and episode segments (Act [0039] 401). Segments may be defined by, for example, using a mouse to highlight a continuous portion of the audio that corresponds to a single speaker. FIG. 5 is a diagram illustrating a waveform in which the user has highlighted a portion 501 (shown as a simple rectangle in FIG. 5) that corresponds to a speaker turn. The highlighted portion 501 may include buffer areas 502 that the user aligns to the edge of the speaker turn. Transcription tool 115 may adjust the graphical marker 520 that defines the speaker turn as the user varies the highlighted portion with the mouse. When the user has adjusted highlighted portion 501 to adequately cover the speaker turn, the user may press a predefined key combination, such as CTRL-T, that causes control logic 202 to store the speaker turn. Other user actions, such as a mouse click, instead of a keyboard combination, may be used to inform control logic 202 of a speaker turn.
  • In some implementations, the user may load a saved version of the waveform in which segments have already been defined. In this situation, the user may not have to re-define the segments. [0040]
  • The user may define sections and episodes in a manner similar to defining speaker turns. Alternatively, [0041] control logic 202 may automatically define sections and/or episodes based on the transcribed context of the speaker turns. Control logic 202 may, for example, determine that speaker turns are similar based on the text of the speaker turns. Speaker turns discussing the same topic will tend to use similar words and may, thus, be compared for similarity based on the frequency of occurrence of words in the speaker turn.
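The word-frequency comparison suggested above can be sketched as follows. The patent does not specify a particular similarity measure; cosine similarity over word-count vectors is one common choice and is used here purely as an illustration:

```python
from collections import Counter
import math

# Minimal sketch: two speaker turns are scored by the cosine similarity
# of their word-count vectors. Turns discussing the same topic tend to
# share words and therefore score higher.

def word_vector(text):
    """Count word occurrences, ignoring case."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    va, vb = word_vector(a), word_vector(b)
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0
```

A grouping policy (e.g., placing two turns in the same section when their similarity exceeds a threshold) would be layered on top of this score; that policy is not specified in the text.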
  • Additionally, instead of having the user manually highlight [0042] portions 501 of a speaker turn, control logic 202 may use automated speech and language processing functions to initially classify the audio based on an audio type, such as speech, music, or silence. One such technique for automatically identifying segments in an audio stream, such as speaker turns, appropriate for transcription is discussed in the application cited in the “Related Application” section of this document. Portions of the audio that contain only music may be noted on interface 300 so that the user does not need to bother listening to these portions.
  • As the user defines speaker turns, sections, and episodes, [0043] transcription tool 115 may update structured representation section 303 to indicate the defined segments. The user may listen to audio before creating segments to determine where turn boundaries should be.
  • After defining one or a number of segments, the user may begin playback of a particular one of the segments, such as a speaker turn (Act [0044] 402). In one implementation, the user may control which of the speaker turns is the active speaker turn via mouse or keyboard commands. Thus, a user may point to a particular speaker turn 320 to select the corresponding section of waveform 310 for playback. Alternatively, the user may adjust the current playback position using predefined keyboard commands. For example, the key combination CTRL-↓ may cause control logic 202 to select the next speaker turn as the active speaker turn, the key combination CTRL-↑ may cause control logic 202 to select the previous speaker turn as the active speaker turn, and the key combinations SHIFT-CTRL-←/→ may move the current active location, as indicated by arrow 316, to the left/right in predetermined increments (e.g., 0.8 second increments).
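The playback-control shortcuts described above (CTRL-↓, CTRL-↑, SHIFT-CTRL-←/→) can be sketched as a small dispatch over playback state. The 0.8-second increment comes from the example in the text; the class and method names are assumptions:

```python
# Hypothetical playback state for the shortcut behavior described above.
STEP = 0.8  # seconds, per the example increment in the text

class PlaybackState:
    def __init__(self, turns):
        self.turns = turns            # list of (start, end) speaker turns
        self.active = 0               # index of the active speaker turn
        self.position = turns[0][0]   # current playback position (seconds)

    def next_turn(self):              # CTRL-down: select the next turn
        self.active = min(self.active + 1, len(self.turns) - 1)
        self.position = self.turns[self.active][0]

    def prev_turn(self):              # CTRL-up: select the previous turn
        self.active = max(self.active - 1, 0)
        self.position = self.turns[self.active][0]

    def nudge(self, direction):       # SHIFT-CTRL-left/right: +/-1 step
        self.position = max(0.0, self.position + direction * STEP)
```

Clamping at the first/last turn and at time zero is an assumed policy; the text does not say what happens at the boundaries.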
  • While [0045] transcription tool 115 is playing back audio, the user may transcribe speech in the audio by entering (e.g., typing) the text into user input component 203 (Act 403). Control logic 202 displays the text in transcription section 302 and may simultaneously update structured representation section 303. During the transcription process, the user may enter annotation information for a particular word or sequence of words (Acts 404 and 405).
  • In one implementation, the annotation information is entered by a user through keyboard shortcuts. For example, before typing in a name, the user may input a key combination such as CTRL-N. This key combination informs [0046] control logic 202 that the succeeding text corresponds to a name. In some implementations, pressing CTRL-N may bring up a selection box that allows the user to further define the name that is to be annotated, such as the name of a person or the name of a location. FIG. 6 is an exemplary diagram of an interface 600 for transcription tool 115 that includes a pop-up box 601 that allows the user to further define name annotations. Control logic 202 may display pop-up box 601 in response to the keyboard combination (e.g., CTRL-N) for a name object.
  • Based on the selected name, [0047] control logic 202 may generate an appropriate name icon surrounding text typed by the user, such as name icons 313 or 314 (FIG. 3). When the user has completed typing the name, he may again press CTRL-N to turn off name annotation and revert back to normal text transcription.
  • Other annotations, in addition to name annotations, may be entered by the user. For example, the user may mark an unintelligible section of speech with a “skip” marker that is toggled on/off via the key combination CTRL-K. [0048]
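The toggle-style annotation entry described above (CTRL-N for names, CTRL-K for skips) can be sketched as a stack of open annotations, which also accommodates the nesting mentioned earlier (e.g., a “spelled” annotation inside a “name” annotation). The key names and span format are assumptions:

```python
# Hypothetical tracker for toggle-style annotations: pressing a key
# combination opens an annotation of that type, pressing it again closes
# the annotation, and annotations of different types may nest.

class AnnotationTracker:
    def __init__(self):
        self.open_stack = []   # currently open (type, start_offset) pairs
        self.spans = []        # completed (type, start, end) annotations

    def toggle(self, ann_type, cursor):
        """Called when e.g. CTRL-N ('name') or CTRL-K ('skip') is pressed;
        cursor is the current character offset in the transcription text."""
        if self.open_stack and self.open_stack[-1][0] == ann_type:
            _, start = self.open_stack.pop()
            self.spans.append((ann_type, start, cursor))
        else:
            self.open_stack.append((ann_type, cursor))
```

With this model, typing “CNN News” while toggling “spelled” around “CNN” and “name” around the whole phrase yields two properly nested spans.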
  • When the user finishes transcribing (Act [0049] 406), transcription tool 115 may output the transcription entered by the user to a file (Act 407). In one implementation, the user selects the output file to write to using the “File” menu on interface 300. As previously mentioned, the output file may be an XML document that includes the information in structured representation section 303. Thus, the output file may include the transcribed text, the annotation information, the segmentation information, and other information, such as time codes that correlate the transcription with the original audio.
  • The output file, in addition to being an XML document, may be generated using Unicode to represent the characters. The Unicode standard is a character encoding standard that represents characters using a unique number for every character, regardless of the computing platform or language. The Unicode standard is maintained by the Unicode consortium. [0050]
  • In addition to entering information while transcribing text, in some implementations, users may enter annotation information after transcribing the text. In particular, [0051] control logic 202 may allow users to highlight text in transcription section 302 and then select the annotation information to apply to the highlighted text.
  • Transcription Tool Configuration
  • When initially starting up, [0052] transcription tool 115 may read a configuration file. The configuration file may define functionality for a number of operational aspects of the transcription tool. For example, the configuration file may define the names and the relationships (e.g., hierarchy) between the segments, the possible annotation information, and the keyboard shortcuts that are used to enter the annotation information. In this manner, by modifying the configuration file, transcription tool 115 can be customized for a particular transcription task.
  • In one implementation, the structure of the configuration file is defined through an XML schema definition. [0053]
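A configuration file of the kind described above might define the segment hierarchy, the allowed annotations, and their keyboard shortcuts. The following sketch embeds an illustrative XML configuration and parses it with Python's standard library; all element names and shortcut strings are assumptions, not taken from the patent:

```python
import xml.etree.ElementTree as ET

# Illustrative configuration: segment names and their parent/child
# relationships, plus the annotation types and their shortcuts.
CONFIG = """
<transcriber-config>
  <segments>
    <segment name="episode"/>
    <segment name="section" parent="episode"/>
    <segment name="turn" parent="section"/>
  </segments>
  <annotations>
    <annotation name="name" shortcut="Ctrl+N"/>
    <annotation name="skip" shortcut="Ctrl+K"/>
  </annotations>
</transcriber-config>
"""

root = ET.fromstring(CONFIG)

# Build a shortcut -> annotation-type map the tool could consult when
# dispatching key presses.
shortcuts = {a.get("shortcut"): a.get("name")
             for a in root.iter("annotation")}
```

Because the tool reads this file at startup, editing it (e.g., adding an annotation type or renaming a segment level) customizes the tool for a particular transcription task without code changes, which is the point made in the text.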
  • Conclusion
  • The transcription tool described herein allows users to efficiently create rich transcriptions of an audio stream. In addition to merely typing in the spoken words, the user may easily annotate the spoken words and segment the audio stream into useful segments. Moreover, the categories of allowed annotation information and the possible segments can be easily modified by the user by changing a configuration file. [0054]
  • The foregoing description of preferred embodiments of the invention provides illustration and description, but is not intended to be exhaustive or to limit the invention to the precise form disclosed. Modifications and variations are possible in light of the above teachings or may be acquired from practice of the invention. For example, while a series of acts has been presented with respect to FIG. 4, the order of the acts may be different in other implementations consistent with the present invention. Also, certain actions have been described as keyboard actions; however, these actions might also be performed via other input devices, such as a mouse or a foot pedal. [0055]
  • Certain portions of the invention have been described as software that performs one or more functions. The software may more generally be implemented as any type of logic. This logic may include hardware, such as an application specific integrated circuit or a field programmable gate array, software, or a combination of hardware and software. [0056]
  • No element, act, or instruction used in the description of the present application should be construed as critical or essential to the invention unless explicitly described as such. Also, as used herein, the article “a” is intended to include one or more items. Where only one item is intended, the term “one” or similar language is used. [0057]
  • The scope of the invention is defined by the claims and their equivalents. [0058]

Claims (28)

What is claimed:
1. A speech transcription tool comprising:
control logic configured to play back portions of an audio stream;
an input device configured to receive text from a user defining a transcription of the portions of the audio stream and receive annotation information from the user further defining the text; and
a graphical user interface including
a first section configured to display a graphical representation of a waveform corresponding to the audio stream, and
a second section configured to display the text and representations of the annotation information for the text.
2. The speech transcription tool of claim 1, wherein the graphical user interface further includes:
a third section configured to display a hierarchically structured representation of the text.
3. The speech transcription tool of claim 1, wherein the first section of the graphical user interface further includes graphical markers that define the portions of the audio stream.
4. The speech transcription tool of claim 1, wherein the representations of the annotation information include graphical icons.
5. The speech transcription tool of claim 1, wherein the input device further receives information from the user classifying the portions of the audio stream into a plurality of hierarchical segments.
6. The speech transcription tool of claim 5, wherein the segments include speaker turns, sections, and episodes.
7. The speech transcription tool of claim 1, wherein the control logic writes the transcription of the portions of the audio stream and the annotation information as a Unicode output file.
8. The speech transcription tool of claim 1, wherein the annotation information is selected from a possible set of annotation information defined by a configuration file.
9. The speech transcription tool of claim 1, wherein the annotation information is entered by the user through predefined keyboard shortcuts.
10. A method comprising:
receiving an audio stream containing speech data;
receiving text from a user defining a transcription of the speech data;
receiving annotation information from the user further defining the text;
displaying the text; and
displaying symbolic representations of the annotation information with the text.
11. The method of claim 10, further comprising:
displaying a graphical representation of a waveform corresponding to the audio stream.
12. The method of claim 11, wherein the graphical representation of the waveform includes graphical markers that define segments within the audio stream.
13. The method of claim 12, wherein the graphical markers are adjustable by the user, and wherein adjusting the markers adjusts a corresponding definition of a segment.
14. The method of claim 12, wherein the segments include speaker turns, sections, and episodes.
15. The method of claim 10, further comprising:
categorizing the audio stream into a plurality of hierarchically arranged segments, and
displaying the hierarchically arranged segments.
16. The method of claim 10, wherein the symbolic representations of the annotation information include graphical icons.
17. The method of claim 10, wherein the annotation information is selected from a possible set of annotation information defined by a configuration file.
18. The method of claim 10, wherein the user enters the annotation information using predefined keyboard shortcuts.
19. A computing device for transcribing an audio file that includes speech, the computing device comprising:
an audio output device;
a processor; and
a computer memory coupled to the processor and containing programming instructions that when executed by the processor cause the processor to:
play a current one of a plurality of segments of the audio file through the audio output device,
receive transcription information for speech segments of the segments of the audio file played through the audio output device,
receive annotation information relating to the transcription information, and
display the transcription information in an output section of a graphical user interface, and
display the annotation information as graphical icons in the output section of the graphical user interface.
20. The computing device of claim 19, wherein the programming instructions additionally cause the processor to:
display a graphical representation of a waveform corresponding to the audio file.
21. The computing device of claim 20, wherein the graphical representation of the waveform includes graphical markers that represent the segments of the audio file.
22. The computing device of claim 19, wherein the graphical icons are displayed overlaid with the transcription information.
23. The computing device of claim 19, further comprising:
an input device configured to receive information from the user classifying the segments of the audio file.
24. The computing device of claim 23, wherein the segments include speaker turns, sections, and episodes.
25. The computing device of claim 19, wherein the processor writes the transcription information and the annotation information to a Unicode output file.
26. The computing device of claim 19, wherein the annotation information is selected from a possible set of annotation information defined by a configuration file.
27. A computer-readable medium containing program instructions for execution by a processor, the program instructions comprising:
instructions for obtaining an audio stream containing speech data;
instructions for receiving text from a user that defines a transcription of the speech data;
instructions for receiving annotation information from the user further defining the text;
instructions for presenting the text; and
instructions for providing symbolic representations of the annotation information with the text.
28. A device comprising:
means for receiving an audio stream containing speech data;
means for receiving text from a user defining a transcription of the speech data;
means for receiving annotation information from the user further defining the text;
means for displaying the text; and
means for displaying symbolic representations of the annotation information as graphical icons associated with the text.
US10/685,445 2002-10-17 2003-10-16 Speech transcription tool for efficient speech transcription Abandoned US20040138894A1 (en)


Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US41921402P 2002-10-17 2002-10-17
US10/685,445 US20040138894A1 (en) 2002-10-17 2003-10-16 Speech transcription tool for efficient speech transcription

Publications (1)

Publication Number Publication Date
US20040138894A1 true US20040138894A1 (en) 2004-07-15


Family Applications (9)

Application Number Title Priority Date Filing Date
US10/685,586 Abandoned US20040204939A1 (en) 2002-10-17 2003-10-16 Systems and methods for speaker change detection
US10/685,565 Active - Reinstated 2026-04-05 US7292977B2 (en) 2002-10-17 2003-10-16 Systems and methods for providing online fast speaker adaptation in speech recognition
US10/685,410 Expired - Fee Related US7389229B2 (en) 2002-10-17 2003-10-16 Unified clustering tree
US10/685,479 Abandoned US20040163034A1 (en) 2002-10-17 2003-10-16 Systems and methods for labeling clusters of documents
US10/685,585 Active 2026-01-10 US7424427B2 (en) 2002-10-17 2003-10-16 Systems and methods for classifying audio into broad phoneme classes
US10/685,478 Abandoned US20040083104A1 (en) 2002-10-17 2003-10-16 Systems and methods for providing interactive speaker identification training
US10/685,566 Abandoned US20040176946A1 (en) 2002-10-17 2003-10-16 Pronunciation symbols based on the orthographic lexicon of a language
US10/685,403 Abandoned US20040083090A1 (en) 2002-10-17 2003-10-16 Manager for integrating language technology components
US10/685,445 Abandoned US20040138894A1 (en) 2002-10-17 2003-10-16 Speech transcription tool for efficient speech transcription



Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070100626A1 (en) * 2005-11-02 2007-05-03 International Business Machines Corporation System and method for improving speaking ability
US20080195370A1 (en) * 2005-08-26 2008-08-14 Koninklijke Philips Electronics, N.V. System and Method For Synchronizing Sound and Manually Transcribed Text
US20120278071A1 (en) * 2011-04-29 2012-11-01 Nexidia Inc. Transcription system
US20130030806A1 (en) * 2011-07-26 2013-01-31 Kabushiki Kaisha Toshiba Transcription support system and transcription support method
US20130030805A1 (en) * 2011-07-26 2013-01-31 Kabushiki Kaisha Toshiba Transcription support system and transcription support method
US20130080163A1 (en) * 2011-09-26 2013-03-28 Kabushiki Kaisha Toshiba Information processing apparatus, information processing method and computer program product
US8676590B1 (en) 2012-09-26 2014-03-18 Google Inc. Web-based audio transcription tool
US20150006174A1 (en) * 2012-02-03 2015-01-01 Sony Corporation Information processing device, information processing method and program
US20160323521A1 (en) * 2005-02-28 2016-11-03 Facebook, Inc. Titling apparatus, a titling method, and a machine readable medium storing thereon a computer program for titling
US11404049B2 (en) * 2019-12-09 2022-08-02 Microsoft Technology Licensing, Llc Interactive augmentation and integration of real-time speech-to-text

Families Citing this family (158)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1390319A1 (en) * 2001-05-16 2004-02-25 E.I. Du Pont De Nemours And Company Dielectric composition with reduced resistance
WO2004029773A2 (en) * 2002-09-27 2004-04-08 Callminer, Inc. Software for statistical analysis of speech
WO2004090870A1 (en) * 2003-04-04 2004-10-21 Kabushiki Kaisha Toshiba Method and apparatus for encoding or decoding wide-band audio
US7567908B2 (en) * 2004-01-13 2009-07-28 International Business Machines Corporation Differential dynamic content delivery with text display in dependence upon simultaneous speech
JP2005202014A (en) * 2004-01-14 2005-07-28 Sony Corp Audio signal processor, audio signal processing method, and audio signal processing program
US8923838B1 (en) 2004-08-19 2014-12-30 Nuance Communications, Inc. System, method and computer program product for activating a cellular phone account
JP4220449B2 (en) * 2004-09-16 2009-02-04 株式会社東芝 Indexing device, indexing method, and indexing program
GB0511307D0 (en) * 2005-06-03 2005-07-13 South Manchester University Ho A method for generating output data
US7382933B2 (en) * 2005-08-24 2008-06-03 International Business Machines Corporation System and method for semantic video segmentation based on joint audiovisual and text analysis
US7801893B2 (en) * 2005-09-30 2010-09-21 Iac Search & Media, Inc. Similarity detection and clustering of images
US20070094023A1 (en) * 2005-10-21 2007-04-26 Callminer, Inc. Method and apparatus for processing heterogeneous units of work
US20070094270A1 (en) * 2005-10-21 2007-04-26 Callminer, Inc. Method and apparatus for the processing of heterogeneous units of work
KR100755677B1 (en) * 2005-11-02 2007-09-05 삼성전자주식회사 Apparatus and method for dialogue speech recognition using topic detection
WO2007061947A2 (en) * 2005-11-18 2007-05-31 Blacklidge Emulsions, Inc. Method for bonding prepared substrates for roadways using a low-tracking asphalt emulsion coating
US20070129943A1 (en) * 2005-12-06 2007-06-07 Microsoft Corporation Speech recognition using adaptation and prior knowledge
CA2536976A1 (en) * 2006-02-20 2007-08-20 Diaphonics, Inc. Method and apparatus for detecting speaker change in a voice transaction
US8996592B2 (en) * 2006-06-26 2015-03-31 Scenera Technologies, Llc Methods, systems, and computer program products for identifying a container associated with a plurality of files
US20080004876A1 (en) * 2006-06-30 2008-01-03 Chuang He Non-enrolled continuous dictation
US20080051916A1 (en) * 2006-08-28 2008-02-28 Arcadyan Technology Corporation Method and apparatus for recording streamed audio
KR100826875B1 (en) * 2006-09-08 2008-05-06 한국전자통신연구원 On-line speaker recognition method and apparatus for thereof
US8073681B2 (en) 2006-10-16 2011-12-06 Voicebox Technologies, Inc. System and method for a cooperative conversational voice user interface
US20080104066A1 (en) * 2006-10-27 2008-05-01 Yahoo! Inc. Validating segmentation criteria
US7272558B1 (en) 2006-12-01 2007-09-18 Coveo Solutions Inc. Speech recognition training method for audio and video file indexing on a search engine
US20080154579A1 (en) * 2006-12-21 2008-06-26 Krishna Kummamuru Method of analyzing conversational transcripts
US7818176B2 (en) 2007-02-06 2010-10-19 Voicebox Technologies, Inc. System and method for selecting and presenting advertisements based on natural language processing of voice-based input
US8386254B2 (en) * 2007-05-04 2013-02-26 Nuance Communications, Inc. Multi-class constrained maximum likelihood linear regression
WO2009018223A1 (en) * 2007-07-27 2009-02-05 Sparkip, Inc. System and methods for clustering large database of documents
DE602007004733D1 (en) * 2007-10-10 2010-03-25 Harman Becker Automotive Sys speaker recognition
JP4405542B2 (en) * 2007-10-24 2010-01-27 株式会社東芝 Apparatus, method and program for clustering phoneme models
US9386154B2 (en) 2007-12-21 2016-07-05 Nuance Communications, Inc. System, method and software program for enabling communications between customer service agents and users of communication devices
WO2009122779A1 (en) * 2008-04-03 2009-10-08 日本電気株式会社 Text data processing apparatus, method, and recording medium with program recorded thereon
US9305548B2 (en) 2008-05-27 2016-04-05 Voicebox Technologies Corporation System and method for an integrated, multi-modal, multi-device natural language voice services environment
WO2010019831A1 (en) 2008-08-14 2010-02-18 21Ct, Inc. Hidden markov model for speech processing with training method
CA2680304C (en) * 2008-09-25 2017-08-22 Multimodal Technologies, Inc. Decoding-time prediction of non-verbalized tokens
US8458105B2 (en) 2009-02-12 2013-06-04 Decisive Analytics Corporation Method and apparatus for analyzing and interrelating data
US8326637B2 (en) 2009-02-20 2012-12-04 Voicebox Technologies, Inc. System and method for processing multi-modal device interactions in a natural language voice services environment
US8301446B2 (en) * 2009-03-30 2012-10-30 Adacel Systems, Inc. System and method for training an acoustic model with reduced feature space variation
US8412525B2 (en) * 2009-04-30 2013-04-02 Microsoft Corporation Noise robust speech classifier ensemble
US8515957B2 (en) 2009-07-28 2013-08-20 Fti Consulting, Inc. System and method for displaying relationships between electronically stored information to provide classification suggestions via injection
CA3026879A1 (en) 2009-08-24 2011-03-10 Nuix North America, Inc. Generating a reference set for use during document review
US8554562B2 (en) * 2009-11-15 2013-10-08 Nuance Communications, Inc. Method and system for speaker diarization
US8983958B2 (en) * 2009-12-21 2015-03-17 Business Objects Software Limited Document indexing based on categorization and prioritization
JP5477635B2 (en) * 2010-02-15 2014-04-23 ソニー株式会社 Information processing apparatus and method, and program
NZ602633A (en) 2010-02-24 2014-11-28 Blacklidge Emulsions Inc Hot applied tack coat
US9305553B2 (en) * 2010-04-28 2016-04-05 William S. Meisel Speech recognition accuracy improvement through speaker categories
US9009040B2 (en) * 2010-05-05 2015-04-14 Cisco Technology, Inc. Training a transcription system
US8391464B1 (en) 2010-06-24 2013-03-05 Nuance Communications, Inc. Customer service system, method, and software program product for responding to queries using natural language understanding
JP2012038131A (en) * 2010-08-09 2012-02-23 Sony Corp Information processing unit, information processing method, and program
US8630854B2 (en) * 2010-08-31 2014-01-14 Fujitsu Limited System and method for generating videoconference transcriptions
US20120084149A1 (en) * 2010-09-10 2012-04-05 Paolo Gaudiano Methods and systems for online advertising with interactive text clouds
US8791977B2 (en) 2010-10-05 2014-07-29 Fujitsu Limited Method and system for presenting metadata during a videoconference
CN102455997A (en) * 2010-10-27 2012-05-16 鸿富锦精密工业(深圳)有限公司 Component name extraction system and method
KR101172663B1 (en) * 2010-12-31 2012-08-08 엘지전자 주식회사 Mobile terminal and method for grouping applications thereof
US20120197643A1 (en) * 2011-01-27 2012-08-02 General Motors Llc Mapping obstruent speech energy to lower frequencies
GB2489489B (en) * 2011-03-30 2013-08-21 Toshiba Res Europ Ltd A speech processing system and method
CA2832918C (en) * 2011-06-22 2016-05-10 Rogers Communications Inc. Systems and methods for ranking document clusters
US9313336B2 (en) 2011-07-21 2016-04-12 Nuance Communications, Inc. Systems and methods for processing audio signals captured using microphones of multiple devices
US8433577B2 (en) * 2011-09-27 2013-04-30 Google Inc. Detection of creative works on broadcast media
US20130144414A1 (en) * 2011-12-06 2013-06-06 Cisco Technology, Inc. Method and apparatus for discovering and labeling speakers in a large and growing collection of videos with minimal user effort
US9002848B1 (en) * 2011-12-27 2015-04-07 Google Inc. Automatic incremental labeling of document clusters
US20130266127A1 (en) 2012-04-10 2013-10-10 Raytheon BBN Technologies Corp System and method for removing sensitive data from a recording
US20140365221A1 (en) * 2012-07-31 2014-12-11 Novospeech Ltd. Method and apparatus for speech recognition
US20140136204A1 (en) * 2012-11-13 2014-05-15 GM Global Technology Operations LLC Methods and systems for speech systems
US20140207786A1 (en) * 2013-01-22 2014-07-24 Equivio Ltd. System and methods for computerized information governance of electronic documents
US9865266B2 (en) * 2013-02-25 2018-01-09 Nuance Communications, Inc. Method and apparatus for automated speaker parameters adaptation in a deployed speaker verification system
US9330167B1 (en) * 2013-05-13 2016-05-03 Groupon, Inc. Method, apparatus, and computer program product for classification and tagging of textual data
US10806360B2 (en) 2013-09-25 2020-10-20 Bardy Diagnostics, Inc. Extended wear ambulatory electrocardiography and physiological sensor monitor
US10433751B2 (en) 2013-09-25 2019-10-08 Bardy Diagnostics, Inc. System and method for facilitating a cardiac rhythm disorder diagnosis based on subcutaneous cardiac monitoring data
US10736531B2 (en) 2013-09-25 2020-08-11 Bardy Diagnostics, Inc. Subcutaneous insertable cardiac monitor optimized for long term, low amplitude electrocardiographic data collection
US10799137B2 (en) 2013-09-25 2020-10-13 Bardy Diagnostics, Inc. System and method for facilitating a cardiac rhythm disorder diagnosis with the aid of a digital computer
US9364155B2 (en) 2013-09-25 2016-06-14 Bardy Diagnostics, Inc. Self-contained personal air flow sensing monitor
US11213237B2 (en) 2013-09-25 2022-01-04 Bardy Diagnostics, Inc. System and method for secure cloud-based physiological data processing and delivery
US9717433B2 (en) 2013-09-25 2017-08-01 Bardy Diagnostics, Inc. Ambulatory electrocardiography monitoring patch optimized for capturing low amplitude cardiac action potential propagation
US10667711B1 (en) 2013-09-25 2020-06-02 Bardy Diagnostics, Inc. Contact-activated extended wear electrocardiography and physiological sensor monitor recorder
US9737224B2 (en) 2013-09-25 2017-08-22 Bardy Diagnostics, Inc. Event alerting through actigraphy embedded within electrocardiographic data
US9504423B1 (en) 2015-10-05 2016-11-29 Bardy Diagnostics, Inc. Method for addressing medical conditions through a wearable health monitor with the aid of a digital computer
US9655537B2 (en) 2013-09-25 2017-05-23 Bardy Diagnostics, Inc. Wearable electrocardiography and physiology monitoring ensemble
US9615763B2 (en) 2013-09-25 2017-04-11 Bardy Diagnostics, Inc. Ambulatory electrocardiography monitor recorder optimized for capturing low amplitude cardiac action potential propagation
US11723575B2 (en) 2013-09-25 2023-08-15 Bardy Diagnostics, Inc. Electrocardiography patch
US20190167139A1 (en) 2017-12-05 2019-06-06 Gust H. Bardy Subcutaneous P-Wave Centric Insertable Cardiac Monitor For Long Term Electrocardiographic Monitoring
US9655538B2 (en) 2013-09-25 2017-05-23 Bardy Diagnostics, Inc. Self-authenticating electrocardiography monitoring circuit
WO2015048194A1 (en) 2013-09-25 2015-04-02 Bardy Diagnostics, Inc. Self-contained personal air flow sensing monitor
US10820801B2 (en) 2013-09-25 2020-11-03 Bardy Diagnostics, Inc. Electrocardiography monitor configured for self-optimizing ECG data compression
US9619660B1 (en) 2013-09-25 2017-04-11 Bardy Diagnostics, Inc. Computer-implemented system for secure physiological data collection and processing
US9408545B2 (en) 2013-09-25 2016-08-09 Bardy Diagnostics, Inc. Method for efficiently encoding and compressing ECG data optimized for use in an ambulatory ECG monitor
US10433748B2 (en) 2013-09-25 2019-10-08 Bardy Diagnostics, Inc. Extended wear electrocardiography and physiological sensor monitor
US9433367B2 (en) 2013-09-25 2016-09-06 Bardy Diagnostics, Inc. Remote interfacing of extended wear electrocardiography and physiological sensor monitor
US9408551B2 (en) 2013-11-14 2016-08-09 Bardy Diagnostics, Inc. System and method for facilitating diagnosis of cardiac rhythm disorders with the aid of a digital computer
US10251576B2 (en) 2013-09-25 2019-04-09 Bardy Diagnostics, Inc. System and method for ECG data classification for use in facilitating diagnosis of cardiac rhythm disorders with the aid of a digital computer
US9775536B2 (en) 2013-09-25 2017-10-03 Bardy Diagnostics, Inc. Method for constructing a stress-pliant physiological electrode assembly
US10624551B2 (en) 2013-09-25 2020-04-21 Bardy Diagnostics, Inc. Insertable cardiac monitor for use in performing long term electrocardiographic monitoring
US10888239B2 (en) 2013-09-25 2021-01-12 Bardy Diagnostics, Inc. Remote interfacing electrocardiography patch
US9545204B2 (en) 2013-09-25 2017-01-17 Bardy Diagnostics, Inc. Extended wear electrocardiography patch
US10736529B2 (en) 2013-09-25 2020-08-11 Bardy Diagnostics, Inc. Subcutaneous insertable electrocardiography monitor
US9717432B2 (en) 2013-09-25 2017-08-01 Bardy Diagnostics, Inc. Extended wear electrocardiography patch using interlaced wire electrodes
US9700227B2 (en) 2013-09-25 2017-07-11 Bardy Diagnostics, Inc. Ambulatory electrocardiography monitoring patch optimized for capturing low amplitude cardiac action potential propagation
US9345414B1 (en) 2013-09-25 2016-05-24 Bardy Diagnostics, Inc. Method for providing dynamic gain over electrocardiographic data with the aid of a digital computer
US10463269B2 (en) 2013-09-25 2019-11-05 Bardy Diagnostics, Inc. System and method for machine-learning-based atrial fibrillation detection
US20150100582A1 (en) * 2013-10-08 2015-04-09 Cisco Technology, Inc. Association of topic labels with digital content
US9495439B2 (en) * 2013-10-08 2016-11-15 Cisco Technology, Inc. Organizing multimedia content
US9942396B2 (en) * 2013-11-01 2018-04-10 Adobe Systems Incorporated Document distribution and interaction
DE102013224417B3 (en) * 2013-11-28 2015-05-07 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Hearing aid with basic frequency modification, method for processing a speech signal and computer program with a program code for performing the method
CN104143326B (en) * 2013-12-03 2016-11-02 腾讯科技(深圳)有限公司 Voice command recognition method and device
US9544149B2 (en) 2013-12-16 2017-01-10 Adobe Systems Incorporated Automatic E-signatures in response to conditions and/or events
US9413891B2 (en) 2014-01-08 2016-08-09 Callminer, Inc. Real-time conversational analytics facility
JP6392012B2 (en) * 2014-07-14 2018-09-19 株式会社東芝 Speech synthesis dictionary creation device, speech synthesis device, speech synthesis dictionary creation method, and speech synthesis dictionary creation program
US9728190B2 (en) * 2014-07-25 2017-08-08 International Business Machines Corporation Summarization of audio data
CN107003996A (en) 2014-09-16 2017-08-01 声钰科技 VCommerce
US9703982B2 (en) 2014-11-06 2017-07-11 Adobe Systems Incorporated Document distribution and interaction
US9531545B2 (en) 2014-11-24 2016-12-27 Adobe Systems Incorporated Tracking and notification of fulfillment events
US9432368B1 (en) 2015-02-19 2016-08-30 Adobe Systems Incorporated Document distribution and interaction
JP6464411B6 (en) * 2015-02-25 2019-03-13 Dynabook株式会社 Electronic device, method and program
US10447646B2 (en) * 2015-06-15 2019-10-15 International Business Machines Corporation Online communication modeling and analysis
US10068445B2 (en) * 2015-06-24 2018-09-04 Google Llc Systems and methods of home-specific sound event detection
US10089061B2 (en) * 2015-08-28 2018-10-02 Kabushiki Kaisha Toshiba Electronic device and method
US9935777B2 (en) 2015-08-31 2018-04-03 Adobe Systems Incorporated Electronic signature framework with enhanced security
US20170075652A1 (en) 2015-09-14 2017-03-16 Kabushiki Kaisha Toshiba Electronic device and method
US9626653B2 (en) 2015-09-21 2017-04-18 Adobe Systems Incorporated Document distribution and interaction with delegation of signature authority
US9754593B2 (en) * 2015-11-04 2017-09-05 International Business Machines Corporation Sound envelope deconstruction to identify words and speakers in continuous speech
US10825464B2 (en) 2015-12-16 2020-11-03 Dolby Laboratories Licensing Corporation Suppression of breath in audio signals
US10347215B2 (en) 2016-05-27 2019-07-09 Adobe Inc. Multi-device electronic signature framework
WO2017210618A1 (en) 2016-06-02 2017-12-07 Fti Consulting, Inc. Analyzing clusters of coded documents
US10255905B2 (en) * 2016-06-10 2019-04-09 Google Llc Predicting pronunciations with word stress
US10217453B2 (en) * 2016-10-14 2019-02-26 Soundhound, Inc. Virtual assistant configured by selection of wake-up phrase
US20180232623A1 (en) * 2017-02-10 2018-08-16 International Business Machines Corporation Techniques for answering questions based on semantic distances between subjects
US10503919B2 (en) 2017-04-10 2019-12-10 Adobe Inc. Electronic signature framework with keystroke biometric authentication
US20180336892A1 (en) * 2017-05-16 2018-11-22 Apple Inc. Detecting a trigger of a digital assistant
GB2578386B (en) 2017-06-27 2021-12-01 Cirrus Logic Int Semiconductor Ltd Detection of replay attack
GB2563953A (en) 2017-06-28 2019-01-02 Cirrus Logic Int Semiconductor Ltd Detection of replay attack
GB201713697D0 (en) 2017-06-28 2017-10-11 Cirrus Logic Int Semiconductor Ltd Magnetic detection of replay attack
GB201801532D0 (en) 2017-07-07 2018-03-14 Cirrus Logic Int Semiconductor Ltd Methods, apparatus and systems for audio playback
GB201801526D0 (en) 2017-07-07 2018-03-14 Cirrus Logic Int Semiconductor Ltd Methods, apparatus and systems for authentication
GB201801527D0 (en) 2017-07-07 2018-03-14 Cirrus Logic Int Semiconductor Ltd Method, apparatus and systems for biometric processes
GB201801528D0 (en) 2017-07-07 2018-03-14 Cirrus Logic Int Semiconductor Ltd Method, apparatus and systems for biometric processes
GB201801661D0 (en) 2017-10-13 2018-03-21 Cirrus Logic International Uk Ltd Detection of liveness
GB201804843D0 (en) 2017-11-14 2018-05-09 Cirrus Logic Int Semiconductor Ltd Detection of replay attack
GB2567503A (en) * 2017-10-13 2019-04-17 Cirrus Logic Int Semiconductor Ltd Analysing speech signals
GB201801664D0 (en) 2017-10-13 2018-03-21 Cirrus Logic Int Semiconductor Ltd Detection of liveness
GB201801659D0 (en) 2017-11-14 2018-03-21 Cirrus Logic Int Semiconductor Ltd Detection of loudspeaker playback
TWI625680B (en) 2017-12-15 2018-06-01 財團法人工業技術研究院 Method and device for recognizing facial expressions
US11735189B2 (en) 2018-01-23 2023-08-22 Cirrus Logic, Inc. Speaker identification
US11264037B2 (en) 2018-01-23 2022-03-01 Cirrus Logic, Inc. Speaker identification
US11475899B2 (en) 2018-01-23 2022-10-18 Cirrus Logic, Inc. Speaker identification
CN109300486B (en) * 2018-07-30 2021-06-25 四川大学 Automatic identification method for pharyngeal fricatives in cleft palate speech, enhanced by PICGTFs and SSMC
US10692490B2 (en) 2018-07-31 2020-06-23 Cirrus Logic, Inc. Detection of replay attack
US20210312944A1 (en) * 2018-08-15 2021-10-07 Nippon Telegraph And Telephone Corporation End-of-talk prediction device, end-of-talk prediction method, and non-transitory computer readable recording medium
US10915614B2 (en) 2018-08-31 2021-02-09 Cirrus Logic, Inc. Biometric authentication
US11037574B2 (en) 2018-09-05 2021-06-15 Cirrus Logic, Inc. Speaker recognition and speaker change detection
US11096579B2 (en) 2019-07-03 2021-08-24 Bardy Diagnostics, Inc. System and method for remote ECG data streaming in real-time
US11696681B2 (en) 2019-07-03 2023-07-11 Bardy Diagnostics, Inc. Configurable hardware platform for physiological monitoring of a living body
US11116451B2 (en) 2019-07-03 2021-09-14 Bardy Diagnostics, Inc. Subcutaneous P-wave centric insertable cardiac monitor with energy harvesting capabilities
US11410642B2 (en) * 2019-08-16 2022-08-09 Soundhound, Inc. Method and system using phoneme embedding
US11354920B2 (en) 2019-10-12 2022-06-07 International Business Machines Corporation Updating and implementing a document from an audio proceeding
US11862168B1 (en) * 2020-03-30 2024-01-02 Amazon Technologies, Inc. Speaker disambiguation and transcription from multiple audio feeds
US11373657B2 (en) * 2020-05-01 2022-06-28 Raytheon Applied Signal Technology, Inc. System and method for speaker identification in audio data
US11875791B2 (en) * 2020-05-21 2024-01-16 Orcam Technologies Ltd. Systems and methods for emphasizing a user's name
US11315545B2 (en) 2020-07-09 2022-04-26 Raytheon Applied Signal Technology, Inc. System and method for language identification in audio data
CN113284508B (en) * 2021-07-21 2021-11-09 中国科学院自动化研究所 Generated audio detection system based on hierarchical differentiation

Citations (96)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4879648A (en) * 1986-09-19 1989-11-07 Nancy P. Cochran Search system which continuously displays search terms during scrolling and selections of individually displayed data sets
US4908866A (en) * 1985-02-04 1990-03-13 Eric Goldwasser Speech transcribing system
US5317732A (en) * 1991-04-26 1994-05-31 Commodore Electronics Limited System for relocating a multimedia presentation on a different platform by extracting a resource map in order to remap and relocate resources
US5404295A (en) * 1990-08-16 1995-04-04 Katz; Boris Method and apparatus for utilizing annotations to facilitate computer retrieval of database material
US5418716A (en) * 1990-07-26 1995-05-23 Nec Corporation System for recognizing sentence patterns and a system for recognizing sentence patterns and grammatical cases
US5544257A (en) * 1992-01-08 1996-08-06 International Business Machines Corporation Continuous parameter hidden Markov model approach to automatic handwriting recognition
US5559875A (en) * 1995-07-31 1996-09-24 Latitude Communications Method and apparatus for recording and retrieval of audio conferences
US5572728A (en) * 1993-12-24 1996-11-05 Hitachi, Ltd. Conference multimedia summary support system and method
US5613032A (en) * 1994-09-02 1997-03-18 Bell Communications Research, Inc. System and method for recording, playing back and searching multimedia events wherein video, audio and text can be searched and retrieved
US5614940A (en) * 1994-10-21 1997-03-25 Intel Corporation Method and apparatus for providing broadcast information with indexing
US5684924A (en) * 1995-05-19 1997-11-04 Kurzweil Applied Intelligence, Inc. User adaptable speech recognition system
US5715367A (en) * 1995-01-23 1998-02-03 Dragon Systems, Inc. Apparatuses and methods for developing and using models for speech recognition
US5752021A (en) * 1994-05-24 1998-05-12 Fuji Xerox Co., Ltd. Document database management apparatus capable of conversion between retrieval formulae for different schemata
US5757960A (en) * 1994-09-30 1998-05-26 Murdock; Michael Chase Method and system for extracting features from handwritten text
US5768607A (en) * 1994-09-30 1998-06-16 Intel Corporation Method and apparatus for freehand annotation and drawings incorporating sound and for compressing and synchronizing sound
US5777614A (en) * 1994-10-14 1998-07-07 Hitachi, Ltd. Editing support system including an interactive interface
US5787198A (en) * 1992-11-24 1998-07-28 Lucent Technologies Inc. Text recognition using two-dimensional stochastic models
US5806032A (en) * 1996-06-14 1998-09-08 Lucent Technologies Inc. Compilation of weighted finite-state transducers from decision trees
US5835667A (en) * 1994-10-14 1998-11-10 Carnegie Mellon University Method and apparatus for creating a searchable digital video library and a system and method of using such a library
US5862259A (en) * 1996-03-27 1999-01-19 Caere Corporation Pattern recognition employing arbitrary segmentation and compound probabilistic evaluation
US5875108A (en) * 1991-12-23 1999-02-23 Hoffberg; Steven M. Ergonomic man-machine interface incorporating adaptive pattern recognition based control system
US5960447A (en) * 1995-11-13 1999-09-28 Holt; Douglas Word tagging and editing system for speech recognition
US5963940A (en) * 1995-08-16 1999-10-05 Syracuse University Natural language information retrieval system and method
US5970473A (en) * 1997-12-31 1999-10-19 At&T Corp. Video communication device providing in-home catalog services
US6006221A (en) * 1995-08-16 1999-12-21 Syracuse University Multilingual document retrieval system and method using semantic vector matching
US6006184A (en) * 1997-01-28 1999-12-21 Nec Corporation Tree structured cohort selection for speaker recognition system
US6024571A (en) * 1996-04-25 2000-02-15 Renegar; Janet Elaine Foreign language communication system/device and learning aid
US6029195A (en) * 1994-11-29 2000-02-22 Herz; Frederick S. M. System for customized electronic identification of desirable objects
US6029124A (en) * 1997-02-21 2000-02-22 Dragon Systems, Inc. Sequential, nonparametric speech recognition and speaker identification
US6052657A (en) * 1997-09-09 2000-04-18 Dragon Systems, Inc. Text segmentation and identification of topic using language models
US6064963A (en) * 1997-12-17 2000-05-16 Opus Telecom, L.L.C. Automatic key word or phrase speech recognition for the corrections industry
US6067517A (en) * 1996-02-02 2000-05-23 International Business Machines Corporation Transcription of speech data with segments from acoustically dissimilar environments
US6067514A (en) * 1998-06-23 2000-05-23 International Business Machines Corporation Method for automatically punctuating a speech utterance in a continuous speech recognition system
US6073096A (en) * 1998-02-04 2000-06-06 International Business Machines Corporation Speaker adaptation system and method based on class-specific pre-clustering training speakers
US6076053A (en) * 1998-05-21 2000-06-13 Lucent Technologies Inc. Methods and apparatus for discriminative training and adaptation of pronunciation networks
US6088669A (en) * 1997-01-28 2000-07-11 International Business Machines, Corporation Speech recognition with attempted speaker recognition for speaker model prefetching or alternative speech modeling
US6112172A (en) * 1998-03-31 2000-08-29 Dragon Systems, Inc. Interactive searching
US6151598A (en) * 1995-08-14 2000-11-21 Shaw; Venson M. Digital dictionary with a communication system for the creating, updating, editing, storing, maintaining, referencing, and managing the digital dictionary
US6161087A (en) * 1998-10-05 2000-12-12 Lernout & Hauspie Speech Products N.V. Speech-recognition-assisted selective suppression of silent and filled speech pauses during playback of an audio recording
US6169789B1 (en) * 1996-12-16 2001-01-02 Sanjay K. Rao Intelligent keyboard system
US6185531B1 (en) * 1997-01-09 2001-02-06 Gte Internetworking Incorporated Topic indexing method
US6219640B1 (en) * 1999-08-06 2001-04-17 International Business Machines Corporation Methods and apparatus for audio-visual speaker recognition and utterance verification
US6246983B1 (en) * 1998-08-05 2001-06-12 Matsushita Electric Corporation Of America Text-to-speech e-mail reader with multi-modal reply processor
US6253179B1 (en) * 1999-01-29 2001-06-26 International Business Machines Corporation Method and apparatus for multi-environment speaker verification
US20010026377A1 (en) * 2000-03-21 2001-10-04 Katsumi Ikegami Image display system, image registration terminal device and image reading terminal device used in the image display system
US6308222B1 (en) * 1996-06-03 2001-10-23 Microsoft Corporation Transcoding of audio data
US6317716B1 (en) * 1997-09-19 2001-11-13 Massachusetts Institute Of Technology Automatic cueing of speech
US20010051984A1 (en) * 1996-01-30 2001-12-13 Toshihiko Fukasawa Coordinative work environment construction system, method and medium therefor
US20020001261A1 (en) * 2000-04-21 2002-01-03 Yoshinori Matsui Data playback apparatus
US20020010575A1 (en) * 2000-04-08 2002-01-24 International Business Machines Corporation Method and system for the automatic segmentation of an audio stream into semantic or syntactic units
US20020010916A1 (en) * 2000-05-22 2002-01-24 Compaq Computer Corporation Apparatus and method for controlling rate of playback of audio data
US6345252B1 (en) * 1999-04-09 2002-02-05 International Business Machines Corporation Methods and apparatus for retrieving audio information using content and speaker information
US6347295B1 (en) * 1998-10-26 2002-02-12 Compaq Computer Corporation Computer method and apparatus for grapheme-to-phoneme rule-set-generation
US6360234B2 (en) * 1997-08-14 2002-03-19 Virage, Inc. Video cataloger system with synchronized encoders
US6360237B1 (en) * 1998-10-05 2002-03-19 Lernout & Hauspie Speech Products N.V. Method and system for performing text edits during audio recording playback
US6373985B1 (en) * 1998-08-12 2002-04-16 Lucent Technologies, Inc. E-mail signature block analysis
US20020049589A1 (en) * 2000-06-28 2002-04-25 Poirier Darrell A. Simultaneous multi-user real-time voice recognition system
US6381640B1 (en) * 1998-09-11 2002-04-30 Genesys Telecommunications Laboratories, Inc. Method and apparatus for automated personalization and presentation of workload assignments to agents within a multimedia communication center
US20020059204A1 (en) * 2000-07-28 2002-05-16 Harris Larry R. Distributed search system and method
US6434520B1 (en) * 1999-04-16 2002-08-13 International Business Machines Corporation System and method for indexing and querying audio archives
US6437818B1 (en) * 1993-10-01 2002-08-20 Collaboration Properties, Inc. Video conferencing on existing UTP infrastructure
US20020133477A1 (en) * 2001-03-05 2002-09-19 Glenn Abel Method for profile-based notice and broadcast of multimedia content
US6463444B1 (en) * 1997-08-14 2002-10-08 Virage, Inc. Video cataloger system with extensibility
US6480826B2 (en) * 1999-08-31 2002-11-12 Accenture Llp System and method for a telephonic emotion detection that provides operator feedback
US20030051214A1 (en) * 1997-12-22 2003-03-13 Ricoh Company, Ltd. Techniques for annotating portions of a document relevant to concepts of interest
US20030088414A1 (en) * 2001-05-10 2003-05-08 Chao-Shih Huang Background learning of speaker voices
US20030093580A1 (en) * 2001-11-09 2003-05-15 Koninklijke Philips Electronics N.V. Method and system for information alerts
US6567980B1 (en) * 1997-08-14 2003-05-20 Virage, Inc. Video cataloger system with hyperlinked output
US6571208B1 (en) * 1999-11-29 2003-05-27 Matsushita Electric Industrial Co., Ltd. Context-dependent acoustic models for medium and large vocabulary speech recognition with eigenvoice training
US6602300B2 (en) * 1998-02-03 2003-08-05 Fujitsu Limited Apparatus and method for retrieving data from a document database
US6604110B1 (en) * 2000-08-31 2003-08-05 Ascential Software, Inc. Automated software code generation from a metadata-based repository
US6611803B1 (en) * 1998-12-17 2003-08-26 Matsushita Electric Industrial Co., Ltd. Method and apparatus for retrieving a video and audio scene using an index generated by speech recognition
US20030167163A1 (en) * 2002-02-22 2003-09-04 Nec Research Institute, Inc. Inferring hierarchical descriptions of a set of documents
US6624826B1 (en) * 1999-09-28 2003-09-23 Ricoh Co., Ltd. Method and apparatus for generating visual representations for audio documents
US6647383B1 (en) * 2000-09-01 2003-11-11 Lucent Technologies Inc. System and method for providing interactive dialogue and iterative search functions to find information
US6654735B1 (en) * 1999-01-08 2003-11-25 International Business Machines Corporation Outbound information analysis for generating user interest profiles and improving user productivity
US20040024739A1 (en) * 1999-06-15 2004-02-05 Kanisa Inc. System and method for implementing a knowledge management system
US6708148B2 (en) * 2001-10-12 2004-03-16 Koninklijke Philips Electronics N.V. Correction device to mark parts of a recognized text
US6711541B1 (en) * 1999-09-07 2004-03-23 Matsushita Electric Industrial Co., Ltd. Technique for developing discriminative sound units for speech recognition and allophone modeling
US6714911B2 (en) * 2001-01-25 2004-03-30 Harcourt Assessment, Inc. Speech transcription and analysis system and method
US6718303B2 (en) * 1998-05-13 2004-04-06 International Business Machines Corporation Apparatus and method for automatically generating punctuation marks in continuous speech recognition
US6718305B1 (en) * 1999-03-19 2004-04-06 Koninklijke Philips Electronics N.V. Specifying a tree structure for speech recognizers using correlation between regression classes
US20040073444A1 (en) * 2001-01-16 2004-04-15 Li Li Peh Method and apparatus for a financial database structure
US6732183B1 (en) * 1996-12-31 2004-05-04 Broadware Technologies, Inc. Video and audio streaming for multiple users
US6748356B1 (en) * 2000-06-07 2004-06-08 International Business Machines Corporation Methods and apparatus for identifying unknown speakers using a hierarchical tree structure
US6778958B1 (en) * 1999-08-30 2004-08-17 International Business Machines Corporation Symbol insertion apparatus and method
US6792409B2 (en) * 1999-12-20 2004-09-14 Koninklijke Philips Electronics N.V. Synchronous reproduction in a speech recognition system
US6847961B2 (en) * 1999-06-30 2005-01-25 Silverbrook Research Pty Ltd Method and system for searching information using sensor with identifier
US20050060162A1 (en) * 2000-11-10 2005-03-17 Farhad Mohit Systems and methods for automatic identification and hyperlinking of words or other data items and for information retrieval using hyperlinked words or data items
US6922691B2 (en) * 2000-08-28 2005-07-26 Emotion, Inc. Method and apparatus for digital media management, retrieval, and collaboration
US6931376B2 (en) * 2000-07-20 2005-08-16 Microsoft Corporation Speech-related event notification system
US6961954B1 (en) * 1997-10-27 2005-11-01 The Mitre Corporation Automated segmentation, information extraction, summarization, and presentation of broadcast news
US6999918B2 (en) * 2002-09-20 2006-02-14 Motorola, Inc. Method and apparatus to facilitate correlating symbols to sounds
US20060129541A1 (en) * 2002-06-11 2006-06-15 Microsoft Corporation Dynamically updated quick searches and strategies
US7131117B2 (en) * 2002-09-04 2006-10-31 Sbc Properties, L.P. Method and system for automating the analysis of word frequencies
US7257528B1 (en) * 1998-02-13 2007-08-14 Zi Corporation Of Canada, Inc. Method and apparatus for Chinese character text input

Family Cites Families (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0693221B2 (en) 1985-06-12 1994-11-16 株式会社日立製作所 Voice input device
US4908868A (en) * 1989-02-21 1990-03-13 Mctaggart James E Phase polarity test instrument and method
US6978277B2 (en) 1989-10-26 2005-12-20 Encyclopaedia Britannica, Inc. Multimedia search system
JP2524472B2 (en) * 1992-09-21 1996-08-14 インターナショナル・ビジネス・マシーンズ・コーポレイション Method of training a telephone-line-based speech recognition system
GB2285895A (en) 1994-01-19 1995-07-26 Ibm Audio conferencing system which generates a set of minutes
US5729656A (en) 1994-11-30 1998-03-17 International Business Machines Corporation Reduction of search space in speech recognition using phone boundaries and phone ranking
US5638487A (en) * 1994-12-30 1997-06-10 Purespeech, Inc. Automatic speech recognition
US6026388A (en) * 1995-08-16 2000-02-15 Textwise, Llc User interface and other enhancements for natural language information retrieval system and method
US20020002562A1 (en) 1995-11-03 2002-01-03 Thomas P. Moran Computer controlled display system using a graphical replay device to control playback of temporal data representing collaborative activities
KR100422263B1 (en) * 1996-02-27 2004-07-30 코닌클리케 필립스 일렉트로닉스 엔.브이. Method and apparatus for automatically dividing voice
US5778187A (en) 1996-05-09 1998-07-07 Netcast Communications Corp. Multicasting method and apparatus
US5897614A (en) * 1996-12-20 1999-04-27 International Business Machines Corporation Method and apparatus for sibilant classification in a speech recognition system
JP2001511991A (en) 1997-10-01 2001-08-14 エイ・ティ・アンド・ティ・コーポレーション Method and apparatus for storing and retrieving label interval data for multimedia records
SE511584C2 (en) 1998-01-15 1999-10-25 Ericsson Telefon Ab L M information Routing
US6327343B1 (en) 1998-01-16 2001-12-04 International Business Machines Corporation System and methods for automatic call and data transfer processing
US6243680B1 (en) * 1998-06-15 2001-06-05 Nortel Networks Limited Method and apparatus for obtaining a transcription of phrases through text and spoken utterances
US6332139B1 (en) 1998-11-09 2001-12-18 Mega Chips Corporation Information communication system
EP1166555B1 (en) 1999-03-30 2008-04-23 Tivo, Inc. Data storage management and scheduling system
EP1079313A3 (en) 1999-08-20 2005-10-19 Digitake Software Systems Limited An audio processing system
EP1185976B1 (en) * 2000-02-25 2006-08-16 Philips Electronics N.V. Speech recognition device with reference transformation means
JP2002008389A (en) * 2000-06-20 2002-01-11 Mitsubishi Electric Corp Semiconductor memory
WO2002010887A2 (en) 2000-07-28 2002-02-07 Jan Pathuel Method and system of securing data and systems
AU2000276394A1 (en) 2000-09-30 2002-04-15 Intel Corporation Method and system for generating and searching an optimal maximum likelihood decision tree for hidden markov model (hmm) based speech recognition
AU2000276397A1 (en) 2000-09-30 2002-04-15 Intel Corporation Method and system to scale down a decision tree-based hidden markov model (hmm) for speech recognition
US6934756B2 (en) 2000-11-01 2005-08-23 International Business Machines Corporation Conversational networking via transport, coding and control conversational protocols
US7221663B2 (en) 2001-12-31 2007-05-22 Polycom, Inc. Method and apparatus for wideband conferencing
US6973428B2 (en) 2001-05-24 2005-12-06 International Business Machines Corporation System and method for searching, analyzing and displaying text transcripts of speech after imperfect speech recognition
US6778979B2 (en) * 2001-08-13 2004-08-17 Xerox Corporation System for automatically generating queries
US6748350B2 (en) * 2001-09-27 2004-06-08 Intel Corporation Method to compensate for stress between heat spreader and thermal interface material
US7580838B2 (en) 2002-11-22 2009-08-25 Scansoft, Inc. Automatic insertion of non-verbalized punctuation

Patent Citations (99)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4908866A (en) * 1985-02-04 1990-03-13 Eric Goldwasser Speech transcribing system
US4879648A (en) * 1986-09-19 1989-11-07 Nancy P. Cochran Search system which continuously displays search terms during scrolling and selections of individually displayed data sets
US5418716A (en) * 1990-07-26 1995-05-23 Nec Corporation System for recognizing sentence patterns and a system for recognizing sentence patterns and grammatical cases
US5404295A (en) * 1990-08-16 1995-04-04 Katz; Boris Method and apparatus for utilizing annotations to facilitate computer retrieval of database material
US5317732A (en) * 1991-04-26 1994-05-31 Commodore Electronics Limited System for relocating a multimedia presentation on a different platform by extracting a resource map in order to remap and relocate resources
US5875108A (en) * 1991-12-23 1999-02-23 Hoffberg; Steven M. Ergonomic man-machine interface incorporating adaptive pattern recognition based control system
US5544257A (en) * 1992-01-08 1996-08-06 International Business Machines Corporation Continuous parameter hidden Markov model approach to automatic handwriting recognition
US5787198A (en) * 1992-11-24 1998-07-28 Lucent Technologies Inc. Text recognition using two-dimensional stochastic models
US6437818B1 (en) * 1993-10-01 2002-08-20 Collaboration Properties, Inc. Video conferencing on existing UTP infrastructure
US5572728A (en) * 1993-12-24 1996-11-05 Hitachi, Ltd. Conference multimedia summary support system and method
US5752021A (en) * 1994-05-24 1998-05-12 Fuji Xerox Co., Ltd. Document database management apparatus capable of conversion between retrieval formulae for different schemata
US5613032A (en) * 1994-09-02 1997-03-18 Bell Communications Research, Inc. System and method for recording, playing back and searching multimedia events wherein video, audio and text can be searched and retrieved
US5757960A (en) * 1994-09-30 1998-05-26 Murdock; Michael Chase Method and system for extracting features from handwritten text
US5768607A (en) * 1994-09-30 1998-06-16 Intel Corporation Method and apparatus for freehand annotation and drawings incorporating sound and for compressing and synchronizing sound
US5777614A (en) * 1994-10-14 1998-07-07 Hitachi, Ltd. Editing support system including an interactive interface
US5835667A (en) * 1994-10-14 1998-11-10 Carnegie Mellon University Method and apparatus for creating a searchable digital video library and a system and method of using such a library
US5614940A (en) * 1994-10-21 1997-03-25 Intel Corporation Method and apparatus for providing broadcast information with indexing
US6029195A (en) * 1994-11-29 2000-02-22 Herz; Frederick S. M. System for customized electronic identification of desirable objects
US5715367A (en) * 1995-01-23 1998-02-03 Dragon Systems, Inc. Apparatuses and methods for developing and using models for speech recognition
US5684924A (en) * 1995-05-19 1997-11-04 Kurzweil Applied Intelligence, Inc. User adaptable speech recognition system
US5559875A (en) * 1995-07-31 1996-09-24 Latitude Communications Method and apparatus for recording and retrieval of audio conferences
US6151598A (en) * 1995-08-14 2000-11-21 Shaw; Venson M. Digital dictionary with a communication system for the creating, updating, editing, storing, maintaining, referencing, and managing the digital dictionary
US5963940A (en) * 1995-08-16 1999-10-05 Syracuse University Natural language information retrieval system and method
US6006221A (en) * 1995-08-16 1999-12-21 Syracuse University Multilingual document retrieval system and method using semantic vector matching
US5960447A (en) * 1995-11-13 1999-09-28 Holt; Douglas Word tagging and editing system for speech recognition
US20010051984A1 (en) * 1996-01-30 2001-12-13 Toshihiko Fukasawa Coordinative work environment construction system, method and medium therefor
US6067517A (en) * 1996-02-02 2000-05-23 International Business Machines Corporation Transcription of speech data with segments from acoustically dissimilar environments
US5862259A (en) * 1996-03-27 1999-01-19 Caere Corporation Pattern recognition employing arbitrary segmentation and compound probabilistic evaluation
US6024571A (en) * 1996-04-25 2000-02-15 Renegar; Janet Elaine Foreign language communication system/device and learning aid
US6308222B1 (en) * 1996-06-03 2001-10-23 Microsoft Corporation Transcoding of audio data
US5806032A (en) * 1996-06-14 1998-09-08 Lucent Technologies Inc. Compilation of weighted finite-state transducers from decision trees
US6169789B1 (en) * 1996-12-16 2001-01-02 Sanjay K. Rao Intelligent keyboard system
US6732183B1 (en) * 1996-12-31 2004-05-04 Broadware Technologies, Inc. Video and audio streaming for multiple users
US6185531B1 (en) * 1997-01-09 2001-02-06 Gte Internetworking Incorporated Topic indexing method
US6088669A (en) * 1997-01-28 2000-07-11 International Business Machines Corporation Speech recognition with attempted speaker recognition for speaker model prefetching or alternative speech modeling
US6006184A (en) * 1997-01-28 1999-12-21 Nec Corporation Tree structured cohort selection for speaker recognition system
US6029124A (en) * 1997-02-21 2000-02-22 Dragon Systems, Inc. Sequential, nonparametric speech recognition and speaker identification
US6463444B1 (en) * 1997-08-14 2002-10-08 Virage, Inc. Video cataloger system with extensibility
US6360234B2 (en) * 1997-08-14 2002-03-19 Virage, Inc. Video cataloger system with synchronized encoders
US6567980B1 (en) * 1997-08-14 2003-05-20 Virage, Inc. Video cataloger system with hyperlinked output
US6877134B1 (en) * 1997-08-14 2005-04-05 Virage, Inc. Integrated data and real-time metadata capture system and method
US6052657A (en) * 1997-09-09 2000-04-18 Dragon Systems, Inc. Text segmentation and identification of topic using language models
US6317716B1 (en) * 1997-09-19 2001-11-13 Massachusetts Institute Of Technology Automatic cueing of speech
US6961954B1 (en) * 1997-10-27 2005-11-01 The Mitre Corporation Automated segmentation, information extraction, summarization, and presentation of broadcast news
US6064963A (en) * 1997-12-17 2000-05-16 Opus Telecom, L.L.C. Automatic key word or phrase speech recognition for the corrections industry
US20030051214A1 (en) * 1997-12-22 2003-03-13 Ricoh Company, Ltd. Techniques for annotating portions of a document relevant to concepts of interest
US5970473A (en) * 1997-12-31 1999-10-19 At&T Corp. Video communication device providing in-home catalog services
US6602300B2 (en) * 1998-02-03 2003-08-05 Fujitsu Limited Apparatus and method for retrieving data from a document database
US6073096A (en) * 1998-02-04 2000-06-06 International Business Machines Corporation Speaker adaptation system and method based on class-specific pre-clustering training speakers
US7257528B1 (en) * 1998-02-13 2007-08-14 Zi Corporation Of Canada, Inc. Method and apparatus for Chinese character text input
US6112172A (en) * 1998-03-31 2000-08-29 Dragon Systems, Inc. Interactive searching
US6718303B2 (en) * 1998-05-13 2004-04-06 International Business Machines Corporation Apparatus and method for automatically generating punctuation marks in continuous speech recognition
US6076053A (en) * 1998-05-21 2000-06-13 Lucent Technologies Inc. Methods and apparatus for discriminative training and adaptation of pronunciation networks
US6067514A (en) * 1998-06-23 2000-05-23 International Business Machines Corporation Method for automatically punctuating a speech utterance in a continuous speech recognition system
US6246983B1 (en) * 1998-08-05 2001-06-12 Matsushita Electric Corporation Of America Text-to-speech e-mail reader with multi-modal reply processor
US6373985B1 (en) * 1998-08-12 2002-04-16 Lucent Technologies, Inc. E-mail signature block analysis
US6381640B1 (en) * 1998-09-11 2002-04-30 Genesys Telecommunications Laboratories, Inc. Method and apparatus for automated personalization and presentation of workload assignments to agents within a multimedia communication center
US6161087A (en) * 1998-10-05 2000-12-12 Lernout & Hauspie Speech Products N.V. Speech-recognition-assisted selective suppression of silent and filled speech pauses during playback of an audio recording
US6360237B1 (en) * 1998-10-05 2002-03-19 Lernout & Hauspie Speech Products N.V. Method and system for performing text edits during audio recording playback
US6347295B1 (en) * 1998-10-26 2002-02-12 Compaq Computer Corporation Computer method and apparatus for grapheme-to-phoneme rule-set-generation
US6728673B2 (en) * 1998-12-17 2004-04-27 Matsushita Electric Industrial Co., Ltd Method and apparatus for retrieving a video and audio scene using an index generated by speech recognition
US6611803B1 (en) * 1998-12-17 2003-08-26 Matsushita Electric Industrial Co., Ltd. Method and apparatus for retrieving a video and audio scene using an index generated by speech recognition
US6654735B1 (en) * 1999-01-08 2003-11-25 International Business Machines Corporation Outbound information analysis for generating user interest profiles and improving user productivity
US6253179B1 (en) * 1999-01-29 2001-06-26 International Business Machines Corporation Method and apparatus for multi-environment speaker verification
US6718305B1 (en) * 1999-03-19 2004-04-06 Koninklijke Philips Electronics N.V. Specifying a tree structure for speech recognizers using correlation between regression classes
US6345252B1 (en) * 1999-04-09 2002-02-05 International Business Machines Corporation Methods and apparatus for retrieving audio information using content and speaker information
US6434520B1 (en) * 1999-04-16 2002-08-13 International Business Machines Corporation System and method for indexing and querying audio archives
US20040024739A1 (en) * 1999-06-15 2004-02-05 Kanisa Inc. System and method for implementing a knowledge management system
US6847961B2 (en) * 1999-06-30 2005-01-25 Silverbrook Research Pty Ltd Method and system for searching information using sensor with identifier
US6219640B1 (en) * 1999-08-06 2001-04-17 International Business Machines Corporation Methods and apparatus for audio-visual speaker recognition and utterance verification
US6778958B1 (en) * 1999-08-30 2004-08-17 International Business Machines Corporation Symbol insertion apparatus and method
US6480826B2 (en) * 1999-08-31 2002-11-12 Accenture Llp System and method for a telephonic emotion detection that provides operator feedback
US6711541B1 (en) * 1999-09-07 2004-03-23 Matsushita Electric Industrial Co., Ltd. Technique for developing discriminative sound units for speech recognition and allophone modeling
US6624826B1 (en) * 1999-09-28 2003-09-23 Ricoh Co., Ltd. Method and apparatus for generating visual representations for audio documents
US6571208B1 (en) * 1999-11-29 2003-05-27 Matsushita Electric Industrial Co., Ltd. Context-dependent acoustic models for medium and large vocabulary speech recognition with eigenvoice training
US6792409B2 (en) * 1999-12-20 2004-09-14 Koninklijke Philips Electronics N.V. Synchronous reproduction in a speech recognition system
US20010026377A1 (en) * 2000-03-21 2001-10-04 Katsumi Ikegami Image display system, image registration terminal device and image reading terminal device used in the image display system
US20020010575A1 (en) * 2000-04-08 2002-01-24 International Business Machines Corporation Method and system for the automatic segmentation of an audio stream into semantic or syntactic units
US20020001261A1 (en) * 2000-04-21 2002-01-03 Yoshinori Matsui Data playback apparatus
US20020010916A1 (en) * 2000-05-22 2002-01-24 Compaq Computer Corporation Apparatus and method for controlling rate of playback of audio data
US6748356B1 (en) * 2000-06-07 2004-06-08 International Business Machines Corporation Methods and apparatus for identifying unknown speakers using a hierarchical tree structure
US20020049589A1 (en) * 2000-06-28 2002-04-25 Poirier Darrell A. Simultaneous multi-user real-time voice recognition system
US6931376B2 (en) * 2000-07-20 2005-08-16 Microsoft Corporation Speech-related event notification system
US20020059204A1 (en) * 2000-07-28 2002-05-16 Harris Larry R. Distributed search system and method
US6922691B2 (en) * 2000-08-28 2005-07-26 Emotion, Inc. Method and apparatus for digital media management, retrieval, and collaboration
US6604110B1 (en) * 2000-08-31 2003-08-05 Ascential Software, Inc. Automated software code generation from a metadata-based repository
US6647383B1 (en) * 2000-09-01 2003-11-11 Lucent Technologies Inc. System and method for providing interactive dialogue and iterative search functions to find information
US20050060162A1 (en) * 2000-11-10 2005-03-17 Farhad Mohit Systems and methods for automatic identification and hyperlinking of words or other data items and for information retrieval using hyperlinked words or data items
US20040073444A1 (en) * 2001-01-16 2004-04-15 Li Li Peh Method and apparatus for a financial database structure
US6714911B2 (en) * 2001-01-25 2004-03-30 Harcourt Assessment, Inc. Speech transcription and analysis system and method
US20020133477A1 (en) * 2001-03-05 2002-09-19 Glenn Abel Method for profile-based notice and broadcast of multimedia content
US7171360B2 (en) * 2001-05-10 2007-01-30 Koninklijke Philips Electronics N.V. Background learning of speaker voices
US20030088414A1 (en) * 2001-05-10 2003-05-08 Chao-Shih Huang Background learning of speaker voices
US6708148B2 (en) * 2001-10-12 2004-03-16 Koninklijke Philips Electronics N.V. Correction device to mark parts of a recognized text
US20030093580A1 (en) * 2001-11-09 2003-05-15 Koninklijke Philips Electronics N.V. Method and system for information alerts
US20030167163A1 (en) * 2002-02-22 2003-09-04 Nec Research Institute, Inc. Inferring hierarchical descriptions of a set of documents
US20060129541A1 (en) * 2002-06-11 2006-06-15 Microsoft Corporation Dynamically updated quick searches and strategies
US7131117B2 (en) * 2002-09-04 2006-10-31 Sbc Properties, L.P. Method and system for automating the analysis of word frequencies
US6999918B2 (en) * 2002-09-20 2006-02-14 Motorola, Inc. Method and apparatus to facilitate correlating symbols to sounds

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160323521A1 (en) * 2005-02-28 2016-11-03 Facebook, Inc. Titling apparatus, a titling method, and a machine readable medium storing thereon a computer program for titling
US8560327B2 (en) 2005-08-26 2013-10-15 Nuance Communications, Inc. System and method for synchronizing sound and manually transcribed text
US20080195370A1 (en) * 2005-08-26 2008-08-14 Koninklijke Philips Electronics, N.V. System and Method For Synchronizing Sound and Manually Transcribed Text
US8924216B2 (en) 2005-08-26 2014-12-30 Nuance Communications, Inc. System and method for synchronizing sound and manually transcribed text
US8756057B2 (en) * 2005-11-02 2014-06-17 Nuance Communications, Inc. System and method using feedback speech analysis for improving speaking ability
US9230562B2 (en) 2005-11-02 2016-01-05 Nuance Communications, Inc. System and method using feedback speech analysis for improving speaking ability
US20070100626A1 (en) * 2005-11-02 2007-05-03 International Business Machines Corporation System and method for improving speaking ability
US20120278071A1 (en) * 2011-04-29 2012-11-01 Nexidia Inc. Transcription system
US9774747B2 (en) * 2011-04-29 2017-09-26 Nexidia Inc. Transcription system
US20130030805A1 (en) * 2011-07-26 2013-01-31 Kabushiki Kaisha Toshiba Transcription support system and transcription support method
US20130030806A1 (en) * 2011-07-26 2013-01-31 Kabushiki Kaisha Toshiba Transcription support system and transcription support method
US9489946B2 (en) * 2011-07-26 2016-11-08 Kabushiki Kaisha Toshiba Transcription support system and transcription support method
US10304457B2 (en) * 2011-07-26 2019-05-28 Kabushiki Kaisha Toshiba Transcription support system and transcription support method
US20130080163A1 (en) * 2011-09-26 2013-03-28 Kabushiki Kaisha Toshiba Information processing apparatus, information processing method and computer program product
US20150006174A1 (en) * 2012-02-03 2015-01-01 Sony Corporation Information processing device, information processing method and program
US10339955B2 (en) * 2012-02-03 2019-07-02 Sony Corporation Information processing device and method for displaying subtitle information
US8676590B1 (en) 2012-09-26 2014-03-18 Google Inc. Web-based audio transcription tool
US11404049B2 (en) * 2019-12-09 2022-08-02 Microsoft Technology Licensing, Llc Interactive augmentation and integration of real-time speech-to-text

Also Published As

Publication number Publication date
US20040230432A1 (en) 2004-11-18
US20040083104A1 (en) 2004-04-29
US20040172250A1 (en) 2004-09-02
US7389229B2 (en) 2008-06-17
US7424427B2 (en) 2008-09-09
US7292977B2 (en) 2007-11-06
US20040083090A1 (en) 2004-04-29
US20050038649A1 (en) 2005-02-17
US20040176946A1 (en) 2004-09-09
US20040163034A1 (en) 2004-08-19
US20040204939A1 (en) 2004-10-14

Similar Documents

Publication Publication Date Title
US20040138894A1 (en) Speech transcription tool for efficient speech transcription
US20040006481A1 (en) Fast transcription of speech
KR100378898B1 (en) A pronunciation setting method, an articles of manufacture comprising a computer readable medium and, a graphical user interface system
US6334102B1 (en) Method of adding vocabulary to a speech recognition system
US5970448A (en) Historical database storing relationships of successively spoken words
US6327566B1 (en) Method and apparatus for correcting misinterpreted voice commands in a speech recognition system
US6760700B2 (en) Method and system for proofreading and correcting dictated text
Arons Hyperspeech: Navigating in speech-only hypermedia
US7693717B2 (en) Session file modification with annotation using speech recognition or text to speech
US7006973B1 (en) Providing information in response to spoken requests
US8374875B2 (en) Providing programming information in response to spoken requests
US6801897B2 (en) Method of providing concise forms of natural commands
MacWhinney Tools for analyzing talk part 2: The CLAN program
US20070244700A1 (en) Session File Modification with Selective Replacement of Session File Components
US11132108B2 (en) Dynamic system and method for content and topic based synchronization during presentations
JPH09185879A (en) Recording indexing method
JP2005537532A (en) Comprehensive development tool for building natural language understanding applications
US11049525B2 (en) Transcript-based insertion of secondary video content into primary video content
JP2008514983A (en) Interactive conversational conversations of cognitively overloaded users of devices
US6745165B2 (en) Method and apparatus for recognizing from here to here voice command structures in a finite grammar speech recognition system
KR20030065051A (en) A voice command interpreter with dialogue focus tracking function and method thereof
US8725505B2 (en) Verb error recovery in speech recognition
JP2009042968A (en) Information selection system, information selection method, and program for information selection
MacWhinney The childes project
US6741791B1 (en) Using speech to select a position in a program

Legal Events

Date Code Title Description
AS Assignment

Owner name: BBNT SOLUTIONS LLC, TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KIECZA, DANIEL;KUBALA, FRANCIS;REEL/FRAME:014608/0723;SIGNING DATES FROM 20031002 TO 20031003

AS Assignment

Owner name: BBNT SOLUTIONS LLC, MASSACHUSETTS

Free format text: CORRECTED COVER SHEET TO CORRECT ASSIGNEE ADDRESS, PREVIOUSLY RECORDED AT REEL/FRAME 014608/0723 (ASSIGNMENT OF ASSIGNOR'S INTEREST);ASSIGNORS:KIECZA, DANIEL;KUBALA, FRANCIS G.;REEL/FRAME:015815/0624;SIGNING DATES FROM 20031002 TO 20031003

AS Assignment

Owner name: BBN TECHNOLOGIES CORP., MASSACHUSETTS

Free format text: MERGER;ASSIGNOR:BBNT SOLUTIONS LLC;REEL/FRAME:017274/0318

Effective date: 20060103

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: BBN TECHNOLOGIES CORP. (AS SUCCESSOR BY MERGER TO

Free format text: RELEASE OF SECURITY INTEREST;ASSIGNOR:BANK OF AMERICA, N.A. (SUCCESSOR BY MERGER TO FLEET NATIONAL BANK);REEL/FRAME:023427/0436

Effective date: 20091026