=============================================================
Program: VoIP4Text 1.0
Author: Elmar Weiskopf
Date: 24.10.2006
License info: free to use
Supervised by: Michael Welzl
=============================================================

Readme

1. Introduction to VoIP4Text
2. Hardware and software requirements
3. Download instructions
4. Installation instructions
   4.1 ViaVoice installation
   4.2 VoIP4Text prerequisites
   4.3 Engine registration
   4.4 Uninstallation
5. Using VoIP4Text
   5.1 GUI description
   5.2 Menu bar
   5.3 Engine buttons
   5.4 Status panel
   5.5 Input text field
   5.6 Log window
   5.7 Call area
       5.7.1 Calls
       5.7.2 Call management
       5.7.3 Call panel
6. Tutorial

1. Introduction to VoIP4Text
-------------------------------------------------------------

VoIP4Text is an application that translates human speech into plain text, transmits this text to another participant and transforms it back into speech by voice synthesis. It also provides ordinary VoIP operation and a chat function.

With many VoIP applications available and growing in popularity, one may ask why a program with the characteristics mentioned above should be developed. Consider situations with limited or hardly any bandwidth available, such as disasters or blackouts. The infrastructure of many information systems will collapse due to heavy use of telephone and computer networks. Although more and more modern audio compression algorithms are available today, nothing is more efficient in terms of low data throughput than transmitting pure text: a couple of sentences fit into just one kilobyte. There is of course a significant drop in speech quality on the receiver’s side, and some spoken words may not be recognized correctly at the first attempt, but with careful and systematic use of the recognition facilities it could be a useful tool somewhere, sometime.

VoIP4Text uses FreeTTS as its synthesizing engine and IBM Java-For-Speech as its recognizing engine.
FreeTTS libraries are already included in VoIP4Text, but they can also be downloaded freely from http://sourceforge.net/projects/freetts, as they are open source. IBM Java-For-Speech provides powerful recognition capabilities for the application. Unfortunately, the Java-For-Speech project has been shut down, so the IBM Java classes are no longer available for download; they have been included in VoIP4Text instead.

2. Hardware and software requirements
-------------------------------------------------------------

Hardware:
* Standard PC
* Soundcard (for speech input/output)
* Microphone (analog, USB)

Software:
* Windows (XP, 2000, 98, NT)
* Java Runtime Environment 1.5.0
  www.java.com
* Java Media Framework 2.2.1e
  http://java.sun.com/products/java-media/jmf/
* IBM ViaVoice (possibly older versions)
  http://www.nuance.de/viavoice

3. Download instructions
-------------------------------------------------------------

Please download VoIP4Text from heim.ifi.uio.no/michawe. This location also provides additional information on the project. Uncompress the downloaded zip file to a destination of your choice.

4. Installation instructions
-------------------------------------------------------------

4.1 ViaVoice installation

As said before, ViaVoice has to be installed; just follow the installation wizard. In the final step, the wizard will ask you to train your voice by speaking a number of sentences, so that the recognition engine can adapt to your voice and speaking style. The ViaVoice options offer a profile saving feature, which creates a zip file of approximately 2 megabytes. So, if you have a saved profile and use ViaVoice on another computer, you only have to import your profile. After that you should be able to use the ViaVoice recognition engine.

4.2 VoIP4Text prerequisites

Please note that the latest Java Runtime Environment (JRE) is required; otherwise the application won't start. Furthermore, Java Media Framework (JMF) is required to enable transmission of speech via RTP.
Download JRE and JMF and install them. Both install routines should alter the CLASSPATH and PATH settings appropriately.

4.3 Engine registration

To enable JSAPI with FreeTTS, enter the ‘lib/freetts’ directory of the VoIP4Text folder and execute ‘jsapi.exe’. Click on ‘I Agree’ and press ‘Close’ to complete this part.

Set the Windows environment variables by double-clicking the ‘System’ icon in your control panel. Choose the ‘Advanced’ tab and then click on the button ‘Environment variables’. In the ‘System variables’ section create or edit the variable ‘PATH’ and append the following string:

    %VoIP4Text%\lib\ibmjs;

where %VoIP4Text% is the VoIP4Text installation folder. For example, if you installed VoIP4Text in ‘C:\Program Files\VoIP4Text’, the PATH entry would be

    C:\Program Files\VoIP4Text\lib\ibmjs;

After completing these steps, restart your system to apply the changes.

Finally, enter the ‘lib’ directory of the installed JRE and open the ‘speech.properties’ file with a text editor. Append the following 2 lines (perhaps to already existing text) to register both engines:

    FreeTTSSynthEngineCentral=com.sun.speech.freetts.jsapi.FreeTTSEngineCentral
    com.ibm.speech.recognition.EngineCentral=com.ibm.speech.recognition.IBMEngineCentral

4.4 Uninstallation

If you don't want the application on your PC anymore, remove the VoIP4Text folder. Additionally, remove the PATH entries and the 2 added lines in the ‘speech.properties’ file. This doesn’t uninstall the IBM ViaVoice application; you also have to remove it for a completely clean system.

5. Using VoIP4Text
-------------------------------------------------------------

5.1 GUI description

Start VoIP4Text by double-clicking VoIP4Text.jar or by typing

    java -jar VoIP4Text.jar

in a console window.
This brings up the main window of VoIP4Text, which consists of the following parts:

* Menu bar
* Buttons to switch engines/speech transmission on/off
* Status panel with information on the current call/engines
* Text field to send text messages
* Log window to display call and system messages
* Call area with all existing calls

5.2 Menu bar

This section describes all menu items.

File
====

Restart Listeners (Ctrl-G): The UDP and TCP listeners can be restarted by selecting the respective items; press ‘OK’ to confirm.

Exit VoIP4Text (Alt-F4): Exits the application and ends all running threads (all calls are terminated on both sides).

Call
====

Start (Ctrl-T): Opens a wizard for starting a new call. You can click ‘Next’ or ‘Back’ to get to the next/previous page. On the first page, enter the IP-address and select the transmission mode (UDP or TCP). In TCP mode you can additionally enable Nagle’s algorithm, which has positive effects on the efficiency of data transmission. On the next wizard page, enter a unique ID for the call. A call is identified by this ID externally and internally; in most cases the ID will consist of the call partner’s name. ‘Output messages to synthesizer’ and ‘Display log messages’ enable speech output and log window output, respectively. Both items are enabled by default, but you can switch them off later in the popup menu of the call button. You can also select a color to represent the call; if you don't, a default color will be used. On the last page you can enter a message, which is displayed on your partner’s side when he clicks on the incoming call. Press ‘Finish’ to quit the wizard and start a call with the entered properties.

Hang Up Call(s) (Ctrl-U): Shows a list of calls that can be hung up. Clicking ‘Select All’ checks all calls. Finally, press ‘OK’ to hang up the selected calls. Hung up calls are still shown in the call panel, so their log messages can still be saved.
Remove Call(s) (Ctrl-E): Shows a list where you can select hung up calls to remove them permanently. Press ‘OK’ to confirm your choice.

Switch Call (Ctrl-W): Shows a list of inactive calls, where you can select one to switch to. The currently running call will be forced to wait.

Save Call(s) (Ctrl-S): Shows a list of all existing calls. You can save the log messages of selected calls to a text file (one file per call) with date, time, speaker and spoken message. For this purpose, a file chooser window will appear for each selected call.

Options
=======

Preferences (Ctrl-P): The following ports can be changed within this dialog window:

* UDP/TCP port
* RTP local port receiver/remote port sender

IMPORTANT: The application cannot use a single port for both receiving and sending data, so you have to adjust the RTP local/remote port settings to those of your call partner. Your call partner’s local receiver port must be equal to your remote sender port and vice versa. Otherwise RTP audio transmission will not work.

You can also switch on/off the possibility to use the recognizer without a running call by selecting the corresponding checkbox. The ‘RTP timeout’ value determines the time (in seconds) the application will wait for incoming RTP streams when you start RTP speech transmission. Enter the name under which you wish to appear in the log window messages in the appropriate box (‘User name’). Press ‘Apply’ or ‘OK’ to save the settings.

Help
====

Help (Ctrl-H): Shows this readme file.

About (Ctrl-B): Shows some information about VoIP4Text.

5.3 Engine buttons

There are 3 big buttons with icons in the upper left area of the main window. Every button switches an engine on or off. The current status of an engine can be seen from the state of its button: if an engine is running, the button appears pushed and the icon on the button is colored; when the engine is deactivated, the button returns to its initial look. Messages in the log window will also inform you about the current engine states.
The engines can be turned on/off independently (except for some combinations, e.g. RTP speech transmission and a recognizer cannot run at the same time). Therefore, you can also use the program if your PC isn’t equipped with a soundcard or a microphone, as you can type in text messages with your keyboard.

The first click on the synthesizer button will start the engine; following clicks will only pause/resume it. This is much faster than starting the engine from scratch each time. The same applies to the recognizing engine. Additionally, a graphical view of the microphone volume is shown below the buttons when the recognizer is activated. This can be helpful if you have problems with the microphone. These 2 engines can be enabled without starting a new call.

The RTP speech transmission can only be activated if there is a running call. When clicking on the ‘Speech Transmission’ button you can choose between different audio formats to use for this session, each with a different data rate. You can adapt your choice to your available bandwidth. The call partners can also use 2 different audio formats, since their choices are independent of each other. Press ‘OK’ to confirm your choice.

5.4 Status panel

The status panel includes information on the currently running call (ID, IP-address, port, etc.). Furthermore, several adjustments can be made for all relevant engines (synthesizer, recognizer and speech transmission). Synthesizer and recognizer properties can only be changed if the respective engine is running. The ‘Speech Transmission’ tab, however, can only be accessed before starting that mode. The tabs ‘Synthesizer’, ‘Recognizer’ and ‘Speech Transmission’ will be enabled/disabled according to the current states of the engines, whereas the tab ‘General’ is always accessible.

For every synthesizer and recognizer adjustment there is also a default button, which resets the corresponding value to its default. You can adjust the values by dragging the sliders.
Synthesizer properties:

Volume: adjusts the speech output volume; ranges from 0 (muted) to 10 (maximum); default is 10
Words/min: adjusts the speed at which words and sentences are spoken; ranges from 0 to 400; default is 200
Pitch: adjusts the pitch of the voice; ranges from 50 (very low) to 200 (very high); default is 100
Range of voice: adjusts the pitch range of the voice; ranges from 0 (very monotonous) to 50 (excessively lively); default is 10

Recognizer properties:

The following properties aren’t implemented (yet) in the IBM engine, so dragging the sliders doesn’t have any effect.

Complete timeout: determines the time of silence (in seconds) before the engine stops the current recognition process and publishes the result; ranges from 0 to 10 seconds; default is 1
Sensitivity: a higher value makes the recognizer more sensitive to (background) noise; a lower value requires a louder speaking voice, but most of the background noise will be ignored; ranges from 0 to 10; default is 5
Speed/Accuracy: sets the ratio between fast and accurate recognition; decreasing the value minimizes recognition time, increasing it leads to an improved, but very slow recognition process; ranges from 0 to 10; default is 5

Speech Transmission properties:

Audio capture engine: In Windows you can select between the DirectSound capture and the JavaSound capture engine. Both use the same audio compressions for transmission.

5.5 Input text field

The long input text field provides a way to send text directly to the partner of the currently running call, which gives the program chat capabilities. Type the text into the text field and press ‘Go’ to send it.

5.6 Log window

The log window captures all incoming and outgoing messages of all calls (except for the manually excluded ones). System messages (synthesizer on/off, ...) are shown as well. All call messages have the same format: Date->Time->Speaker->Message.
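
The message format above can be illustrated with a small Java sketch. It only demonstrates the Date->Time->Speaker->Message layout; it is not taken from the VoIP4Text sources. The class name LogLineSketch, the methods format and parse, and the date and time patterns are assumptions made for illustration (the readme does not state how date and time are written), and the sketch uses java.time, which requires a newer Java version than the JRE 1.5.0 listed in section 2.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class LogLineSketch {

    // Assumed patterns; the readme does not specify the date/time layout.
    private static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("dd.MM.yyyy");
    private static final DateTimeFormatter TIME = DateTimeFormatter.ofPattern("HH:mm:ss");

    // Builds one log line in the form Date->Time->Speaker->Message.
    static String format(LocalDateTime when, String speaker, String message) {
        return String.join("->", DATE.format(when), TIME.format(when), speaker, message);
    }

    // Splits a log line back into its 4 fields. The limit of 4 keeps
    // any "->" that occurs inside the message text itself.
    static String[] parse(String line) {
        return line.split("->", 4);
    }

    public static void main(String[] args) {
        LocalDateTime when = LocalDateTime.of(2006, 10, 24, 14, 30, 5);
        String line = format(when, "Elmar", "hello there");
        System.out.println(line);
        for (String field : parse(line)) {
            System.out.println(field);
        }
    }
}
```

Splitting with a limit of 4 is a design choice of this sketch: it lets a saved log be read back even if a spoken or typed message happens to contain the separator.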
Below the log window you have 3 buttons. Among them, ‘Save active call’ saves the currently running call to a text file and ‘Save log window’ saves all messages to a text file.

5.7 Call area

The call panel is the area at the bottom of the main window where all existing calls are shown.

5.7.1 Calls

A call is displayed as a button with a telephone icon in a certain color. Next to the telephone there is the unique ID of the call. You can manage several calls at once. A call is always in one of the following states:

RUNNING: When a call is in the RUNNING state, it is active and all inputs (speech and text) are sent to the IP-address of the call partner. Only one call can be active at a time. If you set another call into the RUNNING state, the previously active call will be transferred into the WAITING state. A running call has a green telephone icon within the button.

WAITING: When a call is in the WAITING state, it is temporarily suspended (no input will be sent to the counterpart), but incoming messages will still be received. There is no limit on the number of waiting calls. Calls in this state are visualized by a blue telephone icon.

INITIALIZED: When you start a call and your call partner hasn’t responded to it yet, it is in the INITIALIZED state. You cannot send text or speech to the call partner until he accepts your call. INITIALIZED calls have a light yellow telephone icon.

RECEIVED: An incoming call is in the RECEIVED state until you accept that call (set it to the RUNNING or WAITING state). If a new call has arrived, you will hear a telephone ring and the call button will flash. Received calls are also indicated by a light yellow telephone icon.

CANCELLED: A cancelled call is a hung up call, so it can’t take input anymore. However, as long as such calls aren’t removed permanently, you are still able to save their log messages. These calls have a gray telephone icon.
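
The call states described in 5.7.1 can be summarized in a short Java sketch. It is a simplified model under stated assumptions, not the actual VoIP4Text implementation: the names CallStateSketch, Call, CallList, activate and hangUp are hypothetical, and the only rules modelled are the ones given above (one icon color per state, at most one RUNNING call, hung up calls take no further input).

```java
import java.util.ArrayList;
import java.util.List;

public class CallStateSketch {

    // The five states from 5.7.1 with their telephone icon colors.
    enum State {
        RUNNING("green"),
        WAITING("blue"),
        INITIALIZED("light yellow"),
        RECEIVED("light yellow"),
        CANCELLED("gray");

        final String iconColor;

        State(String iconColor) {
            this.iconColor = iconColor;
        }
    }

    // Hypothetical call record: a unique ID plus the current state.
    static class Call {
        final String id;
        State state;

        Call(String id, State state) {
            this.id = id;
            this.state = state;
        }
    }

    // Hypothetical container that enforces "only one RUNNING call".
    static class CallList {
        private final List<Call> calls = new ArrayList<>();

        void add(Call call) {
            calls.add(call);
        }

        // Makes the target call active; a previously running call
        // is moved to WAITING, as described for the RUNNING state.
        void activate(Call target) {
            if (target.state == State.CANCELLED) {
                throw new IllegalStateException("call " + target.id + " was hung up");
            }
            for (Call c : calls) {
                if (c != target && c.state == State.RUNNING) {
                    c.state = State.WAITING;
                }
            }
            target.state = State.RUNNING;
        }

        // A hung up call stays in the list until it is removed.
        void hangUp(Call call) {
            call.state = State.CANCELLED;
        }
    }

    public static void main(String[] args) {
        CallList list = new CallList();
        Call alice = new Call("Alice", State.RECEIVED);
        Call bob = new Call("Bob", State.RECEIVED);
        list.add(alice);
        list.add(bob);
        list.activate(alice);
        list.activate(bob); // Alice is forced to wait
        System.out.println(alice.id + ": " + alice.state + " (" + alice.state.iconColor + ")");
        System.out.println(bob.id + ": " + bob.state + " (" + bob.state.iconColor + ")");
    }
}
```

The sketch leaves out the acceptance handshake between the two sides; in the application an INITIALIZED call only becomes active once the call partner has accepted it.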
5.7.2 Call management

As said before, only one call can be in the RUNNING state at a time; the other states don’t have such a restriction. If there is no active call, some options are grayed out (Speech transmission button, text input) and a running recognizer puts the recognized words only in the log window. If your call partner switches from your call to another call, you won’t notice it. You can still send messages, but your counterpart only receives them in the log window. If a participant quits the call (or closes the application), the call will be hung up immediately on both sides.

5.7.3 Call panel

Call buttons are grouped in the call panel and represent calls. You can click on such a button regardless of the current state of the call. If you click on a call button whose call is in the RECEIVED state, a dialog window will pop up, showing you a welcome message, a text field to enter the ID for this call and some buttons to change the state (activate, set waiting, hang up) and the main color of the call. In all other call states, a popup menu will be shown with some call information and several menu items. Which menu items are active depends on the current state of the call. You can activate a call, set it to the WAITING state, hang it up and remove it permanently. In addition, you can save the call, change its color and decide whether log messages of this call are shown and whether speech output from the synthesizer is enabled.

6. Tutorial
-------------------------------------------------------------

The following short tutorial is meant to make it easier for you to use VoIP4Text.

After downloading and installing all required components according to sections 3 and 4 of this readme file, start VoIP4Text and the main window appears. Adapt the RTP ports to your call partner, according to the ‘Preferences’ entry in section 5.2. Attach your microphone to an empty USB-port or to your soundcard.
In the Windows mixer window select the microphone line to capture audio.

Starting a new call:

Select the item ‘Start’ from the menu ‘Call’ or press Ctrl-T, and a small wizard appears. Enter the IP-address of your call partner in the first box and select a transmission mode from the next combobox. UDP data transfer is the default mode for a call, so just leave it. Press ‘Next’ to get to the next page, where you have to enter a unique ID for the call. Leave the checkboxes selected and choose a call color by clicking the ‘Color’ button. After you hit ‘Next’, enter a welcome message, which will be displayed on your call partner’s side when he clicks on the incoming call. Press ‘Finish’. If any of your inputs was faulty, you will be informed of that; otherwise a new button with a light yellow telephone icon will appear at the bottom section of the main screen. Now you have to wait until your call partner accepts the call. If he doesn’t, you can quit the call by clicking on the button and selecting ‘Hang up’; otherwise the light yellow telephone icon will turn into a green one and the call will be activated immediately (forcing the currently active call to wait).

Accepting an incoming call:

If someone starts a call to your IP-address, you will get a flashing button in the call panel. Click on the button to show a dialog window. If you don’t want to accept the call, press ‘Hang up’. Otherwise type in a welcome message and click on ‘Activate’ to accept the call and make it active, or click on ‘Set Waiting’ to suspend it (e.g. if you have an important call currently running).

Enabling engines:

Simply click on the big ‘Synthesizer’ button to enable speech output; do the same with the button ‘Recognizer’ to activate speech recognition. System messages in the log window will inform you on success. To hold a traditional telephone conversation, press the button ‘Speech Transmission’. This will open a dialog window where you can set an audio codec. Select one and press ‘OK’.
These steps also have to be done on your call partner’s side at the same time. The application will wait a certain amount of time (adjustable in the preferences window) for the connection to be set up. The log window will inform you when the connection has been established. To disable engines, press the buttons again.

Sending messages:

With an active call and enabled engines you should be able to send and receive messages, get the incoming text synthesized and have your speech recognized. You can also type messages directly into the text field labeled ‘Input directly’.

Exiting VoIP4Text:

To exit the application simply press Alt-F4, press the x in the upper right window corner or select ‘Exit VoIP4Text’ from the ‘File’ menu. You don’t have to quit all calls manually; they will be closed instantly.