SimpleDS

A Simple Deep Reinforcement Learning Dialogue System

DESCRIPTION

SimpleDS is a simple dialogue system trained with deep reinforcement learning. In contrast to other dialogue systems, this system selects dialogue actions directly from raw (noisy) text of the last system and user responses. The motivation is to train dialogue agents with as little human intervention as possible.

This system runs under a client-server architecture, where the learning agent (in JavaScript) acts as the "server" and the environment (in Java) acts as the "client". They communicate by exchanging messages: the server tells the client which action to execute, and the client tells the server the available actions, the environment state and the observed reward. SimpleDS is based on ConvNetJS, which implements the algorithm "Deep Q-Learning with experience replay" (Mnih et al., 2013). SimpleDS is a dialogue system on top of ConvNetJS with support for multi-threaded and client-server processing, and fast learning via constrained search spaces.
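The message exchange described above can be sketched as follows. This is only an illustration: the JSON field names (state, actions, reward, action, lastReward) and the helper names are assumptions, not the actual SimpleDS wire format. It shows one agent turn in which the action is chosen only among the actions the environment reports as available, which is one way to read "constrained search spaces".

```javascript
// Sketch of one agent ("server") turn. The message fields used here are
// hypothetical; the real SimpleDS message format may differ.

// Pick the available action with the highest Q-value (a constrained argmax).
function selectAction(qValues, availableActions) {
  if (availableActions.length === 0) {
    throw new Error('no actions available');
  }
  let best = availableActions[0];
  for (const action of availableActions) {
    if (qValues[action] > qValues[best]) {
      best = action;
    }
  }
  return best;
}

// Handle one message from the environment ("client") and build the reply.
// estimateQ stands in for the neural network: state -> array of Q-values.
function handleClientMessage(rawMessage, estimateQ) {
  const { state, actions, reward } = JSON.parse(rawMessage);
  const qValues = estimateQ(state);
  const action = selectAction(qValues, actions);
  return JSON.stringify({ action: action, lastReward: reward });
}

// Usage with a dummy Q-function that ignores the state.
const reply = handleClientMessage(
  JSON.stringify({ state: [0, 1, 0], actions: [1, 3], reward: -0.1 }),
  () => [0.9, 0.2, 0.8, 0.5]
);
console.log(reply);
```

In the usage example, action 0 has the highest Q-value overall but is not in the list of available actions, so it is never considered.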

This system has been tested with simulated and real dialogues using the Google Speech Recogniser. It has also been tested in three different languages: English, German and Spanish. SimpleDS is for experimental purposes, represents work in progress, and is therefore released without any guarantees.

SOFTWARE

This system was implemented and tested under Linux and Mac OS X with the following software -- though it should run on other operating systems with minor modifications.

  • Ubuntu 14.10.4 / Mac OS X 10.10
  • Java 1.8.0 or higher
  • Ant 1.9.3 or higher
  • Node 0.10.25 or higher
  • Octave 3.8.0 or higher
  • Android 4.4.3 (optional)

DOWNLOAD

You can download the system directly from the command line:

git clone https://github.com/cuayahuitl/SimpleDS.git

You can also download the system as a zip file using the following URL, and then unzip it in your path of preference. https://github.com/cuayahuitl/SimpleDS/archive/master.zip

COMPILATION

cd YourPath/SimpleDS

ant

EXECUTION

cd YourPath/SimpleDS

scripts/run.sh train

[From the command line, press Ctrl+C for termination]

or

cd YourPath/SimpleDS

scripts/run.sh test

[From the command line, press Ctrl+C for termination]

Alternatively (recommended), you can run the system from two terminals:

Terminal1:YourPath/SimpleDS>ant SimpleDS

Terminal2:YourPath/SimpleDS/web/main>nodejs runclient.js (train|test) [num_dialogues] [-v|-nv]

For practical reasons, you can specify the number of dialogues and the verbose mode from the command line. The values of these parameters override the values specified in the file config.txt.
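The override described above can be sketched as follows. This is a minimal sketch, not the code in runclient.js: it assumes config.txt holds one key=value pair per line, and the function names and the Mode key are hypothetical. Only the keys Dialogues and Verbose and the argument order come from this document.

```javascript
// Sketch: read key=value settings, then let command-line arguments override them.
// Assumed format: one key=value pair per line; blank or malformed lines are skipped.
function parseConfig(text) {
  const config = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    const eq = trimmed.indexOf('=');
    if (trimmed === '' || eq < 0) continue;
    // Split on the first '=' only, so values may themselves contain '='.
    config[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return config;
}

// args mirrors the documented usage: (train|test) [num_dialogues] [-v|-nv]
function applyCliOverrides(config, args) {
  // Mode is a hypothetical key used here to carry train/test.
  const result = Object.assign({}, config, { Mode: args[0] });
  for (const arg of args.slice(1)) {
    if (arg === '-v') result.Verbose = 'true';
    else if (arg === '-nv') result.Verbose = 'false';
    else if (/^\d+$/.test(arg)) result.Dialogues = arg;
  }
  return result;
}

// Usage: config.txt asks for 2000 quiet dialogues, the command line for 1 verbose one.
const fromFile = parseConfig('Dialogues=2000\nVerbose=false\n');
console.log(applyCliOverrides(fromFile, ['test', '1', '-v']));
```

Settings that are not given on the command line keep their values from the file.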

The outputs of the training phase consist of the learnt interaction policy (a json file under the folder 'results/language') and logged performance metrics (a txt file under the same folder). Depending on the config file, the metrics produce multiple rows with the following information: number of dialogues, average reward, epsilon value, number of actions per state, number of dialogues, and execution time (in hours). The outputs of the test phase are similar except that no learnt policy is generated. In addition, executing the system in verbose mode prints out the training/test dialogues -- according to the specified parameters.

PLOTTING

You can visualise a learning curve of the SimpleDS agent according to number of learning steps in the x-axis and average reward + learning time in the y-axis. Learning curves can be generated for newly trained or pre-trained policies in the currently supported languages (English, German and Spanish).

cd YourPath/SimpleDS

octave scripts/plotdata.m results/english/simpleds-output.txt

[From the command line, press the space bar key for termination]

or

cd YourPath/SimpleDS

octave scripts/plotdata.m results/english/simpleds-output.txt results/english/simpleds-output.png

[From the command line, press the space bar key for termination]

The latter generates an image of the plot in png (Portable Network Graphics) format. The file plotdata.m can also be used from Matlab if that software is preferred. The following learning curves (available from YourPath/results//.png) can be obtained with the default parameters for the supported languages: English, German and Spanish.

CONFIGURATION

The config file "YourPath/SimpleDialogueSystem/config.txt" has the following parameters:

Dialogues=Number of dialogues for training/test (positive integer)

Verbose=Shows compressed information or detailed info (false or true)

Language=Defines the (spoken) language to use (english, german, spanish)

SysResponses=Path and file name of system responses (e.g. resources/SysResponses.txt)

UsrResponses=Path and file name of user responses (e.g. resources/UsrResponses.txt)

SlotValues=Slot-value pairs of the system (e.g. resources/SlotValues.txt)

DemonstrationsPath=Path to the demonstration dialogues (e.g. data/)

DemonstrationsFile=Pointer to training instances from demonstrations (models/demonstrations.arff)

MinimumProbability=Minimum probability (>=0) for probable actions considered for action-selection

SlotsToConfirm=Number of slots to confirm (positive integer, e.g. 3)

OutputPath=The directory where the output files (policy and metrics) will be stored

NoiseLevel=Scores under this level (<=0.2) would receive distortion to model noisy recognition

AddressPort=Address and port of the client socket (e.g. ws://localhost:8082/simpleds)

SavingFrequency=This number defines the frequency for policy/output saving (positive integer)

NumInputs=This number defines the number of input nodes of the neural net (positive integer)

NumActions=This number defines the number of actions of the agent (positive integer)

LearningSteps=This number defines the number of time steps during learning (positive integer)

ExperienceSize=This number defines the size of the experience replay memory (positive integer)

BurningSteps=This number defines the time steps with random action selection (positive integer)

DiscountFactor=This number defines the gamma parameter, also known as the discount factor (real number)

MinimumEpsilon=This number defines the minimum epsilon during learning (real number)

BatchSize=This number defines the batch size (positive integer, e.g. 32 or 64)

AndroidSupport=This variable is used to test dialogues with a real speech recogniser (true or false)

SocketServerPort=This number defines the socket port used for communication with Android (positive integer)
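To make the learning parameters above more concrete, the following sketch shows how BurningSteps, LearningSteps, MinimumEpsilon, DiscountFactor, ExperienceSize and BatchSize could interact in a generic Deep Q-Learning agent. It is not the SimpleDS or ConvNetJS code: the linear epsilon decay, the function names and the class name are assumptions made for illustration.

```javascript
// Generic Deep Q-Learning helpers, written as a sketch under stated assumptions.

// Exploration rate: fully random during the burn-in steps, then an assumed
// linear decay down to MinimumEpsilon, reached at LearningSteps.
// Assumes LearningSteps > BurningSteps.
function epsilonAt(step, cfg) {
  if (step < cfg.BurningSteps) return 1.0;
  const progress = (step - cfg.BurningSteps) / (cfg.LearningSteps - cfg.BurningSteps);
  return Math.min(1.0, Math.max(cfg.MinimumEpsilon, 1.0 - progress));
}

// Q-learning target: reward plus the discounted best next-state value.
function qTarget(reward, nextQValues, discountFactor) {
  return reward + discountFactor * Math.max(...nextQValues);
}

// Fixed-capacity experience replay memory (capacity plays the role of ExperienceSize).
class ReplayMemory {
  constructor(capacity) {
    this.capacity = capacity;
    this.items = [];
    this.next = 0; // slot that is overwritten once the memory is full
  }

  // Store an experience, replacing the oldest one when at capacity.
  add(experience) {
    if (this.items.length < this.capacity) {
      this.items.push(experience);
    } else {
      this.items[this.next] = experience;
    }
    this.next = (this.next + 1) % this.capacity;
  }

  // Draw batchSize experiences uniformly at random (with replacement).
  sample(batchSize) {
    const batch = [];
    for (let i = 0; i < batchSize; i++) {
      batch.push(this.items[Math.floor(Math.random() * this.items.length)]);
    }
    return batch;
  }
}

// Usage with illustrative values (not the SimpleDS defaults).
const cfg = { BurningSteps: 100, LearningSteps: 1100, MinimumEpsilon: 0.1 };
console.log(epsilonAt(50, cfg), epsilonAt(600, cfg), epsilonAt(5000, cfg));
```

A larger ExperienceSize keeps older experiences available for sampling for longer, while BatchSize sets how many of them are drawn for each update.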

You may want to set Verbose=false during training and Verbose=true during tests. You may also want to set a high number of dialogues during training (e.g. Dialogues=2000) and a low one during tests (e.g. Dialogues=1). You may want to change the system/user responses if you want different verbalisations. If this is the case, then you will also want to update the demonstration dialogues in the folder YourPath/SimpleDS/data/.

REFERENCES

SimpleDS has been applied to spoken dialogue systems and interactive games. See the following references for further information.

See "How to apply SimpleDS to interactive systems" if you would like to use SimpleDS in your own system.

COMMENTS/QUESTIONS/COLLABORATIONS?

Contact: Heriberto Cuayahuitl

Email: h.cuayahuitl@gmail.com


