Chapter 7. JanusGraph Server

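JanusGraph uses Gremlin Server as its server component to process and answer client queries. When packaged with JanusGraph, Gremlin Server is called JanusGraph Server.

JanusGraph Server must be started manually before it can be used. It provides a way to remotely execute Gremlin scripts against one or more JanusGraph instances hosted within it. This chapter describes how to use the WebSocket configuration and how to configure JanusGraph Server to handle HTTP endpoint interactions.

7.1 Getting started

7.1.1 Using the pre-packaged distribution

The JanusGraph release comes pre-configured to run JanusGraph Server out of the box, using a sample Cassandra and Elasticsearch configuration so that users can get started quickly. By default, this configuration lets client applications connect to the server over WebSocket with a custom subprotocol. Clients that support the subprotocol exist for several languages. The most familiar client that uses the WebSocket interface is the Gremlin Console. This quick-start bundle is not meant to represent a production installation; it provides a way to develop against the server, run tests, and see how the components fit together. To use the default configuration:

  • Download janusgraph-$VERSION.zip.
  • Unzip it into a directory.
  • Run bin/janusgraph.sh start, which starts Gremlin Server with Cassandra and Elasticsearch forked into separate processes. Note: for security reasons, Elasticsearch must be started as a non-root user.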
  • $ bin/janusgraph.sh start
    Forking Cassandra...
    Running `nodetool statusthrift`.. OK (returned exit status 0 and printed string "running").
    Forking Elasticsearch...
    Connecting to Elasticsearch (127.0.0.1:9300)... OK (connected to 127.0.0.1:9300).
    Forking Gremlin-Server...
    Connecting to Gremlin-Server (127.0.0.1:8182)... OK (connected to 127.0.0.1:8182).
    Run gremlin.sh to connect.
    
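    7.1.1.1 Connecting to Gremlin Server

    Once started, JanusGraph Server listens for WebSocket connections. The easiest way to test the connection is with the Gremlin Console.
    Start the console with bin/gremlin.sh, connect with :remote, and submit Gremlin to the server with :>.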

    $  bin/gremlin.sh
             \,,,/
             (o o)
    -----oOOo-(3)-oOOo-----
    plugin activated: tinkerpop.server
    plugin activated: tinkerpop.hadoop
    plugin activated: tinkerpop.utilities
    plugin activated: janusgraph.imports
    plugin activated: tinkerpop.tinkergraph
    gremlin> :remote connect tinkerpop.server conf/remote.yaml
    ==>Connected - localhost/127.0.0.1:8182
    gremlin> :> graph.addVertex("name", "stephen")
    ==>v[256]
    gremlin> :> g.V().values('name')
    ==>stephen
    

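    The :remote command tells the console to configure a remote connection to Gremlin Server using the conf/remote.yaml file. That file points to a server running on localhost.
    The :> command is the "submit" command; it sends the Gremlin on that line to the currently active remote. By default, remote connections are sessionless, meaning that each line sent from the console is interpreted as a separate request. Several statements can be sent on one line by separating them with a semicolon. Alternatively, you can open a session by specifying session when creating the connection. A session lets you reuse variables across multiple lines of input.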

    gremlin> :remote connect tinkerpop.server conf/remote.yaml
    ==>Configured localhost/127.0.0.1:8182
    gremlin> graph
    ==>standardjanusgraph[cql:[127.0.0.1]]
    gremlin> g
    ==>graphtraversalsource[standardjanusgraph[cql:[127.0.0.1]], standard]
    gremlin> g.V()
    gremlin> user = "Chris"
    ==>Chris
    gremlin> graph.addVertex("name", user)
    No such property: user for class: Script21
    Type ':help' or ':h' for help.
    Display stack trace? [yN]
    gremlin> :remote connect tinkerpop.server conf/remote.yaml session
    ==>Configured localhost/127.0.0.1:8182-[9acf239e-a3ed-4301-b33f-55c911e04052]
    gremlin> g.V()
    gremlin> user = "Chris"
    ==>Chris
    gremlin> user
    ==>Chris
    gremlin> graph.addVertex("name", user)
    ==>v[4344]
    gremlin> g.V().values('name')
    ==>Chris
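
The difference between the two connection modes also shows up in the request messages a client sends. The sketch below builds such messages as plain dictionaries. The field names (requestId, op, processor, args) are an assumption based on the TinkerPop Gremlin Server request format and should be checked against the TinkerPop documentation for your version; eval_request is a hypothetical helper, and nothing is sent over the network.

```python
import json
import uuid


def eval_request(script, session_id=None):
    """Build an 'eval' request message as a plain dict (hypothetical helper).

    Without session_id the message targets the default, sessionless processor.
    With session_id it targets the "session" processor and carries the id,
    so that variables can be reused across requests.
    """
    args = {"gremlin": script, "language": "gremlin-groovy"}
    processor = ""
    if session_id is not None:
        processor = "session"
        args["session"] = session_id
    return {
        "requestId": str(uuid.uuid4()),
        "op": "eval",
        "processor": processor,
        "args": args,
    }


# Two requests that share a session id can see each other's variables.
session_id = str(uuid.uuid4())
first = eval_request('user = "Chris"', session_id)
second = eval_request('graph.addVertex("name", user)', session_id)
```

In the sessionless case every message stands alone, which is why the transcript above fails on `user` until the connection is reopened with session.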
    

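    7.2 Cleaning up after the pre-packaged distribution

    To start from scratch and remove the existing database and logs, use the clean command of janusgraph.sh. Stop the server before running clean.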

    $ cd /Path/to/janusgraph/janusgraph-0.2.0-hadoop2/
    $ ./bin/janusgraph.sh stop
    Killing Gremlin-Server (pid 91505)...
    Killing Elasticsearch (pid 91402)...
    Killing Cassandra (pid 91219)...
    $ ./bin/janusgraph.sh clean
    Are you sure you want to delete all stored data and logs? [y/N] y
    Deleted data in /Path/to/janusgraph/janusgraph-0.2.0-hadoop2/db
    Deleted logs in /Path/to/janusgraph/janusgraph-0.2.0-hadoop2/log
    

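    7.3 JanusGraph Server as a WebSocket endpoint

    The configuration described above is the default. To change it to work with your own storage backend, such as HBase, follow these steps:

  • Test a local connection to your JanusGraph database first, using either the Gremlin Console or a program. Make the necessary changes to the properties file for your environment in the ./conf directory, for example ./conf/janusgraph-hbase.properties. Make sure that storage.backend, storage.hostname, and storage.hbase.table are set correctly. For more information on storage backends, see Part III, “Storage Backends”. Make sure the properties file contains the following line: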
  • gremlin.graph=org.janusgraph.core.JanusGraphFactory

  • Once the local configuration has been tested and you have a working properties file, copy it from the ./conf directory to the ./conf/gremlin-server directory:
  • cp conf/janusgraph-hbase.properties conf/gremlin-server/socket-janusgraph-hbase-server.properties

  • Copy conf/gremlin-server/gremlin-server.yaml to a new file named socket-gremlin-server.yaml, so that the original file remains available if you need to refer back to it:
  • cp conf/gremlin-server/gremlin-server.yaml conf/gremlin-server/socket-gremlin-server.yaml

    Edit socket-gremlin-server.yaml and make the following updates:

  • Update the IP address for host:
  • host: 10.10.10.100

  • Update the graphs section to point to your new properties file, so that JanusGraph Server can find and connect to your JanusGraph instance:
  • graphs: {
    graph: conf/gremlin-server/socket-janusgraph-hbase-server.properties}

  • Start the server, specifying the yaml file:

  • bin/gremlin-server.sh ./conf/gremlin-server/socket-gremlin-server.yaml

    Do not use bin/janusgraph.sh here; it starts the default configuration with the bundled Cassandra and Elasticsearch.

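  • JanusGraph Server should now be running in WebSocket mode. You can test the connection with the Gremlin Console as described in Section 7.1.1.1.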
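
The first step above asks that the properties file contain the gremlin.graph line and correct storage settings. As a minimal sketch, that check can be scripted before the file is copied into conf/gremlin-server. The helper names parse_properties and check_graph_properties are hypothetical, the list of required keys is an assumption drawn from the step above, and the parser handles only plain key=value lines rather than the full Java properties syntax.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


# Keys the steps above say must be present (assumed minimal set).
REQUIRED_KEYS = ("gremlin.graph", "storage.backend", "storage.hostname")
EXPECTED_FACTORY = "org.janusgraph.core.JanusGraphFactory"


def check_graph_properties(text):
    """Return a list of problems; an empty list means the basic checks pass."""
    props = parse_properties(text)
    problems = ["missing " + key for key in REQUIRED_KEYS if key not in props]
    if props.get("gremlin.graph", EXPECTED_FACTORY) != EXPECTED_FACTORY:
        problems.append("gremlin.graph should be " + EXPECTED_FACTORY)
    return problems
```

A typical use would be to read conf/janusgraph-hbase.properties, pass its text to check_graph_properties, and only start the server when the returned list is empty.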

    7.4 JanusGraph Server as an HTTP endpoint

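    The default configuration is a WebSocket endpoint. To use JanusGraph Server as an HTTP endpoint instead, follow these steps:

  • Test a local connection to your JanusGraph database first, using either the Gremlin Console or a program. Make the necessary changes to the properties file for your environment in the ./conf directory, for example ./conf/janusgraph-hbase.properties. Make sure that storage.backend, storage.hostname, and storage.hbase.table are set correctly. For more information on storage backends, see Part III, “Storage Backends”. Make sure the properties file contains the following line: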
  • gremlin.graph=org.janusgraph.core.JanusGraphFactory

  • Once the local configuration has been tested and you have a working properties file, copy it from the ./conf directory to the ./conf/gremlin-server directory:
  • cp conf/janusgraph-hbase.properties conf/gremlin-server/http-janusgraph-hbase-server.properties

  • Copy conf/gremlin-server/gremlin-server.yaml to a new file named http-gremlin-server.yaml, so that the original file remains available if you need to refer back to it:
  • cp conf/gremlin-server/gremlin-server.yaml conf/gremlin-server/http-gremlin-server.yaml

    Edit http-gremlin-server.yaml and make the following updates:

  • Update the IP address for host:
  • host: 10.10.10.100

  • Set the channelizer to HttpChannelizer:

    channelizer: org.apache.tinkerpop.gremlin.server.channel.HttpChannelizer

  • Update the graphs section to point to your new properties file, so that JanusGraph Server can find and connect to your JanusGraph instance:
  • graphs: {
    graph: conf/gremlin-server/http-janusgraph-hbase-server.properties}

  • Start the server, specifying the yaml file:

  • bin/gremlin-server.sh ./conf/gremlin-server/http-gremlin-server.yaml

  • JanusGraph Server should now be running in HTTP mode. You can test the connection with curl:
  • curl -XPOST -Hcontent-type:application/json -d '{"gremlin":"g.V().count()"}' http://[IP for JanusGraph server host]:8182
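
The same request can be issued from a program. The following is a minimal sketch using only the Python standard library; build_gremlin_request and run_gremlin are hypothetical helper names, and the JSON body simply mirrors the curl command above. Only run_gremlin touches the network, so it needs a running server.

```python
import json
import urllib.request


def build_gremlin_request(base_url, script):
    """Build (but do not send) a POST request carrying a Gremlin script as JSON."""
    body = json.dumps({"gremlin": script}).encode("utf-8")
    return urllib.request.Request(
        base_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_gremlin(base_url, script, timeout=10):
    """Send the request and return the decoded JSON response."""
    request = build_gremlin_request(base_url, script)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server listening on the host configured above, run_gremlin("http://10.10.10.100:8182", "g.V().count()") would return the parsed response body.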

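    7.5 JanusGraph Server as both a WebSocket and HTTP endpoint

    As of JanusGraph 0.2.0, gremlin-server.yaml can be configured to accept both WebSocket and HTTP connections on the same port.
    Only the channelizer needs to change: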

    channelizer: org.apache.tinkerpop.gremlin.server.channel.WsAndHttpChannelizer

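    7.6 Advanced JanusGraph Server configurations

    7.6.1 Authentication over HTTP

    In the example configurations below, credentialsDb should be a different database from the graph(s) you are using. Configure it with the correct backend and a different table or storage directory. This graph is used to store usernames and passwords.

    7.6.1.1 HTTP basic authentication

    To enable basic authentication, add the following to gremlin-server.yaml: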

    authentication: {
       authenticator: org.janusgraph.graphdb.tinkerpop.gremlin.server.auth.JanusGraphSimpleAuthenticator,
       authenticationHandler: org.apache.tinkerpop.gremlin.server.handler.HttpBasicAuthenticationHandler,
       config: {
         defaultUsername: user,
         defaultPassword: password,
         credentialsDb: conf/janusgraph-credentials-server.properties
       }
    }

    To verify that basic authentication is in effect, send a request without credentials:

    curl -v -XPOST http://localhost:8182 -d '{"gremlin": "g.V().count()"}'

    This should return a 401 status. Then send the same request with credentials:

    curl -v -XPOST http://localhost:8182 -d '{"gremlin": "g.V().count()"}' -u user:password

    This should return a 200 status together with the result.
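
curl's -u option sends the credentials as an HTTP Basic Authorization header. For clients that do not build this header automatically, a minimal sketch of constructing it by hand follows; basic_auth_header is a hypothetical helper name.

```python
import base64


def basic_auth_header(username, password):
    """Return the HTTP Basic Authorization header value for the credentials."""
    raw = (username + ":" + password).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# Header dict that could be attached to the POST request shown earlier.
headers = {"Authorization": basic_auth_header("user", "password")}
```

For the default credentials in the configuration above, the header value is `Basic dXNlcjpwYXNzd29yZA==`.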

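    7.6.2 Authentication over WebSocket

    Authentication over WebSocket takes place through a Simple Authentication and Security Layer (SASL) mechanism.
    To enable SASL authentication, add the following to gremlin-server.yaml: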

    authentication: {
      authenticator: org.janusgraph.graphdb.tinkerpop.gremlin.server.auth.JanusGraphSimpleAuthenticator,
      authenticationHandler: org.apache.tinkerpop.gremlin.server.handler.SaslAuthenticationHandler,
      config: {
        defaultUsername: user,
        defaultPassword: password,
        credentialsDb: conf/janusgraph-credentials-server.properties
      }
    }

    If you connect through the Gremlin Console, your remote yaml file must also contain the following properties:

    username: user
    password: password
    
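
    7.6.3 Authentication over HTTP and WebSocket

    If you use the combined channelizer, SaslAndHMACAuthenticator can authenticate both kinds of connections: SASL for WebSocket and HMAC for HTTP. With HMAC you first request a token from the /session endpoint and then use that token to authenticate.
    gremlin-server.yaml should contain the following: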

    authentication: {
      authenticator: org.janusgraph.graphdb.tinkerpop.gremlin.server.auth.SaslAndHMACAuthenticator,
      authenticationHandler: org.janusgraph.graphdb.tinkerpop.gremlin.server.handler.SaslAndHMACAuthenticationHandler,
      config: {
        defaultUsername: user,
        defaultPassword: password,
        hmacSecret: secret,
        credentialsDb: conf/janusgraph-credentials-server.properties
      }
    }

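    If you run several servers and want the same HMAC token to be valid on each of them, hmacSecret must be the same on every server.

    For HMAC authentication, tokens obtained from the /session endpoint expire after 1 hour by default. The timeout can be changed with the tokenTimeout setting in the authentication.config map; the value is a Long in milliseconds.

    A token can be requested as follows: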

    curl http://localhost:8182/session -XGET -u user:password

    {"token": "dXNlcjoxNTA5NTQ2NjI0NDUzOkhrclhYaGhRVG9KTnVSRXJ5U2VpdndhalJRcVBtWEpSMzh5WldqRTM4MW89"}

    To use the token, send it in the Authorization header in the form "Authorization: Token <token>":

    curl -v http://localhost:8182/session -XPOST -d '{"gremlin": "g.V().count()"}' -H "Authorization: Token dXNlcjoxNTA5NTQ2NjI0NDUzOkhrclhYaGhRVG9KTnVSRXJ5U2VpdndhalJRcVBtWEpSMzh5WldqRTM4MW89"
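
A client typically reads the token out of the JSON body returned by /session and reuses it until it expires. The sketch below assumes the response has the shape shown above; token_auth_header and ONE_HOUR_MS are hypothetical names, and the constant only restates the default lifetime in the unit that tokenTimeout expects.

```python
import json

# Default token lifetime of 1 hour, expressed in milliseconds for tokenTimeout.
ONE_HOUR_MS = 60 * 60 * 1000


def token_auth_header(session_response_body):
    """Extract the token from a /session response body and build the header value."""
    token = json.loads(session_response_body)["token"]
    return "Token " + token


# Example with a shortened, made-up response body.
sample_body = '{"token": "abc123"}'
headers = {"Authorization": token_auth_header(sample_body)}
```

The resulting headers dict can be attached to later HTTP requests in place of the user name and password.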

    7.6.4 Using TinkerPop Gremlin Server with JanusGraph
    Since JanusGraph Server is a TinkerPop Gremlin Server packaged with configuration files for JanusGraph, a version compatible TinkerPop Gremlin Server can be downloaded separately and used with JanusGraph. Get started by downloading the appropriate version of Gremlin Server, which needs to match a version supported by the JanusGraph version in use (3.3.3).
    Important: Any references to file paths in this section refer to paths under a TinkerPop distribution for Gremlin Server and not a JanusGraph distribution with the JanusGraph Server, unless specifically noted.
    Configuring a standalone Gremlin Server to work with JanusGraph is similar to configuring the packaged JanusGraph Server. You should be familiar with graph configuration. Basically, the Gremlin Server yaml file points to graph-specific configuration files that are used to instantiate JanusGraph instances that it will then host. In order to instantiate these Graph instances, Gremlin Server requires that the appropriate libraries and dependencies for the JanusGraph be available on its classpath.
    For purposes of demonstration, these instructions will outline how to configure the BerkeleyDB backend for JanusGraph in Gremlin Server. As stated earlier, Gremlin Server needs JanusGraph dependencies on its classpath. Invoke the following command replacing $VERSION with the version of JanusGraph to use:
    bin/gremlin-server.sh -i org.janusgraph janusgraph-all $VERSION
    When this process completes, Gremlin Server should now have all the JanusGraph dependencies available to it and will thus be able to instantiate JanusGraph objects.
    Important: The above command uses Groovy Grape; if Grape is not configured properly, download errors may ensue. Refer to the TinkerPop documentation for more information on setting up ~/.groovy/grapeConfig.xml.
    Create a file called GREMLIN_SERVER_HOME/conf/janusgraph.properties with the following contents:
    gremlin.graph=org.janusgraph.core.JanusGraphFactory
    storage.backend=berkeleyje
    storage.directory=db/berkeley
    Configuration of other backends is similar. See Part III, “Storage Backends”. If using Cassandra, then use Cassandra configuration options in the janusgraph.properties file. The only important piece to leave unchanged is the gremlin.graph setting which should always use JanusGraphFactory. This setting tells Gremlin Server how to instantiate a JanusGraph instance.
    Next create a file called GREMLIN_SERVER_HOME/conf/gremlin-server-janusgraph.yaml that has the following contents:
    host: localhost
    port: 8182
    graphs: {
      graph: conf/janusgraph.properties}
    scriptEngines: {
      gremlin-groovy: {
        plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
                   org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
                   org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
                   org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
                   org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/janusgraph.groovy]}}}}
    serializers:
      - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
      - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
      - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
    metrics: {
      slf4jReporter: {enabled: true, interval: 180000}}
    There are several important parts to this configuration file as they relate to JanusGraph.
    In the graphs map, there is a key called graph and its value is conf/janusgraph.properties. This tells Gremlin Server to instantiate a Graph instance called "graph" and use the conf/janusgraph.properties file to configure it. The "graph" key becomes the unique name for the Graph instance in Gremlin Server and it can be referenced as such in the scripts submitted to it.
    In the plugins list, there is a reference to JanusGraphGremlinPlugin, which tells Gremlin Server to initialize the "JanusGraph Plugin". The "JanusGraph Plugin" will auto-import JanusGraph specific classes for usage in scripts.
    Note the ScriptFileGremlinPlugin entry and its reference to scripts/janusgraph.groovy. This Groovy file is an initialization script for Gremlin Server and that particular ScriptEngine. Create scripts/janusgraph.groovy with the following contents:
    def globals = [:]
    globals << [g : graph.traversal()]
    The above script creates a Map called globals and assigns to it a key/value pair. The key is g and its value is a TraversalSource generated from graph, which was configured for Gremlin Server in its configuration file. At this point, there are now two global variables available to scripts provided to Gremlin Server - graph and g.
    At this point, Gremlin Server is configured and can be used to connect to a new or existing JanusGraph database. To start the server:
    $ bin/gremlin-server.sh conf/gremlin-server-janusgraph.yaml
    [INFO] GremlinServer -
             \,,,/
             (o o)
    -----oOOo-(3)-oOOo-----
    [INFO] GremlinServer - Configuring Gremlin Server from conf/gremlin-server-janusgraph.yaml
    [INFO] MetricManager - Configured Metrics Slf4jReporter configured with interval=180000ms and loggerName=org.apache.tinkerpop.gremlin.server.Settings$Slf4jReporterMetrics
    [INFO] GraphDatabaseConfiguration - Set default timestamp provider MICRO
    [INFO] GraphDatabaseConfiguration - Generated unique-instance-id=7f0000016240-ubuntu1
    [INFO] Backend - Initiated backend operations thread pool of size 8
    [INFO] KCVSLog$MessagePuller - Loaded unidentified ReadMarker start time 2015-10-02T12:28:24.411Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@35399441
    [INFO] GraphManager - Graph [graph] was successfully configured via [conf/janusgraph.properties].
    [INFO] ServerGremlinExecutor - Initialized Gremlin thread pool.  Threads in pool named with pattern gremlin-*
    [INFO] ScriptEngines - Loaded gremlin-groovy ScriptEngine
    [INFO] GremlinExecutor - Initialized gremlin-groovy ScriptEngine with scripts/janusgraph.groovy
    [INFO] ServerGremlinExecutor - Initialized GremlinExecutor and configured ScriptEngines.
    [INFO] ServerGremlinExecutor - A GraphTraversalSource is now bound to [g] with graphtraversalsource[standardjanusgraph[berkeleyje:db/berkeley], standard]
    [INFO] AbstractChannelizer - Configured application/vnd.gremlin-v3.0+gryo with org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0
    [INFO] AbstractChannelizer - Configured application/vnd.gremlin-v3.0+gryo-stringd with org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0
    [INFO] GremlinServer$1 - Gremlin Server configured with worker thread pool of 1, gremlin pool of 8 and boss thread pool of 1.
    [INFO] GremlinServer$1 - Channel started at port 8182.
    The following section explains how to connect to the running server.
    7.6.4.1. Connecting to JanusGraph via Gremlin Server
    Gremlin Server will be ready to listen for WebSocket connections when it is started. The easiest way to test the connection is with Gremlin Console.
    Follow the instructions in Section 7.1.1.1, “Connecting to Gremlin Server”, to verify that Gremlin Server is working.
    Important: When working with JanusGraph Server, the Gremlin Console is started from within the JanusGraph distribution. When following the test instructions here for a standalone Gremlin Server, the Gremlin Console is started from within the TinkerPop distribution.
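    From Java, the TinkerPop driver can connect with a serializer that registers JanusGraphIoRegistry, for example:
    GryoMapper mapper = GryoMapper.build().addRegistry(JanusGraphIoRegistry.INSTANCE).create();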
    Cluster cluster = Cluster.build().serializer(new GryoMessageSerializerV3d0(mapper)).create();
    Client client = cluster.connect();
    client.submit("g.V()").all().get();