This guide contains configuration information for system administrators working with GRAM5. It describes procedures typically performed by system administrators, including GRAM5 software installation, configuration, testing, and debugging. Readers should be familiar with the GRAM5 Key Concepts to understand the motivation for and interaction between the various deployed components.
The Globus Toolkit provides GRAM5: a service to submit, monitor, and cancel jobs on Grid computing resources. In GRAM5, a job consists of a computation and, optionally, file transfer and management operations related to the computation. Some users, particularly interactive ones, benefit from accessing output data files as the job is running. Monitoring consists of querying for and/or subscribing to status information, such as job state changes.
GRAM5 relies on GSI C mechanisms for security, and interacts with GridFTP services to stage files to compute resources. Please see their respective Administrator’s guides for information about installing, configuring, and managing those systems. In particular, you must understand the tasks in Installing GT 6.0 and install the basic GRAM5 packages, and complete the tasks in Basic Security Configuration.
Before installing GRAM5 on a server, you'll first need to plan what Local Resource Managers (LRMs) you want GRAM5 to interface with, what LRM you want to have as your default GRAM5 service, and whether you'll be using the globus-scheduler-event-generator to process LRM events.
GRAM5 requires a few services to be running in order to function: the Gatekeeper and the Scheduler Event Generator (SEG). The supported way to run these services is via the System-V style init scripts provided with the GRAM5-related packages. The gatekeeper daemon can also be configured to start via an internet superserver such as inetd or xinetd, though that is beyond the scope of this document. The globus-scheduler-event-generator cannot be run in that way.
GRAM5 in GT 6.0 supports the following LRM adapters: Condor, PBS, GridEngine, and Fork. These LRM adapters translate GRAM5 job specifications into LRM-specific job descriptions and scripts to run them, and provide an interface to the LRM for determining job termination status.
If you're not familiar with the supported LRMs, you might want to start with the Fork adapter to get familiar with how GRAM5 works. This adapter simply forks the job and runs it on the GRAM5 node. You can then install one of the other LRMs and its adapter to provide batch or high-throughput job scheduling.
GRAM5 can be configured to support multiple LRMs on the same service machine. In that case, one LRM is typically configured as the default LRM, which is used when a client uses a shortened version of a GRAM5 resource name. A common configuration is to configure a batch system interface as the default, and provide the jobmanager-fork service as well for simple jobs, such as creating directories or staging data.
GRAM5 has two ways of determining job state transitions: polling the LRM and using the Scheduler Event Generator (SEG) service. When polling, each user's globus-job-manager will periodically execute an LRM-specific command to determine the state of each job. On systems with many users, or with users submitting a large number of jobs, this can cause significant resource use on the GRAM5 service machine. Instead, the GRAM5 service can be configured (on a per-LRM basis) to use the globus-scheduler-event-generator service to process LRM state changes more efficiently.

Note: Not all LRM adapters provide an interface to the globus-scheduler-event-generator, and some require LRM-specific configuration to work properly. This is described in more detail below.
There are several LRM adapters included in GT 6.0. For some, there is a -setup-poll and a -setup-seg package which installs the adapter and the configuration file needed for job status monitoring via polling or via the globus-scheduler-event-generator program.
There are three ways to get LRM adapters: as RPM packages, as Debian packages, and from the source installer. These installation methods are described in Installing GT 6.0.
LRM adapter packages included in the GT 6.0 release are:
Table 1.1. GRAM5 LRM Adapters
LRM Adapter | Poll Package | SEG Package | Installer Target |
---|---|---|---|
fork | globus-gram-job-manager-fork-setup-poll | globus-gram-job-manager-fork-setup-seg [a] | globus_gram_job_manager_fork |
PBS | globus-gram-job-manager-pbs-setup-poll | globus-gram-job-manager-pbs-setup-seg | globus_gram_job_manager_pbs |
Condor | N/A | globus-gram-job-manager-condor [b] | globus_gram_job_manager_condor |
GridEngine | globus-gram-job-manager-sge-setup-poll | globus-gram-job-manager-sge-setup-seg | globus_gram_job_manager_sge |

[a] Not recommended for production use
[b] This LRM uses a SEG-like mechanism included in the globus-gram-job-manager-condor package
There are several tools provided with GT 6.0 to manage GRAM5, as well as OS-specific tools to start and stop some of the services. There are tools to manage user authorization, to control which services and which scheduler event generator modules are enabled, and to test the globus-gatekeeper service.
Before a user may interact with the GRAM5 service to submit jobs, he or she must be authorized to use the service. To authorize a user, a GRAM5 administrator must add the user's credential name and local account mapping to /etc/grid-security/grid-mapfile. This can be done using the grid-mapfile-add-entry and grid-mapfile-delete-entry tools. For more information, see the GSI C manual.
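For example, the following commands (a sketch; the distinguished name and local account are placeholders) add and later remove a mapping:

# grid-mapfile-add-entry -dn "/DC=org/DC=example/CN=Jane Doe" -ln jdoe
# grid-mapfile-delete-entry -dn "/DC=org/DC=example/CN=Jane Doe" -ln jdoe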
In order to run GRAM5, the globus-gatekeeper and, if applicable to your configuration, the globus-scheduler-event-generator services must be running on your system. The packages for these services include init scripts and configuration files which can be used to configure, start, and stop the services.
The globus-gatekeeper and globus-scheduler-event-generator init scripts handle the following actions: start, stop, status, restart, condrestart, try-restart, reload, and force-reload. The globus-scheduler-event-generator script also accepts an optional second parameter to start or stop a particular globus-scheduler-event-generator module. If the second parameter is not present, then all services will be acted on.
If you installed using Debian packaging tools, then the services will automatically be started upon installation. To start or stop a service, use the invoke-rc.d command with the service name and action.
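For example, on a Debian-based system the gatekeeper can be restarted and a single SEG module stopped with commands along these lines (a sketch; the pbs module name assumes the PBS setup-seg package is installed):

# invoke-rc.d globus-gatekeeper restart
# invoke-rc.d globus-scheduler-event-generator stop pbs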
If you installed using the RPM packaging tools, then the services will be installed but not enabled by default. To enable the services to start at boot time, use the commands:
# chkconfig globus-gatekeeper on
# chkconfig globus-scheduler-event-generator on
To start or stop the services, use the service command to run the init scripts with the service name, the action, and, optionally, the globus-scheduler-event-generator module.
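For example, the following commands (a sketch; the pbs module name assumes the PBS setup-seg package is installed) start the gatekeeper and only the PBS SEG module:

# service globus-gatekeeper start
# service globus-scheduler-event-generator start pbs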
The GRAM5 packages described in Installing LRM Adapter Packages will automatically register themselves with the globus-gatekeeper and globus-scheduler-event-generator services. The first LRM adapter installed will be configured as the default Job Manager service. To list the installed services, change the default, or disable a service, use the globus-gatekeeper-admin tool.
Example 2.1. Using globus-gatekeeper-admin to set the default service
This example shows how to use the globus-gatekeeper-admin tool to list the available services and then choose one as the default:
# globus-gatekeeper-admin -l
jobmanager-condor [ENABLED]
jobmanager-fork-poll [ENABLED]
jobmanager-fork [ALIAS to jobmanager-fork-poll]
# globus-gatekeeper-admin -e jobmanager-condor -n jobmanager
# globus-gatekeeper-admin -l
jobmanager-condor [ENABLED]
jobmanager-fork-poll [ENABLED]
jobmanager [ALIAS to jobmanager-condor]
jobmanager-fork [ALIAS to jobmanager-fork-poll]
The -setup-seg packages described in Installing LRM Adapter Packages will automatically register themselves with the globus-scheduler-event-generator service. To disable a module from running when the globus-scheduler-event-generator service is started, use the globus-scheduler-event-generator-admin tool.
Example 2.2. Using globus-scheduler-event-generator-admin to disable a SEG module
This example shows how to stop the pbs globus-scheduler-event-generator module and disable it so it will not restart when the system is rebooted:
# /etc/init.d/globus-scheduler-event-generator stop pbs
Stopped globus-scheduler-event-generator           [ OK ]
# globus-scheduler-event-generator-admin -d pbs
# globus-scheduler-event-generator-admin -l
pbs [DISABLED]
GRAM5 is designed to be usable by default without any manual configuration. However, there are many ways to customize a GRAM5 installation to better interact with site policies, filesystem layouts, LRM interactions, logging, and auditing. In addition to GRAM5-specific configuration, see Configuring GSI for information about configuring GSI security.
The globus-gatekeeper has many configuration options related to network configuration, security, logging, service path, and nice level. This configuration is located in one of the following files, depending on how GRAM5 was installed:
/etc/sysconfig/globus-gatekeeper (RPM-based systems)
/etc/default/globus-gatekeeper (Debian-based systems)
/etc/globus-gatekeeper.conf (other installations)
The following configuration variables are available in the globus-gatekeeper configuration file:

TCP port: If not set, globus-gatekeeper uses the default of 2119.
Service paths: If not set, globus-gatekeeper uses the paths defined at package compilation time.
Log file: By default, globus-gatekeeper logs to syslog using the GRAM-gatekeeper log identification prefix. The default configuration value is /var/log/globus-gatekeeper.log.
Service directory: If not set, globus-gatekeeper uses the default of /etc/grid-services.
Grid map file: If not set, globus-gatekeeper uses the default of /etc/grid-security/grid-mapfile.
Trusted certificate directory: If not set, globus-gatekeeper uses the default of /etc/grid-security/certificates.
Host certificate: If not set, globus-gatekeeper uses the default of /etc/grid-security/hostcert.pem.
Host key: If not set, globus-gatekeeper uses the default of /etc/grid-security/hostkey.pem.
Kerberos GSSAPI: If set, globus-gatekeeper will use a Kerberos GSSAPI implementation instead of the GSI GSSAPI implementation (untested).
PID file: The file to which the globus-gatekeeper's process ID is written. If not set, globus-gatekeeper uses /var/run/globus-gatekeeper.pid.
Nice level: The nice level for the globus-gatekeeper and globus-job-manager processes. If not set, the default system process nice level is used.
After modifying the configuration file, restart the globus-gatekeeper using the methods described in Starting and Stopping GRAM5 services.
The globus-scheduler-event-generator has several configuration options related to filesystem paths. This configuration is located in one of the following files, depending on how GRAM5 was installed:
/etc/sysconfig/globus-scheduler-event-generator (RPM-based systems)
/etc/default/globus-scheduler-event-generator (Debian-based systems)
/etc/globus-scheduler-event-generator.conf (other installations)
The following configuration variables are available in the globus-scheduler-event-generator configuration file:

PID file path: The path to which globus-scheduler-event-generator writes its process IDs (one per configured LRM). The format is a printf format string with one %s to be replaced by the LRM name. By default, globus-scheduler-event-generator uses /var/run/globus-scheduler-event-generator-%s.pid.
Event log path: The path to which globus-scheduler-event-generator writes its event logs. The format is a printf format string with one %s to be replaced by the LRM name. By default, globus-scheduler-event-generator uses /var/lib/globus/globus-seg-%s. If you modify this value, you'll need to also update the LRM configuration file to look for the log file in the new location.
Nice level: The nice level for globus-scheduler-event-generator processes. If not set, the default system process nice level is used.
After modifying the configuration file, restart the globus-scheduler-event-generator using the methods described in Starting and Stopping GRAM5 services.
The globus-job-manager process is started by the globus-gatekeeper and uses the configuration defined in the service entry for the resource name. By default, these service entries use a common configuration file for most job manager features. This configuration is located in one of the following files, depending on how GRAM5 was installed:
/etc/globus/globus-gram-job-manager.conf (RPM-based and Debian-based systems)
/etc/globus-gram-job-manager.conf (other installations)
This configuration file is used to construct the command-line options for the globus-job-manager program. Thus, all of the options described in globus-job-manager may be used.
From an administrator's perspective, the most important job manager configuration options are likely the ones related to logging and auditing. The default GRAM5 configuration puts logs in /var/log/globus/gram_USERNAME.log, with logging enabled at the FATAL and ERROR levels. To enable more fine-grained logging, add the option -log-levels LEVELS to /etc/globus/globus-gram-job-manager.conf. The value for LEVELS is a set of log levels joined by the | character (an example follows the table below). The available log levels are:
Table 3.1. GRAM5 Log Levels
Level | Meaning | Default Behavior |
---|---|---|
FATAL | Problems which cause the job manager to terminate prematurely. | Enabled |
ERROR | Problems which cause a job or operation to fail. | Enabled |
WARN | Problems which cause minor problems with job execution or monitoring. | Disabled |
INFO | Major events in the lifetime of the job manager and its jobs. | Disabled |
DEBUG | Minor events in the lifetime of jobs. | Disabled |
TRACE | Job processing details. | Disabled |
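For example, to enable WARN and INFO messages in addition to the defaults, the job manager configuration file would contain a line such as the following (a sketch; the quoting protects the | characters when the file is turned into a command line):

-log-levels 'FATAL|ERROR|WARN|INFO'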
In RPM or Debian package installs, these logs will be configured to be rotated via logrotate. See /etc/logrotate.d/globus-job-manager for details on the default log rotation configuration.
There are also a few configuration options related to the TCP ports the Job Manager uses. This port configuration is useful when dealing with firewalls that restrict incoming or outgoing ports. To restrict incoming ports (those that the Job Manager listens on), add the command-line option -globus-tcp-port-range to the Job Manager configuration file like this:
-globus-tcp-port-range MIN-PORT,MAX-PORT
Where MIN-PORT is the minimum TCP port number the Job Manager will listen on and MAX-PORT is the maximum TCP port number the Job Manager will listen on.
Similarly, to restrict the outgoing port numbers that the job manager connects from, use the command-line option -globus-tcp-source-range, like this:
-globus-tcp-source-range MIN-PORT,MAX-PORT
Where MIN-PORT is the minimum outgoing TCP port number the Job Manager will use and MAX-PORT is the maximum outgoing TCP port number the Job Manager will use.
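For example, to confine the job manager to illustrative port ranges (the values are placeholders; choose ranges that match your firewall policy), the configuration file could contain:

-globus-tcp-port-range 50000,51000
-globus-tcp-source-range 50000,51000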
For more information about Globus and firewalls, see Firewall configuration.
Each LRM adapter has its own configuration file which can help customize the adapter to the site configuration. Some LRMs use non-standard programs to launch parallel or MPI jobs, and some might want to provide queue or project validation to make it easier to translate job failures into problems that can be described by GRAM5. All of the LRM adapter configuration files consist of simple variable="value" pairs, with a leading # starting a comment until end-of-line.
Generally, the GRAM5 LRM configuration files are located in the globus configuration directory, with each configuration file named by the LRM name (fork, condor, pbs, sge, or slurm). The path to these configuration files is:
/etc/globus/globus-LRM.conf
The globus-fork.conf configuration file can define the following configuration parameters (a sample file follows the list):

The path to the globus-fork.log file used by the globus-fork-starter and the fork SEG module.
The paths to mpiexec and mpirun for parallel jobs which use MPI. By default, these are not configured. The LRM adapter will use mpiexec over mpirun if both are defined.
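A minimal /etc/globus/globus-fork.conf might look like the following sketch. The log_path, mpiexec, and mpirun parameter names and the paths shown are illustrative and should be checked against the file shipped with your installation:

# Log file written by globus-fork-starter and read by the fork SEG module
log_path="/var/log/globus/globus-fork.log"
# Optional MPI launchers; mpiexec is preferred over mpirun when both are set
mpiexec="/usr/bin/mpiexec"
mpirun="/usr/bin/mpirun"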
The globus-condor.conf configuration file can define the following configuration parameters:

The OpSys requirement for condor jobs. If not specified, the system-wide default will be used.
The Arch requirement for condor jobs. If not specified, the system-wide default will be used.
The path to the condor command-line tools, if they are not in the default PATH.
The value of the CONDOR_CONFIG environment variable, which might be needed to use condor in some cases.
Whether to perform file system checks for vanilla universe jobs. This can detect some types of errors before submitting jobs to condor, but only if the filesystems between the condor submit host and condor execution hosts are equivalent. In other cases, this may cause unnecessary job failures.
The globus-pbs.conf configuration file can define the following configuration parameters (a sample file follows the list):

The paths to mpiexec and mpirun for parallel jobs which use MPI. By default these are not configured. The LRM adapter will use mpiexec over mpirun if both are defined.
The path to the PBS command-line tools, if they are not in the default PATH.
Whether the PBS service manages a cluster. If set to yes, then the LRM adapter will attempt to use a remote shell command to launch multiple instances of the executable on different nodes, as defined by the file named by the PBS_NODEFILE environment variable.
The remote shell command to use when cluster is set to yes.
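As a sketch, a cluster deployment's /etc/globus/globus-pbs.conf might contain entries like the following; the parameter names other than cluster, and all paths, are illustrative and should be checked against the shipped file:

# Optional MPI launchers; mpiexec is preferred over mpirun when both are set
mpiexec="/usr/bin/mpiexec"
mpirun="/usr/bin/mpirun"
# Launch one process per node listed in $PBS_NODEFILE using a remote shell
cluster="yes"
remote_shell="/usr/bin/ssh"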
The globus-sge.conf configuration file can define the following configuration parameters (a sample file follows the list):

The SGE root directory. If set to undefined, then the LRM adapter will try to determine it from the globus-job-manager environment, or, if not there, from the contents of the file named by the sge_config configuration parameter.
The SGE cell name. If set to undefined, then the LRM adapter will try to determine it from the globus-job-manager environment, or, if not there, from the contents of the file named by the sge_config configuration parameter.
The path to a file which sets the SGE_ROOT and the SGE_CELL environment variables.
The path to the GridEngine command-line tools, if they are not in the default PATH.
The paths to mprun and mpirun for parallel jobs which use MPI. By default these are not configured. The LRM adapter will use mprun over mpirun if both are defined.
The default parallel environment for multi-process jobs that do not include the parallel_environment RSL attribute to choose one.
Whether to validate parallel environments. If set to yes, then the LRM adapter will verify that the parallel_environment RSL attribute value matches one of the parallel environments supported by this GridEngine service.
The list of parallel environments to validate against when validate_pes is set to yes. If validation is being done but this value is not set, then the LRM adapter will query the GridEngine service to determine available parallel environments at startup.
Whether to validate queues. If set to yes, then the LRM adapter will verify that the queue RSL attribute value matches one of the queues supported by this GridEngine service.
The list of queues to validate against when validate_queues is set to yes. If validation is being done but this value is not set, then the LRM adapter will query the GridEngine service to determine available queues at startup.
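As a sketch, /etc/globus/globus-sge.conf might contain entries like the following; the sge_config, validate_pes, and validate_queues names appear above, while the remaining parameter names, paths, and values are illustrative:

# File that sets SGE_ROOT and SGE_CELL
sge_config="/opt/sge/default/common/settings.sh"
# Validate the queue and parallel_environment RSL attributes against these lists
validate_queues="yes"
available_queues="batch,interactive"
validate_pes="yes"
available_pes="mpi,smp"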
In order to use the Scheduler Event Generator with GridEngine, the job reporting feature must be enabled, and ARCo database storage must not be enabled. To enable this, use the command qconf -mconf and modify the reporting_params parameter so that the options reporting and joblog are set to true.
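For example, the reporting_params line shown by qconf -mconf might be edited to read as follows; the flush_time and sharelog values shown here are typical GridEngine settings and should be left as found on your system:

reporting_params   accounting=true reporting=true flush_time=00:00:15 joblog=true sharelog=00:00:00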
The globus-slurm.conf configuration file can define the following configuration parameters:
The globus-gram-audit configuration defines information about the database to load the GRAM5 audit records into. This configuration is located in:
/etc/globus/gram-audit.conf
This configuration file contains the following attributes. Each attribute is defined by an ATTRIBUTE:VALUE pair. A sample configuration follows the table below.
Table 3.2. Audit Configuration Attributes
Attribute Name | Value | Default |
---|---|---|
DRIVER | The name of the Perl 5 DBI driver for the database to be used. The supported drivers for this program are SQLite, Pg (for PostgreSQL), and mysql. | |
DATABASE | The DBI data source specification to contact the audit database. | |
USERNAME | Username to authenticate as to the database. | |
PASSWORD | Password to use to authenticate with the database. | |
AUDITVERSION | Version of the audit database table schemas to use. | |
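As a sketch, a SQLite-backed /etc/globus/gram-audit.conf could look like the following; the database path and the AUDITVERSION value are placeholders:

DRIVER: SQLite
DATABASE: dbi:SQLite:dbname=/var/lib/globus/gram-audit.db
USERNAME:
PASSWORD:
AUDITVERSION: 1.1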
GRAM5 uses the RSL language to encode job descriptions. The attributes supported by GRAM are defined in RSL Validation Files. These definitions contain information about when the different RSL attributes are valid and what their default values might be if not present. GRAM5 will look in /etc/globus/gram/job-manager.rvf and /etc/globus/gram/LRM.rvf for site-specific changes to the RSL validation file.
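As a sketch, a site override that changes a default queue might add an entry like the following to /etc/globus/gram/pbs.rvf; the record format and field names should be checked against the shipped job-manager.rvf, and the queue name is a placeholder. The globus-rvf-check tool can be used to verify the syntax after editing:

Attribute: queue
Default: batch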
GRAM5 includes mechanisms to provide access to audit and accounting information associated with jobs that GRAM5 submits to a local resource manager (LRM) such as Torque, GridEngine, or Condor.
In some scenarios, it is desirable to get general information about the usage of the underlying LRM, such as:
The following three use cases give a better overview of the meaning and purpose of auditing and accounting:
Audit logging in GRAM5 is done when a job completes.
While audit and accounting records may be generated and stored by different entities in different contexts, we make the following assumptions in this chapter:
 | Audit Records | Accounting Records |
---|---|---|
Generated by: | GRAM service | LRM to which the GRAM service submits jobs |
Stored in: | Database, indexed by GJID | LRM, indexed by JID |
Data that is stored: | See list below. | May include all information about the duration and resource-usage of a job |
The audit record of each job contains the following data:
The rest of this chapter focuses on how to configure GRAM5 to enable Audit-Logging.
Audit logging is turned off by default. To enable GRAM5 audit logging in the job manager, add the command-line option -audit-directory DIRECTORY to the job manager configuration in one of the following locations (see the example after this list):
$GLOBUS_LOCATION/etc/globus-job-manager.conf (to enable it for all job manager services)
$GLOBUS_LOCATION/etc/grid-services/LRM_SERVICE_NAME (to enable it for a particular job manager service for a particular LRM)
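For example, to enable audit logging for all services, the configuration file would gain a line like the following; the directory path is illustrative:

-audit-directory /var/lib/globus/gram-audit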
The globus-gram-audit program reads GRAM5 audit records and loads those records into a SQL database. This program is available as part of the globus_gram_job_manager_auditing package. It must be configured by installing and running the globus_gram_job_manager_auditing_setup_scripts setup package via gpt-postinstall. This setup script creates the $GLOBUS_LOCATION/etc/globus-job-manager-audit.conf configuration file described below and creates the database tables needed by the audit system.
The globus-gram-audit program supports three database systems: MySQL, PostgreSQL, and SQLite.
GRAM5 runs different parts of itself under different privilege levels. The globus-gatekeeper runs as root, and uses its root privilege to access the host's private key. It uses the grid map file to map Grid certificates to local user ids and then uses the setuid() function to change to that user and execute the globus-job-manager program.
The globus-job-manager
program runs as a local non-root account.
It receives a delegated limited proxy certificate from the GRAM5 client
which it uses to access Grid storage resources via GridFTP and to
authenticate job signals (such as client cancel requests), and send job
state callbacks to registered clients. This proxy is generally
short-lived, and is automatically removed by the job manager when the
job completes.
The globus-job-manager program uses a publicly-writable directory for job state files. This directory has the sticky bit set, so users may not remove other users' files. Each file is named by a UUID, so it should be unique.
GRAM requires a host certificate and private key in order for the globus-gatekeeper service to run. These are typically located in /etc/grid-security/hostcert.pem and /etc/grid-security/hostkey.pem, but the paths are configurable in the gatekeeper configuration file. The key must be protected by file permissions allowing only the root user to read it.
GRAM also (by default) uses a grid-mapfile to authorize Grid users as local users. This file is typically located in /etc/grid-security/grid-mapfile, but is configurable in the gatekeeper configuration file.
Problems in either of these configurations will show up in the gatekeeper log described below. See the GSI documentation for more detailed information about obtaining and installing host certificates and maintaining a grid-mapfile.
GRAM relies on the globus-gatekeeper program and (in some cases) the globus-scheduler-event-generator programs to process jobs. If the former is not running, job requests will fail with a "connection refused" error. If the latter is not running, GRAM jobs will appear to "hang" in the PENDING state.
The globus-gatekeeper is typically started via an init script installed in /etc/init.d/globus-gatekeeper. The command /etc/init.d/globus-gatekeeper status will indicate whether the service is running. See Starting and Stopping GRAM5 services for more information about starting and stopping the globus-gatekeeper program.
If the globus-gatekeeper service fails to start, the command globus-gatekeeper -test will output information describing some types of configuration problems.
The globus-scheduler-event-generator is typically started via an init script installed in /etc/init.d/globus-scheduler-event-generator. It is only needed when the LRM-specific "setup-seg" package is installed. The command /etc/init.d/globus-scheduler-event-generator status will indicate whether the service is running. See Starting and Stopping GRAM5 services for more information about starting and stopping the globus-scheduler-event-generator program.
The globus-gatekeeper
program starts the
globus-job-manager
service with different command-line
parameters depending on the LRM being used. Use the command
globus-gatekeeper-admin -l
to list which LRMs the gatekeeper is
configured to use.
The globus-job-manager-script.pl
is the interface between the
GRAM job manager process and the LRM adapter. The command
/usr/share/globus/globus-job-manager-script.pl -h
will print the
list of available adapters.
% /usr/share/globus/globus-job-manager-script.pl -h
USAGE: /usr/share/globus/globus-job-manager-script.pl -m MANAGER -f FILE -c COMMAND
Installed managers: condor fork
The globus-scheduler-event-generator
also uses an LRM-specific
module to generate scheduler events for GRAM to reduce the amount of
resources GRAM uses on the machine where it runs. To determine which
LRMs are installed and configured, use the command
globus-scheduler-event-generator-admin -l
.
% globus-scheduler-event-generator-admin -l
fork [DISABLED]
If any of these do not show the LRM you are trying to use, install the relevant packages related to that LRM and restart the GRAM services. See the GRAM Administrator’s Guide for more information about starting and stopping the GRAM services.
All GRAM5 LRM adapters have a configuration file for site customizations, such as queue names, paths to executables needed to interface with the LRM, etc. Check that the values in these files are correct. These files are described in LRM Adapter Configuration.
The /var/log/globus-gatekeeper.log file contains information about service requests from clients, and will be useful when diagnosing service startup failures, authentication failures, and authorization failures.
GRAM uses GSI to authenticate client job requests. If there is a problem with the GSI configuration for your host, or a client is trying to connect with a certificate signed by a CA your host does not trust, the job request will fail. This will show up in the log as a "GSS authentication failure". See the GSI Administrator’s Guide for information about diagnosing authentication failures.
After authentication is complete, GRAM maps the Grid identity to a local
user prior to starting the globus-job-manager
process. If this
fails, an error will show up in the log as "globus_gss_assist_gridmap()
failed authorization". See the GSI
Administrator’s Guide for information about managing gridmap files.
A per-user job manager log is typically located in /var/log/globus/gram_$USERNAME.log. This log contains information from the job manager as it attempts to execute GRAM jobs via a local resource manager. The logs can be fairly verbose. Sometimes looking for log entries near those containing the string level=ERROR will show more information about what caused a particular failure.
Once you’ve found an error in the log, it is generally useful to find
log entries related to the job which hit that error. There are two job
IDs associated with each job, one a GRAM-specific ID, and one an
LRM-specific ID. To determine the GRAM ID associated with a job, look
for the attribute gramid
in the log message. Finding that, looking
for all other log messages which contain that gramid
value will give
a better picture of what the job manager is doing. To determine the
LRM-specific ID, look for a message at TRACE
level with the matching
GRAM ID found above with the response
value matching
GRAM_SCRIPT_JOB_ID:LRM-ID. You can then follow the state of the LRM-ID as well as the GRAM ID in the log, and correlate the LRM-ID information with local resource manager logs and administrative tools.
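As a sketch, the following commands illustrate this process; the log path, user name, and gramid value are placeholders, with the gramid taken from an ERROR entry found in the first step:

% grep 'level=ERROR' /var/log/globus/gram_jdoe.log
% grep 'gramid=/16001/1415731200/' /var/log/globus/gram_jdoe.log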
If all else fails, please send information about your problem to gram-user@globus.org. You'll have to subscribe to the list before you can send e-mail to it. See the Globus mailing lists pages for information on the general and GRAM-specific lists and on how to subscribe. Depending on the problem, you may be requested to file a bug report in the Globus project's Issue Tracker.
globus-gatekeeper [-help] [-conf PARAMETER_FILE] [-test] [-d | -debug] [-inetd | -f] [-p PORT | -port PORT] [-home PATH] [-l LOGFILE | -logfile LOGFILE] [-lf LOG_FACILITY] [-acctfile ACCTFILE] [-e LIBEXECDIR] [-launch_method fork_and_exit | fork_and_wait | dont_fork] [-grid_services SERVICEDIR] [-globusid GLOBUSID] [-gridmap GRIDMAP] [-x509_cert_dir TRUSTED_CERT_DIR] [-x509_cert_file TRUSTED_CERT_FILE] [-x509_user_cert CERT_PATH] [-x509_user_key KEY_PATH] [-x509_user_proxy PROXY_PATH] [-k] [-globuskmap KMAP] [-pidfile PIDFILE]
The globus-gatekeeper
program is a meta-server similar to
inetd
or xinetd
that starts other services after
authenticating a TCP connection using GSSAPI and mapping the client’s
credential to a local account.
The most common use for the globus-gatekeeper program is to start instances of the globus-job-manager(8) service. A single globus-gatekeeper deployment can handle multiple different service configurations by having entries in the /etc/grid-services directory.
Typically, users interact with the globus-gatekeeper
program via
client applications such as globusrun(1)
, globus-job-submit
,
or tools such as CoG jglobus or Condor-G.
The full set of command-line options to globus-gatekeeper
consists of:
-test: Test the gatekeeper configuration: print information about the globus-gatekeeper process, service home directory, service execution directory, and X.509 subject name, and then exit.
-d | -debug: Run the globus-gatekeeper process in the foreground.
-inetd: Flag indicating that the globus-gatekeeper process was started via inetd or a similar super-server. If this flag is set and the globus-gatekeeper was not started via inetd, a warning will be printed in the gatekeeper log.
-f: Flag indicating that the globus-gatekeeper process should run in the foreground. This flag has no effect when the globus-gatekeeper is started via inetd.
-p PORT | -port PORT: The TCP port to listen on when the globus-gatekeeper is not started via inetd or a similar service. If not specified and the gatekeeper is running as root, the default of 2119 is used. Otherwise, the gatekeeper defaults to an ephemeral port.
-home PATH: Sets the GLOBUS_LOCATION environment variable in the service environment. If not specified, the gatekeeper looks for service executables in /usr/sbin, configuration in /etc, and writes logs and accounting files to /var/log.
-l LOGFILE | -logfile LOGFILE: The path to the gatekeeper log file. If the value is logoff or LOGOFF, then logging will be disabled, both to file and to syslog.
-lf LOG_FACILITY: The syslog facility to use for gatekeeper log messages. If not set, LOG_DAEMON will be used as the default when using syslog.
-e LIBEXECDIR: The directory containing service executables. If not set, the sbin subdirectory of the parameter to -home is used, or /usr/sbin if that is not set.
-launch_method fork_and_exit | fork_and_wait | dont_fork: How the gatekeeper starts services. The methods are fork_and_exit (the service runs completely independently of the gatekeeper, which exits after creating the new service process), fork_and_wait (the service is run in a separate process from the gatekeeper but the gatekeeper does not exit until the service terminates), or dont_fork, where the gatekeeper process becomes the service process via the exec() system call.
-globusid GLOBUSID: Sets the GLOBUSID environment variable to GLOBUSID. This variable is used to construct the gatekeeper contact string if it cannot be parsed from the service credential.
-x509_cert_dir TRUSTED_CERT_DIR: Sets the environment variable X509_CERT_DIR to this value.
-x509_user_cert CERT_PATH: Sets the X509_USER_CERT environment variable to this value.
-x509_user_key KEY_PATH: Sets the X509_USER_KEY environment variable to this value.
-x509_user_proxy PROXY_PATH: Sets the X509_USER_PROXY environment variable to this value.
-k: Use the globus-k5 command to acquire Kerberos 5 credentials before starting the service.
-pidfile PIDFILE: Write the process ID of the globus-gatekeeper to the file named by PIDFILE.
The following files affect the execution of the globus-gatekeeper program:
/etc/grid-services/SERVICENAME
/etc/grid-security/grid-mapfile
/etc/globuskmap
/etc/globus-nologin (when present, disables the globus-gatekeeper program)
/var/log/globus-gatekeeper.log
The globus-gatekeeper-admin program manages service entries which are used by the globus-gatekeeper to execute services. Service entries are located in the /etc/grid-services directory. The globus-gatekeeper-admin tool can list, enable, or disable specific services, or set a service as the default. The -h command-line option shows a brief usage message.
The -l command-line option to globus-gatekeeper-admin will cause it to list all of the services which are available to be run by the globus-gatekeeper. In the output, the service name will be followed by its status in brackets. Possible status strings are ENABLED, DISABLED, and ALIAS to NAME, where NAME is another service name.
If the -n NAME option is used, then only information about the service named NAME is printed.
The -e SERVICE command-line option to globus-gatekeeper-admin will cause it to enable a service so that it may be run by the globus-gatekeeper.
If the -n NAME option is used as well, then the service will be enabled with the alias NAME.
The -E command-line option to globus-gatekeeper-admin
will
cause it to enable a service alias with the name jobmanager
. The
globus-gatekeeper-admin
program will choose the first service it
finds as the default. To enable a particular service as the default, use
the -e parameter described above with the -n parameter.
globus-gram-audit [--conf CONFIG_FILE] [--create | --update=OLD-VERSION] [--check] [--delete] [--audit-directory AUDITDIR] [--quiet]
The globus-gram-audit program loads audit records to an SQL-based database. It reads $GLOBUS_LOCATION/etc/globus-job-manager.conf by default to determine the audit directory and then uploads all files in that directory that contain valid audit records to the database configured by the globus_gram_job_manager_auditing_setup_scripts package. If the upload completes successfully, the audit files will be removed.
The full set of command-line options to globus-gram-audit consists of:
The globus-gram-audit
uses the following files (paths relative
to $GLOBUS_LOCATION
).
etc/globus-gram-job-manager.conf
etc/globus-gram-audit.conf
globus-job-manager -type LRM [-conf CONFIG_PATH] [-help] [-globus-host-manufacturer MANUFACTURER] [-globus-host-cputype CPUTYPE] [-globus-host-osname OSNAME] [-globus-host-osversion OSVERSION] [-globus-gatekeeper-host HOST] [-globus-gatekeeper-port PORT] [-globus-gatekeeper-subject SUBJECT] [-home GLOBUS_LOCATION] [-target-globus-location TARGET_GLOBUS_LOCATION] [-condor-arch ARCH] [-condor-os OS] [-history HISTORY_DIRECTORY] [-scratch-dir-base SCRATCH_DIRECTORY] [-enable-syslog] [-stdio-log LOG_DIRECTORY] [-log-pattern PATTERN] [-log-levels LEVELS] [-state-file-dir STATE_DIRECTORY] [-globus-tcp-port-range PORT_RANGE] [-globus-tcp-source-range SOURCE_RANGE] [-x509-cert-dir TRUSTED_CERTIFICATE_DIRECTORY] [-cache-location GASS_CACHE_DIRECTORY] [-k] [-extra-envvars VAR=VAL,…] [-seg-module SEG_MODULE] [-audit-directory AUDIT_DIRECTORY] [-globus-toolkit-version TOOLKIT_VERSION] [-disable-streaming] [-disable-usagestats] [-usagestats-targets TARGET] [-service-tag SERVICE_TAG]
The globus-job-manager program is a service which starts and controls GRAM jobs which are executed by a local resource management system, such as LSF or Condor. The globus-job-manager program is typically started by the globus-gatekeeper program and not directly by a user. It runs until all jobs it is managing have terminated or its delegated credentials have expired.
Typically, users interact with the globus-job-manager
program
via client applications such as globusrun
,
globus-job-submit
, or tools such as CoG jglobus or Condor-G.
The full set of command-line options to the globus-job-manager program consists of:

-globus-host-manufacturer MANUFACTURER: Indicate the manufacturer of the system on which the jobs will execute. This parameter sets the value of the $(GLOBUS_HOST_MANUFACTURER) RSL substitution to MANUFACTURER.
-globus-host-cputype CPUTYPE: Sets the $(GLOBUS_HOST_CPUTYPE) RSL substitution to CPUTYPE.
-globus-host-osname OSNAME: Sets the $(GLOBUS_HOST_OSNAME) RSL substitution to OSNAME.
-globus-host-osversion OSVERSION: Sets the $(GLOBUS_HOST_OSVERSION) RSL substitution to OSVERSION.
-globus-gatekeeper-host HOST: Sets the $(GLOBUS_GATEKEEPER_HOST) RSL substitution to HOST.
-globus-gatekeeper-port PORT: Sets the $(GLOBUS_GATEKEEPER_PORT) RSL substitution to PORT.
-globus-gatekeeper-subject SUBJECT: Sets the $(GLOBUS_GATEKEEPER_SUBJECT) RSL substitution to SUBJECT.
-target-globus-location TARGET_GLOBUS_LOCATION: Sets the $(GLOBUS_LOCATION) RSL substitution to TARGET_GLOBUS_LOCATION.
-scratch-dir-base SCRATCH_DIRECTORY: Sets the base directory used when a job requests a scratch directory via the scratch_dir attribute.
-log-pattern PATTERN: Sets the pattern used to construct the job manager log file path. The pattern may contain RSL substitutions such as $(HOME), $(LOGNAME), etc, as well as the special RSL substitution $(DATE) which will be resolved at log time to the date in YYYYMMDD form.
-log-levels LEVELS: Sets the log levels to record. The available levels are FATAL, ERROR, WARN, INFO, DEBUG, and TRACE. Multiple values can be combined with the | character. The default value of logging when enabled is FATAL|ERROR.
-state-file-dir STATE_DIRECTORY: Sets the directory where job state files are stored. If not specified, the job manager uses the default of $GLOBUS_LOCATION/tmp/gram_job_state/. This directory must be writable by all users and be on a file system which supports POSIX advisory file locks.
-globus-tcp-port-range PORT_RANGE: Restricts the TCP ports the job manager will listen on. Also sets the GLOBUS_TCP_PORT_RANGE environment variable.
-globus-tcp-source-range SOURCE_RANGE: Restricts the outgoing TCP ports the job manager will connect from. Also sets the GLOBUS_TCP_SOURCE_RANGE environment variable.
-x509-cert-dir TRUSTED_CERTIFICATE_DIRECTORY: Sets the trusted certificate directory via the X509_CERT_DIR environment variable.
-cache-location GASS_CACHE_DIRECTORY: Sets the path to the GASS cache via the GLOBUS_GASS_CACHE_DEFAULT environment variable.
-extra-envvars VAR=VAL,…: Adds environment variables to the job environment, where VAR is the variable name and VAL is the variable's value. If the value is not specified, then the value of the variable in the job manager's environment is used. This option may be present multiple times on the command-line or the job manager configuration file to append multiple environment settings.
-seg-module SEG_MODULE: Use SEG_MODULE to process LRM events. The globus-job-manager-event-generator must be running to process events for the LRM into a generic format that the job manager can parse.
-audit-directory AUDIT_DIRECTORY: Write job audit records to AUDIT_DIRECTORY. These records can be loaded into a database using the globus-gram-audit program.
-service-tag SERVICE_TAG: Sets a tag to identify this service. If this option is not present, the default value of untagged will be used.
-usagestats-targets TARGET: Sets the destinations and data for usage statistics packets. The special tag values all (which enables all tags) and default may be used, or a sequence of characters for the various tags. If this option is not present in the configuration, then the default of usage-stats.globus.org:4810 is used.
The following environment variables affect the execution of globus-job-manager:
HOME
LOGNAME
JOBMANAGER_SYSLOG_ID
JOBMANAGER_SYSLOG_FAC
JOBMANAGER_SYSLOG_LVL
GATEKEEPER_JM_ID
GATEKEEPER_PEER
GLOBUS_ID
GLOBUS_JOB_MANAGER_SLEEP
GRID_SECURITY_HTTP_BODY_FD (set by the globus-gatekeeper)
X509_USER_PROXY (the proxy credential left by the globus-gatekeeper program to be used by the job manager)
GRID_SECURITY_CONTEXT_FD
GLOBUS_USAGE_TARGETS
GLOBUS_TCP_PORT_RANGE
GLOBUS_TCP_SOURCE_RANGE
$HOME/.globus/job/HOSTNAME/LRM.TAG.red
$HOME/.globus/job/HOSTNAME/LRM.TAG.lock
$HOME/.globus/job/HOSTNAME/LRM.TAG.pid
$HOME/.globus/job/HOSTNAME/LRM.TAG.sock
$HOME/.globus/job/HOSTNAME/JOB_ID/
$HOME/.globus/job/HOSTNAME/JOB_ID/stdin
$HOME/.globus/job/HOSTNAME/JOB_ID/stdout
$HOME/.globus/job/HOSTNAME/JOB_ID/stderr
$HOME/.globus/job/HOSTNAME/JOB_ID/x509_user_proxy
$GLOBUS_LOCATION/tmp/gram_job_state/job.HOSTNAME.JOB_ID
$GLOBUS_LOCATION/tmp/gram_job_state/job.HOSTNAME.JOB_ID.lock
$GLOBUS_LOCATION/etc/globus-job-manager.conf
$GLOBUS_LOCATION/etc/grid-services/jobmanager-LRM
$GLOBUS_LOCATION/etc/globus/gram/job-manager.rvf
$GLOBUS_LOCATION/etc/globus/gram/lrm.rvf
The globus-rvf-check command is a utility which checks the syntax of an RSL validation file, and prints out parse errors when encountered. It can also parse the RVF file contents and then dump the file's contents to stdout, after canonicalizing values and quoting. The exit code of globus-rvf-check is 0 if all files specified on the command line exist and have no parse errors.
The full set of command-line options to globus-rvf-check
consists of:
globus-rvf-check
just prints a diagnostic message to standard output indicating whether the file could be parsed.
The globus-rvf-edit
command is a utility which opens the default
editor on a specified RSL validation file, and then, when editing
completes, runs the globus-rvf-check
command to verify that the
RVF file syntax is correct. If a parse error occurs, the user will be
given an option to rerun the editor or discard the modifications.
The full set of command-line options to globus-rvf-edit
consists
of:
globus-scheduler-event-generator - Process LRM events into a common format for use with GRAM
globus-scheduler-event-generator -s LRM [-t TIMESTAMP] [-d DIRECTORY] [-b] [-p PIDFILE]
The globus-scheduler-event-generator program processes information from a local resource manager to generate LRM-independent events which GRAM can use to track job state changes. Typically, the globus-scheduler-event-generator is started at system boot time for all LRM adapters which have been installed. The only required parameter to globus-scheduler-event-generator is -s LRM, which indicates what LRM-specific module to load. A list of available modules can be found by using the globus-scheduler-event-generator-admin command.
Other options control how the globus-scheduler-event-generator program runs and where its output goes. These options are:

-t TIMESTAMP: Process events beginning at TIMESTAMP. If this option is not present, globus-scheduler-event-generator will process events from the time it was started, and not look for historical events.
-b: Run the globus-scheduler-event-generator program in the background.
-p PIDFILE: Write the process ID of globus-scheduler-event-generator to PIDFILE.
/var/lib/globus/globus-seg-LRM/YYYYMMDD (the default event log written by globus-scheduler-event-generator)
The globus-scheduler-event-generator-admin
program manages SEG
modules which are used by the globus-scheduler-event-generator
to monitor a local resource manager or batch system for events. The
globus-scheduler-event-generator-admin
can list, enable, or
disable specific SEG modules. The -h command-line option shows a brief
usage message.
The -l command-line option to
globus-scheduler-event-generator-admin
will cause it to list all
of the SEG modules which are available to be run by the
globus-scheduler-event-generator
. In the output, the service
name will be followed by its status in brackets. Possible status strings
are ENABLED
and DISABLED
.
The -e MODULE command-line option to globus-scheduler-event-generator-admin will cause it to enable the module so that the init script for the globus-scheduler-event-generator will run it.
The following usage statistics are sent by default in a UDP packet (in addition to the GRAM component code, packet version, timestamp, and source IP address) at the end of each job.
GLOBUS_GRAM_PROTOCOL_JOB_STATE_UNSUBMITTED
GLOBUS_GRAM_PROTOCOL_JOB_STATE_FILE_STAGE_IN
GLOBUS_GRAM_PROTOCOL_JOB_STATE_PENDING
GLOBUS_GRAM_PROTOCOL_JOB_STATE_ACTIVE
GLOBUS_GRAM_PROTOCOL_JOB_STATE_FAILED
GLOBUS_GRAM_PROTOCOL_JOB_STATE_FILE_STAGE_OUT
GLOBUS_GRAM_PROTOCOL_JOB_STATE_DONE
count
RSL attribute
globus-gram-job-manager.rvf
The following information can be sent as well in a job status packet but it is not sent unless explicitly enabled by the system administrator:
In addition to job-related status, the job manager sends information periodically about its execution status. The following information is sent by default in a UDP packet (in addition to the GRAM component code, packet version, timestamp, and source IP address) at job manager start and every 1 hour during the job manager lifetime:
Also, please see our policy statement on the collection of usage statistics.