Test Readiness Review and Testing Feedback


Reply to this thread with testing related feedback.

Delivery Effort Stage: 
  • I don't have any information on the PSC test resource (how to access it, what it is, what is already set up on it). Is root access available, and if so, how? Is it running a batch system? I have no starting point at this time.
  • Which parts of the test plan should be run on the test resource? There are a couple of ways to install IPF; do you have a preference? I could do up to 2 if needed.
  • The test plan says, "Ideally, the tests in this section will be performed on all of the target platforms listed above." Without root, I can't do that. But if IPF is already installed on one of them, I'd be happy to look around and try to verify some functionality at the user level. If you know of a system where I could do that, please point it out.

Hi Galen,

Derek said he would set you up with a test resource with slurm logs.  We'll check status today on the call.

Thanks!

Shava

Galen,

I added an account on info.psc.xsede.org for you (arnoldg). You should be able to log in using your PSC (Kerberos) password.

- Derek

I can get into bridges ok with arnoldg and my recently updated psc password, but not info.psc.xsede.org ... "permission denied".  Can you see what's happening?  Is there some propagation delay there on a password reset?

I probably forgot a step - will check and get back to you. - Derek

Galen, please retry your ssh login to info.psc.xsede.org - SSH should let you in now...

Hi Derek,

Can Galen access the Slurm logs from this host?

Thanks!

Shava

Yes, Galen should be able to read the Bridges SLURM logs at /var/log/slurmctld.log

[dsimmel@info ~]$ ls -l /var/log/slurmctld.log

-rw-r--r-- 1 root root 1153204740 Mar  9 19:53 /var/log/slurmctld.log

Thanks!

You should try the pip install method of installation, particularly if you are testing on resources where you don't have root.  You shouldn't need root to test most functionality.

It would be good if you could run through at least a minimal RPM install on a system where you have root (even if it is not a real resource), just to sanity check the RPM.

Ideally, all tests in the main part of the test plan (not the appendix) should be run, though not all are applicable on all resources.  You probably don't need special permissions for anything except possibly looking at slurmctld.log for the job updates flow, or running the workflows as services, which isn't terribly important for testing.

I didn't find any reference in the Test Plan to "service ipf-RESOURCE_NAME-glue2-activity start" ( Install guide test example 3. ).  Is that ok to leave out of this testing?  

Yes, we don't necessarily need to run the workflows as services for this testing plan--it can be left out.

bash-4.2# uname -a
Linux js-17-228.jetstream-cloud.org 3.10.0-1062.12.1.el7.x86_64 #1 SMP Tue Feb 4 23:02:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
bash-4.2# cat /etc/centos-release
CentOS Linux release 7.7.1908 (Core)
bash-4.2# rpm -i http://software.xsede.org/development/repo/repos/XSEDE-Development-config.centos-7-1.noarch.rpm
warning: /var/tmp/rpm-tmp.e1dP6R: Header V3 RSA/SHA256 Signature, key ID 20423dbb: NOKEY
bash-4.2# rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-XSEDE-Development
bash-4.2# yum install ipf-xsede
Loaded plugins: changelog, fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirror.den01.meanservers.net
 * epel: pubmirror2.math.uh.edu
 * extras: centos.den.host-engine.com
 * updates: repos-va.psychz.net
XSEDE-Development                                        | 2.9 kB     00:00    
XSEDE-Development/7/x86_64/primary_db                      | 111 kB   00:00    
Resolving Dependencies
--> Running transaction check
---> Package ipf-xsede.noarch 0:1.5b1-2 will be installed
--> Processing Dependency: python-amqp < 2 for package: ipf-xsede-1.5b1-2.noarch
Package python-amqp is obsoleted by python2-amqp, but obsoleting package does not provide for requirements
--> Processing Dependency: python-setuptools >= 1.4 for package: ipf-xsede-1.5b1-2.noarch
--> Processing Dependency: python-amqp >= 1.4 for package: ipf-xsede-1.5b1-2.noarch
--> Processing Dependency: python3 for package: ipf-xsede-1.5b1-2.noarch
--> Running transaction check
---> Package ipf-xsede.noarch 0:1.5b1-2 will be installed
--> Processing Dependency: python-amqp < 2 for package: ipf-xsede-1.5b1-2.noarch
Package python-amqp is obsoleted by python2-amqp, but obsoleting package does not provide for requirements
--> Processing Dependency: python-setuptools >= 1.4 for package: ipf-xsede-1.5b1-2.noarch
---> Package python2-amqp.noarch 0:2.4.0-1.el7 will be installed
--> Processing Dependency: python2-vine >= 1.1.3 for package: python2-amqp-2.4.0-1.el7.noarch
---> Package python3.x86_64 0:3.6.8-10.el7 will be installed
--> Processing Dependency: python3-libs(x86-64) = 3.6.8-10.el7 for package: python3-3.6.8-10.el7.x86_64
--> Processing Dependency: python3-setuptools for package: python3-3.6.8-10.el7.x86_64
--> Processing Dependency: python3-pip for package: python3-3.6.8-10.el7.x86_64
--> Processing Dependency: libpython3.6m.so.1.0()(64bit) for package: python3-3.6.8-10.el7.x86_64
--> Running transaction check
---> Package ipf-xsede.noarch 0:1.5b1-2 will be installed
--> Processing Dependency: python-amqp < 2 for package: ipf-xsede-1.5b1-2.noarch
Package python-amqp is obsoleted by python2-amqp, but obsoleting package does not provide for requirements
--> Processing Dependency: python-setuptools >= 1.4 for package: ipf-xsede-1.5b1-2.noarch
---> Package python2-vine.noarch 0:1.2.0-1.el7 will be installed
---> Package python3-libs.x86_64 0:3.6.8-10.el7 will be installed
---> Package python3-pip.noarch 0:9.0.3-5.el7 will be installed
---> Package python3-setuptools.noarch 0:39.2.0-10.el7 will be installed
--> Finished Dependency Resolution
Error: Package: ipf-xsede-1.5b1-2.noarch (XSEDE-Development)
           Requires: python-amqp < 2
           Available: python-amqp-1.4.9-1.el7.noarch (XSEDE-Development)
               python-amqp = 1.4.9-1.el7
           Installing: python2-amqp-2.4.0-1.el7.noarch (epel)
               python-amqp = 2.4.0-1.el7
Error: Package: ipf-xsede-1.5b1-2.noarch (XSEDE-Development)
           Requires: python-setuptools >= 1.4
           Installed: python-setuptools-0.9.8-7.el7.noarch (@base)
               python-setuptools = 0.9.8-7.el7
Error: Package: ipf-xsede-1.5b1-2.noarch (XSEDE-Development)
           Requires: python-amqp < 2
           Available: python-amqp-1.4.9-1.el7.noarch (XSEDE-Development)
               python-amqp = 1.4.9-1.el7
           Available: python2-amqp-2.4.0-1.el7.noarch (epel)
               python-amqp = 2.4.0-1.el7
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest
bash-4.2# date
Mon Mar  9 15:30:08 CDT 2020
bash-4.2# exit
exit
Script done on Mon 09 Mar 2020 03:30:22 PM CDT

Well, that's unfortunate.  I'll look into what needs doing w/r/t the python amqp library and rpms.
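
To make the yum errors above concrete: ipf-xsede 1.5b1-2 requires python-amqp >= 1.4 and < 2, plus python-setuptools >= 1.4. The log shows yum selecting EPEL's python2-amqp 2.4.0 (which obsoletes python-amqp), and the installed python-setuptools is 0.9.8. Below is a minimal sketch of that comparison. It assumes plain dotted numeric versions and is not RPM's real version-comparison algorithm; `parse_version` and `satisfies` are hypothetical helper names.

```python
import operator

# Map the comparison operators used in the yum output to functions.
_OPS = {"<": operator.lt, "<=": operator.le,
        ">": operator.gt, ">=": operator.ge}


def parse_version(version: str) -> tuple:
    """Turn '2.4.0' into (2, 4, 0). Simplification: digits and dots only."""
    return tuple(int(part) for part in version.split("."))


def satisfies(candidate: str, op: str, required: str) -> bool:
    """True if `candidate` meets the constraint `op required`."""
    return _OPS[op](parse_version(candidate), parse_version(required))


# Versions and constraints taken from the yum transcript above.
checks = [
    ("python-amqp (XSEDE-Development)", "1.4.9", [(">=", "1.4"), ("<", "2")]),
    ("python2-amqp (epel)", "2.4.0", [(">=", "1.4"), ("<", "2")]),
    ("python-setuptools (@base)", "0.9.8", [(">=", "1.4")]),
]
for name, candidate, constraints in checks:
    ok = all(satisfies(candidate, op, req) for op, req in constraints)
    print(f"{name} {candidate}: {'ok' if ok else 'conflict'}")
```

Under these assumptions only the 1.4.9 package passes, which matches the two distinct conflicts yum reports.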

 
...is missing the *.tgz bundle to use with the "pip install ipf" method as a user on an XSEDE login node.  That's also the directory for /latest/.
 
"3) Install the XSEDE version of the Information Publishing Framework (ipf-xsede):
   a) Download the .tgz file from http://software.xsede.org/production/ipf/ipf-xsede/latest/ "

"latest" is a symlink that always points to the most recent upload.

I've uploaded the tarball--though it shouldn't be necessary for a pip install:  ipf-1.5b1-2 has been uploaded to pypi.org so pip should get it from there.

ipf and amqp installed fine with pip, but without the tar bundle, where are the scripts for $INSTALL_DIR/ipf-VERSION/?  If you show me how to proceed without the tar bundle using just pip, I'll give it a try (and you can fix the install guide...).

Yes, this will obviously need to go in the install guide as well.

Your venv location /lib/python-3.x/site-packages/ will have an "ipf" directory which is the base of the ipf installation.  It will also have an "etc" directory that contains the workflow json directories, and the venv will have a "bin" that contains "ipf_configure_xsede" and "ipf_workflow"

 

 

awesome !
 
[arnoldg@login018 .local]$ ls -R `find . -name etc -print`
./lib/python3.6/site-packages/etc:
ipf
./lib/python3.6/site-packages/etc/ipf:
init.d  logging.conf  workflow  xsede
./lib/python3.6/site-packages/etc/ipf/init.d:
ipf-WORKFLOW
./lib/python3.6/site-packages/etc/ipf/workflow:
glue2  ipfinfo.json  ipfinfo_publish_periodic.json  sysinfo.json  sysinfo_publish.json  sysinfo_publish_periodic.json
./lib/python3.6/site-packages/etc/ipf/workflow/glue2:
templates
./lib/python3.6/site-packages/etc/ipf/workflow/glue2/templates:
abstractservice.json       extmodulesremote.json  modules.json            serviceremotepublish.json  slurm_compute.json
catalina_pbs_compute.json  ipfinfo_publish.json   openstack_compute.json  sge_activity.json
condor_compute.json        lmod.json              pbs_activity.json       sge_compute.json
extmodules.json            moab_pbs_compute.json  pbs_compute.json        slurm_activity.json
./lib/python3.6/site-packages/etc/ipf/xsede:
ca_certs.pem
[arnoldg@login018 .local]$ ls bin
ipf_configure_xsede  ipf_workflow
[arnoldg@login018 .local]$

For the install (pip as user), I'm stuck at 4) d), trying to run "ipf_workflow sysinfo.json".  I can't find a sysinfo.json file under ~/.local/lib/python3.6/site-packages/ipf, and ipf_workflow is up a few levels at ~/.local/bin/.  How should I proceed?

This is not in the test plan yet, just the ipf INSTALL.md .

Yes, there are some unresolved pip install issues that I'm now working on.  Since the tarball is now in place, could you try a tarball install so that you won't be blocked on me and pip install?

OK, the key to running the sysinfo workflow from a pip install is:

You need to set the env variable IPF_ETC_PATH, and it needs to look like:

/Users/blau/ipf-1.5-release/venv/lib/python3.6/site-packages/etc/ipf/

My sysinfo.json is in ./lib/python3.6/site-packages/etc/ipf/workflow/sysinfo.json

It appears that pip is putting the etc directory not in ./lib/python3.6/site-packages/ipf/etc in the venv but directly in ./lib/python3.6/site-packages/etc/, which is confusing some things, but it should be workable with the right env variables.
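
For anyone unsure where pip placed the etc directory, here is a minimal sketch that searches the current interpreter's site-packages directories (venv, system, and "pip install --user") for etc/ipf/workflow/sysinfo.json and prints a matching IPF_ETC_PATH setting. The helper names `candidate_site_dirs` and `find_ipf_etc` are hypothetical, and the layout is assumed from the directory listings in this thread rather than from IPF documentation.

```python
import site
from pathlib import Path
from typing import Iterable, List, Optional


def candidate_site_dirs() -> List[Path]:
    """Collect site-packages directories known to this interpreter."""
    dirs = []
    # getsitepackages() returns a list (venv or system install);
    # getusersitepackages() returns one string (~/.local/... for --user).
    for getter_name in ("getsitepackages", "getusersitepackages"):
        getter = getattr(site, getter_name, None)
        if getter is None:  # some older virtualenvs lack these functions
            continue
        result = getter()
        dirs.extend([result] if isinstance(result, str) else result)
    return [Path(d) for d in dirs]


def find_ipf_etc(site_dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first <site-packages>/etc/ipf containing workflow/sysinfo.json."""
    for base in site_dirs:
        etc_ipf = base / "etc" / "ipf"
        if (etc_ipf / "workflow" / "sysinfo.json").is_file():
            return etc_ipf
    return None


found = find_ipf_etc(candidate_site_dirs())
if found is None:
    print("no etc/ipf directory found under site-packages")
else:
    # Trailing slash mirrors the example path given earlier in the thread.
    print(f"export IPF_ETC_PATH={found}/")
```

Run it with the same python that pip used for the install, then paste the printed line into the shell before calling ipf_workflow.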

 

Still not finding the latest tar (.tgz) bundle in development/ipf/ipf-xsede/latest ...  Do you recall where it was uploaded (URL)?

Sorry, I didn't see this on Friday afternoon.  

The tarball is now definitively in place at https://software.xsede.org/development/ipf/ipf-xsede/ipf-xsede-1.5b1-2/i...

Cool.  If I get to look at that later in the week, I'll let you know how it goes.   I plan to proceed with just the tarball since I don't have all of the documentation for what happens with the pip install ( what environment vars to set and where ).  I did not install into a venv but into my own $HOME/.local : pip install --user ... That would seem to be a common practice.

Interesting.  That's actually a scenario that I wasn't considering (pip install --user).  I'll have to take a look at it myself.

I've got a cert with "myproxy-logon -s myproxy.xsede.org".  I don't have my notes on hand from the last time I tested IPF (stranded on my office machine and backups).  Can somebody remind me how to respond to the "certificate" prompt below?  I think the key is the file in /tmp/x509up... ?

 

bin/ipf_configure_xsede

...

 

Will you authenticate using an X.509 certificate and key or a username and password? (X.509):
  (1) X.509
  (2) username/password
: 1

Where is your certificate? (/etc/grid-security/xdinfo-hostcert.pem):

Where is your key? (/etc/grid-security/xdinfo-hostkey.pem): /tmp/x509up_u19475

If I recall correctly, myproxy-logon doesn't retrieve your key--if you don't already have a copy of it locally you should be able to use myproxy-retrieve to get it

 

http://grid.ncsa.illinois.edu/myproxy/man/myproxy-retrieve.1.html

It's been a while since I've used myproxy, though.

I'm still blocked at this step and unsure how to proceed.

Oh, wait, the proxy contains both a cert and a key (for the limited time proxy).  Try specifying the same file for both cert and key.
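
A quick way to confirm that a proxy file really holds both pieces before pointing both prompts at it is to look for the PEM block headers. This is a minimal sketch that only inspects the "-----BEGIN ...-----" labels and does not validate the certificate or key; `pem_labels` and `has_cert_and_key` are hypothetical helper names.

```python
import re

# PEM blocks start with a line like "-----BEGIN CERTIFICATE-----".
_PEM_BEGIN = re.compile(r"^-----BEGIN ([A-Z0-9 ]+)-----\s*$", re.MULTILINE)


def pem_labels(text: str) -> list:
    """Return the labels of all PEM blocks found in the text, in order."""
    return _PEM_BEGIN.findall(text)


def has_cert_and_key(text: str) -> bool:
    """True if the text holds at least one certificate and one private key block."""
    labels = pem_labels(text)
    has_cert = "CERTIFICATE" in labels
    # Covers "PRIVATE KEY", "RSA PRIVATE KEY", "EC PRIVATE KEY", and so on.
    has_key = any(label.endswith("PRIVATE KEY") for label in labels)
    return has_cert and has_key
```

For example, `has_cert_and_key(open("/tmp/x509up_u19475").read())` should return True if the same file can serve as both the certificate and the key.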

That sounds familiar...proceeding that way...

 

Note: since I'm doing a user-side test with a pip install (not root), the "service ... start" commands in INSTALL.md test steps 1-3 do not apply.  I've done what I can in INSTALL.md.

ipf_configure_xsede seemed to run ok and I have the .json files it created as expected.

If there are no objections, I'll continue to the test plan.

Sounds good to me.

Eric, I'll ping you via email with the first partial test log.  It's close to working for me, but something is missing.

7.3 not working...

/ipf_workflow glue2/bridges_extmodules.json

...

2020-03-19 12:08:37,084 - ipf.ipfinfo.IPFInformationStep - DEBUG - step-5 - sending output data None of type ipf.ipfinfo.IPFInformation to step step-2
Process ExtendedModApplicationsStep-116:
Traceback (most recent call last):
  File "/opt/packages/python/gnu_openmpi/3.6.4_np1.14.5/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/arnoldg/xci-677/ipf-1.5b1/ipf/glue2/application.py", line 249, in run
    self._output(self._run())
  File "/home/arnoldg/xci-677/ipf-1.5b1/ipf/glue2/modules.py", line 260, in _run
    self._addPath(path, path, module_paths, apps)
  File "/home/arnoldg/xci-677/ipf-1.5b1/ipf/glue2/modules.py", line 274, in _addPath
    module_path, module_paths, apps)
  File "/home/arnoldg/xci-677/ipf-1.5b1/ipf/glue2/modules.py", line 274, in _addPath
    module_path, module_paths, apps)
  File "/home/arnoldg/xci-677/ipf-1.5b1/ipf/glue2/modules.py", line 288, in _addPath
    self._addModule(os.path.join(path, file), name, file, apps)
  File "/home/arnoldg/xci-677/ipf-1.5b1/ipf/glue2/modules.py", line 306, in _addModule
    pathhashobject = hashlib.md5(path)
TypeError: Unicode-objects must be encoded before hashing
2020-03-19 12:08:37,175 - ipf.engine - DEBUG - no more inputs to step step-2
2020-03-19 12:08:37,176 - ipf.engine - DEBUG - no more inputs to step step-3
2020-03-19 12:08:37,176 - ipf.engine - DEBUG - no more inputs to step step-4
2020-03-19 12:08:37,177 - ipf.engine - DEBUG - no more inputs to step step-5
2020-03-19 12:08:37,177 - ipf.engine - DEBUG - no more inputs to step step-6
2020-03-19 12:08:37,277 - ipf.engine - ERROR - workflow failed
2020-03-19 12:08:37,278 - ipf.engine - INFO -       step-1 succeeded (ResourceNameStep)
2020-03-19 12:08:37,278 - ipf.engine - ERROR -      step-2 failed    (ExtendedModApplicationsStep)

 

Also, the other window fails :

[arnoldg@login005 test]$ python subscribe_amqp.py -k $PROXY -c $PROXY -a $IPF/et/ipf/xsede/ca_certs.pem -s info1.dyn.xsede.org -v xsede -e glue2.applications -f"*.*.bridges.psc.xsede"
connecting to info1.dyn.xsede.org:5671 with certificate and key
Traceback (most recent call last):
  File "subscribe_amqp.py", line 136, in <module>
    conn = connect(options)
  File "subscribe_amqp.py", line 78, in connect
    heartbeat=60)
  File "/home/arnoldg/.local/lib/python3.6/site-packages/amqp/connection.py", line 165, in __init__
    self.transport = self.Transport(host, connect_timeout, ssl)
  File "/home/arnoldg/.local/lib/python3.6/site-packages/amqp/connection.py", line 186, in Transport
    return create_transport(host, connect_timeout, ssl)
  File "/home/arnoldg/.local/lib/python3.6/site-packages/amqp/transport.py", line 297, in create_transport
    return SSLTransport(host, connect_timeout, ssl)
  File "/home/arnoldg/.local/lib/python3.6/site-packages/amqp/transport.py", line 199, in __init__
    super(SSLTransport, self).__init__(host, connect_timeout)
  File "/home/arnoldg/.local/lib/python3.6/site-packages/amqp/transport.py", line 95, in __init__
    raise socket.error(last_err)
OSError: [Errno 110] Connection timed out

 

 

 

Hmm, the first looks like a python3 related change that slipped through.  I'll fix it and we can have you test a patch to confirm before creating another rpm/tarball/pip revision.
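
For reference, the TypeError in the traceback comes from passing a str to hashlib.md5, which on Python 3 accepts only bytes-like input. A minimal sketch of the usual remedy, encoding the path first, is below. This is not the actual IPF patch: `path_digest` is a hypothetical helper, and both the UTF-8 encoding and the use of hexdigest() are assumptions.

```python
import hashlib


def path_digest(path) -> str:
    """Hex MD5 of a path, accepting either str or bytes."""
    if isinstance(path, str):
        # hashlib on Python 3 requires bytes, so encode text first.
        path = path.encode("utf-8")
    return hashlib.md5(path).hexdigest()
```

Accepting both types keeps the same call working whether the module path arrives as text or as bytes.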

 

Less sure off the cuff about the second error

[arnoldg@login006 test]$ python -m pdb subscribe_amqp.py -k $PROXY -c $PROXY -a $IPF/etc/ipf/xsede/ca_certs.pem -s info1.dyn.xsede.org -v xsede -e glue2.applications -f "*.*.bridges.psc.xsede"
> /home/arnoldg/xci-677/ipf-1.5b1/ipf/xsede/test/subscribe_amqp.py(3)<module>()
-> import getpass
(Pdb) c
connecting to info1.dyn.xsede.org:5671 with certificate and key
Traceback (most recent call last):
  File "/opt/packages/python/gnu_openmpi/3.6.4_np1.14.5/lib/python3.6/pdb.py", line 1667, in main
    pdb._runscript(mainpyfile)
  File "/opt/packages/python/gnu_openmpi/3.6.4_np1.14.5/lib/python3.6/pdb.py", line 1548, in _runscript
    self.run(statement)
  File "/opt/packages/python/gnu_openmpi/3.6.4_np1.14.5/lib/python3.6/bdb.py", line 431, in run
    exec(cmd, globals, locals)
  File "<string>", line 1, in <module>
  File "/home/arnoldg/xci-677/ipf-1.5b1/ipf/xsede/test/subscribe_amqp.py", line 3, in <module>
    import getpass
  File "/home/arnoldg/xci-677/ipf-1.5b1/ipf/xsede/test/subscribe_amqp.py", line 78, in connect
    heartbeat=60)
  File "/home/arnoldg/.local/lib/python3.6/site-packages/amqp/connection.py", line 165, in __init__
    self.transport = self.Transport(host, connect_timeout, ssl)
  File "/home/arnoldg/.local/lib/python3.6/site-packages/amqp/connection.py", line 186, in Transport
    return create_transport(host, connect_timeout, ssl)
  File "/home/arnoldg/.local/lib/python3.6/site-packages/amqp/transport.py", line 297, in create_transport
    return SSLTransport(host, connect_timeout, ssl)
  File "/home/arnoldg/.local/lib/python3.6/site-packages/amqp/transport.py", line 199, in __init__
    super(SSLTransport, self).__init__(host, connect_timeout)
  File "/home/arnoldg/.local/lib/python3.6/site-packages/amqp/transport.py", line 95, in __init__
    raise socket.error(last_err)
OSError: [Errno 110] Connection timed out
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> /home/arnoldg/.local/lib/python3.6/site-packages/amqp/transport.py(95)__init__()
-> raise socket.error(last_err)
(Pdb) p host
'info1.dyn.xsede.org'
(Pdb) p connect_timeout
None
(Pdb) p ssl
<module 'ssl' from '/opt/packages/python/gnu_openmpi/3.6.4_np1.14.5/lib/python3.6/ssl.py'>
(Pdb)
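
Since the pdb session shows connect_timeout is None and the failure is Errno 110, the timeout comes from the TCP connect itself, before any certificate exchange. One possible cause is that outbound traffic to port 5671 is filtered from the login node; that is an assumption, and nothing in this thread confirms it. A minimal sketch to test plain TCP reachability with an explicit timeout follows; `can_connect` is a hypothetical helper name.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a plain TCP connection to host:port opens within timeout."""
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `can_connect("info1.dyn.xsede.org", 5671)` from the login node and from another network would help separate a network or firewall problem from a certificate problem.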

 

Contact Eric Blau <blau@mcs.anl.gov> for assistance in testing

Eric, I'll forward you a couple of things I ran into during the install of 1.5b2 directly, instead of making more of a mess here.
