dhurba
Regular ATK user

Reputation: 0
Offline
 India
Posts: 48
« on: July 10, 2012, 12:49 »
Hi, I have connected 4 servers (1 is the master, where the license is installed) using the procedure given in the Parallel Guide tutorial. We have used MPICH2.
Now, when we use the command mpdtrace, it gives "command not found".
Also, the command lmxendutil -licstat shows that there is no license server available.
ATK 11.8.0

Anders Blom
« Reply #1 on: July 12, 2012, 14:34 »
You will need to troubleshoot the license server first. Is it really running? Is it properly configured? Try running lmxendutil -licstat on the license server itself to see this, and also check the license server log files for hints that something might be wrong.
Next, you must have UDP and TCP/IP port 6200 open on both the license server and your machine in order to see the license across the network. This also assumes the machines are on the same network segment; otherwise you need to specify the IP address of the license server manually via the environment variable QUANTUM_LICENSE_PATH, as described in the installation guide.
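One way to check the TCP half of this from a slave node is to try opening a connection to port 6200 on the license server. Below is a minimal Python sketch (Python being what atkpython scripts are written in); the helper name `tcp_port_open` is hypothetical. It cannot test the UDP side, because UDP is connectionless and a closed UDP port can look the same as a dropped packet.

```python
import socket

# Port the license server listens on, as stated in this thread.
LICENSE_PORT = 6200


def tcp_port_open(host: str, port: int = LICENSE_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: it only checks TCP reachability. It does not speak the
    license server protocol and cannot check UDP.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable or unresolvable all end up here.
        return False
```

Run on each slave, `tcp_port_open("10.0.0.1")` returning False would point at a firewall or routing problem between that node and the license server rather than at the license itself.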
dhurba
« Reply #2 on: July 13, 2012, 07:08 »
Please see the screenshots of lmxendutil and lmx-serv, both taken on the server itself.
dhurba
« Reply #3 on: July 13, 2012, 08:16 »
The screenshot from the slaves.
kstokbro
« Reply #4 on: July 13, 2012, 08:25 »
The slaves do not seem to have a connection to the license server. What is the IP address of the license server? Check whether you can ping the license server from the slaves:
[code]ping IP[/code]
Somehow there must be a problem with the network connection between the slaves and the license server.
dhurba
« Reply #5 on: July 13, 2012, 08:47 »
The IP address of the license server is 10.0.0.1.
The slaves are 10.0.0.2, 10.0.0.4 and 10.0.0.5.
ping 10.0.0.1 gets a reply on all 3 slaves.
Nordland
« Reply #6 on: July 13, 2012, 09:10 »
Is the correct port open?
If yes, try disabling auto-discovery and entering the IP address manually, as described in the manual.
Anders Blom
« Reply #7 on: July 13, 2012, 12:19 »
That is, run
[code]export QUANTUM_LICENSE_PATH=6200@10.0.0.1
lmxendutil -licstat[/code]
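The same check can be scripted, for example to repeat it on every slave. This is a sketch under a few assumptions: `lmxendutil` must be on the PATH of the node where it runs, `license_env` and `check_license` are hypothetical helper names, and the `port@host` value format is taken from the reply above rather than from the installation guide.

```python
import os
import subprocess


def license_env(host, port=6200, base=None):
    """Return a copy of an environment with QUANTUM_LICENSE_PATH set to 'port@host'."""
    env = dict(os.environ if base is None else base)
    env["QUANTUM_LICENSE_PATH"] = f"{port}@{host}"
    return env


def check_license(host, port=6200):
    """Run `lmxendutil -licstat` against an explicitly given license server.

    Assumes lmxendutil is installed and on PATH; returns the tool's exit code.
    """
    result = subprocess.run(["lmxendutil", "-licstat"], env=license_env(host, port))
    return result.returncode
```

Usage would be `check_license("10.0.0.1")`. Passing `env=` only affects that one child process; it does not change the environment of the shell the script is started from.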
dhurba
« Reply #8 on: July 16, 2012, 06:36 »
Thanks, it worked. But next, when I run the test script given in the Parallel Guide,
[code]mpiexec -n 4 /opt/QuantumWise/atk-12.2.0/atkpython/bin/atkpython /home/user/test_mpi.py > /home/user/test_mpi.log[/code]
it returns
[code]ssh:vlsi1: Temporary failure in name resolution[/code]
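This ssh message means the host name vlsi1 could not be translated into an IP address on the machine that started ssh. A minimal Python sketch for checking this before calling mpiexec: it tries to resolve each host name (the list you pass in is hypothetical and would be whatever names your MPI setup uses) and reports the ones that fail. Numeric addresses such as 10.0.0.4 always pass, because they need no lookup.

```python
import socket


def unresolvable_hosts(hostnames):
    """Return the host names that this machine cannot resolve to an IPv4 address."""
    failed = []
    for name in hostnames:
        try:
            # Numeric IPv4 strings are returned unchanged without a DNS query.
            socket.gethostbyname(name)
        except socket.gaierror:
            failed.append(name)
    return failed
```

On a node affected by the error above, `unresolvable_hosts(["vlsi1", "10.0.0.4"])` would be expected to return `["vlsi1"]`. The usual remedies are adding the names to /etc/hosts on every node or referring to the nodes by IP address.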
dhurba
« Reply #9 on: July 17, 2012, 08:08 »
When we use the command
[code]mpiexec -n 4 /opt/QuantumWise/atk-12.2.0/atkpython/bin/atkpython /home/user/my_script.py > /home/user/my_script.log[/code]
to submit a job (my_script.py), the simulation hangs after some time, indicating that the slaves are not working. What might be the cause?
Anders Blom
« Reply #10 on: July 17, 2012, 11:01 »
It's very difficult to say; it seems like a network problem. The error message is not printed by ATK but by ssh, which shows that it has problems connecting to the slave nodes. Check that mpd is properly set up, and make sure the "test_mpi.py" script finishes properly before moving on to more complex scripts.
dhurba
« Reply #11 on: July 17, 2012, 11:23 »
When I changed the host name to the IP address (for example 10.0.0.4), the message stopped appearing.
Now mpiexec run on 10.0.0.4 gives the following log file:
Nordland
« Reply #12 on: July 17, 2012, 11:59 »
Quote from: dhurba
When I changed the host name to the IP address (for example 10.0.0.4), the message stopped appearing.
Now mpiexec run on 10.0.0.4 gives the following log file:

I think you set up your machine file such that it sends the job to other machines instead of running 4 processes on the same machine.
dhurba
« Reply #13 on: July 18, 2012, 07:30 »
Quote from: Anders Blom
It's very difficult to say; it seems like a network problem. The error message is not printed by ATK but by ssh, which shows that it has problems connecting to the slave nodes. Check that mpd is properly set up, and make sure the "test_mpi.py" script finishes properly before moving on to more complex scripts.

Yes, when I type 'mpdtrace' or 'mpdboot', it gives "command not found". Doesn't MPD get installed automatically with MPICH2? The installation procedure we got does not mention mpd at all; it only shows how to install MPICH2 by unpacking it in the usr folder, running ./configure, make and make install, and editing the bash settings.
Anders Blom
« Reply #14 on: July 18, 2012, 12:25 »
No, as I recall, mpd is no longer installed with MPICH2. The default process manager is now Hydra, which doesn't need the mpd* commands.
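To see which process-manager commands an MPICH2 installation actually provides on a node, one can look them up on the PATH. A sketch using Python's `shutil.which`; the command list is an assumption (the mpd tools named in this thread plus `mpiexec` and `mpiexec.hydra`, the latter assumed to be the Hydra launcher's binary name in recent MPICH2 builds), so adjust it to what your build installs.

```python
import shutil

# Commands discussed in this thread; mpiexec.hydra is assumed to be the name
# of the Hydra launcher binary in recent MPICH2 builds.
COMMANDS = ["mpdtrace", "mpdboot", "mpiexec", "mpiexec.hydra"]


def locate(commands=COMMANDS):
    """Map each command name to its full path on PATH, or None if it is missing."""
    return {cmd: shutil.which(cmd) for cmd in commands}
```

On the setup described here one would expect `mpdtrace` and `mpdboot` to map to None, matching the "command not found" errors, while `mpiexec` should be found.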
That said, your parallel environment seems OK. However, all MPI processes run on the same node, since you have not told MPICH2 which machines you want to run on. For that you need a machine file (see the MPICH2 documentation).
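As we understand the MPICH2 Hydra documentation, a machine file is a plain text file with one host per line, optionally followed by `:N` for the number of processes to place on that host. A minimal Python sketch that builds such a file for the four nodes in this thread (master 10.0.0.1, slaves 10.0.0.2, 10.0.0.4, 10.0.0.5); the helper names, the file name and the one-process-per-node choice are assumptions. The file would then be passed to mpiexec with `-f`, for example `mpiexec -f machinefile -n 4 /opt/QuantumWise/atk-12.2.0/atkpython/bin/atkpython /home/user/test_mpi.py`, which should place one process on each node; please verify against the MPICH2 documentation.

```python
from pathlib import Path

# Nodes from this thread: the master/license server and the three slaves.
NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.4", "10.0.0.5"]


def machinefile_text(hosts, procs_per_host=1):
    """Build Hydra machine-file contents: one 'host:N' entry per line."""
    if procs_per_host < 1:
        raise ValueError("procs_per_host must be at least 1")
    return "".join(f"{host}:{procs_per_host}\n" for host in hosts)


def write_machinefile(path, hosts, procs_per_host=1):
    """Write the machine file to `path` and return it as a Path object."""
    target = Path(path)
    target.write_text(machinefile_text(hosts, procs_per_host))
    return target
```

Listing the nodes by IP address rather than by host name also sidesteps the "Temporary failure in name resolution" problem reported earlier in the thread.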