Matthew Lowe
2010-06-22 10:25:05 UTC
I'm receiving a connection timed out error while trying to boot from a SRP target. The same SRP target works perfectly under a locally running OS (tested in Linux and Windows).
My setup includes one Linux box connected in a back-to-back configuration to a desktop. My goal is to boot the desktop from the Linux server's fast raid array. Each computer has a Mellanox MHES18-XTC (MT25204) card in it, which is running the latest firmware. I've followed the instructions for SRP booting in the Wiki. I think my problem is related to the arbel driver.
On boot gPXE loads fine, it loads the arbel driver, it detects the card, it obtains an IP address from the DHCP server, gets the root-path string and then attempts a sanboot. This is where it times out.
To troubleshoot I compiled gPXE with DEBUG=srp:3 and only got more frequent reporting of the timeout error. The following set of messages repeat several times:
SRP 0x1d9d4 TX request tag 00000000000000000000001b SRP 0x1d9d4 socket closed: Connection timed out (0x4c206035)
The Linux box is a little more helpful. As we can see below, the connection is established just fine, but eventually times out on this end too.
Any help would be greatly appreciated.
- Matt
[24923.754700] [19451]: scst: scst_register_target:469:Target ib_srpt_target_4 (rel ID 5) for template ib_srpt registered successfully
[24955.191467] ib_srpt: Received SRP_LOGIN_REQ with i_port_id 0x0:0x2c9020021b829, t_port_id 0x2c9020021b8d0:0x2c9020021b8d0 and it_iu_len 80 on port 1 (guid=0xfe80000000000000:0x2c9020021b8d1)
[24955.192272] [1207]: scst: scst_init_session:6166:Using security group "Default" for initiator "0x00000000000000000002c9020021b829"
[24955.192282] [1207]: scst_alloc_add_tgt_dev:2976:Device vdisk0 on SCST lun=0
[24955.192290] [1207]: scst_alloc_set_UA:6004:Adding new UA to tgt_dev ffff8800d77928a0
[24955.192537] [19457]: scst: scst_cmd_thread:4171:Processing thread vdisk00_0 (PID 19457) started
[24955.192546] [19457]: scst_cmd_thread:4192:Alloced new IO context ffff88004b1bfbd0 (p_cmd_threads ffff8800d7792990)
[24955.192573] [19458]: scst: scst_cmd_thread:4171:Processing thread vdisk00_1 (PID 19458) started
[24955.192576] [19458]: scst_cmd_thread:4211:Linked IO context ffff88004b1bfbd0 (p_cmd_threads ffff8800d7792990)
[24955.192588] [19459]: scst: scst_cmd_thread:4171:Processing thread vdisk00_2 (PID 19459) started
[24955.192591] [19459]: scst_cmd_thread:4211:Linked IO context ffff88004b1bfbd0 (p_cmd_threads ffff8800d7792990)
[24955.192599] [19460]: scst: scst_cmd_thread:4171:Processing thread vdisk00_3 (PID 19460) started
[24955.192601] [19460]: scst_cmd_thread:4211:Linked IO context ffff88004b1bfbd0 (p_cmd_threads ffff8800d7792990)
[24955.192612] [19461]: scst: scst_cmd_thread:4171:Processing thread vdisk00_4 (PID 19461) started
[24955.192615] [19461]: scst_cmd_thread:4211:Linked IO context ffff88004b1bfbd0 (p_cmd_threads ffff8800d7792990)
[24955.192622] [19462]: scst: scst_cmd_thread:4171:Processing thread vdisk00_5 (PID 19462) started
[24955.192625] [19462]: scst_cmd_thread:4211:Linked IO context ffff88004b1bfbd0 (p_cmd_threads ffff8800d7792990)
[24955.192637] [19463]: scst: scst_cmd_thread:4171:Processing thread vdisk00_6 (PID 19463) started
[24955.192641] [19463]: scst_cmd_thread:4211:Linked IO context ffff88004b1bfbd0 (p_cmd_threads ffff8800d7792990)
[24955.192659] [19464]: scst: scst_cmd_thread:4171:Processing thread vdisk00_7 (PID 19464) started
[24955.192663] [19464]: scst_cmd_thread:4211:Linked IO context ffff88004b1bfbd0 (p_cmd_threads ffff8800d7792990)
[24967.090591] ib_srpt: Received SRP_LOGIN_REQ with i_port_id 0x0:0x2c9020021b829, t_port_id 0x2c9020021b8d0:0x2c9020021b8d0 and it_iu_len 80 on port 1 (guid=0xfe80000000000000:0x2c9020021b8d1)
[24967.090601] [1207]: scst_unregister_session:6380:Unregistering session ffff880015b0c9c0 (wait 0)
[24967.090610] [1207]: scst_rx_mgmt_fn:5907:TM fn 10
[24967.090614] [1207]: scst_rx_mgmt_fn:5917:sess=ffff880015b0c9c0, tag_set 0, tag 0, lun_set 1, lun=0, cmd_sn_set 0, cmd_sn 0, priv (null)
[24967.090622] [1207]: scst_post_rx_mgmt_cmd:5836:Adding mgmt cmd ffff8801154a8958 to active mgmt cmd list
[24967.090630] ib_srpt: ***ERROR***: rejected SRP_LOGIN_REQ because another login request is being processed.
[24967.090734] [4319]: scst_tm_thread:5722:Deleting mgmt cmd ffff8801154a8958 from active cmd list
[24967.090738] [4319]: scst_abort_all_nexus_loss_sess:5280:Nexus loss for sess ffff880015b0c9c0 (mcmd ffff8801154a8958)
[24967.090742] [4319]: scst_call_dev_task_mgmt_fn:4540:Calling dev handler vdisk_fileio task_mgmt_fn(fn=10)
[24967.090745] [4319]: scst_call_dev_task_mgmt_fn:4544:Dev handler vdisk_fileio task_mgmt_fn() returned 1
My setup includes one Linux box connected in a back-to-back configuration to a desktop. My goal is to boot the desktop from the Linux server's fast raid array. Each computer has a Mellanox MHES18-XTC (MT25204) card in it, which is running the latest firmware. I've followed the instructions for SRP booting in the Wiki. I think my problem is related to the arbel driver.
On boot gPXE loads fine, it loads the arbel driver, it detects the card, it obtains an IP address from the DHCP server, gets the root-path string and then attempts a sanboot. This is where it times out.
To troubleshoot I compiled gPXE with DEBUG=srp:3 and only got more frequent reporting of the timeout error. The following set of messages repeat several times:
SRP 0x1d9d4 TX request tag 00000000000000000000001b SRP 0x1d9d4 socket closed: Connection timed out (0x4c206035)
The Linux box is a little more helpful. As we can see below, the connection is established just fine, but eventually times out on this end too.
Any help would be greatly appreciated.
- Matt
[24923.754700] [19451]: scst: scst_register_target:469:Target ib_srpt_target_4 (rel ID 5) for template ib_srpt registered successfully
[24955.191467] ib_srpt: Received SRP_LOGIN_REQ with i_port_id 0x0:0x2c9020021b829, t_port_id 0x2c9020021b8d0:0x2c9020021b8d0 and it_iu_len 80 on port 1 (guid=0xfe80000000000000:0x2c9020021b8d1)
[24955.192272] [1207]: scst: scst_init_session:6166:Using security group "Default" for initiator "0x00000000000000000002c9020021b829"
[24955.192282] [1207]: scst_alloc_add_tgt_dev:2976:Device vdisk0 on SCST lun=0
[24955.192290] [1207]: scst_alloc_set_UA:6004:Adding new UA to tgt_dev ffff8800d77928a0
[24955.192537] [19457]: scst: scst_cmd_thread:4171:Processing thread vdisk00_0 (PID 19457) started
[24955.192546] [19457]: scst_cmd_thread:4192:Alloced new IO context ffff88004b1bfbd0 (p_cmd_threads ffff8800d7792990)
[24955.192573] [19458]: scst: scst_cmd_thread:4171:Processing thread vdisk00_1 (PID 19458) started
[24955.192576] [19458]: scst_cmd_thread:4211:Linked IO context ffff88004b1bfbd0 (p_cmd_threads ffff8800d7792990)
[24955.192588] [19459]: scst: scst_cmd_thread:4171:Processing thread vdisk00_2 (PID 19459) started
[24955.192591] [19459]: scst_cmd_thread:4211:Linked IO context ffff88004b1bfbd0 (p_cmd_threads ffff8800d7792990)
[24955.192599] [19460]: scst: scst_cmd_thread:4171:Processing thread vdisk00_3 (PID 19460) started
[24955.192601] [19460]: scst_cmd_thread:4211:Linked IO context ffff88004b1bfbd0 (p_cmd_threads ffff8800d7792990)
[24955.192612] [19461]: scst: scst_cmd_thread:4171:Processing thread vdisk00_4 (PID 19461) started
[24955.192615] [19461]: scst_cmd_thread:4211:Linked IO context ffff88004b1bfbd0 (p_cmd_threads ffff8800d7792990)
[24955.192622] [19462]: scst: scst_cmd_thread:4171:Processing thread vdisk00_5 (PID 19462) started
[24955.192625] [19462]: scst_cmd_thread:4211:Linked IO context ffff88004b1bfbd0 (p_cmd_threads ffff8800d7792990)
[24955.192637] [19463]: scst: scst_cmd_thread:4171:Processing thread vdisk00_6 (PID 19463) started
[24955.192641] [19463]: scst_cmd_thread:4211:Linked IO context ffff88004b1bfbd0 (p_cmd_threads ffff8800d7792990)
[24955.192659] [19464]: scst: scst_cmd_thread:4171:Processing thread vdisk00_7 (PID 19464) started
[24955.192663] [19464]: scst_cmd_thread:4211:Linked IO context ffff88004b1bfbd0 (p_cmd_threads ffff8800d7792990)
[24967.090591] ib_srpt: Received SRP_LOGIN_REQ with i_port_id 0x0:0x2c9020021b829, t_port_id 0x2c9020021b8d0:0x2c9020021b8d0 and it_iu_len 80 on port 1 (guid=0xfe80000000000000:0x2c9020021b8d1)
[24967.090601] [1207]: scst_unregister_session:6380:Unregistering session ffff880015b0c9c0 (wait 0)
[24967.090610] [1207]: scst_rx_mgmt_fn:5907:TM fn 10
[24967.090614] [1207]: scst_rx_mgmt_fn:5917:sess=ffff880015b0c9c0, tag_set 0, tag 0, lun_set 1, lun=0, cmd_sn_set 0, cmd_sn 0, priv (null)
[24967.090622] [1207]: scst_post_rx_mgmt_cmd:5836:Adding mgmt cmd ffff8801154a8958 to active mgmt cmd list
[24967.090630] ib_srpt: ***ERROR***: rejected SRP_LOGIN_REQ because another login request is being processed.
[24967.090734] [4319]: scst_tm_thread:5722:Deleting mgmt cmd ffff8801154a8958 from active cmd list
[24967.090738] [4319]: scst_abort_all_nexus_loss_sess:5280:Nexus loss for sess ffff880015b0c9c0 (mcmd ffff8801154a8958)
[24967.090742] [4319]: scst_call_dev_task_mgmt_fn:4540:Calling dev handler vdisk_fileio task_mgmt_fn(fn=10)
[24967.090745] [4319]: scst_call_dev_task_mgmt_fn:4544:Dev handler vdisk_fileio task_mgmt_fn() returned 1