Linux Programming

 
Question: DAOS Command Fails with "Transport layer mercury error" on CentOS 7.9
northernlights from Bombay, 10-Jun-24 11:06
Hello,

I'm encountering an issue when running the daos cont create command on my DAOS setup. The command fails with a "Transport layer mercury error." Below are the details of the error and my setup:

Command and Error Message:

[root@client2 ~]# daos cont create tank --label mycont
external ERR # [5323.920594] mercury->msg: [error] /builddir/build/BUILD/mercury-2.1.0rc4/src/na/na_ofi.c:3047
# na_ofi_msg_send(): fi_tsend() failed, rc: -2 (No such file or directory)
external ERR # [5323.921055] mercury->hg: [error] /builddir/build/BUILD/mercury-2.1.0rc4/src/mercury_core.c:2727
# hg_core_forward_na(): Could not post send for input buffer (NA_NOENTRY)
hg ERR src/cart/crt_hg.c:1104 crt_hg_req_send_cb(0x2a5c8c0) [opc=0x1020004 (DAOS) rpcid=0x18ea69b600000000 rank:tag=0:0] RPC failed; rc: DER_HG(-1020): 'Transport layer mercury error'
mgmt ERR src/mgmt/cli_mgmt.c:882 dc_mgmt_pool_find() tank: failed to get PS replicas from 1 servers, DER_HG(-1020): 'Transport layer mercury error'
pool ERR src/pool/cli.c:198 dc_pool_choose_svc_rank() 00000000:tank: dc_mgmt_pool_find() failed, DER_HG(-1020): 'Transport layer mercury error'
pool ERR src/pool/cli.c:503 dc_pool_connect_internal() 00000000:tank: cannot find pool service: DER_HG(-1020): 'Transport layer mercury error'
ERROR: daos: DER_HG(-1020): Transport layer mercury error



Environment Details:

DAOS Version: daos-2.0.3-5.el7.x86_64
DAOS Client Version: daos-client-2.0.3-5.el7.x86_64
Libfabric Version: libfabric-1.15.1-1.el7.x86_64
Mercury Version: mercury-2.1.0~rc4-9.el7.x86_64
CentOS Version: CentOS 7.9
Fabric Interface: enp0s3



Additional Information:

[root@server ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:bd:95:c2 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global noprefixroute dynamic enp0s3
       valid_lft 564sec preferred_lft 564sec
    inet6 fe80::e25:a2fd:9904:a8ac/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:bb:cb:4d brd ff:ff:ff:ff:ff:ff
    inet 192.168.56.104/24 brd 192.168.56.255 scope global noprefixroute dynamic enp0s8
       valid_lft 370sec preferred_lft 370sec
    inet6 fe80::8d85:5b39:5f73:6e0b/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
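You may want to confirm which IPv4 address each interface carries. For example, 10.0.2.15 is used as access_points and sits on the configured fabric_iface, enp0s3. The ip addr output above can be reduced to an interface-to-address map with a few lines of Python. This is a minimal sketch that uses only the standard library. The function name ipv4_by_iface is hypothetical, and SAMPLE is a trimmed copy of the server output posted above.

```python
from __future__ import annotations

import re

# Interface header lines, e.g. "2: enp0s3: <BROADCAST,...> mtu 1500 ..."
IFACE_RE = re.compile(r"^\d+:\s+([^:@\s]+)")
# IPv4 address lines, e.g. "    inet 10.0.2.15/24 brd 10.0.2.255 ..."
# ("inet6" lines do not match because "inet" must be followed by whitespace.)
INET_RE = re.compile(r"^\s*inet\s+(\d+\.\d+\.\d+\.\d+/\d+)")


def ipv4_by_iface(ip_addr_output: str) -> dict[str, list[str]]:
    """Map each interface name in `ip addr` output to its IPv4 CIDR addresses."""
    result: dict[str, list[str]] = {}
    current: str | None = None
    for line in ip_addr_output.splitlines():
        header = IFACE_RE.match(line)
        if header:
            # A new interface block starts; collect addresses under this name.
            current = header.group(1)
            result[current] = []
            continue
        inet = INET_RE.match(line)
        if inet and current is not None:
            result[current].append(inet.group(1))
    return result


# Trimmed copy of the server's `ip addr` output from the post.
SAMPLE = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 10.0.2.15/24 brd 10.0.2.255 scope global noprefixroute dynamic enp0s3
       valid_lft 564sec preferred_lft 564sec
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 192.168.56.104/24 brd 192.168.56.255 scope global noprefixroute dynamic enp0s8
"""

print(ipv4_by_iface(SAMPLE))
# {'lo': ['127.0.0.1/8'], 'enp0s3': ['10.0.2.15/24'], 'enp0s8': ['192.168.56.104/24']}
```

On a live host, the same function could be fed subprocess.run(["ip", "addr"], capture_output=True, text=True).stdout. Running it on both the server and the client shows which addresses each side actually has on each interface.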


I have also included the DAOS server, control (dmg), and agent configuration files below for reference.

DAOS server configuration:
## default: daos_server
name: daos_server
#
#
## Access points
## Immutable after running "dmg storage format".
#
## To operate, DAOS will need a quorum of access point nodes to be available.
## Must have the same value for all agents and servers in a system.
## Hosts can be specified with or without port. The default port that is set
## up in port: will be used if a port is not specified here.
#
## default: hostname of this node
access_points: ['10.0.2.15']
#
#
## Default control plane port
#
## Port number to bind daos_server to. This will also be used when connecting
## to access points, unless a port is specified in access_points:
#
## default: 10001
port: 10001
#
#
## Transport credentials specifying certificates to secure communications
#
transport_config:
# # In order to disable transport security, uncomment and set allow_insecure
# # to true. Not recommended for production configurations.
  allow_insecure: true
#
# # Location where daos_server will look for Client certificates
  client_cert_dir: /etc/daos/certs/clients
# # Custom CA Root certificate for generated certs
  ca_cert: /etc/daos/certs/daosCA.crt
# # Server certificate for use in TLS handshakes
  cert: /etc/daos/certs/server.crt
# # Key portion of Server Certificate
  key: /etc/daos/certs/server.key
provider: ofi+sockets
socket_dir: /var/run/daos_server
nr_hugepages: 4096
control_log_mask: DEBUG
control_log_file: /tmp/daos_server.log
helper_log_file: /tmp/daos_admin.log
engines:
-
  targets: 8
  nr_xs_helpers: 0
  fabric_iface: enp0s3
  fabric_iface_port: 31316
  log_mask: INFO
  log_file: /tmp/daos_engine_0.log
  env_vars:
  - CRT_TIMEOUT=30

  scm_mount: /mnt/daos0
  scm_class: ram
  scm_size: 8

DAOS control (dmg) configuration:

# default: daos_server
name: daos_server

# Default destination port to use when connecting to hosts in the hostlist.
# default: 10001
port: 10001

# Hostlist, a comma separated list of addresses (hostnames or IPv4 addresses).
# default: ['localhost']
hostlist: ['10.0.2.15']

## Transport Credentials Specifying certificates to secure communications

transport_config:
# # In order to disable transport security, uncomment and set allow_insecure
# # to true. Not recommended for production configurations.
  allow_insecure: true
#
# # Custom CA Root certificate for generated certs
  ca_cert: /etc/daos/certs/daosCA.crt
# # Admin certificate for use in TLS handshakes
  cert: /etc/daos/certs/admin.crt
# # Key portion of Admin Certificate
  key: /etc/daos/certs/admin.key

DAOS agent configuration:
# default: daos_server
name: daos_server

# Management server access points
# Must have the same value for all agents and servers in a system.
# default: hostname of this node
access_points: ['10.0.2.15']

# Force different port number to connect to access points.
# default: 10001
port: 10001

## Transport Credentials Specifying certificates to secure communications
#
transport_config:
# # In order to disable transport security, uncomment and set allow_insecure
# # to true. Not recommended for production configurations.
  allow_insecure: true
#
# # Custom CA Root certificate for generated certs
  ca_cert: /etc/daos/certs/daosCA.crt
# # Agent certificate for use in TLS handshakes
  cert: /etc/daos/certs/agent.crt
# # Key portion of Agent Certificate
  key: /etc/daos/certs/agent.key
# Use the given directory for creating unix domain sockets
#
# NOTE: Do not change this when running under systemd control. If it needs to
# be changed, then make sure that it matches the RuntimeDirectory setting
# in /usr/lib/systemd/system/daos_agent.service
#
# default: /var/run/daos_agent
#runtime_dir: /var/run/daos_agent

# Full path and name of the DAOS agent logfile.
# default: /tmp/daos_agent.log
log_file: /tmp/daos_agent.log

# Manually define the fabric interfaces and domains to be used by the agent,
# organized by NUMA node.
# If not defined, the agent will automatically detect all fabric interfaces and
# select appropriate ones based on the server preferences.
#
#fabric_ifaces:
#-
# numa_node: 0
# devices:
# -
# iface: ib0
# domain: mlx5_0
# -
# iface: ib1
# domain: mlx5_1
#-
# numa_node: 1
# devices:
# -
# iface: ib2
# domain: mlx5_2
# -
# iface: ib3
# domain: mlx5_3
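The comments in the server and agent files state that access_points must have the same value for all agents and servers in a system. All three files also carry a name and a port, which the agent and dmg use when connecting. A small consistency check over those shared settings can be sketched as follows. This is an illustration only. The values were transcribed by hand from the files above rather than read with a YAML parser. The file names used as labels are assumptions based on the usual DAOS defaults. The function name mismatches is hypothetical.

```python
from __future__ import annotations

# Values transcribed by hand from the posted files (no YAML parser is used).
# The file-name labels below are assumed DAOS defaults, not taken from the post.
SERVER = {"name": "daos_server", "port": 10001, "access_points": ["10.0.2.15"]}
CONTROL = {"name": "daos_server", "port": 10001}
AGENT = {"name": "daos_server", "port": 10001, "access_points": ["10.0.2.15"]}


def mismatches(server: dict, control: dict, agent: dict) -> list[str]:
    """Describe every shared setting that differs between the three configs."""
    problems: list[str] = []
    files = {"daos_server.yml": server, "daos_control.yml": control, "daos_agent.yml": agent}
    # System name and control-plane port appear in all three files.
    for key in ("name", "port"):
        values = {fname: cfg.get(key) for fname, cfg in files.items()}
        if len(set(values.values())) > 1:
            problems.append(f"{key} differs: {values}")
    # The server and agent comments say access_points must be identical.
    if sorted(server["access_points"]) != sorted(agent["access_points"]):
        problems.append("access_points differ between daos_server.yml and daos_agent.yml")
    return problems


print(mismatches(SERVER, CONTROL, AGENT))  # [] for the values posted above
```

An empty list means these particular settings agree across the files. It says nothing about reachability of the fabric interface itself.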


Any assistance or insights into resolving this issue would be greatly appreciated. Thank you!
General: Re: DAOS Command Fails with "Transport layer mercury error" on CentOS 7.9
k5054, 26-Jun-24 5:30
Question: rfcomm ?
Salvatore Terress, 21-May-24 9:37
Answer: Re: rfcomm ?
jschell, 21-May-24 12:29
General: Re: rfcomm ?
Salvatore Terress, 21-May-24 18:04
Answer: Re: rfcomm ?
trønderen, 21-May-24 21:17
General: Re: rfcomm ?
Salvatore Terress, 23-May-24 7:05
General: Re: rfcomm ?
trønderen, 23-May-24 8:33
Answer: Re: rfcomm ?
Richard MacCutchan, 21-May-24 22:42
General: Re: rfcomm ?
Peter_in_2780, 23-May-24 12:21
Question: How to redirect command output - general reference wanted...
Salvatore Terress, 4-Mar-24 9:15
Answer: Re: How to redirect command output - general reference wanted...
Richard MacCutchan, 4-Mar-24 9:17
Question: Bluetooth - dead end again...
Salvatore Terress, 22-Feb-24 14:26
Answer: Re: Bluetooth - dead end again...
RedDk, 22-Feb-24 18:45
General: Re: Bluetooth - dead end again...
Salvatore Terress, 23-Feb-24 2:07
Question: 3dDeconvolve coding
Sevinc Bayar, 18-Jan-24 7:31
Answer: Re: 3dDeconvolve coding
Richard MacCutchan, 18-Jan-24 22:04
Answer: Re: 3dDeconvolve coding
Dave Kreskowiak, 19-Jan-24 4:05
Answer: Re: 3dDeconvolve coding
jschell, 19-Jan-24 5:06
Answer: Re: 3dDeconvolve coding
Dr.Walt Fair, PE, 20-Feb-24 10:50
Question: hcitool numbering hcix
Salvatore Terress, 18-Dec-23 13:15
Answer: Re: hcitool numbering hcix
jschell, 1-Jan-24 2:12
Question: how to use "whereis" recursively ?
Salvatore Terress, 28-Nov-23 9:17
Answer: Re: how to use "whereis" recursively ?
Richard MacCutchan, 28-Nov-23 21:59
Answer: Re: how to use "whereis" recursively ?
jschell, 29-Nov-23 5:01
