IEPM - BW: Infrastructure development experiences
The first instantiation was for making iperf and bbcp memory to memory (from /dev/zero to /dev/null) measurements from a single site, SLAC, to a set of 20-30 hosts at remote sites. The hosts were selected to be at PPDG sites, or sites with strong collaborations with SLAC (usually High Energy or Nuclear Physics sites or Internet performance measurement sites). For each remote host we needed an account with ssh logon access. We successfully demonstrated the first instantiation for the SC2001 Bandwidth Challenge: Bandwidth to the World project in November 2001.
One of the first steps was to contact people at the remote sites to request the accounts. It took about 7 weeks to get accounts on suitable hosts at about 25 sites. The variety of forms and procedures required at the sites was a revelation in itself, ranging from just a phone call to multiple paper forms that had to be FAXed, or web forms requiring considerable personal details.
We logged onto the first few remote hosts, set up the ssh keys, and copied over and installed the various initial applications (e.g. iperf, bbcp) by hand.
The remote hosts differed widely in hardware, operating system (OS), NICs, and directory structure (e.g. the userid, the home directory, and where to find the various applications), so we needed a simple database to enable remote ssh access to execute commands. This was implemented as a perl require script. The database also let us assign an alias to each remote host so that some level of privacy could be maintained, customize how the measurement tools were called, and record the email address of the contact for each host.
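As an illustration only, a per-host database of this kind might look like the sketch below. The original was a perl require script; this sketch uses Python, and every field name, alias, hostname, path and address in it is hypothetical rather than taken from the actual IEPM-BW configuration.

```python
import shlex

# Hypothetical per-host records keyed by a privacy-preserving alias.
# All field names and values are illustrative assumptions.
HOSTS = {
    "node1": {
        "hostname": "host1.example.org",
        "user": "iepm",
        "iperf_path": "/home/iepm/bin/iperf",
        "contact": "admin@example.org",
        "responds_to_ping": True,
    },
    "node2": {
        "hostname": "host2.example.net",
        "user": "bwtest",
        "iperf_path": "/usr/local/bin/iperf",
        "contact": "ops@example.net",
        "responds_to_ping": False,
    },
}


def ssh_command(alias, remote_args):
    """Build the argv list that runs remote_args on the host behind alias."""
    host = HOSTS[alias]
    # Quote each argument so the remote shell sees it as a single word.
    remote = " ".join(shlex.quote(arg) for arg in remote_args)
    return ["ssh", "-l", host["user"], host["hostname"], remote]
```

Reports and logs can then refer to a host only by its alias, while the measurement code looks up the real hostname, account and tool path when it builds the ssh command.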
We used the Unix cron facility to schedule tasks at the measurement host. The ssh keys were saved in an AFS file on the measurement host. Since AFS tokens timed out after 25 hours, we had to use an AFS unattended token renewal mechanism (trscron) to renew the tokens for cron jobs. We also added code to the measurements to verify that we had a token.
Early on we had many problems with the iperf server becoming non-functional on the remote host. There were several causes: in some cases the host had been rebooted and iperf was not restarted; in others the iperf process had simply disappeared, or was still present but not responding (though in some cases it still had the TCP port attached). Some of these failures may have been due to an exhaustion of threads observed in Linux 2.4.1x and reported by Jin Guokun of LBNL. We were also concerned about leaving servers like iperf running all the time, since that could assist in a denial of service attack. We therefore decided to start the remote server before each measurement and kill it when the measurement was complete. This was a big help, even though it increased the complexity of the measurement process.
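The start-measure-kill pattern can be sketched roughly as follows. This is not the original code: the function names and the startup delay are assumptions, and the commands are passed in as plain argument lists (for example an ssh invocation that runs the iperf server remotely). Note that terminating a local ssh process does not always stop the remote process it started, so a real implementation may also need an explicit remote kill.

```python
import subprocess
import time
from contextlib import contextmanager


@contextmanager
def temporary_server(server_cmd, startup_delay=1.0):
    """Start server_cmd, yield while it runs, and always stop it afterwards."""
    proc = subprocess.Popen(server_cmd)
    try:
        # Assumed fixed delay to give the server time to start listening.
        time.sleep(startup_delay)
        yield proc
    finally:
        proc.terminate()
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            # The server ignored the polite request; force it down.
            proc.kill()
            proc.wait()


def measure(server_cmd, client_cmd, startup_delay=1.0):
    """Run client_cmd while server_cmd is up and return the client's stdout."""
    with temporary_server(server_cmd, startup_delay):
        result = subprocess.run(client_cmd, capture_output=True, text=True)
    return result.stdout
```

Because the cleanup sits in a finally clause, the server is stopped even if the client fails, which avoids leaving a listening server behind between measurements.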
We also found it necessary to time out processes, since some would hang and run forever, and others would run far longer than expected. This complicated the code, since it had to fork the measurements as separate processes.
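A minimal sketch of such a timeout wrapper is shown below, using Python's standard subprocess module rather than the fork-based perl code described above. The function name and the three status strings are assumptions made for illustration.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return (status, stdout) with status 'ok', 'failed' or 'timeout'."""
    try:
        # subprocess.run kills the child itself if the timeout expires.
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return "timeout", ""
    status = "ok" if result.returncode == 0 else "failed"
    return status, result.stdout
```

Recording the status alongside the output makes it possible to distinguish a hung tool from one that ran and failed when the logs are reviewed later.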
We added ping and traceroute (with only one measurement per hop to reduce execution time) to the measurement suite in order to have an ongoing record of RTT and routes. We also started to add other measurements to the suite for test and comparison purposes. These initially included bbcp disk to disk, bbftp and pipechar. Since pipechar tended to run for long periods compared to the other tools, we reduced the frequency of pipechar measurements for each host to one round in four. We also found we had to lengthen the interval between rounds of measurements from 1 hour to 2 hours in order to complete each round in time. Most of the delays were caused by timeouts, so we also worked on optimizing the timeout values by reviewing why the timeouts occurred.
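Running a slow tool in only some rounds can be expressed with a small schedule table, as in the sketch below. The table, the function name and the choice of running pipechar in rounds 0, 4, 8 and so on are assumptions for illustration; the source states only that pipechar ran in one round out of four.

```python
# Hypothetical schedule: each tool runs once every N rounds.
EVERY_N_ROUNDS = {
    "ping": 1,
    "traceroute": 1,
    "iperf": 1,
    "bbcp": 1,
    "bbftp": 1,
    "pipechar": 4,  # slow tool, so it runs in one round out of four
}


def tools_for_round(round_number):
    """Return the tools due in the given round (rounds count up from 0)."""
    return [
        tool for tool, every in EVERY_N_ROUNDS.items()
        if round_number % every == 0
    ]
```

With this table, pipechar is selected twice in any eight consecutive rounds, while the other tools are selected in every round.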
We developed a tool (remoteos.pl) to automate much of the initial installation and update procedures for the remote hosts, and documented the procedures. In addition we developed a tool (getbwversions.pl) to query and report on the configuration of the remote host (MHz, number of CPUs, TCP window sizes, OS etc.) and to identify which versions of the measurement tools were installed.
We added about 7 more remote hosts to the monitoring during this phase, and as a result documented and simplified the procedures for adding new remote hosts.
We ran into problems getting ssh to work properly when the remote host was running version 2 while the measurement host was defaulting to version 1. This was tracked down to an ssh misconfiguration on the measurement host. We also ran into difficulties in capturing all the ssh output from commands, especially when running multiple processes. A third ssh challenge was making ssh work through a gateway machine, which required cascading the ssh commands. We are still working on trying to make scp work through the gateway. When using an OpenSSH client with an SSH Communications, Inc. server we found we had to reformat the public key (see the FAQ) before saving it on the server host. There was also confusion about exactly where to save the public key on the server, especially for protocol version 2 servers: the directory was sometimes called .ssh and other times .ssh2; sometimes the public key was appended to the file authorized_keys, other times to authorized_keys2; and sometimes it was placed in a separate file, with a pointer to it placed in a file called authorization. A big help in understanding the various ssh problems was SSH: The Secure Shell (The Definitive Guide).
Usually we were able to copy the measurement executable that had been built at SLAC for the appropriate OS version to the remote host. However, in some cases there were library incompatibilities. In about 40% of the iperf cases, and 20% of the bbcp and bbftp cases, we had to build the executables on the remote host. The information on whether an executable had to be built on the remote host was kept in the configuration database. Tools such as ping, traceroute and pipechar did not need anything to be installed on the remote host.
When measuring disk to disk throughput on fast links we had to be careful to understand the effects of caching. We used the Solaris Unix File System mount forcedirectio facility to ensure that the source files were not cached when we were reading them on the measurement host (use the Solaris man mount_ufs command for more details). Though this gave us a realistic estimate of disk read speed for large files, it also meant that for high speed links the gating factor in overall file transfer rates was often the speed of the disk reads. To understand the effects of disk I/O on overall file transfer rates, we made a separate study of Disk Throughputs for various Operating Systems and file systems, with and without caching and with and without committing the writes, on about 25 different hosts. If possible we requested large amounts of disk space at the remote host. Until sufficient disk space had been set aside, we used space in /tmp. At the same time we checked and recorded whether the /tmp space was using memory (e.g. swap space on Solaris); see Remote host configurations.
Some hosts blocked or rate limited a protocol. If this was permanent, for example a host that did not respond to pings, then it was simple to add this to the configuration database. In other cases, ssh access or the application's server port would be blocked due to security concerns. To detect such failures we logged attempt information and developed a tool (codeanal) to analyze the logs and highlight repeated failures, so we could send email to the host contact. For cases where we were unsure if the port was blocked, we tested by running the iperf server on the port at the remote host, and then running the iperf client at SLAC to see if the port was accessible. We checked which ports were required by a particular application by reading the man pages and also by tracing the packets with tcpdump. We documented which ports needed to be open to the remote host in Host requirements.
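As a lightweight stand-in for the iperf client test described above, a plain TCP connection attempt gives a first indication of whether a port is reachable. The sketch below is an assumption-laden substitute, not the procedure we used: the function name and default timeout are hypothetical, it covers TCP only, and a failure cannot distinguish a blocked port from a server that is simply not running.

```python
import socket


def port_is_open(host, port, timeout_s=5.0):
    """Return True if a TCP connection to (host, port) succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False
```

For example, with a server started on the remote host, `port_is_open("host1.example.org", 5001)` would probe iperf's default TCP port; the hostname here is a placeholder.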
At any given time, we observed that about 20% of the hosts would be unreachable via ssh. The reasons included: the host was changed or removed; difficulty in getting an account/password or getting ssh to work on a changed host; problems with using Kerberos credentials to access the remote host in an unattended fashion; concerns at the remote site about charging for usage; difficulties in interworking between various versions of ssh; the host was wrongly configured; the link to the host was down for a long period (e.g. several weeks in one case where a new link from Europe to Chicago was being brought up).