February 13, 2004
from 3:00 - 4:30 pm
SPEAKERS:
Russ Hobby, Internet2 (Moderator)
Jim Ferguson, NLANR
Paul Schopis, OARnet
Connie Logg, SLAC
Attendees: 50-55 (SRO w/folks in the hall)
Russ Hobby gave an overview of the BoF topics and introduced Jim Ferguson, who reported on the status of the Performance Advisor. The E2Epi/piPEs project has been very supportive and has provided collaborative materials that have been of use to the Advisor. The GGF Network Measurement Working Group has also been very supportive and has provided materials that helped shape the development of the Advisor.
A big part of the effort to develop a network tool interface, one that applies the various tools network engineers might be interested in using, has been contacting potential users to find out what is needed. The tool is also designed to emulate a junior network engineer for novice end users: it aims to give users advice that helps them identify what they can do. Sometimes this is as simple as telling the user to contact an engineer; other times, that the performance is ‘as good as it gets’.
Version 1.2 is currently available – Version 2.0 should be coming out in 2-3 weeks. Bundles (methods of using specific known tools) for Iperf, OWAMP (Internet2 piPEs), ping, top, pathchar, ifconfig, traceroute, netstat, and pathload are included in the Version 2.0 release. All code is available via anonymous CVS.
Jim gave an overview of the Advisor architecture and a brief overview of its features. He noted that the PDC (Performance Data Collector) is the only part of the project that overlaps the work of the piPEs project (it is very similar to piPEs’ Performance Measurement Points). The PDC can be distributed to various machines within an administrative domain so that one network engineer can collect data from a variety of points on campus. He also commented that the bundles are updated automatically, reducing update overhead.
Version 2.0 will allow users to make requests for information (using the GGF Request Schema) – data will come first from the Historical Archiver (caching proxy of sorts) unless the data is not available or the user specifically requests a recent/immediate test. He noted that the test schema was the current one available from the GGF, but that a newer version is due out sometime in the near future. Eric Boyd commented that the GGF NMWG is meeting Wednesday afternoon to review the most recent version of the schema and, hopefully, get it out – he said that all attendees were welcome.
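The archive-first behaviour described above can be sketched roughly as follows. This is only an illustration of the lookup order, not the Advisor's actual code: the class `HistoricalArchiver`, the function `fetch_measurement`, the `force_fresh` flag, and the one-hour freshness limit are all hypothetical names and values chosen for the example.

```python
import time


class HistoricalArchiver:
    """Hypothetical cache of past results, keyed by (tool, target)."""

    def __init__(self, max_age_seconds=3600):
        # Assumed freshness limit; the notes do not state one.
        self.max_age_seconds = max_age_seconds
        self._store = {}

    def put(self, key, value, timestamp=None):
        self._store[key] = (time.time() if timestamp is None else timestamp, value)

    def get(self, key, now=None):
        entry = self._store.get(key)
        if entry is None:
            return None  # nothing archived for this key
        timestamp, value = entry
        now = time.time() if now is None else now
        if now - timestamp > self.max_age_seconds:
            return None  # archived result is too old to serve
        return value


def fetch_measurement(archiver, key, run_test, force_fresh=False):
    """Return (value, source): archived data first, a live test otherwise."""
    if not force_fresh:
        cached = archiver.get(key)
        if cached is not None:
            return cached, "archive"
    value = run_test(key)      # data unavailable, or the user asked for a new test
    archiver.put(key, value)   # keep the fresh result for later requests
    return value, "live"
```

A caller would pass `force_fresh=True` when the user specifically asks for an immediate test, and otherwise let the archive answer whenever it can.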
He noted that the most difficult part of the project is the development of the analysis engine, which must determine which symptoms indicate specific, known problems (e.g., when x symptoms are present, look for a duplex mismatch). Jim reiterated that the Advisor is designed to sit on the end user’s machine as a Java executable with an easy-to-use GUI front end. He noted that it is most effective on a machine with the Web100 kernel, since it takes advantage of all the data that can be collected in this manner. He listed the items that will not be complete in Version 2.0 and the features they would like to add in later releases. For more information, see: http://dast.nlanr.net/projects/advisor.
Paul Schopis gave an overview of the updates to the H.323 Beacon, a project jointly developed by OARnet and Internet2’s E2Epi, which is designed to troubleshoot videoconferences. He gave a brief overview of voice and VoIP performance measurement problems; the media uses RTP packets for timing control. There are lots of good ICMP- and UDP-based tools, but they don’t capture the details that voice requires. They used a mean opinion score (MOS) concept based on both subjective and objective feedback. They identified the root causes of ‘experience’ problems (congestion along a path, lack of connectivity, etc.) and tied these to a list of identified common e2e problem causes (poor equipment, poorly trained users, etc.).
They experimented with existing measurement tools (the list is available in the slides) and commented on their strengths and weaknesses for voice and VoIP. None of the existing tools met all the needs of this community (cheap and effective); in developing the Beacon, they used the open source H.323 libraries. He gave an overview of the problem responses and the details you receive on the status of the session (whether or not the session is successful). New features include audio/video loopback, an E-model-based objective MOS ranking, and a slider-based subjective MOS rating. You can now customize tests in the most recent version.
Paul reported on several use cases in which the H.323 Beacon allowed users to identify problems and fix them. He also reported on the best ways to set up the H.323 Beacon in a network measurement infrastructure. For more information on the H.323 Beacon, see http://www.osc.edu/oarnet/itecohio.net/beacon/.
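For readers unfamiliar with an E-model-based MOS, the ITU-T G.107 E-model produces a transmission rating factor R, which is then mapped to an estimated MOS. The sketch below shows only that standard R-to-MOS mapping; how the Beacon derives R from its measurements was not covered in the session, and the function name `r_factor_to_mos` is chosen for illustration.

```python
def r_factor_to_mos(r):
    """Map an E-model rating factor R to an estimated MOS (ITU-T G.107 mapping)."""
    if r <= 0:
        return 1.0   # floor of the scale
    if r >= 100:
        return 4.5   # ceiling of the estimated scale
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7.0e-6


if __name__ == "__main__":
    for r in (50, 70, 80, 93.2):
        print(f"R = {r:5.1f}  ->  MOS ~ {r_factor_to_mos(r):.2f}")
```

The G.107 default R of 93.2 maps to a MOS of roughly 4.4, which is why a MOS near 4.4 is commonly treated as the practical upper bound for narrowband voice.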
(One participant noted that this appears to be an inexpensive method to test connectivity before a location purchases a codec.)
Russ introduced Connie Logg, who reported on the DataGrid WAN, originally demoed for SC01 (called IEPM-BW); afterwards it was adapted for the TeraPaths monitoring project (spring 2004). It is currently deployed at SLAC, Caltech, Stanford, LBNL, and CERN. She gave an overview of the architecture, which evolved from flat files to a MySQL database, and how it is automatically updated. She listed the tests she is currently running, which are set up to avoid running on top of each other. Results from the probes are written to a data directory and are loaded by a daemon that ensures the database is not bombarded by excessive test results.
She noted that the tests currently running can be expanded at will, and that they are working on allowing on-demand tests. Limits are built in to ensure that tests are run in a timely manner; otherwise the request is deleted before it can back up the system and create a potential overload. She noted that they break data down into individual components (overplotting). She described their traceroute testing, noting that routing changes sometimes cause data to flip columns. With IEPM-BW they focus on time series plots, diurnal analysis, traceroute analysis, and bandwidth change analysis. The tool is under constant modification and improvement; a number of developers at SLAC and around the world are updating the analysis algorithms.
They are currently developing CGI utilities and are very interested in how that will work in the future. She maintains informational web pages for all of the tools, data, and problems encountered. They aim to make data available to web services, such as MonALISA (Caltech), in the near future and to expand the anomaly detection techniques.
She gave a brief overview of her questions and considerations regarding tools she didn’t include: BWCTL (not installed everywhere, doesn’t do multiple iperf streams, and she may want other tests that BWCTL doesn’t provide for) and OWAMP (needs special NTP configuration; she wants to discuss this with the OWAMP developers this week).
When asked if the traceroute analysis was usable as a standalone process, she said yes, but the user needs to apply specific formats. When asked how the scheduling (a table of schedules) was distributed to the sites, she responded that each location has its own table of schedules for one-ended tests, which allows each location to run its own range of tests, with the data to be made available via web services (tbd). She clarified that there is a single machine at each site doing the tests, though the machines are also running their own tests in addition to the ones on her schedule. A participant noted that the heavyweight tests could overlap (one from SLAC and one from CERN to Caltech at the same time); she said that was why she was considering BWCTL (which has a component to prevent overlapping tests).
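The built-in limit on on-demand requests could look something like the sketch below: a request that has waited longer than a configured limit is dropped rather than left to back up the queue. This is an assumption-laden illustration, not IEPM-BW code; the class `OnDemandQueue`, its methods, and the `max_wait_seconds` parameter are hypothetical.

```python
from collections import deque


class OnDemandQueue:
    """Hypothetical FIFO of on-demand test requests with a waiting-time limit."""

    def __init__(self, max_wait_seconds):
        self.max_wait_seconds = max_wait_seconds
        self._pending = deque()  # entries are (submitted_at, test_name)

    def submit(self, test_name, now):
        self._pending.append((now, test_name))

    def next_runnable(self, now):
        """Discard requests that waited too long; return the next fresh one, or None."""
        while self._pending:
            submitted_at, test_name = self._pending.popleft()
            if now - submitted_at <= self.max_wait_seconds:
                return test_name
            # Stale request: deleted so it cannot pile up behind running tests.
        return None
```

Timestamps are passed in explicitly here so the behaviour is easy to check; a real scheduler would more likely read a clock.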
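The overlap concern raised by the participant can be illustrated with a simple check over independently maintained schedules. This is not how BWCTL works internally (that was not described in the session); it is only a sketch that flags tests aimed at the same target whose time windows intersect. The function name `find_overlaps` and the tuple layout are assumptions made for the example.

```python
def find_overlaps(tests):
    """tests: iterable of (name, target, start, duration), times in seconds.

    Returns a list of (name_a, name_b) pairs that hit the same target
    during intersecting time windows.
    """
    by_target = {}
    for name, target, start, duration in tests:
        by_target.setdefault(target, []).append((start, start + duration, name))

    conflicts = []
    for windows in by_target.values():
        windows.sort()  # order by start time
        for i, (_, end_a, name_a) in enumerate(windows):
            for start_b, _, name_b in windows[i + 1:]:
                if start_b >= end_a:
                    break  # later windows start even later; no more overlaps
                conflicts.append((name_a, name_b))
    return conflicts


if __name__ == "__main__":
    schedule = [
        ("slac-iperf", "caltech", 0, 20),
        ("cern-iperf", "caltech", 10, 20),  # collides with the SLAC test
        ("slac-ping", "cern", 0, 20),
    ]
    print(find_overlaps(schedule))
```

With per-site schedules and no coordination, a check like this would have to run over the combined schedules of all sites, which is the gap a coordinating tool is meant to close.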