Bridging the Gap: The Problem Defined
In solving old problems and making new discoveries, scientific communities such as Astronomy, Earth Science, and High Energy Physics rely on globally-distributed high-performance computing environments. These environments have helped advance collaborative scientific projects, but not nearly at levels commensurate with their potential. For example, local area Ethernet networks (FastEthernet and Gigabit Ethernet) could readily provide 100 Mbps and 1000 Mbps data transfers, respectively, in campus networks. Yet 90% of end-users' transactions achieve data rates below 10 Mbps, and only 1% achieve anything close to 100 Mbps. Two problems keep these computing environments from being used to greater advantage in scientific projects:
- User expectations for shared infrastructure are too low. With 10 Mbps rates as the perceived norm, many flaws that could be fixed go unreported and unresolved, slowly degrading the system. Resetting expectation levels of scientists and support staff is a major challenge that we propose to address.
- Practices for resolving computing problems often take longer than necessary. Delays are caused, in part, by an opaque network that presents the same symptom, slow performance, to end-users regardless of the underlying cause. The challenge is to develop procedures and guidelines that connect with users’ needs and knowledge and that guide users in identifying and accurately describing problems, so that the appropriate people can resolve them efficiently.
Proposal and Significance
In the NSF-funded Bridging the Gap Workshop (BTG) in August 2005, we brought together 40 scientists, network operators, application developers, and network wizards from across the country. From these diverse perspectives, the group examined the causes of and potential resolutions to the two fundamental problems mentioned above. Based on Workshop findings, we have developed an approach for reducing the impact of these problems, which we propose to implement at a pilot site.
This approach combines automatic tools that aid in detecting multiple problems with self-guided troubleshooting documentation tailored to the needs, priorities, and levels of understanding of stakeholders from each of the four groups. Our proposed pilot project will ensure the availability of tools and will develop, implement, and test this self-guided documentation with a scientific community.
This proposed project rests on scalable methods. If our pilot succeeds, results will be expanded across scientific communities so that end-users in any community can easily find and use diagnostic resources to promote progressive escalation through multiple levels of technical support. These fundamental improvements in communication across functional areas will enable engineers and scientists to collaboratively realize network performance potential, ushering in dramatic advances in research.
[NOTE: Weekly Abilene data is available from http://netflow.internet2.edu/weekly/ ]
This material is based in whole or in part on work supported by the National Science Foundation under the NSF Special Projects In Network Research Grant No. SCI-0443254. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).