Internet2
End-to-End Performance Initiative


Return to Table of Contents: Bridging the Gap Workshop Report


Appendix E: Functional Community Comments

Researcher comments included:

  • The important thing is to have the expertise available to the researcher – not to expect the researcher to BE the expert.
  • People don’t know what they need, don’t know what to expect, and don’t know that they should contact someone about their ‘performance’.
  • If you want to move a gigabyte file across Internet2 from point A to point B, it would help to have a list of metrics so you know what you should see, for example with GridFTP using x number of channels.
  • That could end up being a list with about 30 variables – a whole matrix.
  • Even getting people to the right general order of magnitude of what to expect would help, along with general OS information.
  • The netflow data that Internet2 captures weekly can show the peak/curve – a lot of people are getting slow performance, and there are specific groups that are getting good performance because they’ve ‘figured it out.’ Campuses could do the same on their own networks.
  • Internet2 has cakeboxes – reference systems that could be made available so that people could run tests to check their network. Internet2 could provide them. It might help problem-solvers if you could ship them a box that had been ‘qualified’.
  • Could Internet2 be a collection point for high-speed experiences? Have the researchers say what they’re working on: here’s what we’ve used, here’s what we’ve been getting with it, etc.
  • You want to post it on the web, but that could be as simple as sending mail to a mailing list, or it could go on a web page, database, wiki, what? R: a database – you could refer researchers to that database so they could see what people are using.
  • One of the problems with this is that the network is ‘invisible’ – unless you have Web100 and can grab the stats from it, the performance reflects the quality of the network you were using at the moment of the test.
  • You might not even be able to set up a system that duplicates another person’s experience (parallel to the ones you’ve read about) because technology changes so fast.
  • People become outraged that they cannot transfer data at ‘x’ rate because they don’t have reasonable expectations. Anecdotal evidence could help set expectations.
  • Why are we worrying about ‘moving files’? The technology should just happen.
  • Ultimately you’re still moving files and ultimately people still care about the speed at which they can do it.
  • Part of what we need to spread the benefit of this meeting may be sub-meetings focused on performance tuning at JTs, Internet2 meetings, and NANOG. It would be good to understand what a highly tuned, high-performance box should look like. That way, the designers of boxes will not just ‘lop off’ the valuable segments.
  • We’ve talked about that at Internet2, having a ‘certified performance package’ discussion so that people could just order it from x provider.
  • We’ve also talked about having random tests to locations at universities in our membership to ‘rank’ the quality of the networks of our members.
  • Somewhere in Internet2 there are tables on whether jumbo frames etc. are supported, and there are things that should be added to those tables. You’d like to indicate whether there are hosts USING the jumbo frames, etc.
  • This ‘rating’ is fraught with political problems – people don’t want to ‘look bad’.
  • You could publicly list the 10 best and privately provide each campus with a report card that they could take to their provost.
  • There’s a project underway to get campuses and GigaPoPs to deploy some of our tools (Network Diagnostic Tool, Bandwidth Test Control Tool, and One-Way Ping) at the GigaPoP and campus-edge level so that there are places to which people could run tests.
  • CENIC, for example, has agreed to run these, so it has testing points deployed.
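The request for “a list of metrics so you know what you should see” can be illustrated with a back-of-the-envelope estimate. The sketch below is not a tool discussed at the workshop; it assumes a single TCP stream can carry at most one window of data per round trip (throughput ≤ window / RTT), ignores packet loss and disk limits, and treats parallel streams (as with GridFTP channels) as adding linearly up to the link rate. The function names and example numbers are hypothetical.

```python
import math


def single_stream_ceiling_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Window-limited upper bound for one TCP stream, in Mbit/s (window / RTT)."""
    return (window_bytes * 8) / (rtt_ms / 1000.0) / 1e6


def expected_transfer(size_bytes: int, link_mbps: float, window_bytes: int,
                      rtt_ms: float, streams: int = 1) -> tuple[float, float]:
    """Return (achievable Mbit/s, seconds), assuming streams add linearly up to the link rate."""
    per_stream = single_stream_ceiling_mbps(window_bytes, rtt_ms)
    achievable = min(link_mbps, per_stream * streams)
    seconds = size_bytes * 8 / (achievable * 1e6)
    return achievable, seconds


def streams_needed(link_mbps: float, window_bytes: int, rtt_ms: float) -> int:
    """Smallest number of parallel streams whose combined ceiling reaches the link rate."""
    return math.ceil(link_mbps / single_stream_ceiling_mbps(window_bytes, rtt_ms))


if __name__ == "__main__":
    gigabyte = 1_000_000_000
    # Hypothetical path: 1 Gbit/s link, 70 ms round-trip time.
    for window in (64 * 1024, 4 * 1024 * 1024):  # an untuned and a tuned window
        rate, secs = expected_transfer(gigabyte, 1000, window, rtt_ms=70)
        print(f"window={window:>8} B: ~{rate:.1f} Mbit/s, ~{secs:.0f} s for 1 GB, "
              f"{streams_needed(1000, window, 70)} stream(s) to fill 1 Gbit/s")
```

Under these assumptions a 64 KiB window over a 70 ms path caps one stream near 7.5 Mbit/s, which is one reason default settings can look “slow” on a fast path. The same arithmetic could populate the kind of order-of-magnitude table researchers asked for, with the caveat that real paths add loss, disk, and host limits.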

 

Applications Developer comments:

  • Researchers will run tests at one center; the data will be sent in small packets to each sub-center, and when that packet has been reviewed, it gets discarded and another section of the whole (another small packet) is sent along to the researchers.
  • Meta-data is an important part of this – network data storage requires meta-data to make it easier to identify what pieces are where.
  • How much thought goes into the decision to have storage in one place and the compute-cluster of machines in another?
  • You need to meta-schedule that: how do you arrange for your computing schedule to match up with the data transfer schedule and with specific storage areas? You can get latency if the storage for the data is far away, so you want to meta-schedule such that the transfers happen at the best time, etc.
  • Do you think most researchers understand the difficulties with this?
  • I don’t think most of them do; the physicists, though, understand that they need to move large data sets long distances on a routine basis.
  • More of it is determining the major problems that might be occurring rather than the twiddling of dials to fine tune the transfer itself.
  • The system as a whole needs to worry about how it optimizes the jobs and gets the most work done. It’s a complicated calculation – you need to understand what you can do with the network (are there any changes I can make to use the network more efficiently?).
  • I’m not sure, except in some cases (like Sean’s), that instrumenting the application is worth more than simply increasing the bandwidth so that I don’t need to.
  • What I often see is that I ask for something to go ‘at x speed’ (vs. ‘as fast as possible’) – in many cases, knowing the bandwidth would be more useful than going as fast as possible. I can schedule transfer time, etc. if I know when a transfer would start and how long it will take. We’re going to add scheduling into GridFTP based on the bandwidth/disk capability.
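The point that knowing the rate is more useful than “as fast as possible” comes down to simple arithmetic once the bottleneck is known. The sketch below is a hypothetical illustration, not GridFTP’s scheduler: it assumes the effective rate is the minimum of the network bandwidth and the source and destination disk rates, and that this rate holds steady for the whole transfer. The function names and numbers are assumptions.

```python
from datetime import datetime, timedelta


def effective_rate_mbps(net_mbps: float, src_disk_mbps: float, dst_disk_mbps: float) -> float:
    """Assume the slowest of the network and the two disks bounds the transfer."""
    return min(net_mbps, src_disk_mbps, dst_disk_mbps)


def transfer_duration(size_bytes: int, rate_mbps: float) -> timedelta:
    """Time to move size_bytes at a constant rate_mbps (Mbit/s)."""
    return timedelta(seconds=size_bytes * 8 / (rate_mbps * 1e6))


def plan(size_bytes: int, net_mbps: float, src_disk_mbps: float,
         dst_disk_mbps: float, deadline: datetime) -> tuple[float, timedelta, datetime]:
    """Return (bottleneck rate, duration, latest start time) to finish by the deadline."""
    rate = effective_rate_mbps(net_mbps, src_disk_mbps, dst_disk_mbps)
    duration = transfer_duration(size_bytes, rate)
    return rate, duration, deadline - duration


if __name__ == "__main__":
    # Hypothetical job: 300 GB over a 10 Gbit/s path with slower disks at each end.
    rate, duration, latest = plan(300_000_000_000, net_mbps=10_000,
                                  src_disk_mbps=2_400, dst_disk_mbps=4_000,
                                  deadline=datetime(2024, 1, 1, 9, 0))
    print(f"bottleneck {rate} Mbit/s, takes {duration}, start by {latest:%H:%M:%S}")
```

Knowing the duration in advance is what would let a meta-scheduler line up compute time, storage, and the transfer window; a real scheduler would also need to account for competing traffic and rates that vary over time.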

 

Network Engineers comments:

  • One thing I got out of this conference was ‘availability’ – the biggest hurdle is getting people together. Set something up (a weekly conference call, etc.) at a set time so people would know that ‘these resources will be there’.
  • For VDT, they have ‘office hours’ where they go to a virtual space at a specific time and people can join to ask questions.
  • Getting the tools out there that the apps developers can run so the network engineers can use the data to solve problems.
  • How do you bootstrap this – there is very little overlap between network engineers who are busy running their networks and researchers who are doing their research.
  • Getting everyone into one place seems to be the answer. Shaving off the separation layers.
  • NEES has done that pretty well. Could you describe how?
  • Everything comes in via one point-of-contact. We have user training, where they go around and hold workshops to show users how to run tests, etc. We are more interested in how we work with the sys admins at specific research sites. We need to figure that out.
  • We work a lot with Europe and Asia – we can’t just say ‘we have tools, we’re going to deploy them, etc.’ We have to work with their ‘system’ of dealing with problems, which can take too long for the data logs’ lifecycle. It comes out to be a bigger problem sometimes because of the
  • We’ve run into that – system people have a single cube ticket; someone goes into that ticket from the network end and, if they don’t get access into the system quickly, it can be a long and tedious process to solve a simple problem.
  • Network engineers like to know each other
  • It is required for folks to know each other – it can be fatal for people to spend a day finding the one person they need to have help them with a ‘simple’ fix.
  • I’m just getting into this field and it is very hard to get up to speed to meet the right people, etc.
  • Do the folks who know each other in Europe bypass the ‘system’ to get things done?
  • I haven’t seen that, so far.
  • It is important to have a culture where all the parties involved get TOGETHER to solve the problem vs. having the tools or handing the problem off to someone else to ‘solve it’ or solve that segment.
  • The reason that Internet2 can quickly solve problems is that 1) we know lots of people and 2) we can call on a lot more people. It is very hard if you don’t know people to get 
  • How do you grow that cadre of connections?
  • I go to JTs to talk with people.
  • You develop it over time – you work out a problem in concentric circles – I don’t have a rolodex of names I could hand over to someone new. Besides, there’s a trust-relationship that you have to build up.
  • Problems DO get prioritized – even if you have a contact, they have other priorities.
  • It is reciprocal – you have to be willing to give time out to get time back from someone.
  • Is JTs international?
  • We just changed the name – we’ve been getting lots of international registrations. It is primarily US (co-sponsored by Internet2 and ESnet), but because we deal with global research projects we’re going more global. The last one was with CANARIE and a previous one was with APAN.
  • Very informal networking.
  • How do you socialize people in network engineering into the problems of the science disciplines? Maybe internships are needed – working in a community.
  • Every group is very busy with what they are doing and doesn’t want to be bothered.
  • How many network engineers were invited, how were they selected and how many declined to come?
  • As many of you know, this was rescheduled because we had no researchers or apps developers who were available to attend. We had to go out and actively recruit people.
  • The biggest response was from network engineers – they wanted to come.
  • We arranged for the wizards ahead of time, but the apps developers and researchers we had to solicit from people we know. So, even for this workshop, it was hard to create interest in this intersection.

 

Wizards comments:

  • I learned the problem is bigger than I thought it was. There are a lot more things I hadn’t been considering that researchers need to keep in mind. Problems I’ve never had to worry about because, in my org, there’s a ‘line’ on ‘network performance debugging’ that is far beyond what other orgs would consider necessary.
  • A bit of déjà vu from the organizing meeting on forming the E2Epi – the need for cookbooks, war stories, success stories, etc. that we heard about again here. There is a latent need for more mundane things on how to get the information out, such as best practices. I’m also interested in the intersection between the performance diagnostic problem and the network failure diagnostic problem (introduced by the ‘security’ gizmos). I’m wondering what we need to do to put pressure on router developers, etc. to tell us why they’ve dropped packets; these are fundamental issues that get extremely difficult (getting info out of the black box). I’m not sure how to make progress on either of those, but I think they’re important.
  • I’d reiterate the ‘déjà vu’ thing – in many areas, we’ve tried to do ‘war stories’ and put info on the web; still need to do that more.
  • We’ve tried to do that and had a difficult time collecting them!
  • Need Internet2 to sponsor the LAN diagnostic record!
  • Do we need to get at this when people are closer to their student years, when they are learning to code? I don’t know if we’re discussing the root problem.
  • If you were learning to code in 1992, you were concerned with congestion control. I think that whatever you teach them, the problem will change underneath them.
  • I think we’re getting at something deeper – more than just learning the knowledge; there’s the process that is missing. The community is needed – everyone is doing their own thing in a vacuum. To get that going at a student-level would be very important. 
  • If we’d started training folks to work together 10 years ago, we’d have a different view of the process.
  • I hope that, as we’re leaving, folks understand that there are problems across all these areas and that they’re interconnected. HENP has known that they are interconnected with their problems and have been working on it for years; VLBI is just learning it. How about the other discipline communities? Do we leave them to learn it on their own? Do we try to communicate with them about this? Do we get the funding agencies to understand that this is an important process for them?
  • Is there a knowledge transfer from Internet2 communities to researchers in various communities?
  • The business I’m in is, inherently, international. One of the things that’s so useful to us is the International reception so that we can make f2f contacts with people we’ve been talking with for some period. Knowing who to talk to in various countries really helps out.
  • One thing I’m curious about: do you think that the bidding on the new array is going to a) increase collaboration or b) increase competition?
  • Both. It is a multi-billion dollar project – lots of jockeying at this point. The radio astronomy community is pretty small, so we’re always working together anyway. It forces the collaboration.
  • It also forces the competition, a lot with funding, depending on how much it is being pushed in specific countries.
  • That’s healthy – there needs to be competition for ideas and methods, etc.  It is, however, a playing field that is tilted, hugely, by the amount of money that is available for the research…
  • Google is the answer because everyone uses it.
  • Google directs you to methods you can use to find ways to solve a problem.
  • What are the keywords to put into web docs so that they turn up in Google?
  • Will it be used? How? What do they need?
  • How much time should be spent on this? How much time has been spent?
  • The hard problem is the trickle-down aspect – I can write for my peers, but writing for people down the pipe is harder. I don’t know what they know.
  • If you go down that path, people expect you to finish it.
  • Who is in charge of this?
  • Whoever touches it last. Sometimes I don’t touch something because I don’t want to ‘own’ it.
  • How do you approach this?
  • You feed a wiki with false information to anger one of the Matts!
  • You start a blog or wiki or message boards so you have a complete loop.
  • Do you think this is a good idea? What is the reliability of the source? 
  • Post, post, post. They can try out whatever they want or appears to meet their needs. If it doesn’t work, they’ll provide feedback.
  • That was one of the things we were trying to establish – what you’re missing is the ‘why’: why should I search for this? If I’m searching for A, should I also be searching for B, C, and D? The Matts know this, but how do I know it? Only through talking to them!
  • Need to get down to the next level – keep going over the same ground because you haven’t trickled the info out and down or up to funding agents. What things are people needing? Need to identify the kinds of real problems and priorities you need to address – otherwise, it is a jump from a problem to a solution.
  • Documentation is good and dissemination is not impossible, but it takes a lot of time!
  • Very important point – it is ‘the last thing we do’. You keep talking about how it is ‘built into the process’ (as has been mentioned throughout), but it keeps getting done at the end on people’s personal time. It has been clear that you have a wide span of users – some are near expert and some need things ‘idiot-proofed’. Knowing the range of your users will determine the range of documentation needed. Funding/focusing on the importance of documentation.
  • Some orgs identify that they need to reach out to their users but the sticking point is funding – we need to get the funding agencies involved in this at a larger level.
  • Would it scale? Every community needs a different Point of Contact.
  • A central point of contact – point people to the right place.
  • Another model is to ask a central organization, such as Internet2, to fund it out of subscriber fees.
  • Work with your community but there are thousands of them.
  • Next step up is the funding agency that is funding all the research projects.
  • Incentive for the program officers needs to come from the research community and the network people.
  • There’s been research on this topic – knowledge management; it is widespread across many fields: people leaving an org (retirement) who have valuable info that needs to be collected and maintained. A very hard problem to solve, given its breadth.
  • There’s confusion between many terms; some look like the same thing but are very different. Sometimes someone uses a term ‘correctly’ but someone else uses the same term ‘incorrectly’ – how do you tell which meaning the user intends?
  • Discussion of the different meanings of ‘lag’ – different causes, different symptoms.
  • Note: this is clearly related to the dissemination of information. It is akin to a patient telling a physician that they have a stomach ache – the physician knows of a wide range of causes for this and a wide range of related symptoms to ask about.
  • Who are the right people to bring to the table: novice researchers, folks outside the U.S., resource allocation folks (those with control of the $), vendors (Cisco, etc.), and the community of people doing domain-science research (not ‘network wizards’ but ‘network researchers’), who have a different view of what ‘performance’ means.
  • If this wasn’t in A2, would you (a researcher) have come to this workshop?
  • If it was convenient and I didn’t have anything of a higher priority, yes. I’m glad I came, and I learned a lot (besides having my problem solved – if I’d KNOWN it would be solved, I would have come, despite any difficulties!). That made it valuable, but I didn’t expect it to be as valuable as it was.
  • We keep talking about trouble tickets (TTs) and talking to the NOC… we need representatives from the NOC/TT workers.
  • We’ve dealt with the ‘high-speed networking for dummies’ material – some problems are more intermittent, more demanding. Some groups have a regularly-scheduled ‘hack-fest’ to work on problems. We’d need to id who has major problems that would respond well to group collaboration.
  • Communication – shared vocabulary, create a ‘Rosetta stone’ of terms, and keep an open line of communication – regular phone calls/meetings? Intentional conferences. Dinner with 12 strangers – setup meeting with 12 people who don’t really know each other to expand trust relationships. Gets more new people into the grouping. 
  • Re: setting expectations about performance, would it be reasonable to set out specifications of what each out-of-the-box component is capable of performing?
  • We’re looking at documenting what we’ve achieved with various hardware but, at this point, the data is unreliable and incomplete. We have default recommendations for vendors on things like TCP stacks, etc. Vendors are concerned about changing defaults because that opens them up to additional potential problems.
  • What is the time/cost tradeoff for ‘good enough’? Everyone has a different view of what you need and what that’s worth.
  • The MonaLISA EMMA client gets local information when running tests like NDT, providing details about the NIC, etc. Over time, you could collect information on various pieces of equipment that would identify what the average user is actually ‘getting’ with specific hardware. But who hosts the central repository, and how do you protect it from DoS attacks…
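The idea of collecting test results alongside local hardware details can be sketched as a small aggregation. This is a hypothetical illustration, not the MonaLISA EMMA client’s data format: it assumes each test report carries a NIC model string and a measured throughput, and it reports a median per model only once a minimum number of samples exists, since thin data is exactly the reliability problem raised above. The class, method names, and NIC labels are invented for the example, and the sketch says nothing about hosting or DoS protection.

```python
from collections import defaultdict
from statistics import median


class HardwareThroughputLog:
    """Hypothetical repository of (NIC model, measured Mbit/s) test reports."""

    def __init__(self) -> None:
        self._samples: dict[str, list[float]] = defaultdict(list)

    def record(self, nic_model: str, throughput_mbps: float) -> None:
        """Store one test result for the given hardware model."""
        self._samples[nic_model].append(throughput_mbps)

    def summary(self, min_samples: int = 3) -> dict[str, tuple[int, float]]:
        """Map each NIC model to (sample count, median Mbit/s), skipping thin data."""
        return {
            nic: (len(values), median(values))
            for nic, values in self._samples.items()
            if len(values) >= min_samples
        }


if __name__ == "__main__":
    log = HardwareThroughputLog()
    # Hypothetical reports; "NIC-A" and "NIC-B" are placeholder model names.
    for nic, mbps in [("NIC-A", 580), ("NIC-A", 910), ("NIC-A", 620), ("NIC-B", 300)]:
        log.record(nic, mbps)
    print(log.summary())
```

A median is used rather than a mean so that one badly tuned host does not drag down a model’s figure; the sample threshold is an arbitrary choice that a real repository would need to justify.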

 

General Comments

  • The one that crosses the boundaries and makes/maintains the lines of communication. Even if you talk about having that Liaison role, who funds it? Still need to understand that there is that jump between the problem and the solution – need to identify how to improve communication so that this breaks down the problems to more correctly address the problem range.
  • What kind of action items should we take out of this to keep this effort going? Getting groups together – mixture of folks (discipline basis? Regional basis?)
  • Interact with user disciplines where they are already attending – get the network folks to the disciplines’ meetings.
  • It is very hard to make a presentation there – is there a way around that? A poster session?
  • Or via NSF? They fund much of this research – if the funding agency recognizes it, it will help.
  • Who should come – the networker? The researcher? The Internet2 connection?
  • Internet2 rep and/or someone from a similar basis. Develop a more-or-less canned presentation that opens the eyes of the researchers on what they can do, what they should expect, and serve as a conduit for more efforts.
  • Network ‘swat’ team that could go to network-specific conferences.
  • This is exactly the stuff that NLANR has done, and, for various reasons, it hasn’t continued.
  • I have had a positive experience with this, via Internet2, but several other communities have NOT had a similar experience. Internet2 was there from the beginning and challenged/nurtured us.
  • How much effort was involved on Charles’ part – is that scalable?
  • By going to some of their conferences, I managed to convince them to hold a one-day workshop on network performance. After I badgered them, they grudgingly agreed to do so, but after holding it, they realized they could have done several more days’ worth – they will continue doing this. I was fairly successful with VLBI because they were already going pretty well and knew what they were doing, but the VLBI folks really needed a guide.
  • We were directed at an early stage of our work with the physics community and wrote several proposals with them; but the problem that we ran into was that NSF funding was available for the physics portion but not for the astronomy portion.
  • Ways to engage projects seems successful – another approach might be campus-level.
  • Having done an Internet2 day at a campus, it is very difficult to engage with someone unless they are involved in a project and they have encountered problems. Campuses have so many diffuse needs but we haven’t been as successful
  • You might engage a single point of contact on campuses and provide them with information so they can reach out to the researchers on campus to offer help with problems.
  • The work with a research group really requires someone listening to what the group is doing, seeing their needs and problems, and having discussions with them before bringing in experts.
  • Would campuses be willing to pay to bring 3-4 people to a campus to help a range of folks?
  • Might offend network folks on campus.
  • We can have some trainings that are done by video for a little less but folks would be willing to pay to fly experts out sometimes.
  • Doesn’t really scale.
  • We need to find some method to ‘hide’ the network from the end-user, eventually, so they don’t need to have any expertise; just run their app and not have to worry about the pipelines and speed etc.
  • The biggest barrier to having a transparent network for the end-user is that, ironically, too much of the network is hidden from the network engineers. Re: the swat team coming out, how much could they do without the time/effort of the campus engineers?
  • SWAT teams are stymied by things they cannot control – “the part will be in next week,” or “the person you need to talk to isn’t here,” etc.
  • We talked about ‘virtual office hours’ – if people knew what the hours were – or an email address for ‘interesting network problems’?
  • What the Quilt was trying to do was get adequate resources together to solve problems.
  • Many pronged attack – virtual office hours, email list, regular meetings, etc.
  • Who isn’t complaining loud enough?
  • There are many out there who have either given up or don’t know who to complain to, yet.

 

