Return to Table of Contents: Bridging the Gap Workshop
Report
Researcher comments included:
- The important thing is to have the expertise available to the researcher, not
the expectation that the researcher IS the expert.
- People don’t know what they need, don’t know what to expect, and
don’t know that they should contact someone about their ‘performance’.
- If you want to move a gigabyte file across Internet2 from point A to point B, you need a list of metrics
so you know what you should see, such as with GridFTP using x number of channels.
- That could end up being a list with about 30 variables – a whole
matrix.
- Even getting people to the general order of magnitude of what
to expect would help, along with general OS information.
- The NetFlow data that Internet2 captures weekly can show the peak/curve: a
lot of people are getting slow performance, while specific groups
are getting good performance because they’ve ‘figured it out.’ Campuses
could do the same analysis on their own campus.
- Internet2 has cakeboxes, reference systems that could be made available
so that people could run tests to check their network; Internet2 could provide them.
It might help problem-solvers if you could ship them a box that had been ‘qualified’.
- Could Internet2 be a collection point for high-speed experiences? Have
the researchers say what they’re working on: here’s what we’ve
used, here’s what we’ve been getting with it, etc.
- You want to post it on the web, but… that could be as simple as sending
mail to a mailing list, or it could go on a web page, a database, a wiki; which? R:
a database. You could refer researchers to that database so they could
see what people are using.
- One of the problems with this is that the network is ‘invisible’:
unless you have Web100 and can grab the stats from it, the
measured performance depends on the quality of the network you were using at the
given moment of the test.
- You might not even be able to put up a system that duplicates another person’s experience (parallel
to the ones you’ve read about), because technology
changes so fast.
- People become outraged that they cannot transfer data at ‘x’ rate
because they don’t have reasonable expectations. The anecdotal evidence could help set
expectations.
- Why are we worrying about ‘moving files’? The technology should
just happen.
- Ultimately you’re still moving files and ultimately people still
care about the speed at which they can do it.
- Part of what we need to diffuse the benefit of this meeting may be
to hold sub-meetings at JTs, Internet2 meetings, and NANOG that
focus on performance tuning. It would be good to understand what a highly-tuned,
high-perf box should look like. That way, the designers of boxes will not
just ‘lop off’ the valuable segments.
- We’ve talked about that at Internet2: a ‘certified performance
package’ that people could just order from
provider x.
- We’ve also talked about having random tests to locations at universities
in our membership to ‘rank’ the quality of the networks of our
members.
- Somewhere at Internet2, there are tables showing whether jumbo frames, etc. are supported.
There are things that should be added to those tables; you’d like to
indicate whether there are hosts actually USING the jumbo frames, etc.
- This ‘rating’ is fraught with political problems – people
don’t want to ‘look bad’.
- You could publicly list the 10 best and privately provide each campus with
a report card that they could take to their provost.
- There’s a project underway to get campuses and GigaPoPs to deploy
some of our tools (Network Diagnostic Tool, Bandwidth Test Control Tool, and One-Way Ping) at the GigaPoP and campus-edge level
so that there are places to which people could run tests.
- CENIC, for example, has agreed to run these, so it has testing points
deployed.
Applications Developer comments:
- Researchers will run tests at one center; the data will be sent in small
packets to each sub-center and then, when that packet has been reviewed,
it gets thrown out and another section of the whole (another small packet) gets
sent along to the researchers.
- Meta-data is an important part of this – network data storage requires
meta-data to make it easier to identify what pieces are where.
- How much thought goes into the decision to have storage in one place and
the compute-cluster of machines in another?
- You need to meta-schedule that: how do you arrange to have your
computing schedule match up with the schedule of data transfer time and
with specific storage areas? You can get latency if the storage
for the data is far away, so you want to meta-schedule it such that the transfers
happen at the best time, etc.
- Do you think most researchers understand the difficulties with this?
- I don’t think most of them do; in physics, though, they understand
the idea that they need to move large data sets long distances on a routine
basis.
- More of it is determining the major problems that might be occurring rather
than the twiddling of dials to fine tune the transfer itself.
- The system as a whole needs to worry about how it optimizes the jobs
and gets the most work done. It is a complicated calculation; you need to understand
what you can do with the network (are there any changes I can make to make the network
more efficient?).
- I’m not sure, except in some cases (like Sean’s), that instrumenting
the application will take longer than increasing the bandwidth so that I don’t
need to.
- What I often see is that I ask for something to go ‘at x speed’ (vs. ‘as
fast as possible’). In many cases, knowing the bandwidth would be more
useful than going as fast as possible: I can schedule transfer time, etc.
if I know when a transfer would start and how long it will take. We’re going
to add scheduling into GridFTP based on the bandwidth/disk capability.
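The scheduling point above rests on simple arithmetic: if the sustained rate is known, the duration is the file size in bits divided by the rate. Below is a minimal Python sketch, assuming a constant sustained end-to-end rate for the whole transfer (real transfers vary with network conditions at the moment of the test, disk speed, and TCP tuning). The function name `estimate_transfer` is hypothetical and is not part of GridFTP.

```python
from datetime import datetime, timedelta


def estimate_transfer(size_bytes: int, rate_bits_per_s: float,
                      start: datetime) -> tuple[timedelta, datetime]:
    """Estimate the duration and finish time of a transfer.

    Hypothetical helper, not a GridFTP API. Assumes rate_bits_per_s is the
    sustained end-to-end throughput for the entire transfer.
    """
    if rate_bits_per_s <= 0:
        raise ValueError("rate must be positive")
    # Convert bytes to bits, then divide by bits per second to get seconds.
    seconds = size_bytes * 8 / rate_bits_per_s
    duration = timedelta(seconds=seconds)
    return duration, start + duration


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 2, 0)  # hypothetical 02:00 start slot
    one_gb = 10**9                      # 1 GB (decimal)
    for rate in (10e6, 100e6, 1e9):     # 10 Mb/s, 100 Mb/s, 1 Gb/s
        duration, finish = estimate_transfer(one_gb, rate, start)
        print(f"{rate / 1e6:>6.0f} Mb/s: {duration} (done {finish:%H:%M:%S})")
```

For a 1 GB file this gives 800 seconds at 10 Mb/s, 80 seconds at 100 Mb/s, and 8 seconds at 1 Gb/s. Protocol overhead and disk limits make real transfers slower, so treat these as best-case durations when reserving a time slot.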
Network Engineers comments:
- One thing I got out of this conference was ‘availability’: the
biggest hurdle is getting people together. Set it up (a weekly conference
call, etc.) at a set time so people would know that ‘these
resources will be there’.
- For VDT, they have ‘office hours’ where they go to a virtual
space at a specific time and people can join to ask questions.
- Getting the tools out there that the apps developers can run so the network
engineers can use the data to solve problems.
- How do you bootstrap this – there is very little overlap between
network engineers who are busy running their networks and researchers who
are doing their research.
- Getting everyone into one place seems to be the answer. Shaving off the
separation layers.
- NEES has done that pretty well. Could you describe how?
- Everything comes in via one point-of-contact. We have user training, where
they go around and hold workshops to show users how to run tests, etc. We
are more interested in how we work with the sys admins at specific research
sites. We need to figure that out.
- We work a lot with Europe and Asia; we can’t just say ‘we
have tools, we’re going to deploy them, etc.’ We have to work
with their ‘system’ of dealing with problems, which can take longer
than the lifecycle of the data logs. It comes out to be a bigger problem sometimes
because of the
- We’ve run into that – system people have a single cube ticket;
someone goes into that ticket from the network end and, if they don’t
get access into the system quickly, it can be a long and tedious process
to solve a simple problem.
- Network engineers like to know each other
- It is required for folks to know each other – it can be fatal for
people to spend a day finding the one person they need to have help them
with a ‘simple’ fix.
- I’m just getting into this field and it is very hard to get up to
speed and meet the right people, etc.
- Do the folks who know each other in Europe bypass the ‘system’ to
get things done?
- I haven’t seen that, so far.
- It is important to have a culture where all the parties involved get TOGETHER
to solve the problem vs. having the tools or handing the problem off to someone
else to ‘solve it’ or solve that segment.
- The reason that Internet2 can quickly solve problems is that 1) we know lots of
people and 2) we can call on a lot more people. It is very hard if you don’t
know people to get
- How do you grow that cadre of connections?
- I go to JTs to talk with people.
- You develop it over time – you work out a problem in concentric circles – I
don’t have a rolodex of names I could hand over to someone new. Besides,
there’s a trust-relationship that you have to build up.
- Problems DO get prioritized – even if you have a contact, they have
other priorities.
- It is reciprocal – you have to be willing to give time out to get
time back from someone.
- Are the JTs international?
- We just changed the name – we’ve been getting lots of international
registrations. Primarily US (co-sponsored by Internet2 and ESnet) but, because we
deal with global research projects we’re going more global. The last
one was with CANARIE and a previous one was with APAN.
- Very informal networking.
- How do you socialize people in the network engineering area into the disciplines’
issues and problems? Maybe we need internships where they work in a community.
- Every group is very busy with what they are doing and doesn’t want
to be bothered.
- How many network engineers were invited, how were they selected and how
many declined to come?
- As many of you know, this was rescheduled because
no researchers or apps developers were available to attend. We
had to go out and actively recruit people.
- The biggest response was from network engineers; they wanted to
come.
- We arranged for the wizards ahead of time, but we had to solicit the apps developers and researchers
from people we know. So, even for this workshop, it was
hard to create interest in this intersection.
Wizards comments:
- I learned the problem is bigger than I thought it was. There are a lot
more things I hadn’t been considering that researchers need to keep
in mind. Problems I’ve never had to worry about because, in my org,
there’s a ‘line’ on ‘network performance debugging’ that
is far beyond what other orgs would consider necessary.
- Bit of déjà vu from the organizing meeting on forming the
E2Epi: the need for cookbooks, war stories, success stories, etc. that
we heard about again here. There is a latent need for more mundane
things on how to get the information out, such as best practices,
etc. I’m also interested in the intersection between the performance diagnostic
problem and the network failure diagnostic problem (introduced by the ‘security’ gizmos). I’m wondering
what we need to do to put pressure on router developers, etc. to address
the fact that they don’t tell us why they’ve dropped packets, etc. These are fundamental
issues that get extremely difficult (getting info out of the black box). I’m
not sure how to proceed to make progress on either of those, but I think they’re
important.
- I’d reiterate the ‘déjà vu’ thing – in
many areas, we’ve tried to do ‘war stories’ and put info
on the web; still need to do that more.
- We’ve tried to do that and had a difficult time collecting them!
- Need Internet2 to sponsor the LAN diagnostic record!
- Do we need to get at this when they’re closer to their student years,
when folks are learning to code? I don’t know if we’re discussing
the root problem.
- If you were learning to code in 1992, you were concerned with congestion
control. I think that if you’re teaching them now, the problem will change
underneath them.
- I think we’re getting at something deeper – more than just
learning the knowledge; there’s the process that is missing. The community
is needed – everyone is doing their own thing in a vacuum. To
get that going at a student-level would be very important.
- If we’d started training folks to work together 10 years ago, we’d
have a different view of the process.
- I hope that, as we’re leaving, folks understand that there are
problems across all these areas and that they’re interconnected. HENP
has known that its problems are interconnected and has been
working on them for years; VLBI is just learning it. How about the other discipline
communities? Do we leave them to learn it on their own? Do we try to communicate
with them about this? Do we get the funding agencies to understand that this
is an important process for them?
- Is there a knowledge transfer in Internet2 communities to researchers in
various communities?
- The business I’m in is, inherently, international. One of the things
that’s so useful to us is the International reception so that we can
make face-to-face contacts with people we’ve been talking with for some period.
Knowing who to talk to in various countries really helps out.
- One thing I’m curious about: do you think that the bidding
on the new array is going to a) increase collaboration or b) increase competition?
- Both. It is a multi-billion-dollar project with lots of jockeying at
this point. The radio astronomy community is pretty small so we’re
always working together anyway. It forces the collaboration.
- It also forces the competition, a lot with funding, depending on how much
it is being pushed in specific countries.
- That’s healthy – there needs to be competition for ideas and
methods, etc. It is, however, a playing field that is tilted, hugely,
by the amount of money that is available for the research…
- Google is the answer because everyone uses it.
- Google directs you to methods you can use to find ways to solve a problem.
- What are the keywords to put into web docs to get it googled?
- Will it be used? How? What do they need?
- How much time should be spent on this? How much time has been spent?
- The hard problem is the trickle-down aspect: I can write for my
peers, but writing for people down the pipe is harder. I don’t know
what they know.
- If you go down that path, people expect you to finish it.
- Who is in charge of this?
- Whoever touches it last. Sometimes I don’t touch something because
I don’t want to ‘own’ it.
- How do you approach this?
- You feed a wiki with false information to anger one of the Matts!
- You start a blog or wiki or message boards so you have a complete loop.
- Do you think this is a good idea? What is the reliability of the source?
- Post, post, post. They can try out whatever they want or appears to meet
their needs. If it doesn’t work, they’ll provide feedback.
- That was one of the things we were trying to establish: what
you’re missing is the ‘why’. Why should I search
for this? If I’m searching for A, should I also be searching for B,
C, and D? The Matts know this, but how do I know it? Only through talking
to them!
- We need to get down to the next level. We keep going over the same ground
because we haven’t trickled the info out and down, or up to the funding
agencies. What do people need? We need to identify the kinds of real
problems and priorities to address; otherwise, it is a jump
from a problem to a solution.
- Documentation is good and dissemination is not impossible but it takes a
lot of time!
- Very important point: documentation is ‘the last thing we do’. You
keep talking about how it is ‘built into the process’ (as
has been mentioned throughout), but it keeps getting done at
the end on people’s personal time. It has been clear that you have
a wide span of users; some are near expert and some need things ‘idiot-proofed’.
Knowing the range of your users will determine the range of documentation needed. Funding and focus
should reflect the importance of documentation.
- Some orgs identify that they need to reach out to their users but the sticking
point is funding – we need to get the funding agencies involved in
this at a larger level.
- Would it scale? Every community needs a different Point of Contact.
- A central point of contact – point people to the right place.
- Another model is to ask a central organization, such as Internet2, to fund it
out of subscriber fees.
- Work with your community but there are thousands of them.
- Next step up is the funding agency that is funding all the research projects.
- Incentive for the program officers needs to come from the research community
and the network people.
- There’s been research on this topic – knowledge management;
it is widespread across many fields – people leaving an org (retirement)
that have valuable info that needs to be collected and maintained. Very hard
problem to solve based on the breadth of it.
- There’s confusion between many terms; some look like the same thing
but are very different. Sometimes someone uses one term ‘correctly’ but
someone else will use the same term ‘incorrectly’. How
do you tell which meaning the user intends?
- Discussion of the different meanings of ‘lag’ – different
causes, different symptoms.
- Note: this is clearly related to the dissemination of information.
It is akin to a patient telling a physician that they have a stomach ache: the
physician knows of a wide range of causes for this and a wide range of related
symptoms to ask about.
- Who are the right people to bring to the table: novice researchers, folks
outside the U.S., resource allocation folks (those with control of the $),
vendors (Cisco, etc.), community of people doing domain-science research
(not ‘network wizards’ but ‘network researchers’) – they
have a different view of what ‘performance’ means.
- If this wasn’t in A2, would you (a researcher) have come to this
workshop?
- If it was convenient and I didn’t have anything of a higher priority,
yes. I’m glad I came and I learned a lot (besides having my problem
solved – if I’d KNOWN it would be solved, I would have come,
despite any difficulties!) that made it valuable but I didn’t expect
it to be as valuable as it was.
- We keep talking about trouble tickets (TTs) and talking to the NOC… we need representatives
from the NOC/TT workers.
- We’ve dealt with the ‘high-speed networking for dummies’ material – some
problems are more intermittent, more demanding. Some groups have a regularly-scheduled ‘hack-fest’ to
work on problems. We’d need to identify who has major problems that would
respond well to group collaboration.
- Communication – shared vocabulary, create a ‘Rosetta stone’ of
terms, and keep an open line of communication – regular phone calls/meetings?
Intentional conferences. Dinner with 12 strangers: set up a meeting with
12 people who don’t really know each other to expand trust relationships.
Gets more new people into the grouping.
- Re: setting expectations re: performance, would it be reasonable to set out
specifications of what each out-of-the-box component is capable of performing?
- We’re looking at documenting what we’ve achieved with various
hardware but, at this point, the data is unreliable and incomplete. We have
default recommendations for vendors on things like TCP stacks, etc. Vendors
are concerned about changing defaults because that opens them up to additional
potential problems.
- What is the time/cost tradeoff for ‘good enough’? Everyone
has a different view of what you need and what that’s worth.
- The MonALISA EMMA client gets local information when running tests like
NDT that provides details re: NIC, etc. Over time, you could collect
information on various pieces of equipment that would identify what the average
user is actually ‘getting’ with specific hardware. But who hosts
the central repository and how do you protect that from DoS attacks…
General Comments
- We need a liaison: the one who crosses the boundaries and makes/maintains the lines of communication.
Even if you talk about having that liaison role, who funds it? We still need
to understand that there is a jump between the problem and the solution; we need
to identify how to improve communication so that problems are broken down
and the full problem range is addressed more correctly.
- What kind of action items should we take out of this to keep this effort
going? Getting groups together – mixture of folks (discipline basis?
Regional basis?)
- Interact with user disciplines at the meetings they already attend; get
the network folks to the disciplines.
- It is very hard to make a presentation there; is there a way around
that? A poster session?
- Or via NSF? They fund much of this research; if the funding agency
recognizes it, it will help.
- Who should come: the networker? The researcher? The Internet2 connection?
- Internet2 rep and/or someone from a similar basis. Develop a more-or-less canned
presentation that opens the eyes of the researchers on what they can do,
what they should expect, and serve as a conduit for more efforts.
- Network ‘swat’ team that could go to network-specific
conferences.
- This is exactly the stuff that NLANR has done and, for various reasons,
it hasn’t continued.
- I have had a positive experience with this, via Internet2, but several other communities
have NOT had a similar experience. Internet2 was there from the beginning and challenged/nurtured
us.
- How much effort was involved on Charles’ part – is that scalable?
- By going to some of their conferences, I managed to convince them to hold
a one-day workshop on network performance. After I badgered them, they grudgingly
agreed to do so, but after holding the workshop, they realized they
could have done several more days’ worth; they will continue doing
this. I was fairly successful with VLBI because they were already going pretty
well and knew what they were doing, but the VLBI folks really needed a guide.
- We were directed at an early stage of our work with the physics community
and wrote several proposals with them; but the problem that we ran into was
that NSF funding was available for the physics portion but not for the astronomy
portion.
- Engaging projects seems successful; another approach might
be at the campus level.
- Having done an Internet2 day at a campus, it is very difficult to engage with
someone unless they are involved in a project and they have encountered problems.
Campuses have so many diffuse needs but we haven’t been as successful
- You might engage a single point of contact on campuses and provide them
with information so they can reach out to the researchers on campuses
to offer help with problems.
- The work with a research group really requires someone listening to what
the group is doing, seeing their needs and problems, and having discussions
with them before bringing in experts.
- Would campuses be willing to pay to bring 3-4 people to a campus to help
a range of folks?
- Might offend network folks on campus.
- We can have some trainings that are done by video for a little less but
folks would be willing to pay to fly experts out sometimes.
- Doesn’t really scale.
- We need to find some method to ‘hide’ the network from the
end-user, eventually, so they don’t need to have any expertise; just
run their app and not have to worry about the pipelines and speed etc.
- The biggest barrier to having a transparent network for the end-user is
that, ironically, too much of the network is hidden from the network
engineers. Re: the swat team coming out, how much could they do without the time/effort
of the campus engineers?
- SWAT teams are stymied by things they cannot control – “the
part will be in next week,” or “the person you need to talk to
isn’t here,” etc.
- We talked about ‘virtual office hours’: what if people knew
what the hours were, or had an email address for ‘interesting
network problems’?
- What the Quilt was trying to do was get adequate resources together
to solve problems.
- Many pronged attack – virtual office hours, email list, regular meetings,
etc.
- Who isn’t complaining loud enough?
- There are many out there who have
either given up or don’t know who
to complain to, yet.