Summer of Code Program
[These are ideas for 2005. If you're interested in the Summer of Code, see project ideas for 2006 instead.]
Bulk Transport Projects | Rich Presence Projects
Get paid by Google to write open-source code this summer! The deadline for submitting an application was June 14, 2005; decision on applications already submitted should be reached by June 24, 2005. The following are ideas for student projects (other projects are possible, depending on individual interest) for Google's Summer of Code. Internet2 is proud to be a mentoring organization for the initiative.
To ask a question or to clarify anything, contact us. If nothing on the list below is a good match for what you want to do, but you think that Internet2 could mentor you, get in touch, too. (Contact information is below.)
To apply:
Q: Is project X taken?
A: This is a competitive proposal process. We don't even (yet)
see the applications filed with Google. We don't know how many
applications are filed for a particular project. Duplication of
effort is allowed; that is, if you make a strong case in your
proposal, it is likely to be funded regardless of the number of
competing proposals. What matters is the quality of your proposal,
not the number of other people who want to do the same thing. That
said, if you design your own project (the first ``project'' on the
list), you will have little competition, so there's almost no chance
that your proposal would be rejected only because of the number of
other similar proposals. To repeat: just write a good proposal and
put it in. If two projects interest you, submit a proposal for each.
Q: My university is not an Internet2 member (usually because
I am not in the U.S.). Can I still apply?
A: Definitely. Google sets the rules of who is eligible, and
there's nothing about Internet2 membership in the rules. You won't,
in any way, be at a disadvantage.
Q: Do I need to check with you before I file the application
with Google?
A: Feel free to check, but it is not required. If you have no
questions, there is little to discuss at this stage.
Unfortunately, we can't help you write the proposal; that
part you'll need to do yourself. The list of things you could include in
the proposal is above. If you do have any questions, don't be shy.
You can write a better application if you're clear about the project.
Q: I can't think of anything for item X on the list of
things to include in the proposal. What do I do?
A: Think again. Still nothing? Omit it and go on. Make the
proposal as detailed as you can, but don't obsess about items or
checklists. We need to know what you want to do and why you think you
can do it.
Q: I am a high-school student. Can I apply?
A: Sorry, but high-school students are not eligible. If
everything goes well this time, it's possible that Google might repeat
the exercise in the future when you're in college. Not this time,
though. You can still work on open-source code this summer; this will
improve your karma, programming skills, chances of getting any future
open-source stipends, and chances of getting a good job. (NB: this
answer should not be used as any grounds for speculation that there
might be a next time. We, as a mentoring organization, have no
inside information pertaining to this. Only Google would know.)
Read the document Design Space for a Bulk Transport Tool. Consider your abilities. Is there a portion of this you'd like to do? The Internet2 Bulk Transport working group would help any student working on the problem.
The gettimeofday() system call involves two context switches
(to and from the kernel). With an appropriate library, one could have
a replacement call for obtaining absolute time entirely in user space,
eliminating roughly 10,000 cycles of overhead associated with
obtaining a timestamp through the kernel. In addition, avoiding the
context switch when getting the time decreases the probability of
losing the execution context, which is important for applications
that, after obtaining a timestamp, send it through the
network or use it to create a record on disk.
The frequency of updates of the TSC register can drift slowly; in the short term, the drift is mostly caused by temperature changes, which, in turn, depend on the pattern of CPU use and, as such, are not easily predicted. The frequency of updates can also change abruptly when the CPU frequency itself changes (due to, e.g., a power management event); note that a CPU frequency change does not necessarily change the frequency of TSC updates. Both of these situations need to be considered and covered. Accommodating frequency drift is, in essence, the same problem that an NTP client with a single server solves; consequently, the solution will probably be similar. One complication is that NTP stores the time adjustments in the time itself; since the TSC frequency or offset cannot be easily changed by a user-space application, the adjustments would need to be stored elsewhere, in a conversion table. The second hurdle, frequency steps, should be relatively easy to detect with simple sanity checks; what to do once a step is detected still needs to be decided.
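The conversion table and the sanity check for frequency steps can be sketched in C as follows. This is only an illustration under stated assumptions, not the design the project must follow: the structure, the function names, the single linear segment, and the simple exponential blending (much cruder than a real NTP clock discipline loop) are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical conversion table: one linear segment that maps TSC ticks
   to nanoseconds of absolute time. All names are illustrative. */
struct tsc_conv {
    uint64_t base_tsc;     /* TSC reading at the last calibration point */
    uint64_t base_ns;      /* absolute time at base_tsc, in nanoseconds */
    double   ns_per_tick;  /* current estimate of 1e9 / (TSC frequency) */
};

/* Convert a raw TSC reading to absolute time using the table. */
static uint64_t tsc_to_ns(const struct tsc_conv *c, uint64_t tsc)
{
    assert(tsc >= c->base_tsc);
    return c->base_ns +
           (uint64_t)((double)(tsc - c->base_tsc) * c->ns_per_tick);
}

/* Fold a new (TSC, reference time) pair into the table.
   If the observed rate differs from the estimate by more than max_rel
   (relative), the sample is treated as a frequency step and the estimate
   is replaced outright; the function then returns 1. Otherwise the sample
   is treated as drift, blended in with weight gain, and 0 is returned.
   Re-basing on every sample can make converted time jump; a real
   implementation would have to smooth that out. */
static int tsc_conv_update(struct tsc_conv *c, uint64_t tsc, uint64_t ref_ns,
                           double gain, double max_rel)
{
    assert(tsc > c->base_tsc && ref_ns > c->base_ns);
    double observed = (double)(ref_ns - c->base_ns) /
                      (double)(tsc - c->base_tsc);
    double rel = (observed - c->ns_per_tick) / c->ns_per_tick;
    if (rel < 0)
        rel = -rel;
    int step = rel > max_rel;
    if (step)
        c->ns_per_tick = observed;
    else
        c->ns_per_tick += gain * (observed - c->ns_per_tick);
    c->base_tsc = tsc;
    c->base_ns = ref_ns;
    return step;
}
```

A caller would obtain the reference time from the kernel only occasionally, pass it to tsc_conv_update() together with a fresh TSC reading, and use tsc_to_ns() for every timestamp in between.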
The conversion table discussed above should be the same for all
processes running on the same machine; otherwise, processes with
different time views can produce results difficult to predict and
debug (consider, just as an example, the case of make).
In addition, it is advantageous to keep the conversion table around
and to continually refine the coefficients in the same way NTP does.
This necessitates a daemon and a means to distribute its conversion
table, via some IPC mechanism, to all the processes that are using the
library.
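One possible way for a daemon to hand the table to client processes is a sequence-counter ("seqlock") protocol over a structure placed in shared memory. The sketch below shows only the publish and read steps; the structure layout and all names are assumptions made for illustration, the shared mapping itself is not set up here, and lock-free 64-bit atomics are assumed to be available on the platform.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Plain copy of the table as one process sees it (illustrative fields). */
struct conv_snapshot {
    uint64_t base_tsc;  /* TSC reading at the last calibration point */
    uint64_t base_ns;   /* absolute time at base_tsc, in nanoseconds */
    uint64_t tsc_hz;    /* current estimate of the TSC update frequency */
};

/* Layout assumed to live in memory shared by the daemon and its clients. */
struct shared_conv {
    atomic_uint seq;            /* odd while the daemon is mid-update */
    _Atomic uint64_t base_tsc;
    _Atomic uint64_t base_ns;
    _Atomic uint64_t tsc_hz;
};

static void conv_init(struct shared_conv *t)
{
    atomic_init(&t->seq, 0);
    atomic_init(&t->base_tsc, 0);
    atomic_init(&t->base_ns, 0);
    atomic_init(&t->tsc_hz, 0);
}

/* Daemon side (single writer): mark the table busy, update, mark it stable. */
static void conv_publish(struct shared_conv *t, const struct conv_snapshot *s)
{
    unsigned seq = atomic_load_explicit(&t->seq, memory_order_relaxed);
    assert((seq & 1u) == 0);  /* no update may already be in progress */
    atomic_store_explicit(&t->seq, seq + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&t->base_tsc, s->base_tsc, memory_order_relaxed);
    atomic_store_explicit(&t->base_ns, s->base_ns, memory_order_relaxed);
    atomic_store_explicit(&t->tsc_hz, s->tsc_hz, memory_order_relaxed);
    atomic_store_explicit(&t->seq, seq + 2, memory_order_release);
}

/* Client side: retry until the counter is even and unchanged across the read. */
static void conv_read(struct shared_conv *t, struct conv_snapshot *out)
{
    unsigned before, after;
    do {
        before = atomic_load_explicit(&t->seq, memory_order_acquire);
        out->base_tsc = atomic_load_explicit(&t->base_tsc, memory_order_relaxed);
        out->base_ns = atomic_load_explicit(&t->base_ns, memory_order_relaxed);
        out->tsc_hz = atomic_load_explicit(&t->tsc_hz, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        after = atomic_load_explicit(&t->seq, memory_order_relaxed);
    } while (before != after || (before & 1u));
}
```

Readers never block the daemon with this scheme, and every process sees the same coefficients, which is the property the paragraph above asks for. Other IPC mechanisms would serve as well; this one is shown only because reads stay entirely in user space.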
A very limited and naive shim implementation of this idea can be
found in the source code for thrulay (below) in files
tsc.h and tsc.c. That implementation, of
course, lacks practically all of the features discussed above; however,
it might still be interesting to look at or perhaps even start with.
The library must, at minimum, work on Linux and FreeBSD. Other desirable platforms are Windows and Solaris.
This project involves programming in C and requires the implementor to learn about time synchronization loops. The project is self-contained and involves a fair amount of independent work under guidance.
Background reading: Design Space for a Bulk Transport Tool.
Let us define the noise to be measured. Suppose the signal we'd like to measure is network delay. Then the noise is what remains of the measurements, obtained by comparing departure and arrival timestamps, after the signal is subtracted.
The noise, of course, would depend on the test environment. A variety of environments would work best. However, in all cases the network delay (the signal) needs to be known; otherwise, it cannot be separated from the noise before the characteristics of the noise are known. So, a back-to-back environment could work. What's more important is to vary the operating system and the load on the machines (it's the machines themselves that are the source of the noise; the rest, by definition, is signal).
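With this definition, each noise sample is simply (arrive - depart) - known delay. A minimal sketch of summarizing such samples follows; the function and structure names are hypothetical, timestamps are assumed to be in seconds on a common clock, and the choice of summary values (minimum, maximum, mean, population variance) is only an example of what a project might report.

```c
#include <assert.h>
#include <stddef.h>

/* Summary of the noise samples (illustrative choice of statistics). */
struct noise_stats {
    double min;
    double max;
    double mean;
    double variance;  /* population variance */
};

/* noise[i] = (arrive[i] - depart[i]) - known_delay, where known_delay is
   the signal, assumed to be known for the test environment. */
static struct noise_stats noise_summarize(const double *depart,
                                          const double *arrive,
                                          size_t n, double known_delay)
{
    assert(n > 0);
    struct noise_stats s;
    double sum = 0.0;
    s.min = s.max = (arrive[0] - depart[0]) - known_delay;
    for (size_t i = 0; i < n; i++) {
        double noise = (arrive[i] - depart[i]) - known_delay;
        if (noise < s.min)
            s.min = noise;
        if (noise > s.max)
            s.max = noise;
        sum += noise;
    }
    s.mean = sum / (double)n;

    /* Second pass for the variance, which is numerically safer than the
       sum-of-squares shortcut. */
    double sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = (arrive[i] - depart[i]) - known_delay - s.mean;
        sq += d * d;
    }
    s.variance = sq / (double)n;
    return s;
}
```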
The purpose of this project is to provide input for building the Internet2 bulk transport tool. Therefore, the amounts of data that traverse the link(s) during the test must be substantial and approach the level of saturation.
C, some understanding of statistics, standard sockets API.
thrulay
Some clarification about the statistics part: currently, in UDP mode, the only two things reported are the loss percentage and the minimum delay. Other statistics that could be reported include reordering (perhaps using the n-reordering definition), duplication, median delay as well as other quantiles, and possibly the loss burst metrics.
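Some of these statistics are straightforward to compute once per-packet sequence numbers and delays have been collected. The sketch below is not thrulay code and does not use its data structures; the names are hypothetical, the quantile uses the nearest-rank definition, and reordering and loss burst metrics are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Nearest-rank quantile of the delays, 0 <= q <= 1; q = 0.5 gives the
   median. Sorts the array in place. */
static double delay_quantile(double *delays, size_t n, double q)
{
    assert(n > 0 && q >= 0.0 && q <= 1.0);
    qsort(delays, n, sizeof delays[0], cmp_double);
    size_t rank = (size_t)(q * (double)n);
    if ((double)rank < q * (double)n)
        rank++;               /* round up, as ceil() would */
    if (rank == 0)
        rank = 1;
    return delays[rank - 1];
}

struct seq_stats {
    size_t unique;      /* distinct sequence numbers received */
    size_t duplicates;  /* packets whose number had been seen before */
    double loss_pct;    /* percentage of sent packets never received */
};

/* Sequence numbers are assumed to run from 0 to n_sent - 1; anything
   outside that range is ignored. */
static struct seq_stats seq_summarize(const uint32_t *seqs, size_t n_recv,
                                      uint32_t n_sent)
{
    assert(n_sent > 0);
    struct seq_stats s = { 0, 0, 0.0 };
    unsigned char *seen = calloc(n_sent, 1);
    assert(seen != NULL);
    for (size_t i = 0; i < n_recv; i++) {
        if (seqs[i] >= n_sent)
            continue;
        if (seen[seqs[i]])
            s.duplicates++;
        else {
            seen[seqs[i]] = 1;
            s.unique++;
        }
    }
    free(seen);
    s.loss_pct = 100.0 * (double)(n_sent - s.unique) / (double)n_sent;
    return s;
}
```

For example, receiving sequence numbers 0, 2, 2, 3 out of 5 packets sent yields 3 unique packets, 1 duplicate, and 40% loss.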
Integration of the results of work done by the TSC timekeeping project (above) might help as well.
It would probably help to download the source and look at the TODO file as well as to search for the string ``FIXME'' in the source. This is all work that needs to be done.
This project involves programming in C and is a good way to take the plunge into network measurement.
Gaim and Adium are popular open-source multi-protocol instant messaging and presence clients. Unfortunately, neither supports presence and IM using the IETF's SIP and SIMPLE standards. This project would create a SIP/SIMPLE plug-in for either Gaim or Adium. Page-mode messaging should be supported. Support (or at least strong design consideration) should be given to the Message Session Relay Protocol (MSRP). The Internet2 PIC working group will assist any student who takes on this project by providing interoperability testing and access to a reference SIMPLE presence agent implementation.
A wide variety of client-only and client/server calendaring applications and protocols exist (e.g. Outlook, Evolution, Chandler, WebDAV, CalDAV, CAP). Pick one or two and integrate them with SER's presence agent (PA). This is an important step towards automated, rich presence, which could, for example, show a meeting's participants as "busy" for the duration of a meeting or a person as "in flight" while he is on a plane. (IMs could also be sent to participants to remind them about the meeting.) One possible implementation would be to send PUBLISH messages to the presence agent. Any student taking on this project will be provided with access to a reference SIMPLE presence agent implementation.
The Internet2 PIC working group has built upon SER to create a SIMPLE presence agent that could be the nucleus of a campus/enterprise rich presence solution. We need a second, independently developed open-source PA implementation. Create one, perhaps starting with the SIPfoundry code base.
Extend PlaceLab to use SIMPLE and RPID to PUBLISH location presence to a presence agent (PA).