Homework must be submitted electronically to
<csci555@usc.edu>.
It should have a subject of “Homework
2.” You must submit ASCII text without
embedded formatting commands or markup. That means, among other things,
no PostScript, no Microsoft Word, no PDF, no FrameMaker, no TeX, no
groff, no DocBook, no HTML, no XML, and no JavaDoc. Do not submit your
homework as an attachment to your e-mail. Do not base64 encode it. Do
not ROT13 encode it. Plain ASCII text. You may PGP sign it, but are
not required to do so. Do not PGP encrypt it. If you submit something
that is not unmarked-up ASCII, it is functionally the same as turning
nothing in.
Homework turned in on the due date is not penalized; one day late is 25% off (that is, the grade will be multiplied by 0.75), 2 days late is 50% off, and 3 days late is 75% off. No work will be accepted more than 3 days late. I will generally use the Date: line of the mail, but should the situation merit it, I am not above looking through mail system logs to confirm the submission time. I should not have to mention it, but forging a Date: line to avoid a late deduction is grounds for an F.
Do the work yourself. Computer science is a collaborative science, and I encourage you to talk over the ideas in the homework with other students. However, the final submission, that is, the text of the homework, must be composed individually by each student. If you hand in homework that is identical to another student's, you risk failing the class. (In fact, the only way that you would not fail the class in such a circumstance would be if one student had copied another student without the knowledge of the copied student; the copied student would not be penalized.) That is an awfully large risk for 10% of your total grade. Do the work yourself.
As with all work for csci555, this work is subject to the USC code of Student Conduct. Read it, learn it, live it. Should you have any questions on how to apply the code, do not hesitate to contact me or the Office of Student Judicial Affairs and Community Standards. Should it prove possible, do not plagiarize work from sources outside the class. Plagiarizing homework is grounds for failing the class. It is perfectly all right to properly cite external sources, should you find some that are useful.
Answers will not be graded on their beauty of expression. Answers will be graded on whether they show a logical approach and sensible explanation. Short, simple sentences are fine. What is important is that your ideas are clear to the reader, and that they answer the question. Of course, no answer will be penalized because it is beautifully expressed, either.
Each question has equal weight.
| 1. | In general, attribute-based names are more difficult to resolve than hierarchical names. Explain why. Given that resolution is more difficult, what property makes attribute-based names attractive? You may find an example helpful in your answer. |
| Answer: | Hierarchical names are easier to resolve because there is an overriding order to the components of the name being resolved. This allows the implementation to partition information along the lines of the hierarchy, for example by storing all DNS names in the same domain or all files in the same directory together. In contrast, attributes are not inherently ordered, so the storage plan and resolution algorithms need to be open to more possibilities. Attribute-based names are attractive because they are descriptive without ordering constraints. When naming objects that can be described in many ways, or along many axes, attribute-based names are very helpful. |
| Grading: | 5 points for each explanation. Give a little more room on these, because these are difficult concepts to quantify. The key point to note is that hierarchy imposes an ordering and that ordering can be exploited for performance, but may be limiting in some applications. |
| 2. | Consider an alternative implementation of the tracer in Connections[Soules05]. Rather than collecting real-time information, the tracer walks the file system and collects access times from the files and correlates them. Explain an advantage and a disadvantage of this implementation. |
| Answer: | The big advantage is that real-time tracing of file system accesses is intrusive and somewhat difficult to implement; eliminating it removes the run-time performance impact on the system and simplifies it. A big disadvantage is that the information recorded in the file system is less detailed. One cannot tell which process made the recorded access or detect access types that the file system does not record. As a result, false associations may be detected, or real connections overlooked. |
| Grading: | 5 points for each explanation. These are not exhaustive lists of possible advantages/disadvantages. |
| 3. | The Weighted Voting[Gifford79] paper mentions that the system can continue to operate in the face of representative failures. Explain more specifically how representative failures affect the ability to read or write files. |
| Answer: | The key is how failures affect the ability of readers and writers to gather quorums. Each file must have at least a read quorum's worth of votes at functional representatives to allow reads to proceed. For writes the situation is more complex. A write requires a previous read to acquire the version number of the file that must be captured in the write quorum. Users will be unable to write a file if either quorum cannot be acquired. It's also worth considering the possibility that a failure takes down all copies of a write before the write is propagated. There are cases where it is possible to hold a write quorum that only covers one physical system, e.g., when a write quorum is 1 or when a particular representative has enough votes to fulfill the write requirements. If a write is made to such a representative and that representative fails before the file changes can be propagated, inconsistencies can arise. |
| Grading: | 5 points for noting that reads cannot continue without a read quorum's worth of active votes and a write cannot proceed without a write quorum's worth of votes being up. 3 points for noting that a write requires both, and 2 more for noting the single site failure possibility. |
| 4. | In the Domain Name System[Mockapetris87], the amount of time that client data can be inconsistent is affected by the caching time-to-live (TTL) values. These are under the control of the sub-domain administrator. Explain how the administrator of a sub-domain can modulate the TTL to reduce the impact of a large change to the sub-domain when the timing of the change is known in advance. |
| Answer: | The administrator can change the TTLs over time. While the domain is known to be stable, the TTLs can be large, allowing a lot of caching. As the change time approaches, the admin can reduce the TTLs, reducing caching (and increasing server load), but making clients more responsive to change. After the changes have been made, the TTLs can be increased again. Notice that the administrator does have to plan ahead. If the TTLs are not reduced far enough in advance for all the copies cached under the older, longer TTLs to have been discarded before the change, old, inconsistent values may stay in the system. |
| Grading: | 7 points for the basic TTL adjusting mechanism. 3 more for making it clear that old data must be purged, and that only time and a change of TTL will do that. |
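The quorum reasoning in the answer to question 3 can be illustrated with a small sketch. This is not Gifford's implementation: the representative names, the vote assignment, and the helpers `votes_up`, `can_read`, `can_write`, and `single_site_write_quorums` are all hypothetical, chosen only for illustration. The sketch assumes a read quorum `r` and a write quorum `w` with `r + w` greater than the total number of votes, and that a write needs both quorums because it must first read the current version number.

```python
def votes_up(votes, failed):
    """Total votes held by representatives that are still running."""
    return sum(v for rep, v in votes.items() if rep not in failed)


def can_read(votes, failed, r):
    """A read proceeds only if the surviving votes reach the read quorum."""
    return votes_up(votes, failed) >= r


def can_write(votes, failed, r, w):
    """A write needs a read quorum (to learn the version) and a write quorum."""
    up = votes_up(votes, failed)
    return up >= r and up >= w


def single_site_write_quorums(votes, w):
    """Representatives whose own votes are enough for a write quorum.

    A write that lands only on one of these is lost if that
    representative fails before the change is propagated.
    """
    return sorted(rep for rep, v in votes.items() if v >= w)


# Hypothetical configuration: 4 votes in total, r = 2, w = 3 (r + w > 4).
votes = {"A": 2, "B": 1, "C": 1}
r, w = 2, 3

for failed in [set(), {"B"}, {"A"}, {"A", "B"}]:
    print(sorted(failed), can_read(votes, failed, r), can_write(votes, failed, r, w))
```

With this assignment, losing B leaves both operations available, losing A leaves the file readable but not writable, and losing A and B blocks both. Under a different hypothetical assignment such as `{"A": 3, "B": 1, "C": 1}` with `w = 3`, `single_site_write_quorums` reports that A alone can form a write quorum, which is the single-site failure case the answer describes.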
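The planning constraint in the answer to question 4 can be worked through with simple arithmetic. The sketch below is only an illustration under stated assumptions: the function names and the example numbers are hypothetical, times are plain integer seconds, every cache is assumed to honor the TTL exactly, and delays in propagating zone data to secondary servers are ignored.

```python
def latest_ttl_reduction(change_time, old_ttl):
    """Latest time the lowered TTL can be published so that every copy
    cached under the old TTL has expired by the time of the change."""
    return change_time - old_ttl


def worst_case_stale_until(change_time, ttl_lowered_at, old_ttl, low_ttl):
    """Latest time at which some cache may still hold pre-change data.

    Assumes ttl_lowered_at <= change_time.
    """
    # A copy fetched just before the TTL was lowered lives for old_ttl.
    old_copy_expiry = ttl_lowered_at + old_ttl
    # A copy fetched just before the change lives for low_ttl.
    low_copy_expiry = change_time + low_ttl
    return max(old_copy_expiry, low_copy_expiry)


change_time = 1_000_000   # hypothetical time of the planned change
old_ttl = 86_400          # one day, used while the domain is stable
low_ttl = 300             # five minutes, used around the change

deadline = latest_ttl_reduction(change_time, old_ttl)
print(deadline)
print(worst_case_stale_until(change_time, deadline, old_ttl, low_ttl))
print(worst_case_stale_until(change_time, 990_000, old_ttl, low_ttl))
```

If the TTL is lowered by the deadline, stale data can survive for at most `low_ttl` seconds after the change. If it is lowered too late (the last call), copies cached under the old TTL outlive the change by far more than that, which is why the administrator has to plan ahead.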
[Soules05] “Connections: Using Context to Enhance File Search,” 20th ACM Symposium on Operating Systems Principles, ACM Press, 2005, 119-132.
[Gifford79] “Weighted Voting for Replicated Data,” Seventh Symposium on Operating Systems Principles, ACM, December 1979, 150-162.
[Mockapetris87] “Domain Names - Concepts and Facilities,” RFC 1034, November 1987.