LOGIWEB(7) File formats and protocols LOGIWEB(7)
NAME
logiweb - Logiweb protocol
DESCRIPTION
Logiweb is a system for storing, locating, and transmitting Logiweb
pages. Logiweb pages may contain free text mixed with machine
intelligible objects like computer programs, test suites, and formal proofs.
Logiweb defines a referencing scheme in which each Logiweb page has a
unique Logiweb reference. A Logiweb reference is typically around 30
bytes long. A Logiweb reference contains, among other things, a RIPEMD-160
hash key of the referenced page.
The purpose of the Logiweb protocol is to locate a Logiweb page, given
its reference.
To maximize efficiency, the Logiweb protocol was originally intended to
be a protocol of its own, using its own UDP port.
Because applying for a UDP port turned out to be too complicated, however,
the Logiweb protocol will be channeled through HTTP instead. A
new protocol will be defined based on the protocol defined below, cf.
logiweb(5). The present document is included as logiweb(7) until the
new protocol becomes available.
PROTOCOL DEFINITION
Internet Draft K. Grue
<draft-grue-logiweb-protocol-1-00.txt> Associate Professor
Category: Experimental DIKU
Expires September 8, 2007 March 8, 2007
Logiweb Protocol Version 1
<draft-grue-logiweb-protocol-1-00.txt>
Status of this Memo
Distribution of this memo is unlimited.
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
When publishing mechanically verified mathematics on the Internet,
there is a need for referencing previously published documents. As an
example, referenced documents may contain needed definitions, lemmas,
and proofs. A reference from one mechanically verified document to
another is much like any other Uniform Resource Locator, but there is
a need to ensure that referenced documents do not change after
publication. This is so because otherwise a change of e.g. a
definition in a referenced document could invalidate the correctness
of a referencing document.
The present document describes the protocol used by an experimental
system named "Logiweb" which allows publishing mechanically verified,
immutable mathematical documents.
Table of Contents
1. Introduction
2. Protocol
2.1. Cardinals
2.2. Identifiers
2.3. Logiweb identifier
2.4. Timestamps
2.5. Ping requests
2.6. Pong responses
2.7. Event responses
2.8. Nop requests
2.9. Prefix messages
2.10. Vectors
2.11. Get messages
2.12. Server states
2.13. Server states are binary trees
2.14. The type attribute
2.15. The update attribute
2.16. The left and right attributes
2.17. The sibling attribute
2.18. The url attribute
2.19. The leap attribute
2.20. Other attribute classes
2.21. The initial state
2.22. Got messages
2.23. Put messages
3. Security Considerations
3.1. Unwanted outgoing information
3.2. State corruption
3.3. Incoming denial-of-service attacks
3.4. Outgoing denial-of-service attacks
4. IANA Considerations
4.1. Well Known Port 332
4.2. MIME type application/prs.logiweb
5. References
5.1. Normative References
5.2. Informative References
1. Introduction
This document defines the 'Logiweb protocol' version 1.
Logiweb is a system for publication of immutable documents of high
typographic quality which contain computer programs and mathematical
definitions, theorems, and proofs [Logiweb].
To understand the Logiweb protocol, only the following features of
the Logiweb system are needed:
o A Logiweb document is a sequence of bytes. A Logiweb document
consists of a version number followed by a RIPEMD-160 hash key
[RIPEMD] followed by a time stamp followed by a sequence of
bytes.
o Any Logiweb document has a 'Logiweb reference'. The reference
is a sequence of bytes. The reference of a document is the
version number followed by the hash key followed by the time
stamp of the document.
o It is assumed (cf. the section on security considerations
later in this document) that any two Logiweb documents with the
same reference are identical. The RIPEMD-160 hash key ensures
this with overwhelming probability.
o To be considered 'published', a Logiweb document must be
accessible using the World Wide Web (WWW). A published Logiweb
document may be mirrored such that it is available under more
than one Uniform Resource Locator (URL) [RFC3986]. A published
Logiweb document may be moved and copies of it may be deleted
such that the set of URLs associated with a Logiweb document
may change with time.
o The Logiweb system comprises Logiweb 'servers' and Logiweb
'clients'.
o A Logiweb server is a running computer program which
communicates with Logiweb clients and other Logiweb servers
using the Logiweb protocol, and which provides the services
described in the following.
o A Logiweb client is a running computer program which
communicates with Logiweb servers using the Logiweb protocol,
and which uses the services described in the following.
The main task of Logiweb servers is to keep track of the relationship
between Logiweb references and their associated fluctuating set of
URLs. The main service provided by Logiweb servers is to translate
Logiweb references to URLs. All Logiweb servers on the Internet shall
cooperate on this.
As mentioned above, a Logiweb document must be accessible using the
WWW to be considered 'published'. In addition, the URL of at least
one copy of the document must be known to at least one of the
cooperating Logiweb servers.
As secondary services, a Logiweb server can identify itself as a
Logiweb server, it can tell what time it is according to the server's
clock, and it can tell what leap seconds have occurred.
Logiweb servers are not supposed to deliver Logiweb documents. They
are merely supposed to translate Logiweb references to URLs. The
actual delivery of Logiweb documents is supposed to be performed by
http servers.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
2. Protocol
The Logiweb protocol defines the syntax and semantics of 'Logiweb
messages'. Logiweb messages are the units of exchange when using the
Logiweb protocol.
The Logiweb protocol is an application layer protocol. The Logiweb
protocol can build on top of connection-based transport protocols
like TCP [RFC0793] as well as datagram protocols like UDP [RFC0768].
When using a datagram protocol, each datagram contains one and only
one Logiweb message. When using a connection-based protocol, Logiweb
messages are transmitted back-to-back.
The Logiweb protocol specifies that some messages require a response
and some do not. However, as an overall rule, whenever an
application receives a datagram containing a Logiweb message, the
application is allowed not to respond. Furthermore, whenever an
application receives Logiweb messages over a connection-based
transport, the application is allowed to close the connection at any
time.
Applications should respond to messages which require a response
unless they have a reason for not doing so. Reasons for not
responding to a datagram or closing a connection could be that the
application is short of outgoing bandwidth, that the application
thinks it is suffering a denial-of-service attack, or that the
application thinks that the other end of the communication is broken
or malicious.
Furthermore, whenever an application receives a message which
requires a response, the application is allowed to respond with a
Logiweb 'Sorry' message. A 'Sorry' message indicates that the
application is unwilling to answer at the given time but may be
willing to answer later if the same question is sent again.
Whenever an application receives a message which requires a response
via a connection-based protocol, the application is required to
respond properly OR respond with a 'Sorry' message OR disconnect.
Not responding is not an option when using a connection-based transport.
The syntax of Logiweb messages is expressed in ABNF [RFC4234] in the
following. The semantics are expressed in plain prose.
2.1. Cardinals
middle-septet = %d128-255
end-septet = %d000-127
cardinal = *middle-septet end-septet
A middle-septet X represents the number X minus 128. An end-septet X
represents the number X. A cardinal represents a non-negative
integer using little endian base 128. As an example, the cardinal
129 002
represents the non-negative integer 1 + 128 * 2 = 257. The cardinal
129 130 000
also represents 257.
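The encoding rule above can be sketched in a few lines of Python. The function names encode_cardinal and decode_cardinal are illustrative assumptions and not part of the protocol. The encoder always emits the shortest form, although longer encodings padded with zero-valued septets are equally valid.

```python
def encode_cardinal(n):
    """Encode a non-negative integer as a cardinal in its shortest form."""
    if n < 0:
        raise ValueError("cardinals are non-negative")
    out = bytearray()
    while n >= 128:
        out.append(128 + n % 128)  # middle-septet: septet value plus 128
        n //= 128
    out.append(n)  # end-septet: a value below 128 terminates the cardinal
    return bytes(out)


def decode_cardinal(data, pos=0):
    """Decode one cardinal starting at pos; return (value, next position)."""
    value, weight = 0, 1
    while True:
        if pos >= len(data):
            raise ValueError("truncated cardinal")
        byte = data[pos]
        pos += 1
        value += (byte % 128) * weight  # little endian: low septets first
        weight *= 128
        if byte < 128:
            return value, pos
```

For instance, encode_cardinal(480) yields the two bytes 224 003, and decode_cardinal accepts both the shortest and the padded encodings of a number.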
2.2. Identifiers
x00 = %d000 / %d128 x00
x01 = %d001 / %d129 x00
x02 = %d002 / %d130 x00
x03 = %d003 / %d131 x00
x04 = %d004 / %d132 x00
x05 = %d005 / %d133 x00
x06 = %d006 / %d134 x00
x07 = %d007 / %d135 x00
The syntax class x03 covers all cardinals whose value is three. The
other syntax classes are similar.
2.3. Logiweb identifier
L = %d204
o = %d239
g = %d231
i = %d233
w = %d247
e = %d229
b = %d226
id-version = %d001
id-Logiweb = L o g i w e b id-version
Logiweb applications use id-Logiweb for indicating that they use the
Logiweb protocol. Note that 204 is a middle septet which
represents the number 204 - 128 = 76 which is the Unicode [Unicode]
code point of a Latin capital letter L.
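The relation between id-Logiweb and the text "Logiweb" can be checked with a small Python sketch. The helper names identifier_text and starts_with_id_logiweb are assumptions made for this illustration only.

```python
# id-Logiweb as given by the grammar: seven middle-septets and one end-septet.
ID_LOGIWEB = bytes([204, 239, 231, 233, 247, 229, 226, 1])


def identifier_text(ident):
    """Subtract 128 from every middle-septet and read the results as characters."""
    return "".join(chr(b - 128) for b in ident if b >= 128)


def starts_with_id_logiweb(data, pos=0):
    """Report whether the bytes at pos are exactly id-Logiweb."""
    return data[pos:pos + len(ID_LOGIWEB)] == ID_LOGIWEB
```

Here identifier_text(ID_LOGIWEB) gives the string "Logiweb"; the final byte 001 is the id-version and carries no character.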
2.4. Timestamps
timestamp = mantissa exponent
mantissa = cardinal
exponent = cardinal
Logiweb measures the time at which an event occurred in 'Logiweb
time'. Logiweb time measures the number of seconds that have elapsed
according to International Atomic Time (TAI) since TAI 00:00:00 of
Modified Julian Day (MJD) 0.
For information, MJD 0 is November 17, 1858 in the Gregorian
calendar. TAI 00:00:00 of MJD 0 equals Coordinated Universal Time
(UTC) 00:00:10 of MJD 0 since, by convention, TAI and UTC were 10
seconds apart before June 30, 1972. In short, UTC is a time scale in
which it is noon when Greenwich is under the sun.
A timestamp consists of two cardinals, M and E, and represents the
number M*10^(-E) where u^v denotes u raised to power v. As an example
129 002 009
denotes 257 nanoseconds past the Logiweb epoch (M = 1 + 128 * 2 = 257
and E = 9) where the Logiweb epoch is TAI 00:00:00 of MJD 0.
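A timestamp can be decoded exactly, without floating point rounding, by reading the two cardinals and forming a fraction. The following Python sketch does so; the names decode_cardinal and decode_timestamp are assumptions made for illustration.

```python
from fractions import Fraction


def decode_cardinal(data, pos=0):
    """Decode one little endian base 128 cardinal; return (value, next position)."""
    value, weight = 0, 1
    while True:
        if pos >= len(data):
            raise ValueError("truncated cardinal")
        byte = data[pos]
        pos += 1
        value += (byte % 128) * weight
        weight *= 128
        if byte < 128:
            return value, pos


def decode_timestamp(data, pos=0):
    """Decode mantissa M and exponent E; return (M * 10**(-E) seconds, next position)."""
    mantissa, pos = decode_cardinal(data, pos)
    exponent, pos = decode_cardinal(data, pos)
    return Fraction(mantissa, 10 ** exponent), pos
```

For instance, the bytes 042 003 decode to 42/1000 seconds past the Logiweb epoch.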
2.5. Ping requests
message = ping
ping = id-ping
id-ping = x02
A ping request represents the question 'who are you, and what time is
it'.
2.6. Pong responses
message =/ pong
pong = id-pong id-Logiweb timestamp
id-pong = x03
A Logiweb server which receives a ping request shall do one of the
following:
o Respond by a pong message containing the current time.
o Respond by a 'Sorry' message.
o Avoid responding if the ping is transported by a datagram.
o Disconnect if the ping is transported by a connection-based
transport.
Logiweb servers are supposed to respond to ping requests. A Logiweb
client should consider the other end of the connection as broken if
it receives a ping request.
Logiweb applications SHALL NOT respond to pong responses.
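As an illustration of the pong grammar, the following Python sketch builds and parses a pong message. The function names build_pong and parse_pong are hypothetical, and the sketch assumes the caller already has the current Logiweb time as a mantissa and an exponent.

```python
from fractions import Fraction

ID_LOGIWEB = bytes([204, 239, 231, 233, 247, 229, 226, 1])


def encode_cardinal(n):
    """Shortest little endian base 128 encoding of a non-negative integer."""
    out = bytearray()
    while n >= 128:
        out.append(128 + n % 128)
        n //= 128
    out.append(n)
    return bytes(out)


def decode_cardinal(data, pos):
    """Decode one cardinal; return (value, next position)."""
    value, weight = 0, 1
    while True:
        if pos >= len(data):
            raise ValueError("truncated cardinal")
        byte = data[pos]
        pos += 1
        value += (byte % 128) * weight
        weight *= 128
        if byte < 128:
            return value, pos


def build_pong(mantissa, exponent):
    """pong = id-pong id-Logiweb timestamp, with id-pong in its shortest form."""
    return bytes([3]) + ID_LOGIWEB + encode_cardinal(mantissa) + encode_cardinal(exponent)


def parse_pong(message):
    """Return the server time of a pong as an exact fraction of seconds."""
    ident, pos = decode_cardinal(message, 0)
    if ident != 3 or message[pos:pos + len(ID_LOGIWEB)] != ID_LOGIWEB:
        raise ValueError("not a pong message")
    pos += len(ID_LOGIWEB)
    mantissa, pos = decode_cardinal(message, pos)
    exponent, pos = decode_cardinal(message, pos)
    return Fraction(mantissa, 10 ** exponent)
```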
2.7. Event responses
message =/ event
event = id-event notice
id-event = x01
notice = sorry / received / rejected
sorry = x00
received = x01
rejected = x02
A 'sorry' response indicates that the sender has received a request
which it is unwilling to answer at the given time but may be willing
to answer later. Logiweb applications are allowed to send a 'sorry'
response to any request which requires a response.
A 'received' message indicates that the sender acknowledges the
receipt of a request but is not going to give any further answer. A
'received' message is the proper response to the 'put' request
described later.
A 'rejected' message indicates that the sender acknowledges the
receipt of a request but will not and never will answer that
particular request. Logiweb applications may respond by a 'rejected'
message only when they receive a malformed request.
Logiweb applications SHALL NOT respond to event responses.
2.8. Nop requests
message =/ nop
nop = id-nop
id-nop = x00
Logiweb applications SHALL NOT respond to nop requests. Nop requests
may be used for padding when using connection-based transports. There
is no point in sending nop datagrams. Applications are allowed to
disconnect connection-based transports at any time, so even though
applications are not allowed to respond to nop requests, they may
still disconnect on a 'nop' without violating the protocol.
2.9. Prefix messages
message =/ prefix
prefix = id-prefix code contents
id-prefix = x07
code = cardinal
contents = message
Whenever a Logiweb application receives a prefix message, it shall
process the contents of the message. If the application responds to
the contents, it shall prefix the given code to the response.
Example: Suppose an application receives a ping with two prefixes:
007 100 007 101 002
Furthermore, suppose the application decides to respond with a
'sorry' message. Then the response should be:
007 100 007 101 001 000
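The prefix rule can be sketched in Python as follows: strip the prefixes from the request, decide on a response to the contents, and put the same prefixes in front of that response. The function name split_prefixes is an assumption made for this sketch.

```python
def decode_cardinal(data, pos):
    """Decode one little endian base 128 cardinal; return (value, next position)."""
    value, weight = 0, 1
    while True:
        if pos >= len(data):
            raise ValueError("truncated cardinal")
        byte = data[pos]
        pos += 1
        value += (byte % 128) * weight
        weight *= 128
        if byte < 128:
            return value, pos


def split_prefixes(message):
    """Return (prefix bytes, contents) for a message with zero or more prefixes."""
    pos = 0
    while True:
        ident, after_id = decode_cardinal(message, pos)
        if ident != 7:  # id-prefix is any cardinal whose value is seven
            return message[:pos], message[pos:]
        _, pos = decode_cardinal(message, after_id)  # skip the code


# The example from the text: a ping with two prefixes, answered by 'sorry'.
prefixes, contents = split_prefixes(bytes([7, 100, 7, 101, 2]))
response = prefixes + bytes([1, 0])  # id-event followed by the sorry notice
```

With the request from the example, response holds the bytes 007 100 007 101 001 000.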
Because of prefixes, messages can be arbitrarily long. Messages are
typically less than 100 bytes in length. Applications are suggested
to process messages that are up to 65536 bytes long. When receiving
messages longer than that, applications are suggested to disconnect
if the message is received over a connection-based transport and to
discard if the message is received as a mammoth datagram.
2.10. Vectors
vector = length bytes
length = cardinal
bytes = *byte
byte = %d0-255
A vector represents a list of bits. The given length is the number of
bits in the list. The syntax of vectors is NOT context free since the
number of bytes must be equal to the given length divided by eight
and rounded up to the nearest integer. As an example,
012 128 015
represents a list comprising twelve bits. The length field occupies
the first byte. Twelve divided by eight and rounded up equals two,
indicating that the next two bytes are part of the vector.
The vector 012 128 015 is translated to a list of bits as follows.
First write the bytes in binary big endian:
1000 0000 0000 1111
Then bit swap each byte:
0000 0001 1111 0000
Then pick the first twelve bits:
0000 0001 1111
Sane programmers don't bit swap. Sane programmers realize and utilize
that Logiweb is little endian.
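The little endian reading avoids the bit swap altogether: bit number i of the list is bit i mod 8, counted from the least significant end, of byte number i div 8. A minimal Python sketch under that reading follows; the names encode_vector and decode_vector are assumptions.

```python
def encode_cardinal(n):
    """Shortest little endian base 128 encoding of a non-negative integer."""
    out = bytearray()
    while n >= 128:
        out.append(128 + n % 128)
        n //= 128
    out.append(n)
    return bytes(out)


def decode_cardinal(data, pos):
    """Decode one cardinal; return (value, next position)."""
    value, weight = 0, 1
    while True:
        if pos >= len(data):
            raise ValueError("truncated cardinal")
        byte = data[pos]
        pos += 1
        value += (byte % 128) * weight
        weight *= 128
        if byte < 128:
            return value, pos


def encode_vector(bits):
    """Pack a list of 0/1 values, least significant bit of each byte first."""
    packed = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        packed[i // 8] |= bit << (i % 8)
    return encode_cardinal(len(bits)) + bytes(packed)


def decode_vector(data, pos=0):
    """Return (list of bits, next position) for the vector starting at pos."""
    length, pos = decode_cardinal(data, pos)
    nbytes = (length + 7) // 8  # length divided by eight, rounded up
    chunk = data[pos:pos + nbytes]
    if len(chunk) < nbytes:
        raise ValueError("truncated vector")
    bits = [(chunk[i // 8] >> (i % 8)) & 1 for i in range(length)]
    return bits, pos + nbytes
```

Applied to the bytes 012 128 015, decode_vector returns the twelve bits 0000 0001 1111 found above by bit swapping.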
2.11. Get messages
message =/ get
get = id-get address class index
id-get = x04
address = vector
class = update / type / left / right
class =/ sibling / url / leap
update = x00
type = x01
left = x02
right = x03
sibling = x04
url = x05
leap = x06
index = cardinal
Logiweb servers are supposed to maintain a 'state' which is visible
from the outside. Clients and other servers may query the state of a
Logiweb server using get messages. A get message requests a Logiweb
server to return the 'attribute' which the server associates to the
given address, class, and index.
A Logiweb server has no other visible state than what can be queried
using get messages.
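A get message is a plain concatenation of the fields in the grammar. The Python sketch below assembles one from an address given as a list of bits; the name build_get and the constant CLASS_URL are assumptions made for illustration, and identifiers are emitted in their shortest one-byte form.

```python
def encode_cardinal(n):
    """Shortest little endian base 128 encoding of a non-negative integer."""
    out = bytearray()
    while n >= 128:
        out.append(128 + n % 128)
        n //= 128
    out.append(n)
    return bytes(out)


def encode_vector(bits):
    """Length in bits as a cardinal, then the bits packed low bit first."""
    packed = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        packed[i // 8] |= bit << (i % 8)
    return encode_cardinal(len(bits)) + bytes(packed)


CLASS_URL = 5  # the 'url' class from the grammar above


def build_get(address_bits, attribute_class, index):
    """get = id-get address class index, where id-get is 004."""
    if not 0 <= attribute_class <= 6:
        raise ValueError("the grammar defines classes 0 to 6 only")
    return (bytes([4]) + encode_vector(address_bits)
            + bytes([attribute_class]) + encode_cardinal(index))
```

For instance, build_get([], CLASS_URL, 0) asks for a url attribute at the empty address and yields the bytes 004 000 005 000.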
2.12. Server states
The state of a server is a function which, given an address and a
class, returns a list of attributes. Addresses and classes were
defined in the previous section. An attribute consists of a timestamp
and a value where the value is a vector as defined in Section 2.10.
Server states may change with time. When a server receives a 'get'
message as described in the previous section, it responds with a
'got' message as described later. The contents of the 'got' message
reflects the server state at the time the 'get' is processed by the
server.
The server state may change at any time. Processing of each 'get'
message is atomic, but the server state may change between any two
'get' messages.
The server state can only change in two ways: an attribute may be
added or an attribute may be removed. Whenever an attribute is
removed, it is removed from the list of attributes it belongs to
without reordering the remaining attributes on that list. Whenever an
attribute is added, it is added at the end of an attribute list. For
that reason, all attribute lists are chronological with the oldest
attribute first.
Every attribute comprises a timestamp and a value. The value is an
arbitrary vector. The timestamp indicates at what time the given
attribute was added to the server state.
A get message with address A, class C, and index I requests the I'th
oldest attribute with address A and class C. The oldest attribute has
index one. A get message with index zero or an index larger than the
number of attributes with the given address and class requests the
newest attribute with the given address and class.
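The index rule can be written down directly. The following Python sketch assumes an attribute list is held as a Python list ordered oldest first; the function name select_attribute is hypothetical.

```python
def select_attribute(attributes, index):
    """Pick the attribute a get message with the given index asks for.

    attributes is a chronological list, oldest first. Index 1 is the oldest
    attribute; index 0 or an index past the end selects the newest one.
    Returns None when the list is empty.
    """
    if not attributes:
        return None
    if 1 <= index <= len(attributes):
        return attributes[index - 1]
    return attributes[-1]
```

A client that wants to walk a whole list can therefore ask for index 1, 2, 3 and so on, while a client that only wants the latest entry asks for index 0.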
2.13. Server states are binary trees
As mentioned, the state of a server is a function which, given an
address and a class, returns a list of attributes. Addresses are bit
vectors. We shall refer to all attributes with a given address on a
given server as the 'node' at that server at that address.
We shall refer to the empty list of bits as the 'root address' and to
the node with that address as the 'root node'. For all addresses A,
we refer to A with a zero or one bit added at the end as the 'left'
and 'right subaddress', respectively. For non-empty addresses A, we
refer to A with one bit removed at the end as the 'superaddress'
of A.
As an example,
1110 is the left subaddress of 111
1111 is the right subaddress of 111
11 is the superaddress of 111
We shall say that a server 'has a node with address A' if its state
contains at least one attribute with address A.
A server state is a binary tree in the sense that whenever a server
has a node N1 with non-empty address A then it also has a node N2
whose address is the superaddress of A. We shall refer to N2 as the
supernode of N1.
If a server has a node N with address A, then we shall refer to N as
a 'leaf' node if the server has no nodes whose addresses are the left
or right subaddresses of A. We shall refer to N as a 'branch' node if
the server has nodes for both the left and the right subaddress of A.
Server states only contain leaf and branch nodes. A server state
cannot contain a node that has a left but not a right subnode or vice
versa.
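The address arithmetic and the two tree invariants can be sketched in Python. The sketch assumes addresses are written as strings of the characters '0' and '1', with the empty string as the root address; all function names are illustrative.

```python
def left_subaddress(address):
    return address + "0"


def right_subaddress(address):
    return address + "1"


def superaddress(address):
    if not address:
        raise ValueError("the root address has no superaddress")
    return address[:-1]


def is_binary_tree(addresses):
    """Check the invariants for the set of addresses at which a server has nodes."""
    addresses = set(addresses)
    for address in addresses:
        # Every node with a non-empty address needs a supernode.
        if address and superaddress(address) not in addresses:
            return False
        # A node is a leaf (no subnodes) or a branch (both subnodes).
        has_left = left_subaddress(address) in addresses
        has_right = right_subaddress(address) in addresses
        if has_left != has_right:
            return False
    return True
```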
2.14. The type attribute
Every node of a server contains exactly one attribute of class 'type'
(i.e. of class 1). The value of that attribute is the empty bit
vector if the node is a leaf node. The value is a one-element bit
vector whose sole bit is a one-bit if the node is a branch node. The
time stamp of the attribute equals the time at which the node was
created or last changed type.
2.15. The update attribute
Every node of a server has six attributes of class 'update' (i.e. of
class 0). The six update attributes have values 1, 10, 11, 100, 101,
and 110, respectively. The timestamps of those attributes are as
follows:
1 Identical to the timestamp of the 'type' attribute.
10 The time of the last change in the left subtree of the node.
11 The time of the last change in the right subtree of the node.
100 The time of the last change in the 'sibling' attribute list of
the node.
101 The time of the last change in the 'url' attribute list of the
node.
110 The time of the last change in the 'leap' attribute list of the
node.
The time stamps for the update attributes with value 10 and 11 equal
the timestamp of the 'type' attribute for leaf nodes. The time stamps
for the update attributes with value 100, 101, and 110 equal the
timestamp of the 'type' attribute if the node never has had
attributes of class 'sibling', 'url', or 'leap', respectively.
Contrary to other attribute lists, update attribute lists may contain
several attributes with identical timestamps. That occurs when a
single addition or deletion of an attribute has consequential
changes. Among other things, all update attributes are set to the current
server time when a node is created.
2.16. The left and right attributes
Server states have no attributes of class 2 (left) or 3 (right).
These two classes only occur as values in update attributes.
2.17. The sibling attribute
Two nodes with the same address on different servers are 'siblings'.
A 'branch sibling' of a node is a sibling which is at the same time a
branch node. Sibling attributes of a node are references to servers
that store branch siblings of the given node.
The value of a sibling attribute is a byte vector, i.e. a bit vector
whose length is a multiple of 8. The bytes part of the bit vector may
have a value like
"udp/logiweb.eu/65535/http://logiweb.eu/logiweb/server/relay/"
The string above contains 60 characters and, hence, 480 bits. For
that reason its encoding is
224 003 117 100 112 047 108 ...
Above, the middle-septet 224 represents 224-128=96 and the length
field 224 003 represents 96+128*3=480. The number 117 is a Latin
small letter u as in "udp". The little-endian nature of bit vectors
has no observable effect here.
In general, sibling attributes have form
protocol "/" host "/" port "/" relay
The protocol may be 'tcp' or 'udp'. The host and port identify the
Logiweb server. The relay must be a URL [RFC3986].
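A sibling value can be encoded and taken apart as sketched below in Python. The sketch assumes the text is plain ASCII, which the example bytes above suggest but the text does not state outright; the names encode_sibling and parse_sibling are hypothetical.

```python
def encode_cardinal(n):
    """Shortest little endian base 128 encoding of a non-negative integer."""
    out = bytearray()
    while n >= 128:
        out.append(128 + n % 128)
        n //= 128
    out.append(n)
    return bytes(out)


def encode_sibling(text):
    """Encode a sibling value as a byte vector: bit length, then the bytes."""
    payload = text.encode("ascii")  # assumption: sibling values are ASCII
    return encode_cardinal(8 * len(payload)) + payload


def parse_sibling(text):
    """Split protocol "/" host "/" port "/" relay; the relay may contain slashes."""
    protocol, host, port, relay = text.split("/", 3)
    if protocol not in ("tcp", "udp"):
        raise ValueError("protocol must be tcp or udp")
    return protocol, host, int(port), relay


EXAMPLE = "udp/logiweb.eu/65535/http://logiweb.eu/logiweb/server/relay/"
```

Encoding EXAMPLE gives a vector that starts with the bytes 224 003 117 100 112 047 108, in agreement with the encoding shown earlier in this section.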
The purpose and function of a 'relay' is outside the scope of the
present document. For information, however, a relay is a special
Logiweb client which runs as a CGI-program [CGI]. If a relay is
invoked with a path of '/64/...' or '/32/...' or '/16/...' where the
dots express a Logiweb reference expressed base 64, 32, or 16, then
the relay contacts a Logiweb server to get the reference translated
to a URL and returns an indirection to that URL. As an example,
looking up http://logiweb.eu/logiweb/server/relay/64/... in a web
browser is supposed to open the Logiweb document with the given
reference. Looking up e.g.
http://logiweb.eu/logiweb/server/relay/64/.../2/index.html is
supposed to do the same but then to back up 2 slashes and then add
index.html.
Logiweb relays typically have further facilities. At the time of
writing, the relay at http://logiweb.eu/server/relay contains a self-
documenting interface to a Logiweb server which allows any user to
experiment with the protocol described in the present document. The
given relay was the first Logiweb relay established on the Internet
and is supposed to exist as long as Logiweb itself exists.
Logiweb relays will not be mentioned any more in the present
document.
We shall refer to sibling attributes as sibling pointers. Sibling
pointers are said to be 'valid' if they point to servers which store
a branch sibling of the given node. A sibling pointer is said to be
'dangling' otherwise. Hence, a sibling pointer is dangling if the
server pointed to stores no sibling of the given node. Furthermore, a
sibling pointer is dangling if the server pointed to does store a
sibling but that sibling is a leaf node.
A server SHALL try its best to avoid dangling pointers. No server can
be perfect here because the state of other servers may change without
notice. But a server is supposed to validate its sibling pointers
regularly.
Furthermore, each server SHALL try its best to populate all its nodes
with sibling pointers. The only excuse for not populating a node with
sibling pointers is if no Logiweb server in the world stores a branch
sibling of the given node.
Finally, each server SHALL do its best to ensure that all branch
siblings in the world of each node of the server are reachable from
the node by following sibling pointers. This is even more difficult
to satisfy than the two previous requirements, however, since not
only may other server states change without notice but, furthermore,
no server has any control over any other server. So, servers are
basically required to be reasonable and cooperative.
2.18. The url attribute
The address of a node is a bit vector. A Logiweb reference is also a
bit vector. If the address of a node is a valid Logiweb reference
then the url attributes of the node shall be Uniform Resource
Locators (URLs) [RFC3986] of Logiweb documents with the given
reference.
Url attributes of nodes whose addresses are not valid Logiweb
references are reserved for future extensions.
2.19. The leap attribute
Only root nodes have leap attributes. Each leap attribute indicates
the location of a leap second. Leap attributes are byte vectors, i.e.
bit vectors whose length is a multiple of eight. Leap attributes have
format
leap = step mjd
step = cardinal
mjd = cardinal
Each leap second occurs at the end of a UTC day (i.e. at midnight in
Greenwich). The mjd field indicates which Modified Julian Day (MJD)
is affected by the leap. The step is 1 if that day is prolonged by
one second. The step is 2 if that day is shortened by one second.
Hence, step is 1 for a +1 leap and 2 for a -1 leap. If the
International Earth Rotation Service (IERS) ever decides to make
multiple leaps, the relationship is intended to be as follows:
step 0 1 2 3 4 5 6 ...
leap 0 +1 -1 +2 -2 +3 -3 ...
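The relationship in the table alternates between positive and negative leaps, which the following Python sketch captures. The function names step_to_leap and leap_to_step are assumptions made for illustration.

```python
def step_to_leap(step):
    """Map the step cardinal to a signed leap: 0, +1, -1, +2, -2, ..."""
    if step % 2 == 1:
        return (step + 1) // 2  # odd steps are positive leaps
    return -(step // 2)  # even steps are negative leaps (or zero)


def leap_to_step(leap):
    """Inverse mapping from a signed leap to the step cardinal."""
    if leap > 0:
        return 2 * leap - 1
    return -2 * leap
```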
IERS only intends to use leaps of +1 and -1. Leaps of -1 have never
occurred and maybe never will. IERS intends to let leaps occur at the
end of June 30 and December 31. IERS intends to announce leaps in
advance. Leaps affect the length of the last minute of the last hour
of the affected UTC day.
As for all other attributes, the timestamps of leap attributes
indicate the time at which the attribute entered the state of the
server. At startup, a server is likely to read leap second
information from a configuration file or fetch it from another
Logiweb server. Servers should arrange leaps chronologically with the
oldest leap first.
Leap attributes shall comprise all past leaps announced by the IERS.
Leap attributes should comprise all past and future leaps announced
by the IERS. In other words, newly announced leaps shall enter the
state before the leap occurs.
2.20. Other attribute classes
Only attributes of class 0, 1, 4, 5, and 6 may occur in server
states. Attribute class 2 and 3 never will occur in server states.
Attribute class 7 is reserved for information about which future
classes a server supports. Class 8-15 are reserved for experiments.
Classes from 16 to 2^160-1 inclusive are reserved for first come
first served classes. Classes from 2^160 and up are reserved for
classes based on the value of Logiweb references. Only class 0, 1, 4,
5, and 6 are permitted according to the present document.
2.21. The initial state
When a server starts up, its state contains one node. That node is a
root node and it contains seven attributes: one 'type' attribute and
six 'update' attributes. The value of the 'type' attribute is the
empty bit vector indicating that the root node is a leaf. The values
of the update attributes are 1, 10, 11, 100, 101, and 110. All seven
timestamps are equal and indicate the time at which the root node was
created.
We shall refer to sibling, url, and leap attributes as 'proper'
attributes. After creation of the root node, the state is changed by
adding and removing proper attributes. Update and type attributes
only change as a consequence of adding and removing proper
attributes. At any time, the server must contain the least number of
nodes which are enough to contain the stored proper attributes. For
that reason, removing a proper attribute may cause an avalanche of
node deletions and adding a proper attribute may cause an avalanche
of node creations.
When adding a proper attribute, the timestamp of all consequential
changes must be equal to the timestamp of the new attribute which in
turn must reflect the time at which the attribute was added. When
removing a proper attribute, all consequential changes must have the
same timestamp and that timestamp must reflect the time at which the
attribute was removed. The timestamps of successive additions and
removals of proper attributes must be strictly increasing. If the
resolution of the server clock is insufficient for that, then the
server must fake a higher resolution.
Consequential changes may involve changing the value of update and
type attributes. Such changes shall be treated as a simultaneous
removal of the old attribute and addition of a new one such that the
new attribute appears at the end of its attribute list.
2.22. Got messages
message =/ got
got = id-got address class index
norm count timestamp value
id-got = x05
norm = cardinal
count = cardinal
value = vector
A Logiweb server which receives a get request shall do one of the
following:
o Respond by a got message as described later in this section.
o Respond by a 'Sorry' message.
o Avoid responding if the get is transported by a datagram.
o Disconnect if the get is transported by a connection-based
transport.
Logiweb servers are supposed to respond to get requests. A Logiweb
client should consider the other end of the connection as broken if
it receives a get request.
Logiweb applications SHALL NOT respond to got responses.
If a Logiweb server responds with a 'got' response to a 'get'
request, then the 'got' response shall reflect the state of the
server at the time the 'get' is processed. The address, class, and
index of the 'got' response shall be identical to the address, class,
and index of the associated 'get' request. The norm, count,
timestamp, and value shall be as follows:
CASE 1: the state contains an attribute with the given address,
class, and index. The norm shall be the length of the address. The
count shall be the number of attributes in the state that have the
given address and class. The timestamp and value shall be the time
stamp and value, respectively, of the attribute with the given
address, class, and index.
CASE 2: the state contains an attribute with the given address and
class, but none with the given index. The norm shall be the length of
the address. The count shall be the number of attributes in the state
that have the given address and class. The timestamp and value shall
be the time stamp and value, respectively, of the attribute with the
largest index of the given address and class.
CASE 3: the state contains an attribute with the given address, but
none with the given class. The norm shall be the length of the
address. The count shall be zero. The timestamp shall be the current
server time. The value shall be the empty bit vector.
CASE 4: the state contains no attributes with the given address. In
this case, let A2 be the longest prefix of the given address for
which the state does contain an attribute.
CASE 4A: the state contains a sibling attribute with address A2. The
norm shall be the length of A2. The count shall be the number of
sibling attributes in the state that have address A2. The timestamp
and value shall be the time stamp and value, respectively, of a
randomly picked attribute with address A2 and class sibling.
CASE 4B: the state contains no sibling attributes with address A2.
The norm shall be the length of A2. The count shall be zero. The
timestamp shall be the current server time. The value shall be the
empty bit vector.
CASE 4A covers the case where the given server is unable to answer
the given question (the one encoded in the get request), but is able
to refer to some other server which stores a branch node with address
A2. In other words, CASE 4A covers the case where a server can refer
to a server more knowledgeable on the given question.
CASE 4B covers the case where the given server is unable to answer
the given question and unable to refer to a server which stores a
branch node with address A2. Logiweb servers SHALL try their best to
avoid CASE 4B in cases where there exists a server which has a branch
node with address A2. No server can be perfect here, however, since
all states of all other servers may change without notice. But
servers are required to crawl Logiweb to ensure they have a plentiful
supply of sibling attributes for all their nodes.
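As an informal illustration (not part of the protocol), the case
analysis above may be sketched in Python as follows. The sketch makes
several assumptions that the text does not state: the state is held
as a dictionary from address to class to a non-empty list of
attributes ordered oldest first, indices are 1-based (so that index 0
never exists and falls under CASE 2), the norm is measured as the
Python length of the address, timestamps are plain integers, and the
names Attribute, Got, SIBLING, and answer_get are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Attribute:
    timestamp: int  # stand-in for the draft's timestamp format
    value: bytes    # stand-in for the draft's bit vector


@dataclass
class Got:
    norm: int
    count: int
    timestamp: int
    value: bytes


SIBLING = "sibling"  # hypothetical name for the sibling class


def answer_get(state, address, cls, index, now, rng=random):
    """Compute norm, count, timestamp, and value for one get request."""
    classes = state.get(address)
    if classes:
        attrs = classes.get(cls)
        if attrs:
            # CASE 1: the index exists. CASE 2: otherwise use the largest index.
            known = 1 <= index <= len(attrs)
            chosen = attrs[index - 1] if known else attrs[-1]
            return Got(len(address), len(attrs), chosen.timestamp, chosen.value)
        # CASE 3: the address is known but the class is not.
        return Got(len(address), 0, now, b"")
    # CASE 4: find A2, the longest prefix that has attributes.
    # If no prefix has attributes, A2 is taken to be the empty address.
    a2 = address
    while a2 and not state.get(a2):
        a2 = a2[:-1]
    siblings = state.get(a2, {}).get(SIBLING, [])
    if siblings:
        # CASE 4A: refer the client to a randomly picked sibling.
        chosen = rng.choice(siblings)
        return Got(len(a2), len(siblings), chosen.timestamp, chosen.value)
    # CASE 4B: no answer and no referral.
    return Got(len(a2), 0, now, b"")
```

A client can tell the cases apart from the response alone: a norm
equal to the length of the requested address means CASE 1, 2, or 3,
while a smaller norm means CASE 4A (count above zero) or CASE 4B
(count zero).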
Clients who need e.g. to translate a Logiweb reference R into a URL
are supposed to issue a get message with address R, class URL, and
index 0. When the client receives a got message whose norm equals the
length of R, it uses the returned URL (if any). If the client
receives a got message whose norm is less than the length of R, it
resends the get request to the indicated sibling (if any). At each
redirection, the norm is supposed to increase. If the norm does not
increase, then the state of the penultimate server is outdated. In
this case, the client may as a courtesy send the penultimate server a
'put' message which tells the server to remove its dangling sibling
pointer. Put messages are described later.
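The lookup procedure just described may be sketched as the following
loop. This is only an illustration under stated assumptions: send_get
is a hypothetical transport function returning (norm, count, value)
or None when no answer arrives, the value of a sibling attribute is
treated as an opaque handle for the next server, the class name "url"
is a placeholder for the URL class, and max_hops is an arbitrary
safety limit that the protocol does not define.

```python
def resolve(reference, first_server, send_get, max_hops=32):
    """Follow sibling referrals until some server answers for `reference`.

    Returns the value of the URL attribute, or None if the lookup fails.
    """
    server = first_server
    last_norm = -1
    for _ in range(max_hops):
        got = send_get(server, reference, "url", 0)
        if got is None:
            return None  # no response; the caller may try another server
        norm, count, value = got
        if norm == len(reference):
            # The server knows the address; use the returned URL, if any.
            return value if count > 0 else None
        if count == 0:
            return None  # the server has no sibling to refer to
        if norm <= last_norm:
            # The norm did not increase: the previous server holds a dangling
            # sibling pointer. A courtesy 'put' could be sent to it here.
            return None
        server, last_norm = value, norm
    return None
```

Requiring the norm to increase strictly at each hop also guarantees
that the loop terminates even if two outdated servers refer to each
other.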
When a server or a client crawls Logiweb, it may do so iteratively.
As an example, a client may remember when it last visited a given
server. Next time the client visits the server, it may start by
querying the server time with a ping request. Then the client may
find out what has changed using update attributes without wasting
time on attribute classes and subtrees that have not changed since
the last visit.
Finally, the client may set its time of last visit to the response
from the initial ping.
Whenever such a client reads a changed attribute list, it should read
it in reverse chronological order. To do so, it may start with index
0 to get the newest attribute and the number C of attributes. Then it
may query index C minus one, C minus two, and so on in that order. If
attributes are removed between queries, then the client may receive
the same attribute more than once, but it will never miss an
attribute. For attributes other than update attributes, distinct
attributes have distinct timestamps, so the client can eliminate
duplicates on the basis of timestamps.
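The reading order described above may be sketched as follows. The
function get_attribute is a hypothetical wrapper around one get/got
exchange for a fixed address and class; it is assumed to return
(count, timestamp, value). Indices are assumed to be 1-based, which
is why index 0 yields the newest attribute via CASE 2. Elimination of
duplicates by timestamp is only valid for attributes other than
update attributes.

```python
def read_attribute_list(get_attribute):
    """Read one attribute list newest first, dropping duplicates."""
    count, timestamp, value = get_attribute(0)
    if count == 0:
        return []  # the list is empty; timestamp and value carry no data
    seen = {timestamp}
    result = [(timestamp, value)]
    # Query C-1, C-2, ..., 1 in that order.
    for index in range(count - 1, 0, -1):
        current_count, timestamp, value = get_attribute(index)
        if current_count == 0:
            break  # every attribute was removed while reading
        if timestamp not in seen:
            # An index that no longer exists returns the newest attribute
            # again, so a timestamp already seen marks a duplicate.
            seen.add(timestamp)
            result.append((timestamp, value))
    return result
```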
2.23. Put messages
message =/ put
put = id-put address class operation value
id-put = x06
operation = remove / add
remove = x00
add = x01
A Logiweb server which receives a put request shall do one of the
following:
o Respond by a 'Received' message.
o Respond by a 'Sorry' message.
o Avoid responding if the put is transported by a datagram.
o Disconnect if the put is transported by a connection-based
transport.
Logiweb servers are supposed to respond to put requests. A Logiweb
client should consider the other end of the connection broken if
the client receives a put request.
A server which receives a put message whose operation is 'remove' may
consider removing an attribute with the given address, class, and
value. The remove message contains no index since the index of an
attribute can decrease at any time because of removal of older
attributes on the same attribute list.
A server which receives a put message whose operation is 'add' may
consider adding an attribute with the given address, class, and
value. The add message contains no timestamp since the timestamp of
the new attribute should be set to the current server time rather
than being supplied.
A server should treat almost all put requests with almost infinite
suspicion. A put request could be forged to corrupt the state of a
server or could be forged to fool the server into participating in a
denial-of-service attack on some other Logiweb server or some other
service on the Internet. This is why a server only tells the sender
of a put request that the server has 'received' the request. It does
not reveal any information about what the server is going to do with
the request. It is perfectly legitimate for a server to ignore all
put requests.
3. Security Considerations
3.1. Unwanted outgoing information
A Logiweb server provides information to the outside world through
pong responses, event responses, and got responses.
Pong responses identify the server as a Logiweb server and tell
what time it is. The owner of a Logiweb server must be prepared to
share this information with the world.
Event responses (received, rejected, and sorry responses) tell the
world about the mood of the server. The owner of the server must be
prepared to share that as well.
Got responses tell the world about the publicly available state of
the server. In principle, the owner of the server should be prepared
to share that as well.
A Logiweb server, however, typically indexes given subtrees of the
owner's web site. A Logiweb server typically does so by crawling the
file system of the host. In doing so, the server could find documents
whose existence the owner wants to keep secret, and then make the
existence of those documents publicly known. After that, the secret
documents may be retrieved from the owner's web server.
As a countermeasure for that, Logiweb servers should only index files
with extension 'lgw' ('lgw' for 'logiweb'). Among those files, the
server should check that the first byte of the file contains the
number 1, and that the next twenty bytes contain the RIPEMD-160 hash
key of the remaining bytes of the file. That ensures with great
likelihood that only genuine Logiweb documents are indexed, avoiding
inadvertent indexing of other kinds of documents. Authors of Logiweb
documents who want their Logiweb documents to remain secret should
keep them out of reach of the local Logiweb server.
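The check described above may be sketched as follows. The function
names are hypothetical. Whether hashlib offers "ripemd160" depends on
the OpenSSL build behind the Python installation, so the hash
function is passed as a parameter and only called when a file is
actually checked.

```python
import hashlib
from pathlib import Path


def ripemd160(data: bytes) -> bytes:
    # Raises ValueError if the local OpenSSL build lacks RIPEMD-160.
    return hashlib.new("ripemd160", data).digest()


def is_genuine_lgw(data: bytes, digest=ripemd160) -> bool:
    """Check the version byte and the 20-byte hash key of a candidate file."""
    if len(data) < 21 or data[0] != 1:
        return False
    # Bytes 1..20 must be the hash key of all the bytes that follow.
    return data[1:21] == digest(data[21:])


def indexable_files(root, digest=ripemd160):
    """List files below `root` with extension 'lgw' that pass the check."""
    return [
        path
        for path in sorted(Path(root).rglob("*.lgw"))
        if path.is_file() and is_genuine_lgw(path.read_bytes(), digest)
    ]
```

A file that has the right extension but fails the check is simply
skipped, so its existence is never revealed through got responses.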
As another use of got messages, an attacker may use got responses to
figure out how the server reacts to put requests. Doing so, the
attacker may be able to find a security hole which allows the
attacker to fool the server into participating in a denial-of-service
attack on some other service. The ultimate countermeasure to this is
to let the server ignore all put messages. Otherwise, one must try to
avoid security holes in the server.
3.2. State corruption
Using put messages, an attacker may try to persuade a server to place
incorrect information in the server state. The ultimate
countermeasure to this is to let the server ignore all put messages.
Otherwise, a server should not react directly to put messages.
Rather, the server should repeatedly crawl its host file system to
keep its url attributes up to date and should repeatedly crawl
Logiweb to keep its sibling attributes up to date. In doing so, a
server could take a put message as a hint to crawl some particular
area earlier than it would otherwise do.
One source of put messages is notifications from inside the owner's
firewall that some Logiweb document has been added to or removed from
the file system. To respond reasonably, servers are advised to
classify sender IP addresses suitably in order to follow up more
promptly on put requests from more trusted senders. This only works,
of course, for sender IP addresses which an attacker cannot tamper
with.
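One possible way to treat put messages as crawl hints, ranked by the
trust placed in the sender, is sketched below. The class CrawlQueue,
its two trust levels, and the network list are hypothetical. The
sketch never changes the server state directly and gives every sender
the same 'received' answer.

```python
import heapq
import ipaddress


class CrawlQueue:
    """Queue of areas to crawl; hints from trusted senders come first."""

    def __init__(self, trusted_networks):
        self._trusted = [ipaddress.ip_network(n) for n in trusted_networks]
        self._heap = []
        self._seq = 0  # tie-breaker that keeps arrival order within a level

    def handle_put(self, sender_ip, address):
        ip = ipaddress.ip_address(sender_ip)
        trusted = any(ip in net for net in self._trusted)
        priority = 0 if trusted else 1
        heapq.heappush(self._heap, (priority, self._seq, address))
        self._seq += 1
        # The answer is the same for all senders and reveals nothing.
        return "received"

    def next_area(self):
        """Return the next address to crawl, or None if nothing is queued."""
        return heapq.heappop(self._heap)[2] if self._heap else None
```

A server could, for example, list only networks inside the owner's
firewall as trusted, since sender addresses of datagrams arriving
from outside can be forged.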
Even if a server is persuaded to place incorrect information in its
state, this will at most prevent clients from finding Logiweb
documents. If a server translates a reference into a URL, then the
client is supposed to retrieve the associated Logiweb document and to
verify using RIPEMD-160 [RIPEMD] that the retrieved document is the
one requested.
3.3. Incoming denial-of-service attacks
If a large number of clients start sending requests to a single
Logiweb server, the incoming bandwidth of the server may get
saturated. To avoid saturating the outgoing bandwidth if this occurs,
the 'sorry' message has been included in the protocol. The 'sorry'
message allows the server to respond to incoming messages using
little bandwidth and little computational resources. Furthermore, the
protocol allows the server not to respond at all, which accounts for
messages lost due to limitation of incoming bandwidth.
Logiweb clients should maintain a list of Logiweb servers, and if one
server does not respond or responds with a 'sorry', then the client
should switch to another Logiweb server.
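The switching behaviour may be sketched as follows. The function
send_request is a hypothetical transport call which is assumed to
return None when no response arrives in time and the string "sorry"
for a 'sorry' message.

```python
def query_with_failover(servers, send_request):
    """Try each server in turn until one gives a useful response.

    Returns (server, response), or None if every server failed.
    """
    for server in servers:
        response = send_request(server)
        if response is None or response == "sorry":
            continue  # no answer or overloaded server: try the next one
        return server, response
    return None
```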
3.4. Outgoing denial-of-service attacks
An attacker may launch an indirect denial-of-service attack by
sending requests to a Logiweb server whose sender field contains the
IP of the victim. To counter that, the Logiweb protocol specifies
that each request can result in at most one response. In that way, an
attacker cannot use a Logiweb server to 'amplify' the attack.
Logiweb servers are supposed to crawl Logiweb on their own
initiative. Furthermore, put messages may suggest to Logiweb servers
that they should promote crawling of particular servers. An attacker
could use this to persuade a number of Logiweb servers to crawl one
victim simultaneously. To counter that, the present document does
not specify exactly what a Logiweb server is supposed to do with put
messages. Furthermore, Logiweb servers should approach other servers
gently, waiting for their responses to see that the contacted servers
do respond and do not send out 'sorry' messages. Finally, Logiweb
servers should check that they actually do talk with Logiweb servers
and not with some innocent other service. Logiweb servers may do so
by sending a ping request to services whose identity they are not
sure of.
4. IANA Considerations
4.1. Well Known Port 332
The format of sibling attributes allows Logiweb servers to run on
arbitrary UDP and TCP ports. At present, Logiweb servers use UDP port
65535 by default.
To avoid making the use of port 65535 permanent, UDP and TCP Well
Known Port 332 is requested to be registered.
Port number 332 is suggested because 332 = 256 + 76 where 76 is the
Unicode code point of Latin capital letter L, which is the first
letter in "Logiweb". On some occasions not covered in the present
document, the
Logiweb system represents strings by numbers, in which case the one
character string "L" happens to be represented by the number 332.
Furthermore, port 332 is unassigned and appears at the end of an
interval of unassigned numbers so that assignment will not lead to
fragmentation.
Suggested port name: "Logiweb".
4.2. MIME type application/prs.logiweb
As mentioned, the main purpose of Logiweb servers is to translate
Logiweb references into a URL of an associated Logiweb document.
When the Logiweb document is retrieved from that URL, HTTP servers
currently deliver it with MIME type application/x-logiweb.
To avoid making the use of MIME type application/x-logiweb permanent,
MIME type application/prs.logiweb is requested to be registered.
The format of Logiweb documents is:
document = id-version ripemd timestamp contents
id-version = %d001
ripemd = 20*20 byte
contents = *byte
byte = %d0-255
For the syntax of timestamps, see the section entitled "Timestamps".
The ripemd field of a document must be the RIPEMD-160 hash key
[RIPEMD] of all bytes following the ripemd field (including the
timestamp).
The reference of a Logiweb document comprises the document with the
contents removed.
The description above of the contents as a sequence of bytes is
sufficient as far as the Logiweb protocol is concerned. A more
complete description may be found at
http://logiweb.eu/logiweb/doc/server/protocol.html#Pages.
5. References
5.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC4234] Crocker, D. and P. Overell, "Augmented BNF for Syntax
Specifications: ABNF", RFC 4234, October 2005.
5.2. Informative References
[Logiweb] http://logiweb.eu/ (see also Grue, K., "Logiweb - A System
for Web Publication of Mathematics", Mathematical Software
- ICMS 2006, Lecture Notes in Computer Science,
pp.343--353, vol.4151, Springer, 2006).
[CGI] http://www.w3.org/CGI/
[RIPEMD] Dobbertin, H., Bosselaers, A., and B. Preneel,
"RIPEMD-160: A Strengthened Version of RIPEMD", Fast
Software Encryption, pp.71--82, 1996.
[Unicode] http://www.unicode.org/
[RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768,
August 1980.
[RFC0793] Postel, J., "Transmission Control Protocol", STD 7, RFC
793, September 1981.
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
Resource Identifier (URI): Generic Syntax", STD 66, RFC
3986, January 2005.
Authors' Address
Klaus Grue
DIKU
University of Copenhagen
Universitetsparken 1
DK-2100 Copenhagen
Denmark
email - grue@diku.dk
Full Copyright Statement
Copyright (C) The IETF Trust (2007).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST, AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed
to pertain to the implementation or use of the technology
described in this document or the extent to which any license
under such rights might or might not be available; nor does it
represent that it has made any independent effort to identify any
such rights. Information on the procedures with respect to
rights in RFC documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use
of such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository
at http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention
any copyrights, patents or patent applications, or other
proprietary rights that may cover technology that may be required
to implement this standard. Please address the information to the
IETF at ietf-ipr@ietf.org.
Logiweb JULY 2009 LOGIWEB(7)