Re: Ideas for improvements in packet parsing

Hi all,

I’ve been thinking about how to implement this DataParser or PacketParser class. I looked at the implementation of QDataStream, and it is actually almost what we need. It has a status property which can report ReadPastEnd, but it does not allow reading backwards from the end of the stream. It also cannot create a sub-stream with its own boundary, so that an attempt to read past the end of the sub-stream fails instead of consuming the data that follows it. There might be a way to call functions on the device() to make it seek to the end in order to read bytes from the end of the array (as needed for QueryHits), but I don’t like that idea.

Another issue is that QDataStream already defines an operator for QByteArray, which we cannot simply override. Furthermore, we have two kinds of QByteArray data: one is a zero-terminated C string, the other is just a byte array. And I prefer the variant with named read functions to the one with operators.
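To make the sub-stream requirement concrete, here is a minimal sketch of a bounded sub-reader. It is plain C++ with std::string standing in for QByteArray so that it compiles without Qt; DataReader, subReader() and bytesLeft() are hypothetical names, not QDataStream API. The property that matters: over-reading the sub-reader only flags the sub-reader, and the bytes that follow it in the parent stay untouched.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical, Qt-free sketch: std::string stands in for QByteArray.
class DataReader
{
public:
    enum Status { Ok, ReadPastEnd };

    explicit DataReader(const std::string& data)
        : data_(data), end_(data.size()), pos_(0), status_(Ok) {}

    // Returns the next byte, or 0 (setting ReadPastEnd) if this reader is exhausted.
    uint8_t readByte()
    {
        if (status_ != Ok || pos_ >= end_) {
            status_ = ReadPastEnd;
            return 0;
        }
        return static_cast<uint8_t>(data_[pos_++]);
    }

    // Returns a reader limited to the next `count` bytes and skips them in this reader.
    // Over-reading the sub-reader only affects the sub-reader's own status.
    DataReader subReader(std::size_t count)
    {
        DataReader sub(data_);
        if (status_ != Ok || count > end_ - pos_) {
            status_ = ReadPastEnd;          // the parent itself is too short
            sub.status_ = ReadPastEnd;
            sub.pos_ = sub.end_ = 0;
            return sub;
        }
        sub.pos_ = pos_;
        sub.end_ = pos_ + count;
        pos_ += count;
        return sub;
    }

    Status status() const { return status_; }
    std::size_t bytesLeft() const { return end_ - pos_; }

private:
    std::string data_;   // copied here; a QByteArray would be implicitly shared instead
    std::size_t end_;    // one past the last byte this reader may touch
    std::size_t pos_;    // next byte to read
    Status status_;
};
```

For example, a GGEP block of known length could be handed to `subReader(length)`; a malformed extension then fails inside its own reader while the parent continues with the next field.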

Since I cannot think of a way to reuse QDataStream through inheritance or composition, I plan to fork the QDataStream code and modify it to suit our needs for DataParser (or should it be DataReader?). If anybody has a better idea, please share it!

Regards,

Peter

Peter Dimov wrote:

Hello everybody!

This is a continuation of the discussion we started with Atul regarding improvements in packet parsing. Before starting the actual coding, I’d like to present my ideas for improvement and hopefully get some useful feedback!

Attached is a file of about 100 lines which shows what I think is an elegant way to parse a QueryHits packet and a GgepBlock packet extension. With the presented solution we wouldn’t need two functions (prepareReadPayload() and readPayload()) to parse a packet. Instead we only use parsePayload(). It calls functions of a PacketParser object (suggestions for a better name?), which works on a QByteArray object holding the raw packet bytes. The packet structure is no longer validated explicitly. Instead, parsePayload() assumes the packet is well-formed and simply reads the parsed data from the PacketParser object. If the PacketParser runs out of raw bytes, it returns zero values and sets an internal error state. The caller of parsePayload() examines this error state after parsePayload() returns, and in case of an error the packet is marked as invalid.
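A minimal sketch of that flow, assuming a sticky error flag: once a read fails, every later read returns 0, and the caller checks the flag exactly once. The packet layout (a 1-byte count followed by a 2-byte little-endian port), ExamplePacket, parseExample() and hasError() are hypothetical illustrations, not the attached file; std::string again stands in for QByteArray.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical minimal parser with a sticky error flag.
class PacketParser
{
public:
    explicit PacketParser(const std::string& raw) : raw_(raw) {}

    // Returns 0 and sets the error flag instead of reading past the end.
    uint8_t readByte()
    {
        if (error_ || pos_ >= raw_.size()) {
            error_ = true;
            return 0;
        }
        return static_cast<uint8_t>(raw_[pos_++]);
    }

    // Assumed little-endian, the usual order for Gnutella header fields.
    uint16_t readUInt16()
    {
        uint16_t lo = readByte();
        uint16_t hi = readByte();
        return error_ ? 0 : static_cast<uint16_t>(lo | (hi << 8));
    }

    bool hasError() const { return error_; }

private:
    std::string raw_;
    std::size_t pos_ = 0;
    bool error_ = false;
};

// Hypothetical packet: 1-byte count followed by a 2-byte port.
struct ExamplePacket
{
    uint8_t  count = 0;
    uint16_t port  = 0;

    // No up-front length validation: just read, the parser tracks errors.
    void parsePayload(PacketParser& parser)
    {
        count = parser.readByte();
        port  = parser.readUInt16();
    }
};

// The caller's side: check the error state once, after parsePayload() returns.
bool parseExample(const std::string& raw, ExamplePacket& packet)
{
    PacketParser parser(raw);
    packet.parsePayload(parser);
    return !parser.hasError();   // false => mark the packet as invalid
}
```

A truncated packet simply ends up with zeroed trailing fields and `parseExample()` returning false, so parsePayload() itself needs no length checks.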

This solution will probably halve the size of the packet parsing and writing source code while at the same time increasing readability! The trick is in the PacketParser class, which is quite similar to QDataStream but is designed specifically for data parsing. These would be some of its member functions:

```cpp
class PacketParser
{
public:
    enum ReadingDirection { ReadFromStart, ReadFromEnd };

    explicit PacketParser (const QByteArray& rawBytes);

    uchar        readByte();
    QByteArray   readBytes (size_t count);
    QByteArray   readCString();
    quint16      readUInt16(); // optionally with a ByteOrder argument
    quint32      readUInt32(); // optionally with a ByteOrder argument

    PacketParser subParser (size_t count); // parse part of the underlying raw data independently (see GgepBlock)
    void         setReadingDirection (ReadingDirection); // ReadFromStart or ReadFromEnd (see QueryHits)
};
```
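One way setReadingDirection() could work, sketched under an assumption the proposal leaves open: reading from the end takes the last N bytes as one field, returned in their original order, and moves the end boundary inward. That fits the QueryHits case, whose last 16 bytes are the servent identifier. Everything here (member names, the underflow behaviour, std::string in place of QByteArray) is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical sketch of setReadingDirection(); std::string stands in for QByteArray.
class PacketParser
{
public:
    enum ReadingDirection { ReadFromStart, ReadFromEnd };

    explicit PacketParser(const std::string& raw)
        : raw_(raw), front_(0), back_(raw.size()) {}

    void setReadingDirection(ReadingDirection direction) { direction_ = direction; }

    // Takes `count` bytes from the current reading end, in their original order.
    // On underflow returns an empty string and sets the error flag.
    std::string readBytes(std::size_t count)
    {
        if (error_ || count > back_ - front_) {
            error_ = true;
            return std::string();
        }
        std::size_t start;
        if (direction_ == ReadFromStart) {
            start = front_;
            front_ += count;
        } else {
            back_ -= count;
            start = back_;
        }
        return raw_.substr(start, count);
    }

    // Little-endian 32-bit value; the 4-byte field comes from the current reading end.
    uint32_t readUInt32()
    {
        const std::string b = readBytes(4);
        if (b.size() != 4)
            return 0;
        uint32_t value = 0;
        for (int i = 3; i >= 0; --i)
            value = (value << 8) | static_cast<uint8_t>(b[i]);
        return value;
    }

    std::size_t bytesLeft() const { return back_ - front_; }
    bool hasError() const { return error_; }

private:
    std::string raw_;
    std::size_t front_;   // next unread byte when reading from the start
    std::size_t back_;    // one past the last unread byte when reading from the end
    ReadingDirection direction_ = ReadFromStart;
    bool error_ = false;
};
```

With two independent boundaries, the trailer can be consumed first and the remaining reads from the start can never run into it.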

Some generic functions could be declared in a generic base class, e.g. DataParser, and for Gnutella we could derive PacketParser, which adds Gnutella-specific types such as VendorCode or overrides the default byte order. This means the same base class could be reused for parsing BitTorrent packets or bencoded .torrent files.
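A possible shape for that split, as a sketch only: the base class is protocol-independent and defaults to big-endian (the same default QDataStream uses, and the order of BitTorrent's peer wire integers), while the Gnutella subclass switches to little-endian and adds a four-character vendor code reader. Plain std::string stands in for both QByteArray and the proposed VendorCode type; all names besides DataParser and PacketParser are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical generic base: protocol-independent reads with a configurable byte order.
class DataParser
{
public:
    enum ByteOrder { BigEndian, LittleEndian };

    explicit DataParser(const std::string& raw, ByteOrder order = BigEndian)
        : raw_(raw), order_(order) {}
    virtual ~DataParser() = default;

    // Returns `count` bytes, or an empty string (setting the error flag) on underflow.
    std::string readBytes(std::size_t count)
    {
        if (error_ || count > raw_.size() - pos_) {
            error_ = true;
            return std::string();
        }
        std::string out = raw_.substr(pos_, count);
        pos_ += count;
        return out;
    }

    uint16_t readUInt16() { return static_cast<uint16_t>(readUnsigned(2)); }
    uint32_t readUInt32() { return readUnsigned(4); }

    bool hasError() const { return error_; }

private:
    // Assembles `size` bytes into an integer using the parser's byte order.
    uint32_t readUnsigned(std::size_t size)
    {
        const std::string b = readBytes(size);
        uint32_t v = 0;
        for (std::size_t i = 0; i < b.size(); ++i) {
            const std::size_t idx = (order_ == BigEndian) ? i : b.size() - 1 - i;
            v = (v << 8) | static_cast<uint8_t>(b[idx]);
        }
        return v;   // 0 if readBytes() failed (b is empty)
    }

    std::string raw_;
    std::size_t pos_ = 0;
    ByteOrder order_;
    bool error_ = false;
};

// Gnutella-specific layer: little-endian by default, plus a vendor code reader.
class PacketParser : public DataParser
{
public:
    explicit PacketParser(const std::string& raw) : DataParser(raw, LittleEndian) {}

    // Vendor codes are four ASCII characters, e.g. "LIME".
    std::string readVendorCode() { return readBytes(4); }
};
```

A BitTorrent-specific parser could derive from the same DataParser and keep the big-endian default.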

Any ideas for improvements are welcome! Suggestions for more meaningful class and function names?

Regards,

Peter
