Discussion:
[p2p-hackers] q: open topics in DHTs and the like
ianG
2015-01-02 00:51:23 UTC
Permalink
Hi all and happy new year!

Request for comments: what are the open topics in DHTs?

I'm asking on behalf of a 4th year CS student preparing to do the final
year project; with possible extension into Masters. Having worked with
DHTs, the bug appears to have struck...

iang
Greg Troxel
2015-01-02 01:06:33 UTC
Permalink
I wonder what the standard adversary model is for DHTs, and what happens
when some significant fraction of participants are malicious.
Jae Kwon
2015-01-02 01:28:12 UTC
Permalink
I think the standard for most decentralized systems including DHT's and
consensus networks is in the "honest" vs "Byzantine" model, with no regards
to incentives. I think the incentive model is a recent thing since
Bitcoin, and it isn't well accepted.

In the incentive model, you might consider three groups of actors.
"Honest", "rational", and "byzantine". The rational my diverge from the
prescribed protocol for self gain, but the honest deviate.
Post by Greg Troxel
I wonder what the standard adversary model is for DHTs, and what happens
when some significant fraction of participants are malicious.
_______________________________________________
p2p-hackers mailing list
http://lists.zooko.com/mailman/listinfo/p2p-hackers
Christian Huitema
2015-01-02 03:23:48 UTC
Permalink
Post by ianG
Request for comments: what are the open topics in DHTs?
Top of my head, can think of two topics: anonymous participation, and hidden rendezvous.

Anonymous participation is the idea that you can participate in a distributed system without revealing that to third parties, or at least to third parties that you do not trust. Think of the structure of underground networks, where members only communicate with people they really trust, because every contact that you don't know about could work for the Gestapo, or the KGB, or the RIAA. This implies serious restrictions on the topology of the graph, which means different message routing protocols than classic DHT like Kademlia. It also require secure protocols to add/remove contacts, or to connect/reconnect to the graph. Freenet's "dark net" does lots of that, and the question is, what would it take to make it mainstream?

Hidden rendezvous is the idea of enabling something like Skype on a DHT, but hiding who communicates with whom. Suppose that I connect to the network anonymously, e.g. using a public Wi-Fi access point, and that you do the same. How do we find each other without revealing our location to the NSA and its peers? One potential solution is that we agree on a secret and derive from it a series of random numbers, say hash(secret, time-of-day, my-name). Then use that as a key to publish an IP address in the DHT. A bit clumsy, of course. Can you do better?

-- Christian Huitema
Michael Rogers
2015-01-02 12:02:01 UTC
Permalink
Sybil attacks, anonymous lookups and restricted topologies (including
social topologies) are all interesting areas with open problems.

Or go back to square one and ask what other pointer-based data
structures would lend themselves to distributed implementation...

Cheers,
Michael
Post by ianG
Hi all and happy new year!
Request for comments: what are the open topics in DHTs?
I'm asking on behalf of a 4th year CS student preparing to do the
final year project; with possible extension into Masters. Having
worked with DHTs, the bug appears to have struck...
iang _______________________________________________ p2p-hackers
http://lists.zooko.com/mailman/listinfo/p2p-hackers
Will Holcomb
2015-01-03 06:04:10 UTC
Permalink
I've been really intrigued by IPFS <http://ipfs.io>. It is essentially a
singly-rooted git tree distributed via bittorrent.

Something that this makes possible, is everyone publishing trees and
software intelligently combining those.

My version of this export directory exists as a git repo called tip
<https://github.com/wholcomb/tip/>.

My idea is a piece of data is available through many paths.

For example, I want .../book/by/Frederik Pohl/Gateway/
<http://dhappy.org/.../book/by/Frederik Pohl/Gateway/> and
.../book/award/Hugo/1978/winner/ <http://wholcomb.github.io/mimis/> to
point to the same location. Navigation could be, in part, winnowing of
possible completion paths.

It could be coupled with a signing system that allows individuals or
organizations to authoritatively publish nodes. So, different groups could
curate different subtrees.

All of these are then conglomerated so the ideal content is retrieved when
a user requests something.

-Will
Feross Aboukhadijeh
2015-01-03 19:32:21 UTC
Permalink
An important issue for the WebTorrent <https://github.com/feross/webtorrent>
project (building a browser torrent client) is getting the DHT to work
entirely in the web browser, over WebRTC. WebRTC has several constraints
that make implementing a DHT more difficult:

1. No succinct way to refer to a node like 'ip:port'.
2. For A to connect to B, they must be introduced by C who is connected to
both A and B. (In most WebRTC apps, C is a centralized signaling server
that all clients connect to, but could easily be another node in the DHT.)
3. Browser crashes caused by opening too many connections. (Current WebRTC
implementations have memory leaks and artificial limits on the number of
allowed connections)
Post by Will Holcomb
I've been really intrigued by IPFS <http://ipfs.io>. It is essentially a
singly-rooted git tree distributed via bittorrent.
Something that this makes possible, is everyone publishing trees and
software intelligently combining those.
My version of this export directory exists as a git repo called tip
<https://github.com/wholcomb/tip/>.
My idea is a piece of data is available through many paths.
For example, I want .../book/by/Frederik Pohl/Gateway/
<http://dhappy.org/.../book/by/Frederik%20Pohl/Gateway/> and
.../book/award/Hugo/1978/winner/ <http://wholcomb.github.io/mimis/> to
point to the same location. Navigation could be, in part, winnowing of
possible completion paths.
It could be coupled with a signing system that allows individuals or
organizations to authoritatively publish nodes. So, different groups could
curate different subtrees.
All of these are then conglomerated so the ideal content is retrieved when
a user requests something.
-Will
_______________________________________________
p2p-hackers mailing list
http://lists.zooko.com/mailman/listinfo/p2p-hackers
realcr
2015-01-04 12:49:56 UTC
Permalink
Hi ianG,
These are some subjects I thought about:

- Running a DHT despite the NAT problems. (While many present non-elegent
technical ideas to route around NATs, there might be a nice theoretical
solution).
- Navigation using Virtual DHTs (Like done in Cjdns. I think nobody
really knows to prove why it works, and whether it is going to scale).

- Creating a rigorous adversarial model for DHTs. (I think that we still
don't have one. Most articles on this subject resort to experimentation
because they can't prove correctness).

- Building a secure DHT.
- Dealing with Sybil Attack and Eclipse attack (Many corrupt nodes
choosing DHT identities close to some value).


I tried to solve some of these myself, mostly from the direction of mesh
routing. My efforts are documented online, here:
http://www.freedomlayer.org/articles_index.html

Regards,
real.
Post by ianG
Hi all and happy new year!
Request for comments: what are the open topics in DHTs?
I'm asking on behalf of a 4th year CS student preparing to do the final
year project; with possible extension into Masters. Having worked with
DHTs, the bug appears to have struck...
iang
_______________________________________________
p2p-hackers mailing list
http://lists.zooko.com/mailman/listinfo/p2p-hackers
ianG
2015-01-04 14:55:38 UTC
Permalink
Excellent, thanks! In summary (obviously there is substantial overlap):



1. Anonymous participation is the idea that you can participate in a
distributed system without revealing that to third parties, or at least
to third parties that you do not trust. Think of the structure of
underground networks, where members only communicate with people they
really trust, because every contact that you don't know about could work
for the Gestapo, or the KGB, or the RIAA. This implies serious
restrictions on the topology of the graph, which means different message
routing protocols than classic DHT like Kademlia. It also require secure
protocols to add/remove contacts, or to connect/reconnect to the graph.
Freenet's "dark net" does lots of that, and the question is, what would
it take to make it mainstream?

-- Christian Huitema, also mentioned by Michael Rogers




2. Hidden rendezvous is the idea of enabling something like Skype on a
DHT, but hiding who communicates with whom. Suppose that I connect to
the network anonymously, e.g. using a public Wi-Fi access point, and
that you do the same. How do we find each other without revealing our
location to the NSA and its peers? One potential solution is that we
agree on a secret and derive from it a series of random numbers, say
hash(secret, time-of-day, my-name). Then use that as a key to publish an
IP address in the DHT. A bit clumsy, of course. Can you do better?

-- Christian Huitema





3. Adversary Model and Incentives.

I think the standard for most decentralized systems including DHT's and
consensus networks is in the "honest" vs "Byzantine" model, with no
regards to incentives. I think the incentive model is a recent thing
since Bitcoin, and it isn't well accepted.

In the incentive model, you might consider three groups of actors.
"Honest", "rational", and "byzantine". The rational my diverge from the
prescribed protocol for self gain, but the honest deviate.

-- Jae Kwon

I wonder what the standard adversary model is for DHTs, and what happens
when some significant fraction of participants are malicious.

-- Greg Troxel

Dealing with Sybil Attack and Eclipse attack (Many corrupt nodes
choosing DHT identities close to some value).

-- real, Micheal Rogers

Creating a rigorous adversarial model for DHTs. (I think that we still
don't have one. Most articles on this subject resort to experimentation
because they can't prove correctness).

-- real



4. Practical limits.

Running a DHT despite the NAT problems. (While many present non-elegant
technical ideas to route around NATs, there might be a nice theoretical
solution).

-- real.

Navigation using Virtual DHTs (Like done in Cjdns. I think nobody really
knows to prove why it works, and whether it is going to scale).

-- real.



5. Applications.

Go back to square one and ask what other pointer-based data structures
would lend themselves to distributed implementation...

-- Micheal Rogers

For example, IPFS, essentially a singly-rooted git tree distributed via
bittorrent. Something that this makes possible, is everyone publishing
trees and software intelligently combining those. ... My idea is a piece
of data is available through many paths. For example, I want
.../book/by/Frederik Pohl/Gateway/ and .../book/award/Hugo/1978/winner/
to point to the same location. Navigation could be, in part, winnowing
of possible completion paths. It could be coupled with a signing system
that allows individuals or organizations to authoritatively publish
nodes. So, different groups could curate different subtrees. All of
these are then conglomerated so the ideal content is retrieved when a
user requests something.

-- Will Holcomb
Post by ianG
Request for comments: what are the open topics in DHTs?
I'm asking on behalf of a 4th year CS student preparing to do the final
year project; with possible extension into Masters. Having worked with
DHTs, the bug appears to have struck...
Aymeric Vitte
2015-02-25 13:27:19 UTC
Permalink
Replying late to this one, since it seems not to appear in the recap, I
would suggest:

- more private DHT, like [1], this is simple and maybe can be
extended/improved but still applies even if the monitors can not choose
their nodeID

- mix of hidden rendez-vous and DHT, like [2], the DHT is used only if
the content can not be found, the peers can not know if they are
replying directly to an anonymous request or through a hidden rendez-vous

- WebRTC DHT, basically the principles are that peers can introduce each
others, as sketched in [3], still some nodes are required to bootstrap
the process

[1] https://github.com/Ayms/torrent-live#findspies
[2] https://github.com/Ayms/node-Tor#content-discovery
[3] https://github.com/Ayms/node-Tor#bootstrap-and-peers-discovery
Post by ianG
1. Anonymous participation is the idea that you can participate in a
distributed system without revealing that to third parties, or at
least to third parties that you do not trust. Think of the structure
of underground networks, where members only communicate with people
they really trust, because every contact that you don't know about
could work for the Gestapo, or the KGB, or the RIAA. This implies
serious restrictions on the topology of the graph, which means
different message routing protocols than classic DHT like Kademlia. It
also require secure protocols to add/remove contacts, or to
connect/reconnect to the graph. Freenet's "dark net" does lots of
that, and the question is, what would it take to make it mainstream?
-- Christian Huitema, also mentioned by Michael Rogers
2. Hidden rendezvous is the idea of enabling something like Skype on
a DHT, but hiding who communicates with whom. Suppose that I connect
to the network anonymously, e.g. using a public Wi-Fi access point,
and that you do the same. How do we find each other without revealing
our location to the NSA and its peers? One potential solution is that
we agree on a secret and derive from it a series of random numbers,
say hash(secret, time-of-day, my-name). Then use that as a key to
publish an IP address in the DHT. A bit clumsy, of course. Can you do
better?
-- Christian Huitema
3. Adversary Model and Incentives.
I think the standard for most decentralized systems including DHT's
and consensus networks is in the "honest" vs "Byzantine" model, with
no regards to incentives. I think the incentive model is a recent
thing since Bitcoin, and it isn't well accepted.
In the incentive model, you might consider three groups of actors.
"Honest", "rational", and "byzantine". The rational my diverge from
the prescribed protocol for self gain, but the honest deviate.
-- Jae Kwon
I wonder what the standard adversary model is for DHTs, and what
happens when some significant fraction of participants are malicious.
-- Greg Troxel
Dealing with Sybil Attack and Eclipse attack (Many corrupt nodes
choosing DHT identities close to some value).
-- real, Micheal Rogers
Creating a rigorous adversarial model for DHTs. (I think that we still
don't have one. Most articles on this subject resort to
experimentation because they can't prove correctness).
-- real
4. Practical limits.
Running a DHT despite the NAT problems. (While many present
non-elegant technical ideas to route around NATs, there might be a
nice theoretical solution).
-- real.
Navigation using Virtual DHTs (Like done in Cjdns. I think nobody
really knows to prove why it works, and whether it is going to scale).
-- real.
5. Applications.
Go back to square one and ask what other pointer-based data structures
would lend themselves to distributed implementation...
-- Micheal Rogers
For example, IPFS, essentially a singly-rooted git tree distributed
via bittorrent. Something that this makes possible, is everyone
publishing trees and software intelligently combining those. ... My
idea is a piece of data is available through many paths. For example,
I want .../book/by/Frederik Pohl/Gateway/ and
.../book/award/Hugo/1978/winner/ to point to the same location.
Navigation could be, in part, winnowing of possible completion paths.
It could be coupled with a signing system that allows individuals or
organizations to authoritatively publish nodes. So, different groups
could curate different subtrees. All of these are then conglomerated
so the ideal content is retrieved when a user requests something.
-- Will Holcomb
Post by ianG
Request for comments: what are the open topics in DHTs?
I'm asking on behalf of a 4th year CS student preparing to do the final
year project; with possible extension into Masters. Having worked with
DHTs, the bug appears to have struck...
_______________________________________________
p2p-hackers mailing list
http://lists.zooko.com/mailman/listinfo/p2p-hackers
--
Check the 10 M passwords list: http://peersm.com/findmyass
Anti-spies and private torrents, dynamic blocklist: http://torrent-live.org
Peersm : http://www.peersm.com
torrent-live: https://github.com/Ayms/torrent-live
node-Tor : https://www.github.com/Ayms/node-Tor
GitHub : https://www.github.com/Ayms
Serguei Osokine
2015-01-04 22:49:27 UTC
Permalink
Post by ianG
what are the open topics in DHTs?
I would also add the complicated search queries; not sure, maybe there
is already a good soution somehwere, but generally, when different DHT
nodes are responsible for parts of the key space, finding everything
that has both "sex" and "city" might result in the unmanageable traffic
volume. Both key words might yield lots of pointers to nodes with that
content, and finding the intersection between these pointer sets might
be really complicated, even if the result will be just a single node
with "sex and the city" content.

(Note that I already simplified this task by applying some optimization
and throwing out common "the" and "and" seach terms - with these terms
left in the query, the task would be even more complicated.)

One common solution would be looking for exact match for the whole query
"sex and the city", of course - but I'm talking about the real multiterm
search here, where "sex city", "sex and city", "and the city", etc would
be done with roughly the same system load. And to discourage the idea of
simply hashing every conceivable combination of "sex", "and", "the", and
"city" ((2^n - 1) nodes with this same content pointer if the canonical
search terms order is used, right?) - anyway, to avoid that, even "city
candace", should give the same result, adding the content metadata to
the searched volume.

Bonus is given for page ranking implementation on DHT, and superbonus -
for finding misspelled queries (fuzzy logic and distance in word space metrics to be used?) and for real-time search suggestions. Basically,
the idea is to put Google search out of business. Then finally we won't
see any DMCA-removed results when looking stuff up. :)

Best wishes -
S.Osokine.
4 Jan 2015.


-----Original Message-----
From: p2p-hackers [mailto:p2p-hackers-***@zim.maski.org]On Behalf Of
ianG
Sent: Thursday, January 01, 2015 4:51 PM
To: p2p-***@zim.maski.org
Subject: [p2p-hackers] q: open topics in DHTs and the like


Hi all and happy new year!

Request for comments: what are the open topics in DHTs?

I'm asking on behalf of a 4th year CS student preparing to do the final
year project; with possible extension into Masters. Having worked with
DHTs, the bug appears to have struck...

iang
_______________________________________________
p2p-hackers mailing list
p2p-***@lists.zooko.com
http://lists.zooko.com/mailman/listinfo/p2p-hackers
Tony Arcieri
2015-01-04 22:46:53 UTC
Permalink
Post by Serguei Osokine
I would also add the complicated search queries; not sure, maybe there
is already a good soution somehwere, but generally, when different DHT
nodes are responsible for parts of the key space, finding everything
that has both "sex" and "city" might result in the unmanageable traffic
volume. Both key words might yield lots of pointers to nodes with that
content, and finding the intersection between these pointer sets might
be really complicated, even if the result will be just a single node
with "sex and the city" content.
There's a really neat solution to this problem called Hyperspace Hashing
which has you model your data in a set of different dimensions (i.e. a
euclidian hyperspace), and locate a point within the hyperspace based on
the hashes of the inputs. You then break the hyperspace down into a finite
number of subspaces, and partition those among the available servers. The
familiar hash ring can be seen as the one dimensional version of this
concept.

It doesn't quite fit the full text search scenario you described, but it
does help expressing more powerful queries than just k/v lookup:

http://www.cs.cornell.edu/people/egs/papers/hyperdex-sigcomm.pdf
http://www.slideshare.net/DECK36/c36-hyperdexmeetupdec2013v2nogfx
--
Tony Arcieri
Thomas Bocek
2015-01-05 08:42:36 UTC
Permalink
Post by Serguei Osokine
Bonus is given for page ranking implementation on DHT, and superbonus -
for finding misspelled queries (fuzzy logic and distance in word space metrics to be used?) and for real-time search suggestions. Basically,
the idea is to put Google search out of business. Then finally we won't
see any DMCA-removed results when looking stuff up. :)
For misspelled queries you can use FastSS (http://fastss.csg.uzh.ch/).
With that you can find a misspelled word with edit distance of 1 and
partially edit distance 2 with low overhead.
Post by Serguei Osokine
Request for comments: what are the open topics in DHTs?
I'm asking on behalf of a 4th year CS student preparing to do the final
year project; with possible extension into Masters. Having worked with
DHTs, the bug appears to have struck...
Our current work at UZH that involves DHT are:

* vDHT - adding versions to DHT for making consistent updates
(https://files.ifi.uzh.ch/CSG/staff/bocek/extern/theses/MA-Sebastian-Golaszewski.pdf).
With versions in the DHT we can do forks and merges for particular values.

* NAT - we are currently working on hole-punching, where the rendez-vous
peer is selected based on your neighbor set. The rendez-vous peer will
also handle the routing for the NATed peer.

* DHT on Androind - we are playing with delay tolerant messages with
google cloud messaging in DHTs (routing excluded) to save energy. This
is work is also in progress.

* A port of TomP2P to .NET with a set of benchmark tools to optimize the
DHT performance.

All of these work will be (or already are) included in TomP2P
(http://tomp2p.net/).

Regards,

Thomas

Jae Kwon
2015-01-02 01:06:03 UTC
Permalink
Can you design a robust one that works in the incentive model?
That is, assume that most nodes are participating to earn say, coins. (You
can assume a secure coin system).
Can you make the system work despite nodes that will cheat or lie (or
perhaps even collude) when possible for self gain?
Post by ianG
Hi all and happy new year!
Request for comments: what are the open topics in DHTs?
I'm asking on behalf of a 4th year CS student preparing to do the final
year project; with possible extension into Masters. Having worked with
DHTs, the bug appears to have struck...
iang
_______________________________________________
p2p-hackers mailing list
http://lists.zooko.com/mailman/listinfo/p2p-hackers
Queue
2015-01-02 18:12:43 UTC
Permalink
Besides the security-related topics others have mentioned, I think
there's still interesting work to be done on indexing over DHTs, i.e.,
efficiently making more complicated queries than just GET and PUT.

For example, in the context of Bittorrent's DHT, a single node can't
announce more than a couple thousand individual torrents to the DHT,
because each torrent requires a separate announce, and the announces
expire after an hour or so. At 1 announce per second, you can maintain a
theoretical max of 3600 torrents.

Besides the other inefficiencies of the Bittorrent p2p protocol, this is
a big reason why we don't see Bittorrent used for things like linux
package distribution, DVCS (git), or archive.org content (well, they
punted and use HTTP seeding). Anything that involves lots of small files
becomes prohibitive to announce, so we have to fall back on centralized
load balancing or manually curated mirror sites. (Note that for these
use cases, there'd still be large servers serving most of the files, but
they'd be integrated with other peers, no matter how large or small,
instead of on their own.)

The announce problem can be mitigated somewhat by making large
multi-file torrents, but then you have to manually balance between huge
torrents that people only needs bits and pieces of, and announce load.
I'd prefer a system that can automatically adjust granularity in a local
and decentralized manner, similar to how Kademlia balances load on
popular keys. While there are some good papers out there on doing range
queries and indexing on top of DHTs, none of ones I've found quite fit
the Bittorrent use case.

-q
Post by ianG
Hi all and happy new year!
Request for comments: what are the open topics in DHTs?
I'm asking on behalf of a 4th year CS student preparing to do the final
year project; with possible extension into Masters. Having worked with
DHTs, the bug appears to have struck...
iang
_______________________________________________
p2p-hackers mailing list
http://lists.zooko.com/mailman/listinfo/p2p-hackers
Continue reading on narkive:
Loading...