Tor

for Project:

This is the bug-tracker for Tor.  There is a guide for reporting bugs.

Tor is a connection-based low-latency anonymous communication system 
that protects TCP streams: web browsing, instant messaging, irc, ssh, etc. 
See https://www.torproject.org/ for more information.

To report bugs in the Torbutton Firefox browser extension, 
select the Torbutton project on this tracker.

To report bugs in the Vidalia front-end program, go to 
http://trac.vidalia-project.net/wiki/ReportingBugs .

Thanks for your help!



Task #847 — Extreme values in downloading directory information

Attached to Project— Tor
Opened by Karsten Loesing (karsten) - Monday, 3 Nov 2008, 11:21am
Last edited by Nick Mathewson (nickm) - Tuesday, 6 Jan 2009, 6:27pm
Bug Report
Tor client
Assigned
Nick Mathewson (nickm)
All
Low
Normal
0.2.1.5-alpha
0.2.1.x-final
0% complete
Sometimes it takes a really long time until Tor has downloaded enough
directory information to build circuits. In an experiment with 1,220
attempts to start new Tor instances, the following times were measured for
downloading directory information (in seconds):

 Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
1.732   5.016   7.602  35.680  17.210 681.300 

The highest 50 values are:

141.560 151.013 151.880 164.672 169.010 184.269 184.280 187.733 207.903
222.899 238.430 248.529 250.513 252.959 265.249 269.104 281.840 295.144
307.421 307.719 307.863 307.940 308.403 308.438 308.672 308.730 308.741
309.053 309.293 309.357 309.839 310.033 310.259 310.275 311.930 313.203
313.385 313.597 315.260 317.916 318.327 325.420 362.017 367.175 452.797
487.972 489.534 537.029 615.759 681.341

A look at the log files reveals that Tor attempts to fetch network
statuses or authority certificates from a single directory that is
unavailable at the moment. The request times out after 120 seconds, and
another attempt is made---possibly to the same directory. During the
measurements, dannenberg appeared to be offline, so that attempts to that
authority timed out:

egrep "Bootstrap|86.59.21.38|216.224.124.114|213.73.91.31|141.13.4.202|128.31.0.34|194.109.206.212" log 
Nov 01 08:42:02.390 [info] Bootstrapped 0%: Starting.
Nov 01 08:42:02.390 [info] connection_ap_make_link(): Making internal direct tunnel to 194.109.206.212:443 ...
Nov 01 08:42:02.391 [notice] Bootstrapped 5%: Connecting to directory server.
Nov 01 08:42:02.393 [info] connection_ap_make_link(): Making internal direct tunnel to 213.73.91.31:443 ...
Nov 01 08:42:02.393 [info] directory_send_command(): Downloading consensus from 213.73.91.31:443 using /tor/status-vote/current/consensus.z
Nov 01 08:42:02.413 [notice] Bootstrapped 10%: Finishing handshake with directory server.
Nov 01 08:42:02.603 [info] command_process_versions_cell(): Negotiated version 2 with 194.109.206.212; sending NETINFO.
Nov 01 08:42:02.604 [notice] Bootstrapped 15%: Establishing an encrypted directory connection.
Nov 01 08:42:02.605 [info] command_process_netinfo_cell(): Got good NETINFO cell from 194.109.206.212; OR connection is now open, using protocol version 2
Nov 01 08:42:02.693 [notice] Bootstrapped 20%: Asking for networkstatus consensus.
Nov 01 08:42:02.790 [info] connection_dir_client_reached_eof(): Received authority certificates (size 9804) from server '194.109.206.212:443'
Nov 01 08:42:35.033 [info] command_process_versions_cell(): Negotiated version 2 with 213.73.91.31; sending NETINFO.
Nov 01 08:44:02.857 [info] connection_ap_expire_beginning(): Tried for 120 seconds to get a connection to 213.73.91.31:443. Giving up. (waiting for circuit)
Nov 01 08:44:02.857 [info] connection_ap_make_link(): Making internal direct tunnel to 213.73.91.31:443 ...
Nov 01 08:44:02.858 [info] directory_send_command(): Downloading consensus from 213.73.91.31:443 using /tor/status-vote/current/consensus.z
Nov 01 08:46:02.317 [info] connection_ap_expire_beginning(): Tried for 120 seconds to get a connection to 213.73.91.31:443. Giving up. (waiting for circuit)
Nov 01 08:47:07.565 [info] connection_ap_make_link(): Making internal direct tunnel to 213.73.91.31:443 ...
Nov 01 08:47:07.565 [info] directory_send_command(): Downloading consensus from 213.73.91.31:443 using /tor/status-vote/current/consensus.z
Nov 01 08:47:35.669 [info] run_connection_housekeeping(): Expiring non-open OR connection to fd 16 (213.73.91.31:443).
Nov 01 08:47:35.669 [info] connection_ap_fail_onehop(): Closing onehop stream to '$7BE683E65D48141321C5ED92F075C55364AC7123/213.73.91.31' because the OR conn just failed.
Nov 01 08:53:13.929 [info] connection_ap_make_link(): Making internal direct tunnel to 216.224.124.114:9090 ...
Nov 01 08:53:13.930 [info] directory_send_command(): Downloading consensus from 216.224.124.114:9090 using /tor/status-vote/current/consensus.z
Nov 01 08:53:15.788 [info] command_process_versions_cell(): Negotiated version 2 with 216.224.124.114; sending NETINFO.
Nov 01 08:53:15.788 [info] command_process_netinfo_cell(): Got good NETINFO cell from 216.224.124.114; OR connection is now open, using protocol version 2
Nov 01 08:53:16.677 [notice] Bootstrapped 25%: Loading networkstatus consensus.
Nov 01 08:53:21.322 [info] connection_dir_client_reached_eof(): Received consensus directory (size 256583) from server '216.224.124.114:9090'
Nov 01 08:53:21.345 [notice] Bootstrapped 45%: Asking for relay descriptors.
Nov 01 08:53:21.742 [notice] Bootstrapped 50%: Loading relay descriptors.
Nov 01 08:53:22.105 [notice] Bootstrapped 59%: Loading relay descriptors.
Nov 01 08:53:22.652 [notice] Bootstrapped 69%: Loading relay descriptors.
Nov 01 08:53:23.092 [notice] Bootstrapped 80%: Connecting to the Tor network.
Nov 01 08:53:24.408 [notice] Bootstrapped 90%: Establishing a Tor circuit.
Nov 01 08:53:25.394 [notice] Bootstrapped 100%: Done.
Nov 01 08:53:59.577 [info] connection_ap_make_link(): Making internal anonymized tunnel to 141.13.4.202:443 ...
Nov 01 09:08:24.117 [info] run_connection_housekeeping(): Expiring non-used OR connection to fd 15 (194.109.206.212:443) [Not in clique mode].

Possible fixes could be shorter timeouts or parallel requests. However,
these fixes have side-effects on network load. Maybe there is also a way to
detect and exclude unavailable directory more quickly (requests are
tunneled directory requests using one-hop circuits).




This task depends upon

This task blocks these from closing

Comments (9) | Attachments (0) | Related Tasks (0/0) | Notifications (4) | Reminders (0) | History |

Comment by Nick Mathewson - Monday, 3 Nov 2008, 3:34pm

Hm.  Perhaps our usual methods for noticing that directories aren't working and excluding them don't work
when we're using one-hop tunnels.  This needs more investigation.



Comment by Karsten Loesing - Wednesday, 5 Nov 2008, 11:46am

r17189 should be part of the solution. Tunneled directory requests and client-side
introduction circuits should time out after the 30 seconds, which they didn't. This
should at least mitigate the problem.



Comment by Karsten Loesing - Sunday, 16 Nov 2008, 1:36pm

Mystery resolved. The bad values are simply the result of directory
downloads that take a very long time. Tor waits for a directory download to
either finish or fail before making another one. That attempt might then be
directed to the same slow authority. The problem occurs when downloading
the consensus or authority certificates (but is more serious for the
latter, because it's bigger).

This becomes visible in the log files by large gaps between two log
statements (all pairs originate from different test runs):

Nov 15 00:34:35.667 [info] _connection_free(): Freeing linked Directory
connection [client reading] with 0 bytes on inbuf, 0 on outbuf.
Nov 15 00:40:11.906 [info] update_consensus_networkstatus_downloads():
Launching networkstatus consensus download.
--
Nov 14 23:06:07.497 [notice] Bootstrapped 25%: Loading networkstatus
consensus.
Nov 14 23:12:59.818 [info] run_connection_housekeeping(): Expiring wedged
directory conn (fd -1, purpose 14)
--
Nov 14 05:18:50.407 [notice] Bootstrapped 25%: Loading networkstatus
consensus.
Nov 14 05:25:01.475 [info] connection_edge_process_relay_cell(): -1: end
cell (closed normally) for stream 28178. Removing stream.
--
Nov 15 09:45:03.075 [notice] We're missing a certificate from authority
with signing key 8C4CA02AA1AA71FCC56F9F4A46E09A4719DD75C3: launching
request.
Nov 15 09:46:04.294 [notice] We're missing a certificate from authority
with signing key D6D2325E1511B23A825DBE1CFD3DF9285AAE4DEB: launching
request.




Comment by Sebastian Hahn - Friday, 21 Nov 2008, 11:26am

Following patch is from mortalcombat.

--- connection.c    Wed Nov 12 10:47:36 2008
+++ connection.dir_linked.c Fri Nov 21 10:02:40 2008
@@ -1552,7 +1552,16 @@
   if (authdir_mode(get_options()) && priority>1)
     return 0; /* there's always room to answer v2 if we're an auth dir */

-  if (!connection_is_rate_limited(conn))
+  tor_assert(conn->_base.type == CONN_TYPE_DIR);
+  /* connection_is_rate_limited() helping with exclude moved bytes of
+   * buffers of linked conns from extra decrements of our buckets.
+   * But this call is exception, only local conns (why? whose it's conns?)
+   * not rate limited for directory operations. Tunneled directory conns 
+   * should be limitted.
+   * Else answered bytes just server's memory and client's time consume  
+   *  if not enought bandwith present.
+   */
+  if (!connection_is_rate_limited(conn) && !conn->linked)
     return 0; /* local conns don't get limited */

   if (smaller_bucket < (int)attempt)




Comment by Roger Dingledine - Friday, 21 Nov 2008, 3:41pm

This patch is also from mortalkombat. I'm worried that it might return 1
for authdirs in the !old_dir_conn case, but I will have to review it
once I've slept to be sure.

--- connection.c    Wed Nov 12 10:47:36 2008
+++ connection.dir_linked.c Fri Nov 21 12:06:38 2008
@@ -1549,16 +1549,21 @@
 {
   int smaller_bucket = global_write_bucket < global_relayed_write_bucket ?
                        global_write_bucket : global_relayed_write_bucket;
-  if (authdir_mode(get_options()) && priority>1)
+  int old_dir_conn = !conn->linked;
+  tor_assert(conn->type == CONN_TYPE_DIR);
+  /* tunneled conns for low bandwith authorities too slow, check for bucket */
+  if (authdir_mode(get_options()) && priority > 1 && old_dir_conn)
     return 0; /* there's always room to answer v2 if we're an auth dir */

-  if (!connection_is_rate_limited(conn))
+  /* tunneled conns for limited bandwith too slow, better 503 */
+  if (!connection_is_rate_limited(conn) && old_dir_conn)
     return 0; /* local conns don't get limited */

   if (smaller_bucket < (int)attempt)
     return 1; /* not enough space no matter the priority */

-  if (write_buckets_empty_last_second)
+  /* tunneled conns it's part of Tor's traffic, limited already */
+  if (write_buckets_empty_last_second && old_dir_conn)
     return 1; /* we're already hitting our limits, no more please */

   if (priority == 1) { /* old-style v1 query */



Comment by [Deleted] - Friday, 21 Nov 2008, 9:20pm

Limiting of service from a limited number of authdirs 
can be a convenient target for an attack on a denial of service.
(new clients with empty the cache of documents as potential victims
 for the proposed patch ) 
However, current versions have such documents also, it's a certificates 
(while other unlimited, such as consensus), and attack might 
be directed against all recipients - as mirrors and clients. 
For authdir perhaps better rule: slowly but distributed among all, as quickly but 
only for the attacker. 
There are two ways to solve the problem of congestion
while receiving of consensus from authdirs (and/or): 
1) Encourage the owners of authdirs with congestion to increase the ratio of 
  bandwidthburst / bandwithrate (with reachable bandwidthburst in practice); 
2) clients reduce allowable idletime in obtaining data from the directory,
and after trigger timer to avoid re-use this directory while receiving data. 

(The remaining two items of patch is tasks to integrate affordable of
 global write bucket  on the side of mirrors for all documents or
 certificates from authdirs, with intent distribute documents
 through tunneled dir conns)



Comment by Roger Dingledine - Monday, 22 Dec 2008, 11:34am

Another component to fixing this is that authorities with low bandwidth should set
MaxAdvertisedBandwidth 10 KB
so they attract less traffic.

But the other fixes above should be considered too.



Comment by Nick Mathewson - Monday, 9 Feb 2009, 6:33pm

The basis of the patches above seems like a good idea, though we should probably refactor some to make it cleaner.
(The big thing is that connection_is_rate_limited() probably wants to allow unlimited reads from a linked connection,
but limit writes to it so as not to clog our ram.  This might mean that we pull the linked_conn check higher, or such.)

Roger, I think you're right that the second version can return 1 for authdirs.

I think that for 0.2.1.x we should:
  - Fix the possibility if any of linked connection buffers getting enormous.
  - Fix the interaction of linked_conn and connection_is_rate_limited.

Idea 2 above (reduce timeout on fetches and treat a timeout as making a directory less desirable) is a fine one,
but it seems tricky to get right, and should maybe wait for 0.2.2.x and get a proposal.  The case I'm worried
about is the one in which a user has a really slow connection such that they can't get _any_ consensus under
the default timeout, and so they fail entirely.





Comment by Sebastian Hahn - Saturday, 19 Sep 2009, 7:12pm

Does Mike's work affect this in any way?