This is the bug-tracker for Tor. There is a guide for reporting bugs. Tor is a connection-based low-latency anonymous communication system that protects TCP streams: web browsing, instant messaging, irc, ssh, etc. See https://www.torproject.org/ for more information. To report bugs in the Torbutton Firefox browser extension, select the Torbutton project on this tracker. To report bugs in the Vidalia front-end program, go to http://trac.vidalia-project.net/wiki/ReportingBugs . Thanks for your help!
Task #847 — Extreme values in downloading directory information
Attached to Project— Tor
Opened by Karsten Loesing (karsten) - Monday, 3 Nov 2008, 11:21am
Last edited by Nick Mathewson (nickm) - Tuesday, 6 Jan 2009, 6:27pm
Last edited by Nick Mathewson (nickm) - Tuesday, 6 Jan 2009, 6:27pm
| Bug Report | |
| Tor client | |
| Assigned | |
| Nick Mathewson (nickm) | |
| All |
| Low | |
| Normal | |
| 0.2.1.5-alpha | |
| 0.2.1.x-final | |
Sometimes it takes a really long time until Tor has downloaded enough directory information to build circuits. In an experiment with 1,220 attempts to start new Tor instances, the following times were measured for downloading directory information (in seconds): Min. 1st Qu. Median Mean 3rd Qu. Max. 1.732 5.016 7.602 35.680 17.210 681.300 The highest 50 values are: 141.560 151.013 151.880 164.672 169.010 184.269 184.280 187.733 207.903 222.899 238.430 248.529 250.513 252.959 265.249 269.104 281.840 295.144 307.421 307.719 307.863 307.940 308.403 308.438 308.672 308.730 308.741 309.053 309.293 309.357 309.839 310.033 310.259 310.275 311.930 313.203 313.385 313.597 315.260 317.916 318.327 325.420 362.017 367.175 452.797 487.972 489.534 537.029 615.759 681.341 A look at the log files reveals that Tor attempts to fetch network statuses or authority certificates from a single directory that is unavailable at the moment. The request times out after 120 seconds, and another attempt is made---possibly to the same directory. During the measurements, dannenberg appeared to be offline, so that attempts to that authority timed out: egrep "Bootstrap|86.59.21.38|216.224.124.114|213.73.91.31|141.13.4.202|128.31.0.34|194.109.206.212" log Nov 01 08:42:02.390 [info] Bootstrapped 0%: Starting. Nov 01 08:42:02.390 [info] connection_ap_make_link(): Making internal direct tunnel to 194.109.206.212:443 ... Nov 01 08:42:02.391 [notice] Bootstrapped 5%: Connecting to directory server. Nov 01 08:42:02.393 [info] connection_ap_make_link(): Making internal direct tunnel to 213.73.91.31:443 ... Nov 01 08:42:02.393 [info] directory_send_command(): Downloading consensus from 213.73.91.31:443 using /tor/status-vote/current/consensus.z Nov 01 08:42:02.413 [notice] Bootstrapped 10%: Finishing handshake with directory server. Nov 01 08:42:02.603 [info] command_process_versions_cell(): Negotiated version 2 with 194.109.206.212; sending NETINFO. Nov 01 08:42:02.604 [notice] Bootstrapped 15%: Establishing an encrypted directory connection. Nov 01 08:42:02.605 [info] command_process_netinfo_cell(): Got good NETINFO cell from 194.109.206.212; OR connection is now open, using protocol version 2 Nov 01 08:42:02.693 [notice] Bootstrapped 20%: Asking for networkstatus consensus. Nov 01 08:42:02.790 [info] connection_dir_client_reached_eof(): Received authority certificates (size 9804) from server '194.109.206.212:443' Nov 01 08:42:35.033 [info] command_process_versions_cell(): Negotiated version 2 with 213.73.91.31; sending NETINFO. Nov 01 08:44:02.857 [info] connection_ap_expire_beginning(): Tried for 120 seconds to get a connection to 213.73.91.31:443. Giving up. (waiting for circuit) Nov 01 08:44:02.857 [info] connection_ap_make_link(): Making internal direct tunnel to 213.73.91.31:443 ... Nov 01 08:44:02.858 [info] directory_send_command(): Downloading consensus from 213.73.91.31:443 using /tor/status-vote/current/consensus.z Nov 01 08:46:02.317 [info] connection_ap_expire_beginning(): Tried for 120 seconds to get a connection to 213.73.91.31:443. Giving up. (waiting for circuit) Nov 01 08:47:07.565 [info] connection_ap_make_link(): Making internal direct tunnel to 213.73.91.31:443 ... Nov 01 08:47:07.565 [info] directory_send_command(): Downloading consensus from 213.73.91.31:443 using /tor/status-vote/current/consensus.z Nov 01 08:47:35.669 [info] run_connection_housekeeping(): Expiring non-open OR connection to fd 16 (213.73.91.31:443). Nov 01 08:47:35.669 [info] connection_ap_fail_onehop(): Closing onehop stream to '$7BE683E65D48141321C5ED92F075C55364AC7123/213.73.91.31' because the OR conn just failed. Nov 01 08:53:13.929 [info] connection_ap_make_link(): Making internal direct tunnel to 216.224.124.114:9090 ... Nov 01 08:53:13.930 [info] directory_send_command(): Downloading consensus from 216.224.124.114:9090 using /tor/status-vote/current/consensus.z Nov 01 08:53:15.788 [info] command_process_versions_cell(): Negotiated version 2 with 216.224.124.114; sending NETINFO. Nov 01 08:53:15.788 [info] command_process_netinfo_cell(): Got good NETINFO cell from 216.224.124.114; OR connection is now open, using protocol version 2 Nov 01 08:53:16.677 [notice] Bootstrapped 25%: Loading networkstatus consensus. Nov 01 08:53:21.322 [info] connection_dir_client_reached_eof(): Received consensus directory (size 256583) from server '216.224.124.114:9090' Nov 01 08:53:21.345 [notice] Bootstrapped 45%: Asking for relay descriptors. Nov 01 08:53:21.742 [notice] Bootstrapped 50%: Loading relay descriptors. Nov 01 08:53:22.105 [notice] Bootstrapped 59%: Loading relay descriptors. Nov 01 08:53:22.652 [notice] Bootstrapped 69%: Loading relay descriptors. Nov 01 08:53:23.092 [notice] Bootstrapped 80%: Connecting to the Tor network. Nov 01 08:53:24.408 [notice] Bootstrapped 90%: Establishing a Tor circuit. Nov 01 08:53:25.394 [notice] Bootstrapped 100%: Done. Nov 01 08:53:59.577 [info] connection_ap_make_link(): Making internal anonymized tunnel to 141.13.4.202:443 ... Nov 01 09:08:24.117 [info] run_connection_housekeeping(): Expiring non-used OR connection to fd 15 (194.109.206.212:443) [Not in clique mode]. Possible fixes could be shorter timeouts or parallel requests. However, these fixes have side-effects on network load. Maybe there is also a way to detect and exclude unavailable directory more quickly (requests are tunneled directory requests using one-hop circuits). |
This task depends upon
This task blocks these from closing
Comments (9) | Attachments (0) | Related Tasks (0/0) | Notifications (4) | Reminders (0) | History |
Comment by Nick Mathewson - Monday, 3 Nov 2008, 3:34pm
Hm. Perhaps our usual methods for noticing that directories aren't working and excluding them don't work when we're using one-hop tunnels. This needs more investigation.
Comment by Karsten Loesing - Wednesday, 5 Nov 2008, 11:46am
r17189 should be part of the solution. Tunneled directory requests and client-side introduction circuits should time out after the 30 seconds, which they didn't. This should at least mitigate the problem.
Comment by Karsten Loesing - Sunday, 16 Nov 2008, 1:36pm
Mystery resolved. The bad values are simply the result of directory downloads that take a very long time. Tor waits for a directory download to either finish or fail before making another one. That attempt might then be directed to the same slow authority. The problem occurs when downloading the consensus or authority certificates (but is more serious for the latter, because it's bigger). This becomes visible in the log files by large gaps between two log statements (all pairs originate from different test runs): Nov 15 00:34:35.667 [info] _connection_free(): Freeing linked Directory connection [client reading] with 0 bytes on inbuf, 0 on outbuf. Nov 15 00:40:11.906 [info] update_consensus_networkstatus_downloads(): Launching networkstatus consensus download. -- Nov 14 23:06:07.497 [notice] Bootstrapped 25%: Loading networkstatus consensus. Nov 14 23:12:59.818 [info] run_connection_housekeeping(): Expiring wedged directory conn (fd -1, purpose 14) -- Nov 14 05:18:50.407 [notice] Bootstrapped 25%: Loading networkstatus consensus. Nov 14 05:25:01.475 [info] connection_edge_process_relay_cell(): -1: end cell (closed normally) for stream 28178. Removing stream. -- Nov 15 09:45:03.075 [notice] We're missing a certificate from authority with signing key 8C4CA02AA1AA71FCC56F9F4A46E09A4719DD75C3: launching request. Nov 15 09:46:04.294 [notice] We're missing a certificate from authority with signing key D6D2325E1511B23A825DBE1CFD3DF9285AAE4DEB: launching request.
Comment by Sebastian Hahn - Friday, 21 Nov 2008, 11:26am
Following patch is from mortalcombat.
--- connection.c Wed Nov 12 10:47:36 2008
+++ connection.dir_linked.c Fri Nov 21 10:02:40 2008
@@ -1552,7 +1552,16 @@
if (authdir_mode(get_options()) && priority>1)
return 0; /* there's always room to answer v2 if we're an auth dir */
- if (!connection_is_rate_limited(conn))
+ tor_assert(conn->_base.type == CONN_TYPE_DIR);
+ /* connection_is_rate_limited() helping with exclude moved bytes of
+ * buffers of linked conns from extra decrements of our buckets.
+ * But this call is exception, only local conns (why? whose it's conns?)
+ * not rate limited for directory operations. Tunneled directory conns
+ * should be limitted.
+ * Else answered bytes just server's memory and client's time consume
+ * if not enought bandwith present.
+ */
+ if (!connection_is_rate_limited(conn) && !conn->linked)
return 0; /* local conns don't get limited */
if (smaller_bucket < (int)attempt)
Comment by Roger Dingledine - Friday, 21 Nov 2008, 3:41pm
This patch is also from mortalkombat. I'm worried that it might return 1
for authdirs in the !old_dir_conn case, but I will have to review it
once I've slept to be sure.
--- connection.c Wed Nov 12 10:47:36 2008
+++ connection.dir_linked.c Fri Nov 21 12:06:38 2008
@@ -1549,16 +1549,21 @@
{
int smaller_bucket = global_write_bucket < global_relayed_write_bucket ?
global_write_bucket : global_relayed_write_bucket;
- if (authdir_mode(get_options()) && priority>1)
+ int old_dir_conn = !conn->linked;
+ tor_assert(conn->type == CONN_TYPE_DIR);
+ /* tunneled conns for low bandwith authorities too slow, check for bucket */
+ if (authdir_mode(get_options()) && priority > 1 && old_dir_conn)
return 0; /* there's always room to answer v2 if we're an auth dir */
- if (!connection_is_rate_limited(conn))
+ /* tunneled conns for limited bandwith too slow, better 503 */
+ if (!connection_is_rate_limited(conn) && old_dir_conn)
return 0; /* local conns don't get limited */
if (smaller_bucket < (int)attempt)
return 1; /* not enough space no matter the priority */
- if (write_buckets_empty_last_second)
+ /* tunneled conns it's part of Tor's traffic, limited already */
+ if (write_buckets_empty_last_second && old_dir_conn)
return 1; /* we're already hitting our limits, no more please */
if (priority == 1) { /* old-style v1 query */
Comment by [Deleted] - Friday, 21 Nov 2008, 9:20pm
Limiting of service from a limited number of authdirs can be a convenient target for an attack on a denial of service. (new clients with empty the cache of documents as potential victims for the proposed patch ) However, current versions have such documents also, it's a certificates (while other unlimited, such as consensus), and attack might be directed against all recipients - as mirrors and clients. For authdir perhaps better rule: slowly but distributed among all, as quickly but only for the attacker. There are two ways to solve the problem of congestion while receiving of consensus from authdirs (and/or): 1) Encourage the owners of authdirs with congestion to increase the ratio of bandwidthburst / bandwithrate (with reachable bandwidthburst in practice); 2) clients reduce allowable idletime in obtaining data from the directory, and after trigger timer to avoid re-use this directory while receiving data. (The remaining two items of patch is tasks to integrate affordable of global write bucket on the side of mirrors for all documents or certificates from authdirs, with intent distribute documents through tunneled dir conns)
Comment by Roger Dingledine - Monday, 22 Dec 2008, 11:34am
Another component to fixing this is that authorities with low bandwidth should set MaxAdvertisedBandwidth 10 KB so they attract less traffic. But the other fixes above should be considered too.
Comment by Nick Mathewson - Monday, 9 Feb 2009, 6:33pm
The basis of the patches above seems like a good idea, though we should probably refactor some to make it cleaner. (The big thing is that connection_is_rate_limited() probably wants to allow unlimited reads from a linked connection, but limit writes to it so as not to clog our ram. This might mean that we pull the linked_conn check higher, or such.) Roger, I think you're right that the second version can return 1 for authdirs. I think that for 0.2.1.x we should: - Fix the possibility if any of linked connection buffers getting enormous. - Fix the interaction of linked_conn and connection_is_rate_limited. Idea 2 above (reduce timeout on fetches and treat a timeout as making a directory less desirable) is a fine one, but it seems tricky to get right, and should maybe wait for 0.2.2.x and get a proposal. The case I'm worried about is the one in which a user has a really slow connection such that they can't get _any_ consensus under the default timeout, and so they fail entirely.