1: #!/usr/bin/perl
2: # The LearningOnline Network with CAPA
3: # lonc maintains the connections to remote computers
4: #
5: # $Id: loncnew,v 1.31 2003/10/28 10:48:10 foxr Exp $
6: #
7: # Copyright Michigan State University Board of Trustees
8: #
9: # This file is part of the LearningOnline Network with CAPA (LON-CAPA).
10: ## LON-CAPA is free software; you can redistribute it and/or modify
11: # it under the terms of the GNU General Public License as published by
12: # the Free Software Foundation; either version 2 of the License, or
13: # (at your option) any later version.
14: #
15: # LON-CAPA is distributed in the hope that it will be useful,
16: # but WITHOUT ANY WARRANTY; without even the implied warranty of
17: # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
18: # GNU General Public License for more details.
19: #
20: # You should have received a copy of the GNU General Public License
21: # along with LON-CAPA; if not, write to the Free Software
22: # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
23: #
24: # /home/httpd/html/adm/gpl.txt
25: #
26: # http://www.lon-capa.org/
27: #
28: #
29: # new lonc handles n request out bver m connections to londs.
30: # This module is based on the Event class.
31: # Development iterations:
32: # - Setup basic event loop. (done)
33: # - Add timer dispatch. (done)
34: # - Add ability to accept lonc UNIX domain sockets. (done)
35: # - Add ability to create/negotiate lond connections (done).
36: # - Add general logic for dispatching requests and timeouts. (done).
37: # - Add support for the lonc/lond requests. (done).
38: # - Add logging/status monitoring.
39: # - Add Signal handling - HUP restarts. USR1 status report.
40: # - Add Configuration file I/O (done).
41: # - Add management/status request interface.
42: # - Add deferred request capability. (done)
43: # - Detect transmission timeouts.
44: #
45:
46: # Change log:
47: # $Log: loncnew,v $
48: # Revision 1.31 2003/10/28 10:48:10 foxr
49: # Call LondConnection::ReadConfig explicitly so that the configuration is loaded
50: # prior to getting the hosts table hashiterator.
51: #
52: # Revision 1.30 2003/10/27 10:09:21 foxr
53: # Tighten up a few compares to eq and flip a few debug levels around... nothing
54: # critical
55: #
56: # Revision 1.29 2003/10/21 14:24:42 foxr
57: # Fix little typo that may explain growth of connections
58: #
59: # Revision 1.28 2003/10/14 15:36:21 albertel
60: # - making it easier to run loncnew,
61: # /etc/init.d/loncontrol startnew
62: # /etc/init.d/loncontrol restartnew
63: # will now start loncnew in place of lonc
64: #
65: # Revision 1.27 2003/10/07 11:23:03 foxr
66: # Installed and tested code to process reinit in parent server.
67: #
68: # Revision 1.26 2003/09/30 11:11:17 foxr
69: # Add book-keeping hashes to support the re-init procedure.
70: #
71: # Revision 1.25 2003/09/23 11:22:14 foxr
72: # Tested ability to receive sigusr2 This is now logged and must be
73: # properly implemented as a re-read of hosts and re-init of appropriate
74: # children.
75: #
76: # Revision 1.24 2003/09/16 09:46:42 foxr
77: # Added skeletal infrastructure to support SIGUSR2 update hosts request.
78: #
79: # Revision 1.23 2003/09/15 09:24:49 foxr
80: # Add use strict and fix all the fallout from that.
81: #
82: # Revision 1.22 2003/09/02 10:34:47 foxr
83: # - Fix errors in host dead detection logic (too many cases where the
84: # retries left were not getting incremented or just not checked).
85: # - Added some additional status to the ps axuww display:
86: # o Remaining retries on a host.
87: # o >>> DEAD <<< indicator if I've given up on a host.
88: # - Tested the SIGHUP will reset the retries remaining count (thanks to
89: # the above status stuff, and get allow the loncnew to re-try again
90: # on the host (thanks to the log).
91: #
92: # Revision 1.21 2003/08/26 09:19:51 foxr
93: # How embarrassing... put in the SocketTimeout function in loncnew and forgot
94: # to actually hook it into the LondTransaction. Added this to MakeLondConnection
95: # where it belongs... hopefully transactions (not just connection attempts) will
96: # timeout more speedily than the socket errors will catch it.
97: #
98: # Revision 1.20 2003/08/25 18:48:11 albertel
99: # - fixing a forgotten ;
100: #
101: # Revision 1.19 2003/08/19 09:31:46 foxr
102: # Get socket directory from configuration rather than the old hard coded test
103: # way that I forgot to un-hard code.
104: #
105: # Revision 1.18 2003/08/06 09:52:29 foxr
106: # Also needed to remember to fail in-flight transactions if their sends fail.
107: #
108: # Revision 1.17 2003/08/03 00:44:31 foxr
109: # 1. Correct handling of connection failure: Assume it means the host is
110: # unreachable and fail all of the queued transactions. Note that the
111: # inflight transactions should fail on their own time due either to timeout
112: # or send/receive failures.
113: # 2. Correct handling of logs for forced death signals. Pull the signal
114: # from the event watcher.
115: #
116: # Revision 1.16 2003/07/29 02:33:05 foxr
117: # Add SIGINT processing to child processes to toggle annoying trace mode
118: # on/off.. will try to use this to isolate the compute boud process issue.
119: #
120: # Revision 1.15 2003/07/15 02:07:05 foxr
121: # Added code for lonc/lond transaction timeouts. Who knows if it works right.
122: # The intent is for a timeout to fail any transaction in progress and kill
123: # off the sockt that timed out.
124: #
125: # Revision 1.14 2003/07/03 02:10:18 foxr
126: # Get all of the signals to work correctly.
127: #
128: # Revision 1.13 2003/07/02 01:31:55 foxr
129: # Added kill -HUP logic (restart).
130: #
131: # Revision 1.11 2003/06/25 01:54:44 foxr
132: # Fix more problems with transaction failure.
133: #
134: # Revision 1.10 2003/06/24 02:46:04 foxr
135: # Put a limit on the number of times we'll retry a connection.
136: # Start getting the signal stuff put in as well...note that need to get signals
137: # going or else the client will permanently give up on dead servers.
138: #
139: # Revision 1.9 2003/06/13 02:38:43 foxr
140: # Add logging in 'expected format'
141: #
142: # Revision 1.8 2003/06/11 02:04:35 foxr
143: # Support delayed transactions... this is done uniformly by encapsulating
144: # transactions in an object ... a LondTransaction that is implemented by
145: # LondTransaction.pm
146: #
147: # Revision 1.7 2003/06/03 01:59:39 foxr
148: # complete coding to support deferred transactions.
149: #
150: #
151: use strict;
152: use lib "/home/httpd/lib/perl/";
153: use lib "/home/foxr/newloncapa/types";
154: use Event qw(:DEFAULT );
155: use POSIX qw(:signal_h);
156: use POSIX;
157: use IO::Socket;
158: use IO::Socket::INET;
159: use IO::Socket::UNIX;
160: use IO::File;
161: use IO::Handle;
162: use Socket;
163: use Crypt::IDEA;
164: use LONCAPA::Queue;
165: use LONCAPA::Stack;
166: use LONCAPA::LondConnection;
167: use LONCAPA::LondTransaction;
168: use LONCAPA::Configuration;
169: use LONCAPA::HashIterator;
170:
171:
172: #
173: # Disable all signals we might receive from outside for now.
174: #
175:
176:
177: # Read the httpd configuration file to get perl variables
178: # normally set in apache modules:
179:
180: my $perlvarref = LONCAPA::Configuration::read_conf('loncapa.conf');
181: my %perlvar = %{$perlvarref};
182:
183: #
184: # parent and shared variables.
185:
186: my %ChildHash; # by pid -> host.
187: my %HostToPid; # By host -> pid.
188: my %HostHash; # by loncapaname -> IP.
189:
190:
191: my $MaxConnectionCount = 10; # Will get from config later.
192: my $ClientConnection = 0; # Uniquifier for client events.
193:
194: my $DebugLevel = 0;
195: my $NextDebugLevel= 2; # So Sigint can toggle this.
196: my $IdleTimeout= 3600; # Wait an hour before pruning connections.
197:
198: #
199: # The variables below are only used by the child processes.
200: #
201: my $RemoteHost; # Name of host child is talking to.
202: my $UnixSocketDir= $perlvar{'lonSockDir'};
203: my $IdleConnections = Stack->new(); # Set of idle connections
204: my %ActiveConnections; # Connections to the remote lond.
205: my %ActiveTransactions; # LondTransactions in flight.
206: my %ActiveClients; # Serial numbers of active clients by socket.
207: my $WorkQueue = Queue->new(); # Queue of pending transactions.
208: my $ConnectionCount = 0;
209: my $IdleSeconds = 0; # Number of seconds idle.
210: my $Status = ""; # Current status string.
211: my $RecentLogEntry = "";
212: my $ConnectionRetries=2; # Number of connection retries allowed.
213: my $ConnectionRetriesLeft=2; # Number of connection retries remaining.
214:
215: #
216: # The hash below gives the HTML format for log messages
217: # given a severity.
218: #
219: my %LogFormats;
220:
221: $LogFormats{"CRITICAL"} = "<font color=red>CRITICAL: %s</font>";
222: $LogFormats{"SUCCESS"} = "<font color=green>SUCCESS: %s</font>";
223: $LogFormats{"INFO"} = "<font color=yellow>INFO: %s</font>";
224: $LogFormats{"WARNING"} = "<font color=blue>WARNING: %s</font>";
225: $LogFormats{"DEFAULT"} = " %s ";
226:
227:
228:
229: =pod
230:
231: =head2 LogPerm
232:
233: Makes an entry into the permanent log file.
234:
235: =cut
236: sub LogPerm {
237: my $message=shift;
238: my $execdir=$perlvar{'lonDaemons'};
239: my $now=time;
240: my $local=localtime($now);
241: my $fh=IO::File->new(">>$execdir/logs/lonnet.perm.log");
242: print $fh "$now:$message:$local\n";
243: }
244:
245: =pod
246:
247: =head2 Log
248:
249: Logs a message to the log file.
250: Parameters:
251:
252: =item severity
253:
254: One of CRITICAL, WARNING, INFO, SUCCESS used to select the
255: format string used to format the message. if the severity is
256: not a defined severity the Default format string is used.
257:
258: =item message
259:
260: The base message. In addtion to the format string, the message
261: will be appended to a string containing the name of our remote
262: host and the time will be formatted into the message.
263:
264: =cut
265:
266: sub Log {
267: my $severity = shift;
268: my $message = shift;
269:
270: if(!$LogFormats{$severity}) {
271: $severity = "DEFAULT";
272: }
273:
274: my $format = $LogFormats{$severity};
275:
276: # Put the window dressing in in front of the message format:
277:
278: my $now = time;
279: my $local = localtime($now);
280: my $finalformat = "$local ($$) [$RemoteHost] [$Status] ";
281: my $finalformat = $finalformat.$format."\n";
282:
283: # open the file and put the result.
284:
285: my $execdir = $perlvar{'lonDaemons'};
286: my $fh = IO::File->new(">>$execdir/logs/lonc.log");
287: my $msg = sprintf($finalformat, $message);
288: $RecentLogEntry = $msg;
289: print $fh $msg;
290:
291:
292: }
293:
294:
295: =pod
296:
297: =head2 GetPeerName
298:
299: Returns the name of the host that a socket object is connected to.
300:
301: =cut
302:
303: sub GetPeername {
304: my $connection = shift;
305: my $AdrFamily = shift;
306: my $peer = $connection->peername();
307: my $peerport;
308: my $peerip;
309: if($AdrFamily == AF_INET) {
310: ($peerport, $peerip) = sockaddr_in($peer);
311: my $peername = gethostbyaddr($peerip, $AdrFamily);
312: return $peername;
313: } elsif ($AdrFamily == AF_UNIX) {
314: my $peerfile;
315: ($peerfile) = sockaddr_un($peer);
316: return $peerfile;
317: }
318: }
319: #----------------------------- Timer management ------------------------
320: =pod
321:
322: =head2 Debug
323:
324: Invoked to issue a debug message.
325:
326: =cut
327:
328: sub Debug {
329: my $level = shift;
330: my $message = shift;
331: if ($level <= $DebugLevel) {
332: Log("INFO", "-Debug- $message host = $RemoteHost");
333: }
334: }
335:
336: sub SocketDump {
337: my $level = shift;
338: my $socket= shift;
339: if($level <= $DebugLevel) {
340: $socket->Dump();
341: }
342: }
343:
344: =pod
345:
346: =head2 ShowStatus
347:
348: Place some text as our pid status.
349: and as what we return in a SIGUSR1
350:
351: =cut
352: sub ShowStatus {
353: my $state = shift;
354: my $now = time;
355: my $local = localtime($now);
356: $Status = $local.": ".$state;
357: $0='lonc: '.$state.' '.$local;
358: }
359:
360: =pod
361:
362: =head 2 SocketTimeout
363:
364: Called when an action on the socket times out. The socket is
365: destroyed and any active transaction is failed.
366:
367:
368: =cut
369: sub SocketTimeout {
370: my $Socket = shift;
371:
372: KillSocket($Socket); # A transaction timeout also counts as
373: # a connection failure:
374: $ConnectionRetriesLeft--;
375: }
376:
377: =pod
378:
379: =head2 Tick
380:
381: Invoked each timer tick.
382:
383: =cut
384:
385:
386: sub Tick {
387: my $client;
388: if($ConnectionRetriesLeft > 0) {
389: ShowStatus(GetServerHost()." Connection count: ".$ConnectionCount
390: ." Retries remaining: ".$ConnectionRetriesLeft);
391: } else {
392: ShowStatus(GetServerHost()." >> DEAD <<");
393: }
394: # Is it time to prune connection count:
395:
396:
397: if($IdleConnections->Count() &&
398: ($WorkQueue->Count() == 0)) { # Idle connections and nothing to do?
399: $IdleSeconds++;
400: if($IdleSeconds > $IdleTimeout) { # Prune a connection...
401: my $Socket = $IdleConnections->pop();
402: KillSocket($Socket);
403: }
404: } else {
405: $IdleSeconds = 0; # Reset idle count if not idle.
406: }
407: #
408: # For each inflight transaction, tick down its timeout counter.
409: #
410: foreach my $item (keys %ActiveTransactions) {
411: my $Socket = $ActiveTransactions{$item}->getServer();
412: $Socket->Tick();
413: }
414: # Do we have work in the queue, but no connections to service them?
415: # If so, try to make some new connections to get things going again.
416: #
417:
418: my $Requests = $WorkQueue->Count();
419: if (($ConnectionCount == 0) && ($Requests > 0)) {
420: if ($ConnectionRetriesLeft > 0) {
421: my $Connections = ($Requests <= $MaxConnectionCount) ?
422: $Requests : $MaxConnectionCount;
423: Debug(5,"Work but no connections, start ".$Connections." of them");
424: my $successCount = 0;
425: for (my $i =0; $i < $Connections; $i++) {
426: $successCount += MakeLondConnection();
427: }
428: if($successCount == 0) { # All connections failed:
429: Debug(5,"Work in queue failed to make any connectiouns\n");
430: EmptyQueue(); # Fail pending transactions with con_lost.
431: }
432: } else {
433: ShowStatus(GetServerHost()." >>> DEAD!!! <<<");
434: Debug(5,"Work in queue, but gave up on connections..flushing\n");
435: EmptyQueue(); # Connections can't be established.
436: }
437:
438: }
439: }
440:
441: =pod
442:
443: =head2 SetupTimer
444:
445: Sets up a 1 per sec recurring timer event. The event handler is used to:
446:
447: =item
448:
449: Trigger timeouts on communications along active sockets.
450:
451: =item
452:
453: Trigger disconnections of idle sockets.
454:
455: =cut
456:
457: sub SetupTimer {
458: Debug(6, "SetupTimer");
459: Event->timer(interval => 1, debug => 1, cb => \&Tick );
460: }
461:
462: =pod
463:
464: =head2 ServerToIdle
465:
466: This function is called when a connection to the server is
467: ready for more work.
468:
469: If there is work in the Work queue the top element is dequeued
470: and the connection will start to work on it. If the work queue is
471: empty, the connection is pushed on the idle connection stack where
472: it will either get another work unit, or alternatively, if it sits there
473: long enough, it will be shut down and released.
474:
475: =cut
476:
477: sub ServerToIdle {
478: my $Socket = shift; # Get the socket.
479: delete($ActiveTransactions{$Socket}); # Server has no transaction
480:
481: &Debug(5, "Server to idle");
482:
483: # If there's work to do, start the transaction:
484:
485: my $reqdata = $WorkQueue->dequeue(); # This is a LondTransaction
486: if ($reqdata ne undef) {
487: Debug(5, "Queue gave request data: ".$reqdata->getRequest());
488: &StartRequest($Socket, $reqdata);
489:
490: } else {
491:
492: # There's no work waiting, so push the server to idle list.
493: &Debug(5, "No new work requests, server connection going idle");
494: $IdleConnections->push($Socket);
495: }
496: }
497:
498: =pod
499:
500: =head2 ClientWritable
501:
502: Event callback for when a client socket is writable.
503:
504: This callback is established when a transaction reponse is
505: avaiable from lond. The response is forwarded to the unix socket
506: as it becomes writable in this sub.
507:
508: Parameters:
509:
510: =item Event
511:
512: The event that has been triggered. Event->w->data is
513: the data and Event->w->fd is the socket to write.
514:
515: =cut
516:
517: sub ClientWritable {
518: my $Event = shift;
519: my $Watcher = $Event->w;
520: my $Data = $Watcher->data;
521: my $Socket = $Watcher->fd;
522:
523: # Try to send the data:
524:
525: &Debug(6, "ClientWritable writing".$Data);
526: &Debug(9, "Socket is: ".$Socket);
527:
528: if($Socket->connected) {
529: my $result = $Socket->send($Data, 0);
530:
531: # $result undefined: the write failed.
532: # otherwise $result is the number of bytes written.
533: # Remove that preceding string from the data.
534: # If the resulting data is empty, destroy the watcher
535: # and set up a read event handler to accept the next
536: # request.
537:
538: &Debug(9,"Send result is ".$result." Defined: ".defined($result));
539: if($result ne undef) {
540: &Debug(9, "send result was defined");
541: if($result == length($Data)) { # Entire string sent.
542: &Debug(9, "ClientWritable data all written");
543: $Watcher->cancel();
544: #
545: # Set up to read next request from socket:
546:
547: my $descr = sprintf("Connection to lonc client %d",
548: $ActiveClients{$Socket});
549: Event->io(cb => \&ClientRequest,
550: poll => 'r',
551: desc => $descr,
552: data => "",
553: fd => $Socket);
554:
555: } else { # Partial string sent.
556: $Watcher->data(substr($Data, $result));
557: if($result == 0) { # client hung up on us!!
558: Log("INFO", "lonc pipe client hung up on us!");
559: $Watcher->cancel;
560: $Socket->shutdown(2);
561: $Socket->close();
562: }
563: }
564:
565: } else { # Error of some sort...
566:
567: # Some errnos are possible:
568: my $errno = $!;
569: if($errno == POSIX::EWOULDBLOCK ||
570: $errno == POSIX::EAGAIN ||
571: $errno == POSIX::EINTR) {
572: # No action taken?
573: } else { # Unanticipated errno.
574: &Debug(5,"ClientWritable error or peer shutdown".$RemoteHost);
575: $Watcher->cancel; # Stop the watcher.
576: $Socket->shutdown(2); # Kill connection
577: $Socket->close(); # Close the socket.
578: }
579:
580: }
581: } else {
582: $Watcher->cancel(); # A delayed request...just cancel.
583: }
584: }
585:
586: =pod
587:
588: =head2 CompleteTransaction
589:
590: Called when the reply data has been received for a lond
591: transaction. The reply data must now be sent to the
592: ultimate client on the other end of the Unix socket. This is
593: done by setting up a writable event for the socket with the
594: data the reply data.
595:
596: Parameters:
597:
598: =item Socket
599:
600: Socket on which the lond transaction occured. This is a
601: LondConnection. The data received is in the TransactionReply member.
602:
603: =item Transaction
604:
605: The transaction that is being completed.
606:
607: =cut
608:
609: sub CompleteTransaction {
610: &Debug(5,"Complete transaction");
611: my $Socket = shift;
612: my $Transaction = shift;
613:
614: if (!$Transaction->isDeferred()) { # Normal transaction
615: my $data = $Socket->GetReply(); # Data to send.
616: StartClientReply($Transaction, $data);
617: } else { # Delete deferred transaction file.
618: Log("SUCCESS", "A delayed transaction was completed");
619: LogPerm("S:$Transaction->getClient() :".$Transaction->getRequest());
620: unlink $Transaction->getFile();
621: }
622: }
623: =pod
624: =head1 StartClientReply
625:
626: Initiates a reply to a client where the reply data is a parameter.
627:
628: =head2 parameters:
629:
630: =item Transaction
631:
632: The transaction for which we are responding to the client.
633:
634: =item data
635:
636: The data to send to apached client.
637:
638: =cut
639: sub StartClientReply {
640: my $Transaction = shift;
641: my $data = shift;
642:
643:
644: my $Client = $Transaction->getClient();
645:
646: &Debug(8," Reply was: ".$data);
647: my $Serial = $ActiveClients{$Client};
648: my $desc = sprintf("Connection to lonc client %d",
649: $Serial);
650: Event->io(fd => $Client,
651: poll => "w",
652: desc => $desc,
653: cb => \&ClientWritable,
654: data => $data);
655: }
656: =pod
657: =head2 FailTransaction
658:
659: Finishes a transaction with failure because the associated lond socket
660: disconnected. There are two possibilities:
661: - The transaction is deferred: in which case we just quietly
662: delete the transaction since there is no client connection.
663: - The transaction is 'live' in which case we initiate the sending
664: of "con_lost" to the client.
665:
666: Deleting the transaction means killing it from the
667: %ActiveTransactions hash.
668:
669: Parameters:
670:
671: =item client
672:
673: The LondTransaction we are failing.
674:
675: =cut
676:
677: sub FailTransaction {
678: my $transaction = shift;
679: Log("WARNING", "Failing transaction ".$transaction->getRequest());
680: Debug(1, "Failing transaction: ".$transaction->getRequest());
681: if (!$transaction->isDeferred()) { # If the transaction is deferred we'll get to it.
682: my $client = $transaction->getClient();
683: Debug(1," Replying con_lost to ".$transaction->getRequest());
684: StartClientReply($transaction, "con_lost\n");
685: }
686: if($ConnectionRetriesLeft <= 0) {
687: Log("CRITICAL", "Host marked dead: ".GetServerHost());
688: }
689:
690: }
691:
692: =pod
693: =head1 EmptyQueue
694:
695: Fails all items in the work queue with con_lost.
696: Note that each item in the work queue is a transaction.
697:
698: =cut
699: sub EmptyQueue {
700: $ConnectionRetriesLeft--; # Counts as connection failure too.
701: while($WorkQueue->Count()) {
702: my $request = $WorkQueue->dequeue(); # This is a transaction
703: FailTransaction($request);
704: }
705: }
706:
707: =pod
708:
709: =head2 CloseAllLondConnections
710:
711: Close all connections open on lond prior to exit e.g.
712:
713: =cut
714: sub CloseAllLondConnections {
715: foreach my $Socket (keys %ActiveConnections) {
716: KillSocket($Socket);
717: }
718: }
719: =cut
720:
721: =pod
722:
723: =head2 KillSocket
724:
725: Destroys a socket. This function can be called either when a socket
726: has died of 'natural' causes or because a socket needs to be pruned due to
727: idleness. If the socket has died naturally, if there are no longer any
728: live connections a new connection is created (in case there are transactions
729: in the queue). If the socket has been pruned, it is never re-created.
730:
731: Parameters:
732:
733: =item Socket
734:
735: The socket to kill off.
736:
737: =item Restart
738:
739: nonzero if we are allowed to create a new connection.
740:
741:
742: =cut
743: sub KillSocket {
744: my $Socket = shift;
745:
746: Log("WARNING", "Shutting down a socket");
747: $Socket->Shutdown();
748:
749: # If the socket came from the active connection set,
750: # delete its transaction... note that FailTransaction should
751: # already have been called!!!
752: # otherwise it came from the idle set.
753: #
754:
755: if(exists($ActiveTransactions{$Socket})) {
756: delete ($ActiveTransactions{$Socket});
757: }
758: if(exists($ActiveConnections{$Socket})) {
759: delete($ActiveConnections{$Socket});
760: }
761: $ConnectionCount--;
762:
763: # If the connection count has gone to zero and there is work in the
764: # work queue, the work all gets failed with con_lost.
765: #
766: if($ConnectionCount == 0) {
767: EmptyQueue();
768: }
769: }
770:
771: =pod
772:
773: =head2 LondReadable
774:
775: This function is called whenever a lond connection
776: is readable. The action is state dependent:
777:
778: =head3 State=Initialized
779:
780: We''re waiting for the challenge, this is a no-op until the
781: state changes.
782:
783: =head3 State=Challenged
784:
785: The challenge has arrived we need to transition to Writable.
786: The connection must echo the challenge back.
787:
788: =head3 State=ChallengeReplied
789:
790: The challenge has been replied to. The we are receiveing the
791: 'ok' from the partner.
792:
793: =head3 State=RequestingKey
794:
795: The ok has been received and we need to send the request for
796: an encryption key. Transition to writable for that.
797:
798: =head3 State=ReceivingKey
799:
800: The the key has been requested, now we are reading the new key.
801:
802: =head3 State=Idle
803:
804: The encryption key has been negotiated or we have finished
805: reading data from the a transaction. If the callback data has
806: a client as well as the socket iformation, then we are
807: doing a transaction and the data received is relayed to the client
808: before the socket is put on the idle list.
809:
810: =head3 State=SendingRequest
811:
812: I do not think this state can be received here, but if it is,
813: the appropriate thing to do is to transition to writable, and send
814: the request.
815:
816: =head3 State=ReceivingReply
817:
818: We finished sending the request to the server and now transition
819: to readable to receive the reply.
820:
821: The parameter to this function are:
822:
823: The event. Implicit in this is the watcher and its data. The data
824: contains at least the lond connection object and, if a
825: transaction is in progress, the socket attached to the local client.
826:
827: =cut
828:
829: sub LondReadable {
830:
831: my $Event = shift;
832: my $Watcher = $Event->w;
833: my $Socket = $Watcher->data;
834: my $client = undef;
835:
836: &Debug(6,"LondReadable called state = ".$Socket->GetState());
837:
838:
839: my $State = $Socket->GetState(); # All action depends on the state.
840:
841: SocketDump(6, $Socket);
842: my $status = $Socket->Readable();
843:
844: &Debug(2, "Socket->Readable returned: $status");
845:
846: if($status != 0) {
847: # bad return from socket read. Currently this means that
848: # The socket has become disconnected. We fail the transaction.
849:
850: Log("WARNING",
851: "Lond connection lost.");
852: if(exists($ActiveTransactions{$Socket})) {
853: FailTransaction($ActiveTransactions{$Socket});
854: }
855: $Watcher->cancel();
856: KillSocket($Socket);
857: $ConnectionRetriesLeft--; # Counts as connection failure
858: return;
859: }
860: SocketDump(6,$Socket);
861:
862: $State = $Socket->GetState(); # Update in case of transition.
863: &Debug(6, "After read, state is ".$State);
864:
865: if($State eq "Initialized") {
866:
867:
868: } elsif ($State eq "ChallengeReceived") {
869: # The challenge must be echoed back; The state machine
870: # in the connection takes care of setting that up. Just
871: # need to transition to writable:
872:
873: $Watcher->cb(\&LondWritable);
874: $Watcher->poll("w");
875:
876: } elsif ($State eq "ChallengeReplied") {
877:
878:
879: } elsif ($State eq "RequestingKey") {
880: # The ok was received. Now we need to request the key
881: # That requires us to be writable:
882:
883: $Watcher->cb(\&LondWritable);
884: $Watcher->poll("w");
885:
886: } elsif ($State eq "ReceivingKey") {
887:
888: } elsif ($State eq "Idle") {
889: # If necessary, complete a transaction and then go into the
890: # idle queue.
891: # Note that a trasition to idle indicates a live lond
892: # on the other end so reset the connection retries.
893: #
894: $ConnectionRetriesLeft = $ConnectionRetries; # success resets the count
895: $Watcher->cancel();
896: if(exists($ActiveTransactions{$Socket})) {
897: Debug(5,"Completing transaction!!");
898: CompleteTransaction($Socket,
899: $ActiveTransactions{$Socket});
900: } else {
901: Log("SUCCESS", "Connection ".$ConnectionCount." to "
902: .$RemoteHost." now ready for action");
903: }
904: ServerToIdle($Socket); # Next work unit or idle.
905:
906: } elsif ($State eq "SendingRequest") {
907: # We need to be writable for this and probably don't belong
908: # here inthe first place.
909:
910: Deubg(6, "SendingRequest state encountered in readable");
911: $Watcher->poll("w");
912: $Watcher->cb(\&LondWritable);
913:
914: } elsif ($State eq "ReceivingReply") {
915:
916:
917: } else {
918: # Invalid state.
919: Debug(4, "Invalid state in LondReadable");
920: }
921: }
922:
923: =pod
924:
925: =head2 LondWritable
926:
927: This function is called whenever a lond connection
928: becomes writable while there is a writeable monitoring
929: event. The action taken is very state dependent:
930:
931: =head3 State = Connected
932:
933: The connection is in the process of sending the 'init' hailing to the
934: lond on the remote end. The connection object''s Writable member is
935: called. On error, ConnectionError is called to destroy the connection
936: and remove it from the ActiveConnections hash
937:
938: =head3 Initialized
939:
940: 'init' has been sent, writability monitoring is removed and
941: readability monitoring is started with LondReadable as the callback.
942:
943: =head3 ChallengeReceived
944:
945: The connection has received the who are you challenge from the remote
946: system, and is in the process of sending the challenge
947: response. Writable is called.
948:
949: =head3 ChallengeReplied
950:
951: The connection has replied to the initial challenge The we switch to
952: monitoring readability looking for the server to reply with 'ok'.
953:
954: =head3 RequestingKey
955:
956: The connection is in the process of requesting its encryption key.
957: Writable is called.
958:
959: =head3 ReceivingKey
960:
961: The connection has sent the request for a key. Switch to readability
962: monitoring to accept the key
963:
964: =head3 SendingRequest
965:
966: The connection is in the process of sending a request to the server.
967: This request is part of a client transaction. All the states until
968: now represent the client setup protocol. Writable is called.
969:
970: =head3 ReceivingReply
971:
972: The connection has sent a request. Now it must receive a reply.
973: Readability monitoring is requested.
974:
975: This function is an event handler and therefore receives as
976: a parameter the event that has fired. The data for the watcher
977: of this event is a reference to a list of one or two elements,
978: depending on state. The first (and possibly only) element is the
979: socket. The second (present only if a request is in progress)
980: is the socket on which to return a reply to the caller.
981:
982: =cut
983:
984: sub LondWritable {
985: my $Event = shift;
986: my $Watcher = $Event->w;
987: my $Socket = $Watcher->data;
988: my $State = $Socket->GetState();
989:
990: Debug(6,"LondWritable State = ".$State."\n");
991:
992:
993: # Figure out what to do depending on the state of the socket:
994:
995:
996:
997:
998: SocketDump(6,$Socket);
999:
1000: if ($State eq "Connected") {
1001:
1002: if ($Socket->Writable() != 0) {
1003: # The write resulted in an error.
1004: # We'll treat this as if the socket got disconnected:
1005: Log("WARNING", "Connection to ".$RemoteHost.
1006: " has been disconnected");
1007: FailTransaction($ActiveTransactions{$Socket});
1008: $Watcher->cancel();
1009: KillSocket($Socket);
1010: return;
1011: }
1012: # "init" is being sent...
1013:
1014:
1015: } elsif ($State eq "Initialized") {
1016:
1017: # Now that init was sent, we switch
1018: # to watching for readability:
1019:
1020: $Watcher->cb(\&LondReadable);
1021: $Watcher->poll("r");
1022:
1023: } elsif ($State eq "ChallengeReceived") {
1024: # We received the challenge, now we
1025: # are echoing it back. This is a no-op,
1026: # we're waiting for the state to change
1027:
1028: if($Socket->Writable() != 0) {
1029:
1030: $Watcher->cancel();
1031: KillSocket($Socket);
1032: return;
1033: }
1034:
1035: } elsif ($State eq "ChallengeReplied") {
1036: # The echo was sent back, so we switch
1037: # to watching readability.
1038:
1039: $Watcher->cb(\&LondReadable);
1040: $Watcher->poll("r");
1041:
1042: } elsif ($State eq "RequestingKey") {
1043: # At this time we're requesting the key.
1044: # again, this is essentially a no-op.
1045: # we'll write the next chunk until the
1046: # state changes.
1047:
1048: if($Socket->Writable() != 0) {
1049: # Write resulted in an error.
1050:
1051: $Watcher->cancel();
1052: KillSocket($Socket);
1053: return;
1054:
1055: }
1056: } elsif ($State eq "ReceivingKey") {
1057: # Now we need to wait for the key
1058: # to come back from the peer:
1059:
1060: $Watcher->cb(\&LondReadable);
1061: $Watcher->poll("r");
1062:
1063: } elsif ($State eq "SendingRequest") {
1064: # At this time we are sending a request to the
1065: # peer... write the next chunk:
1066:
1067: if($Socket->Writable() != 0) {
1068:
1069: if(exists($ActiveTransactions{$Socket})) {
1070: Debug(3, "Lond connection lost, failing transactions");
1071: FailTransaction($ActiveTransactions{$Socket});
1072: }
1073: $Watcher->cancel();
1074: KillSocket($Socket);
1075: return;
1076:
1077: }
1078:
1079: } elsif ($State eq "ReceivingReply") {
1080: # The send has completed. Wait for the
1081: # data to come in for a reply.
1082: Debug(8,"Writable sent request/receiving reply");
1083: $Watcher->cb(\&LondReadable);
1084: $Watcher->poll("r");
1085:
1086: } else {
1087: # Control only passes here on an error:
1088: # the socket state does not match any
1089: # of the known states... so an error
1090: # must be logged.
1091:
1092: &Debug(4, "Invalid socket state ".$State."\n");
1093: }
1094:
1095: }
1096: =pod
1097:
1098: =cut
1099: sub QueueDelayed {
1100: Debug(3,"QueueDelayed called");
1101:
1102: my $path = "$perlvar{'lonSockDir'}/delayed";
1103:
1104: Debug(4, "Delayed path: ".$path);
1105: opendir(DIRHANDLE, $path);
1106:
1107: my @alldelayed = grep /\.$RemoteHost$/, readdir DIRHANDLE;
1108: closedir(DIRHANDLE);
1109: my $dfname;
1110: my $reqfile;
1111: foreach $dfname (sort @alldelayed) {
1112: $reqfile = "$path/$dfname";
1113: Debug(4, "queueing ".$reqfile);
1114: my $Handle = IO::File->new($reqfile);
1115: my $cmd = <$Handle>;
1116: chomp $cmd; # There may or may not be a newline...
1117: $cmd = $cmd."\n"; # now for sure there's exactly one newline.
1118: my $Transaction = LondTransaction->new($cmd);
1119: $Transaction->SetDeferred($reqfile);
1120: QueueTransaction($Transaction);
1121: }
1122:
1123: }
1124:
1125: =pod
1126:
1127: =head2 MakeLondConnection
1128:
1129: Create a new lond connection object, and start it towards its initial
1130: idleness. Once idle, it becomes elligible to receive transactions
1131: from the work queue. If the work queue is not empty when the
1132: connection is completed and becomes idle, it will dequeue an entry and
1133: start off on it.
1134:
1135: =cut
1136:
1137: sub MakeLondConnection {
1138: Debug(4,"MakeLondConnection to ".GetServerHost()." on port "
1139: .GetServerPort());
1140:
1141: my $Connection = LondConnection->new(&GetServerHost(),
1142: &GetServerPort());
1143:
1144: if($Connection eq undef) { # Needs to be more robust later.
1145: Log("CRITICAL","Failed to make a connection with lond.");
1146: $ConnectionRetriesLeft--;
1147: return 0; # Failure.
1148: } else {
1149:
1150: # The connection needs to have writability
1151: # monitored in order to send the init sequence
1152: # that starts the whole authentication/key
1153: # exchange underway.
1154: #
1155: my $Socket = $Connection->GetSocket();
1156: if($Socket eq undef) {
1157: die "did not get a socket from the connection";
1158: } else {
1159: &Debug(9,"MakeLondConnection got socket: ".$Socket);
1160: }
1161:
1162: $Connection->SetTimeoutCallback(\&SocketTimeout);
1163:
1164: my $event = Event->io(fd => $Socket,
1165: poll => 'w',
1166: cb => \&LondWritable,
1167: data => $Connection,
1168: desc => 'Connection to lond server');
1169: $ActiveConnections{$Connection} = $event;
1170:
1171: $ConnectionCount++;
1172: Debug(4, "Connection count = ".$ConnectionCount);
1173: if($ConnectionCount == 1) { # First Connection:
1174: QueueDelayed;
1175: }
1176: Log("SUCESS", "Created connection ".$ConnectionCount
1177: ." to host ".GetServerHost());
1178: return 1; # Return success.
1179: }
1180:
1181: }
1182:
1183: =pod
1184:
1185: =head2 StartRequest
1186:
1187: Starts a lond request going on a specified lond connection.
1188: parameters are:
1189:
1190: =item $Lond
1191:
1192: Connection to the lond that will send the transaction and receive the
1193: reply.
1194:
1195: =item $Client
1196:
1197: Connection to the client that is making this request We got the
1198: request from this socket, and when the request has been relayed to
1199: lond and we get a reply back from lond it will get sent to this
1200: socket.
1201:
1202: =item $Request
1203:
1204: The text of the request to send.
1205:
1206: =cut
1207:
1208: sub StartRequest {
1209: my $Lond = shift;
1210: my $Request = shift; # This is a LondTransaction.
1211:
1212: Debug(6, "StartRequest: ".$Request->getRequest());
1213:
1214: my $Socket = $Lond->GetSocket();
1215:
1216: $Request->Activate($Lond);
1217: $ActiveTransactions{$Lond} = $Request;
1218:
1219: $Lond->InitiateTransaction($Request->getRequest());
1220: my $event = Event->io(fd => $Socket,
1221: poll => "w",
1222: cb => \&LondWritable,
1223: data => $Lond,
1224: desc => "lond transaction connection");
1225: $ActiveConnections{$Lond} = $event;
1226: Debug(8," Start Request made watcher data with ".$event->data."\n");
1227: }
1228:
1229: =pod
1230:
1231: =head2 QueueTransaction
1232:
1233: If there is an idle lond connection, it is put to work doing this
1234: transaction. Otherwise, the transaction is placed in the work queue.
1235: If placed in the work queue and the maximum number of connections has
1236: not yet been created, a new connection will be started. Our goal is
1237: to eventually have a sufficient number of connections that the work
1238: queue will typically be empty. parameters are:
1239:
1240: =item Socket
1241:
1242: open on the lonc client.
1243:
1244: =item Request
1245:
1246: data to send to the lond.
1247:
1248: =cut
1249:
1250: sub QueueTransaction {
1251:
1252: my $requestData = shift; # This is a LondTransaction.
1253: my $cmd = $requestData->getRequest();
1254:
1255: Debug(6,"QueueTransaction: ".$cmd);
1256:
1257: my $LondSocket = $IdleConnections->pop();
1258: if(!defined $LondSocket) { # Need to queue request.
1259: Debug(5,"Must queue...");
1260: $WorkQueue->enqueue($requestData);
1261: if($ConnectionCount < $MaxConnectionCount) {
1262: if($ConnectionRetriesLeft > 0) {
1263: Debug(5,"Starting additional lond connection");
1264: if(MakeLondConnection() == 0) {
1265: EmptyQueue(); # Fail transactions, can't make connection.
1266: }
1267: } else {
1268: ShowStatus(GetServerHost()." >>> DEAD !!!! <<<");
1269: EmptyQueue(); # It's worse than that ... he's dead Jim.
1270: }
1271: }
1272: } else { # Can start the request:
1273: Debug(8,"Can start...");
1274: StartRequest($LondSocket, $requestData);
1275: }
1276: }
1277:
1278: #-------------------------- Lonc UNIX socket handling ---------------------
1279:
1280: =pod
1281:
1282: =head2 ClientRequest
1283: Callback that is called when data can be read from the UNIX domain
1284: socket connecting us with an apache server process.
1285:
1286: =cut
1287:
1288: sub ClientRequest {
1289: Debug(6, "ClientRequest");
1290: my $event = shift;
1291: my $watcher = $event->w;
1292: my $socket = $watcher->fd;
1293: my $data = $watcher->data;
1294: my $thisread;
1295:
1296: Debug(9, " Watcher named: ".$watcher->desc);
1297:
1298: my $rv = $socket->recv($thisread, POSIX::BUFSIZ, 0);
1299: Debug(8, "rcv: data length = ".length($thisread)
1300: ." read =".$thisread);
1301: unless (defined $rv && length($thisread)) {
1302: # Likely eof on socket.
1303: Debug(5,"Client Socket closed on lonc for ".$RemoteHost);
1304: close($socket);
1305: $watcher->cancel();
1306: delete($ActiveClients{$socket});
1307: return;
1308: }
1309: Debug(8,"Data: ".$data." this read: ".$thisread);
1310: $data = $data.$thisread; # Append new data.
1311: $watcher->data($data);
1312: if($data =~ /(.*\n)/) { # Request entirely read.
1313: if($data eq "close_connection_exit\n") {
1314: Log("CRITICAL",
1315: "Request Close Connection ... exiting");
1316: CloseAllLondConnections();
1317: exit;
1318: }
1319: Debug(8, "Complete transaction received: ".$data);
1320: my $Transaction = LondTransaction->new($data);
1321: $Transaction->SetClient($socket);
1322: QueueTransaction($Transaction);
1323: $watcher->cancel(); # Done looking for input data.
1324: }
1325:
1326: }
1327:
1328:
1329: =pod
1330:
1331: =head2 NewClient
1332:
1333: Callback that is called when a connection is received on the unix
1334: socket for a new client of lonc. The callback is parameterized by the
1335: event.. which is a-priori assumed to be an io event, and therefore has
1336: an fd member that is the Listener socket. We Accept the connection
1337: and register a new event on the readability of that socket:
1338:
1339: =cut
1340:
1341: sub NewClient {
1342: Debug(6, "NewClient");
1343: my $event = shift; # Get the event parameters.
1344: my $watcher = $event->w;
1345: my $socket = $watcher->fd; # Get the event' socket.
1346: my $connection = $socket->accept(); # Accept the client connection.
1347: Debug(8,"Connection request accepted from "
1348: .GetPeername($connection, AF_UNIX));
1349:
1350:
1351: my $description = sprintf("Connection to lonc client %d",
1352: $ClientConnection);
1353: Debug(9, "Creating event named: ".$description);
1354: Event->io(cb => \&ClientRequest,
1355: poll => 'r',
1356: desc => $description,
1357: data => "",
1358: fd => $connection);
1359: $ActiveClients{$connection} = $ClientConnection;
1360: $ClientConnection++;
1361: }
1362:
1363: =pod
1364:
1365: =head2 GetLoncSocketPath
1366:
1367: Returns the name of the UNIX socket on which to listen for client
1368: connections.
1369:
1370: =cut
1371:
1372: sub GetLoncSocketPath {
1373: return $UnixSocketDir."/".GetServerHost();
1374: }
1375:
1376: =pod
1377:
1378: =head2 GetServerHost
1379:
1380: Returns the host whose lond we talk with.
1381:
1382: =cut
1383:
1384: sub GetServerHost {
1385: return $RemoteHost; # Setup by the fork.
1386: }
1387:
1388: =pod
1389:
1390: =head2 GetServerPort
1391:
1392: Returns the lond port number.
1393:
1394: =cut
1395:
1396: sub GetServerPort {
1397: return $perlvar{londPort};
1398: }
1399:
1400: =pod
1401:
1402: =head2 SetupLoncListener
1403:
1404: Setup a lonc listener event. The event is called when the socket
1405: becomes readable.. that corresponds to the receipt of a new
1406: connection. The event handler established will accept the connection
1407: (creating a communcations channel), that int turn will establish
1408: another event handler to subess requests.
1409:
1410: =cut
1411:
1412: sub SetupLoncListener {
1413:
1414: my $socket;
1415: my $SocketName = GetLoncSocketPath();
1416: unlink($SocketName);
1417: unless ($socket =IO::Socket::UNIX->new(Local => $SocketName,
1418: Listen => 10,
1419: Type => SOCK_STREAM)) {
1420: die "Failed to create a lonc listner socket";
1421: }
1422: Event->io(cb => \&NewClient,
1423: poll => 'r',
1424: desc => 'Lonc listener Unix Socket',
1425: fd => $socket);
1426: }
1427:
1428: =pod
1429:
1430: =head2 ChildStatus
1431:
1432: Child USR1 signal handler to report the most recent status
1433: into the status file.
1434:
1435: We also use this to reset the retries count in order to allow the
1436: client to retry connections with a previously dead server.
1437: =cut
1438: sub ChildStatus {
1439: my $event = shift;
1440: my $watcher = $event->w;
1441:
1442: Debug(2, "Reporting child status because : ".$watcher->data);
1443: my $docdir = $perlvar{'lonDocRoot'};
1444: my $fh = IO::File->new(">>$docdir/lon-status/loncstatus.txt");
1445: print $fh $$."\t".$RemoteHost."\t".$Status."\t".
1446: $RecentLogEntry."\n";
1447: $ConnectionRetriesLeft = $ConnectionRetries;
1448: }
1449:
1450: =pod
1451:
1452: =head2 SignalledToDeath
1453:
1454: Called in response to a signal that causes a chid process to die.
1455:
1456: =cut
1457:
1458:
1459: sub SignalledToDeath {
1460: my $event = shift;
1461: my $watcher= $event->w;
1462:
1463: Debug(2,"Signalled to death! via ".$watcher->data);
1464: my ($signal) = $watcher->data;
1465: chomp($signal);
1466: Log("CRITICAL", "Abnormal exit. Child $$ for $RemoteHost "
1467: ."died through "."\"$signal\"");
1468: LogPerm("F:lonc: $$ on $RemoteHost signalled to death: "
1469: ."\"$signal\"");
1470: exit 0;
1471:
1472: }
1473:
1474: =head2 ToggleDebug
1475:
1476: This sub toggles trace debugging on and off.
1477:
1478: =cut
1479:
1480: sub ToggleDebug {
1481: my $Current = $DebugLevel;
1482: $DebugLevel = $NextDebugLevel;
1483: $NextDebugLevel = $Current;
1484:
1485: Log("SUCCESS", "New debugging level for $RemoteHost now $DebugLevel");
1486:
1487: }
1488:
1489: =head2 ChildProcess
1490:
1491: This sub implements a child process for a single lonc daemon.
1492:
1493: =cut
1494:
1495: sub ChildProcess {
1496:
1497:
1498: #
1499: # Signals must be handled by the Event framework...
1500: #
1501:
1502: Event->signal(signal => "QUIT",
1503: cb => \&SignalledToDeath,
1504: data => "QUIT");
1505: Event->signal(signal => "HUP",
1506: cb => \&ChildStatus,
1507: data => "HUP");
1508: Event->signal(signal => "USR1",
1509: cb => \&ChildStatus,
1510: data => "USR1");
1511: Event->signal(signal => "INT",
1512: cb => \&ToggleDebug,
1513: data => "INT");
1514:
1515: SetupTimer();
1516:
1517: SetupLoncListener();
1518:
1519: $Event::Debuglevel = $DebugLevel;
1520:
1521: Debug(9, "Making initial lond connection for ".$RemoteHost);
1522:
1523: # Setup the initial server connection:
1524:
1525: # &MakeLondConnection(); // let first work requirest do it.
1526:
1527:
1528: Debug(9,"Entering event loop");
1529: my $ret = Event::loop(); # Start the main event loop.
1530:
1531:
1532: die "Main event loop exited!!!";
1533: }
1534:
1535: # Create a new child for host passed in:
1536:
1537: sub CreateChild {
1538: my $sigset = POSIX::SigSet->new(SIGINT);
1539: sigprocmask(SIG_BLOCK, $sigset);
1540: my $host = shift;
1541: $RemoteHost = $host;
1542: Log("CRITICAL", "Forking server for ".$host);
1543: my $pid = fork;
1544: if($pid) { # Parent
1545: $RemoteHost = "Parent";
1546: $ChildHash{$pid} = $host;
1547: $HostToPid{$host}= $pid;
1548: sigprocmask(SIG_UNBLOCK, $sigset);
1549:
1550: } else { # child.
1551: ShowStatus("Connected to ".$RemoteHost);
1552: $SIG{INT} = 'DEFAULT';
1553: sigprocmask(SIG_UNBLOCK, $sigset);
1554: ChildProcess; # Does not return.
1555: }
1556:
1557: }
1558: #
1559: # Parent process logic pass 1:
1560: # For each entry in the hosts table, we will
1561: # fork off an instance of ChildProcess to service the transactions
1562: # to that host. Each pid will be entered in a global hash
1563: # with the value of the key, the host.
1564: # The parent will then enter a loop to wait for process exits.
1565: # Each exit gets logged and the child gets restarted.
1566: #
1567:
1568: #
1569: # Fork and start in new session so hang-up isn't going to
1570: # happen without intent.
1571: #
1572:
1573:
1574:
1575:
1576:
1577:
1578: ShowStatus("Forming new session");
1579: my $childpid = fork;
1580: if ($childpid != 0) {
1581: sleep 4; # Give child a chacne to break to
1582: exit 0; # a new sesion.
1583: }
1584: #
1585: # Write my pid into the pid file so I can be located
1586: #
1587:
1588: ShowStatus("Parent writing pid file:");
1589: my $execdir = $perlvar{'lonDaemons'};
1590: open (PIDSAVE, ">$execdir/logs/lonc.pid");
1591: print PIDSAVE "$$\n";
1592: close(PIDSAVE);
1593:
1594:
1595:
1596: if (POSIX::setsid() < 0) {
1597: print "Could not create new session\n";
1598: exit -1;
1599: }
1600:
1601: ShowStatus("Forking node servers");
1602:
1603: Log("CRITICAL", "--------------- Starting children ---------------");
1604:
1605: LondConnection::ReadConfig; # Read standard config files.
1606: my $HostIterator = LondConnection::GetHostIterator;
1607: while (! $HostIterator->end()) {
1608:
1609: my $hostentryref = $HostIterator->get();
1610: CreateChild($hostentryref->[0]);
1611: $HostHash{$hostentryref->[0]} = $hostentryref->[4];
1612: $HostIterator->next();
1613: }
1614: $RemoteHost = "Parent Server";
1615:
1616: # Maintain the population:
1617:
1618: ShowStatus("Parent keeping the flock");
1619:
1620: #
1621: # Set up parent signals:
1622: #
1623:
1624: $SIG{INT} = \&Terminate;
1625: $SIG{TERM} = \&Terminate;
1626: $SIG{HUP} = \&Restart;
1627: $SIG{USR1} = \&CheckKids;
1628: $SIG{USR2} = \&UpdateKids; # LonManage update request.
1629:
1630: while(1) {
1631: my $deadchild = wait();
1632: if(exists $ChildHash{$deadchild}) { # need to restart.
1633: my $deadhost = $ChildHash{$deadchild};
1634: delete($HostToPid{$deadhost});
1635: delete($ChildHash{$deadchild});
1636: Log("WARNING","Lost child pid= ".$deadchild.
1637: "Connected to host ".$deadhost);
1638: Log("INFO", "Restarting child procesing ".$deadhost);
1639: CreateChild($deadhost);
1640: }
1641: }
1642:
1643:
1644:
1645: =pod
1646:
1647: =head1 CheckKids
1648:
1649: Since kids do not die as easily in this implementation
1650: as the previous one, there is no need to restart the
1651: dead ones (all dead kids get restarted when they die!!)
1652: The only thing this function does is to pass USR1 to the
1653: kids so that they report their status.
1654:
1655: =cut
1656:
1657: sub CheckKids {
1658: Debug(2, "Checking status of children");
1659: my $docdir = $perlvar{'lonDocRoot'};
1660: my $fh = IO::File->new(">$docdir/lon-status/loncstatus.txt");
1661: my $now=time;
1662: my $local=localtime($now);
1663: print $fh "LONC status $local - parent $$ \n\n";
1664: foreach my $pid (keys %ChildHash) {
1665: Debug(2, "Sending USR1 -> $pid");
1666: kill 'USR1' => $pid; # Tell Child to report status.
1667: sleep 1; # Wait so file doesn't intermix.
1668: }
1669: }
1670:
1671: =pod
1672:
1673: =head1 UpdateKids
1674:
1675: parent's SIGUSR2 handler. This handler:
1676:
1677: =item
1678:
1679: Rereads the hosts file.
1680:
1681: =item
1682:
1683: Kills off (via sigint) children for hosts that have disappeared.
1684:
1685: =item
1686:
1687: QUITs children for hosts that already exist (this just forces a status display
1688: and resets the connection retry count for that host.
1689:
1690: =item
1691:
1692: Starts new children for hosts that have been added to the hosts.tab file since
1693: the start of the master program and maintains them.
1694:
1695: =cut
1696:
1697: sub UpdateKids {
1698:
1699: Log("INFO", "Updating connections via SIGUSR2");
1700:
1701: # Just in case we need to kill our own lonc, we wait a few seconds to
1702: # give it a chance to receive and relay lond's response to the
1703: # re-init command.
1704: #
1705:
1706: sleep(2); # Wait a couple of seconds.
1707:
1708: my %hosts; # Indexed by loncapa hostname, value=ip.
1709:
1710: # Need to re-read the host table:
1711:
1712:
1713: LondConnection::ReadConfig();
1714: my $I = LondConnection::GetHostIterator;
1715: while (! $I->end()) {
1716: my $item = $I->get();
1717: $hosts{$item->[0]} = $item->[4];
1718: $I->next();
1719: }
1720:
1721: # The logic below is written for clarity not for efficiency.
1722: # Since I anticipate that this function is only rarely called, that's
1723: # appropriate. There are certainly ways to combine the loops below,
1724: # and anyone wishing to obscure the logic is welcome to go for it.
1725: # Note that we don't re-direct sigchild. Instead we do what's needed
1726: # to the data structures that keep track of children to ensure that
1727: # when sigchild is honored, no new child is born.
1728: #
1729:
1730: # For each existing child; if it's host doesn't exist, kill the child.
1731:
1732: foreach my $child (keys %ChildHash) {
1733: my $oldhost = $ChildHash{$child};
1734: if (!(exists $hosts{$oldhost})) {
1735: Log("CRITICAL", "Killing child for $oldhost host no longer exists");
1736: delete $ChildHash{$child};
1737: delete $HostToPid{$oldhost};
1738: kill 'QUIT' => $child;
1739: }
1740: }
1741: # For each remaining existing child; if it's host's ip has changed,
1742: # Restart the child on the new IP.
1743:
1744: foreach my $child (keys %ChildHash) {
1745: my $oldhost = $ChildHash{$child};
1746: my $oldip = $HostHash{$oldhost};
1747: if ($hosts{$oldhost} ne $oldip) {
1748:
1749: # kill the old child.
1750:
1751: Log("CRITICAL", "Killing child for $oldhost host ip has changed...");
1752: delete $ChildHash{$child};
1753: delete $HostToPid{$oldhost};
1754: kill 'QUIT' => $child;
1755:
1756: # Do the book-keeping needed to start a new child on the
1757: # new ip.
1758:
1759: $HostHash{$oldhost} = $hosts{$oldhost};
1760: CreateChild($oldhost);
1761: }
1762: }
1763: # Finally, for each new host, not in the host hash, create a
1764: # enter the host and create a new child.
1765: # Force a status display of any existing process.
1766:
1767: foreach my $host (keys %hosts) {
1768: if(!(exists $HostHash{$host})) {
1769: Log("INFO", "New host $host discovered in hosts.tab...");
1770: $HostHash{$host} = $hosts{$host};
1771: CreateChild($host);
1772: } else {
1773: kill 'HUP' => $HostToPid{$host}; # status display.
1774: }
1775: }
1776: }
1777:
1778:
1779: =pod
1780:
1781: =head1 Restart
1782:
1783: Signal handler for HUP... all children are killed and
1784: we self restart. This is an el-cheapo way to re read
1785: the config file.
1786:
1787: =cut
1788:
1789: sub Restart {
1790: &KillThemAll; # First kill all the children.
1791: Log("CRITICAL", "Restarting");
1792: my $execdir = $perlvar{'lonDaemons'};
1793: unlink("$execdir/logs/lonc.pid");
1794: exec("$execdir/loncnew");
1795: }
1796:
1797: =pod
1798:
1799: =head1 KillThemAll
1800:
1801: Signal handler that kills all children by sending them a
1802: SIGHUP. Responds to sigint and sigterm.
1803:
1804: =cut
1805:
1806: sub KillThemAll {
1807: Debug(2, "Kill them all!!");
1808: local($SIG{CHLD}) = 'IGNORE'; # Our children >will< die.
1809: foreach my $pid (keys %ChildHash) {
1810: my $serving = $ChildHash{$pid};
1811: Debug(2, "Killing lonc for $serving pid = $pid");
1812: ShowStatus("Killing lonc for $serving pid = $pid");
1813: Log("CRITICAL", "Killing lonc for $serving pid = $pid");
1814: kill 'QUIT' => $pid;
1815: delete($ChildHash{$pid});
1816: }
1817: my $execdir = $perlvar{'lonDaemons'};
1818: unlink("$execdir/logs/lonc.pid");
1819:
1820: }
1821:
1822: =pod
1823:
1824: =head1 Terminate
1825:
1826: Terminate the system.
1827:
1828: =cut
1829:
1830: sub Terminate {
1831: KillThemAll;
1832: Log("CRITICAL","Master process exiting");
1833: exit 0;
1834:
1835: }
1836: =pod
1837:
1838: =head1 Theory
1839:
1840: The event class is used to build this as a single process with an
1841: event driven model. The following events are handled:
1842:
1843: =item UNIX Socket connection Received
1844:
1845: =item Request data arrives on UNIX data transfer socket.
1846:
1847: =item lond connection becomes writable.
1848:
1849: =item timer fires at 1 second intervals.
1850:
1851: All sockets are run in non-blocking mode. Timeouts managed by the timer
1852: handler prevents hung connections.
1853:
1854: Key data structures:
1855:
1856: =item RequestQueue
1857:
1858: A queue of requests received from UNIX sockets that are
1859: waiting for a chance to be forwarded on a lond connection socket.
1860:
1861: =item ActiveConnections
1862:
1863: A hash of lond connections that have transactions in process that are
1864: available to be timed out.
1865:
1866: =item ActiveTransactions
1867:
1868: A hash indexed by lond connections that contain the client reply
1869: socket for each connection that has an active transaction on it.
1870:
1871: =item IdleConnections
1872:
1873: A hash of lond connections that have no work to do. These connections
1874: can be closed if they are idle for a long enough time.
1875:
1876: =cut
FreeBSD-CVSweb <freebsd-cvsweb@FreeBSD.org>