Wednesday, 31 December 2014

Encoding, encryption and hashing basics with Perl examples

Encoding, encryption and hashing are about data transformation. While encoding and encryption are reversible (given the output and some additional information, the input can be obtained), hashing is a one-way, irreversible change.

                        processing                          purpose
          ------------------------------------------------------------------------------------------------------------
          Encoding:     data + algorithm                 => for transfer and consumption purposes
          Encryption:   data + algorithm + secret key    => for secure transfer
          Hashing:      data + hash function             => for data integrity, for sender identification, fast search for a data record, building caches

 

Encoding


Encoding uses a publicly available algorithm and a reverse process can be used to get at the original data. Its purpose is to provide a format that allows data to be transferred and consumed by a different system. It does not solve security issues. Examples are base64, unicode, URL/HTML encoding etc.

Example:

A)

my $ip         = '192.168.2.100';
my $ip_encoded = unpack('N', pack('C4', split(/\D/, $ip, 4))); 

(pack creates a string concatenated from 4 unsigned chars/octets (in machine form) of the IP address. unpack transforms this string into an unsigned long (32 bit) in "network" (big endian) order.)
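Since encoding is reversible, the same pack/unpack pair can be run backwards to recover the dotted address. The sketch below is a minimal illustration; the helper names ip_to_int and int_to_ip are made up for this example and assume a well-formed IPv4 address.

```perl
use strict;
use warnings;

# Encode a dotted-quad IPv4 address as a 32-bit unsigned integer
# (helper name is hypothetical, input is assumed to be well formed)
sub ip_to_int {
    my ($ip) = @_;
    return unpack('N', pack('C4', split(/\./, $ip)));
}

# Decode the integer back into dotted-quad notation
sub int_to_ip {
    my ($number) = @_;
    return join('.', unpack('C4', pack('N', $number)));
}

my $encoded = ip_to_int('192.168.2.100');
my $decoded = int_to_ip($encoded);

print "$encoded\n";   # 3232236132
print "$decoded\n";   # 192.168.2.100
```

No key is involved: anybody who knows the algorithm can reverse the transformation, which is why encoding does not solve security issues.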

B)
Base64 encoding (radix-64 representation) is widely used to represent binary data in text format, used for data transport.
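A minimal round trip, using only the core MIME::Base64 module, may help to see what the encoding does to binary data. The sample data (all 256 byte values) is just an assumption chosen for illustration.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Sample binary data: every possible byte value once
my $binary  = join '', map { chr } 0 .. 255;

# The empty second argument suppresses the line breaks that
# encode_base64 otherwise inserts after every 76 characters
my $encoded = encode_base64($binary, '');
my $decoded = decode_base64($encoded);

# Every 3 input bytes become 4 output characters
print length($binary), " bytes -> ", length($encoded), " characters\n";
print "round trip ok\n" if $decoded eq $binary;

print encode_base64('Hello', ''), "\n";   # SGVsbG8=
```

The encoded form is roughly a third larger than the input, which is the price paid for a text-only representation.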

CGI script:

use strict;
use warnings;

use CGI;
use JSON;
use MIME::Base64;

my $cgi = CGI->new();

my $image_file = "/var/www/images/sunflower.jpg";

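# Read the whole file in binary mode; $/ is only changed locally
open my $fhandle, '<:raw', $image_file
    or die "Cannot open $image_file: $!";
my $image = do { local $/; <$fhandle> };
close $fhandle;

# Empty second argument: no line breaks in the encoded output
my $image_encoded = encode_base64($image, '');
my $response = to_json({ photo => $image_encoded });

print $cgi->header(-type => 'application/json');
print $response;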
 
Then on the web page: 

<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <title>Encode64 Example</title>
    <script src="jquery-1.2.1.min.js" type="text/javascript" charset="utf-8"></script>
  </head>
  <body>
    <p>Encode64 Example</p>
    <script>
       $(document).ready(function () {

            $.getJSON('/test_my_app',  function(data) {
               var photo = data.photo;      // contains the encoded contents of the image 

               var elems  = [];

               elems.push('<table>');
               elems.push('<tr>');

               elems.push('<td id="photo">' +  
                          '<img src="data:image/jpeg;base64,' + photo + '" />' + 
                          '</td>');

               elems.push('</tr>'); 
               elems.push('</table>');

               $('body').append(elems.join(''));
           });

       });
    </script>
  </body>
</html>  

 

Encryption


While an encrypted message can be decrypted to recover the original plain text, decryption can only be carried out by those who know the key that was used, together with a particular algorithm, during the encryption. Encrypted data can therefore be transferred securely to the desired recipient.

Example


use strict;
use warnings;
use utf8;
use v5.018;
use Crypt::CBC;

my $plaintext  = "World's biggest secret";
my $cipher     = Crypt::CBC->new( -key    => 'my_secret_key',
                                  -cipher => 'Blowfish',
                               );  

my $ciphertext = $cipher->encrypt($plaintext);
$plaintext     = $cipher->decrypt($ciphertext);

say "$plaintext";

Hashing


Hashing transforms an arbitrary string of characters into a fixed-length string through the use of a hash function. The resulting hash value/hash sum/hash/message digest should always be the same for the same input, and it should be practically impossible, ie computationally infeasible, to invert/find the original value from the hash. Because the output has a fixed length, two different inputs can in principle share a hash (a collision), but with a good hashing function it should be computationally infeasible to find such inputs. Hashing is used for checking data integrity, sender's identity etc.

Perl modules:
               Digest::SHA
               Crypt::PBKDF2 etc

Examples


use Digest::SHA qw(sha256_hex);
my $data = "this is top secret";

my $digest = sha256_hex($data);
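The properties described above (fixed length, same output for the same input, use for integrity and sender identification) can be checked with a few lines using the core Digest::SHA module. This is only a sketch; the shared key and the verify helper are hypothetical names introduced for the example.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex hmac_sha256_hex);

my $message  = "this is top secret";
my $tampered = "this is top secreT";

# Same input, same digest; the length is fixed regardless of the input size
my $digest = sha256_hex($message);
print length($digest), "\n";                       # 64 hex characters
print length(sha256_hex($message x 1000)), "\n";   # still 64

# Data integrity: any change in the input changes the digest
print "message was changed\n" if sha256_hex($tampered) ne $digest;

# Sender identification: a keyed digest (HMAC) can only be produced
# by somebody who knows the shared key (hypothetical value)
my $shared_key = 'key known to sender and receiver';
my $tag        = hmac_sha256_hex($message, $shared_key);

# Hypothetical helper: recompute the tag and compare
sub verify {
    my ($received_message, $received_tag, $key) = @_;
    return hmac_sha256_hex($received_message, $key) eq $received_tag;
}

print "sender verified\n" if verify($message, $tag, $shared_key);
```

For storing passwords, a deliberately slow, salted function such as the PBKDF2 example in this post is the better fit; a plain SHA digest is meant for integrity checks.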


use Crypt::PBKDF2;

my $pbkdf2 = Crypt::PBKDF2->new(
                  hash_class => 'HMACSHA1', # default, other functions available
                  iterations => 120000,    
                  output_len => 25,        
                  salt_len   => 4,           
);

my $hash = $pbkdf2->generate("my_biggest_secret");
# store in database

# later on, somebody logs in
# retrieve the person's hash from the database
# check the password
if ($pbkdf2->validate($hash, $password)) { .... }

PSGI and Plack Basics

Background

Web applications run within the context of a web server. A web client (browser, web crawler, a script making API calls etc) sends a request to a web server (Apache, Nginx, Starman etc). The server either maps the request to a file system resource (for static pages) or, in the case of dynamic pages, dispatches it, according to the server configuration, to the relevant web application.

The purpose of the PSGI specification is to help create portable web applications: applications that are independent of the web server environment they run in and can therefore be run with ease under different web servers.

A PSGI compatible web application is a code reference that accepts input and provides output in a prescribed format.

PSGI Middleware is a PSGI application that can run another PSGI (web) application, preprocessing the HTTP request before it reaches the application and/or postprocessing the HTTP response the application returns. It is a wrapper around the web application/framework and it sits between the web server and the application/framework. From the server perspective, the middleware component is a web application and from the web application perspective, it is a web server.
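The wrapping idea can be sketched with plain code references, without any server or Plack module. The function name wrap_with_header and the X-Wrapped-By header are made up for this illustration, and the sketch only handles responses returned as an array reference.

```perl
use strict;
use warnings;

# A minimal PSGI application: a code reference taking the environment
# hashref and returning [ status, headers, body ]
my $app = sub {
    my ($env) = @_;
    return [
        200,
        [ 'Content-Type' => 'text/plain' ],
        [ "Hello from $env->{PATH_INFO}" ],
    ];
};

# Middleware (hypothetical name): takes an application and returns
# a new application that wraps it
sub wrap_with_header {
    my ($inner_app) = @_;
    return sub {
        my ($env) = @_;

        # Preprocessing of the request: default an empty path to '/'
        $env->{PATH_INFO} = '/'
            unless defined($env->{PATH_INFO}) && length($env->{PATH_INFO});

        my $response = $inner_app->($env);

        # Postprocessing of the response: add a header
        push @{ $response->[1] }, 'X-Wrapped-By' => 'example-middleware';
        return $response;
    };
}

# To the caller, the wrapped application looks like any other PSGI app
my $wrapped  = wrap_with_header($app);
my $response = $wrapped->({ REQUEST_METHOD => 'GET', PATH_INFO => '' });

print "$response->[0] @{ $response->[2] }\n";
```

Real middleware is usually written on top of Plack::Middleware, which takes care of the details omitted here.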

PSGI Specification

A PSGI application accepts input in the form of a hash reference with CGI-like header information and psgi.xxx/psgix.yyy keys. The prescribed format of its output is one of:
  1. array reference containing three elements:
    1. HTTP status
    2. arrayref with HTTP headers
    3. response body:
      1. arrayref with the response content
      2. a handle (Perl built-in filehandle or an IO::Handle-like object) containing the response body as byte strings 
  2. code reference for a delayed/streaming response

    Example of a hashref input providing information about the environment:

    $env = {
              'SERVER_PROTOCOL' => 'HTTP/1.1',
              'HTTP_ACCEPT_LANGUAGE' => 'en-US,en;q=0.5',
              'psgi.input' => \*{'HTTP::Server::PSGI::$input'},
              'psgi.errors' => *::STDERR,
              'psgix.io' => bless( \*Symbol::GEN1, 'IO::Socket::INET' ),
              'HTTP_ACCEPT' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
              'psgi.version' => [
                                  1,
                                  1
                                ],
              'HTTP_USER_AGENT' => 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:34.0) Gecko/20100101 Firefox/34.0',
              'REMOTE_PORT' => 45541,
              'SCRIPT_NAME' => '',
              'REQUEST_URI' => '/',
              'psgix.input.buffered' => 1,
              'PATH_INFO' => '/',
              'psgi.run_once' => '',
              'REQUEST_METHOD' => 'GET',
              'HTTP_ACCEPT_ENCODING' => 'gzip, deflate',
              'psgi.url_scheme' => 'http',
              'psgi.multithread' => '',
              'psgi.streaming' => 1,
              'REMOTE_ADDR' => '127.0.0.1',
              'psgi.multiprocess' => '',
              'QUERY_STRING' => '',
              'SERVER_PORT' => 5000,
              'SERVER_NAME' => 'localhost',
              'HTTP_HOST' => '127.0.0.1:5000',
              'psgi.nonblocking' => '',
              'HTTP_CONNECTION' => 'keep-alive',
              'psgix.harakiri' => 1
            };


    Example of output with an arrayref body:

    $response = [
         200,
         ['Content-Type' => 'text/html'],
         ['<h2>Hello World</h2>', 'How are you today?' ]
    ]

    Example of output with a handle body:

    $body = IO::File->new($file, "r");          # a file handle
    # or
    $body = MyClass->new;                       # must implement getline() and close()

    $response = [
         200,
         ['Content-Type' => 'text/plain'],
        $body,
    ]
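The delayed/streaming form of the output, listed in the specification above, can be sketched as follows. The application returns a code reference; the server calls it with a responder callback, and calling the responder with only status and headers yields a writer object with write() and close() methods. The FakeWriter package is a stand-in invented for this example so the code runs without a server.

```perl
use strict;
use warnings;

# PSGI application returning a code reference instead of an arrayref
my $app = sub {
    my ($env) = @_;
    return sub {
        my ($responder) = @_;

        # Only status and headers are passed, so a writer is returned
        my $writer = $responder->([ 200, [ 'Content-Type' => 'text/plain' ] ]);
        $writer->write("chunk $_\n") for 1 .. 3;
        $writer->close;
    };
};

# Hypothetical stand-in for the server side: collects chunks in a buffer
{
    package FakeWriter;
    sub new   { my ($class, $buffer) = @_; return bless { buffer => $buffer }, $class }
    sub write { my ($self, $chunk) = @_; ${ $self->{buffer} } .= $chunk; return }
    sub close { return }
}

# What a server supporting psgi.streaming does, in a very reduced form
my $output   = '';
my $callback = $app->({ 'psgi.streaming' => 1 });
$callback->(sub { my ($response) = @_; return FakeWriter->new(\$output) });

print $output;
```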

    Plack

    Plack is a Perl module, a toolkit for using the PSGI stack: middleware, helpers and adapters to web servers. Standard adapters included in the module are for CGI, FCGI, Apache 1 and 2 and HTTP::Server::Simple. Others can be found on CPAN. Plack comes with its own standalone server HTTP::Server::PSGI.

    Utilities provided by Plack are aimed at web server, middleware and web framework authors. While it is possible to write PSGI web applications, the recommended way is to build web applications on top of web frameworks. Catalyst, Dancer, Mojolicious are some of the web frameworks supporting PSGI. PSGI enabled web frameworks have an adaptor/engine (middleware component) conforming to the PSGI specification.

    Examples

    An example of running a PSGI application in three different ways:

     

    1. running the application using the built-in Plack server
    2. running the application using the Apache server
    3. running the application using the built-in Plack server with Apache as a proxy server 

    Simple PSGI application


    #!/usr/bin/perl
    use strict;
    use warnings;
    
    use JSON qw(to_json);
    
    my $data = {
            data => {
                    message1 => 'Hello world',
                    message2 => 'Hello family',
                    message3 => 'Hello son',
            },
    };
    
    my $serialized = to_json($data);
    
    my $app = sub {
      my ($env) = @_;              # PSGI input, not used in this simple app
    
      return [                     # PSGI output
        '200',
        [ 'Content-Type' => 'application/json' ],
        [ $serialized ],
      ];
    };
    

     

    More complex example


    #!/usr/bin/perl
    
    =head2 plack_app1.pl
    
    run: plackup -o localhost -p 8000 plack_app1.pl
    
    =cut
    
    use strict;
    use warnings;
    use v5.018;
    use utf8;
    
    #use Data::Dumper qw(Dumper);
    #$Data::Dumper::Sortkeys = 1;
     
    use Plack::Request;
    use CHI;
    
    =head3 Cache messages for later display together with the latest message
    
    messages are cached on the file system, so will survive the server restart
    the storage is limited to 4kb
    
    =cut
    
    my $cache = CHI->new( driver => 'FastMmap',
        root_dir   => '/tmp/plack_app1_cache',
        cache_size => '4k');
    my $cache_key = "${0}_message";
    $cache->remove($cache_key);
     
    =head3 Plackup subroutine reference
    
    input:  hashref containing request information
    output: arrayref of the prescribed format
    
    =cut
    
    my $app = sub {
        my $env = shift;
    
        # Create the HTML form where a message can be input
        my $html    = _create_form();
    
        # Wraps the $env in a Plack request object 
        my $request = Plack::Request->new($env);
         
        # Recover old messages for display, store them all
        my $old_messages = $cache->get($cache_key) || '';
    
        # The request is a sent form with filled in message field
        if ($request->param('message')) {
    
            my $current_message = $request->param('message');
    
            # Method calls are not interpolated inside strings, so use the variable
            say "[$current_message]";
            my $message = $old_messages . "<br />$current_message";
            $message = (length $message > 2000) ? $current_message : $message;
            $cache->set($cache_key, $message);
    
            # Append the messages to the form html
            $message = $old_messages . "<br /><b>$current_message</b>";
            $html  .= "You told me:<br />$message";
        }
        elsif ($old_messages) {
            $html  .= "Old messages: <br />$old_messages";
        }
     
        return [
                '200',
                [ 'Content-Type' => 'text/html' ],
                [ $html ],
            ];
    };
    
    =head2 SUBROUTINES
    
    =head3 _create_form
    
    =cut
     
    sub _create_form {
        return q{
        <hr>
        <form>
            <input name="message">
            <input type="submit" value="Tell me">
        </form>
        <hr>
        }
    }
    
    Inspired by Dave Cross's Plack application example

    A) Built-in web server


    plackup  /path/to/my/my_app.psgi

    B) Apache 2 - using directly as a web server


    In /etc/apache2/sites-enabled/psgi_apps.conf:

    <VirtualHost *:4000>
            LogLevel debug
            ErrorLog ${APACHE_LOG_DIR}/error_psgi.log
            CustomLog ${APACHE_LOG_DIR}/access_psgi.log combined
    
            PerlOptions +Parent        
            
            <Location /test_my_app>          
              SetHandler perl-script
              # PSGI adaptor for Apache (comments must be on their own line)
              PerlResponseHandler Plack::Handler::Apache2
              PerlSetVar psgi_app  /var/www/apps/my_app.psgi
            </Location>
    
    </VirtualHost>
    

    C) Apache 2 - using as a proxy for the  built-in Plack web server


    The application runs locally on port 5000, on which the built-in Plack web server is listening (http://localhost:5000/). Locally or externally (if configured) the application can be accessed through:
    • http://some_hostname/test_my_app
    • http://localhost:4000/test_my_app

    a) cd /var/www; plackup  --host localhost /my_app.psgi 
    HTTP::Server::PSGI: Accepting connections at http://localhost:5000/

    b) Apache configuration - In /etc/apache2/sites-enabled/psgi_apps.conf:

    <VirtualHost *:4000>
            LogLevel debug
            ErrorLog ${APACHE_LOG_DIR}/error_psgi.log
            CustomLog ${APACHE_LOG_DIR}/access_psgi.log combined
    
            PerlOptions +Parent
               
            ## Proxying an application
            ##------------------------ 
            ProxyPass /test_my_app http://localhost:5000/
            ProxyPassReverse /test_my_app http://localhost:5000/
            
            <Location /test_my_app>
              # Apache 2.4:
              Require all granted
              # or, for Apache 2.2, use these two lines instead:
              # Order allow,deny
              # Allow from all
            </Location>
    
    </VirtualHost>
    


     

    Note


    For the proxying to work, several prerequisites need to be satisfied:
    1. Apache proxy and proxy_http modules must be loaded 
      1. on Ubuntu:  a2enmod proxy proxy_http
    2. Uncomment the <proxy *> block in mods-enabled/proxy.conf
    3. Create a new site configuration sites-available/psgi_apps.conf and enable the configuration (on Ubuntu: a2ensite psgi_apps)
    4. In sites-available/psgi_apps.conf: allow access to the proxied location if the proxy configuration by default disallows access


    Sunday, 14 December 2014

    Data Journey - HTTP, TCP, IP Protocol basics

    Background

    All the data sent here and there on the Internet - how does it work? The basis is synchronization of data transfer through an agreed upon procedure, ie adhering to a protocol of communication. There are different types of protocols. Machines/hosts/nodes communicate using lower level protocols, applications running on machines communicate using higher level protocols.

    Communication happens between endpoints/sockets. Endpoints are entry points to a connection/process/service. Protocols are agreed upon rules describing the format of the communicated information and the procedures that need to be followed.

    Communication on the Internet depends on the Internet Protocol suite, a set of communication protocols making it possible to send bytes/octets between two networked computers, even if they are miles apart on different networks. Its alias is TCP/IP because the TCP (Transmission Control Protocol) and IP (Internet Protocol) protocols were formulated first.

    How it works

    We want to send a message from a browser on our local machine to a web application running on a remote server, something we wrote in a form on a web page.

    The remote web server is listening for HTTP requests on a socket described by its local IP address and a particular port. It directs requests for a dynamic resource to a web application. The client sending the request also creates a socket (connected to the web server IP address + the port on which the server is listening), through which it can now communicate with the web server. After establishing the physical connection, a several-step handshake follows before data can be sent/received.

    Browser and web server are applications communicating using the HTTP protocol. That way they know in what format they want to receive the data, whether they can deal with compression, whether it is possible to use a cached resource/page or it needs to be retrieved, etc. The browser will send the actual information we want to be sent, along with mandatory and optional HTTP headers.

    How does the data travel to its destination? Thanks to the TCP, the sending application, browser in our case, does not need to worry about bytes and octets, but can send the whole message in one go, and let TCP tackle the problem.

    TCP provides connection oriented, ie reliable transfer with error checking. It guarantees delivery but is not necessarily timely. It controls the data flow to avoid overwhelming the receiver, and network congestion, a situation when no or little data transfer is happening. When transfer reliability is not crucial, reduced latency (transfer delay) can be achieved with UDP, the connectionless User Datagram Protocol. While using TCP is important in e-commerce, for instance, UDP is used when streaming films, VOIP etc.

    The message is divided into small pieces, each a sequence of octets/bytes, and each piece is then encapsulated with additional data (in a header/footer). The encapsulation - headers + payload - is called a packet or a datagram (a segment at the TCP level), and is a basic transfer unit. The headers (they gradually accumulate, as the protocol in each layer of the Internet Protocol suite adds its own) contain, in the end, all the information needed to get the data across from one endpoint to the other. The TCP header holds the information needed for reassembly of the message from individual packets (local and remote ports, sequence number etc).
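The idea of splitting a message, labelling each piece and putting the pieces back together can be simulated in a few lines. This is a toy model, not the real TCP header layout: the 6-byte header (sequence number + payload length), the payload size of 8 bytes and the function names are all assumptions made for the illustration.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my $PAYLOAD_SIZE = 8;   # bytes per piece, artificially small

# Split the message and prefix each piece with a toy header:
# 32-bit sequence number + 16-bit payload length
sub packetize {
    my ($message) = @_;
    my @packets;
    my $sequence = 0;
    for (my $offset = 0; $offset < length($message); $offset += $PAYLOAD_SIZE) {
        my $payload = substr($message, $offset, $PAYLOAD_SIZE);
        push @packets, pack('N n a*', $sequence++, length($payload), $payload);
    }
    return @packets;
}

# Read the headers and join the payloads in sequence number order
sub reassemble {
    my (@packets) = @_;
    my %payload_for;
    for my $packet (@packets) {
        my ($sequence, $length, $payload) = unpack('N n a*', $packet);
        $payload_for{$sequence} = substr($payload, 0, $length);
    }
    return join '', map { $payload_for{$_} } sort { $a <=> $b } keys %payload_for;
}

my $message  = 'Hello from the browser, this is the form content.';
my @packets  = shuffle(packetize($message));   # simulate out-of-order arrival
my $received = reassemble(@packets);

print scalar(@packets), " packets\n";
print "$received\n";
```

The receiving side does not care in which order the pieces arrive, because the sequence numbers in the headers are enough to restore the original order.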


     

    http://books.msspace.net/mirrorbooks/snortids/0596006616/snortids-CHP-2-SECT-2.html

    TCP operations have three phases. The first establishes a reliable connection using a multi-step (three-way) handshake. A TCP connection is managed by the operating system through the socket API (application programming interface; see Inter-process Communication). After that, the data transfer phase happens, followed by closure of the connection and release of resources.
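The three phases can be observed from Perl with the core IO::Socket::INET module. The sketch below keeps both ends in one process on the loopback interface, which works because the operating system completes the handshake for a listening socket before accept() is called. The "ping"/"pong" messages are made up, and the single sysread per message is a simplification that is only reasonable for such tiny messages.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Phase 1: connection establishment.
# Port 0 asks the operating system for any free port.
my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Proto     => 'tcp',
    Listen    => 1,
) or die "Cannot listen: $!";

my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $server->sockport,
    Proto    => 'tcp',
) or die "Cannot connect: $!";

my $connection = $server->accept or die "Cannot accept: $!";

# Phase 2: data transfer, in both directions
my ($request, $reply) = ('', '');
$client->syswrite("ping\n");
$connection->sysread($request, 1024);
$connection->syswrite("pong\n");
$client->sysread($reply, 1024);

print "server received: $request";
print "client received: $reply";

# Phase 3: closing the connection and releasing resources
$client->close;
$connection->close;
$server->close;
```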

    IP protocol  deals with the actual packet transfer across different network boundaries, ie with routing. It prescribes the format of its associated header containing IP addresses of the local and remote hosts and other routing data.

    When the collection of packets representing our message arrives at the destination endpoint, the packets are reassembled to form the original data, according to the meta data in the TCP/IP/HTTP headers. The destination application receives the whole message instead of a bundle of little payloads. Impressive!

    How to connect to a LAN machine from the Internet

    Our gate to the Internet is a router, a networking device to which our local machines connect, either through Ethernet (cable) or WiFi (wireless). The router is connected to a modem that is plugged into the port in the wall. After registering with an ISP/Internet Service Provider, we get a public IP address, ie one visible from the Internet, which will identify us when we make requests to the Internet and through which responses/requests from outside our LAN/Local Area Network reach us. There are two types of router IP addresses: static and dynamic. Dynamic IP addresses, the most common, can change each time the router is restarted. Any configuration we make using the router IP address needs to be updated when the address changes.

    Our local machines, connected to the router, form our Local Area Network/LAN. Each machine is identified by a LAN IP address, which other local devices are aware of and use to communicate with each other. These IP addresses are usually something like 192.168.x.y or 10.x.y.z .
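Whether an address is a LAN (private) address can be checked against the ranges reserved for private networks in RFC 1918: 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16. The sketch below is a minimal illustration for well-formed IPv4 addresses; the function names are made up for this example.

```perl
use strict;
use warnings;

# Dotted-quad IPv4 address as a 32-bit unsigned integer
sub ip_to_int {
    my ($ip) = @_;
    return unpack('N', pack('C4', split(/\./, $ip)));
}

# Private ranges from RFC 1918 as [ network, prefix length ]
my @PRIVATE_RANGES = (
    [ '10.0.0.0',    8  ],
    [ '172.16.0.0',  12 ],
    [ '192.168.0.0', 16 ],
);

# Hypothetical helper: true if the address falls into a private range
sub is_private_ip {
    my ($ip) = @_;
    my $address = ip_to_int($ip);
    for my $range (@PRIVATE_RANGES) {
        my ($network, $prefix_length) = @$range;
        # Keep only the leading $prefix_length bits of the address
        my $mask = (0xFFFFFFFF << (32 - $prefix_length)) & 0xFFFFFFFF;
        return 1 if ($address & $mask) == ip_to_int($network);
    }
    return 0;
}

for my $ip ('192.168.2.100', '10.1.2.3', '193.168.1.1', '8.8.8.8') {
    printf "%-15s %s\n", $ip, is_private_ip($ip) ? 'private (LAN)' : 'public';
}
```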

    Linux:

    tamara@fuego:~$ ifconfig
    eth0      Link encap:Ethernet  HWaddr  xxxxxxxxxxxxxxxx
    ......

    lo        Link encap:Local Loopback 
              inet addr:127.0.0.1  Mask:255.0.0.0
    ......

    wlan0     Link encap:Ethernet  HWaddr xxxxxxxxxxxxxxxx 
              inet addr:192.168.x.y  Bcast:192.168.x.z  Mask:255.255.z.w
              inet6 addr: yyyyyyyyyyyyyyyy Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:1453501 errors:0 dropped:0 overruns:0 frame:0
              TX packets:913979 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:1266612092 (1.2 GB)  TX bytes:186022548 (186.0 MB)


    LAN devices are hidden behind the router and cannot be reached through their LAN IP addresses from the outside. A firewall is generally in action for protection. Sometimes, however, we need to reach a LAN server from the outside, to allow access to a locally running application (eg TCP/HTTP). This is achieved through Port Forwarding, making a particular port on a particular LAN machine (associated with a particular local application) reachable from the outside.

    Port Forwarding configuration can be done through the Router web admin page. Mine is 192.168.1.254. For BT Home Hub 2 router, the breadcrumb to the Port Forwarding configuration is:

              Settings -> Advanced settings -> Port Forwarding

    On the Port Forwarding page, there are several sections:
    Selecting one of the supported applications will open a port, which is a default for that application. Default ports are 80 for an HTTP server, 443 for an HTTPS server, 22 for ssh etc. Configuration involves choosing an application and associating it with the hostname, attached to the machine we want to set up the Port Forwarding for.

    To set up Port Forwarding for a local application running on a particular (non-default) port,  it is necessary to turn the UPnP to off. Doing that will add a new option under the Supported Applications tab. Through this option it is possible to add a local application, running on a specific port, to the Supported applications. Then one can set the Port Forwarding (under the Configuration tab) for the application. To access a Mojolicious application running under a development server on port 3000:

          http://router_IP_address:3000/mojo_test/

    Tuesday, 2 December 2014

    Controlling bubble machine with Arduino

    Required parts

    Arduino (UNO R3 used here)
    Bubble machine
    Relay board (I used one with two modules, but one module is enough)


    LED
    Resistor 1kΩ

    Optional:
    • prototyping shield
    • mini breadboard
        or 
    • normal breadboard

    These two parts are not required; one can use a normal breadboard, if preferred. The picture below is not in proportion to the mini breadboard above.


    Disassembling the Bubble machine

    The Arduino sketch is dead simple. What I found trickier was dealing with the hardware and connecting everything together correctly, though once I understood how it all works, that is fairly simple too.
    1. Unscrew the back of the machine, to get to the wires. 
    2. The motor is connected to:
      1. the battery combo (negative charge)
      2. the switch
    3. The battery combo is connected to:
      1. the motor
      2. the switch (positive charge)
    4. Disconnect both wires from the bubble machine switch

    Hardware connections

    Connecting the Bubble machine to the Relay module



    The Bubble machine wires will be connected to the relay board on the left side, Arduino on the right.
    On the left, there are 3 connections (from the top):
    • NO     ... Normally open (circuit)
    • COM  ... Common (power)
    • NC     ... Normally closed (circuit)
    If we connect the positively/+ charged battery wire to NC and the motor wire to COM, the motor will start straight away, because the NC-COM circuit is closed by default.
    Instead, we need to connect the battery wire to NO and the motor wire to COM, so that the motor only runs when the relay is switched.

    Connecting the Relay module to Arduino

     

    On the right side of the Relay board, there are several connections:
    • GND ... Ground
    • VCC ... Power
    • IN1, IN2 etc ... Relay board control (input) pins (each pin corresponds to one relay module)
    The Relay GND will be connected to the Arduino GND, VCC needs to be connected to the Arduino 5V pin and one of the Relay control pins will get attached to whichever Arduino digital pin we like. This pin will be used to control the Bubble machine.

    NOTE

     

    Please note that the setting:
     
          digitalWrite(controlPin, HIGH);

    will NOT start the engine, because the default mode of the NO-COM connection is an open circuit. Rather counter-intuitively, we can start the engine with:

          digitalWrite(controlPin, LOW);

    which closes the circuit by switching from the default, ie "not connected"/"open", setting to the non-default/"closed" one.

    The Arduino Sketch

    /*
       Control the bubble machine (powered by batteries)
       bubble machine is switched on at start, then off and on again
       a LED switches on when the machine is off and vice versa
    */
    int controlPin = 7;

    void setup() {
      pinMode(controlPin, OUTPUT);
      digitalWrite(controlPin, LOW);    // turn the bubbles on, the LED is off
    }

    // the loop routine runs over and over again forever:
    void loop() {
      digitalWrite(controlPin, HIGH);   // turn the LED on and bubbles off
      delay(6000);                      // wait for 6 seconds
      digitalWrite(controlPin, LOW);    // turn the LED off and bubbles on
      delay(6000);                      // wait for 6 seconds
    }
     
    The effect is that the Bubble machine, as soon as we power up the Arduino, will start and go on for 6 seconds. Then it will stop and the LED will light up for 6 seconds. Then the machine will be powered up again and the LED will be switched off for 6 seconds and so on and so on ...

    Video of the inside of the bubble machine:


    Bubbling away (with some extra LEDs for a bit more fun):

    
    

    Perl Unit Testing problem - Tests out of sequence

    Problem

     

    /opt/perl/bin/prove -v -I/opt t/Basket_delivery_estimates.t
    t/Basket_GetDeliveryEstimates.t 


    ok 1 - Pick up the basket ....
    [2014-12-12 14:33:52] [christmas_promise_01] [1][2014-12-03 14:33:52] [2014-12-04 14:33:52] [2014-12-03 14:33:52] [2014-12-04 14:33:52]ok 2 - Achievable delivery deadline Shopping.Basket.GetProcessingTimes
    ok 3 - no warnings
    1..3


    Test Summary Report
    -------------------
    t/Basket_delivery_estimates.t (Wstat: 0 Tests: 2 Failed: 0)
    Parse errors: Tests out of sequence.  Found (3) but expected (2)
                    Bad plan.  You planned 3 tests but ran 2.
    Files=1, Tests=2, 17 wallclock secs ( 0.04 usr  0.01 sys +  6.42 cusr  0.67 csys =  7.14 CPU)
    Result: FAIL


    Solution

     

    The unit test file contains:

           use Test::More qw(no_plan);

    so the number of tests will be calculated dynamically as the tests are run. No problem here.
    Looking at the output we can see three tests are reported: ok1, ok2 and ok3. However, the final conclusion is:

          Bad plan.  You planned 3 tests but ran 2

    Why, oh why?
    The reason is that the TAP parser, processing the test results, actually did not "see" the ok 2. TAP, as a protocol, expects the output of the tests in a prescribed format, so it can be parsed to produce test stats. My mistake was that I did not put "\n" at the end, or use say, when printing out some debug data. This caused the output ok 2 NOT to be at the beginning of the line, where the TAP parser expected it, hence the missing result of one test. The plan (1..3) told the parser that three tests were run, but it could see the output of only two of them, hence the discrepancy. Making sure all test results (ok ....) appeared at the beginning of the line fixed the problem.
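The effect can be reproduced without running any tests, by counting the lines a TAP consumer would recognise as results. The function below is a deliberately simplified stand-in for a real TAP parser (it only looks for "ok"/"not ok" at the start of a line), and the sample output is shortened from the one above.

```perl
use strict;
use warnings;

# Simplified stand-in for a TAP parser: count lines starting with
# "ok" or "not ok"
sub count_result_lines {
    my ($tap_output) = @_;
    my @results = $tap_output =~ /^(?:not )?ok\b/mg;
    return scalar @results;
}

my $debug = '[2014-12-12 14:33:52] [christmas_promise_01]';

# Debug data printed without a trailing newline: "ok 2" ends up
# in the middle of a line
my $broken = "ok 1 - Pick up the basket\n"
           . $debug . "ok 2 - Achievable delivery deadline\n"
           . "ok 3 - no warnings\n"
           . "1..3\n";

# Debug data printed as a comment line of its own
my $fixed  = "ok 1 - Pick up the basket\n"
           . "# $debug\n"
           . "ok 2 - Achievable delivery deadline\n"
           . "ok 3 - no warnings\n"
           . "1..3\n";

print "broken output: ", count_result_lines($broken), " results seen\n";   # 2
print "fixed output:  ", count_result_lines($fixed),  " results seen\n";   # 3
```

In a test file, printing debug data through the diag() or note() functions of Test::More avoids the problem, as they emit the text as "# ..." comment lines.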

    Saturday, 29 November 2014

    Inter-process Communication for the Half-Initiated (Perl biased)

    Communication is, ultimately, about sending and receiving messages. People communicate using messages, so do devices/hardware and applications/software.

    An application is a dead beetle unless it's alive when running. A process is a running instance of an application. (For our purposes, I shall be using the words application and process interchangeably.)  For processes to be able to communicate, there needs to be a point through which processes can reach to one another. Similar to an open door through which somebody can come and ask/inform/demand something.

    This endpoint is called a socket, which is bound to (associated with) a port (a software construct, whose purpose is to uniquely identify different applications running on a single computer). The Operating System provides an API (Application Programming interface) - an established way, processes can use to manipulate the sockets, ie use them for communication.

    There are two types of sockets: one is used for local inter-process communication within an Operating System/a single machine (Unix domain socket); the other is the Network socket, used for communication between applications distributed over a network. Network sockets are of particular interest to us. They are uniquely described by:
    1. a combination of local IP address and a port number
    2. Protocol (TCP (reliable, connection oriented)/UDP (unreliable, connectionless, faster))
    3. remote IP address and remote port number (only for established TCP sockets: one local TCP socket can be used by multiple remote TCP clients, each with its own IP address and port)
    Processes can communicate with each other in different ways. One decisive factor is whether the two processes that need to talk or share data run locally, ie on the same machine, or whether they are remote, ie distributed over different machines.
     
    Processes running on the same machine can communicate through:
    • files
    • pipes
    • signals
    • locally shared memory
    • local database
    in addition to the methods used by processes running on distributed machines:
    • remote database
    • memcachedb
    • memcached
    • web services
    • memory queues
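As a sketch of one of the local mechanisms listed above, a pipe can connect a parent process with a child process created by fork. The example assumes a Unix-like system; the message text is made up.

```perl
use strict;
use warnings;
use POSIX ();

# A pipe has two ends: what is written to $writer can be read from $reader
pipe(my $reader, my $writer) or die "Cannot create pipe: $!";

my $pid = fork();
die "Cannot fork: $!" unless defined $pid;

if ($pid == 0) {
    # Child process: sends a message and exits
    close $reader;
    print {$writer} "hello from child $$\n";
    close $writer;          # flushes the buffered message
    POSIX::_exit(0);        # leave without running the parent's cleanup code
}

# Parent process: reads the message sent by the child
close $writer;
my $message = <$reader>;
close $reader;
waitpid($pid, 0);

print "parent received: $message";
```

Each process closes the end of the pipe it does not use; otherwise the reader would never see the end of the data.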

    File

    use CHI;
    my $cache = CHI->new( driver   => 'BerkeleyDB',
                          root_dir => '/path/to/cache',
    );

     

    Memory based 


    use CHI;
    my $cache = CHI->new(
        driver => 'SharedMem',
        size   => 10 * 1024,
        shmkey => 12344321,          # Integer System V IPC key. All caches
                                     # (in different processes) created with
                                     # this shmkey will be shared, so pick a
                                     # value that no other application uses
    );  

    Database

    Multiple processes access the database to perform create/read/update/delete (CRUD) operations. Database engines provide table or row locking to prevent inconsistencies and corruption of data.

     

    memcachedb

    Distributed database storage.


    memcached 

    Distributed memory caching. From the application's perspective, it works the same as in-process memory caching. The difference is that the memcached client library decides on which of the memcached servers a value is stored (typically by hashing the key), and so knows where to retrieve it from later.

     

    my $cache = CHI->new( driver  => 'Memcached::libmemcached',
                          servers => [ "192.168.1.150:11211",
                                       "192.168.1.150:11212",
                                       "192.168.1.151:11211", 
                                       "192.168.1.151:11212",],
                          l1_cache => { driver   => 'FastMmap',
                                        root_dir => '/path/to/cache' },
    );
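
How a key ends up on one particular server can be sketched in a few lines. This is a simplified illustration only, not the algorithm used by libmemcached: real clients use their own hash functions and often consistent hashing, so that adding a server moves only a fraction of the keys. The helper name server_for_key and the sample keys are hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Same server list as in the CHI example above.
my @servers = qw(
    192.168.1.150:11211
    192.168.1.150:11212
    192.168.1.151:11211
    192.168.1.151:11212
);

# Hypothetical helper: hash the key, then map the hash onto the server list.
sub server_for_key {
    my ($key) = @_;
    my $hash = unpack('N', md5($key));    # first 4 digest bytes as a 32-bit integer
    return $servers[ $hash % @servers ];  # modulo the number of servers
}

for my $key (qw(user:42 session:abc product:7)) {
    printf "%-12s -> %s\n", $key, server_for_key($key);
}
```

Because the mapping depends only on the key and the server list, every process that uses the same list finds a value on the same server without asking anybody.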

     

    Web services

    • RPC (Remote Procedure Call)
                  allows an application to call a method implemented in an
                  application running on a remote system. RPC protocols such as
                  XML-RPC and JSON-RPC use XML or JSON as the data format and
                  HTTP as the transport mechanism (the applications use HTTP to
                  make the connection, but interpret the request and the
                  response according to the RPC protocol). A request is sent to
                  a server/application implementing that protocol.

    • REST (Representational State Transfer)
                   an architectural style rather than a protocol. REST relies on
                   the design of HTTP itself instead of inventing a new mechanism.
                   In the RESTful world, everything revolves around resources
                   (identified by URIs/URLs) and their creation, retrieval,
                   update and deletion (CRUD), which map onto HTTP methods
                   such as POST, GET, PUT and DELETE.


    Examples of RPC and RESTful requests:

    Information is stored about servers in datacentres. A remote application needs to know which servers are currently in use in the US datacentres.

    RPC: 
    <?xml version="1.0"?>
    <methodCall>
      <methodName>get_active_servers</methodName>
      <params>
        <param>
            <!-- country id -->
            <value><int>3</int></value>
        </param>
      </params>
    </methodCall>
    REST:
    https://hostname/country/3/server/active/1 (sent with HTTP GET method)
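
The two styles can also be put side by side in Perl. The sketch below only builds the requests, it does not send them. For the RPC side I assume JSON-RPC 2.0 instead of the XML form shown above, simply because the core JSON::PP module makes it short; the method name and the URL are the ones from the examples above.

```perl
use strict;
use warnings;
use JSON::PP;

my $country_id = 3;    # same identifier as in the examples above

# RPC style: the method name and its arguments travel in the request body,
# which would be sent with HTTP POST to a single endpoint.
my $json     = JSON::PP->new->canonical(1);    # canonical: keys in sorted order
my $rpc_body = $json->encode({
    jsonrpc => '2.0',
    method  => 'get_active_servers',
    params  => { country => $country_id },
    id      => 1,
});

# REST style: the resource is named by the URL, the action by the HTTP method.
my $rest_url = "https://hostname/country/$country_id/server/active/1";

print "POST body (RPC): $rpc_body\n";
print "GET  URL (REST): $rest_url\n";
```

In the RPC request the URL stays the same for every method, whereas in the RESTful request the URL itself changes with the resource being asked for.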
    (An RPC client/server implementation using RabbitMQ is shown further below.)

    • SOAP - mentioned for completeness

     

     Message queue (MQ)

    Messaging systems are built to connect multiple systems asynchronously by passing messages between them (Messaging Anti-Patterns: Part 1).
    Message queues are software components that provide asynchronous communication between applications. Applications using message queues need to adhere to one of the message queue protocols, ie they need a common "language" to understand each other. Some of the most common ones are AMQP (Advanced Message Queuing Protocol), STOMP (Streaming Text Oriented Messaging Protocol), MQTT (Message Queue Telemetry Transport), Web Socket Protocol and WAMP (Web Application Messaging Protocol).

    Protocols:

     

    AMQP

                     Main features are reliability and interoperability. Offers a wide range of features related to messaging, including reliable queuing, topic-based publish-and-subscribe messaging, flexible routing, transactions, and security.

     

    STOMP

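                    text-based. The protocol itself does not define queues and topics: a message is sent to a destination string whose meaning is left to the broker, ie it does not provide a firm base for interchange.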
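
To show what "text-based" means in practice, here is a minimal sketch that serialises a single STOMP SEND frame: a command line, header lines, a blank line, the body and a terminating NUL byte. The helper name stomp_frame and the destination /queue/fibonacci are hypothetical; how a broker interprets the destination string is up to the broker.

```perl
use strict;
use warnings;

# Hypothetical helper that serialises one STOMP frame.
sub stomp_frame {
    my ($command, $headers, $body) = @_;
    $body //= '';
    my $frame = $command . "\n";
    # One "name:value" line per header (sorted for a stable result).
    $frame .= $_ . ':' . $headers->{$_} . "\n" for sort keys %$headers;
    # Blank line, then the body, then the NUL byte that ends the frame.
    $frame .= "\n" . $body . "\0";
    return $frame;
}

my $frame = stomp_frame(
    'SEND',
    { destination => '/queue/fibonacci', 'content-type' => 'text/plain' },
    '30',
);

# Print the frame with the NUL terminator made visible as ^@.
(my $printable = $frame) =~ s/\0/^@/g;
print $printable, "\n";
```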

     

    MQTT

                   provides publish-and-subscribe messaging (no concept of queues, despite the name). It was specially designed for resource-constrained devices and low-bandwidth networks. MQTT's strengths are simplicity (it offers only five API methods) and compact binary packets. These make it suitable, for example, for connecting devices such as an Arduino to a web service.
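
In MQTT, subscribers register topic filters and receive every message whose topic matches. The sketch below shows only this matching step, with the two wildcards "+" (exactly one level) and "#" (this level and everything below it). It is not an MQTT client, it ignores the special handling of topics starting with "$", and the sub name and the sensor topics are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: does an MQTT topic filter match a concrete topic?
sub topic_matches {
    my ($filter, $topic) = @_;
    my @f = split m{/}, $filter, -1;
    my @t = split m{/}, $topic,  -1;

    for my $i (0 .. $#f) {
        return 1 if $f[$i] eq '#';     # matches this level and all below it
        return 0 if $i > $#t;          # topic has fewer levels than the filter
        next     if $f[$i] eq '+';     # matches exactly one level
        return 0 if $f[$i] ne $t[$i];  # literal levels must be equal
    }
    return @f == @t ? 1 : 0;           # no unmatched topic levels left over
}

my $topic = 'sensors/kitchen/temperature';
for my $filter ('sensors/+/temperature', 'sensors/#', 'sensors/kitchen/humidity') {
    printf "%-26s %s\n", $filter,
        topic_matches($filter, $topic) ? 'receives' : 'ignores';
}
```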

     

    Web Socket

                  overcomes the limitation of HTTP's request/response design, in which only the client can initiate communication. The Web Socket Protocol is used by applications requiring bidirectional, real-time communication.

     

    WAMP

                  is a subprotocol built on top of the Web Socket Protocol. It provides a common protocol for the Publish/Subscribe and RPC communication methods (design patterns).


    MQ protocols provide a common standard for applications that want to use this type of communication. There are different messaging types/design patterns, some of which need, or can take advantage of, a middleman (called middleware or a broker) that has full responsibility for the queues: it creates them, routes messages to them, handles failures, delivers messages to consumers and more.
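
The responsibilities of such a middleman can be sketched with a toy, in-process stand-in. The ToyBroker package below is entirely hypothetical and does no networking, persistence or failure handling; it only shows that the broker owns the queues, routes each message by queue name and hands messages out in the order they arrived, so that producer and consumer never talk to each other directly.

```perl
use strict;
use warnings;

# Hypothetical in-process stand-in for a message broker.
package ToyBroker;

sub new { return bless { queues => {} }, shift }

# The broker, not the application, creates and owns the queue.
sub declare_queue {
    my ($self, $name) = @_;
    $self->{queues}{$name} //= [];
    return $name;
}

# Route a message to the queue named by routing_key.
sub publish {
    my ($self, %args) = @_;
    my $queue = $self->{queues}{ $args{routing_key} }
        or return 0;                    # unroutable: no such queue
    push @$queue, $args{body};
    return 1;
}

# Hand out the oldest message, or undef when the queue is empty.
sub get {
    my ($self, $name) = @_;
    return shift @{ $self->{queues}{$name} // [] };
}

package main;

my $broker = ToyBroker->new;
$broker->declare_queue('rpc_queue');

# Producer side: publish and carry on without waiting for a consumer.
$broker->publish(routing_key => 'rpc_queue', body => $_) for 10, 20, 30;

# Consumer side, possibly much later: messages come out in publish order.
while (defined(my $msg = $broker->get('rpc_queue'))) {
    print "consumed: $msg\n";
}
```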

    Examples:

    RabbitMQ (message broker/middleware, based on AMQP.)

    Example: RPC client/server implementation using RabbitMQ

    Fibonacci - using RPC RabbitMQ based server and client - Perl implementation

    The implementation presented here was inspired by rabbitmq-tutorials.

    The fib subroutine in the server code caches intermediate results to speed up calculations for higher numbers.
    The client script sends its requests in parallel.
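
Why the caching matters can be seen in isolation, without RabbitMQ. The sketch below is not the server code; it is a stripped-down comparison with made-up names (fib_plain, fib_cached) that counts how many times each variant is called.

```perl
use strict;
use warnings;

my %calls = (plain => 0, cached => 0);

# Plain recursion: the same values are recalculated over and over.
sub fib_plain {
    my ($n) = @_;
    $calls{plain}++;
    return $n if $n < 2;
    return fib_plain($n - 1) + fib_plain($n - 2);
}

# Cached recursion: each value is calculated once and then looked up.
my %cache;
sub fib_cached {
    my ($n) = @_;
    $calls{cached}++;
    return $n if $n < 2;
    return $cache{$n} //= fib_cached($n - 1) + fib_cached($n - 2);
}

my $n      = 25;
my $plain  = fib_plain($n);
my $cached = fib_cached($n);

print "fib($n) = $plain (plain), $cached (cached)\n";
print "calls: plain = $calls{plain}, cached = $calls{cached}\n";
```

The number of calls of the plain version grows exponentially with n, while the cached version grows only linearly, which is what makes requests such as fib(50) feasible.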

    The dependencies are:

        Getopt::Long
        Parallel::ForkManager
        AnyEvent
        UUID::Tiny
        Net::RabbitFoot
        Data::Printer         (can be replaced by Data::Dumper)


    The scripts are heavily commented to help understand all steps.

    RPC Server

    #!/usr/bin/perl
    
    =head1 RPC server using RabbitMQ to provide fibonacci series calculation
    
    =cut
    
    use strict;
    use warnings;
    use v5.010;
    
    $|++;
    use AnyEvent;
    use Net::RabbitFoot;
    
    use Data::Printer;
    
    # Declare subroutines
    # -------------------
    sub fib;
    sub on_request;
    
    # We shall be caching intermediate results of the recursive fib subroutine
    # ------------------------------------------------------------------------
    # (if we don't, calculations take an enormous amount of time for higher numbers)
    
    my $cached = {};
    
    # Connection to the RabbitMQ daemon
    # ---------------------------------
    # (in this case on the same server, the localhost)
    
    my $conn = Net::RabbitFoot->new()->load_xml_spec()->connect(
        host  => 'localhost',
        port  =>  5672,
        user  => 'guest',
        pass  => 'guest',
        vhost => '/',
    );
    
    # Create a RabbitMQ communication channel
    # ---------------------------------------
    
    my $channel = $conn->open_channel();
    
    # Declare which queue we shall be offering the service on
    # -------------------------------------------------------
    # (clients publish the number n to this queue and get the fibonacci
    #  value of n back on their own reply queue)
    
    $channel->declare_queue(queue => 'rpc_queue');
    
    # Do the calculations, when a request comes
    # -----------------------------------------
    # (sent to the 'rpc_queue' queue) 
    
    $channel->consume(
        queue      => 'rpc_queue',
        on_consume => \&on_request,
    );
    
    print " [x] Awaiting RPC requests\n";
    
    # Wait forever
    # ------------
    AnyEvent->condvar->recv;
    
    # ---------------------------------- SUBROUTINES ----------------------------------
    
    sub fib {
        my ($n, $padding) = @_;
     
        # I used the padding to show the intermediate
        # recursive calculations during development. 
        $padding //= "";
    
        if ($n == 0 || $n == 1) {
            return $n;
        } else {
            my $n1 = $n-1;
            my $n2 = $n-2;
    
            $cached->{$n1} //=  fib($n1, "$padding  (-1)");
            $cached->{$n2} //=  fib($n2, "$padding  (-2)");
       
            return $cached->{$n1} + $cached->{$n2};
        }
    }
    
    sub on_request {
        my $var   = shift;                      # hashref with Net::AMQP header and 
                                                # body info
        my $body  = $var->{body}->{payload};    # the number for which the fibonacci
                                                # calculation is requested
        my $props = $var->{header};             # has correlation_id and reply_to queue
                                                # details
    
    
        my $n = $body;
        print " [.] fib($n)\n";
    
        my $response = fib($n);
    
        # publish/send the calculation to the client's queue
        $channel->publish(
            exchange    => '',
            routing_key => $props->{reply_to},
            header      => {
                correlation_id => $props->{correlation_id},
            },
            body => $response,
        );
    
        $channel->ack();                        # acknowledge the request, so that
                                                # RabbitMQ can remove it from the
                                                # queue
    } 

    RPC Client

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    use v5.010;
    
    $|++;
    
    use Getopt::Long;
    use Parallel::ForkManager;
    
    use AnyEvent;
    use UUID::Tiny;
    use Net::RabbitFoot;
    
    use Data::Printer;
    
    # Process the command input and set up the default
    # for fibonacci processing, if no input is provided
    # -----------------------------------------------
    my @nums;
    GetOptions(
        "numbers|n=s" => \@nums,
        "help|h"      => sub { say "Usage: perl $0 [--numbers|-n N[,N...]]"; exit 0 },
    );
    @nums = (scalar @nums) ? split(/,/, join(',', @nums)) : qw(50 30 20 10 5);
    
    # Declare subroutines
    # -------------------
    
    sub fibonacci($);
    
    # -----------------------------------------------
    #
    # Send parallelized requests to the RPC server
    # --------------------------------------------
    
    my $pm = Parallel::ForkManager->new(10);
    
    # 1) set up data structure retrieval and handling
    # -----------------------------------------------
    
    $pm->run_on_finish( 
        sub {
          my ($pid, 
              $exit_code, $ident, $exit_signal, $core_dump, 
              $all_child_data_ref) = @_;
    
          if (defined $all_child_data_ref) {
            p $all_child_data_ref;
          }
          else {
            print qq|No message received from child process $pid!\n|;
          }
        }
    );
    
    # 2) send the RPC requests
    # ------------------------
    
    FIB_LOOP:
    foreach my $num (@nums) {
        my $pid = $pm->start and next FIB_LOOP;
        # ----- start child process
    
        print " [x] Requesting fib($num)\n";
        my $response  = fibonacci($num);
    
        # ----- end of child process
        $pm->finish(0, { $num => $response });
    }
    
    $pm->wait_all_children;
    
    say "[END] - all children finished";
    
    # ---------------------------------- SUBROUTINES ---------------------------------- 
    
    # The main subroutine for the RPC (asynchronous) request
    # ------------------------------------------------------
    
    sub fibonacci($) {
        my $n = shift;
    
        # set up condition variable to watch for an event (when we receive
        # a result we asked for)
        my $cv = AnyEvent->condvar;
    
        # create a unique id for our request to the RPC server  
        my $corr_id = UUID::Tiny::create_UUID_as_string(UUID::Tiny::UUID_V4);
    
        # connect to the RabbitMQ daemon
        # (in this case on the same server, the localhost)
        my $conn = Net::RabbitFoot->new()->load_xml_spec()->connect(
            host  => 'localhost',
            port  =>  5672,
            user  => 'guest',
            pass  => 'guest',
            vhost => '/',
        );
    
        my $channel = $conn->open_channel();
    
        # after we disconnect, queue will be deleted
        my $result          = $channel->declare_queue(exclusive => 1);
        my $callback_queue  = $result->{method_frame}->{queue};
    
        my $on_response = sub {
            my $var     = shift;
            my $body    = $var->{body}->{payload};
            if ($corr_id eq $var->{header}->{correlation_id}) {
                $cv->send($body);
            }
        };
    
        # callback to execute after the response from the RPC server comes back
        $channel->consume(
            queue       => $callback_queue, # our exclusive reply queue
            no_ack      => 1,               # turning off message acknowledgment:
                                            #     we shall not notify the broker
            on_consume  => $on_response,    # receives the server response, then
                                            #     sends it to the condition
                                            #     variable, which ends $cv->recv
        );
    
        # send the request for the fibonacci calculation to the RPC server
        $channel->publish(
            exchange    => '',
            routing_key => 'rpc_queue',     # to which queue on the server we are
                                            # sending the request
                                            # (ie the number for which we want the
                                            #  fibonacci value)
            header      => {
                reply_to        => $callback_queue,     # our queue to which the server
                                                        #      will send its response
                correlation_id  => $corr_id,            # identifier of our request
            },
            body => $n,                     # the number, for which we want the fibonacci
                                            # value calculated
        );
    
        return $cv->recv;                   # $cv->recv blocks until $cv->send
                                            # is called and returns whatever data
                                            # $cv->send supplied
    }