Search This Blog

Sunday, 22 February 2015

Perl one liners and other tools for fetching/posting internet content (WIP)


 

Fetching content

 
curl -X GET http://tttt.co.uk/api/asset/id/1
 
curl -H "Accept: application/json" -H "Content-Type: application/json" \
     -X GET  http://localhost:3030/download5/test.json 

wget http://tttt.co.uk/api/asset/id/1 -O test1.json

perl -MLWP::Simple -e 'print get("http://tttt.co.uk/api/asset/id/1")'

perl -MLWP::Simple -e 'getstore("http://tttt.co.uk/api/asset/id/1", "url_content.json")'

perl -MHTTP::Tiny -e 'print HTTP::Tiny->new->get("http://tttt.co.uk/api/asset/id/1")->{content}' 

 

Posting content

 
curl -X POST -d name=aaaa category=3 http://localhost:3010/api/asset
              (posts data using the Content-Type application/x-www-form-urlencoded)  
curl -H "Accept: application/json" -H "Content-Type: application/json" \
     -X POST --data '{"name":"aaaa","category":"3"}' http://localhost:3030/api/asset
 
curl -X POST -F 'file=@data.csv'  http://localhost:3010/api/category
             (posts data using the Content-Type multipart/form-data) 
 
curl -X POST -T 'asset.csv'  http://localhost:3010/api/asset
             (upload of a file) 
 

Saturday, 21 February 2015

Chat server implemented in Perl, based on AnyEvent

Event-driven implementation of a chat server, with one main processing thread.

Uses tcp_server method from AnyEvent::Socket for easy creation of a non-blocking TCP connection. Inside the connection callback, the connecting client is informed about other already connected clients and client information (host:port identifier and the client socket handle) is stored in a hash. The client file/socket handle, available in the tcp_server callback after a client connects to the server, is wrapped in a AnyEvent::Handle object to allow event-driven access and manipulation. The on_read handler of the client socket handle deals with the client message, sending it to all other connecting servers. The client can send a message either directly, or first send OK, followed by the message itself. 

#!/usr/bin/perl
 
=head2 chat_server.pl

Perl chat server based on AnyEvent

Server:     perl $0
Clients:    telnet 127.0.0.1 8888 (run in several terminals)
            clients communicate by:
                                    a) sending message terninated with carriage return
                                    b) sending OK, followed by carriage return
                                       sending  message terninated with carriage return  

=cut

use strict;
use warnings;
use utf8;
use v5.018;

use AnyEvent;                           # creates event driven loop
use AnyEvent::Socket qw(tcp_server);    # provides high level function to create tcp server
use AnyEvent::Handle;                   # creates non-blocking (socket) handle

use Data::Dumper qw(Dumper);

sub _inform_clients;

=head2 Store connected clients in a hash structure

key:    $host:$port ..... uniquely identifies a connected client
value:  socket handle ... so we can continue communication with individual clients

=cut

my %client = ();

=head2 Create TCP server

allow connection from everywhere, on a specified port

=cut

tcp_server undef, 8888, sub {
    my ($fh, $host, $port) = @_;

    say "[$host:$port] connected";

=head3 On connection, tell the client how many are already connected

=cut

    syswrite $fh, "Hello friend. There are currently " . scalar(keys %client) . 
                  " connected friends.\015\012";

    _inform_clients(\%client, "Friend [$host:$port] joined us!");

=head3 Create nonblocking socket handle for the client

=cut

    my $hdl = AnyEvent::Handle->new(
        fh => $fh,
    );

=head3 Store client information

=cut

    my $client_key = "$host:$port";
    $client{$client_key} = $hdl;

=head3 On error, clear the read buffer

=cut

    $hdl->on_error (sub {
        my $data = delete $_[0]{rbuf};
    });

=head3 On receiving a message from a client

We expect:

    sending a regular message
        either "OK\n", then a message
        or      directly a message
    disconnecting
        send quit/QUIT followed by carriage return

=cut

    my $writer; 
    $writer = sub {
        my ($hdl, $line) = @_;
        say "Reading from client: [$line]";

        my @clients = keys %client;
        say Dumper(\@clients);

        # The client cannot disconnect until we release its handle
        if ($line =~ /\Aquit|bye|exit\z/i) {

            my $client_count = (scalar keys %client) - 1;       # exclude the leaving client
            say "REMAINING (apart from this): $client_count";

            # Send message to each client
            for my $key (@clients) {

                if ($key eq $client_key) {
                    $hdl->push_write("Bye\015\012");
                }
                else {
                    my $message = ($client_count > 1) ? "only $client_count of us left\015\012" : 
                                                         "You are the only one left :(. Send quit/QUIT to disconnect\015\012";
                    $client{$key}->push_write("Friend $client_key is leaving us, $message");
                }

            }

            $hdl->push_shutdown;
            delete $client{$client_key};
            
        }
        # if we got an "OK", we have to _prepend_ another line,
        # so it will be read before the second request reads the 64 bytes ("OK\n")
        # which are already stored in the queue when this callback is called
        elsif ($line eq "OK") {
            $_[0]->unshift_read (line => sub {
                my $response = $_[1];
                for my $key (grep {$_ ne $client_key} @clients) {
                    $client{$key}->push_write("$response from $client_key\015\012");
                }
            });
        }
        elsif ($line) {
            for my $key (grep {$_ ne $client_key} @clients) {
                my $response = $line;
                $client{$key}->push_write("$response from $client_key\015\012");
            }
        }
    };

=head3  Enter the request handling loop

=cut

    $hdl->on_read (sub {
        my ($hdl) = @_;

        # Read what was sent, when request/message received
        # (then distribute the message)
        $hdl->push_read (line => $writer);
    });

};

=head3 Start the event loop

=cut

AnyEvent->condvar->recv; 

=head2 SUBROUTINES

_inform_clients

=cut

=head2 _inform_clients

sends a message to all known/stored clients

=cut

sub _inform_clients {
    my ($client_href, $message) = @_;

    for my $key (keys %$client_href) {
        $client{$key}->push_write("$message\015\012");
    }
}

Source code on github 

Numeral systems and bit shifting quick overview

Numeral Systems


Numeral system Radix/root Digits Example In Decimal system
Binary

0,1 0,1

Byte 2

Groupings of 8 binary digits
(representation of a byte or integer (16/32/64 bits))
01010001 (02 × 27) + (12 × 26) + (02 × 25) + (12 × 24) + (03 × 23) + (02 × 22) + (02 × 21) + (12 × 20)
Decimal 10

0-9 124 (110 × 102) + (210 × 101) + (410 × 100)
Octal 8

0-7 02732 (28 × 83) + (78 × 82) + (38 × 81) + (28 × 80)
Hexadecimal 16

0-9,A-F (corresponding to 10-15) 0x2AF3 (216 × 163) + (A16 × 162) + (F16 × 161) + (316 × 160)

Bit Shifting

 

128
64 32 16 8 4 2 1

27         

26

25

24

23

22

21

20

0 1 0 1 0 0 0 1



0     1   0    1    0   0  0   1 => 1*64 + 1*16 + 1*1 = 81

 

 

Arithmetic bit shifting to the right with >>

 


Makes bits fall of the right and adds zero padding to the left. This is equivalent to arithmetic division.

01010001 = 1*64 + 1*16 + 1*1 = 81

$y =  0b01010001 >> 1
>> 1 … shifting by one position to the right:
01010001 → 00101000 = 1*32 + 1*8 = 40   (ie int(81/2))

Arithmetic bit shifting to the left with <<


Makes bits fall of the left and adds zero padding to the right. This is equivalent to arithmetic multiplication by 2 to the number on the right of the operator.

01010001 = 1*64 + 1*16 + 1*1 = 81

$y =  0b01010001 << 1

<< 1 … shifting by one position to the left, we are multiplying by 2 to 1:
01010001 → 10100010 = 1*128 + 1*32 + 1*2 = 162 (ie 81 * 2)


NOTE

 

The number of shift positions needs to result in a value within the allowed range of the original value type:

Wrong:

<< 2:
01010001 (81) → 01000100 = 1*64 + 1*4 =68

Wednesday, 11 February 2015

Unicode for the Half-initiated (Perl biased)

  1. Background

  2. Perl


Background

There are under 7 000 languages, about one third of which have a writing system. The challenge is to be able to represent all writing systems using one encoding set.

         And at the beginning was ASCII ...


         ("American Standard for Information Interchange", 1963)

ASCII, representing 128 characters, 33 non-printable control characters and the rest used for encoding of the English alphabet. ASCII developed from telegraphic codes. Its characters are encoded into 7-bit binary integers (with most significant bit being 0), giving the total of 128 possibilities. Using 8 bits extends the range to 255 characters. One of the encodings covering this range (called extended ASCII) is latin-1/ISO-8859-1.

After computers spread to other countries, other encodings were needed to represent characters in other languages, not available in ASCII. Western Europe uses Latin-1 (ISO-8859-1), Central Europe Latin-2 (ISO-8859-2) etc.

These local character sets are limited in the ability to provide character representations. So, a Unicode Consortium was created in 1991 in the attempt to unify all character representations and provide one encoding that would be able to represent any writing system.  A collection of all known characters was started. Each character was assigned a unique number, called code point.

The code point is usually written as a four or six digit hex number (eg U+07FF). Some characters have a user-friendly name, like WHITE SMILING FACE (☺) or SNOWMAN (☃).  Apart from base characters like A etc, there are accents and decorations (umlaut etc). A character followed by an accent, forming a logical character, is called a grapheme. Unicode is an umbrella for  different encoding forms: UTF-8, UTF-16 and UTF-32. UTF-8 is the most popular encoding, at the beginning of 2015 used on around 82% of World Wide Web pages.

                                      Picture of how characters map to bytes.

http://www.w3.org/International/articles/definitions-characters/images/encodings-utf8.png

Originally it was assumed 16 bits to represent one character, giving 16 536 (216 ) options, would suffice. However soon the ambition was to be able to represent all possible writing systems, so more bytes were needed for character representations. The first 65 536 code points is called a Basic Multilingual Plane (BMP). There are 16 more Multilingual planes designed to hold over 1,000,000 of characters. These planes are not contiguously populated, leaving blocks of code points for future assignment.

 http://rishida.net/docs/unicode-tutorial/images/unicode-charset2.png

http://rishida.net/docs/unicode-tutorial/images/unicode-charset2.png 

 

UTF-8 Encoding


Number        First                   Last               Bytes 
of bits          code point         code point
-----------------------------------------------------------------------------------------------------
  7                   U+0000           U+007F
 0xxxxxxx
11                   U+0080           U+07FF
 110xxxxx 10xxxxxx
16                   U+0800           U+FFFF
 1110xxxx 10xxxxxx 10xxxxxx
21                   U+10000           U+1FFFFF
 11110xxx 10xxxxxx 10xxxxxx by 10xxxxxx

UTF-8, unlike UTF-16, is a variable length encoding, where different code point ranges are represented by 1 byte or a sequence of 2,3 or 4 bytes. The first 128 characters are equivalent to ASCII. These have the higher order bit 0. Code points represented by more bytes, have the higher bit 1, followed by as many 1s as there are remaining bytes representing the given character. This is how system can understand the octet stream and decode it into characters.

Encode :    into binary string
Decode:     into character string

If the system cannot interpret a sequence of octets, because it assumes a wrong encoding, a warning about a wide character is given and a placement character is used. The solution is to encode the string into the desired encoding, then decode into a character string.

BOM and surrogates

 

UTF-16 and UTF-32 use 2 and 4 bytes respectively for character representation and need to deal with the endianness/the byte order, associated with the particular processor. Big endian order: most significant bits stored first vs little endian. BOM (Byte Order Mark) is a short byte sequence (U+FEFF or U+FFFE code points), present at the beginning of text etc, that clarifies the byte order information (which byte is the first one?) and allows correct decoding. UTF-8 does not suffer from the endian problem.

UTF-16 uses 2 bytes for all character representations, even the first 255 characters. Two byte encoding tackles the BMP, ie 65 536 characters; higher code points correspond to surrogate pairs, two 16 bit units.

 

Perl

use utf8;                        # to be able to use unicode in variable names and literals
use feature "unicode_strings";   # to use character based  string operations
use open    ":encoding(UTF-8)";  # expect UTF-8 input and provide UTF-8 output
use charnames ":loose";          # to be able to use \N{WHITE SMILING FACE}
 
# already created filehandle 

binmode STDOUT, ':iso-8859-1';
binmode $fh,    ':utf8'; 

# Database access

# DBI

$dbh = DBI->connect($dsn, $user, $password,
                    { RaiseError => 1, AutoCommit => 0,  

                      mysql_enable_utf8 => 1 }); 

# DBIx::Class 

$self->{ schema } = Schema->connect("dbi:mysql:somedatabase", 
                                        'sqluser', 'scret',
                                       { mysql_enable_utf8 => 1 },
                        ) or die "Cannot connect to database";
 
# Catalyst model - DBIx::Class

 __PACKAGE__->config(
    schema_class => 'MyApp::Schema',
    
    connect_info => mysql{
        dsn => 'dbi:mysql:somedatabase',
        user => 'mysqluser',
        password => 'scret',
        mysql_enable_utf8 => 1,
    }   
);

Sunday, 8 February 2015

Design Patterns in Perl - Factory pattern

Factory class

.../EntityFactory.pm
.../Entity/Asset.pm
.../Entity/Datacentre.pm  etc

type ... the entity type, which corresponds to the database table. Function type2source returns the name of the DBIC ResultSource file in the Result directory ($class). $class is used to load the specialized class (according to the $type) at run time, after which the specialized class object is created and returned;

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
package AssetManagerApi2::EntityFactory;

use v5.018;
use utf8;
use open    ':encoding(UTF-8)';
use feature 'unicode_strings';

use FindBin qw($Bin);
use lib "$Bin/../lib";

use AssetManagerApi2::Helper::Entity;

=head3 new

IN: class
    hash of input params for the entity object to be created
    and c and entity type info

=cut

sub new {
    my $class = shift;

    my $entity = AssetManagerApi2::Helper::Entity::type2table(shift); 

    my $path  = "AssetManagerApi2/Entity/$entity.pm";
    $class    = "AssetManagerApi2::Entity::$entity";

    require $path;
    return $class->new(@_);
};


1;

 

Specialized classes

Specialized classes are Moose-based and are associated with database table rows represented by DBIx::Class ResultSources.
 
The input to the constructor is a  hashref, containing:
  • Catalyst object and the entity type
  • properties of the DBIC object
      
      example :  { c => $c,  type => $type, id => $dbic->id, name => $dbic->name }

The id, the Catalyst object and the entity type are set up first. Then this information is used by the Moose constructor to set up the rest of the properties through the builder methods.

 

Asset.pm


  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
package AssetManagerApi2::Entity::Asset;

use Moose;
use namespace::autoclean;

use v5.018;
use utf8;
use open    ':encoding(UTF-8)';
use feature 'unicode_strings';

use AssetManagerApi2::Helper::Entity qw(
                                        create_output_structure
                                     );

my $asset_model = 'AssetManagerDB::Asset';

has 'c'          => (is       => 'ro', 
                     isa      => 'AssetManagerApi2',
                     required =>  1,
                    );

has 'dbic'       => (is       => 'ro', 
                     isa      => 'AssetManagerApi2::Schema::AssetManagerDB::Result::Asset',
                     builder  => '_build_dbic',
                    );

has 'type'       => (is       => 'ro', 
                     isa      => 'Str',
                     lazy     =>  1,
                     builder  => '_build_type',
                     );

has 'id'         => (is       => 'ro', 
                     isa      => 'Int',
                     required =>  1,
                    );

has 'name'       => (is  => 'rw', 
                     isa => 'Str',
                     lazy     =>  1,
                     builder => '_build_name',
                     );

has 'datacentre' => (is  => 'rw', 
                     isa => 'HashRef',
                     lazy     =>  1,
                     builder => '_build_datacentre',
                    );

has 'software'   => (is  => 'ro',               
                     isa => "ArrayRef[HashRef]|ArrayRef",
                     lazy     =>  1,
                     builder => '_build_software',
                    );

=head2 MOOSE METHODS

=cut

=head3 BUILDARGS

The instance is associated with the DBIC object.
    1) Retrieve the DBIC object
    2) Set up some of the Asset object properties before
       Moose constructer takes over

=cut

around BUILDARGS => sub {
    my $orig  = shift;
    my $class = shift;

    my $c    = $_[0]->{c};
    my $dbic = $c->model($asset_model)->find({id => $_[0]->{id}});

    my %props = (
                    c           => $c,
                    id          => $_[0]->{id},
                    type        => 'asset',
                    dbic        => $dbic,
                );

    return $class->$orig(%props);
};
                     
sub _build_type { 
    my ($self) = @_;
}

sub _build_dbic {
    my ($self) = @_;
}
 
sub _build_name {
    my ($self) = @_;

    $self->name($self->dbic->name);
}
 
sub _build_datacentre {
    my ($self) = @_;

    my $c = $self->c;
    my $datacentre_dbic = $self->dbic->datacentre;
    my $datacentre = create_output_structure($c, $datacentre_dbic, 'datacentre');

    $self->datacentre($datacentre);
}
 
sub _build_software {
    my ($self) = @_;

    my $c = $self->c;

    my $self_software_dbic = $self->dbic->softwares;

    my @software = ();
    while (my $package = $self_software_dbic->next) {
        my $software = create_output_structure($c, $package, 'software');
        push @software, $software;
    }

    $self->software(\@software);
}

=head2 PRIVATE HELPER METHODS

=cut

=head2 PUBLIC API METHODS

=cut

sub associate_software {
    my ($self, $props) = @_;

    my $c    = $self->c;
    my $dbic = $self->dbic->find_or_create($props);

    create_output_structure($c, $dbic, 'asset_software');
}

__PACKAGE__->meta->make_immutable;

1;

Saturday, 7 February 2015

A note to myself on Caching

In Perl, a good module to use is CHI, a wrapper around specific data storage. This means there is one API to access the cache and the particular data store is hidden from the application.
  • Local system

  •     use CHI;
    
        # Choose a driver
        
        my $cache = CHI->new( driver     => 'FastMmap',
                              root_dir   => '/path/to/cache',
                              cache_size => '1k'
        );
        my $cache = CHI->new( driver  => 'Memcached::libmemcached',
                              servers => [ "10.0.0.15:11211",
                                           "10.0.0.15:11212" ],
                              l1_cache => { driver   => 'FastMmap'
                                            root_dir => '/path/to/cache' }
        );
        my $cache = CHI->new( driver   => 'BerkeleyDB',
                              root_dir => '/path/to/cache'
        );
        my $cache = CHI->new( driver => 'Memory',    global => 1 );
        my $cache = CHI->new( driver => 'RawMemory', global => 1 );


    CHI::Driver::FastMmap          Shared memory inter-process cache via mmaped files
    CHI::Driver::BerkeleyDB        Shared memory inter-process cache via Berkeley DB
                                                      environment using the Concurrent Data Store (CDS), making it
                                                      safe for multiple processes to read and write the cache without
                                                      explicit locking
    CHI::Driver::Memory               In-process memory based cache
    CHI::Driver::RawMemory        In-process memory based cache that stores references
                                                      directly instead of serializing/deep-copying)
  • Distributed systems

  •     use CHI;
    
        my $cache = CHI->new(
            driver => 'Memcached',   # or 'Memcached::Fast', 
                                       or 'Memcached::libmemcached'
            namespace => 'products',
            servers => [ "10.0.0.15:11211", "10.0.0.15:11212"
                         "/var/sock/memcached",
                         "10.0.0.17:11211"
                         [ "10.0.0.17:11211", 3
                       ],
            debug => 0,
            compress_threshold => 10_000,
        );

Code examples taken from: CHI module on CPAN

Unicode problem - wide character in print (Perl solution)

PROBLEM

I encountered the problem of the wide character in print (appearing as the replacement character, question mark in a black diamond) when working on a Catalyst application, though this was not a Catalyst related issue.

I had, as far I know, all the important setup in place:

  1. Database:
    1. Database created with: 
      1. CREATE DATABASE IF NOT EXISTS xxxxx DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
    2. On the Catalyst side, connection to the database had the mysql_enable_utf8 flag set up
  2.  Code
    1. use utf8   
    2. use open ':encoding(utf8)'
    3. use feature 'unicode_strings'
    4. Template::Toolkit setup in Catalyst View:
      1.  _PACKAGE__->config(
            TEMPLATE_EXTENSION => '.tt',
            render_die => 1,
            WRAPPER    => 'wrapper.tt',
            ENCODING   => 'utf-8',
        );
Despite all this, I kept getting the infamous warning about 'wide character in print' for the word Elégant, though not for Nice™.  Very puzzling because:
  • I did not encounter this problem in a RESTful implementation of the functionality with JSON output
  • All seemed set up to correctly and consistently cater for unicode
  • Some unicode characters showed correctly, some not
SOLUTION

The issue was caused by Perl assuming the encoding of the text retrieved from the database was in utf8 (as instructed), though, for this particular character it was in latin-1. One of the bytes that the latin-1 encoding uses for this character, is a non-utf8 byte and the system does not know how to handle it. Hence the replacement character.
 
The solution was to encode the retrieved text into utf8:

use Encode qw(encode);
...
my $text = encode('utf8', get_from_database());