Hex, Bugs and More Physics | Emre S. Tasci

a blog about physics, computation, computational physics and materials…

Sub-groups of space groups

July 3, 2008 Posted by Emre S. Tasci

ISOTROPY is a highly effective program written by the guru of applications of Group Theory on Phase Transitions, Harold T. Stokes (For those who are interested in the topic, I recommend his Introduction to Isotropy Subgroups and Displacive Phase Transitions titled paper(co-authored by Dorian M. Hatch) and Structures and phase transitions in perovskites – a group-theoretical approach paper (co-authored by Christopher J. Howard).

I’m pretty new and noobie in Group Theory – it was one of the subjects I found myself always running away. Anyway, I wanted to limit my search for some specific phase transition search in the database. Say, if an A-B binary has been reported in S1 and S2 structures, I didn’t want to go all the way looking for possible transition mechanisms if S1->S2 isn’t allowed by the group theory at all!

In the meantime, I’ve begun experimenting with the ISOTROPY program, yet I’m still at the very beginning. You can find detailed information and an online version in the related website, but to give an example, suppose we’d like to know about the subgroups of spacegroup 123 (P4/mmm). We enter the following commands:

VALUE PARENT 123
SHOW PARENT
SHOW SUBGROUP
SHOW LATTICE
DISPLAY ISOTROPY

and it displays a list of which a portion is presented below:

Parent     Lattice Subgroup   Lattice
123 P4/mmm q       123 P4/mmm q
123 P4/mmm q       47 Pmmm    o
123 P4/mmm q       83 P4/m    q
123 P4/mmm q       65 Cmmm    o-b
123 P4/mmm q       10 P2/m    m
123 P4/mmm q       12 C2/m    m-b
123 P4/mmm q       2 P-1      t
123 P4/mmm q       89 P422    q
123 P4/mmm q       111 P-42m  q
123 P4/mmm q       99 P4mm    q
123 P4/mmm q       115 P-4m2  q
123 P4/mmm q       25 Pmm2    o
123 P4/mmm q       38 Amm2    o-b
123 P4/mmm q       6 Pm       m
123 P4/mmm q       123 P4/mmm q
123 P4/mmm q       127 P4/mbm q
123 P4/mmm q       127 P4/mbm q

You may have noticed that, there are some recurring lines – this is due to some other properties that we have not specifically selected for display, i.e., it would display lines only containing "123 P4/mmm" if we hadn’t specifically asked for subgroup and lattice information (in analogy with the degeneracies in QM).

I needed a table containing all the subgroups of each space-group but there were two problems:

1.  As far as I know, ISOTROPY does not accept an input file where you can pass the commands you’ve prepared via a script/macro. So, I was afraid that I would type "VALUE PARENT 1 – DISPLAY ISOTROPY – VALUE PARENT 2 – DISPLAY ISOTROPY – …" (by the way, ISOTROPY has a clever interpreter so that you can just type the first letters of commands and parameters as long as they are not dubious).

2. Again, as far as I know, you can’t direct the output to a file, meaning you have to copy the output manually and paste into a file (x230 in my case).

Luckily, it turned out that:

1. You could not supply an input file, but you could instead, paste the contents of such an input file and ISOTROPY would process it line by line.

2. Stokes had included a very useful and helpful property that all the sessions was recorded automatically in the "iso.log" file.

So, all I had to do was

1. Write a script that would produce the query to retrieve the subgroups of all 230 space-groups.

2. Paste the outcome of this script to ISOTROPY.

3. Write another script that would parse the "iso.log" file and split these reported subgroups per spacegroup (and while I’m at it, remove the duplicate lines and order the subgroups).

Maybe you won’t believe me but I am also not happy to paste lines and lines of code to this blog’s entries. I will deal it soon but in the mean time, please bear with them a little longer.. Ath the moment, I will describe the process and will include them at the end.

* build the query via query_builder.php
* feed this query to ISOTROPY
* split the space groups from the "iso.log" via split_spacegroups.php
* run sortall.php which :
    * runs the sortkillduplines.php on each of these files
    * sorts again via the sort.pl wrt to the 4th col (space number of the subgroup)
    * removes the blank lines
* And that’s it..

Here comes the codes (php+perl)

query_builder.php:

<?
 // Generates the query that will be "pasted" into the ISOTROPY prompt
 // The query will be stored in isotropy_query.txt
$fp = fopen("isotropy_query.txt","w");
 fwrite($fp,"PAGE 1000\nSHOW SUBGROUP\nSHOW LATTICE\nSHOW PARENT\n");
 for($i=1;$i<231;$i++)
     fwrite($fp,"VALUE PARENT $i\nDISPLAY ISOTROPY\n");
 fwrite($fp,"\n\n");
 fclose($fp);
 ?>



sortkillduplines:

<?
// PERL sorts the mentioned file (via $_GET["file"]) and deletes duplicate lines
require("commandline.inc.php");
if($_GET["file"]) $file = $_GET["file"];
 else
 {
     $in = fopen(’php://stdin’, ‘r’);
     $fw = fopen("sortkilldup_tmp.tmp","w");
     while(!feof($in))
     {
         fwrite($fw, rtrim(fgets($in, 4096))."\n");
     }
     fclose($fw);
     $file = "sortkilldup_tmp.tmp";
 }
 if($_GET["seperator"])$f_seperator = TRUE;//Means that an extra seperator has been placed for this order purpose and is seperated from the text by the parameter defined with seperator=xxx.
if($_GET["n"]||$_GET["numeric"]||$_GET["num"])$f_numeric = TRUE; // If it is TRUE than it is assumed that the first word of each line is a number and will be sort accordingly.
 // Reference : http://www.stonehenge.com/merlyn/UnixReview/col06.html
if($f_numeric) $exec = exec("perl -e ‘print sort numerically <>;\nsub numerically { \$a <=> \$b; }’ ".$file." > sorttemp.txt");
 else $exec = exec("perl -e ‘print sort <>’ ".$file." > sorttemp.txt");
 $fi = file("sorttemp.txt");
 unlink("sorttemp.txt");
 $curlinecont = "wrwerwer";
 if(!$f_seperator)
 {
     for($i=0; $i<sizeof($fi); $i++)
     {
         $line = rtrim($fi[$i]);
         if($curlinecont!=$line)
         {
             echo $line."\n";
             $curlinecont = $line;
         }
     }
 }
 else
 {
     $frs = fopen("sorttemp2.txt","w"); // will capture the dup killed output in this file and resort it later.
     for($i=0; $i<sizeof($fi); $i++)
     {
         $lineorg = rtrim($fi[$i]);
         $linearr = explode($_GET["seperator"],$lineorg);
         if($curlinecont!=$linearr[0])
         {
             fwrite($frs, $linearr[1]."\n");
             $curlinecont = $linearr[0];
         }
     }
     fclose($frs);
     // resort the file 
     passthru("perl -e ‘print sort <>’ sorttemp2.txt");
     unlink("sorttemp2.txt");
 }
   // unlink("sortkilldup_tmp.tmp");
 ?>



split_spacegroups.php:

<?
 // splits the isotropy.log file generated by the query : 
 //      PAGE 1000
 //      SHOW SUBGROUP
 //      SHOW LATTICE
 //      SHOW PARENT
 //      VALUE PARENT 1
 //      DISPLAY ISOTROPY
 //      â€¦
 //      VALUE PARENT 230
 //      DISPLAY ISOTROPY
 // with respect to the parent groups
$file = fopen("iso.log", "r") or exit("Unable to open file!");
 //Output a line of the file until the end is reached
 $cur = 0;
 $fp = fopen("dummy.txt","w"); // we need to open a dummy file in order to include the fclose() in the iteration
 while(!feof($file))
 {
     // Read the line
     $line = fgets($file);
    // Check if it is a spacegroup information entry by looking at the first word
     // - if this word is a number, than process that line
     if(ereg("^([0-9]+)",$line,$num))
     {
         $num = $num[0];
         if($cur!=$num)
         {
             // We have a new space group, so create its file:
             fclose($fp); // First close the previous space group’s file
             $filename = sprintf("%03d",$num)."_sg.txt";
             $fp = fopen($filename,"w");
             $cur = $num; // Set the current check data to the sg number
         }
         fwrite($fp,$line);
     }
}
 fclose($file);
 unlink("dummy.txt");
?>



sortall.php:

<?
 // sorts & kills dupes in all the spacegroup files
for($num=1;$num<231;$num++)
 {
     $filename = sprintf("%03d",$num)."_sg.txt";
     $exec = exec("php ../../toolbox/sortkillduplines.php file=".$filename." > temp");
     $exec = exec("./sort.pl temp > temp2");
     $exec = exec("perl -pi -e \"s/^\\n//\" < temp2 > ".$filename);
 }
 unlink("temp");
 unlink("temp2");
?>

or you can directly skip all these steps and download the generated spacegroup files from here. 8)

 

Lattice values are given in Schoenflies Notation where it corresponds to (with Pearson notation):

T : triclinic (AP)
M : prmitive monoclinic (MP)
M-B : base-centered monoclinic (MC)
O : primitive orthorhombic (OP)
O-B : base-centered orthorhombic (OC)
O-V : body-centered orthorhombic (OI)
O-F : face-centered orthorhombic (OF)
Q : primitive tetragonal (TP)
Q-V : body-centered tetragonal (TI)
RH : trigonal (HR)
H : hexagonal (HP)
C : primitive cubic (CP)
C-F : face-centered cubic (CF)
C-V : body-centered cubic (CI)

Quick reference on Bravais Lattices and info on how/why they are called after Bravais but not Frankenheim..

So, as Blake had said, "Energy is Eternal Delight." or in other words, let’s merge the preceding two entries to write a code for a function that produces MySQL queries that will fetch the common elements involved with the given elements:

function fetch_elements($el,$order,$total)
{
    // produces the query to select the elements that have binaries with the $el
    // $order and $total are used to adjust the tab positions and designating the sets
 
    $abc = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    $tab = str_repeat("\t",($total-$order));
    $lom = <<<LOM
$tab SELECT symbol_A AS symb, val1 AS val$abc[$order]
$tab FROM `dbl014`
$tab WHERE symbol_B = "$el"
$tab GROUP BY Symbol_A
$tab UNION
$tab SELECT symbol_B AS symb, val1 AS val$abc[$order]
$tab FROM `dbl014`
$tab WHERE (
$tab     symbol_A = "$el"
$tab     AND symbol_B != ""
$tab )
$tab GROUP BY Symbol_B
LOM;
    return $lom;
}
 
function merge_queries($str1,$str2,$or1,$or2,$tab)
{
    // merges two node (or node group) queries (formed via fetch_elements() function)
    // using the INNER JOIN function
    return "$tab SELECT *\n$tab FROM (\n".$str1."\n$tab ) AS $or1\n$tab INNER JOIN (\n".$str2."\n$tab ) AS $or2\n$tab USING (symb)\n$tab ";
}
 
function fetch_common_elements($ar_el)
{
    // produces the query that selects the common elements that have binaries with all of the elements given in the $ar_el array (using their symbols)
    $total = sizeof($ar_el);
    $abc = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    //$tab = str_repeat("\t",($total-$order-1));
    $str[0] = fetch_elements($ar_el[0],0,$total-1);
    $str[1] = fetch_elements($ar_el[1],1,$total);
    $stri = merge_queries($str[0],$str[1],"A","B",str_repeat("\t",($total-2)));
    for($i=2;$i<$total;$i++)
    {
        $str[$i] = fetch_elements($ar_el[$i],$i,$total);
        $stri = merge_queries($str[$i],$stri,$abc[$i],$abc[($i+1)],str_repeat("\t",($total-$i-1)));
    }
    return $stri;
}

Now, we can call it with, for example:
echo fetch_common_elements(Array("Ga","Fe","Gd","Y","Ti","Ge","Ce"));
Which will output:

 SELECT *
 FROM (
         SELECT symbol_A AS symb, val1 AS valG
         FROM `dbl014`
         WHERE symbol_B = "Ce"
         GROUP BY Symbol_A
         UNION
         SELECT symbol_B AS symb, val1 AS valG
         FROM `dbl014`
         WHERE (
             symbol_A = "Ce"
             AND symbol_B != ""
         )
         GROUP BY Symbol_B
 ) AS G
 INNER JOIN (
         SELECT *
         FROM (
                 SELECT symbol_A AS symb, val1 AS valF
                 FROM `dbl014`
                 WHERE symbol_B = "Ge"
                 GROUP BY Symbol_A
                 UNION
                 SELECT symbol_B AS symb, val1 AS valF
                 FROM `dbl014`
                 WHERE (
                     symbol_A = "Ge"
                     AND symbol_B != ""
                 )
                 GROUP BY Symbol_B
         ) AS F
         INNER JOIN (
                 SELECT *
                 FROM (
                         SELECT symbol_A AS symb, val1 AS valE
                         FROM `dbl014`
                         WHERE symbol_B = "Ti"
                         GROUP BY Symbol_A
                         UNION
                         SELECT symbol_B AS symb, val1 AS valE
                         FROM `dbl014`
                         WHERE (
                             symbol_A = "Ti"
                             AND symbol_B != ""
                         )
                         GROUP BY Symbol_B
                 ) AS E
                 INNER JOIN (
                         SELECT *
                         FROM (
                                 SELECT symbol_A AS symb, val1 AS valD
                                 FROM `dbl014`
                                 WHERE symbol_B = "Y"
                                 GROUP BY Symbol_A
                                 UNION
                                 SELECT symbol_B AS symb, val1 AS valD
                                 FROM `dbl014`
                                 WHERE (
                                     symbol_A = "Y"
                                     AND symbol_B != ""
                                 )
                                 GROUP BY Symbol_B
                         ) AS D
                         INNER JOIN (
                                 SELECT *
                                 FROM (
                                         SELECT symbol_A AS symb, val1 AS valC
                                         FROM `dbl014`
                                         WHERE symbol_B = "Gd"
                                         GROUP BY Symbol_A
                                         UNION
                                         SELECT symbol_B AS symb, val1 AS valC
                                         FROM `dbl014`
                                         WHERE (
                                             symbol_A = "Gd"
                                             AND symbol_B != ""
                                         )
                                         GROUP BY Symbol_B
                                 ) AS C
                                 INNER JOIN (
                                         SELECT *
                                         FROM (
                                                 SELECT symbol_A AS symb, val1 AS valA
                                                 FROM `dbl014`
                                                 WHERE symbol_B = "Ga"
                                                 GROUP BY Symbol_A
                                                 UNION
                                                 SELECT symbol_B AS symb, val1 AS valA
                                                 FROM `dbl014`
                                                 WHERE (
                                                     symbol_A = "Ga"
                                                     AND symbol_B != ""
                                                 )
                                                 GROUP BY Symbol_B
                                         ) AS A
                                         INNER JOIN (
                                                 SELECT symbol_A AS symb, val1 AS valB
                                                 FROM `dbl014`
                                                 WHERE symbol_B = "Fe"
                                                 GROUP BY Symbol_A
                                                 UNION
                                                 SELECT symbol_B AS symb, val1 AS valB
                                                 FROM `dbl014`
                                                 WHERE (
                                                     symbol_A = "Fe"
                                                     AND symbol_B != ""
                                                 )
                                                 GROUP BY Symbol_B
                                         ) AS B
                                         USING (symb)
 
                                 ) AS D
                                 USING (symb)
 
                         ) AS E
                         USING (symb)
 
                 ) AS F
                 USING (symb)
 
         ) AS G
         USING (symb)
 
 ) AS H
 USING (symb)

and when fed to the database will yield:

 

 thank y’all.. 8)

(Un)Common?

June 24, 2008 Posted by Emre S. Tasci

They say that, it is the opposite (unsimilar) couples that make up for much stabler relationships… If you have only 2 different values, (+) and (-), spin up and down, man and woman, then the unsimilar of a thing that is unsimilar to you equals to a thing that is similar to you. Although it is generally not valid for elements, it gives you an insight nonetheless.. So if Y, Gd, Fe and Ga play a similar part in dearest ZrO2‘s life, let’s check for their other common "partners" that are known to have some connection/involvement this or that way…

The table we’re interested in today holds the formula and the constituent elements as in the example below:

 

(Actually the table contains way more properties than the formula and the constituent elements but I’ve focused only to the relevant props)

symbol_A and symbol_B are position oriented wrt the written formula, meaning that for an AxBy binary, you have A for symbol_A and B for symbol_B whereas if it is BxAy, then they are reversed.

We will be looking for elements that form binaries with the aforementioned 4 elements, namely: Yttrium, Gadolinium, Iron and Gallium.

We can search for each of these 4 via the following query and then add it up:

SELECT symbol_A as symb,val1 FROM `dbl014`
   WHERE  symbol_B="Gd" 
   GROUP BY Symbol_A
UNION
SELECT symbol_B as symb, val1 FROM `dbl014`
   WHERE (symbol_A="Gd" AND symbol_B!="") 
   GROUP BY Symbol_B

 

For the intersection of two elements, we glue the two like the above one via the INNER JOIN function:

SELECT * FROM
(
   SELECT symbol_A as symb,val1
      FROM `dbl014`
      WHERE  symbol_B="Gd"
      GROUP BY Symbol_A

   UNION

   SELECT symbol_B as symb, val1
      FROM `dbl014` WHERE (symbol_A="Gd" AND symbol_B!="")
      GROUP BY Symbol_B
) AS A

INNER JOIN

(
   SELECT symbol_A as symb,val1
      FROM `dbl014` WHERE symbol_B="Y" 
      GROUP BY Symbol_A

   UNION

   SELECT symbol_B as symb, val1
      FROM `dbl014` where (symbol_A="Y" AND symbol_B!="") 
      GROUP BY Symbol_B
) AS B
USING (symb)
ORDER BY symb

 

And for the ultimate quattro, it gets even exciting! :

SELECT *
FROM (

    SELECT symbol_A AS symb, val1 AS val
    FROM `dbl014`
    WHERE symbol_B = "Fe"
    GROUP BY Symbol_A
    UNION
    SELECT symbol_B AS symb, val1 AS val
    FROM `dbl014`
    WHERE (
        symbol_A = "Fe"
        AND symbol_B != ""
    )
    GROUP BY Symbol_B
) AS A
INNER JOIN (
    SELECT *
    FROM (

        SELECT symbol_A AS symb, val1 AS valA
        FROM `dbl014`
        WHERE symbol_B = "Gd"
        GROUP BY Symbol_A
        UNION
        SELECT symbol_B AS symb, val1 AS valA
        FROM `dbl014`
        WHERE (
            symbol_A = "Gd"
            AND symbol_B != ""
        )
        GROUP BY Symbol_B
    ) AS A
    INNER JOIN (
        SELECT *
        FROM (

            SELECT symbol_A AS symb, val1 AS valB
            FROM `dbl014`
            WHERE symbol_B = "Ga"
            GROUP BY Symbol_A
            UNION
            SELECT symbol_B AS symb, val1 AS valB
            FROM `dbl014`
            WHERE (
                symbol_A = "Ga"
                AND symbol_B != ""
            )
            GROUP BY Symbol_B
        ) AS A
        INNER JOIN (
            SELECT symbol_A AS symb, val1 AS valC
            FROM `dbl014`
            WHERE symbol_B = "Y"
            GROUP BY Symbol_A
            UNION
            SELECT symbol_B AS symb, val1 AS valC
            FROM `dbl014`
            WHERE (
                symbol_A = "Y"
                AND symbol_B != ""
            )
            GROUP BY Symbol_B
        ) AS B
        USING ( symb )
    ) AS C
    USING ( symb )
) AS D
USING ( symb )
ORDER BY symb

and voila!

a total of 42 all in all. The "val" columns are there for verificational purposes only – since after applying the GROUP BY function, they all lose their meanings.

A code to write code (as in a war to end -all- wars – 8P)

June 19, 2008 Posted by Emre S. Tasci

Lately, I’ve been studying on the nice and efficient algorithm of Dominik Endres and he’s been very helpful in replying to my relentless mails about the algorithm and the software. Now, if you have checked the algorithm, there is a part where iteration is needed to be done over the possible bin boundary distributions. Via a smart workaround he avoids this and uses a recursive equation.

For my needs, I want to follow the brute-force approach. To requote the situation, suppose we have K=4 number of classes, let’s label these as 0,1,2,3. There are 3 different ways to group these classes into 2 (M=2-1=1) bins obeying the following rules:

* There is always at least one classification per bin.

* Bins are adjascent to each other.

* Bins can not overlap.

The number of configurations are given by the formula:

N=(K-1)! / ( (K-M-1)! M!)   (still haven’t brought back my latex renderer, sorry for that!)

 

and these 3 possible configurations for K=4, M=1 are:

|—-|—-|—-|

0            3  : Meaning, the first bin only covers the 0 class while the second bin covers 1, 2, 3.

     1       3 : Meaning, the first bin covers 0, 1 while the second bin covers 2, 3.

         2   3 : Meaning, the first bin covers 0, 1, 2 while the second bin covers 3.

 

Now, think that you are writing a code to get this values, it is easy when you pseudo-code it:

for k_0 = 0 : K-1-M

   for k_1 = k_0+1 : K-M

      …

         for k_(K-3) = k_(K-4)+1 : K-4

            for k_(K-2) = k_(K-3)+1 : K-3

               print("k_0 – k_1 – … – k_(K-2) – K-1")

            endfor

         endfor

      …

   endfor

endfor

easier said than done because the number of the k_ variables and the for loops are changing with M.

So after some trying in Octave for recursive loops with the help of the feval() function, I gave up and wrote a PHP code that writes an Octave code and it was really fun! 8)

<?
//Support for command line variable passing:
for($i=1;$i<sizeof($_SERVER["argv"]);$i++)
{
    list($var0,$val0) = explode("=",$_SERVER["argv"][$i]);
    $_GET[$var0] = $val0;
}

$K = $_GET["K"];
$M = $_GET["M"];

if(!$_GET["K"])$_GET["K"] = 10;
if(!$_GET["M"])$_GET["M"] = 2;
if(!$_GET["f"])$_GET["f"] = "lol.m";

$fp = fopen($_GET["f"],"w");
fwrite($fp, "count=0;\n");

fwrite($fp, str_repeat("\t",$m)."for k0 = 0:".($K-$M-1)."\n");

for($m=1;$m<$M;$m++)
{
    fwrite($fp, str_repeat("\t",$m)."for k".$m." = k".($m-1)."+1:".($K-$M-1+$m)."\n");
}

fwrite($fp, str_repeat("\t",$m)."printf(\"".str_repeat("%d – ",$M)."%d\\n\",");

for($i=0;$i<$m;$i++)
    fwrite($fp, "k".$i.",");
fwrite($fp,($K-1).")\n");

fwrite($fp, str_repeat("\t",$m)."count++;\n");

for($m=$M-1;$m>=0;$m–)
{
    fwrite($fp, str_repeat("\t",$m)."endfor\n");
}

fwrite($fp,"if(count!=factorial(".$K."-1)/(factorial(".$K."-".$M."-1)*factorial(".$M.")))\n\tprintf(\"Houston, we have a problem..\");\nendif\n");

//fwrite($fp,"printf(\"count : %d\\nfactorial : %d\\n\",count,factorial(".$K."-1)/(factorial(".$K."-".$M."-1)*factorial(".$M.")));\n");

fclose($fp);

system("octave ".$_GET["f"]);
?>

Now, if you call this code with for example:

> php genfor.php K=4 M=1

it produces and runs this Octave file lol.m:

count=0;
for k0 = 0:2
    printf("%d – %d\n",k0,3)
    count++;
endfor
if(count!=factorial(4-1)/(factorial(4-1-1)*factorial(1)))
    printf("Houston, we have a problem..");
endif

which outputs:

0 – 3
1 – 3
2 – 3

And you get to keep your hands clean 8)

 

Later addition (27/6/2008) You can easily modify the code to also give you possible permutations for a set with K items and groups of M items (where ordering is not important). For example, for K=5, say {0,1,2,3,4} and M=4:

0-1-2-3
0-1-2-4
0-1-3-4
0-2-3-4
1-2-3-4

5 (=K!/(M!(K-M)!)) different sets.

The modifications we make to the code are:

* Add an optional parameter that will tell the code that, we want permutation instead of binning:

if($_GET["perm"]||$_GET["p"]||$_GET["P"])
{
    $f_perm = TRUE;
    $K+=1;
}

via the f_perm flag, we can check the type of the operation anywhere within the code. K is incremented because, previously, we were setting the end point fixed so that the last bin would always include the last point for the bins to cover all the space. But for permutationing we are not bounded by this fixication so, we have an additional freedom.

* Previously, we were just including the last point in the print() statement, we should fix this if we are permutating:

$str = "";

for($i=0;$i<$m;$i++)
    $str .= "k".$i.",";
if(!$f_perm) fwrite($fp,$str.($K-1).")\n");
else fwrite($fp, substr($str,0,-1).")\n");

The reason for instead of now writing directly to the file in the iteration we just store it in a variable (str) is to deal with the trailing comma if we are permuting, as simple as that!

and the difference between before and after the modifications is:

sururi@dutsm0175 Octave $ diff genfor.php genfor2.php
10a11,16
> if($_GET["perm"]||$_GET["p"]||$_GET["P"])
> {
>     $f_perm = TRUE;
>     $K+=1;
> }
>
21c27,30
< fwrite($fp, str_repeat("\t",$m)."printf(\"".str_repeat("%d – ",$M)."%d\\n\",");

> if(!$f_perm)fwrite($fp, str_repeat("\t",$m)."printf(\"".str_repeat("%d – ",$M)."%d\\n\",");
> else fwrite($fp, str_repeat("\t",$m)."printf(\"".str_repeat("%d – ",($M-1))."%d\\n\",");
>
> $str = "";
24,25c33,35
<     fwrite($fp, "k".$i.",");
< fwrite($fp,($K-1).")\n");

>     $str .= "k".$i.",";
> if(!$f_perm) fwrite($fp,$str.($K-1).")\n");
> else fwrite($fp, substr($str,0,-1).")\n");

for the really lazy people (like myself), here is the whole deal! 8)

<?
//Support for command line variable passing:
for($i=1;$i<sizeof($_SERVER["argv"]);$i++)
{
    list($var0,$val0) = explode("=",$_SERVER["argv"][$i]);
    $_GET[$var0] = $val0;
}


$K = $_GET["K"];
$M = $_GET["M"];

if(!$_GET["K"])$_GET["K"] = 10;
if(!$_GET["M"])$_GET["M"] = 2;
if(!$_GET["f"])$_GET["f"] = "lol.m";

if($_GET["perm"]||$_GET["p"]||$_GET["P"])
{
    $f_perm = TRUE;
    $K+=1;
}

$fp = fopen($_GET["f"],"w");
fwrite($fp, "count=0;\n");

fwrite($fp, str_repeat("\t",$m)."for k0 = 0:".($K-$M-1)."\n");

for($m=1;$m<$M;$m++)
{
    fwrite($fp, str_repeat("\t",$m)."for k".$m." = k".($m-1)."+1:".($K-$M-1+$m)."\n");
}

if(!$f_perm)fwrite($fp, str_repeat("\t",$m)."printf(\"".str_repeat("%d – ",$M)."%d\\n\",");
else fwrite($fp, str_repeat("\t",$m)."printf(\"".str_repeat("%d – ",($M-1))."%d\\n\",");

$str = "";

for($i=0;$i<$m;$i++)
    $str .= "k".$i.",";
if(!$f_perm) fwrite($fp,$str.($K-1).")\n");
else fwrite($fp, substr($str,0,-1).")\n");

fwrite($fp, str_repeat("\t",$m)."count++;\n");

for($m=$M-1;$m>=0;$m–)
{
    fwrite($fp, str_repeat("\t",$m)."endfor\n");
}

fwrite($fp,"if(count!=factorial(".$K."-1)/(factorial(".$K."-".$M."-1)*factorial(".$M.")))\n\tprintf(\"Houston, we have a problem..\");\nendif\n");

fwrite($fp,"printf(\"count : %d\\nfactorial : %d\\n\",count,factorial(".$K."-1)/(factorial(".$K."-".$M."-1)*factorial(".$M.")));\n");

fclose($fp);

system("octave ".$_GET["f"]);
?>

After the modifications,

php genfor.php K=5 M=3 p=1

will produce

0 – 1 – 2
0 – 1 – 3
0 – 1 – 4
0 – 2 – 3
0 – 2 – 4
0 – 3 – 4
1 – 2 – 3
1 – 2 – 4
1 – 3 – 4
2 – 3 – 4

When you have some nice discrete random variables, or more correctly, some sets of nice discrete random variables and given that they behave well (like the ones you encounter all the time in the textbook examples), calculating the entropies and then moving on to the deducing of mutual information between these sets is never difficult. But things don’t happen that well in real situations: they tend to get nastier. The methods you employ for binary-valued sets are useless when you have 17000 different type of data among 25000 data points.

As an example, consider these two sets:

A={1,2,3,4,5}

B={1,121,43,12,100}

which one would you say is more ordered? A? Really? But how? — each of them has 5 different values which would amount to the same entropy for both. Yes, you can define two new sets based on these sets with actually having the differences between two adjacent members:

A’ = {1,1,1,1}

B’ = {120,-78,-31,88}

and the answer is obvious. This is really assuring – except the fact that it doesn’t have to be this obvious and furthermore, you don’t usually get only 6 membered sets to try out variations.

For this problem of mine, I have been devising a method and I think it’s working well. It’s really simple, you just bin (down) your data according some criteria and if it goes well, at the end you have classified your data points in much more smaller number of bins.

Today, I re-found the works of Dominik Endres and his colleagues. Their binning method is called the Bayesian Binning (you can find the detailed information at Endres’ personal page) and it pretty looks solid. I’ll still be focusing on my method (which I humbly call Tracking the Eskimo and the method itself compared to the Bayesian Binning, seems very primitive) but it’s always nice to see that, someone also had a similar problem and worked his way through it. There is also the other thing which I mention from time to time : when you treat your information as data, you get to have nice tools that were developed not by researchers from your field of interest but from totally different disciplines who were also at one time visited the 101010 world. 8) (It turns out that Dominik Endres and his fellows are from Psychology. One other thing is that I had begun thinking about my method after seeing another method for cytometry (the scientific branch which counts the cells in blood flow – or smt like that 8) applications…)

References:

D. Endres and P. Földiák, Bayesian bin distribution inference and mutual information, IEEE Transactions on Information Theory, vol. 51, no. 11, pp. 3766-3779, November 2005 DOI: 10.1109/TIT.2005.856954

D. Endres and P. Földiák, Exact Bayesian Bin Classification: a fast alternative to Bayesian Classification and its application to neural response analysis,
Journal of Computational Neuroscience, 24(1): 24-35, 2008. DOI: 10.1007/s10827-007-0039-5.

D. Endres, M. Oram, J. Schindelin and P. Földiák, Bayesian binning beats approximate alternatives: estimating peri-stimulus time histograms,
pp. 393-400, Advances in Neural Information Processing Systems 20, MIT Press, Cambridge, MA, 2008 http://books.nips.cc/nips20.html