I have just uploaded a single file to GitHub that computes the cosine similarity of two text files. Cosine similarity is a widely used measure of similarity between two texts. The script takes the two files as command-line parameters and writes the result to STDOUT. To use it, php-cli (the PHP command-line interpreter) must be installed and working. I did not write all of the code myself: it is built on ideas and code fragments from various web sites, which I adapted to PHP. I used this code to identify the topics of a microblog post set, as introduced in this post.
Usage: php computeCosine.php file1 file2
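As a rough illustration of the idea (not the code in the repository), each text can be turned into a vector of word counts. The cosine similarity is then the dot product of the two vectors divided by the product of their lengths. The sketch assumes simple lowercase ASCII tokenization and raw term counts, and the function names termFrequencies() and cosineSimilarity() are hypothetical.

```php
<?php
// Minimal sketch, not the actual computeCosine.php: cosine similarity of two
// texts treated as bag-of-words term-frequency vectors. Tokenization is an
// assumption: lowercase ASCII letters and digits only.

// Split a text into word tokens and count how often each one occurs.
function termFrequencies(string $text): array
{
    $tokens = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_count_values($tokens);
}

// cos(A, B) = (A . B) / (|A| * |B|); returns 0.0 if either text has no tokens.
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    foreach ($a as $term => $count) {
        if (isset($b[$term])) {
            $dot += $count * $b[$term];
        }
    }
    $normA = sqrt(array_sum(array_map(fn($c) => $c * $c, $a)));
    $normB = sqrt(array_sum(array_map(fn($c) => $c * $c, $b)));
    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0;
    }
    return $dot / ($normA * $normB);
}
```

A command-line script in the spirit of computeCosine.php could then print `cosineSimilarity(termFrequencies(file_get_contents($argv[1])), termFrequencies(file_get_contents($argv[2])))`. Texts with identical word distributions score 1.0 and texts with no shared words score 0.0. Weighting terms (for example with TF-IDF) is a common variation that is not shown here.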

One of the most common functions in Ethereum application development is Keccak-256 hashing. Keccak256 is used for Ethereum message signing, for deriving a wallet address from a public key, for determining a smart contract's address, and for computing the function selector that is passed as transaction data for a given function prototype (as in ABI encoding). I looked around but could not find a usable PHP implementation: the ones I found were either unfinished, like this one, or too complicated to use, like this one or this one.
All I needed was a simple keccak256() function that returns the Keccak-256 hash of a hex-encoded string, so I decided to write it myself. After a bit more searching I found a clear C program on this page. All that remained was to make this code work in PHP. Fortunately, PHP has a mechanism for extending its functionality with native extensions. I followed the guide here and obtained a working keccak256 function.
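To show what the C code computes, here is a sketch of Keccak-256 written in plain PHP. It is not the code in the repository and is much slower than a native extension; the function names rotl64(), keccakF1600(), keccak256Bytes() and keccak256Hex() are hypothetical. The hex-in, hex-out wrapper mirrors the keccak256() interface described in this post, and a 64-bit PHP build (7.4 or later) is assumed.

```php
<?php
// Illustrative pure-PHP Keccak-256, not the author's C extension. It uses
// the original Keccak padding (0x01 ... 0x80) that Ethereum relies on, which
// differs from NIST SHA3-256 (0x06 padding). Assumes 64-bit PHP 7.4+.

// Rotate a 64-bit lane left by $n bits; PHP's >> is arithmetic, so the
// ~(-1 << $n) mask clears the copied sign bits.
function rotl64(int $x, int $n): int
{
    return ($x << $n) | (($x >> (64 - $n)) & ~(-1 << $n));
}

// Keccak-f[1600] permutation on 25 lanes indexed as x + 5 * y.
function keccakF1600(array &$a): void
{
    static $rot = [0, 1, 62, 28, 27, 36, 44, 6, 55, 20, 3, 10, 43,
                   25, 39, 41, 45, 15, 21, 8, 18, 2, 61, 56, 14];
    static $rcHex = [
        '0000000000000001', '0000000000008082', '800000000000808A', '8000000080008000',
        '000000000000808B', '0000000080000001', '8000000080008081', '8000000000008009',
        '000000000000008A', '0000000000000088', '0000000080008009', '000000008000000A',
        '000000008000808B', '800000000000008B', '8000000000008089', '8000000000008003',
        '8000000000008002', '8000000000000080', '000000000000800A', '800000008000000A',
        '8000000080008081', '8000000000008080', '0000000080000001', '8000000080008008',
    ];
    for ($round = 0; $round < 24; $round++) {
        // Theta: mix in the parities of the two neighbouring columns.
        $c = [];
        for ($x = 0; $x < 5; $x++) {
            $c[$x] = $a[$x] ^ $a[$x + 5] ^ $a[$x + 10] ^ $a[$x + 15] ^ $a[$x + 20];
        }
        for ($x = 0; $x < 5; $x++) {
            $d = $c[($x + 4) % 5] ^ rotl64($c[($x + 1) % 5], 1);
            for ($y = 0; $y < 25; $y += 5) {
                $a[$x + $y] ^= $d;
            }
        }
        // Rho and pi: rotate each lane and move it to position (y, 2x + 3y).
        $b = [];
        for ($x = 0; $x < 5; $x++) {
            for ($y = 0; $y < 5; $y++) {
                $b[$y + 5 * ((2 * $x + 3 * $y) % 5)] = rotl64($a[$x + 5 * $y], $rot[$x + 5 * $y]);
            }
        }
        // Chi: non-linear step along each row.
        for ($y = 0; $y < 25; $y += 5) {
            for ($x = 0; $x < 5; $x++) {
                $a[$x + $y] = $b[$x + $y] ^ (~$b[($x + 1) % 5 + $y] & $b[($x + 2) % 5 + $y]);
            }
        }
        // Iota: XOR the round constant, built from two 32-bit halves so it
        // never overflows PHP's signed 64-bit integers.
        $h = $rcHex[$round];
        $a[0] ^= (hexdec(substr($h, 0, 8)) << 32) | hexdec(substr($h, 8));
    }
}

// Keccak-256 of a binary string: 136-byte rate, 32-byte output.
function keccak256Bytes(string $message): string
{
    $rate = 136; // bytes absorbed per block (1600 - 2 * 256 bits)
    // Pad with 0x01, then zeros, then set the top bit of the block's last byte.
    $padLen = $rate - strlen($message) % $rate;
    $padded = $message . "\x01" . str_repeat("\x00", $padLen - 1);
    $last = strlen($padded) - 1;
    $padded[$last] = chr(ord($padded[$last]) | 0x80);
    $state = array_fill(0, 25, 0);
    for ($off = 0; $off < strlen($padded); $off += $rate) {
        // Absorb: XOR 17 little-endian 64-bit lanes into the state.
        for ($i = 0; $i < 17; $i++) {
            $lane = 0;
            for ($k = 0; $k < 8; $k++) {
                $lane |= ord($padded[$off + 8 * $i + $k]) << (8 * $k);
            }
            $state[$i] ^= $lane;
        }
        keccakF1600($state);
    }
    // Squeeze: the first 4 lanes, little-endian, give the 32-byte digest.
    $out = '';
    for ($i = 0; $i < 4; $i++) {
        for ($k = 0; $k < 8; $k++) {
            $out .= chr(($state[$i] >> (8 * $k)) & 0xFF);
        }
    }
    return $out;
}

// Hex in, hex out, mirroring the interface described for keccak256().
function keccak256Hex(string $hex): string
{
    return bin2hex(keccak256Bytes(hex2bin($hex)));
}
```

Two well-known values can be used to check such an implementation: the hash of the empty input is c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470, and the hash of the text transfer(address,uint256) begins with a9059cbb, the ERC-20 transfer selector.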
I published the code on this GitHub page for anyone interested.
Before installing, make sure the PHP development files (which provide phpize) are installed. To install the keccak256() function:
  • git clone https://github.com/RnDevelover/Keccak256PHP.git
  • cd Keccak256PHP
  • phpize
  • ./configure --enable-keccak256
  • make
  • copy modules/keccak256.so to your PHP extension directory
  • enable the extension by adding extension=keccak256.so to your php.ini
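PHP itself can report the extension directory that the copy step refers to, and whether the module was picked up after editing php.ini. The following sketch assumes the module registers itself under the name "keccak256"; the helper name keccakExtensionStatus() is hypothetical.

```php
<?php
// Sketch: report where PHP looks for extensions and whether keccak256 is
// usable. Assumes the module registers under the name "keccak256".
function keccakExtensionStatus(): array
{
    return [
        'extension_dir' => (string) ini_get('extension_dir'),
        'loaded'        => extension_loaded('keccak256'),
        'function'      => function_exists('keccak256'),
    ];
}

// Print each entry, showing booleans as yes/no.
foreach (keccakExtensionStatus() as $key => $value) {
    echo $key, ': ', is_bool($value) ? ($value ? 'yes' : 'no') : $value, PHP_EOL;
}
```

Run it with php-cli; note that the command-line interpreter and a web server may read different php.ini files, so the extension may need to be enabled in both. Listing modules with `php -m` is a quick alternative check.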
$a="cc"; // Hex-encoded input string; all characters are [0-9a-fA-F].
$hash=keccak256($a); // Keccak-256 of the byte(s) encoded by $a.
echo $hash; // Hex-encoded hash.

The function returns the 256-bit hash as a hex-encoded string of 64 characters, where each pair of characters encodes one byte.
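Because the input and output are both hex strings, two of the Ethereum uses listed in this post reduce to simple string handling. The sketch assumes the extension's keccak256() is loaded (the call is guarded); the helper names hexInputFromText(), selectorFromHash() and addressFromHash() are hypothetical.

```php
<?php
// Sketch: turning keccak256()'s 64-character hex output into Ethereum
// identifiers. Helper names are hypothetical.

// keccak256() takes hex input, so a text prototype must be hex-encoded first.
function hexInputFromText(string $text): string
{
    return bin2hex($text);
}

// Function selector: the first 4 bytes (8 hex characters) of the hash.
function selectorFromHash(string $hashHex): string
{
    return '0x' . substr($hashHex, 0, 8);
}

// Address: the last 20 bytes (40 hex characters) of the hash of the
// 64-byte public key (uncompressed form without the leading 0x04 byte).
function addressFromHash(string $hashHex): string
{
    return '0x' . substr($hashHex, -40);
}

// Only runs when the Keccak256PHP extension is loaded.
if (function_exists('keccak256')) {
    echo selectorFromHash(keccak256(hexInputFromText('transfer(address,uint256)'))), PHP_EOL;
}
```

With the extension loaded, the last statement should print 0xa9059cbb, the well-known selector of the ERC-20 transfer(address,uint256) function.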
Our previous study extracts human-readable topics from a set of microblog posts. Building on the idea of identifying the topics of a crowd of microblog users, we have recently developed an approach for semantically representing microblog topics for machine consumption. The source code of the prototype is published. To install the topic identification approach on a Linux machine, follow these steps:
  • Install R
  • Make sure that Rscript is running
  • Install php-cli (PHP command-line interface), version > 5
  • Make sure that php-curl is installed
  • Make sure that shell_exec is working in php-cli
  • Obtain a TagMe API key
  • Download the SBounTI package and extract it into an empty directory
  • Edit cfg/config.php as needed (e.g. the base URLs of the resources to be produced and the TagMe API key)
  • Obtain a microblog post dataset of about 5,000 posts, either
    • as a text file with one short text per line
    • or as a raw file retrieved from the Twitter streaming API
  • Run the command:
    • ./sbounti <filename> "<dataset_name>" "<start_date>" "<end_date>"
      for the text file
    • ./sbounti <filename> "<dataset_name>"
      for the raw Twitter streaming API file
    where <filename> is the name of the file containing the short messages, <dataset_name> is the name used in the descriptions of the resources expressed in OWL, and <start_date> and <end_date> are the start and end date-times of the post set, in the format of this example: Wed Sep 21 11:01:56 +0300 2016.
  • The generated OWL content is written to STDOUT, so you may want to redirect it to a file by appending "> filename.owl" to the command.
  • If you have questions, please contact Ahmet Yildirim.
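For the text-file variant, the two dates have to be supplied in the layout of the example above. The following is a sketch of preparing these arguments and the full command line in PHP; the helper names postDateRange() and sbountiCommand() are hypothetical, and the ISO 8601 input timestamps are only an illustrative assumption about where the post times come from.

```php
<?php
// Sketch: preparing arguments for ./sbounti. The date layout matches the
// example "Wed Sep 21 11:01:56 +0300 2016"; helper names are hypothetical.
const POST_DATE_FORMAT = 'D M d H:i:s O Y';

// Earliest and latest of the given timestamps, formatted for the command line.
function postDateRange(array $timestamps): array
{
    $dates = array_map(fn($t) => new DateTimeImmutable($t), $timestamps);
    usort($dates, fn($a, $b) => $a <=> $b);
    return [
        $dates[0]->format(POST_DATE_FORMAT),
        $dates[count($dates) - 1]->format(POST_DATE_FORMAT),
    ];
}

// Build the text-file variant of the command, quoting each argument for the
// shell and redirecting the OWL output to a file.
function sbountiCommand(string $file, string $dataset, string $start, string $end, string $owlFile): string
{
    return sprintf(
        './sbounti %s %s %s %s > %s',
        escapeshellarg($file),
        escapeshellarg($dataset),
        escapeshellarg($start),
        escapeshellarg($end),
        escapeshellarg($owlFile)
    );
}
```

Quoting matters because both the dataset name and the dates contain spaces. The resulting string could be run from a shell or passed to shell_exec(), which the steps above already require to be working.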
