Our previous study extracts human-readable topics from a set of microblog posts. Building on the idea of identifying the topics of a crowd of microblog users, we have recently come up with a way to represent microblog topics semantically for machine consumption. The source code of the prototype is published. To install the topic identification approach on a Linux machine, follow these steps.
  • Install R
  • Make sure that Rscript works from the command line
  • Install php-cli (PHP command-line interface), version > 5
  • Make sure that php-curl is installed
  • Make sure that shell_exec works in php-cli
  • Obtain a TagMe API key
  • Download the SBounTI package and extract it in an empty directory
  • Edit cfg/config.php as needed (for example, the base URLs of the resources that will be produced and the TagMe API key)
  • Obtain a microblog post dataset of about 5,000 posts, either
    • as a text file with one short text per line
    • or as a raw file retrieved from the Twitter streaming API
  • Issue command:
    • ./sbounti <filename> "<dataset_name>" "<start_date>" "<end_date>"
      for the text file
    • ./sbounti <filename> "<dataset_name>"
      for the raw Twitter streaming API file
    Here <filename> is the name of the file that contains the short messages, <dataset_name> is the name used in the descriptions of the resources expressed in OWL, and <start_date> and <end_date> are valid start and end date-times of the post set, in the format of this example: Wed Sep 21 11:01:56 +0300 2016.
  • The contents of the produced OWL file are written to STDOUT, so you may want to redirect the output to a file by appending "> filename.owl" to the command.
  • If you have questions, please contact Ahmet Yildirim.
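
The prerequisite checks and the date format from the steps above can be sketched as a few shell helpers. This is only an illustration, not part of the SBounTI package: the function names (check_cmd, check_php, format_post_date) are made up for this sketch, format_post_date assumes GNU date (the -d @N form), and it assumes single-digit days are zero-padded, which the example date above does not show. The file names in the final comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helpers for checking the prerequisites listed above.
# None of these functions ship with SBounTI.

# Report whether a command is available on PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check php-cli, the curl extension, and that shell_exec returns output.
check_php() {
    check_cmd php || return 1
    # "php -m" lists the loaded extensions, one per line.
    if php -m 2>/dev/null | grep -qi '^curl$'; then
        echo "ok: php-curl"
    else
        echo "missing: php-curl"
    fi
    # If shell_exec is disabled, the output will not be exactly "ok".
    if [ "$(php -r 'echo shell_exec("echo ok");' 2>/dev/null)" = "ok" ]; then
        echo "ok: shell_exec"
    else
        echo "not working: shell_exec"
    fi
}

# Print a Unix timestamp in the layout of the example date,
# e.g. "Wed Sep 21 11:01:56 +0300 2016". Assumes GNU date.
# The time zone follows the TZ environment variable.
format_post_date() {
    LC_ALL=C date -d "@$1" '+%a %b %d %H:%M:%S %z %Y'
}

# Possible usage (not executed here; file and dataset names are placeholders):
#   check_cmd Rscript
#   check_php
#   ./sbounti posts.txt "my dataset" \
#       "$(format_post_date 1474444916)" "$(format_post_date 1474531316)" > output.owl
```

Running check_cmd Rscript and check_php before the first ./sbounti call may help find a missing dependency early; the checks only print a status line for each item and do not change anything on the system.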