László Monda's Blog
Exploring the cyberspace, one quadrant at a time!

Let's back up our tweets using twitter-backup.sh

September 5th, 2010

I've written a very simple Bash script to back up my tweets. It's very easy to use:

$ ./twitter-backup.sh
Usage ./twitter-backup.sh TWITTER-USERNAME
$ ./twitter-backup.sh mondalaci
2010-09-05 17:46:04 URL:http://twitter.com/statuses/user_timeline/mondalaci.xml?page=1 [40899/40899] -> "twitter-backup-mondalaci-2010-09-05_17-46-03/1.xml" [1]
2010-09-05 17:46:06 URL:http://twitter.com/statuses/user_timeline/mondalaci.xml?page=2 [42928/42928] -> "twitter-backup-mondalaci-2010-09-05_17-46-03/2.xml" [1]
2010-09-05 17:46:07 URL:http://twitter.com/statuses/user_timeline/mondalaci.xml?page=3 [42753/42753] -> "twitter-backup-mondalaci-2010-09-05_17-46-03/3.xml" [1]
2010-09-05 17:46:09 URL:http://twitter.com/statuses/user_timeline/mondalaci.xml?page=4 [42784/42784] -> "twitter-backup-mondalaci-2010-09-05_17-46-03/4.xml" [1]
2010-09-05 17:46:11 URL:http://twitter.com/statuses/user_timeline/mondalaci.xml?page=5 [42872/42872] -> "twitter-backup-mondalaci-2010-09-05_17-46-03/5.xml" [1]
2010-09-05 17:46:11 URL:http://twitter.com/statuses/user_timeline/mondalaci.xml?page=6 [6465/6465] -> "twitter-backup-mondalaci-2010-09-05_17-46-03/6.xml" [1]
2010-09-05 17:46:12 URL:http://twitter.com/statuses/user_timeline/mondalaci.xml?page=7 [75/75] -> "twitter-backup-mondalaci-2010-09-05_17-46-03/7.xml" [1]

I followed the holy way of Unix, the KISS principle, when developing this little script:

#!/bin/bash

if [ $# -ne 1 ]; then
    echo "Usage $0 TWITTER-USERNAME"
    exit 1
fi

username=$1

backup_dir="twitter-backup-$username-$(date +%Y-%m-%d_%H-%M-%S)"
mkdir "$backup_dir" || exit 1

page=1
while true; do
    dest_file="$backup_dir/$page.xml"
    # Quote the URL so that the shell never treats the "?" as a glob character.
    wget -nv -O "$dest_file" "http://twitter.com/statuses/user_timeline/$username.xml?page=$page"
    page_size=$(stat -c%s "$dest_file")

    if [ "$page_size" -lt 1000 ]; then
        break  # We've reached a final, empty page so let's exit from the loop.
    fi

    page=$((page + 1))
done

rm "$dest_file"  # Delete the last, empty page.
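
The stop condition of the loop can also be isolated into a small helper. This is only a sketch: the function name is_final_page and its optional threshold argument are hypothetical and not part of twitter-backup.sh, and wc -c is swapped in for stat -c%s on the assumption that the script might have to run where GNU stat is not available.

```shell
#!/bin/bash

# Hypothetical helper, not part of twitter-backup.sh.
# Succeeds (exit status 0) when FILE is smaller than THRESHOLD bytes,
# which the script treats as the sign of a final, empty page.
# Returns 2 if the file cannot be read.
is_final_page() {
    local file=$1
    local threshold=${2:-1000}   # 1000 bytes is the limit used by the script
    local size
    size=$(wc -c < "$file") || return 2
    [ "$size" -lt "$threshold" ]
}
```

Inside the loop, a call such as is_final_page "$dest_file" followed by break could then stand in for the stat call and the size comparison.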

Now let's download twitter-backup.sh and back up our tweets!

Overclock.net Mechanical Keyboard Guide Atom Feed

August 29th, 2010

I use RSS / Atom feeds pretty much all the time to minimize information overload, but the Mechanical Keyboard Guide of Overclock.net doesn't make my job any easier: it doesn't provide any feeds and the thread moves very fast.

I couldn't tolerate this anymore, so I've created a web scraper that provides Atom feeds for this thread. Parsing HTML into a DOM and executing XPath queries on the DOM is something that I have a vast amount of experience with, so this project didn't take long. I've been testing it for more than a month and it's rock solid. The only glitch is that posts made within very short time intervals of each other sometimes show up in random order, which is a minor inconvenience.

The script below is executed on an hourly basis by cron and its output is saved to http://monda.hu/overclock-net-mech-keyboard.xml

<?php
 
include 'config.php';  // include $database_{servername, username, password, dbname}
 
$start_url = 'http://www.overclock.net/computer-peripherals/491752-mechanical-keyboard-guide-10000.html';
 
function DOMinnerHTML($element)
{
    // Borrowed from php.net
    $innerHTML = "";
    $children = $element->childNodes;
    foreach ($children as $child) {
        $tmp_dom = new DOMDocument();
        $tmp_dom->appendChild($tmp_dom->importNode($child, true));
        $innerHTML .= trim($tmp_dom->saveHTML()) . ' ';
    }
    return $innerHTML;
}
 
function page_to_entries($html)
{
    $domdocument = new DOMDocument();
    @$domdocument->loadHTML($html);
    $domxpath = new DomXPath($domdocument);
    $xpath = '/html/body/div[5]/div/div/div/div/table';
    $nodelist = $domxpath->query($xpath);
    $entries = array();
    foreach ($nodelist as $node) {
        $link_node = $node->firstChild->childNodes->Item(2)->childNodes->Item(1);
        $id = $link_node->textContent;
        $url = $link_node->getAttribute('href');
        $username = $node->childNodes->Item(1)->firstChild->childNodes->Item(1)->textContent;
        $comment = DOMInnerHTML($node->childNodes->Item(1)->childNodes->Item(2)->childNodes->Item(8));
        $entry = array('id'=>$id, 'url'=>$url, 'username'=>$username, 'comment'=>$comment);
        $entries[] = $entry;
    }
    return $entries;
}
 
 
function query($sql)
{
    if (($result=mysql_query($sql)) === false) {
        die(mysql_error());
    }
    return $result;
}
 
function register_entry_and_get_timestamp($id)
{
    if (!is_numeric($id)) {
        die("Entry ID is not numeric!");
    }
    $result = query("INSERT IGNORE INTO mechanical_keyboard_guide SET id=$id");
    $result = query("SELECT timestamp FROM mechanical_keyboard_guide WHERE id=$id");
    $row = mysql_fetch_assoc($result);
    $timestamp = strtr($row['timestamp'], ' ', 'T') . 'Z';
    return $timestamp;
}
 
function add_updated_timestamp(&$entries)
{
    for ($i = 0; $i < count($entries); $i++) {
        $entries[$i]['updated'] = register_entry_and_get_timestamp($entries[$i]['id']);
    }
}
 
function print_entry($entry)
{
    $url = $entry['url'];
    $username = htmlspecialchars($entry['username']);
    $comment = $entry['comment'];
    $updated = $entry['updated'];
 
    print "<entry>\n";
    print "     <title>$username</title>\n";
    print "     <link href=\"$url\"/>\n";
    print "     <id>$url</id>\n";
    print "     <updated>$updated</updated>\n";
    print "     <summary type=\"html\"><![CDATA[$comment]]></summary>\n";
    print "</entry>\n";
}
 
// Set up MySQL connection.
if (mysql_connect($database_servername, $database_username, $database_password) === false) {
    die('Failed to connect to the MySQL server.  Please check the $database_servername, $database_username, $database_password variables in config.php');
}
if (mysql_select_db($database_dbname) === false) {
    die('Failed to select the MySQL database.  Please check the $database_dbname variable in config.php');
}
 
// Set up cURL.
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
 
// Fetch last page.
curl_setopt($ch, CURLOPT_URL, $start_url);
$last_page_html = curl_exec($ch);
 
// Get the page ID of the page before the last page.
$last_page_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
preg_match('/-([0-9]+)\.html$/', $last_page_url, $matches);
$last_page_id = $matches[1];
$almost_last_page_id = $last_page_id - 1;
 
// Fetch the page before the last page.
$almost_last_page_url = "http://www.overclock.net/computer-peripherals/491752-mechanical-keyboard-guide-$almost_last_page_id.html";
curl_setopt($ch, CURLOPT_URL, $almost_last_page_url);
$almost_last_page_html = curl_exec($ch);
 
$almost_last_page_entries = page_to_entries($almost_last_page_html);
$last_page_entries = page_to_entries($last_page_html);
$entries = array_merge($almost_last_page_entries, $last_page_entries);
add_updated_timestamp($entries);
$last_updated_timestamp = $entries[count($entries)-1]['updated'];
 
 
print '<?xml version="1.0" encoding="utf-8" ?>';
?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>Overclock.net's Mechanical Keyboards Guide</title>
<subtitle>Overclock.net's Mechanical Keyboards Guide</subtitle>
<link href="http://www.overclock.net/computer-peripherals/491752-mechanical-keyboard-guide.html"/>
<updated><?php print $last_updated_timestamp ?></updated>
<author>
<name>Overclock.net's Mechanical Keyboards Guide</name>
<email>laci_nospam@monda.hu</email>
</author>
<id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>
<?php foreach ($entries as $entry) print_entry($entry) ?>
</feed>
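
The page arithmetic of the script (take the effective URL of the last page, extract the page number, subtract one) can be sketched in Bash as well. The function name previous_page_url is hypothetical. The regular expression follows the one passed to preg_match, with an extra capture group for the part of the URL before the page number.

```shell
#!/bin/bash

# Hypothetical helper illustrating the page arithmetic of the PHP script.
# Given the effective URL of the last page, print the URL of the page
# before it. Returns 1 if the URL does not end in "-<number>.html".
previous_page_url() {
    local url=$1
    local pattern='^(.*)-([0-9]+)\.html$'
    if [[ $url =~ $pattern ]]; then
        local prefix=${BASH_REMATCH[1]}
        local page_id=${BASH_REMATCH[2]}
        # 10# forces base 10 so that a leading zero is not read as octal.
        echo "$prefix-$((10#$page_id - 1)).html"
    else
        return 1
    fi
}
```

Unlike the PHP code, this sketch reports a URL without a page number as an error instead of continuing with an empty match.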

As for the SQL table structure, it's not particularly complex.

CREATE TABLE IF NOT EXISTS `mechanical_keyboard_guide` (
  `id` int(11) NOT NULL,
  `timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
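
The timestamp column feeds the updated elements of the feed: register_entry_and_get_timestamp() replaces the space in MySQL's "YYYY-MM-DD HH:MM:SS" value with a T and appends a Z. A Bash sketch of the same conversion follows. The function name mysql_to_atom_timestamp is hypothetical, and the format check is an addition that the PHP code does not perform. Note that the trailing Z denotes UTC, so the result is only accurate under the assumption that the MySQL server stores its timestamps in UTC.

```shell
#!/bin/bash

# Hypothetical helper mirroring the strtr() call of the PHP script.
# Converts "YYYY-MM-DD HH:MM:SS" to "YYYY-MM-DDTHH:MM:SSZ".
mysql_to_atom_timestamp() {
    local ts=$1
    local pattern='^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}$'
    # Extra check (not in the PHP code): reject anything that does not
    # look like a MySQL timestamp.
    [[ $ts =~ $pattern ]] || return 1
    # Replace the single space with "T" and append "Z" (UTC).
    echo "${ts/ /T}Z"
}
```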

Acer Aspire 8935G-874G100BN laptop disassembly

August 29th, 2010

I disassembled my laptop a while ago. The CPU core temperature was pretty high, sometimes above 85 degrees Celsius. After I disassembled and dusted it, the temperature dropped by 20 degrees, so the operation was a huge success. High temperature has many side effects, like decreased durability, increased power consumption, lower performance, and who knows what else.

If you decide to disassemble your laptop, I can promise you a few things. First, you'll be challenged: laptops are very highly integrated, and if you're coming from the PC world you'll quickly realize that it's a completely different ballgame. Second, it's pretty likely that you'll appreciate your laptop more, as you'll be able to see the vast scale of integration and all its components.

As for me, I'm absolutely fascinated by the internal design of laptops. I think these gadgets truly symbolize the level of technological advancement that humanity has reached so far.

I originally wanted to detail the whole disassembly process, but I've realized that it'd take too much of my time. Despite this, I hope that you'll enjoy the show. Let's get some popcorn and let the ultimate geek porn begin.

Let's see some video of the actual dusting.

G-Cube GUA-54A USB hub disassembly

August 29th, 2010

This hub uses the most popular USB 2.0 hub IC, the GL850G. I'm not sure whether the manufacturer used ultrasonic welding or glue, but this hub cannot be disassembled without significantly damaging the case, and I pretty much hate such solutions. USB hubs are a perfect example of how much additional value an ODM can provide on top of an OEM.

Noname card reader disassembly

August 29th, 2010

It's interesting how one chip handles 4 types of cards. I wonder whether the various card specs are that similar or whether the IC is that highly integrated.

Edimax PS1206UWG print server disassembly

June 8th, 2010

Have you ever seen a print server from the inside? I can appreciate its two-floor design since it makes the device very compact.

3D printing service providers

May 13th, 2010

I'm collecting the ones that allow you to upload STLs and give you a quote:

I'd love to see contributions to this list.

Open Hardware stores

April 25th, 2010

I'm trying to collect all the Open Hardware stores that I know of, but you're encouraged to let me know about others in the comments.

Making a helping hand that actually helps

March 31st, 2010

I've seen some helping hands offered by various shops and I'm not impressed at all. They're usually dirt cheap, and I can say from experience that you get what you pay for. The only viable solution is to build a kickass helping hand for yourself. Fortunately, Instructables has some really great tutorials:

I've been trying to get the parts for a while, but it's very challenging to source the hoses. If I can't get them in Hungary, I'll order them from abroad.

Open Hardware Revolution

February 14th, 2010

I'm very passionate about open hardware. I've been into FOSS for a long time, since about 2000 when I completely switched to Linux, but I've only recently become aware that individuals or small groups can create hardware too.

Hardware is not that fascinating to me in itself. Sure, lots of big companies create well-designed, quality hardware, Apple being one of the best known among them, but I'll never buy their products because these devices are locked and not designed to be exploited to reach their full potential. Putting OpenWrt on my ASUS WL500GPV2 is the best example I can think of for how you can make your device a thousand times more powerful and customizable by replacing the stock firmware. Unfortunately, it's necessary to buy closed hardware in most cases because there are not many open alternatives, but this situation can change in the future, and whenever I can, I choose open hardware.

"In the Next Industrial Revolution, Atoms Are the New Bits" is a fascinating read for anyone interested in the open hardware revolution. "Atoms Are Not Bits; Wired Is Not A Business Magazine" has lots of thought-provoking arguments, and "Are atoms the new bits?" discusses these issues even further. I don't really think that open hardware will ever take over the world and replace closed hardware. The big manufacturers fiercely protect their intellectual property, and most consumers couldn't care less whether they can hack a given piece of hardware because they just wanna use the damn thing (with all its shortcomings, being unaware of its full potential).

Hackers are a different breed. There are several hundred open source projects out there, the most relevant ones being present on Harkopen, Open Innovation Projects and Open Manufacturing. RepRap is the flagship project of the revolution, and rightly so, because it's very rare for the open hardware community to create something this complex that works this well, even if the quality of the models it creates lags way behind the commercial alternatives. I think open hardware is not so widespread because 1) most of the projects are technically minded and aren't practical for the average Joe, 2) most creators are only interested in implementing, not distributing, the projects, 3) these teams don't have any marketing / business experience and 4) the economies of scale are against us (until we conquer the world).

I definitely have to work on 3) but the Ultimate Keyboard is gonna be ready in the not too distant future. I don't mind learning non-technical stuff to make it happen.