Removing duplicate rows in SQL

Since I’ ve had to look this up twice in the last year, I should blog about it so I’ll remember. This is the quickest solution I’ve found.

Problem:
You have duplicate rows in your database. Even if you have an AUTOINCREMENT primary key, that doesn’t necessarily prevent you from having rows with identical information.

Solution:
Find a column that will uniquely identify a record. If there isn’t one column, find a combination of columns. If there isn’t a combination of columns that uniquely identify your row, you probably have database design issues.

CREATE TEMPORARY TABLE nonduplicates LIKE table_with_duplicates;
INSERT INTO nonduplicates SELECT * FROM table_with_duplicates GROUP BY unique_column;

To use a combination of columns, do:

INSERT INTO nonduplicates SELECT * FROM table_with_duplicates GROUP BY unique_column_1, unique_column_2, etc...;

If you have a very large data set, this technique is going to take a lot of time and resources. It may not even be possible due to memory constraints. In that case, here is an article on analyzing duplicate rows without using temporary tables. In essence, you JOIN a table to itself in a subquery that matches all of your unique_columns against themselves with a HAVING clause that checks for an id lower or higher than the MAX(id) or MIN(id) depending on whether or not you want to keep the older or newer records. You could also use a timestamp field for this if there is one. This allows you much finer control over which records you keep. And in the time it will take you to understand my explanation, you could have read the (commented!) SQL code in the link above and understood it much better.

Of the many languages I work in and have worked with, SQL is the most unintuitive to me. I ran across the above link while idly searching for links on SQL and set theory. I can think of functional languages in terms of mathematical functions, but I don’t have that mapping in my brain with SQL. I think it’s the syntax. It’s always felt so awkward and COBOL-ish to me. I guess I’m spoiled by Python.

Posted in Geek, Web Development | Tagged , , , | Leave a comment

Life with OS X on my Hackintosh

So on to what I’ve learned from getting hacked. I’ve learned that Windows is not to be trusted with anything. I’m still not sure whether I got the virus by transferring a file from another computer or if it was from a Vista security hole.

My experience with my Hackintosh has been very positive, except it (a Dell Mini 10v) doesn’t get anywhere near the battery life of my Samsung NC310. It doesn’t auto-dim the screen when on battery power.

It also has problems with hibernation/safe sleep. I’m getting the message “hibernation image too old” and this is after I had to use the pmset command in Terminal. Is there really no menu setting for this in OS X? I have no problem going into the command line, but I’d expect Apple to have some shiny, lickable interface for this.

Firefox 3.6 is almost unbearably slow on this. Chrome feels more or less like a desktop browser if you reasonably limit your tabs. Safari is fine, but not quite as peppy as Chrome.

Two-finger vertical scrolling works, which is nice, but the Dell Mini 10V’s  trackpad is a bit uncomfortably small. I can’t get horizontal gestures, three-finger gestures, or pinch and zoom to work, but maybe this is possible and I’m just not doing it right.

I’m sure that a lot of these problems could be solved with fairly simple tweaks. I haven’t had time to research it as I’m working hard on the new version of Yodelscope. I didn’t even hack this Hackintosh myself, which is shameful to my nerd status. An Apple mega-nerd friend of mine gave it to me to play with Apple’s various dev kits (He also makes an iPhone app called How Did I Get Home? which you should definitely check out). I will get around to turning this thing into a triple-booting little beast one of these days.

I also ran across this really interesting Cult of Mac interview with former Apple CEO John Sculley. I’m too attached to the idea(l)s of open source to join the Mac (or more appropriately Apple) cult, but this article certainly does help cement Jobs’ mythical status. As a young nerd, I read a lot about Apple, but many things focused on Steve Wozniak’s technical genius rather than the organizational/design genius of Jobs. Call me culturally clueless, but this interview is what first drew this to my attention.

Posted in Geek | Tagged , , , , , | Leave a comment

Site hacked!

If you came here or to some of my other sites recently, you may have noticed that Google blacklisted them as site distributing malware. This was a result of a virus that infected my PC and used a keylogger to steal passwords. Then, on Sept. 25, it used those passwords to download files via FTP and re-upload them with malicious Javascript files included from addonrock.ru and alienradar.ru. It did this to any file named index or default with an extension of .php or .html. It also affected many .js files, adding a document.write call to insert malware scripts.

The way this all started is that I had a Windows Vista laptop that was sitting around forever and when I connected it to the Internet, it must have gotten infected before I had a chance to apply essential Windows upgrades. This was months ago and I don’t think that infection specifically was the keylogger. I cleaned that virus up as much as I could using advice from www.bleepingcomputer.com and tools like Malware Bytes’ Malware Defender and Spybot Search & Destroy. Unfortunately, I don’t think that totally removed whatever virus it was. There was some part of the trojan left lurking around, dormant. I think it later re-activated to download another part or an updated version of itself, which is what stole my passwords. I’m still not sure whether to just totally wipe the hard drive on my Windows laptop. Painstakingly removing all traces of the viruses could a learning experience, but I can think of slightly more fun ways to spend my time.

So, now I’m typing this from OS X, on  a little Hackintosh netbook, which I got from my pal over at Dopie’s Life. It’s probably a stepping stone to an all-Linux life. Or maybe I’ll get really paranoid about security and use some customized NetBSD install with encrypted everything. I’m already looking at encrypting my files using something like TrueCrypt.

Posted in Uncategorized | 4 Comments

Microsoft Silverlight broke Netflix

I went to stream some movies from Netflix today and got a DRM error (N8151) instead of the film I wanted to watch. After a quick call with Netflix’s very helpful staff, they informed me that it was a problem with Microsoft Silverlight’s latest version and I would have to downgrade to version 3 (download here: http://www.microsoft.com/getsilverlight/handlers/getsilverlight.ashx?v=3.0) in order to fix it. Apparently the latest version of Silverlight just refuses to accept the certificate.

This worked, but…. so annoying! I develop using SSH/Emacs (and screen in case I get disconnected) so I can generally use any platform. I work on OS X, Windows, and Linux on different contracts. All my home systems are Windows for a bunch of programs unavailable elsewhere. PuTTY, Cygwin and XAMPP allow me to do work. I’ve never been convinced of the innate superiority of Apple (especially now that I own an iPhone), but between this and getting a virus on my Vista laptop by doing NOTHING except being online, I’m getting pretty fed up.

Posted in Uncategorized | Leave a comment

New Section: Portfolio!

On the Portfolio page, you’ll find links to all the websites I can think of that I’ve worked on. Wow, this is starting to resemble a real website.

Posted in Uncategorized | Leave a comment

Welcome!

I’m Michael Buell and this is my blog where I’ll be posting various things related to my profession: programming and software development, usually for dynamic websites and web applications.

My languages of choice for web development are Python, Django, and jQuery. I’ve also done a fair amount of work in PHP… under duress. I have done work on everything from e-commerce sites to iPhone in-app purchase servers to custom enterprise tools, all using open source technology wherever possible.

I’ll be discussing all of these topics, plus whatever other nerdery springs to mind. This will also be the repository for all my notes on programming, server setup, etc. Partly for my own reference, hopefully to help other people solve the same problems.

Alright! I look forward to hearing from you, my non-existent audience!

Posted in Uncategorized | Leave a comment