Dec17

Responsible administration for you and your users

certificates servers administration | comments

When I was constructing my own websites, for myself or for clients, I was always bothered by one thing:

I didn't have an SSL certificate for any of my domains.

This is due to many reasons: SSL certificates are tied to a single domain, they cost a lot of money, and they are tied to an IP address, which can make shared hosting hard. Furthermore, I don't feel that shelling out money for some SSL authority to issue me a certificate really means anything – why are they qualified to say anything (good or bad) about my website.

Unfortunately, there are only two alternatives: don't use SSL, or use a self-signed SSL certificate.

Not using SSL is not really an option if you are sending sensitive information of any kind over the Internet. For example, the password to login to the backend of any site.

If you have a user-facing website that is selling things, i.e., asking people for their credit card information, you don't have much of a choice. You really need to use SSL, and you have to use a real registered SSL certificate, since you don't want to scare away users.

However, for almost any other case, I think that they are just not necessary. Take this website, for example: the only person who logs into this domain is me. I certainly trust any certificate I sign, so a self-signed certificate makes a lot of sense. I can go ahead and "Accept the risk", as Firefox asks me to, and confirm an exception for my own site. I will then sleep easy knowing that my password is sent encrypted.

Even if I make a site for a client, I would be happy to give them the option: we can use this self-signed certificate for you to login to the backend, or they can front the cost for a "real" SSL certificate. However, nobody likes to waste money, and buying a certificate for only a few people to use a backend is a silly idea.

Creating a self-signed certificate with OpenSSL is easy, and installing it in Apache is also easy.

Don't let you or your clients down by neglecting to give them the option for free SSL.

Extra credit question: why aren't there are any free SSL signing authorities out there?

Sep26

Assembly language programming under OS X with NASM

nasm mac osx assembly programming | comments

One of my favorite passions from my teenage days was assembly language programming. Don't laugh.

It embodies a lot of my favorite things about programming: I have total control, it is clean and simple, and it is just and fast and functional as I am capable of making it. The only thing standing in the way of me and world domination is how well I can program.

Well, with some minor exceptions. Having a decent assembler is really key, and for x86 architectures, there are many choices. If you are hacking a small function together to support some higher-level language, then maybe you can get by using MASM (for Windows/DOS), or, if you really hate yourself, gas (for every platform).

But, if you intend to spend some time programming in assembly and not hating every minute of it, then you need to use NASM, probably the best assembler for the x86 family, ever (or possibly Yasm, a NASM clone that I have no experience with).

So, this past week I was interested in playing some more with NASM, and so I thought that I would see how what I could do under OS X (I previously worked primarily in DOS). Unfortunately, assembly language support in OS X is fairly hampered if you want follow its standard calling conventions for 32-bit x86 code. The innocuous-looking statement "The stack is 16-byte aligned at the point of function calls" seems innocent, but is a nightmare if you using external calls.

Basically, this means that you have to keep very close track of your stack size when calling functions. And even worse is that your stack never enters your function correctly aligned: the return address is always 4 bytes long, meaning you are always 12 bytes off when you start.

What this means is that I may have to use a VM with Linux to have any fun with assembly language programming again.

If anyone is interested, here is a quick NASM file I threw together that demonstrates how to use NASM on OS X to call glibc functions. I tested it with the latest NASM (2.07) on Snow Leopard (you'll probably need XCode installed to get this to work). This program prints "Hello World", allocates some memory using malloc, uses that memory to write 10 letters of the alphabet on the screen (using printf), frees the memory, and returns.
;
; Basic OS X calls to glibc
;
; compile with:
; nasm -g -f macho malloc.asm
; gcc -o a.out malloc.o
;
 
; glibc stuff
extern _puts, _printf, _malloc, _free
 
; static data
segment .data
 
hello_world_str db "Hello world!", 10, 0
int_str db "Address %x", 10, 0
 
; code
segment .text
 
global _main
 
_main:
        push ebp ; setup the frame
        mov  ebp, esp
 
        sub  esp, 4 ; align the stack
        push dword hello_world_str
        call _puts
        add  esp, 4
 
        ; malloc 16 bytes
        push  dword 16
        call  _malloc
        add  esp, 4
 
        ; check if the malloc failed
        test  eax, eax
        jz    fail_exit
         
        sub   esp, 0xC ; align the stack
        mov   ebx, eax
        push  ebx
        push  dword int_str
        call  _printf
        add   esp, 8
        add   esp, 0xC
 
        ; print "A\nB\n..."     
        mov   [ebx], dword 0xD41 ; 'A\n'
 
        mov   edi, 10
        push  ebx
_loop:
        call  _puts
        inc  dword [ebx] 
        dec  edi 
        jnz  _loop
 
        add  esp, 4
         
        ; free the malloc'd memory
        push  ebx
        call  _free
        add  esp, 4
        add  esp, 4 ; cleanup the stack 
        pop  ebp
        ret
         
fail_exit:
        mov  eax, 1
        pop  ebp
        ret


The output should look something like this:
Hello world!
 
Address 100130
A
B
C
D
E
F
G
H
I
J


Am I the only one who likes assembly language programming these days?

Aug23

Gentoo Linux FTW?

linux gentoo funny | comments

For the past 6 years or so, Gentoo has been my preferred distribution of Linux on the server side* for many reasons.

But, that doesn't stop me from laughing uncontrollably at Gentoo's Uncyclopedia entry. Pull quote from the top:

“Stop staring at my output; you have no life!”
  ~ GCC on Gentoo users


I have to admit — Gentoo is Linux at its most pretentious and obnoxious. Maybe I'm an elitist, but I still use and love it.

Why? What keeps me emerge-ing year after year?

  • I know Gentoo. I've been working with it for years, and when things go wrong, I know where to look and how to fix it. This is also partially due to the fact that it has (arguable, and in my opinion) one of the simplest overall system designs. I don't need a GUI to fix it, which is really handy for my remote installs where I won't even install a GUI.
  • It's fast. The rumors are true — if you know what you are doing and you take some time to do it, you can build a very fast system.
  • Upgradeable. The whole system can be easily brought up to speed by resyncing and emerging world. The only thing you ever have to reboot for is a kernel change, and you'll never have to reinstall to upgrade "versions". Gentoo doesn't have versions.
  • Up/downgradeable. If I want a custom-built version of, say, apache built, I can easily build such a thing and install it. If it is completely broken, I can easily uninstall it. And, 99% of the time, I won't ever have to open up the source and run ./configure --prefix=/opt/apache2 && make && sudo make install to do it, and I won't have to just hope that it won't clobber my files.


The first point is one primary reason I don't switch: I am locked in. I would maybe try Debian or Ubuntu for a server setup, but I'd be shaking in my boots waiting for something to break and not knowing where to start fixing the problem.

My biggest complaint about Gentoo? It doesn't deal well with bleeding edge Ruby gems. Meaning, Gentoo tries to install Ruby packages like Gentoo packages, but Gentoo packages are not always up-to-date or complete for all of the gems that are out there. So, there tend to be version problems and weird configuration errors if you start maintaining your own gem install somewhere. I'm not really sure whose fault this problem is.

Aside from this minor fault and the reputation that Gentoo gets for being elistist and complicated, I will be sticking to strange, bloated Pacman for the foreseeable future.





* My preferred distribution for desktop? Mac OS X. :) Sorry Linux, you just don't feel as smooth and silky in my hands with a GUI. The slick GUI that OS X gives me beats any minor incompatibility problems I have, or the fact that OS X uses strange broken BSD versions of some command-line tools, like sed.

Aug14

Brilliant programming

programming sorting searching | comments

Often, I like to read articles on particularly insightful programming techniques or algorithms. Sometimes these algorithms are for some fairly mundane things, like searching and sorting. (So mundane, of course, that Knuth wrote a whole book about it.)

Two pieces I have enjoyed recently.

First, timsort – the sorting algorithm used by Python. It describes an interesting approach to mergesort that is stable and incredibly fast, especially for "almost" sorted arrays.

Second is a class: strlen(), from glibc. strlen(), for you non-C programmers, finds the length of a null-terminated string. So, simply, given a location in memory, it finds the next zero byte. Sounds easy? Well, it is, but there are several tricks you can to do speed up the search by quite a bit.

Any other favorite snippets of code out there?

Jul13

Python Remote Debugger Announcement

python debugging remote cherrypy | comments

Recently, I desired to have a simple Python function I could call with as little fuss as possible that would start up some kind of server that I could use to tell what is going on inside my program at a later date, without resorting to some kind of logging system or console output.

Behold, a simple Python Remote Debugger.

Well, it's not really a debugger (yet). It's more of a "Current State of the System".

There are two servers shipped with it: a plain, pickle-based server (meant to be used to develop richer applications, though it may be abandoned in the future), and a simple web server, based on CherryPy.

It's still in beta, but has many neat features, including:
  • Listing of all currently running threads. Each entry includes a snapshot of the values of all of their local and global variables, as well as the current stack trace.
  • Listing of all loaded modules
  • sys.path listing
  • SSL support
  • Simple username / password authentication


I would be more than happy to take feature requests. Though, the source is available, fairly simple (and hopefully easy to use), and licensed under the MIT license, so you are free to use or modify it as you see fit.

If I have time in the future, I would love to develop some kind of fancy Ajax-based actual remote debugger, with IPython-like functionality, but it probably won't happen for a while.

Jun13

Bypassing the Python GIL with ctypes

python programming gil ctypes c | comments

I recently read an interesting article (actually, the slides linked to) about the horror that is the Global Interpreter Lock in Python, especially with multicore CPUs. And I agree — in these cases, the GIL is painful.

My favorite way of bypassing the GIL is to use ctypes, a wonderful library that allows you to easily link to dynamic libraries and call the functions from C, with only a small amount of boilerplate (to map function calls, argument types, and return types).

The best feature of ctypes is that when a program is executing a ctypes function, it releases the GIL. Meaning that if you have more than one thread threads, and one of them is busy with a ctypes call, the other threads can go along their merry way.

In the slides above, he shows that Python CPU-intensive multithreaded applications slow down as the number of CPUs increase. Well, I decided to use a quick counterexample.

First, I create a C file to do some work for me, called test.c:
int test(int from, int to)
{
  int i;
  int s = 0;
 
  for (i = from; i < to; i++)
    if (i % 3 == 0)
      s += i;
 
  return s;
}
To compile this as a dynamic shared library under OS X, the following two commands can be used:
gcc -g -fPIC -c -o test.o test.c
ld -dylib -o libtest.dylib test.o
(Under Linux, replace this last line with ld -shared -o libtest.so test.o)

Then, we can use the following Python program to load the dynamic library and run a quick test (should work in Linux or OS X):
import ctypes
import ctypes.util
import threading
import time
 
testname = ctypes.util.find_library('test')
testlib = ctypes.cdll.LoadLibrary(testname)
 
test = testlib.test
test.argtypes = [ctypes.c_int, ctypes.c_int]
 
def t():
  test(0, 1000000000)
 
if __name__ == '__main__':
  start_time = time.time()
  t()
  t()
  print "Sequential run time: %.2f seconds" % (time.time() - start_time)
 
  start_time = time.time()
  t1 = threading.Thread(target=t)
  t2 = threading.Thread(target=t)
  t1.start()
  t2.start()
  t1.join()
  t2.join()
  print "Parallel run time: %.2f seconds" % (time.time() - start_time)
On my quad-core OS X box, I get the following output:
Sequential run time: 13.27 seconds
Parallel run time: 6.66 seconds
A pretty solid doubling of performance, which is what we would hope.

May17

Presentations now up on slideshare

presentations | comments

Someone recently requested that I post some of my older presentations on miscellaneous topics.

When I was more actively involved with teaching and research at The University of Tulsa, I would often give presentations on random things. Programming, research I was doing, my anal retentiveness, whatever. We often had weekly meetings for most of my graduate program, so I would give talks there, or at ACM meetings.

I have been tracking some of these down and posting them to my slideshare account, although I will maintain a list of them (as well as descriptions) here on this site (you can always see them on the slide bar.

As a sample, here is one that is based on some (abandoned) research I did on applying genetic programming to chess.

Apr25

Netbook

netbook wireless drivers linux ubuntu | comments

I finally convinced myself to buy a netbook recently. I am currently typing this post on it.

I got an Asus EeePC 1000HA – that is, it's got a 10" screen, 160 GB hard drive (regular, not solid state), and a 6-cell battery (4.5 hours or so battery life under real-world conditions, I estimate). I'm pretty thrilled about it: typing is fine (though the right shift key has some adjustment time), the screen is nice and crisp, and it is very snappy under Windows XP and in Linux. The 1.6 GHz Atom CPU is just enough to play Netflix Instant videos, which was the only thing I was worried about performance-wise.

My biggest complaint was the wireless performance in various types of Ubuntu Linux (the newest 9.04 Netbook Remix is by far my favourite). I'd see 500+ ms ping time to my router, and drop about half of my packets, etc. It was miserable, and pretty much unusable. It turns out that the open source Atheros driver doesn't support this particular chipset very well, and the proprietary madwifi-based driver from Atheros doesn't support it at all. The only decent solution was to use the ndiswrapper package with Asus's Windows driver, which makes me feel a bit dirty.

But regardless, with all that sorted out, I won't be booting up Windows much more.

Overall, I'm impressed and very happy with my purchase.

Apr19

Responsible Development Practices: Storing Sensitive Information

development passwords practices tiddlywiki | comments

I build and maintain web sites. I manage probably hundreds of online accounts or logins or passwords that I need access to semi-regularly. Organizing all of this information can be a bit of a pain.

I've heard people swear by spreadsheets or plain text files for storing this information, and there are some specialized programs out there meant to catalog and organize this for you.

My solution is one of the simpler: TiddlyWiki. It's a small (few hundred K), self-contained (one HTML file), free file you can carry with you to store lots of information. I simply create some big list tiddlers (what they call their wiki pages), each one of which links to a specific tiddler for one particular task. So, I have a list for all of the web sites I maintain, and the tiddler for each has MySQL information, passwords, and any other crucial information.

This is great, but there are two problems. One, I am putting all of critical info in one place. To fix this, I maintain copies of this wiki file in several places, such as my thumb drive, laptop, and DropBox.

The second catch is then, what if someone finds this file? Wouldn't they have access to everything? This is why encryption is important. Whenever the file needs to be transported somewhere (like, on my thumb drive or put into DropBox), I first encrypt it. GPG is nice for this, but in a pinch, you can just use the ever-present OpenSSL to encrypt a file with something like:
openssl enc -aes-256-cbc -salt -in file.html -out file.html.encrypted


Now, as long as I remember this password, I can safely pass this encrypted file around without any worries about losing my thumb drive in the airport.

For most of the actual machines I log into via SSH, I simply use public key authentication, that way I don't have to remember the passwords.

Feb16

The perfect programming language

ruby python programming c c++ gcc | comments

I've been doing a lot of programming lately, and in a multitude of different languages. For the most part, these are pretty "decent" languages: C, Python, Ruby, etc. However, none of the languages really makes me happy.

It's hard to define exactly what I mean by "happy". They're all, for the most part, fun to program in to various degrees. But I don't really feel like I have all of the power at my fingertips in any of them. For example:

  • Python is used for general purpose whatever. It's powerful, easy to read, easy to write, and fast.
  • C is used for speed. When I need to be screaming down the highway, I break down and code in C (perhaps callable in a higher-level language).
  • Ruby is used for much of my web development.


I don't really find Python that well-suited for web development, although I use it that way quite often, just as much as I find Ruby to be a bit odd at general purpose computing (say, doing a Project Euler problem). And I don't really break out the C unless I have to. It makes a two-week-long Python program turn into a three-month adventure in passing structs around.

Some people have commented that maybe what I need is C++, especially with the Boost libraries. It's not as easy as Python, but not as ridiculous as C all the time. And they may be right, but I'm not quite sold yet.

The primary reason for my displeasure with C++ is primarily two things. First, it's syntax is stuck back in 1969. Why do we need so many braces, so many semi-colons? The overused and overloaded &, *, ^, etc. operators really bother me, as they make things just so terse and ugly. I mean, if I wanted to see a statement that looked like line noise, I'd use Perl or APL.

Second, related to the above, C++ is extremely powerful and large, and it takes a long time to master. I don't believe that languages should take a long time to master.

Which brings me to Ruby. Ruby is not quite as bad as C++ in the long time to master, though it's still a far cry from Python. And Ruby is almost what I need (I think it has some of the best syntax of any language ever), but it's still not quite there.

  • Ruby is slow. I can't really compile it, and it has so many language features that really bog it down, not all of which I feel are necessary (see next bullet).
  • Ruby is weird. Most people who have been exposed to OOP know what a class variable and an instance variable is, but what the hell is a class instance variable? Reading through The Ruby Programming Language (which is a great book, btw), Ruby has a lot of really complicated things going on when doing variable and method name resolution involved with the myriad of different ways things are inherited. It bothers me a lot.
  • Ruby makes it worse by attempting to explicitly declare the scope of a variable with var, $var, @var, @@var. Often, this just makes things worse.


So, Ruby isn't quite there.

Python? Well, it's much faster (an order of magnitude or thereabouts). It doesn't have a lot of weird inheritance issues (only some).

My biggest problem is the direction that Python is taking with Python 3.0. Changing map() and friends to return iterators? Making print a function? Further confusing me regard to text vs. unicode vs. data vs. bytes vs. strings vs. regular expressions? And so forth.

And even syntactically there are problems: Why are there colons at the end of def/if/for/... statements? And is it really necessary to have syntactically important whitespace? I don't mind it, but a lot of people hate this with a passion.

So, what I need is a language that can be compiled (and run at great speeds), with Ruby-like syntax, and a Python-like typing and object model.

That would be nice, thank you.




And yes, if I get some time, I might consider forking a project related to one of the above to get what I want.