computer science, math, programming and other stuff
a blog by Christopher Swenson
2014-01-25

Adobe Source Code Pro

Since I haven't talked about fonts for a while, I thought I would give a short update.

About a year ago, I decided to try Adobe Source Code Pro, a new fixed-width font.

Since I kind of forgot that I was using it, and the experiment has gone on for a year, I'll consider it a success.

I definitely notice that Source Code Pro is better suited to the desktop and laptop world: the letters are a bit taller, and the font feels a little more natural on larger, lower-DPI screens (compared to Droid Sans Mono).

It's free. Give it a shot.

P.S. Also, the web site got a bit of a facelift!

2013-08-25

A Minimalistic RSS Reader

Saddened by the demise of Google Reader a few months ago, I completely gave up on RSS feeds and just cut myself off from the world.

A few days ago, I finally decided it was time to move on. Unfortunately, I haven't been super pleased with the options available out there. Plus, I realized I didn't want to get burned again when they inevitably close and break my heart.

So, like dozens of other people, I just decided to write my own RSS feed reader.

One user

One fundamental design decision I made early on is that this system would only support one user, and so I can throw out 95% of the complexity of designing the system right off the bat.

Examples:

  • No security. I can throw the web page up under an Apache proxy with SSL and basic auth to allow only me in. I don't need to mess with logins or anything.
  • No database. The entire state of the program will just be written out as a JSON blob periodically and read in on startup.
  • No frills. I only use a few basic features, so I'm only going to implement those. This also means that the page looks incredibly ugly.

Open

I wrote it in Go (requires 1.1+) and open sourced it at https://github.com/swenson/littlereader under the MIT license. Feel free to use it, adapt it, whatever.

It's not finished, but it has nearly every feature I need right now.

2013-07-21

Day Camp 4 Developers

I'll be talking at this week's Day Camp 4 Developers on version control (or source code control, or source code management, or whatever we want to call it this week). It'll be an introductory talk covering why you should use version control and how it works in general, and then going into some specific workflows with git, the notorious version control system. I've had a lot of fun writing the talk, so it should be a lot of fun to hear.

There are several other great talks that are going to happen:

  • Lorna Jane Mitchell: Practical Software Estimation Techniques, or "How Late?"
  • Thursday Bram: How to Make Good Documentation a Regular Part of Your Day
  • Brendan Wovchko: How to Speak Business & Eradicate Confusion in Software Development
  • Me: Source Code Management and Version Control

It's only $40 to sit in on this great set of talks. Friday, July 26, 2013. 8am PST to 12:30am PST.

2013-06-01

Configuration files in Go

The other day, I was starting to port an existing service I had into Go. There were a lot of issues that I had to tackle to get the functionality I wanted, including being able to run in at least four different environments: test, dev, stage, and prod.

There are a lot of "standard" ways to do this, most focusing on some sort of text or structured file that you load at runtime using file I/O.

However, in dynamic languages, a somewhat common practice is to use a file in that programming language as your configuration. So, in Python, you might have a settings.py file that is actual, executed Python.

In non-scripting languages, like Java, you normally have an XML, YAML, INI, or JSON file that you read in. But, I've seen at least one non-scripting language, Clojure, that encourages using an executable Clojure file for configuration.

The primary argument against using a file in your programming language itself is that the compile time may be long, and deploying a brand new binary just to change the config file is laborious and slow.

But, I thought, Go doesn't have this limitation: Go compiles super fast, and the binaries tend to be reasonably sized, so deploys won't be that big of a deal.

So, can we just use Go code to be our configuration file?

Definitely. I wrote up a quick template (released under CC0, so feel free to copy and use) for a configuration file in Go. There's a small amount of boilerplate, but it is super easy to customize.

https://github.com/swenson/goconf

There are four key parts:

  • var config = getConfig() – this triggers the configuration file to be read at initialization time. You can also use an init() function to do this.
  • type Config struct { ... } – specify all the variables you want in your config file
  • func readConfig() Config { ... } – populate a Config struct based on your environment, which I do via a switch statement.
  • Set your environment (ENV environment variable) when running

That's it. This is a pretty straightforward and easy way to do config files in Go.
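To make the four parts concrete, here is a minimal sketch of the pattern. It is not the code from the goconf repository: the `Config` fields, the hostnames and ports, and the helper `configFor` are all made up for illustration, and a single function name, `readConfig`, is used for the initializer.

```go
package main

import (
	"fmt"
	"os"
)

// Config lists every setting the service needs. These fields are
// hypothetical examples.
type Config struct {
	Env         string
	DatabaseURL string
	Port        int
	Debug       bool
}

// configFor returns the settings for a named environment. Keeping it
// separate from the environment lookup makes it easy to test.
func configFor(env string) Config {
	switch env {
	case "prod":
		return Config{Env: "prod", DatabaseURL: "db.prod.example.com:5432", Port: 80}
	case "stage":
		return Config{Env: "stage", DatabaseURL: "db.stage.example.com:5432", Port: 8080}
	case "test":
		return Config{Env: "test", DatabaseURL: "localhost:5433", Port: 8081, Debug: true}
	default:
		// Anything else, including an unset ENV, falls back to dev.
		return Config{Env: "dev", DatabaseURL: "localhost:5432", Port: 8080, Debug: true}
	}
}

// readConfig picks the environment from the ENV environment variable.
func readConfig() Config {
	return configFor(os.Getenv("ENV"))
}

// config is populated during package initialization, before main runs.
var config = readConfig()

func main() {
	fmt.Printf("%+v\n", config)
}
```

Running the binary as `ENV=prod ./service` (the binary name is a placeholder) would select the prod branch. Since the compiler checks the struct literals, a misspelled field name in this kind of config file fails at build time rather than at startup.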

2013-04-07

When to Start Testing and Why

I didn't begin to really appreciate the value of proper testing in code until well into my current software engineering career. Testing wasn't something that came up during my formal education in computer science, since it isn't really relevant to the academic pursuit (except for perhaps a course in software engineering, most code you write academically is more about learning or exercising something than creating production code). Now, though, I love testing, and love how it helps ensure stability in my code.

Two questions that come up when I talk to other people, especially relatively new programmers, are: when should you learn how to test in a language or environment, and when in a given project should you start testing?

To answer the first question, the best time to learn to test (in my opinion) is after you have started to get a good handle on the language and environment you're in, especially if this is the first language you've ever learned. The simple reason for this is that testing is often almost like an entirely new sub-language you have to learn, and it can be a little confusing to dive right into it.

However, you really don't want to put it off for too long, especially if you've never written test code before. Writing good, idiomatic, testable code is an important skill, and living too long in a vacuum of untested code might make it harder to learn to write testable code.

What do I mean when I say "testable code"? Generally, to me, testable code means that you've designed each class or function to be easily tested in isolation. Often, this means making it slightly more explicit than you might naturally write, writing more, smaller functions and extracting out dependencies (such as database connections) so that you can test each small piece.

After all, if your function uses a database connection, you probably aren't too concerned with launching an entire database to test that function and testing that the database writes worked – you are more interested in validating the logic that is wrapped around it. When you call insert_entry, does it convert values correctly, does it call the appropriate methods that would invoke the database, etc.? In general, you can test independently of the database by mocking out the database layer when testing the checkbook logic.
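Here is one way that could look, sketched in Go. Everything in it is hypothetical: `EntryStore`, `Checkbook`, and `mockStore` are invented for illustration, and `InsertEntry` stands in for the `insert_entry` call mentioned above. The database dependency is extracted behind a small interface so a mock can be swapped in.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// EntryStore is the only thing the checkbook logic needs from the
// database layer. A real implementation would wrap a connection.
type EntryStore interface {
	Insert(description string, cents int64) error
}

// Checkbook holds the logic we actually want to test.
type Checkbook struct {
	store EntryStore
}

// InsertEntry validates the entry, converts dollars to whole cents,
// and hands the result to the store.
func (c *Checkbook) InsertEntry(description string, dollars float64) error {
	if description == "" {
		return errors.New("description is required")
	}
	cents := int64(math.Round(dollars * 100))
	return c.store.Insert(description, cents)
}

// mockStore records what it was asked to insert instead of talking
// to a database.
type mockStore struct {
	calls []string
}

func (m *mockStore) Insert(description string, cents int64) error {
	m.calls = append(m.calls, fmt.Sprintf("%s:%d", description, cents))
	return nil
}

func main() {
	mock := &mockStore{}
	book := &Checkbook{store: mock}
	if err := book.InsertEntry("coffee", 3.50); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(mock.calls)
}
```

A test can then assert on `mock.calls` to check both the value conversion and whether the store was called at all, without any database running.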

Going back to our second question: when do you start testing a given project? I like the philosophy of test-driven development (TDD), where you write the tests before you write the code. This is doubly true if this is code for a client, for your job, or some other code that is meant to be consumed by people other than you, or that runs any kind of logic dealing with other people's data. Furthermore, writing tests first helps you to figure out the architecture of your project a lot better, and to make sure that your architecture is testable from the beginning.

Writing tests before you write every piece of your software is a good way to practice defensive coding: ensure that every piece of your programs and libraries is extraordinarily well tested before it's even written. As you write more functionality, keep growing those tests to ensure that everything cooperates well and that nothing new is going to break the old.

2013-01-01

Scala and Clojure

My primary language for the past few months has been Scala, with some occasional bouts of Java.

I've also started to learn Clojure. The company I am working for is into the JVM stack, and these languages are both based on the JVM.

As a long-time Java programmer, I appreciate the cleanliness of Scala, and I've grown pretty comfortable with it over the past few months. It's sort of the language that Java really wanted to be when it grew up: all that OO goodness that we love, but without so many mistakes as Java has, and with a helpful smattering of functional programming.

That said, Scala has some flaws, including some fairly deep ones.

  • IDE support is pretty terrible. I tried to use Eclipse and IntelliJ early on, and they were both incredibly broken. I've never gotten them to work well. I just use Sublime Text 2 these days, and I find that Scala is a clean enough language that this isn't too big of a deal.
  • Implicits are just a bad idea. The standard library uses them sparingly and fairly well, but this is such a dangerous feature. (For those of you who don't know Scala, implicit functions are used to automatically convert an object of one type to another, under the hood.) It makes sense in a few limited cases, but I groan when I see them now.
  • The standard fundamental data types are wasteful and have poor performance. The List and Map data types are optimized for adding things to them, but in doing so create lots of extra copies and have very poor access properties. List, the most fundamental structure, is a damned linked list! This makes me want to cry. OpenHashMap is better, at least.

That said, I much prefer Scala to Java at this point.

Clojure, on the other hand, I am much newer to, and I don't have nearly as solid experience with it yet, so take my points with a grain of salt.

Clojure's obsession with immutability is nice from the point-of-view of thread-safety and (theoretical, yet mostly unrealized) concurrency, but at a high cost. Its memory footprint seems to be the worst of the JVM languages, and it defaults to using reflection for most method calls, so it is significantly slower. (Even with proper type annotations, it is slower.)

Furthermore, I think Clojure is just fundamentally harder to read. I've done my fair share of Lisp programming in the past, and I think that Lisp is just a lot less natural to read, probably because we spend so much time with infix languages.

Plus, I would posit that Clojure's premise is doomed to failure. A Lisp that doesn't support tail-call optimization and proper recursion seems broken. (And yes, I know about loop/recur, but that is essentially a functional for-loop.) Scala does some rudimentary TCO, though it doesn't always tell you the function you wrote can use it.

The biggest plus for Clojure is that it runs on the JVM (which is also a weakness). It's a plus because it means you can take advantage of the Java ecosystem (logging, libraries, experience, etc.), which is probably the primary reason we are using it instead of some other functional languages.

That said, I will be spending quite a bit of time this year learning both in greater detail, so perhaps I will update with further opinions on them.

(I'm also thinking about getting more into Go this year, and perhaps finally learning Erlang.)

2012-10-10

Things I've Learned At Google, Part Two

I've spent the past 19 months working as a software engineer at Google. They have adopted a lot of great practices, some completely their own, and I certainly learned a lot from working with them. I thought I would share some of the particular insights I had while working there.

The first post was about software engineering. This post is more about general workplace practices.

If you want to have the best software engineers, then hire smart people and treat them well.

I've worked at places in the past where they are good at hiring smart people, but then tend to treat them poorly. Then, they wonder why they leave in droves for companies, like Google, who aren't afraid to buy you a nice chair and keyboard.

If you want to have great employees, then you have to treat them like it.

Open source licensing is important.

Open source code is a lot harder than you'd think, and even the most competent software engineers will sometimes try to get away with not following the rules. The license implications in things like the GPL and AGPL are serious business. Before using any code that touches GPL or AGPL (or other similar licenses), educate yourself well on them, or designate someone at your company to be the open source czar to make sure you aren't messing this stuff up. It's not really hard or anything: it's just a little tedious, and incredibly important.

Shoes are optional.

I worked at mostly "business casual" places prior to Google, but Google goes straight for "casual", bordering on "who cares". And for all that, I never noticed it as a problem. Unless you are violating the health code, do what makes you productive and happy. (For instance, I am more productive when not wearing shoes, but wearing pants (or at least a kilt). YMMV.)

Working from home can be fine.

Personally, I work better in an office. But, my Google office is about 1 or 1.5 hours away. After a few weeks, I figured out a system where I could be very productive working from home — certainly efficient enough so that there is a net gain in productivity due to commuting time. So, I spent a lot of time at Google working from home, and did quite well. (Early on, I spent a lot more time in the office, as I was still bright-eyed and bushy-tailed and was still trying to understand all of Google's inner workings, and this required being able to communicate with my coworkers more.)

Don't be a gatekeeper unless it is really important.

As organizations grow, certain groups start to feel important, entitled, or some other emotion, and begin injecting themselves into every process in the business. Some of these are really important, like privacy and security. Others are not: a lot of managers, for instance, believe that they are not doing anything unless they change something, and so they sometimes like to start sticking themselves in processes they don't belong in.

My favorite instance of this at Google is that one of the founders has to approve the application package of every single new hire (at least, for software engineers). Even though hiring is important, I am a little doubtful that his review does anything other than make him feel better.

Because of gatekeepers, you can end up with a giant checklist of people who have too much of a say in your project because someone in the past didn't do their due diligence. This can really eat at the velocity of your project, especially small projects (where just a few people have to deal with all of the gatekeepers), and definitely at the morale of the team.

Google is large, and is brimming with gatekeepers. Which I find interesting, because Google has a really neat system for getting rid of gatekeepers: readability.

Let me explain readability. If you are new to Google or new to a programming language, most of your first code needs to be reviewed by an expert in that field (a Java expert, for example) who can say that your Java code meets all of the standard guidelines for maintainability, readability, and general sanity. After a while, you will become recognized as an expert as well (through a formal process), and you won't need to have this be part of every review anymore.

Perhaps a solution to gatekeeping would be to have more readability-like things: if you've launched a few projects that tackled certain thorny areas very well, then maybe you now have "readability" in those areas, and we can trust you a little bit more, and remove some bureaucracy from your plate. It's an idea at least. But the moral of the story is: don't insert yourself into the bureaucracy and become part of the problem unless there is no other choice.

Gatekeepers are a symptom of trust and communication mismanagement. When your product is ready for launch or has made some significant progress, all of a sudden, a bunch of gatekeepers (like your senior managers) take an interest in your project. Some of them will have looked at your project in its early phases to greenlight the concept, but many of them may never have heard of you before. And, especially at the very highest levels, they only have 1 bit of communication down to you: launch or cancel.

This low-bandwidth communication is bad, especially if they see problems with your product or direction, and unfortunately, you may have just wasted a lot of effort (perhaps months or years of engineering) on something that is going in the scrap pile. This is perhaps avoidable by trusting the management below you to make the right decision early on. A senior manager having so much drastic control over a project sends a mixed message about trust.

Writing is still incredibly important.

Don't get me wrong, I understood this pretty well before. But even at Google, writing is still important. For one thing, any large company probably has a convoluted promotion process, and Google's, while leaner than some, still requires a lot of paperwork. Paperwork means writing, and if you can't communicate effectively about your own accomplishments, then you are just not going to get promoted, at Google or anywhere.

People have told me time and time again that "they just aren't good at writing about themselves"; this has been nonsense every time. They were fine at writing about themselves, they just didn't like to. In which case, too bad: if you want to get promoted, then just do it.

If you don't write about it, it didn't happen.

Kids aren't nearly as good at computers as I thought they were.

While not strictly workplace-related, this is something fun I learned at Google.

While working at Google, I was often the token Google engineer that was paraded around for local high schoolers that we would do some outreach for. I would answer questions from "how does search work?" to "do they really let you drink at work?". It was rewarding and I really enjoyed talking with them.

However, they were all just terrible at technology, despite the idea that "kids these days" should be masters of all computers, cell phones, and whatnot. They should be tumblring circles around my desktop-loving self when it comes to, well, everything, but I'm pretty sure most of them couldn't program a VCR, or whatever the modern equivalent to that is. (For my younger readers, programming a VCR is perhaps the second most trivial "techie" thing to do in my generation, only slightly above having an alarm clock with the time set correctly (rather than blinking 12:00).)


I'm sure I learned more at Google, but those are perhaps some of the important things I felt I learned while working there.

I can't wait to see what I learn at my next job. :)

2012-10-08

Things I've Learned At Google, Part One

I've spent the past 19 months working as a software engineer at Google. They have adopted a lot of great practices, some completely their own, and I certainly learned a lot from working with them. I thought I would share some of the particular insights I had while working there.

This first post will be about software engineering. Shortly, I'll post another about general workplace stuff.

First, an overall thought: Google was definitely the most professional place I've worked in terms of software engineering. However, just like any software engineering methodologies, their software engineering Kool-Aid is good, but you don't want to drink all of it.

Code reviews are a must.

Prior to Google, I had done very little formal code review, as on most projects we were simply "too busy" to "take time out" to do them. They are not even a little bit optional at Google: every change to the codebase must undergo at least one person's review, and possibly many more depending on the exact nature of the change and the person making it. At first, I was very skeptical about this, but I learned that there is so much value in having this formal process, for a multitude of reasons.

  1. First, if you have a small team, code reviews are a good way to keep up with what other people on your project are doing, both on a personal level ("Jane is working on feature X"), as well as understanding what the code does. This means that it will be easier for you to make a modification to the codebase, if it is necessary, or even to take ownership of it should something happen (e.g., a team member moves on).
  2. Code reviews don't take as long as you think.
  3. Code reviews significantly increase the quality of the code produced, both in readability and in catching bugs and poor design decisions. Perhaps not as much as true pair programming, but with a lot fewer of the disadvantages.
  4. Unfortunately, code reviews have one downside: it is often tricky to work on two different reviews on the same part of the codebase at the same time. This depends heavily on your version control system: distributed version control systems, like git, tend to support this pretty well. But even so, there is some cost in terms of "mental swap space", as sometimes it is difficult to focus on an entirely different change until you hear back on your pending change. The best way to avoid this is to make sure your team prioritizes reviewing code above pretty much all else, so that your other team members aren't waiting on you. (Code reviews are a bit like a turn-based game, and when it is the other person's turn, it can seem like it takes forever.) I always tried to have a spare project that I could hack on a little while waiting.

IDEs are worth it for large projects.

Prior to Google, I despised IDEs. I still don't have a lot of love for them, but reusing Google's extensive codebase in my own code finally broke me out of my IDE-less existence: there is simply too much code to try to wrangle without features like autoimport, name completion, refactoring, etc. I chose Eclipse, because I was doing Java work and Eclipse is well-supported inside Google. As slow as Eclipse is, it saved me a lot of time hunting through docs looking for the right class, with the right class name, the right method name, etc.: I could just tab-complete my way to freedom a lot of the time.

Build systems still suck.

Google has their own build system that integrates with, well, everything. But it still sucks. Not as much as autotools, SCons, or Makefiles, but it still sucks. Where, oh where, is the build system of my dreams? To be fair, this was probably the best build system I have ever used.

Python and other dynamically typed languages make life harder for large projects.

For small projects or stuff that I am doing by myself, I love Python. The time it saves me is tremendous. However, nothing is more frustrating than programming with a large, unknown codebase in Python. I would often see code like this, even in a well-manicured codebase:

def ReadFromDatabase(self, query, metadata):
  """Perform the query on the database."""
  ...

So... what is query? A string? What is its format? Could it be a tuple, a list? What is metadata? Essentially, the only ways to answer these questions were to go digging through documentation, look for other code that calls that function, or read through the function itself. And sometimes I just don't have time to go on a coding adventure: I want to call the method and go on with my day. Java, for all of its faults, at least tells you the type information of the parameters, so I have a strong hint for my adventure. This might involve navigating factories, factoryfactories, and all other kinds of abominations, but at least I would have a good start.

It should be easy to run your program.

Google programmers love command-line flags. They sprinkle them everywhere. The problem is: they often specify the default values elsewhere, perhaps in a script that only runs in production. It can make running a program locally difficult, which in turn makes testing and debugging hard. The moral I learned is that it should be dead simple to build, run, and test your program. Hunting around for documentation on the perfect incantation magic is a real bummer.
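One way to keep a program easy to run is to declare each flag next to a default that works on a developer's machine. The sketch below uses Go's standard `flag` package; the flag names, the defaults, and the `options` type are hypothetical, and this is not a description of how Google's flag libraries work.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds the program's settings. The names are made up.
type options struct {
	addr    string
	dataDir string
	verbose bool
}

// parseOptions defines every flag together with its default, so that
// running the binary with no arguments gives a usable local setup.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("server", flag.ContinueOnError)
	fs.StringVar(&o.addr, "addr", "localhost:8080", "address to listen on")
	fs.StringVar(&o.dataDir, "data-dir", "./data", "directory for local state")
	fs.BoolVar(&o.verbose, "verbose", false, "log extra detail")
	err := fs.Parse(args)
	return o, err
}

func main() {
	opts, err := parseOptions(os.Args[1:])
	if err != nil {
		// The flag package has already printed the problem and usage.
		os.Exit(2)
	}
	fmt.Printf("listening on %s, data in %s, verbose=%t\n",
		opts.addr, opts.dataDir, opts.verbose)
}
```

Production can still override anything on the command line, but the defaults live in the code, where `-h` will print them, instead of in a launch script somewhere else.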

Test ALL the things.

Before Google, I hadn't really practiced a lot of TDD. Sure, when I had a small piece of code that had very well-defined inputs and outputs, and if I had time, I would create some doctests (in Python) or the equivalent elsewhere, but it was never a priority. At Google, good testing is the state religion. If it doesn't have tests, it doesn't exist. No excuses. I don't care how complex the code is: it has to be tested. Writing the tests first often helps, but doesn't always make sense.

2012-09-26

Blog Redo

I got tired of the old site having issues, so I rewrote the site using a custom static site generator.

Things pretty much work. I am in the process of moving old Disqus comments over to the new system, and working through a small issue or two there.

I should have some new material up soon as well!

2012-04-28

What Every Programmer Should Know About Memory

I was just thinking about this paper today: Ulrich Drepper's "What Every Programmer Should Know About Memory".

http://www.akkadia.org/drepper/cpumemory.pdf

If you haven't read it, and you are a programmer, then you really need to read this paper.