If you blog it they will come?

Sunday, December 27, 2009

What's wrong with PHP?

Just as it's easy to completely write off a band you don't care for (because they sold out, or because their singer meets with world leaders and publishes terrible op-eds in the New York Times), the same often happens with technologies and languages.

For example, there was some point in my life where I told myself I would avoid any project heavily involving PHP.

But why? What's wrong with PHP? Yahoo and Facebook use it...and I haven't used it enough to warrant such strong feelings.

I wanted to explore this (baseless?) apprehension further, and I thought of this characteristically absurd/profound quote from Perl creator Larry Wall:

"Perl is worse than Python because people wanted it [to be] worse."



Larry Wall

The goodness of Perl v. Python is a common religious argument. The above quote squelches the debate once and for all, but raises further questions.

Who are "they"? Why did they want Perl to be worse? Do the words I added ([to be]) change what Larry actually meant?

Consider this Mind Hacks post about trends and their origins; the concepts extend to any sort of idea that can grow legs. Trends in technology are the same way, and spread because a distant idea reached some critical mass of connectivity and settled in with your own circle of peers.

Developers can vote for or against any technology by speaking for or against it, but their most powerful influence is in choosing which technologies to devote their energies toward learning and improving. Enthusiasm for a project leads to expertise and contribution, and contribution leads to more enthusiasm.

Conversely, people who have a distaste for a technology will discourage others from adopting it, and a technology will languish if its community does not grow.

In other words, labeling a technology good or bad is a self-fulfilling prophecy given enough influence and/or critical mass. Soon companies using the "good" technology will be flooded with resumes, while job postings for "bad" technologies receive less attention, serving only to reinforce existing stereotypes.

I'm largely paraphrasing Paul Graham. He wrote a great essay on this phenomenon, which he calls "The Python Paradox".

I believe this is what Larry was getting at. And it explains my own PHP aversion: my experiences were at worst underwhelming, but the number of developers I've heard lament PHP convinced me to stay away, as if it were the bad part of town.

Arguing over which language is better or worse is a waste of time; what matters is the consensus of the larger community (see wikiality). Even if it's unjustified. But just like a neighborhood can turn around, opinions change, and that's why phrases like "JavaScript Renaissance" are thrown around. And although I can't shake all the PHP prejudice, I wouldn't make a point of avoiding it (PHP that is), either.

Thursday, December 24, 2009

Easy hexagonal tiles with css

I just started building an online game with appengine as a learning exercise, and I'm attempting to do so without flash, java, or any other plugin.

There's nothing to play yet, but I already have something that is generically useful: a hexagonal tile layout purely in html and css. It's pretty clean too, check it out:

The Django Template:

{% load mod %}
<html>
<head>
  <link type="text/css" rel="stylesheet" href="/stylesheets/board.css">
</head>
<body>
  <div id="board-container">
    {% for i in rows %}
    <div class="row {% if i|mod:2 %}offset{% endif %}">
      {% for j in rows %}
      <span class="hex">
        <img src="/img/hexagon.png" />
      </span>
      {% endfor %}
    </div>
    {% endfor %}
  </div>
</body>
</html>


The mod filter came from here.

All this does is create a grid of hex images (100x100 transparent pngs, edited from this source image). Every other row is "offset" so that the hex pieces nest together.

Here's the css:

#board-container {
border: solid black 5px;
width: 875px;
height: 700px;
}

.row {
/* This has to be .hex width * # of cols */
width: 800px;
}

.hex {
/* The width and height depend on the size of the hex image. */
width: 80px;
height: 65px;
border: none;
float: left;
}
.offset {
position: relative;
/* half of the hex width. */
left: 40px;
}
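The layout math the CSS encodes can be sketched in Python (the names here are mine, not from the original post): each tile occupies an 80x65 box, and every other row shifts right by half a tile so the hexes nest.

```python
# Layout math behind the CSS above: .hex is 80px wide and 65px tall,
# and .offset shifts every other row right by half a tile (40px).
HEX_W, HEX_H = 80, 65

def hex_origin(row, col):
    """Top-left pixel of the tile at (row, col), matching the floated layout."""
    x = col * HEX_W + (HEX_W // 2 if row % 2 else 0)
    y = row * HEX_H
    return x, y

# Odd rows start 40px to the right, so the hexagons interlock:
# hex_origin(0, 0) -> (0, 0), hex_origin(1, 0) -> (40, 65)
```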


And the final result for a 10x10 grid:
hex tiles

The next step is to implement the game rules, ajax calls, and draw a sidebar with the playable pieces.


Thursday, December 3, 2009

TechFlash's credulous hackery

TechFlash is dedicated to covering the Seattle tech scene, just as TechCrunch is focused on the Silicon Valley tech scene.

Unfortunately TechFlash is TechCrunch's insecure younger step-sibling and apparently needs to issue one-sided press releases poorly disguised as blog post reportage in order to maintain its insider access.

Case in point:
Microsoft escalates war on piracy

War on piracy? Really? Is that like the war on terror? Or the war on drugs? Or the war on poverty?

Anyway, the article quotes only Microsoft, and offers no analysis of why piracy is rampant or whether this strategy is an intelligent one. It offers nothing beyond Microsoft's slant on the story.

The Microsoft source claims that everyday consumers are opposed to piracy "because they're increasingly frustrated and angry about the connection to viruses and malware."

Actually, maybe the problem is that users can have something for free instead of paying for it, which dents the bottom line of Microsoft's current business model (which may or may not still be relevant). But you won't see such a nuanced view in this article, or frankly in any other quid-pro-quo TechFlash article.

I criticize because I care--I want the Seattle tech scene to flourish, I want TechFlash to be informative and critical instead of parroting the mouthpieces of the region's big players. And I speak for myself only, of course.

In summary, I just posted a critical blog post, about a blog. What has my life come to?

Saturday, November 7, 2009

I just finished reading Goedel Escher Bach

GEB alternates dialogues with essays spanning the author's staggering range of fixations and obsessions, in which Hofstadter adds his own remarkable insights and comparisons on topics including music, art, mathematics, genetics and philosophy. GEB's sprawling nature follows tangled paths but eventually loops back to questions of intelligence, souls, free will and self-awareness.

The unorthodox structure is what sets the book apart, but the braiding of ideas may leave the reader asking: "OK, that's interesting, and you're clever, but what's the point?"
This is a common criticism, one which led Hofstadter to author the pseudo-sequel "I Am a Strange Loop" (which I haven't read), in which he distills his arguments more directly and concisely (and personally: he imagines a low-resolution "simulation" of his wife's mind within his own after her death).

Anyone who completes GEB comes away with an honorary computer science degree: there's a heavy focus on computability theory, data structures, natural language processing, AI, recursion, memory, hardware and software, all under the guise of philosophy and investigating what gives rise to self-awareness and intelligence.

On top of this, the GEB reader receives an introduction to the work of Escher, Magritte, Bach and John Cage, as well as some basics of neuroscience, boolean logic, Zen philosophy, number theory, music theory and molecular biology.

Although the number theory and biology chapters dragged, I was fascinated by the discussion of Goedel's incompleteness proof and its implications for the mathematics community at the time. There is also a fascinating section on AI that starts with Turing tests and pattern recognition and ends with the remarkable conclusion that a true AI may be just as bad as people are at things like rapid computation or chess playing.

At the least, GEB presents a multitude of ideas and food for thought. At its best, it instills lucid notions of how our minds work and what gives rise to life, language and intelligence.

Thursday, September 17, 2009

Zip those lists

A lot of people know about zip() in Python, but did you know it takes a variable number of arguments?


>>> a = [1, 2, 3]
>>> b = [4, 5, 6]
>>> c = [7, 8, 9]
>>> zip(a, b, c)
[(1, 4, 7), (2, 5, 8), (3, 6, 9)]


Don't roll your own list transposition functions....
It's also useful for iterating over a couple of related lists:


for message, email in zip(messages, emails):
    hdrs, body = parse_email(email)
    assert message in body
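And as hinted above, zip is its own list transposition: unpack the rows with * and zip pairs up the columns. (Shown here Python 3-style with list(); in 2009-era Python 2, zip returns a list directly.)

```python
# zip(*rows) transposes a list of rows into a list of columns.
matrix = [[1, 2, 3],
          [4, 5, 6]]
transposed = list(zip(*matrix))  # [(1, 4), (2, 5), (3, 6)]
```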

Sunday, August 16, 2009

When a band I like comes to town, I'll know

The List is a comprehensive listing of all the known shows coming to the Bay Area listed by artist and by venue.

There are hundreds of concerts and nearly 1500 bands on the list, so I threw together a script to scrape it and intersect those bands with the artists in my iTunes library.
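The scraping code isn't shown here, but the matching step was a plain set intersection. A minimal sketch of that step (the function names and normalization are my own assumptions, not the original script):

```python
# Hypothetical sketch of the set-intersect approach: lowercase both
# lists of names, then intersect the resulting sets.
def normalize(name):
    return name.strip().lower()

def matching_bands(listed_bands, library_artists):
    listed = set(normalize(b) for b in listed_bands)
    library = set(normalize(a) for a in library_artists)
    return sorted(listed & library)
```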

The result is, I now know that these acts are playing in the near future:

atmosphere
bat for lashes
beirut
blink 182
butthole surfers
calexico
cat power
collective soul
dan deacon
deerhoof
deerhunter
dropkick murphys
elvis costello
fever ray
flipper
ghostface killah
girl talk
green day
grizzly bear
in flames
kenny rogers
lil wayne
m.i.a.
mastodon
meat puppets
mirah
modest mouse
mstrkrft
no age
nofx
pearl jam
placebo
porcupine tree
sunny day real estate
tenacious d
thievery corporation
tv on the radio
weezer
yo la tengo


The next step is to scrape the concert details as well, use fuzzy matching, run it automatically, and set up alerts.

But this only took 15 minutes to write in python and it would have taken me way longer to parse manually.

EDIT:

Instead of doing a set intersect, I now use difflib to find "close matches." It's slower but still runs start to finish in about 10 seconds, which is fine, especially considering that the data changes infrequently.

I also unescape the ampersand in the iTunes xml, and filter out "The " because:

>>> import difflib
>>> difflib.get_close_matches('foo', ['the foo', 'foods'], n=1)
['foods']

...an exact match preceded by 'the' is penalized more than a suffix. So 'pixies' would match 'pixiestickers' instead of 'the pixies' in the case where I only select the top match (since ideally there's a one-to-one mapping).

Instead of writing my own fuzzy matching algorithm, for now I'll just chop off 'The ' and live with the results. Although some of the matches aren't useful, it does better at finding bands such as ...and you will know us by the trail of dead and The Ting Tings.
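The "chop off 'The '" workaround might look something like this sketch (the function names are mine; the original code isn't shown):

```python
import difflib

def strip_the(name):
    # Remove a leading "The " so it doesn't outweigh a junk suffix.
    return name[4:] if name.lower().startswith('the ') else name

def best_match(name, candidates):
    # Match against prefix-stripped names, but return the original.
    stripped = [strip_the(c) for c in candidates]
    hits = difflib.get_close_matches(strip_the(name), stripped, n=1)
    return candidates[stripped.index(hits[0])] if hits else None
```

With the prefix gone, 'pixies' now resolves to 'the pixies' rather than 'pixiestickers'.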

Tuesday, August 11, 2009

timsort visualization

This blog post is quite effective at illustrating the timsort algorithm, found in Python (and soon Java).

Saturday, August 8, 2009

vim zen moment

There comes a certain time in one's life to put aside the variety of editors they might use or sometimes dabble in, and perhaps choose one they like best, or see the most potential with down the road, and work monogamously toward advanced proficiency in this editor, regardless of the bumps in the road or hardships which may provoke longing for other editors along the way.

I've made this commitment to vim recently and I'm still a novice.
But I just had what may be my first true zen moment with the editor!

The problem:
* I needed to fix a single spacing annoyance in a set of over 40 php files.

The solution:
* Open all php files rooted in foo, luckily 90% were all at the same leaf level
vim /foo/*/*/*.php
* Start recording a macro in register a
qa
* Make edits
[a fair bit of jj and dd and i etc.]
* Write changes
:w
* Open next file
:bn
* Stop recording
q

After pressing @a a few times to execute the macro, or :bn if the file could be left alone, I was done.

I've always wanted to be able to do this sort of thing so easily, but it's elusive or cumbersome with most GUI editors. Many UNIX text-munging tools exist too, but it's often easier to take the direct route of showing the machine what you want rather than, say, building a DFA.

So, I'm liking the taste of vim kool-aid thus far.

Sunday, July 19, 2009

Smack talkin' D.R. Hofstadter delivers sick iceburn on R. Kurzweil

This is a profound interview with Douglas Hofstadter, author of Goedel Escher Bach and I Am a Strange Loop.

Money Quote/Excerpt:

I think Ray Kurzweil is terrified by his own mortality and deeply longs to avoid death. I understand this obsession of his and am even somehow touched by its ferocious intensity, but I think it badly distorts his vision. As I see it, Kurzweil's desperate hopes seriously cloud his scientific objectivity.

...Kurzweil sees technology as progressing so deterministically fast (Moore's Law, etc.) that inevitably, within a few decades, hardware will be so fast and nanotechnology so advanced that things unbelievable to us now will be easily doable. A key element in this whole vision is that no one will need to understand the mind or brain in order to copy a particular human's mind with perfect accuracy, because trillions of tiny “nanobots” will swarm through the bloodstream in the human brain and will report back all the “wiring details” of that particular brain, which at that point constitute a very complex table of data that can be fed into a universal computer program that executes neuron-firings, and presto — that individual's mind has been reinstantiated in an electronic medium...

Rather ironically, this vision totally bypasses the need for cognitive science or AI, because all one needs is the detailed wiring plan of a brain and then it's a piece of cake to copy the brain in other media. And thus, says Kurzweil, we will have achieved immortal souls that live on (and potentially forever) in superfast computational hardware — and Kurzweil sees this happening so soon that he is banking on his own brain being thus “uploaded” into superfast hardware and hence he expects (or at least he loudly proclaims that he expects) to become literally immortal — and not in the way Chopin is quasi-immortal, with just little shards of his soul remaining, but with his whole soul preserved forever.

Well, the problem is that a soul by itself would go crazy; it has to live in a vastly complex world, and it has to cohabit that world with many other souls, commingling with them just as we do here on earth. To be sure, Kurzweil sees those things as no problem, either — we'll have virtual worlds galore, “up there” in Cyberheaven, and of course there will be souls by the barrelful all running on the same hardware. And Kurzweil sees the new software souls as intermingling in all sorts of unanticipated and unimaginable ways.

Well, to me, this “glorious” new world would be the end of humanity as we know it.


Hopefully you're hooked by now, so read the rest!

Thursday, July 16, 2009

Loblaw's law

When you have numbers that are really really really big, you can stop talking about probabilities and start talking about laws.

Statistics are often unintuitive for people, and the Monty Hall Problem is a famous example of that un-intuition in action.

And of course, as humans, we distrust machines. And statistics are, so far, the best tool machines have for imitating humans in areas such as language and vision. How we hate these machines, with their statistics.

Yes, so if you are a human reading this, you may feel a certain amount of smugness knowing that language is a built-in feature for you; you can read an article and comprehend with absolute certainty its key points and discuss it in an intelligent and natural way. Even if a computer were able to analyze the same article, it could only offer up soulless suggestions of meaning with varying degrees of certainty, and with little or no actual intelligence.

But as it turns out, you and I live in a crazy universe not governed by laws so much as statistics.

Mandatory Dijkstra quote: The question of whether a computer can think is no more interesting than the question of whether a submarine can swim.

On their own, atoms can be in several possible states, each with varying probabilities, where the most probable state has lowest energy.

With more heat added to a closed system such as a gas, the likelihood that each atom is in a higher energy state increases. Any particular microscopic arrangement of the system (a "microstate": which molecule is where, with how much energy) is as likely as any other; all arrangements of molecules in the container are interchangeable with one another.

Now if you were to group these equally likely microstates into buckets by the total energy of the system, counting how many fall into each bucket, the distribution would produce a bell curve. Looking at that curve, you could state with some confidence the likely range of energy (temperature, actually).

EXCEPT, that's not how the universe actually works!!

The contradiction arises because the number of particles is on the order of 10^23.

So there's still a bell curve, but it's not really a curve so much as an extremely tall spike with almost zero width and almost no variance. Temperature and entropy and the first and second "laws" of thermodynamics all work because 10^23 is an enormous, enormous number, and the expected value (the center of the curve) is the only value anyone actually ever experiences.

To illustrate, I'll create a law right now, named after a brilliant attorney: Loblaw's law. Loblaw's law states that it is not possible to flip a (normal) coin 10^23 times and produce heads every time. You may say, well, technically that is still possible, so how can it be a law?

Well, technically it is (sort of) also possible for the entropy of the universe to decrease for a few minutes. Technically, it is also possible that heat could flow from a cold object to a hot object. Technically, a broken egg could reassemble itself from the floor and leap back into my hand...blah blah blah. Loblaw's law is a law because these events are so unlikely that they will never happen.
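To get a feel for just how unlikely: the probability of all heads is 2^-(10^23). A quick back-of-the-envelope in Python (my numbers, not from the post):

```python
import math

# P(all heads in 10**23 fair flips) = 2 ** -(10 ** 23).  A float can't
# even represent a number that small; count its leading decimal zeros.
flips = 10 ** 23
leading_zeros = flips * math.log10(2)  # roughly 3.01e22 zeros after "0."
```

A probability with thirty billion trillion zeros before its first significant digit is, for every practical purpose, a law.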

That's the way it goes with ginormous numbers, and that's why we can call them the laws of thermodynamics, although a better name for the field is actually Statistical Mechanics.

Back to computers: the point is that statistics is a strength, not a weakness, of computers performing natural language processing, image recognition, or other tasks where humans naturally have the upper hand. Feed a somewhat sophisticated AI a few petabytes of salient data and I believe that Dijkstra quote will suddenly ring true.

Friday, July 10, 2009

DTrace

Up until very recently, I had never heard of DTrace.

Sun Microsystems, makers of brilliant software but not money, created DTrace for their Solaris OS. It lets you instrument the running system with probes that fire and report data whenever syscalls or other OS events occur.

This is awesome, and DTrace has also been ported to Mac OS X. This set of one-liners gives a small sample of what's possible. If you're on a Mac, FreeBSD, or Solaris (!) and looking for a good time, try running a few of these.

And if you have Instruments installed, you can launch apps and monitor their calls, file I/O, memory leakage, etc...

For my own personal Rube Goldberg machine, I might write a DTrace script that listens for whenever files of a certain type are opened and pipes them to a script that automatically backs them up under some version control.

Or similarly, automating other tasks such as running make whenever code files are written.

I was excited to learn about DTrace, having never heard of it before, so I thought I'd share. Enjoy!

Thursday, June 4, 2009

11 week Project Manager Manifesto

If you are taking a project course that lasts 11 weeks, and you are leading a group of 5-8 software engineers, congratulations, you are a project manager for CSE 403!

Read this post-mortem fresh from the source, the project manager on Robot Rock. Robot Rock is an open-source interactive music framework built in 11 weeks by 5 UW undergrads for CSE 403, a software engineering course.

Suggested reading:
The Pragmatic Programmer
How To Win Friends and Influence People

At the beginning

You are the one to set the tone. Work as hard as it takes to get the project off the ground, and demonstrate what your expectations are. The sooner you demonstrate that the project could be a success, the sooner you win the rest of the team's dedicated efforts, and you can't get far without that.

This is the time for experimentation--it will never get easier to test out different concepts!

For example, Robot Rock started with separate demos of drum sounds in a loop and a trivial non-functional UI. Mashing these together was a major milestone and morale boost, demonstrating that all of our technologies could live and work together in the same environment, at a doubtful time when we were still struggling to set everything up.

It also provided a tangible way to demonstrate that a lot of the hard work was already done, and was useful for settling disputes through experimentation instead of abstract arguments (for example, when designing architecture based on the performance limitations of our libraries).

Establish regular meeting times during the week, and hold onto this inertia. Meeting often is the best thing your group can do.

Don't think of yourself as any sort of authority, think of yourself as the sniffable glue keeping your team happy and productive and it's up to you to figure out exactly how to make that happen! If you read Dune you know that as soon as you start giving orders, people stop acting autonomously, so don’t command, lead instead.

Testing

Write lots of tests in the beginning, but keep them general and flexible, since things will be tumultuous at the start and tests will die quick and noisy deaths.

You need the tests for just two purposes: to verify and to enlighten. Tests help maintain confidence and momentum, and allow you to breathe easy knowing that you aren't breaking anything as you hurtle at breakneck speed through the rapid development that takes place after the design is settled. Of course, with all this code, you need a way to demonstrate what it all does. Tests are a great way to communicate to your teammates how to actually use the code you're writing.

Don't let tests prevent refactoring--accept that you will have to put in some tedious effort fixing them up. If you hate the thought of this, don't go hog wild writing tests for things likely to change, and write professional tests that don't have lots of repeated code.

Don't leave your tests broken.

Planning

Assembla is a hugely valuable tool. It integrates a ticket system, file hosting, milestone tracking, wiki, and repository with online code browsing.

Use the wiki to maintain your documentation and deliverables. Use the milestone tracking to organize and shuffle your tickets around in time.

As soon as you think of something that needs to be done, create a ticket for it.

Design

So, you're in 403, which means you're writing code for 11 weeks, then throwing it away, right? There's some truth to this, but if you write ugly code you can't use it to convince people to hire you. Job postings often require an example of code you've written to solve a non-trivial problem, so this is your chance to shine for all future employers.

There are a lot of general qualities that constitute good design: modularity, simplicity, decoupled modules, clean interfaces, and so on.

This isn't really about bad design/good design though. You want a design which will make your high level goals natural to achieve.

For example, we knew at the start that it was critical for Robot Rock's music to feel very responsive and sensitive to user manipulations.

We also knew that we wanted the audio generation to have no influence over the rest of the design, in case we wanted to drastically change the way we generate the actual musical tones.

From these first principles, we could evaluate each potential design (everyone came up with a rough design as a homework assignment) and examine if it allowed for real time responsiveness, as well as decoupled abstract song data from the underlying generation. Once these requirements were met, we went on to examine other more general qualitative characteristics, such as separation of components and simplicity.

If we started with an arbitrary design goal, such as simplicity, we may have locked ourselves into a design where real time responsiveness was anything but natural, despite whatever other elegance is achieved. In other words, we would have picked A Good Design, but it'd be The Wrong Design.

Features

Build the simplest thing that meets the requirements.

If there's a feature you want to add, consult your users first. Make a prototype before investing the time in something which the user may not actually care about.

No optimization until feature complete, or later even.

Understand when code is good enough...I'm just parroting The Pragmatic Programmer at this point, so just read that instead.

Grow Experts

Assign each group member an area of ownership. It doesn't have to be something they're already familiar with; ideally it's the thing they care most about. There should be no part of the project without an associated leader. Everyone can refer to this person with questions, and ownership will deepen their understanding of the respective area. It also provides resume bullet points.

Not everyone will have the same knowledge level of the languages you use. It helps to have at least one person experienced with each language utilized, and they should oversee code reviews with the less experienced members, at least during the initial stages. It goes without saying that the instruction and learning are the crucial goals for these code reviews, never judgment.

Teamwork

I forget where I read this, but the difference between a team and a group is that a group all leaves a meeting at its deadline regardless of where things stand, whereas a team will stay as long as it takes to get things to a satisfactory point.

In practice this makes no sense. Sometimes people really do have to be somewhere at a certain time, and sometimes despite your best efforts, you have to force a less than ideal resolution. But the idea is not to leave things open and unresolved at the end of the day. Make some sort of plan for the next session, or if someone can stay and play cleanup, they should and earn the gratitude of the group. Make sure it's not the same person every week.

Get the entire group involved. If someone's quiet, explicitly invite them to weigh in with their position. Assign homework to the whole group to prepare a decision on items which affect the overall team course. Send around status updates during times when everyone's too busy on different areas to meet.

Code and documentation are communal. If a problem is found, it's your problem to fix, regardless of its origin. No drawing territorial boundary lines; if it's someone else's code, take it as an opportunity to learn about its inner workings.

When Conflict Occurs

Let everyone speak their minds, and focus on the ideas, not the person.

Try to be the first to find and acknowledge the weaknesses of your own idea, and the strengths of each opposing idea.

Find common ground. Find and point out the positive things about the ideas you disagree with, and be sincere.

If the discussion drags on, put it to a group vote. Once the group decides, that's it, no revisiting the topic until taking action to make things work as they stand.

Think Big

Able developers seek worthy, difficult projects to grow their skills. Give yourselves a difficult and interesting project, so you'll attract hungry developers, and everyone will rise to meet the challenge.

Tuesday, June 2, 2009

How to buy coffee

If you're a consumer in the market for 'specialty' coffee (i.e. anything above Folger's grade), then there is a myriad of options, brands, and considerations designed to address concerns ranging from taste to environmental impact and social justice.

Fair Trade is the most well-known label, designed to protect the price points of workers on small coffee farms. Although it guarantees a floor price for the coffee and other protections for the workers, even experts who support Fair Trade, such as Mark Pendergrast, are quick to point out its limitations.

Fair Trade can only offer protections to a small percentage of the workers in the coffee trade since large and medium farms aren't eligible, and quality is left to its purchasers to determine.

Certified organic coffee may seem likely to benefit the health of coffee consumers, but its real impact lies elsewhere. In a speaking engagement at the University of Washington, Pendergrast noted that pesticides affect the cherry hulls and don't penetrate to the bean itself. Drinking organic coffee, then, does not necessarily offer a health benefit over conventional coffee, but it does benefit the workers who harvest the cherries by hand and are exposed to the poisons.

Beyond fair trade and organic, there is direct trade, which has no certification, and several bird friendly/shade grown certifications which have numerous variants.

With careful research, you may learn about the organizations and certifications behind these different brands and labels, or discover a roaster whose Direct Trade practices agree with your purchasing habits.

Yet whatever stories come printed on the label or distributed in certification literature, they remain just that: stories. The truth about the chain from the farm to the cup often remains opaque, spun by marketing professionals looking to cash in on the conscientious consumer of specialty roasts.

For a glimpse of what could be the reality of direct trade coffee, for example, look no further than this review of Zoka coffee roasters (my emphasis):

I worked for Zoka for a little over a year and in that time I saw: sexual harassment (from the head of the company which made the girl move out of town), failure to pay coffee farmers, threats from various upper management to the employees (I can't tell you how many times I heard the head of the company say "You have to like me, I'm your boss!", firing people for calling in sick and finally their finniest and latest: embezzlement of employees tips to make up for the money that was lost during a break in. The owner does not care about anything but himself and his money which he has no idea how to manage.


Although I'd like to believe that purchasing specialty coffee with certifications from international organizations is making the world a better place, clearly there is still a certain amount of faith involved, and the actual results are difficult to know.

My recommendation to the truly conscientious consumer: buy organic Hawaiian coffee. It may be expensive, but:

  • You are buying American products, backed by American labor laws.

  • It is delicious coffee.

  • The farmers are not exposed to harmful pesticides, and neither is the environment.

  • You can visit the farm yourself, or see pictures from tourists. I highly recommend visiting, though!



Otherwise, prepare to invest plenty of time reading literature about different biodiversity certifications or getting to know your local importer better than they may be comfortable with.

Tuesday, March 24, 2009

Project Euler

I started working on Project Euler since I had some down time during spring break.

I started working yesterday evening and did some more problems this morning. I've solved the first 10 but am still not even halfway to completing 'Level 1.' The good news is that most of the problems take no more than 10 minutes from start to finish, especially in Python. Generators, list slice assignments and list comprehensions made things like iterating Fibonacci sequences and computing prime numbers fast and easy.
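For flavor, here's a sketch in that generator style, solving a typical early problem (the sum of the even Fibonacci terms below four million, which is Problem 2). This is my illustration, not code from my actual solutions:

```python
# A generator keeps the Fibonacci iteration lazy and clean.
def fib():
    a, b = 1, 2
    while True:
        yield a
        a, b = b, a + b

def even_fib_sum(limit):
    total = 0
    for n in fib():
        if n >= limit:
            return total
        if n % 2 == 0:
            total += n

# even_fib_sum(4000000) -> 4613732
```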

Saturday, March 14, 2009

What I've learned in each course this quarter

Networks:

Now that I've taken Networks, I believe I could describe, at points in excruciating detail, everything that occurs from the moment a user clicks 'Go' to load a webpage to the moment the page displays on their screen.

This includes all the operating system considerations, sockets, threading, buffering, packets/segments, error detection schemes, framing, Ethernet, DHCP, ARQ, NAT, STUN, TCP, IP, reliability, ordering, congestion control, routing, forwarding, heterogeneity of network topology, and BGP; plus real-time application support, wireless protocols, and RFID thrown in just for fun.

Things we didn't learn about: DNS. But I can fill in most of the blanks on that one myself. We also didn't spend much time on application-layer protocols such as HTTP, though we touched on SMTP some.

We covered so much in this class that my understanding of network principles didn't congeal until it was time to review for the final.

Algorithms:

We surveyed a suite of useful problems solved using greedy algorithms, divide-and-conquer techniques, and dynamic programming. We also learned how to show that a problem is NP-complete, and how to prove that an algorithm is correct and/or efficient.

Key takeaways: the incredibly useful technique of dynamic programming, which exploits redundant subproblems to solve a larger problem efficiently, and a firm understanding of the definitions of NP and NP-complete.

Technical Communication:

This class sharpened my understanding of, and gave me practice with, presenting technical information. Still, the one-liner for this class remains:
Tell them what you're about to tell them, tell them, then tell them what you just told them.

Thursday, March 5, 2009

Warfare and network algorithms

How much does computer science owe to the military?

Well let's see...

  • The Department of Defense basically invented the internet, and with it "our entire field," to paraphrase the UW CSE department chair during one of the lectures he gave...on operating systems.

  • The Ford-Fulkerson network flow algorithm used to route traffic through nodes was invented in 1956. Not for sending oil along pipelines or routing civilian traffic, but to determine whether the Russian military could move enough troops into neighboring countries, using the roads then in place, to successfully invade them. The Cold War is responsible for so much useful science!

  • The limitations of reliable communication with TCP were known long before, based on the notion of two separate armies coordinating an attack on a territory from two fronts. If only one attacks alone, devastating losses are incurred; if both strike simultaneously, the territory is easily conquered. The only problem? Trying to agree on a time to strike, with an unreliable messenger "routing" messages between the two on horseback or foot. What protocol would you use?
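For the curious, the max-flow idea behind Ford-Fulkerson is compact enough to sketch in Python. This is the Edmonds-Karp variant (BFS for augmenting paths); the road network below is invented for illustration:

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp: repeatedly push flow along shortest augmenting paths.

    capacity is a dict-of-dicts: capacity[u][v] = capacity of edge u -> v.
    """
    # Build residual capacities, including reverse edges initialized to 0.
    residual = {u: dict(edges) for u, edges in capacity.items()}
    for u, edges in capacity.items():
        for v in edges:
            residual.setdefault(v, {}).setdefault(u, 0)

    flow = 0
    while True:
        # BFS for an augmenting path from source to sink.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path remains
        # Find the bottleneck capacity along the path found.
        bottleneck = float('inf')
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Augment: reduce forward capacity, add reverse (residual) capacity.
        v = sink
        while parent[v] is not None:
            residual[parent[v]][v] -= bottleneck
            residual[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck

# Hypothetical road capacities between staging points 's' and target 't'.
roads = {'s': {'a': 10, 'b': 5}, 'a': {'t': 7}, 'b': {'t': 8}, 't': {}}
print(max_flow(roads, 's', 't'))  # 12
```

The answer to the original military question is just this number: the most "traffic" the network can carry from source to sink per unit time.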



It's nice to think that most of the early computer science leaps and bounds were born out of the circle surrounding the hippie 60's XEROX PARC culture, but the truth is far more...crew cut.

Tuesday, March 3, 2009

Write-Only Memory

It is real.

Here it is, in all its Wikipedia glory

Out of frustration with the long and seemingly useless chain of approvals required of component specifications, during which no actual checking seemed to occur, an engineer at Signetics once created a specification for a write-only memory and included it with a bunch of other specifications to be approved. This inclusion came to the attention of Signetics management only when regular customers started calling and asking for pricing information. Signetics published a corrected edition of the data book and requested the return of the 'erroneous' ones.


And here is the April Fool's prank pulled by Signetics:

Saturday, February 21, 2009

Dynamic Programming

Dynamic programming is one of the more interesting Computer Science techniques I've learned in my time as an undergrad. And I can't believe I almost received my diploma without really knowing what it means.

An Algorithms course isn't required to graduate, and at first exposure to the term I didn't realize that dynamic programming has nothing to do with writing code or dynamic languages. It's an optimization technique for simplifying problems which could otherwise blow up exponentially.

It applies to nearly every other subject I've held interest in, such as computer vision (with convolutions), graphics (culling), or optimization in game theory. Never mind that Algorithms has crucial implications for other topics such as Networks, Databases, and AI; every computer science topic is some goal-oriented application of specific algorithms. But before I digress....

Formally learning dynamic programming and its technique of solving subproblems with memoization has only increased my interest in the technique:

Until now, we've focused on greedy and recurrent algorithms in this algorithms course. Although we began by defining an efficient algorithm as any that runs in polynomial time, until this point the aim was to achieve O(n) or O(n log n) at worst, with O(n^2) solutions only rarely acceptable.

With dynamic programming we are not nearly as optimistic. These problems have constraints or conditions which render the prior techniques overly simplistic and ineffective. We creatively combine both greedy and recurrent techniques and use swaths of storage space in hopes of achieving polynomial runtime, meaning that O(n^3) or O(n^4) is not only acceptable; sometimes it's the best we can hope for.

Despite the pessimism, dynamic programming drastically reins in very difficult problems and occasionally leads to slick O(n) solutions for problems which would otherwise run in O(2^n). It's now one of the most sophisticated, versatile tools in my belt; I only wish I'd known it before taking several other courses.
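A small illustration of the memoization idea, using the classic longest common subsequence problem (my own example, not from the course): the naive recursion re-solves the same (i, j) subproblems exponentially often, while caching them yields O(len(a) * len(b)).

```python
from functools import lru_cache

def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    @lru_cache(maxsize=None)  # memoize: each (i, j) subproblem solved once
    def solve(i, j):
        if i == len(a) or j == len(b):
            return 0
        if a[i] == b[j]:
            return 1 + solve(i + 1, j + 1)
        return max(solve(i + 1, j), solve(i, j + 1))

    return solve(0, 0)

print(lcs_length("ABCBDAB", "BDCABA"))  # 4, e.g. "BCAB"
```

Delete the `lru_cache` line and the function still returns the same answers; it just takes exponential time on longer inputs. That one-line difference is the whole trick.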

So, back to my disbelief that the algorithms course is not required to graduate from the department. I realize it's a difficult, time-consuming, proof-based course (everyone hates proofs, though I finally hate them a little less) that must compete with other "essential" courses in a limited schedule. But the thought that I nearly graduated without taking this course is scary given how fundamental it is to every other course and topic in computer science.

True, I may never prove anything about the runtime of an algorithm I invent in The Real World. But any amount of formal reasoning is far more valuable than being "pretty sure" a solution works. Not to mention using solved problems as templates for real-world work, or just making positive impressions at job interviews. I'm glad I'm taking algorithms now, especially since I can't travel back in time and take it any sooner.

Friday, February 20, 2009

9 hours of messing with iPhone SDK

The iPhone feels like a new frontier, at least for non-early adopters such as myself.

After learning about Walk Score, I was in a bit of disbelief that their application was not available on the iPhone. Their lookup simply requires a latitude and longitude, which the iPhone readily provides. I dreamed of running around while looking for an apartment and pressing a button on my phone to get its walk score.

So knowing nothing about Objective C, mobile development, XCode, or much else related to Apple development I set out to whip up a simple mashup of iPhone + Walk Score today and went at it for about 9 hours.

Unfortunately I haven't yet produced anything close to finished.

But I learned quite a bit.

For example, I learned that diving straight into the iPhone samples was a fruitless, frustrating process. The interface builder was not intuitive and I spent over an hour trying to add functionality to a button.

After downloading a 50-page tutorial on "Your First iPhone Application," I learned how rigorously the SDK enforces the Model-View-Controller principles, such that a simple Hello World app requires substantial scaffolding (50 pages worth!).

Additionally, I got a taste of Objective C, a taste I have not yet acquired. It's more verbose than Java, yet dynamically typed, with difficult syntax and a fair number of dependency issues.

I learned that Objective C prefers a SAX-style approach to XML parsing.
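For readers unfamiliar with SAX: rather than building a DOM tree, the parser streams through the document and fires callbacks. Sketched in Python (since that's my comfort zone), with a made-up `<walkscore>` element standing in for whatever Walk Score actually returns:

```python
import xml.sax

class ScoreHandler(xml.sax.ContentHandler):
    """Event-driven (SAX-style) handler: no tree is built; the parser
    calls these methods as it streams through the document."""
    def __init__(self):
        super().__init__()
        self.in_score = False
        self.score = None

    def startElement(self, name, attrs):
        if name == "walkscore":   # hypothetical element name
            self.in_score = True

    def characters(self, content):
        if self.in_score:
            self.score = int(content.strip())

    def endElement(self, name):
        if name == "walkscore":
            self.in_score = False

handler = ScoreHandler()
xml.sax.parseString(b"<result><walkscore>87</walkscore></result>", handler)
print(handler.score)  # 87
```

The Objective C equivalent follows the same delegate-callback pattern, which is why the SDK's parsing felt natural once I recognized it as SAX.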

I figured out how to send HTTP queries to Walk Score, parsed their XML response, and displayed a score for a fixed latitude and longitude.

After this I wondered if a browser-based approach held more advantages: it would spare the XML parsing and simply direct users to the search results page for their coordinates. The idea held appeal, since I would reuse existing browser functionality, and some scores took time to calculate and would require a redirect anyway.

As my networks professor mused a couple of days ago, "Perhaps one of the best programmers I've met is also one of the laziest programmers I've met; he never does more than he has to for his code to get the job done. Maybe his laziness is what makes him so good." (He was relating it to the design decisions behind NAT.)

After learning how to load web pages using an embedded browser, it came time to retrieve live GPS coordinates. Unfortunately, this is not as simple as a one-line API call. Using GPS is expensive battery-wise and is best done sparingly according to the dev docs. For this reason a custom manager is required, which I was hooking up before deciding to call it a day and write this developer diary entry.

Perhaps later this week I will wrap up the GPS retrieval and integrate it into the iPhone app browser call. But I also learned it might not be worth it:
--Walk Score's heat maps show certain places, such as San Francisco, have near-universal scores of 80-90 with only small pockets of lower quality. If the result is nearly uniform, is the experience of walking around with this app anything like I imagined?
--Apple has a certification program in place. Testing and verifying the app might be a costly endeavor, and it might not even find its way into the App Store.
--Now that I've learned some basics, I find myself less interested in this project than in what I could now begin working on.

All in all, a great experience. I'm still very excited to see what I can produce on this platform, even if I have to hold my nose to write Objective C...

Wednesday, January 28, 2009

Company Profiles

Ever since working for software companies which were spinoffs of other software companies which were spinoffs of other software companies, I've wondered what a tech company family tree would look like.

Although it runs slowly in the browsers I've tried, this map reveals an amazingly thorough time and space relationship graph for Puget Sound tech companies.

On top of this, LinkedIn has beefed up their company profile pages with illuminating statistics such as incoming and outgoing career paths. Plus, statistics on the median age of employees and the breakdown of which titles constitute which percentage of a company's workforce shed light on facts which would previously have required some form of insider knowledge.

Now, for example, I know that prior to joining Microsoft, employees often worked for HP or Oracle, and after leaving they tend to arrive at Amazon or Yahoo. I also learned that the median age of a Microsoft employee is 34 and that 13% of its nearly 90,000 employees are software engineers.

It's juicy information like this that I covet for visualization purposes; perhaps simultaneously depicting the flow of workers in and out of a company as well as its size and age. I'll have to see if I can snag myself an API partnership.

Tuesday, January 13, 2009

"Artifacts" of taking Graphics/Computer Vision courses

from http://flickr.com/photos/csb13/99025844/

  • When I am in a restaurant I sometimes notice the reflection and refraction of lights and objects through my water glass and try to trace the beams or other objects back to their origin. This is especially true in restaurants with colored lighting.


  • When I am walking around areas with lots of solid geometry, like the passageways between buildings on campus, I sometimes imagine that the world is translating or transforming instead of me, the viewer (since the two produce equivalent images, the approach in graphics is often to shift the entire world). I sometimes also figure out where the vanishing points are relative to all the edges and corners.


  • I am aware of my internal edge sharpening and occasionally notice the effects of Mach banding, both phenomena which occur as a result of signal processing built into the wiring of the rods and cones of the eye itself.

Monday, January 12, 2009

Diving into "Dive Into Python"

Well, I'm now five chapters deep into Dive Into Python, which is available completely free as a PDF.

It's an older book, but still relevant for its unusual and fantastic approach of teaching a new language to experienced programmers. The approach seems so obvious, and yet it's difficult to come up with examples of other titles which follow the same approach.

Guido

Each chapter begins with a short code example which highlights the power of Python while performing non-trivial tasks such as outputting the language's own documentation or displaying directory listings with file metadata.

These examples highlight the succinctness of the language while introducing a new language concept per chapter, effectively booting the reader into a frigid pond of fresh, semi-obtuse language constructs ('Get thrown into a freezing cold pond of Python' doesn't have quite the same ring, though).

The remainder of each chapter is devoted to exposing the constructs and syntax of the example: both what it does and how it all works, line by line if necessary. Code samples are available from the website, and each chapter encourages a hands-on approach of running and modifying the sample code.

At each chapter's conclusion, a checklist of concepts is presented and the semi-obtuse sample seen at its start reads as clearly as a language you've known all along.

It makes perfect sense that the best way to learn a language is by dissecting code beyond one's current reading level and then changing and authoring examples to reinforce it all. It's a bit strange that fewer books go as far as this one, but it's possible the "by Example" series follows a similar tack.

Dive Into Python's chief drawback is its age: the book was written way back in 2003, the time of Python 2.3 (it's now the age of 2.6 and the dawn of 3.0). Still, this is a much more fun way to learn than reading, for example, Python in a Nutshell, as I was doing (that's more of a reference manual anyway, though more up to date).