Ooh, sneaky

Today I thought it might be fun to go back to some old code and see what was going on, and I ended up fixing some DirectX resource leaks that I had had lying around for ages.

Now, the leaks were actually kinda strange, as I’d adopted a pretty strict policy of using auto_ptr and CComPtrs where I thought it made sense (i.e. on stuff that was either exposed, or stuff that I thought I wasn’t accessing often enough for there to be any point in making it a raw pointer).

The problem boiled down to me not really having thought about the implications of my singleton use. I like to think that I’m being pragmatic when I’m making stuff singletons, because I think that I grok the implications, and realize that they are glorified global variables, and all that jazz. But! I hadn’t thought about the fact that I actually had a couple of auto_ptrs lying around as member variables in my singletons, and even though I was calling the singletons’ close methods when I wanted to kill them, close wasn’t doing an explicit “delete this”, so those guys weren’t actually being deleted.

Pretty sneaky, no?
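Here is a minimal sketch of the trap. The TextureManager singleton and the Resource type are made up for illustration (they are not from my actual code), and I'm using std::unique_ptr as a stand-in for auto_ptr, since the ownership behaviour that matters here is the same:

```cpp
#include <memory>

// Stand-in for a DirectX resource; counts live instances so a leak is visible.
struct Resource {
    static int live;
    Resource() { ++live; }
    ~Resource() { --live; }
};
int Resource::live = 0;

// Hypothetical singleton that owns a resource through a smart pointer member.
class TextureManager {
public:
    static TextureManager& instance() {
        if (!s_instance) s_instance = new TextureManager;
        return *s_instance;
    }

    // The sneaky version: might tidy up other things, but the instance is
    // never deleted, so the smart pointer member's destructor never runs.
    static void close_leaky() { /* no delete here */ }

    // The fixed version: deleting the instance runs the member destructors.
    static void close() {
        delete s_instance;
        s_instance = nullptr;
    }

private:
    TextureManager() : m_resource(new Resource) {}
    ~TextureManager() {}

    static TextureManager* s_instance;
    std::unique_ptr<Resource> m_resource;
};
TextureManager* TextureManager::s_instance = nullptr;
```

After TextureManager::instance() followed by close_leaky(), Resource::live is still 1; only close() brings it back down to 0. The smart pointer does its job, it just never gets the chance.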


Installing Thrift under Ubuntu

So, I recently installed Thrift under Ubuntu, and while not exactly being rocket science, it’s always nice to document the steps for coming generations (or for when I reinstall my laptop and have to do it all again :).

1) Follow the steps in the guide, which boil down to:

$ sudo apt-get -y install subversion g++ make flex bison python-dev libboost-dev libevent-dev automake pkg-config libtool

$ svn co http://svn.apache.org/repos/asf/incubator/thrift/trunk thrift

$ cd thrift

$ ./bootstrap.sh

$ ./configure

$ make

$ sudo make install

2) This will install Thrift in /usr/local and subdirectories, so you’ll want to add something like

THRIFT_INC_DIR = /usr/local/include/thrift

THRIFT_LIB_DIR = /usr/local/lib

to your makefile.

3) You’ll need to add the lib path to your LD_LIBRARY_PATH to be able to find the .so files:

export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

aaand you’re all set!


Options you’ll just never get right

Runner up

Sort ascending or sort descending? My current way to figure out what I want is to first choose what I think is correct, and then just do the opposite.

The winner!

Actual Size, Fit Page, Fit Width or Fit Visible? What the hell do any of these options mean? All I ever really want is “make a page in the document as large as the current size of the reader window”, but it takes between 3 and 5 clicks to get there (Yeah, sometimes I’ll forget what I’ve tried, and choose the same wrong option twice).


Postfix Applied!

It’s fun when you can get stuff out of your system; simple stuff that you’ve always wanted to hack together, but never gotten round to.

I just had one of those moments, when I hacked together a function plotter, based in large part on the postfix code that I wrote a couple of days ago.

It was also a good project to give WPF another go; the last time I looked at it, I dove in at the wrong end, and really didn’t get what was going on.

Code is up on Bitbucket, if anyone is interested. Nothing revolutionary, although I do think I’m pretty clever using reflection on the Math class to get access to a load of math functions without writing them myself :)


Getting my daily postfix!

Sometimes you write some code that you’re not really sure when it will come in handy, but it’s pretty cool, and your gut tells you that it’s a good routine to have lying around, so you happily hack away. And this post is about one of those days!

It all started a week or so ago, when my brain suddenly asked, “Magnus, how does a regular expression parser work?”. Yeah, indeed, how does a general regular expression parser really work? I’d read the entry in “Beautiful Code” on the 30 line parser, but it wasn’t general in the sense that I was looking for.

I Googled around a bit (ok, this is funny; I’m writing this in Google docs before I publish, and docs complains that Googled isn’t a real word :), and found two articles that described how it was done. It took a couple of readings to grasp what was going on, and to get that a regular expression can be converted to a state machine, and that getting a match then translates to seeing if a given input takes the state machine to its final state.
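To make the state machine idea concrete, here is a tiny hand-built machine for the expression “ab*c”. This is just a sketch I wrote by hand for one fixed pattern (the state numbering and function names are my own); a general parser would build the transition table from the expression instead:

```cpp
#include <string>

// Hand-built state machine for the regular expression "ab*c".
// States: 0 = start, 1 = seen 'a' (loops on 'b'), 2 = final, -1 = dead.
int next_state(int state, char c) {
    switch (state) {
    case 0:  return c == 'a' ? 1 : -1;
    case 1:  return c == 'b' ? 1 : (c == 'c' ? 2 : -1);
    default: return -1;  // no transitions out of the final or dead state
    }
}

// Feed the whole input through the machine; it matches if we end in state 2.
bool matches_ab_star_c(const std::string& input) {
    int state = 0;
    for (char c : input) {
        state = next_state(state, c);
        if (state < 0) return false;
    }
    return state == 2;
}
```

So “ac” and “abbbc” end up in the final state and match, while “abca” falls off the machine after the final state and doesn’t.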

Along the way I also began to suspect that this was in some way related to parsing (which probably isn’t that strange, seeing as lexers use regular expressions), but before I could write any real code to test stuff out, I got sidetracked into something that I thought was really cool. Get ready for it! *drum roll* Converting infix to postfix notation! Yeah, seriously. That’s it! Get excited. Oh, you’re not. In fact, you’ve got the same “r u srs” look that my girlfriend had when I tried to convince her that it was exciting..

No, but seriously, I started to think that postfix notation was the shit for two reasons. 1) There’s no need for parentheses, as precedence is explicit in the expression, “1+2*3” becomes “1 2 3 * +”, and 2) it’s way simple to implement a stack machine that computes the value of the expression: scalars push their values on the stack, and operators pop their operands and push their results.

The cool thing is that converting from infix to postfix is dead simple, and in pseudo-code looks like this:

List<char> infix_to_postfix(List<char> input):
  List<char> result
  Stack<char> op_stack
  foreach c in input:
    if is_value(c):
      result.push_back(c)
    else if is_operator(c):
      while !op_stack.empty() and precedence(op_stack.top()) >= precedence(c):
        result.push_back(op_stack.pop())
      op_stack.push(c)

  while !op_stack.empty():
    result.push_back(op_stack.pop())

  return result

You keep adding scalars to the result list, and when you find an operator, you check the operator stack for any operators with the same or higher precedence, and add these to the result before pushing the new operator.

I’ve omitted handling parentheses, but they just work like a “local scope”, so a left parenthesis just adds itself to the operator stack, and a right parenthesis pops the operators to the result until it finds a left parenthesis. It’s written out in better detail in these slides.

Once you’ve got your postfix string, it’s a piece of cake to run through this and build an AST of some sort, or just evaluate on the fly.
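The “evaluate on the fly” part can be sketched as the stack machine described earlier. Note that this sketch is not part of the code further down; it assumes a space-separated postfix string and only the binary operators + - * / ^, and eval_postfix is a name I made up for it:

```cpp
#include <cmath>
#include <cstdlib>
#include <sstream>
#include <stack>
#include <stdexcept>
#include <string>

// Evaluate a space-separated postfix expression such as "1 2 3 * +".
// Single-character tokens from "+-*/^" are operators; anything else is
// parsed as a number.
double eval_postfix(const std::string& expr) {
    std::stack<double> values;
    std::istringstream in(expr);
    std::string tok;
    while (in >> tok) {
        const bool is_op = tok.size() == 1 &&
            std::string("+-*/^").find(tok[0]) != std::string::npos;
        if (!is_op) {
            // scalars push their value on the stack
            values.push(std::strtod(tok.c_str(), nullptr));
            continue;
        }
        if (values.size() < 2) throw std::runtime_error("too few operands");
        // operators pop their operands (right one first) and push the result
        const double rhs = values.top(); values.pop();
        const double lhs = values.top(); values.pop();
        switch (tok[0]) {
        case '+': values.push(lhs + rhs); break;
        case '-': values.push(lhs - rhs); break;
        case '*': values.push(lhs * rhs); break;
        case '/': values.push(lhs / rhs); break;
        case '^': values.push(std::pow(lhs, rhs)); break;
        }
    }
    if (values.size() != 1) throw std::runtime_error("malformed expression");
    return values.top();
}
```

Calling eval_postfix("1 2 3 * +") multiplies 2 and 3 first and then adds 1, which is exactly the order that the infix version needed precedence rules to express.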

In my test code, I first split the input into tokens, where I also get rid of white space and crap like that, which makes the infix-to-postfix conversion easier, as I don’t need error handling there.

Enjoy!

#include <cctype>
#include <cstring>
#include <list>
#include <stack>
#include <string>
#include <utility>
#include <stdint.h>

// pop the top element off a stack and return it
template< class T >
T top_and_pop(std::stack<T>& s)
{
    T tmp = s.top();
    s.pop();
    return tmp;
}

enum TokenType
{
    token_none,
    token_value,
    token_lparen,
    token_rparen,
    token_op,
    token_fn,
};

typedef std::pair<TokenType, std::string> Token;
typedef std::list<Token> Tokens;

uint32_t precedence(const Token& t)
{
    // operators in ascending precedence; the index into ops is used as the
    // precedence, so '-' ranks just above '+', and '/' just above '*'
    const char* ops = "+-*/^";
    switch (t.first)
    {
    case token_op:
        return static_cast<uint32_t>(strcspn(ops, t.second.c_str()));

    case token_fn:
        // make sure that function calls have higher precedence than ops
        return static_cast<uint32_t>(strlen(ops));

    default:
        return 0;
    }
}

// Each match_xxx function returns a pointer to the character after the
// match, or NULL if str doesn't start with that kind of token.

const char* match_value(const char* str, TokenType& type)
{
    if (!isdigit(*str)) {
        return NULL;
    }
    type = token_value;

    while (isdigit(*str) || *str == '.') {
        ++str;
    }

    return str;
}

const char* match_fn(const char* str, TokenType& type)
{
    if (!isalpha(*str)) {
        return NULL;
    }
    type = token_fn;

    while (isdigit(*str) || isalpha(*str)) {
        ++str;
    }

    return str;
}

const char* match_paren(const char* str, TokenType& type)
{
    const char cur = *str;
    if (cur == '(') {
        type = token_lparen;
        return str + 1;
    } else if (cur == ')') {
        type = token_rparen;
        return str + 1;
    }
    return NULL;
}

const char* match_op(const char* str, TokenType& type)
{
    if (strchr("+-*/^", *str)) {
        type = token_op;
        return str + 1;
    }
    return NULL;
}

Tokens string_to_tokens(const char* str)
{
    Tokens tokens;
    while (*str) {
        // if a match is found, match points to the character after the match
        const char* match = NULL;
        TokenType type = token_none;
        match = match != NULL ? match : match_value(str, type);
        match = match != NULL ? match : match_fn(str, type);
        match = match != NULL ? match : match_paren(str, type);
        match = match != NULL ? match : match_op(str, type);

        if (match != NULL) {
            tokens.push_back(std::make_pair(type, std::string(str, match - str)));
            str = match;
        } else {
            // unknown character (white space etc), just skip it
            str++;
        }
    }

    return tokens;
}

Tokens infix_to_postfix(const Tokens& tokens)
{
    Tokens result;
    std::stack<Token> operators;

    for (Tokens::const_iterator i = tokens.begin(), e = tokens.end(); i != e; ++i) {
        const Token& cur = *i;
        switch (cur.first)
        {
        case token_lparen:
            operators.push(cur);
            break;
        case token_rparen:
            while (!operators.empty() && operators.top().first != token_lparen) {
                result.push_back(operators.top());
                operators.pop();
            }
            // discard the matching left parenthesis (missing if the input is unbalanced)
            if (!operators.empty()) {
                operators.pop();
            }
            break;
        case token_value:
            result.push_back(cur);
            break;
        case token_op:
        case token_fn:
            // note: equal precedence always pops, so '^' is treated as left-associative
            while (!operators.empty() &&
                   operators.top().first != token_lparen &&
                   precedence(cur) <= precedence(operators.top())) {
                result.push_back(operators.top());
                operators.pop();
            }
            operators.push(cur);
            break;
        default:
            break;
        }
    }

    // whatever is left on the operator stack goes last
    while (!operators.empty()) {
        result.push_back(operators.top());
        operators.pop();
    }

    return result;
}

int main()
{
    const char* input0 = "2+1";
    const char* input1 = "A*B+C/D";
    const char* input2 = "A*(B+C)/D";
    const char* input3 = "3+4*2/(1-5)^2^3";
    const char* input4 = "(1*(5+6))*2";
    const char* input5 = "1 + tjong(10 + tjong2(20.5))";

    Tokens token0 = infix_to_postfix(string_to_tokens(input0));
    Tokens token1 = infix_to_postfix(string_to_tokens(input1));
    Tokens token2 = infix_to_postfix(string_to_tokens(input2));
    Tokens token3 = infix_to_postfix(string_to_tokens(input3));
    Tokens token4 = infix_to_postfix(string_to_tokens(input4));
    Tokens token5 = infix_to_postfix(string_to_tokens(input5));

    return 0;
}


A few thoughts on API design

The project I’m involved in at work is easily the largest I’ve worked on to date, spanning almost 2 years, and involving around 100 developers. Without going into all that much detail, I can say that the purpose of the project is to build a new platform for current and next generation telephone base stations, which, as far as I’ve gathered, are the stations located all around the country, that let my mum phone me and ask why I never call.

The part that I’ve been most involved with has been the design and implementation of the upper layers of the communication stack, i.e. the interface that a fair percentage of the other developers have to come into contact with when they want their components to talk to other developers’ components.

I’ve worked on a few different libraries and APIs before, so I really didn’t think this would be that different, but the sheer scale of things meant that oversights that previously only required a few minutes and a quick chat to fix, now could incur several weeks of planning, and slews of angry mails when new releases suddenly broke lots of previously working test cases.

After a while I noticed that the general issues mostly fell into a handful of categories, and that writing these things down while they were still fresh in my mind would do me good for coming endeavors. I also realized that I haven’t blogged that much lately, and that this might make a good post!

It’s your API, so it’s your fault
We are a very diverse group of developers on our project, with varying backgrounds and varying experience. This meant that if you released something with a somewhat diffuse interface, then there would be as many different usages as there were developers.

At first I was annoyed at this, and naturally expected my co-workers to be clairvoyant and be able to get what I _really_ meant, but after an (embarrassingly long) while, I came to grips with the fact that it was my responsibility as the library writer to make sure that people were using my interface in the way I actually had intended.

This led to me seeing unintended usage as a game, where the goal was to get people to do what I wanted, and when they didn’t, try to figure out why, and how I could more clearly convey my intent. This means things like consistent and clear naming and making sure that you say const when you mean it.

Once it’s in the .h file, it’s fair game
In the book “Refactoring” it’s called a published interface, which is exactly what it is. Once you’ve written something in a header file and released it, that file is from that point on fair game to use any way people see fit. This means that you need to put even more thought into everything you publish, and really consider if the things you expose in your header file really need to be exposed, or if you can add some form of indirection, or otherwise make them opaque.

It’s no surprise that people hate breaking API changes, and when the project grows in scale, the negative impact of a change that breaks an API might actually be bigger than the positive effects of that change (this was hard for me to come to terms with, as I’d never worked on such a large project before, where a breaking API change meant man days of work for the whole project to update, and instead just thought that I could make whatever changes I saw fit, as long as they made something better).

We tried very hard to avoid exposing any hard coded values, e.g. #define SUCCESS 0, in the public headers, as this could easily lead to cases where people just compared against 0 instead of the define, and instead supplied Win32-type macros, such as IS_SUCCESS(x), that tested if the success bit was set or not.

In one part we weren’t as careful, and supplied something like #define FUNKY_SYSTEM_ERROR 1. This worked fine for a while, but then we wanted to start adding more detailed error codes, e.g. #define FUNKY_SYSTEM_MEMORY_ERROR 2 etc, which turned out to be a pain, as there were already a lot of compares against the generic FUNKY_SYSTEM_ERROR value in client code.
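A rough sketch of what the macro approach buys you. The bit layout, the values, IS_FAILURE and funky_alloc are all made up for illustration (our real headers looked different), and I’m using a failure bit in the style of Win32’s HRESULT rather than a literal success bit:

```cpp
#include <stdint.h>

typedef uint32_t funky_status;

// Hypothetical layout: the top bit marks failure, the low bits carry detail.
#define FUNKY_FAILURE_BIT          0x80000000u
#define FUNKY_OK                   0x00000000u
#define FUNKY_SYSTEM_ERROR         (FUNKY_FAILURE_BIT | 1u)
#define FUNKY_SYSTEM_MEMORY_ERROR  (FUNKY_FAILURE_BIT | 2u)  // added in a later release

// Clients test through the macros instead of comparing against raw values.
#define IS_SUCCESS(x)  (((x) & FUNKY_FAILURE_BIT) == 0)
#define IS_FAILURE(x)  (!IS_SUCCESS(x))

// Example API call: fails with the more detailed code when size is zero.
inline funky_status funky_alloc(uint32_t size) {
    return size == 0 ? FUNKY_SYSTEM_MEMORY_ERROR : FUNKY_OK;
}
```

Client code that checks IS_FAILURE(funky_alloc(0)) keeps working when the more detailed code shows up, whereas code that compared the result against FUNKY_SYSTEM_ERROR silently stops catching the error.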

Add validation code first
We wanted to get our API out quickly so, even if the underlying functionality wasn’t actually there, people could at least compile against it.

This went well, and after a while, we got round to adding real functionality in place of the stubbed code, and also adding in validation code (asserts and the like). This is when we realized that even if you only write stub functionality, make sure that you’ve written the code that tests the correct values on the inputs.

People don’t have a problem at all with making sure that they give you the correct values (especially if they’re documented), but adding validation code later in the project annoys people, because it breaks something that’s already working.

Adding extra validation code late in a project that makes people’s code fail is a good idea in the ideal world, but not always that popular in the real one!
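In practice that could look something like this. It’s only a sketch: funky_send and the send_result codes are invented names, and I’m returning an error code where we often used asserts:

```cpp
#include <cstddef>

enum send_result { send_ok, send_bad_argument };

// Stubbed-out call: there is no real transport behind it yet, but the input
// checks are already the ones the finished implementation will enforce.
send_result funky_send(const void* payload, std::size_t length) {
    if (payload == NULL || length == 0) {
        return send_bad_argument;  // reject bad input from the very first release
    }
    // TODO: hand the payload to the real transport once it exists.
    return send_ok;
}
```

The stub does nothing useful, but anyone passing a NULL payload finds out on day one instead of six months later.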

Hard coding, or limiting the freedom isn’t always a bad thing
In our network, we had a predefined topology, but we only realized very late in the project that we could take advantage of this.

Instead we had an API based upon ids that were strings, where both parties trying to talk to each other needed to enter the same strings in several places. This naturally led to a lot of typos, and because we allowed arbitrary strings as inputs, we didn’t know if we were getting handed the correct string or not, so we had no way of validating the input. We toyed with the idea of having a global table of allowed string values, but there wasn’t any centralized list of what was actually allowed, so this would be very hard to maintain.

Lots of errors due to typos could be avoided if we instead had (for example) generated a header file containing enums that should be used as ids instead, so that the scope of inputs would have been shrunk down to stuff that actually made sense. The “template file” used to generate the header file could also have been used to generate the parts of the documentation that described the topology.

In the end we partially solved this by supplying a bunch of helper functions of the form “a_talk_to_b” that clients could use. These functions would in turn make the connection calls with string parameters, that had been heavily scrutinized, and were hopefully correct.
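For comparison, this is roughly what I imagine the generated header could have looked like. We never built this, so the node names and helper functions are pure invention:

```cpp
#include <cstddef>

// Sketch of what a generated header could contain (node names are made up).
enum node_id {
    node_controller,
    node_radio,
    node_baseband,
    node_count  // not a real node; marks the end of the valid range
};

// With enums the library can finally validate what it is handed.
inline bool is_valid_node(int id) {
    return id >= 0 && id < node_count;
}

// Returns the string the lower layers use, or NULL for an invalid id.
inline const char* node_name(int id) {
    static const char* const names[node_count] = {
        "controller", "radio", "baseband"
    };
    return is_valid_node(id) ? names[id] : NULL;
}
```

A typo in an enum name is a compile error, and anything outside the range can be rejected at the API boundary, neither of which is possible with arbitrary strings.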

Who are you writing your error codes and messages for?
This is tricky, and we didn’t really figure out a good strategy other than “hmm, we need to think more about this”.

Is an error specifying that a pointer to some internal structure is NULL really useful for the person using your code? Probably not, as he’d rather have something more descriptive that tells him why he’s getting it, and how to avoid it. Perhaps an explanatory sentence first, and then the hard core debug info for you to look at?

All along the project we had a wiki, that started off very empty, but grew over time, as we realized that this was a great place to write documentation and guides for our system. To help with the “cryptic error messages” problem, we had a troubleshooting section that listed the actual error message that you got (i.e. “bla bla pointer is NULL”, which is what the test framework would say if an assert triggered), and then a short paragraph saying what this meant, and the common causes.
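The “explanatory sentence first, debug info last” idea could be as simple as the sketch below. make_error and the message wording are hypothetical; we never settled on an actual format:

```cpp
#include <sstream>
#include <string>

// Build an error message with the user-facing explanation first and the
// internal debug details last. All names and wording are illustrative.
std::string make_error(const std::string& what_happened,
                       const std::string& how_to_fix,
                       const char* file, int line) {
    std::ostringstream msg;
    msg << what_happened << " " << how_to_fix
        << " [debug: " << file << ":" << line << "]";
    return msg.str();
}
```

The person using the library reads the first two sentences, and the library writer still gets the file and line to dig into.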

A living documentation
A wiki based user guide turned out to be a very good idea, but it needs to reach some sort of critical mass so that people start going to it for help, and they need to be encouraged to poke you when they can’t find what they’re looking for, so that you in turn update it, and people’s confidence in the documentation can actually start to build.

At one point it almost seemed like spamming, with constant replies of “have you read the wiki?” when people came with questions that were answered there, but over time people’s faith in the documentation grew, and I couldn’t help but feel a little proud as I walked by developers sitting debugging using my documentation as a sort of reference.

After a while people are going to read your documentation, and it’s by having up-to-date documentation that people can trust your code. You are going to be in the situation when you’re debugging something strange, and the person having the error says “but the documentation says XXX”. At this point, you can’t just say “yeah, but I wrote it, so I know it’s really YYY”, so it’s important that once people start reading the documentation that you also keep it up to date.

And finally, something not really related to API design, but a fact that I learned the hard way.

Tools take time, and code generation is hard
I started out thinking that I could just whip up some advanced code generation scripts and what not in an afternoon (“but, but, Python’s template functions are so powerful!”), but a couple of failures made me realize that making things production quality does take time, so you’re best off telling your boss what you’re about to attempt, and what the benefits will be. Both so he knows that you’re not goofing off, and so that you can take some time to debug your stuff when it malfunctions later on.

Sometimes just talking to someone else will make you notice that you actually just want to try something because it sounds like fun, and that you’re just solving a relatively simple problem, that has already been solved and proven to work several times over!

There’s a related blog post called The mythical man weekend that’s a good read on how easy it is to underestimate the amount of work needed to get something ready for mass consumption.

Phew, I guess that’s about it for this time. Working on an API that was used by lots of developers (who are also your co-workers, and aren’t afraid to give direct feedback!) has been great fun, and given me a lot to think about, both from a practical coding point of view, and also from a theoretical and design point of view, and is something that I wouldn’t mind doing again, now that I feel I have a greater understanding of how to do it. At least I think I do :)


From rags to riches

You know, over the past 10 years, I’ve coded a shitload of half-assed reflection systems in C++. Despite never really writing something that I’ve actually been pleased with, it’s become one of those things that I go back to from time to time, when I think I’ve learned something new that will finally solve the task.

And, of course, it doesn’t. If it actually was solvable in a clean way, without a Herculean effort, it would be done once and for all, and all I’d have to type would be “#include <boost/funky_reflections.hpp>”, and everything would just be fine and dandy.

Now, a couple of weeks ago, I started writing parts of my demo code in Python, and I came to the point where I start bitching about how hard introspection is, and start writing up some weird preprocessor and template hackery, as is customary.

Only this time, I was working in a language with awesome support for reflection, where types are actually tangible, and where you can do funky stuff like creating classes by getting a pointer to a constructor via a string, and just invoking it.

After trying to grok what this really meant, I slowly began to realize that I could write in 10 lines of code, what I hadn’t been able to write in 10 years. And it kinda freaked me out :)

Footnote:

This is one of those weird moments when I realize that if I’d never been a C++ programmer for a gazillion years, then I wouldn’t actually think this was a problem at all, I’d just write the code in an hour, and, well, I dunno, code something else for the rest of the day.


Ooh Ooh Peeing in another pond

You’ve probably read a lot of people saying that you should learn a new language every X years, because it will give you new insights into stuff yada yada yada. Mmm, I’ve read this too, and haven’t really given it that much credit, until I started trying to write some stuff in Python and C# that wasn’t totally trivial.

That’s when I noticed how only using a single language for a long time teaches you these dirty tricks that you can use to hide the fact that perhaps you’re not really designing stuff as well as you should.

I can’t really put my finger on it, but I’ve noticed that when I do Python-stuff, I haven’t really learned the tricks yet, so I have to think a little harder about my structure, classes and what not, if I want my code to not turn into a mess. When I write stuff in C++, on the other hand, I’ve been doing it for so long, that I can sometimes hide design issues behind some template hackery or other contraption, and to the casual observer (and myself, a couple of hours later!), everything looks fine.

I’m not really sure where I’m going with this, but I think my point is something along the lines of “using a couple of different languages will mean that you write code in a language agnostic fashion”, and that this is most probably a good thing.

Hihi, a comment that just made me laugh a bit: “Refactoring without coverage is just changing shit” :)


Moar codez!

Phew, so I finally got around to uploading some more code to my BitBucket repository instead of having it lying around all over the place. Two older projects (my property manager, and my lua exporter), but also the first version of a fun little project I’ve been tinkering with on and off for the past weeks.

It all started a couple of weeks ago, when I started thinking about how to code an interactive, non-intrusive profiler, and the voice (yes, only one!) inside my head told me “that shouldn’t be so hard”.

After some googling (gah, despite the fact that I’m sitting a mere 1 meter away from my large Bill G photo, I just don’t see myself binging), I found the Very Sleepy profiler (based on the Sleepy profiler!), which almost did what I wanted, except that it wasn’t “real time” in the way I wanted.

Good enough, I thought, and also saw this as an excellent opportunity to finally code something using Qt. Using Qt actually turned out to be a treat; the code feels solid, documentation and examples are good, and I think I can safely say that it’s now gonna be my GUI framework of choice the next time I need something like this in C++. Part of me tells me that I’m being silly, and that I should code this in C# instead, but for the time being I can keep that part under control :)

Anyway, the profiler in itself is very simple, and pretty much just a loop suspending the profiled thread, getting its callstack, and then resuming the thread. All this stuff is done by just a handful of win32 calls, and the rest of the application is basically bookkeeping and GUI.


Gaah, I can’t wait!

Ever since reading about the C++0x features that are coming to VS2010, I’ve had a hard time not giggling like a little girl when I think about type inference and lambda notation. I really think that these features are going to change the way I write my code.

Having type inference means that I can actually start using stuff like boost::tuple, and pretty basic stuff like iterators, because I’ll no longer be faced with the problem that I have to type so god damned many characters to say what the compiler already knows.

Lambdas solve the same problem; I no longer have to make a shitload of stupid functors that basically just have a single statement that’s actually interesting.
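To show what I mean, here is a small before and after. The functions and the LongerThan functor are just examples I made up, written against the syntax as it ended up in the C++11 standard:

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// The old way: a whole functor for one interesting comparison.
struct LongerThan {
    explicit LongerThan(std::size_t n) : n(n) {}
    bool operator()(const std::string& s) const { return s.size() > n; }
    std::size_t n;
};

std::size_t count_long_functor(const std::vector<std::string>& v, std::size_t n) {
    return static_cast<std::size_t>(std::count_if(v.begin(), v.end(), LongerThan(n)));
}

// With a lambda, the one interesting statement lives at the call site.
std::size_t count_long_lambda(const std::vector<std::string>& v, std::size_t n) {
    return static_cast<std::size_t>(std::count_if(
        v.begin(), v.end(),
        [n](const std::string& s) { return s.size() > n; }));
}

// With auto there is no need to spell out
// std::map<std::string, int>::const_iterator.
int sum_values(const std::map<std::string, int>& m) {
    int total = 0;
    for (auto it = m.begin(); it != m.end(); ++it) {
        total += it->second;
    }
    return total;
}
```

Both count functions do exactly the same thing; the lambda version just doesn’t need eight lines of ceremony somewhere else in the file.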

Ah, man, to quote Dollhouse, “it’s no oak, but it’s definitely wood” just from thinking about it.

This morning I downloaded and installed the VS 2010 Beta 1, and everything looked very awesome indeed, until I realized that Visual Assist wasn’t updated yet. I had a quick soul search, and came to the conclusion that I actually couldn’t use a new Visual Studio without my Visual Assist. Damn joo, Whole Tomato! I browsed their forums quickly, and read a post that they were working on a compatible version, and it should be ready soonish. Oooh, here’s hoping that the release coincides with my vacation! It wouldn’t get any more awesome than that :)
