Target Agnostic ETL

Brainstorm — Dillon @ 10:42 pm

This is a question I posted to Stack Overflow. It’s something I’ve been wondering about for a while. The polling nature of ETL has always bugged me, but now the parsing part of it seems annoying too.

ETL is pretty commonplace. Data is out there somewhere, so you go get it. Once you have it, it’s probably in a weird format, so you transform it into something and then load it somewhere. The only problem I see with this method is that you have to write the transform rules. Of course, I can’t think of anything better. I suppose you could load whatever you get into a blob (SQL) or into an object/document (NoSQL), but then I think you’re just delaying the parsing. Eventually you’ll have to parse it into something structured (assuming you want to). So is there anything better? Does it have a name? Does this problem have a name?

Example

Ok, let me give you an example. I’ve got a printer, an ATM and a voicemail system. They’re all network enabled or I can give you connectivity. How would you collect the state from all these devices? For example, the printer dumps a text file when you type status over port 9000:

> status
===============
has_paper:true
jobs:0
ink:low

The ATM has a CLI after you connect on port whatever and you can type individual commands to get different values:

maint-mode> GET BILLS_1
[$1 bills]: 7
maint-mode> GET BILLS_5
[$5 bills]: 2
etc ...

The voicemail system requires certain key sequences to get any kind of information over a network port:

telnet> 7,9*
0 new messages
telnet> 7,0*
2 total messages
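
To make the pain concrete, here is a minimal Python sketch of the transform step for the three transcripts above. It does not remove the per-device rules; it only isolates them behind one `collect` call. The function names, the `PARSERS` table, and the output keys (`bills_1`, `new_messages`, and so on) are all made up for illustration, and the sketch assumes the raw text has already been read off the socket.

```python
import re

def parse_printer(text):
    """Parse 'key:value' lines; skip the prompt echo and the ==== rule."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(">") or set(line) == {"="}:
            continue
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state

def parse_atm(text):
    """Parse '[$1 bills]: 7' response lines into {'bills_1': 7}."""
    state = {}
    for denom, count in re.findall(r"\[\$(\d+) bills\]:\s*(\d+)", text):
        state["bills_" + denom] = int(count)
    return state

def parse_voicemail(text):
    """Parse '0 new messages' style lines into {'new_messages': 0}."""
    state = {}
    for count, kind in re.findall(r"(\d+) (new|total) messages", text):
        state[kind + "_messages"] = int(count)
    return state

# Hypothetical registry: one hand-written rule set per device type.
PARSERS = {
    "printer": parse_printer,
    "atm": parse_atm,
    "voicemail": parse_voicemail,
}

def collect(device_type, raw_text):
    """Turn one device's raw transcript into a plain dict of state."""
    return PARSERS[device_type](raw_text)
```

Every new device still costs one more entry in `PARSERS`, which is exactly the part the question is asking about.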


Server side websockets

Brainstorm — Dillon @ 10:58 pm

For some reason I started researching server-side websockets with a Python module for Apache. I thought, “oh, this would be cool for creating a reverse proxy or a socket to a port that’s not accessible through a firewall”. You could just have an Apache module create a path back to some service.

Not knowing a lot about websockets, I kept reading and eventually realized that this problem would be best solved in an app server. The websocket module I found was written in Python, but if I used something like Passenger to reverse-proxy back to a Rails app server, I could use whatever language I wanted. Besides, the web server shouldn’t be running code.

Anyway, fizzled idea.

Stream of thoughthose: Game Store

Brainstorm — Dillon @ 10:24 pm


I created a rails app that acts as a harmless baseline app that I can fork and play around with. Railscasts has an example app that I based my theme on. I call this example app “Game Store”. It’s really simple. It’s a video and board game store that has users and a cart. There’s nothing beyond that. No admin interface, no checkout and nothing else fancy.

Again, another stream of raw consciousness while stuck in traffic, then transcribed from the iPhone.

Ok, things to do for the game store app. Right now I have everything checked into git, and I’m pretty sure I branched it for rails3. I think the master branch is current. Login and the cart are working. If I squashed any bugs and I’m in the wrong branch, that would be bad, so I need to check on that. Maybe merge them back into the master branch. So there’s some CM things I need to do: baseline it, and that should basically be the baselined app. This should be all I need to work with. I created this app so I’d have a test app with basic functionality that wasn’t trying to do anything crazy. It’s just a basic shopping cart app that I can futz with.

So there’s two big things I want to do with this app. One, get LDAP authentication working with Authlogic (which could also spill into doing some kind of special authorization thing or gem). Right now it’s just using database authentication. Get a fancier version of authentication working, goal #1, which could be its own branch.

Two, port the whole thing to rails3 and/or ruby 1.9, whatever. Rails3 would be the first thing. So those are two big things, which screw with the baseline app (which is working right now). I would want to branch those two efforts, screw with them, get them working and then merge them back in. So that way I learn more about git, rails3 and the authlogic stuff. But there’s also kind of a “meta-goal”, how am I going to know if these things are working while I’m developing it? So the third goal (which is sort of a meta-thing), I should have tests. I know how the game store works right now, I know what defines success. Basically, a user logs in, using a seeded password for a test user that is part of the fixtures. There’s also fixtures for products of the game store (like Scrabble, Pacman, whatever). So I know that when I add a certain quantity of products to my cart given a set of test products from the fixtures, I will know what the total should be. The cart view will show this total too so I can test the web UI the same way.

So basically the high level test is adding a bunch of items, looking at the total and seeing if the total adds up to the expected number. So I can write a cucumber test that hopefully tests the functionality of the cart model in the game store. Of course, before you can add anything to your cart, you have to log in. So I’ll have another test that basically says “log in as this user” and if you log in successfully, you’re going to see a message, like “user logged in successfully” or a cart link at the top. So there could be like 2 or 3 really high level tests that test a lot of functionality and then maybe I could write smaller unit tests that test my helper methods or certain methods on the carts.

So to do this I need to figure out my testing stack. The stack needs to be able to do unit testing (meaning ruby classes) and then something that can do web UI type testing (filling in a login form etc). Maybe this could be webrat with cucumber. But I’m not sure about the integration between all the testing frameworks. Like I don’t know if cucumber uses webrat or if cucumber can use rspec.

So on my week off, before I start on the #1 and #2 goals (which are about changing functionality and code), I want to figure out how to do TDD. Meaning, I’ve got my app working and I should have written the tests before I got my app working. But now I’m going to write a test suite that can verify that my app is working. Then, once that’s completed, I can branch, hack and break stuff all I want. My tests will help the actual development part of goals #1 and #2 where I’m changing a bunch of different aspects of my app.
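
The shape of that high-level cart test can be sketched without committing to a testing stack yet. The snippet below is plain Python, not Cucumber or RSpec, and only stands in for the idea: known fixtures in, known total out. The `Cart` class and the prices are invented for illustration; only the product names come from the fixtures mentioned above.

```python
from decimal import Decimal

# Hypothetical fixture data: the product names are from the post,
# the prices are invented for this example.
FIXTURE_PRICES = {
    "Scrabble": Decimal("19.99"),
    "Pacman": Decimal("9.50"),
}

class Cart:
    """Stand-in for the cart model: tracks quantities and sums a total."""

    def __init__(self, prices):
        self.prices = prices
        self.items = {}  # product name -> quantity

    def add(self, name, quantity=1):
        if name not in self.prices:
            raise KeyError(name)
        self.items[name] = self.items.get(name, 0) + quantity

    def total(self):
        return sum(
            (self.prices[name] * qty for name, qty in self.items.items()),
            Decimal("0"),
        )

def test_cart_total():
    # Known fixtures plus known quantities give a known total.
    cart = Cart(FIXTURE_PRICES)
    cart.add("Scrabble", 2)
    cart.add("Pacman", 1)
    assert cart.total() == Decimal("49.48")
```

Using `Decimal` rather than floats keeps the expected total exact, which matters when the same number also has to match what the cart view prints.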

So I’ll probably need to create a TDD test app to screw around with the testing stacks. Like follow their example or get someone else’s example. Just to make sure that I have a working test stack. Cucumber’s syntax is a bit weird to me right now. So there’s some learning there. But it’s crucial for my two big changes. The rails3 migration is going to screw everything up. So my tests are really needed. Even if this is a contrived project. So at the end of it I’ll know more about TDD, maybe testing in general, rails3, authlogic plugs and also CM and git branching and merging. So that’s a lot of work but I hope I tackle a lot of it on my week off.

Stream of thoughthose: ElevatorSim

Brainstorm — Dillon @ 9:30 pm

Another herecation. I have a lot I want to get done. On the way back from a training class, I got stuck in traffic and decided to brainstorm and record my stream of consciousness with the iPhone’s voice recorder. I was hoping to find a speech-to-text package for cheap (sub $50?) but it looks like Scribe is the only one and it’s $150. I really don’t need it beyond the 12 minutes of “drink from the thought hose” recordings, so I can’t buy it right now. Maybe if I keep talking to myself, I’ll buy it. It’s exactly what I need in this situation actually, I just don’t know how often this situation will occur.

Hey, I don’t want to be the guy with the personal recorder with the crap ideas: “Note to self. Chocolate saucepan. Makes a great gift for chocoholics. No mess to clean up. Research chocolate that can withstand 400º.” Whatever.

So there’s two projects rattling around. One, not really important right now, an elevator simulator (which has been blogged about previously). Really stale project, don’t know why I started thinking about it, I realized I got stuck on it and wanted to jot down a TODO in case I had some time. So here’s what I shouted to myself. It’s really raw. Just posting it here for myself rather than putting it in Evernote. If Scribe + iPhone works out, I’ll stop posting them publicly. This is just a follow-up to the now very old ElevatorSim post on here.

What’s left to do is to load people onto the elevator car. This could be done by creating four slots, which would be dots on the floor of the elevator car representing the offsets where those people would stand. These could be defined as a series of constants in the car, as offsets from the center of the car, which would be reusable for each of the cars. So I need to add a series of x and y, maybe z, values representing all the spots on the floor where the people stand. Each of the slots should also be a pointer that can hold a person. So I also need a loadCar method so that when a person gets on the elevator, they go into the next free slot. And also maybe a convenience method to say whether the car is full or not, or maybe the loadCar method returns false if it’s full (IDK). Uh, that’s the easy part.
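
A minimal sketch of the slot idea, written in Python for brevity rather than in the simulator’s own language. The offset values, the class names, and the choice to do both things (an `is_full` helper and a `load_car` that returns `False` when full) are assumptions for illustration.

```python
class Person:
    def __init__(self, name):
        self.name = name
        self.x = 0.0
        self.y = 0.0

class ElevatorCar:
    # Hypothetical (x, y) offsets from the car's center, shared by every car.
    SLOT_OFFSETS = ((-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5))

    def __init__(self, x=0.0, y=0.0):
        self.x = x
        self.y = y
        # One entry per slot; None means the slot is free.
        self.slots = [None] * len(self.SLOT_OFFSETS)

    def is_full(self):
        return all(slot is not None for slot in self.slots)

    def load_car(self, person):
        """Put the person in the next free slot; return False if the car is full."""
        for i, occupant in enumerate(self.slots):
            if occupant is None:
                self.slots[i] = person
                dx, dy = self.SLOT_OFFSETS[i]
                # Snap the person to the slot's position inside the car.
                person.x, person.y = self.x + dx, self.y + dy
                return True
        return False
```

Returning a boolean from `load_car` lets the caller leave the person waiting on the floor without a separate fullness check.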

When the car is loaded, the draw list should still iterate through the people in the car. So when the building manager or the drawing manager draws, it can draw the car but it also draws all the people in the car, because the people haven’t fallen off the drawable list. If this is too messy or confusing, the elevator car could draw the people in the car when it draws itself. But I’m pretty sure that the project, as it’s set up right now, draws the people independently. So the drawing shouldn’t change. But what will happen is that when the elevator moves, that move method will need to update that same list of four slots (the pointers to the people riding in the car). So when the elevator car’s tween method fires and updates the X, it also needs to iterate through the list, updating all the people in the car. Basically, the people will tween with the car.
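
One way to get the riders to tween with the car is to apply the car’s per-step delta to everyone in the slots. This is a Python sketch under assumptions: the names `Rider`, `Car`, and `tween_to` are invented, and the real tween callback may only touch one axis.

```python
class Rider:
    def __init__(self, x, y):
        self.x, self.y = x, y

class Car:
    def __init__(self, x, y):
        self.x, self.y = x, y
        # Four slots; None means nobody is standing there.
        self.slots = [None, None, None, None]

    def tween_to(self, new_x, new_y):
        """Move the car one step and carry every rider by the same delta."""
        dx, dy = new_x - self.x, new_y - self.y
        self.x, self.y = new_x, new_y
        for rider in self.slots:
            if rider is not None:
                rider.x += dx
                rider.y += dy
```

Because the riders keep their own coordinates, whatever draws the people independently does not have to change.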

When the car arrives, there’s a callback or a sysout that prints “we’ve arrived!”. The people will then need to be unloaded onto the bind point on the floor. So basically all the people’s positions will need to be updated to the bind point. At that point, the people need to decide which exit to head to, which should just be a random value to make the simulation more interesting. So basically, there needs to be an unloadPeople() method that could be put either in the car or in the building. The building could have an unload method: the car would call it with a person object, and the building’s unload method would then take over setting the bind point and randomizing the exit that the person is going to go to.
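
The second option (the car hands each person to the building) might look roughly like this in Python. The names `Building.unload`, `unload_people`, and `target_exit`, plus the injectable random generator, are assumptions for the sketch and not the project’s actual API.

```python
import random

class Person:
    def __init__(self):
        self.x = 0.0
        self.y = 0.0
        self.target_exit = None

class Building:
    def __init__(self, bind_point, exits, rng=None):
        self.bind_point = bind_point   # (x, y) where riders step off the car
        self.exits = list(exits)       # hypothetical exit identifiers
        self.rng = rng or random.Random()

    def unload(self, person):
        """Place the person on the bind point and pick a random exit for them."""
        person.x, person.y = self.bind_point
        person.target_exit = self.rng.choice(self.exits)

def unload_people(slots, building):
    """Empty every occupied slot, handing each rider to the building."""
    unloaded = []
    for i, rider in enumerate(slots):
        if rider is not None:
            building.unload(rider)
            unloaded.append(rider)
            slots[i] = None
    return unloaded
```

Passing in a seeded `random.Random` makes a run repeatable, which helps when a loading bug has to be reproduced.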

And then, once the person decides which exit they are going to go to on the ground floor, when they reach that exit, there will be like another “we’ve arrived” callback hopefully and that should destroy the person off the user population list (if there is one right now — I don’t think there is).

Ok. So that’s basically the user movement stuff of the building that’s left to do. The other big problem with the project is watching the simulation when debugging. Because the rotation of the camera is all screwed up right now. So what I need to do is to implement WASD style flythrough. So I can move around, fly through with the camera/keyboard/mouse WASD movement. And then I can actually peer inside the car to watch all the bind points because while I’m developing the loading and unloading of people, it’s going to be really hard to debug with sysouts or whatever. I’m going to wanna draw the bind points in the car, have like filled in circles when they’re populated and empty circles when they’re not populated. Just so I can get all the logic coded up, etc. I’m going to wanna fly through, watch the people go down with the car, make sure there’s no clipping, any bugs or whatever. Watch the people load, go to the bind point, decide which exit they’re going to go to, exit the building and get destroyed. All that stuff is going to be hard to watch unless I freakin fix all the camera movement (which is all jacked up right now).
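
For the WASD part, the core of a flythrough is turning the pressed keys into a movement vector relative to where the camera is looking. A rough Python sketch follows. It assumes a yaw-only camera on the horizontal plane, with yaw 0 looking down the negative z axis; that convention, the function name `fly_step`, and its parameters are all choices made for this example.

```python
import math

def fly_step(position, yaw, keys, speed, dt):
    """Return the new (x, y, z) camera position after one frame of WASD input.

    yaw is in radians; yaw 0 is assumed to look down the -z axis.
    keys is the set of currently pressed keys, e.g. {"w", "d"}.
    """
    forward = (math.sin(yaw), 0.0, -math.cos(yaw))
    right = (math.cos(yaw), 0.0, math.sin(yaw))
    bindings = (("w", forward, 1), ("s", forward, -1),
                ("d", right, 1), ("a", right, -1))

    move = [0.0, 0.0, 0.0]
    for key, axis, sign in bindings:
        if key in keys:
            for i in range(3):
                move[i] += sign * axis[i]

    length = math.sqrt(sum(c * c for c in move))
    if length == 0:
        return position  # no keys pressed, or opposing keys cancel out

    # Normalize so diagonal movement is not faster than straight movement.
    return tuple(p + c / length * speed * dt for p, c in zip(position, move))
```

Mouse look would update `yaw` (and a pitch, if vertical flying is wanted) before this step runs each frame.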

So those are the two big high level features that need to be done. There’s a ton of little tiny features that need to go in to make the elevator car load people getting on the elevator and coming off the elevator. But basically, this is where I got stuck last time. So, I need to write this into some pseudocode, write a little TODO list, put it in a pomodoro list and start working on it on my week off, if I have time. That’s it.

So basically, I’m saying the people moving on the floors aren’t getting on the elevator car. I got stuck on this, not knowing which design was best or really how to think about it. I didn’t really start the design either. It just kinda died at that point. Then I’m saying I need to fix the camera controls. Not only for the user experience but also for my own debugging. It’s hard to use System.out.println() calls all over the place (sysouts) to try to debug a game or simulator. It’s a state engine, so you get lots of text that you have to sort through later and think about how the state changed. Instead, it’s better to build the tools inside the simulation/game itself. Like the Quake console or whatever.

Anyway, this brainstorm wasn’t really important. And it took a long time to transcribe by hand. Meh. Onto the next little recording…

When is information correct?

Brainstorm — Dillon @ 7:26 pm

A wiki is constantly edited. Unlike a document which can have revisions and snapshots in time, a wiki may or may not ever be correct. But hold on. What is so magical about a revision stamped out? A book isn’t naturally correct. Information has nothing to do with correctness.

Think about a filesystem check (like Windows 7 CHKDSK, ha). While it is fixing and committing filesystem fixes (with or without transactions, I don’t know), the state of the data is in flux. When the program finishes, it declares the information to be correct. Is this declaration any different from someone looking at 1+1=2 and saying “yep.”?

Information is correct when an observer, correctly or incorrectly, says it’s correct. Any deeper discussion of what is correct or real is getting into Descartes territory and I’m not talking about that. Print off a Wikipedia page (version), you (observer) review it (validation) and say it’s correct. The data hasn’t changed. Rather, it’s a data decoration, metadata about the print-out.
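
The “data decoration” idea can be put in code terms: the snapshot never changes, and a declaration of correctness is a separate record attached to it. This is only an illustrative Python sketch; the class and field names are invented.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class Snapshot:
    """The print-out: immutable once taken."""
    content: str

@dataclass
class Review:
    """One observer's declaration about a snapshot."""
    observer: str
    verdict: bool
    reviewed_at: datetime

@dataclass
class ReviewedSnapshot:
    """A snapshot plus the metadata observers have attached to it."""
    snapshot: Snapshot
    reviews: list = field(default_factory=list)

    def declare(self, observer, verdict):
        self.reviews.append(
            Review(observer, verdict, datetime.now(timezone.utc))
        )

    def is_declared_correct(self):
        # "Correct" here means only that the latest observer said so.
        return bool(self.reviews) and self.reviews[-1].verdict
```

Nothing in `Snapshot` is touched by `declare`; whether the observer was right is outside what the structure can know.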

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.
(c) 2012 SQUARISM | powered by WordPress with Barecity