HBase Shell Color

Ruby,Systems — squarism @ 7:59 pm


Since the hbase shell is irb, I wanted to get color output because that’s what I’m used to. Although the appropriate place to put this is in an .irbrc file, that would conflict with any ruby development environment already on the system and luckily jruby and hbase don’t seem to invoke it anyway.

First find a copy of wirble. If you don’t have it anywhere, download it from github:


cd ${hbase_home}/lib/ruby
wget https://raw.github.com/blackwinter/wirble/master/lib/wirble.rb

Now edit ${hbase_home}/bin/hirb.rb. Add to the end but above IRB.start
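
A minimal sketch of what those lines could look like, assuming the stock wirble API (Wirble.init and Wirble.colorize) and that wirble.rb landed in a directory on the load path as above. The helper name enable_shell_color is made up for illustration, and the rescue is only there so the shell still starts if the file is missing.

```ruby
# Hypothetical helper: turn on wirble color, but never stop the shell from starting.
def enable_shell_color
  require 'wirble'
  Wirble.init      # wirble's standard setup call
  Wirble.colorize  # turn on colored output
  true
rescue LoadError
  warn 'wirble.rb not found on the load path; continuing without color'
  false
end

enable_shell_color
```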

Now when you start hbase shell, you’ll have lovely color output. Why would you want this? I don’t know. You probably don’t want it. But I was happy to understand how the hbase shell works. It’s just jruby irb that loads hirb automatically.

Rails behind Enterprise SSO

Rails,Systems — squarism @ 8:12 pm

This is a quick write-up without a lot of detail. We hacked together a quick rails app to do provisioning in the style of OIM behind an OAM SSO webgate. The complete guide and detail would be tens of pages so I’ll just give a quick overview of the strategy.

The Goal

  • Ldap authentication
  • No db
  • Sso protected
  • Weblogic deployment

Develop app steps and failures

We used activeldap for the LDAP pieces and defined our user model to narrowly search for a particular objectclass and attributes. Tried to use Authlogic. Fail. Acts as authenticated fail. Devise fail. Ended up using filters and activeldap. Integrating the gems and activeldap was actually kind of hard. A lot of the security gems assume you’ve got activerecord users and depend a lot on the validation helpers etc. So some of the authN gems didn’t work for us. We also had to hack a bit on the activeldap validations. Password policy was non-trivial. I just rolled my own like so:


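def validate_password(password)
  # initialize a password score
  score = 0
  # call password rule methods; each returns 0 on pass and 1 on fail
  score += check_special_characters(password)
  score += check_length(password)
  score += check_uppercase(password)
  # a score greater than zero means the password failed
  score.zero?
end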

The score from each check method is added to a total score. If the score is greater than zero then your password fails the checks. Each check that fails adds an entry to a hash for flash[:error] so that a precise error message is possible. It works ok except the flash errors are for some reason not displayed in order (possibly because Ruby 1.8 hashes don’t preserve insertion order).
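
Here is a slightly fuller sketch of that scoring idea, not the code we shipped. The rule thresholds, rule names and messages are all made up, and errors are collected as an array of pairs rather than a hash so the display order stays stable.

```ruby
# Hypothetical threshold; in the real app this would come from configuration.
MIN_LENGTH = 8

# Each check returns 0 on pass, 1 on fail, and records a message on failure.
def check_length(password, errors)
  return 0 if password.length >= MIN_LENGTH
  errors << [:length, "must be at least #{MIN_LENGTH} characters"]
  1
end

def check_uppercase(password, errors)
  return 0 if password =~ /[A-Z]/
  errors << [:uppercase, 'must contain an uppercase letter']
  1
end

def check_special_characters(password, errors)
  return 0 if password =~ /[^A-Za-z0-9]/
  errors << [:special, 'must contain a special character']
  1
end

# Returns [valid, errors]; a total score above zero means the password failed.
def validate_password(password)
  errors = []
  score = 0
  score += check_length(password, errors)
  score += check_uppercase(password, errors)
  score += check_special_characters(password, errors)
  [score.zero?, errors]
end

valid, errors = validate_password('password')
puts valid
errors.each { |rule, message| puts "#{rule}: #{message}" }
```

In a controller, the errors array could be joined into flash[:error] in whatever order the checks ran.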

All configuration constants are stored as YAML in an app_config.yml file: the LDAP server, port, password policy rules and so on.
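
To give a feel for it, here is a sketch of loading such a file with Ruby’s standard yaml library. Every key and value below is invented for the example, and a heredoc stands in for the real file on disk.

```ruby
require 'yaml'

# Hypothetical contents of app_config.yml; none of these keys are from the real app.
APP_CONFIG_YAML = <<~YAML
  ldap:
    host: ldap.example.com
    port: 389
  password_policy:
    min_length: 8
    require_uppercase: true
YAML

# In the app this would read the file from disk instead of a heredoc.
APP_CONFIG = YAML.safe_load(APP_CONFIG_YAML)

puts APP_CONFIG['ldap']['host']
puts APP_CONFIG['password_policy']['min_length']
```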

For the SSO config, we just read the header as HTTP_REMOTE_USER even though OAM is creating REMOTE_USER. Quick and easy. You have to prepend “HTTP_” to the name. It’s a naming convention thing that you can’t do anything about. If your OAM header variable is UNICORNS, then you have to use HTTP_UNICORNS.
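
The naming rule is simple enough to sketch. This is only an illustration of the convention (uppercase the header name, turn dashes into underscores, prefix HTTP_); the function name and the fake environment hash are made up.

```ruby
# Map an HTTP header name to the key it shows up under in the request environment.
def env_key_for(header_name)
  'HTTP_' + header_name.upcase.tr('-', '_')
end

# A fake request environment as it might look behind the webgate (value is made up).
env = { 'HTTP_REMOTE_USER' => 'jdoe' }

puts env_key_for('REMOTE_USER')
puts env[env_key_for('REMOTE_USER')]
```

In a Rails controller the same lookup would go through the request environment instead of a local hash.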

We used formtastic for the forms. This was a bit problematic with it trying to detect the activeldap model instead of activerecord.

Testing while developing

Ok so how do you test integration? Are you going to SSO enable your dev laptop? That’s way too hard. You can hardcode the credentials for a while but then eventually you’re going to want to test. I got around this by using a firefox plugin called modify headers. It’s pretty straightforward except for the small detail that you have to keep it open while hitting pages. I thought it would run in the background but it doesn’t. Just keep the modify headers firefox plugin open and it’ll let you inject the auth header. Don’t worry, this isn’t a security hole. OAM in production won’t let you do this. It’s just used for development.

Warble

  • Install warbler with gem install warbler.
  • Generate the default config with warble config.
  • Install jruby-openssl because activeldap requires it.
  • Edit config/warble.rb to include jruby-openssl. Note that you don’t have to have jruby installed or anything.

The rest of the steps are not rails related. Deploy the war to Weblogic as usual. Set up a Proxy webgate back to Weblogic for /app (you can’t protect Weblogic directly with OAM). Protect /app with an OAM policy. If your firefox header test worked then when you turn that off and hit it behind OAM it will work the same. I was able to identify and trust the REMOTE_USER header coming in.

Bam, you’ve got a rails app working in a big scary enterprise SSO environment. The best part about all of this was how fast it went. Compared to JSP/Java EE dev, it was a breeze. The only big multi-day hangups we had were with activeldap. Many gems and auth models really expect you to have your user in the DB. Unfortunately, putting users in the DB creates a silo. Fine for small shops, not so good if you’re using Active Directory, OID, OpenLDAP or Fedora DS (389) for a centralized login.

Technology Knowledge Debt, Part Two

Systems — squarism @ 10:42 pm

Knowledge debt is when you don’t spend the time to catch up. This is Part Two, discussing common knowledge gaps in technology projects regardless of company, industry, project or people. Part One is here and talked about Troubleshooting, VNC, X11, Version Control, Sudo and Cron as common debted areas. The original post that posed the questions is here, where I wondered why the same patterns happen from project to project, company to company and person to person.

Myth of supportability

In most teams, one person is responsible for the work that they do. If you’re lucky, you’ll have two people responsible for some of the same things so that if one person goes on vacation, the work can continue. In most projects I’ve been on, this is not the case. One person is responsible for the beginning, middle and end. That means figuring out the question, figuring out the answer and then writing the answer down in such a way that anyone (with some level of experience hopefully) can do the work in the future. That’s the idea anyway.

However, most projects actually don’t work out like that. In the movie Iron Man, Tony Stark invents a mini arc reactor which powers his suit. The antagonist asks an engineer to build what Tony Stark made. The engineer says he can’t do it: “I’m not Tony Stark.” Of course it’s just a movie but I feel it rings true. Maybe it’s going on where you work. Here are some symptoms:

  • You can’t go on vacation.
  • You’d probably get a raise or a bonus if you threatened to quit.
  • No one knows what you’re actually working on.
  • To cross-train someone means starting from the beginning (i.e. it takes longer than a two-week notice).
  • You fix your own mistakes even when they are discovered years later. This doesn’t have to be negative. You could enhance your own project years later or answer questions about projects long since turned over to other people or teams.

It’s called being “one deep” and it’s not where you want to be. However, sometimes it’s impossible to find qualified people to support an effort. Maybe it’s money, maybe it’s recruiting. I don’t know. It’s certainly not a problem limited to a single company. I suspect it might have to do with the specialized nature of Enterprise Software. For example, a generic problem is only bound by budget. If you have an array sorting algorithm to write, a database schema to design and a cluster of Apache servers to set up, it’s going to be really easy to find these people. But if you have a Documentum connection pool performance problem when using AIX 5.1 for SCO mainframes but only when using the thick Oracle database driver that’s being load balanced by a … blah blah blah, it’s going to be really hard to find someone that’s done whatever craziness you’re doing and then even harder to find someone who immediately knows how to fix your problem.

What I’m talking about is COTS. COTS stands for “I don’t want to take a risk” software. It means you’re going to buy vs build. It’s been a strategy forever and yet we’re all not on the beach with the information problems of the world solved for cheap. Of course, reinventing the wheel is not the greatest approach either. There’s a middle ground, not directly in the middle, that’s probably the best approach.

I was tasked once to get a workflow engine running. A meeting happened and there were suggestions like buy a workflow engine or reuse a web service out there somewhere. The web service of course was BPEL compliant. BPEL is a pain in the ass and I thought it was overkill just to do notifications when something happens. I implemented OSWorkflow as a library and it worked out fine. In the end, the effort was changed completely and my time was filed under Cheese Movement. If I had killed myself getting a BPEL workflow engine working, I would have been a lot more upset when the project was canned. And even worse, if it hadn’t gotten canned and the project went live, they’d have an overcomplex system that’s harder to hand off to cheaper support staff. In the end, a workflow library was easier to integrate than a stand-alone COTS workflow product.

Anecdotal argument aside, something is broken. Is it a complete lack of development skills in systems engineering? Is it a fear of concentrated thinking? Do customers react horribly when you pitch “we’re going to roll our own”? Maybe it’s eating your own dog food: if you used Sybase on one project, why would you store your preferences in XML? Let’s use Sybase again even though it’s impossible to find Sybase people! Yay!

And even if you are allowed to roll your own:

  • If you’re single threaded you only have one guy doing something. If he does it manually, that’s considered ok.
  • If he leaves, the manual step is gone.
  • If he codes it in COBOL (something unfamiliar), that’s bad.
  • If he does code it in COBOL, at least you have that asset if he leaves.
  • If you force him to do it in Java or whatever your shop “knows”, you need to hire a dev to replace him, which you don’t have anyway. If you did, you’d have your dev do it in Java right now.
  • But just because you do it in Java, doesn’t guarantee people are available or can read it. I’ve been thrown some crazy code that was insane despite its language.

I say bring the solution and not the language. Instead of harping on “you shouldn’t have done it that way”, say “I’m glad you got it done, now we own it and you’ve created an asset for us”. Don’t FUD. Just do it.

Copy and paste

First of all, a really whiny semantic point: cut is not copy. Cut removes the text and copies it. Copy leaves the text and copies it. I know cut has one less syllable but when people say “I cut and pasted from that webpage”, I die a little. You can’t cut and paste on a read-only thing, like a webpage.

Anyway, copy and paste changes with the application and OS. Here are some examples:

  • Native windows app: Ctrl+C and then Ctrl+V
  • Gnome terminal on Linux: Ctrl+Shift+C and then Ctrl+Shift+V (other native Gnome apps use plain Ctrl+C and Ctrl+V)
  • Putty copies to the clipboard when you select. You don’t need to Ctrl+C. Ctrl+C means something else when you’re SSH’ing to a Unix box. Just select and Ctrl+V back to notepad or whatever you’re using.
  • Putty pastes with the right mouse button. Don’t Ctrl+V.
  • The command-prompt in Windows pastes with the right mouse button too. There’s also a nice option to do Quick Edit if you go to Properties in the cmd.exe icon while it’s running. Quick Edit will let you select on left click drag and then copy with Enter.
  • X11 copies on select. It pastes with the middle mouse button. If you select anything before you paste, guess what, you just copied that.

I want to re-iterate that last point because I’ve seen this for years. Copy on select is hard to grasp for Windows-y people. I see people copy a bunch of text from Notepad, move their mouse over to the putty window and click on the right-ish portion of the screen. When they click, a large white line appears (they’ve just selected a line of blank space) and then they paste. Nothing but blanks! What the hell happened! Argh! I hate computers!

When you click in putty, click on the left portion of the screen with text. If you click on text, you won’t select the blank space on a line. Even better, just click on the menu bar to switch windows. Or use alt+tab. If you click on empty space, you’re going to overwrite your clipboard with nothings.

Linux virtual consoles sometimes have a mouse pointer (btw I’m talking about the Ctrl+Alt+F1 – F6 consoles if you walk up to the server). On RedHat, it has this by default. It’s a little mouse driver and it’s handy for copy and paste. Select some text and right click. It’ll paste in. But this clipboard isn’t shared from the virtual console to the X11 GUI or to an SSH client.

Hierarchies in Software Architecture

As you go from project to project, some patterns should emerge. You should see the same themes over and over in different software packages, different approaches to a problem and in the problems themselves. I’m not the only one who’s noticing patterns. I’ve met lots of smart people that can relate one thing to something else that they know. It’s a good sign that they are paying attention.

But of all the recurring patterns, hierarchies are the least discussed. I don’t know why I notice hierarchies so much. Maybe because I’ve been working with LDAP a lot lately. Maybe I’m a tree hugger. Bam. Pun’d.

Hierarchies organize. Hierarchies can’t avoid defining relationships. A hierarchy has at least one path to get to a node and to that point, hierarchies have paths.

Here are some hierarchies and typical paths:

  • A filesystem: C:\temp\whatever.txt or /tmp/whatever.txt
  • An LDAP DIT: DN: cn=joe,dc=yahoo,dc=com
  • A workflow definition: Start -> Request raise -> Manager -> Deny -> Create monster.com account for employee -> End
  • An HTML document: html -> body -> p ->Hello World
  • A solid state disk: Controller -> Block -> Page
  • A hard disk: Controller -> Cylinder -> Track -> Sector
  • Ruby class: Class -> Module -> Object -> nil
  • Java class: java.lang.Error -> java.lang.Throwable -> java.lang.Object -> null

These are just example paths, they are not how you’d persist these things. For example, a workflow can be stored in XML, which is similar in structure to an HTML document:
<workflow name="Raise process">
  <step id="1" name="Start"/>
  <step id="2" name="Request Raise">
    <action class="Email" to="joe-boss@company.com" />
    <unconditional-result status="Waiting for your boss to approve"/>
  </step>
  <step id="3" name="Manager Decision">
    <condition name="Decision" class="CheckForApproval">
      <arg name="approved" string="true" status="Approved" step="4" />
    </condition>
    <unconditional-result status="Denied" step="5" />
  </step>
  <step id="4" name="Raise">
    <action class="RaiseSalary" status="Raised" step="6" />
  </step>
  <step id="5" name="Job Board">
    <action class="CreateMonsterAccount" status="Getting you a new job" step="6" />
  </step>
  <step id="6" name="End" />
</workflow>

This is sort of an OSWorkflow style document which is not by any means something that will parse correctly (If I had to do workflow again, I’d use the ruby gem state machine, which is awesome). I’m just illustrating that this workflow is really a hierarchy/tree of steps. And because trees are logically so generic, you’ll see them everywhere.
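
To make the tree idea concrete, here is a small sketch that holds the same raise workflow as a nested structure and prints every path through it. The node layout (a name plus a list of children) is just one arbitrary way to model it and has nothing to do with how OSWorkflow stores things.

```ruby
# A node is [name, children]; this mirrors the raise workflow as a tree.
WORKFLOW = ['Start', [
  ['Request Raise', [
    ['Manager Decision', [
      ['Raise',     [['End', []]]],
      ['Job Board', [['End', []]]]
    ]]
  ]]
]]

# Collect every root-to-leaf path through the tree.
def paths(node, prefix = [])
  name, children = node
  current = prefix + [name]
  return [current] if children.empty?
  children.flat_map { |child| paths(child, current) }
end

paths(WORKFLOW).each { |path| puts path.join(' -> ') }
```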

Here’s a Java class hierarchy:
public class Hierarchy {
    public static void main(String[] args) {
        System.out.print(Error.class.getName() + " -> ");
        System.out.print(Error.class.getSuperclass().getName() + " -> ");
        System.out.print(Error.class.getSuperclass().getSuperclass().getName() + " -> ");
        System.out.println(Error.class.getSuperclass().getSuperclass().getSuperclass());
    }
}

Which will spit out: java.lang.Error -> java.lang.Throwable -> java.lang.Object -> null

And here’s the equivalent ruby code:

ruby -e 'print "#{Class} -> #{Class.superclass} -> #{Class.superclass.superclass} -> #{Class.superclass.superclass.superclass}"'
outputs: Class -> Module -> Object ->

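(there’s a hidden nil there; on Ruby 1.9 and later you’ll see BasicObject in that last slot instead, because Object gained a superclass)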
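
If you’d rather not hardcode the depth, a loop that keeps asking for the superclass until it runs out does the same job on any Ruby version. This is just a sketch and the method name is made up.

```ruby
# Walk up the superclass chain until there is nothing left.
def superclass_chain(klass)
  chain = []
  while klass
    chain << klass
    klass = klass.superclass
  end
  chain
end

puts superclass_chain(Class).join(' -> ')
```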

So that’s hierarchies, and the ones that I’ve noticed are all very much related. You could transform any of these things into the other, like LDAP to XML to a file system hierarchy to a physical structure and back (and people do).
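
As a toy example of that kind of transformation, here is a sketch that turns an LDAP DN into a filesystem-style path and back. It is naive on purpose: it ignores escaped commas and multi-valued RDNs, and both method names are made up.

```ruby
# Turn an LDAP DN into a filesystem-style path, most general component first.
def dn_to_path(dn)
  values = dn.split(',').map { |rdn| rdn.split('=', 2).last.strip }
  '/' + values.reverse.join('/')
end

# And back again, given the attribute type to use at each level (leaf first).
def path_to_dn(path, types)
  values = path.split('/').reject(&:empty?).reverse
  values.zip(types).map { |value, type| "#{type}=#{value}" }.join(',')
end

puts dn_to_path('cn=joe,dc=yahoo,dc=com')
puts path_to_dn('/com/yahoo/joe', %w[cn dc dc])
```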

Web application (webapp) vs webpage

A lot of the projects I’ve worked on haven’t been purely a development task. It’s lately been lego-style integration projects so I understand that not everyone will be a developer. However, I find it interesting that the words webapp and webpage can be so wrongly mingled. Even in product names (such as Websphere and Weblogic) and product features. Templates, wizards and new project dialogs confuse me sometimes. If I saw a “Create New Web Application Website” button, I’d wonder if I’m going to get some HTML and CSS or whether I’ll end up with a WAR project. Sometimes there are hints like “dynamic” and that’s really the distinction.

Webpages are static. Webapps are dynamic. Let’s not get muddy with client-side javascript (it’s a blurry line). But where the terms really matter is on the server stack you choose. Application servers can serve HTML, yes. But if you want to really use a pure web server, you’ll go with Apache/Nginx/etc. Which in turn, can run code through modules. But it’s not in the spirit of Apache as a web server. A web server is really just a file server on the web. In some sense, an application server is a programming language on the web. Every app server has its own language that it was built for. Tomcat, JBoss, Weblogic and Websphere are all Java app servers. Mongrel, Thin, WEBrick and Phusion Passenger are Rails app servers (with JRuby, Weblogic can run Rails but that’s muddy again). You can front these app servers with a web server and create a powerful and flexible environment with Apache modules. In this way, you are using a web server not to serve “web pages” but to serve “web apps”.

So don’t say app server when you mean web server and vice-versa.

Plain-text

Think about what plain-text is. It’s human readable, it’s not encrypted (which is really just another way of saying human readable) and it’s also written, ultimately, by a human. There are protocols, file formats and other standards that are simply plain text. Sometimes plain text allows you to manipulate the system interactively and can help you debug or understand. For example, when I wrote my IRC bot, I simply wrote strings that complied with the IRC protocol RFC to a network connection object. But before doing that, I simply telnet’d to the IRC server and tried typing commands to stay connected.

Try this from a command prompt or unix shell:
telnet irc.freenode.net 6667
You’ll see the server hang and try to do an identd query. You probably won’t have one set up so you’ll see the message “No Ident Response”

Now type this:
NICK ohmynickname
USER ohmynickname 0 * ohmynickname
JOIN #PieIsGreat

Now join up with a regular IRC client so you can see your message. Back in the telnet window type:
PRIVMSG #PieIsGreat :Isn't pie just wonderful?

You’ll see the message show up in your regular IRC client (irssi in this case).

And so on. All IRC is, is a plain-text protocol. It’s just like SMTP.
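
A bot does nothing more than write those same strings to a socket. Here is a sketch that only builds the lines and never opens a connection; the helper name is made up and the nick and channel are the ones from the telnet session.

```ruby
# Build the raw lines an IRC client sends; each one ends in CRLF.
def irc_line(command, *params, trailing: nil)
  parts = [command, *params]
  parts << ":#{trailing}" if trailing  # the last parameter may contain spaces
  parts.join(' ') + "\r\n"
end

nick = 'ohmynickname'
session = [
  irc_line('NICK', nick),
  irc_line('USER', nick, '0', '*', nick),
  irc_line('JOIN', '#PieIsGreat'),
  irc_line('PRIVMSG', '#PieIsGreat', trailing: "Isn't pie just wonderful?")
]

session.each { |line| print line }
```

Writing each element of session to an open TCP socket would be the bot equivalent of typing into telnet.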

To illustrate, let’s send an email manually. You’ll need a Linux box for this one.
$ dig -t mx gmail.com | grep smtp
gmail.com. 3249 IN MX 20 alt2.gmail-smtp-in.l.google.com.
gmail.com. 3249 IN MX 30 alt3.gmail-smtp-in.l.google.com.
gmail.com. 3249 IN MX 40 alt4.gmail-smtp-in.l.google.com.
gmail.com. 3249 IN MX 5 gmail-smtp-in.l.google.com.
gmail.com. 3249 IN MX 10 alt1.gmail-smtp-in.l.google.com.

$ telnet alt2.gmail-smtp-in.l.google.com 25
Trying 74.125.43.27...
Connected to alt2.gmail-smtp-in.l.google.com.
Escape character is '^]'.
220 mx.google.com ESMTP a10si18542430bkc.24
helo me
250 mx.google.com at your service
MAIL FROM:
250 2.1.0 OK a10si18542430bkc.24
RCPT TO:
250 2.1.5 OK a10si18542430bkc.24
DATA
354 Go ahead a10si18542430bkc.24
From: Bob Slow
Subject: Manual SMTP
This is great.
.
250 2.0.0 OK 1279640642 a10si18542430bkc.24

Go check your gmail. You’ll see an email. It’s probably going to be listed under your spam folder because we didn’t specify a proper sender, our domain is bogus. You’ll want to use an actual domain name as the from address. It really depends on Google’s spam filter. Sometimes they can check to see if you’re actually sending from that domain, which is tricky unless you have a proper mail server to send from. If you get something through, it’ll look like a normal email.

And why should it be any different? This is what Thunderbird, Outlook or Mail.app is doing behind the scenes. This is also why email is so insecure. Anyone can sniff your mail as it goes over the network (provided they have their sniffer set up right). Even if you’re using https on gmail.com’s web interface, it’s hard to say if SMTP is encrypted to your destination (probably not). SSL certificates are hard to set up.
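
For reference, here is a sketch of the lines a mail client would send for one message. It only builds the strings and doesn’t talk to any server. The method name is made up and the addresses are placeholders on example.com, not the ones from the session above.

```ruby
# Lines a client sends for one message, in order.
def smtp_commands(from:, to:, subject:, body:)
  # Dot-stuffing: a body line starting with "." gets an extra "." so it is
  # not mistaken for the end-of-data marker.
  stuffed = body.lines.map { |line| line.chomp.sub(/\A\./, '..') }
  [
    'HELO me',
    "MAIL FROM:<#{from}>",
    "RCPT TO:<#{to}>",
    'DATA',
    "From: #{from}",
    "Subject: #{subject}",
    '',          # blank line separates headers from the body
    *stuffed,
    '.'          # a lone dot ends the message
  ]
end

puts smtp_commands(
  from: 'bob@example.com',
  to: 'alice@example.com',
  subject: 'Manual SMTP',
  body: 'This is great.'
)
```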

Ok, one last example for our pattern. We’re going to make a manual connection to the HTTP listening port on a Yahoo! web server.

$ telnet yahoo.com 80

You’ll see this message:
Escape character is '^]'.

You can hit enter and some blank lines will appear. This means you’re connected but the web server doesn’t think much of newlines (ignores them). Now type “GET /” before it times out. If you time out, just telnet again. You’ll get a response:

Your browser did not send a “Host:” HTTP header, so the virtual host being requested could not be determined. To access this site you will need to upgrade to a modern browser that supports the HTTP “Host:” header field.

Hmm. That’s interesting. Yahoo uses an accelerator that needs to switch on a host header for virtual hosting and load balancing. Let’s do as they say. First, we’ll grab just the HEAD from the root document.

$ telnet yahoo.com 80
Trying 98.137.149.56...
Connected to yahoo.com.
Escape character is '^]'.
HEAD / HTTP/1.1
Host: www.yahoo.com
Connection: close

Make sure to add an extra newline (enter) at the end after Connection: close. You’ll see an HTTP/1.1 200 OK at the top of the little response block, which is a success. The following is what you’ll see (I put some [snips] in there to reduce length).

HTTP/1.1 200 OK
Date: Tue, 20 Jul 2010 22:03:14 GMT
P3P: policyref="http://info.yahoo.com/w3c/p3p.xml", CP="CAO DSP COR CUR ADM DEV TAI PSA PSD IVAi IVDi CONi TELo OTPi OUR DELi SAMi OTRi UNRi PUBi IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE LOC GOV"
Cache-Control: private
Set-Cookie: IU=deleted; expires=Mon, 20-Jul-2009 22:03:13 GMT; path=/; domain=.yahoo.com
Set-Cookie: PH=deleted; expires=Mon, 20-Jul-2009 22:03:13 GMT; path=/; domain=.yahoo.com
Set-Cookie: fpc=d=[snip]&v=2; expires=Wed, 20-Jul-2011 22:03:14 GMT; path=/; domain=www.yahoo.com
Set-Cookie: fpms=u_30345330=%7B%22lv%22%3A1479641794%2C%22uvc%22%3A1%7D; expires=Wed, 20-Jul-2011 22:03:14 GMT; path=/; domain=www.yahoo.com
Set-Cookie: fpps=_page=%7B%22wsid%22%3A%22879345330%22%7D; expires=Wed, 20-Jul-2011 22:03:14 GMT; path=/; domain=www.yahoo.com
Set-Cookie: fpt=d=[snip]0._vZOYA-&v=1; path=/; domain=www.yahoo.com
Set-Cookie: fpc_s=d=h[snip]&v=2; path=/; domain=www.yahoo.com
Vary: Accept-Encoding
Content-Type: text/html;charset=utf-8
Age: 0
Connection: close
Server: YTS/1.17.23.1

Connection closed by foreign host.

Looks like they set a ton of cookies in the head. Ok, fine. So now just change the HEAD / to GET /.

$ telnet yahoo.com 80
Trying 98.137.149.56...
Connected to yahoo.com.
Escape character is '^]'.
GET / HTTP/1.1
Host: www.yahoo.com
Connection: close

You’ll get the entire HTML document, header and all. You’ll see the same header that we saw before. Scroll up past the <!DOCTYPE html> and <html> opening tags to find it. At the end of the document, you’ll see the </html>. Your browser turns this plain-text into widgets, buttons and pictures using its rendering engine. Note that the security concerns of plain-text SMTP previously discussed also apply to HTTP, simply because it is plain text. Instead of a yahoo front page, this response could include a DIV tag with a bank account number and balance, or the request could be an HTTP POST with a username and password.
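
The request you typed is easy to build as a string too. This sketch doesn’t send anything; it just shows the line endings and the empty line that ends the header block, which is what the extra enter was for. The method name is made up.

```ruby
# Build a raw HTTP/1.1 request. Lines end in CRLF and an empty line ends the headers.
def http_request(method, path, host)
  [
    "#{method} #{path} HTTP/1.1",
    "Host: #{host}",
    'Connection: close',
    '',
    ''
  ].join("\r\n")
end

print http_request('HEAD', '/', 'www.yahoo.com')
```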

So what? Why would anyone use telnet as a test? Who cares about plain-text? What kind of weirdo are you?

  • It’s interactive – you can watch a log while you type manually
  • It’s verbose – many times, errors will pop up when you do things manually that would otherwise be hidden by infinite layers of enterprise software abstraction
  • It’s a good test that you can do if you’re stuck at a client site without your dev machine because almost every OS (like Windows) has telnet
  • It’s near-absolute – If you can’t telnet to port 80, your browser won’t work (excepting proxies and other things)
  • It’s security related – Watch in horror as your html comes over the wire in the clear. Shouldn’t you be using SSL?

Hands-On Security

I don’t have a lot to say about this one so let’s recap what my original complaint was:

Policy makers vs whitehats. Like actually checking SANS for an actual 0-day remote root-level ultra-bad exploit on foo software. Security is always a huge blanket statement with very few actual experts. I’m certainly not one but I have met very few that I thought were teaching me something. I want the guy who actually knows what an XSS attack or a buffer overflow is. This might just be a reflection on the projects I’ve been on.

Like I said, I haven’t really met a hands-on security analyst that had a VM with a bunch of security demos/hacks etc on it. There are some good security-related videos out there but the really good people aren’t around me I guess. Maybe that’s telling of something. Maybe they’re ranting on about me. I dunno. Mark this one as incomplete.

Conclusion

Ok well this was longer than I wanted. Hopefully these tips help you directly, or you can link them to someone you know, or perhaps seeing these common shortfalls listed out by me is some kind of cathartic release for you. Until the next post, I’ll be trying to understand ruby threading. Threading and concurrency is a personal debt of mine.

Technology Knowledge Debt, Part One

Systems — squarism @ 2:44 pm

As a follow-up to my previous post where I pseudo-whined about common knowledge gaps in technology projects, I thought I’d try to contribute a solution to the problem. I’ll list out each of the areas that I previously posted about; the areas I think people are behind on. It’s knowledge debt. When you don’t spend the time to catch up, you’re in knowledge debt.

Troubleshooting

Troubleshooting is a learned skill and it’s hard to “train”. A lot of effectiveness in troubleshooting is related to the specific skill or expertise domain so a novice is not going to magically become effective at problem-solving just by knowing how to troubleshoot generically. Let’s instead try to focus on some specific examples and talk about common patterns and strategies. The easiest examples revolve around so-called “computer problems” which really mean desktop, networking, environmental and software corruption problems. I’ll stay away from debugging which is a much more precise process where you have the advantage of memory inspection, breakpoints and god-like control over the problem unless you hit external resources which would land you back in the “computer problems” category.

Let’s say a developer is designing a web application on their laptop. Everything works as they designed. Now it’s time to deploy to the test server for other team members to integrate/play with. Ok, let’s scp our .zip/.war/installer/.tar/whatever up to the server and then see if it works. A near-infinite list of things can go wrong but here are a few:

  • You can’t scp to the server.
    • If you got a network error, can you ping the server? If you can ping, can you telnet to port 22? Ping just hits the IP. Telnet will hit the IP plus the port. If the IP works and the port doesn’t then something is filtering the port (firewall) or SSH on the server is not listening on 0.0.0.0 (every interface) but on localhost or another interface.
    • If you are getting a login type error after you connect then try just ssh’ing to the box first. Maybe your shell is messed up. Are you overriding the shell with WinSCP? Are you trying to start in a directory that you don’t have +x on? Maybe your account is locked. The idea here is to ignore network problems. You’ve got a socket and a login prompt. It’s OS/shell related at this point.
    • Understand that putty’s configuration is not shared with WinSCP but it is shared with pscp. Try pscp’ing with the -load switch. For example pscp -load testserver-saved-session file user@testserver:/tmp will load the “testserver-saved-session” putty config and SCP a file named “file” to the server. If you have proxy settings or other parameters that are needed, you can use pscp instead of reconfiguring WinSCP.
  • The webapp won’t start.
    • Obviously, check logs. You don’t have logs? Are you on Linux? Try strace. If you’re running a java webapp, strace is going to be impossible to follow. On Solaris, run truss. Both of these programs will give low level C system calls that can add up if you’re using a java container. Even rails or php can create a crapton of logs. This is not the first place to start but a fallback when your more exact logs fail you.
    • What is different between your laptop and the test box? If you’re moving from Linux to Solaris, you need to understand the environmental differences between the two OS’s. This is especially true from Windows to Unix. Your middleware stack might be completely different.
    • What is different between your laptop’s network and interaction between systems and your test network? If you have all software on a single box, moving to a distributed test environment will most likely break stuff. You should abstract away hostnames and services to local names that can be configured in DNS or in a hosts file. Instead of pointing to a database called “DBSVR001_L2”, you should have an alias called “webapp-db”. Then configure your app to use the alias and not the box name. /etc/hosts files can have 8 aliases per line.
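
The “can you telnet to the port?” check from the first bullet is easy to script when telnet isn’t handy. This is a sketch using Ruby’s standard socket library; the method name is made up, and the demo listens on a throwaway local port so it runs anywhere. Against a real box you’d call something like port_open?('testserver', 22), where testserver is a hypothetical hostname.

```ruby
require 'socket'

# True if a TCP connection to host:port succeeds within the timeout (seconds).
def port_open?(host, port, timeout = 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  # refused, filtered, timed out or unresolvable all count as "not reachable"
  false
end

server = TCPServer.new('127.0.0.1', 0) # throwaway listener on a free port
port = server.addr[1]
puts port_open?('127.0.0.1', port)     # something is listening
server.close
puts port_open?('127.0.0.1', port)     # nothing is listening any more
```

Like telnet, this only proves that something accepts connections on the port. It says nothing about whether SSH, or whatever you expected, is the thing answering.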

The list of problems can go on from “the webapp behaves differently” to “you can’t get to the webapp at all”. And this is just deploying to the test environment! In any case, the common pattern should be:

  • Is the problem reproducible? Is it intermittent? This answer should help narrow down the problem scope. Intermittent problems could be external, timing issues, race conditions, memory management, resource limits, functional bugs, synchronization and temporal problems like this.
  • Log mentally or in a journal what each finding means. You should be thinking critically and expecting results as you try things: “if the following test does A then it means X. If it doesn’t then it means Y”. See this fantastic single-bit-flip troubleshooting session here where the author systematically dives deeper and deeper into a segfault until he proves that a system binary bit-flipped while in memory.
  • Change one thing at a time.
  • Try to break the problem into levels if it’s related to lifecycle or something serial. Start high-level and work lower (ie: trace the network cable before setting up debugger breakpoints). Once you’ve proved a level is working, ignore problems on that level. Unless you’re doubting your sanity, if ping works, IP is working.
  • Assume it’s your code and your problem forever. When it’s not your code’s problem, assume it is. Try typing up an angry email to the author, corporation or project and blame them with vigorous detail about why you think this is their problem and not your fault. Do not send it. By the end of your rant, you might have thought of things you haven’t tested or tried enough when proposing what their problem is.
  • Break down the problem and prove hypotheses. Can you break the app even more? If you fix it, can you break it again to prove your sanity?

Even though many of my examples are deployment and networking related, the same process can be used for development and software. Since so many abstraction layers exist in software, it’s very similar to a system or a network stack. These troubleshooting strategies are a valuable skill to learn and might keep you from just giving up or asking a coworker for help. I’ve met people that have no domain experience in a problem area but have enough previous experience and troubleshooting skill that they seem magically able to figure out a problem (and suddenly are regarded as experts).

VNC

Understanding VNC can be difficult because many devs/admins/dbas/whatever are used to other things. Maybe they are even used to VNC but don’t understand how powerful it can be or why it’s being used in the first place. So let’s talk about the basics and maybe this will clear up what VNC is and what it isn’t.

First, VNC is cross platform and open source. That means I can use a Windows VNC client to connect to Windows, Linux, Solaris, Mac, iPhone or whatever. Open source means not so much that VNC is free but that there are many tools available because the spec is open. I can do “many-to-many” client to server. Remote desktop is Microsoft only. VNC can compress remote sessions for slow WAN links. VNC can be faster than X11 if you use it right. VNC can keep a GUI process/installer/program running while you disconnect. VNC can let you broadcast your screen to a classroom of students. VNC can let you bounce around a network when a firewall or network layout prevents you from connecting directly to a server. It’s flexible, familiar to many and open source.

Let’s talk about what VNC isn’t. VNC is an onion skin; it’s not the root window. If you move your mouse in VNC, someone who walks over to the server won’t see the mouse moving. It’s not SSH: you can’t necessarily copy files over VNC and you won’t automatically get a shell either. It’s not secure. It’s not remote desktop. It’s not X11. It’s different on Windows than it is on UNIX, where you get to pick your window manager (Gnome, KDE, twm). VNC has a separate password from your OS user account.

When you start vncserver, it reads your ~/.vnc/xstartup file and launches whatever is in there and fires up a port to listen on for your user. So you can fire up multiple servers if you want but each time you do, everything in ~/.vnc/xstartup is duplicated. If you use Gnome in your xstartup, this might make Gnome mad because it’s not good at running twice.

Once you are connected to your vncserver with a client, you stay logged in. Let’s say you fire up vncserver with Gnome configured for xstartup. So your ~/.vnc/xstartup looks like this:

#!/bin/sh

# Uncomment the following two lines for normal desktop:
# unset SESSION_MANAGER
# exec /etc/X11/xinit/xinitrc

[ -x /etc/vnc/xstartup ] && exec /etc/vnc/xstartup
[ -r $HOME/.Xresources ] && xrdb $HOME/.Xresources
xsetroot -solid grey
vncconfig -iconic &
xterm -geometry 80x24+10+10 -ls -title "$VNCDESKTOP Desktop" &
gnome-session &

This is going to open xterm every time you log in and you’ll have a Gnome desktop. That’s because the process stack looks something like this:


vncserver (includes the X11 server)
|-- X11 Desktop (ie: Gnome as a gnome-session process)
|---- vncconfig (a vnc utility dialog box app, not needed most of the time)
|---- gnome-terminal (one of many windows within your vnc window)
|------ bash (unix shell within gnome-terminal)

I see people make a common mistake when using VNC. They’ll log out of Gnome instead of closing their client window. So the vncserver process lives but there’s no desktop shell. So when they log back in, they just have a blank blue (or whatever color) screen and they’re like “IT’S BROKEN!”. Look at the process stack above. If you log out of the X11 desktop (clicking System->Logout in Gnome) then your process stack looks like this:

vncserver
|-- nothing

Everything is gone now except the vncserver and X11 server which aren’t running anything interesting. So you can connect but you can’t do anything. So don’t log out with the Log Out menu. Just close the VNC client window.

VNC is really good at WAN traffic (like from home to your office). Well, you can configure it that way. There are many types of clients out there but many of them either have a preset for slow links or let you set the compression to fast crappy quality so that you can work over a WAN. From more of a CS perspective, I like to think of VNC as spending local CPU on compression, sending a smaller network packet and blowing it back up on the client side. Since you’re network-bound on a WAN, this goes easier on the link and the experience is faster.

X11

X11 is called the X Server because clients actually connect to it to display GUIs like a client-server model. It’s designed this way to allow for flexible displaying of GUIs. Because X11 is a client server model, you can forward X11 over SSH for example and have X11 apps pop up on your local desktop X11 server like magic. Or you can have VNC display your apps instead, which is how the Xvnc process works and no Xorg process is needed.

So X11 on Linux kicks off in runlevel 5 (init 5). The root window, which you’ll see if you walk over to a server, is running as the root user and will let you log in as whatever user on the box. Here’s an example process:

$ ps auxww|grep Xorg
root 3320 0.0 0.6 34220 27721 tty7 Ss+ Jun24 1:06 /usr/bin/Xorg :0 -br -audit 0 -auth /var/gdm/:0.Xauth -nolisten tcp vt7

From there, your desktop shell (like Gnome) is kicked off as the user you logged in as, and it displays back to the X server. If you kill X11 (with Ctrl-Alt-Backspace) then you lose everything within the X server. But VNC is not using the Xorg process, so you won’t kill off anyone’s VNC sessions or anything contained within them. You also won’t kill people that are forwarding X11 apps back to their laptop or what-not, because they are running a local X11 server such as Xming or Cygwin. Because the desktop shell is running as the current user, it’s usually a bad idea to log into the console as root. It’s unnecessary. If you need root, log in as a normal user, open a terminal and sudo or su -. This will prevent you from accidentally doing something bad as root with the file browser, prevent hackers from forwarding malicious apps to you (if you do an xhost + for example) and is in general just a better “least privilege” habit to get into. Of course, most people just log in as root to the server because it’s easy (boo).

The DISPLAY variable. You set the DISPLAY variable to an X11 server hostname and display number. If you log into the GUI console, you’ll see this:
$ echo $DISPLAY
:0.0

The :0.0 is a special value which signifies the root window. If you blank out DISPLAY and then try to start a GUI app, you get this:
$ gnome-terminal
Failed to parse arguments: Cannot open display:

It won’t work. You can’t unset it either. You have to set it to “:0.0” if you’re actually on the box. When you forward X11 over SSH, the variable will look like this:
$ echo $DISPLAY
localhost:10.0
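
To make those two values less magic, here is a small sketch that splits a DISPLAY string into its host, display number and screen. The function name parse_display is made up for this example. The one outside fact it leans on is that an X server listening on TCP uses port 6000 plus the display number, so localhost:10.0 points at the SSH tunnel listening on port 6010.

```shell
#!/bin/sh
# parse_display is a hypothetical helper, not a standard tool.
# DISPLAY has the form [host]:display[.screen]
parse_display() {
    value=$1
    host=${value%%:*}          # everything before the first colon (may be empty)
    rest=${value#*:}           # everything after the first colon, e.g. "10.0"
    display=${rest%%.*}        # the display number
    case $rest in
        *.*) screen=${rest#*.} ;;
        *)   screen=0 ;;       # screen defaults to 0 when omitted
    esac
    [ -n "$host" ] || host="(local)"
    # tcp_port only matters when the X server is actually listening on TCP
    echo "host=$host display=$display screen=$screen tcp_port=$((6000 + display))"
}

parse_display ":0.0"
parse_display "localhost:10.0"
```

An empty host means a local connection, which is why :0.0 works when you’re sitting at the box.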

X11 isn’t encrypted by design. So use SSH forwarding.

X11 can be slow. It sends the whole damn GUI object uncompressed over the network. So if you’re on a WAN, use VNC.

X11 can be buggy with Java Swing. VNC can be used as a workaround.

Version control

Let’s say you’re working on a boring spreadsheet full of team members and their skills. You’ve put a lot of time into this spreadsheet and you don’t want to lose what you’ve done. At the same time, you want to completely reorganize it by skill type and seniority. Also at the same time, your boss likes to refer to your spreadsheet to see what kind of people you have available on your team. It’s going to be a lot of work to rework the spreadsheet and it’d be bad if your boss couldn’t access it because you saved a crappy/corrupt version, so you save a copy to the fileshare as “Team_Version_2.xls”.

Great. Except what does Version 2 mean? Also, if “Team.xls” is out there and “Team_Version_2.xls” is out there, which version is the “real” one? Maybe you tell your boss “just use the one with the most recent last modified date”, which is Team.xls right now. But then Rick from sales changes the font color in Team_Version_2.xls and then your boss is checking that for updates because that’s been updated last. Your boss is all confused now because he’s been checking the wrong one and he hired a bunch of DBAs that you already had. Argh! What is going on here?!


The problem is, you are trying to version control a spreadsheet using a fileshare. You’re trying to create different versions using a filename convention. This problem has already been solved using software called Version Control Software (VCS) like CVS, SVN (Subversion), Git and so on. At the very least you should be using SharePoint if you’re in love with MS Office. If you’re on a software project, you should be using a VCS. Even if you’re using VCS, you might be using it wrong (like trying to version control with filenames inside a version control repository).

First, let me explain extremely briefly what each of CVS/SVN/Git is all about. CVS is old. SVN replaced CVS. Git is sort of replacing SVN but also changing many things along the way. SVN has the most mature toolset. Git has very few GUIs available, although Xcode 4 looks really nice. Short answer: at least use SVN. If you can hack it, use Git.

So in our example, assuming you are using SVN, you’d create a root hierarchy in SVN called

/trunk
/branches
/tags
/releases

We’d commit a file under /trunk/Team.xls. This file will be the only file that anyone (like our boss) will look at for “the place to go to get our team layout”. When it comes time to make a major change, we’ll do an SVN copy to /branches/omg_major_team_reorg/Team.xls and create a branch. We’ll go to town on our branched version and make major changes. Later, when everyone has been editing the trunk version of Team.xls, we’ll merge our branched version back in and everybody’s edits will be lumped together intelligently. Recent versions of Subversion even have a diff viewer for Office docs.
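
Here is roughly what that round trip looks like on the command line, written as a dry run that only prints the commands. It’s a sketch under assumptions: branch_plan is a name made up for this example, the commit messages are placeholders, and it assumes a Subversion 1.8 or newer client, where ^/ is shorthand for the repository root and a plain svn merge works without extra flags. Text files merge line by line; a binary spreadsheet may still need to be sorted out by hand.

```shell
#!/bin/sh
# Dry run: print the svn commands for the branch-and-merge round trip
# instead of running them. branch_plan is a hypothetical helper.
branch_plan() {
    branch=$1
    echo "svn copy ^/trunk ^/branches/$branch -m 'Branch for $branch'"
    echo "svn switch ^/branches/$branch"
    echo "svn commit -m 'Rework on $branch'"
    echo "svn switch ^/trunk"
    echo "svn merge ^/branches/$branch"
    echo "svn commit -m 'Merge $branch back into trunk'"
}

branch_plan omg_major_team_reorg
```

Everything between the first switch and the second one happens on the branch, so the boss keeps reading trunk the whole time.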

So how does branching actually work? It took me a while to see when and why I’d use branching. There’s a simple diagram to the right to help visualize the process. I think it helps to just try it out a few times when you think you have a need for it and let yourself get used to the process. Without a concrete need or example, you might think it’s foreign and unnecessary. When it works for you, it’ll be more familiar and useful.

In Git, you should be branching and merging a lot. There’s a great book called Pro Git which has this and more explanation. Git really changed the way I thought about developing. I wouldn’t say I’m an expert yet. I still need to learn to commit more often and branch more. But I’m ok with these future optimization goals, I’m not ok with manual version control on a file share. :)

Sudo and cron

Sudo is a program that lets you run something as root. But more importantly, sudo can look at a file called sudoers to define a list of people and programs. You can let sally run `reboot` or you can let dave run /etc/init.d/apache. Sudo lets you delegate to normal user accounts without everyone knowing the root password. Sudo also allows for actual, real and useful system logging. When everyone logs in as root, no one is accountable and logging is basically useless. If everyone has root and you need to find out who screwed up the server or even who is doing the most amount of good on a server, you can’t identify people because the logs will tell you that root did it and not dave or sally. So sudo allows for finer-grained access control and more accountable logging. Sudo is not “su”. Also, the sudo concept is not limited to Unix. Windows Vista and 7 have UAC which is a very similar concept. UAC prompts for your password and temporarily elevates your access. They took the popup concept from OSX which actually uses Unix sudo. Ubuntu followed this design idea too. XP doesn’t have this ability, so most people run as administrator all the time.

Unrelated to sudo is a scheduling utility called cron. Cron runs things at a certain time and most people get that. But there are multiple places for cron jobs to be run from. There’s a global /etc/crontab which can only be edited by root and there are individual user crontab files that normal users can edit with crontab -e. But some distros of Linux (like Ubuntu) have some special directories for cron jobs:
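
On a typical Debian or Ubuntu install these are /etc/cron.d, /etc/cron.hourly, /etc/cron.daily, /etc/cron.weekly and /etc/cron.monthly. A quick way to see which of them exist on a given box, as a sketch (the function name list_cron_dirs and its optional root-prefix argument are made up for this example):

```shell
#!/bin/sh
# list_cron_dirs is a hypothetical helper for illustration.
# $1 is an optional root prefix (handy for chroots or testing); default is /.
list_cron_dirs() {
    root=${1:-}
    for d in cron.d cron.hourly cron.daily cron.weekly cron.monthly; do
        if [ -d "$root/etc/$d" ]; then
            count=$(ls -A "$root/etc/$d" | wc -l | tr -d ' ')
            echo "/etc/$d files=$count"
        fi
    done
}

list_cron_dirs
```

Anything a package drops into those directories runs without showing up in crontab -l, which is worth remembering when you’re hunting for a mystery job.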

Ok so this is pretty basic sysadmin stuff but I wanted to touch on a more important idea of cron: ETL. ETL is extract, transform and load. But don’t think in terms of data warehousing or anything specific. Think of cron and ETL as polling. Cron is stupid polling. It’s not event driven. It’s going to run whatever you say every interval you specify. If you have a “send forgotten passwords emailer” cron job set for 15 minute intervals, your users are going to be waiting up to 15 minutes for their email that lets them get into the system. This can be good or this can be a flawed design. On the other hand are events/callbacks/hooks/triggers etc. Events are smarter, more exact and better to use if you can. However integrating events all the way down to the OS layer is nearly impossible. It’s hard to ‘hook’ into an application stack from the bare OS level. Many times, polling is the only solution.

Polling vs events. Very important pattern to look out for.
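
To make the difference concrete, here is a minimal sketch of both styles in shell. Both function names, the spool directory of .req files and the named pipe are assumptions made up for this example; they are not part of cron or any real mailer.

```shell
#!/bin/sh
# Both functions are hypothetical sketches, not part of cron or any tool.

# Polling: look in the spool directory on every tick, whether or not
# anything changed. Cron would call this every N minutes.
poll_once() {
    spool=$1
    for f in "$spool"/*.req; do
        [ -e "$f" ] || continue          # the glob matched nothing
        echo "polled: $(basename "$f" .req)"
        rm -f "$f"
    done
}

# Event-style: block on a named pipe and wake up the moment a
# producer writes a line. No interval, no wasted wake-ups.
wait_for_event() {
    fifo=$1
    IFS= read -r line < "$fifo" && echo "event: $line"
}
```

With poll_once a request sits in the spool until the next tick. With wait_for_event the reader is already parked on the pipe (created with mkfifo) and reacts as soon as something is written, but the producer has to know about the pipe, which is exactly the integration cost described above.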

Next

In the next part, we’ll wrap up with the other dearthy areas and put this topic to pasture.

Dearth Patterns

Systems — squarism @ 11:36 am

There seems to be a lack of understanding and experience in the same areas from project to project, company to company and sector to sector. It’s nothing specific to any one company. I’ve seen this from small .com companies to large contractors. It’s very odd to me. Even if everyone can’t know everything, it seems certain things are never taught or learned.

Some of these things are high-level skills and some are very specific. I think the specific ones are weirder. Osmosis, you’d think, would have some people learn certain tools or languages or whatever.

Troubleshooting
Basic troubleshooting skills. Like, change one thing at a time. Or, turn on debugging. Or what the likely culprit is versus chasing after every possibility. Or, feeling overwhelmed and not trying anything: it’s just broken. Broken how? What changed last? What is persisted? Is there a tmp file? Is there a cache? Can you repeat this every time? If not, what are the variables?

VNC
No one seems to know how VNC works. But everyone uses remote desktop just fine. With VNC, it’s an effort to get everyone connected and it’s a hand-holding operation. I understand it’s tricky. But VNC has been around for almost a decade and it hasn’t really changed. I figured osmosis would kick in.

X11
Cygwin, Xming, putty and getting X11 working. Or how X11 isn’t encrypted. Or security problems with xhost. Or running X as root. Or how X11 is different than the Windows display. Or how X11 is separate from the X11 shell. How to forward X11 over ssh. All these X concepts don’t come from the Windows world but are in every other OS: Solaris, OSX (X11 app), Linux.

Version control
Conflicts, merging, branching. Why SVN is different from SharePoint is different than git is different than just calling files .year.mon.day in a file share. Even ‘senior developers’ that don’t have any high-level knowledge of CVS, a program that was released in 1990. I’ve seen this everywhere.

Sudo and cron
Even if in a Windows environment, what the UAC momentary elevation of privileges is like (also see OSX ignorance). Or how `at` works on Windows. At is also a program in UNIX.

Myth of supportability
Everything has to be written in the language that’s delivered. If we’re delivering a COTS program that runs on C# then every script has to be C# because that’s what we know. If you offered to write a utility in a higher level language, then all the Java devs would cry foul because it’s too foreign. So everything is a hammer problem even though all tech changes in a few years. I wish someone would add up all the “this is going to save you so much money” promises of middleware.

Regarding COTS, Mike Taylor’s Whatever happened to programming? is a good read. I think of his libraries as my COTS/middleware.

Copy and paste
How to paste in X11 (middle click). How to paste into putty (right click). How to copy in X11 (select), how to copy in putty (select). How to copy and paste across VNC (enable clipboard support). How to strip out formatting (paste to notepad, start->Run, anything plain text).

Hierarchies in Software Architecture
How many nodes make up an X? How many X’s make up a Y? Can I have multiple Y’s per Z? What’s shared between Z and foo? Do I have to create foo first? Enough of the contrived examples. Let me give you some real examples:

I have a bunch of disks I’m going to share out for a database.

  1. Combine one or more disks into a RAID or JBOD array on the SAN
  2. Slice up SAN into one or more LUNs
  3. Create volumes or filesystems with one or more partitions
  4. Organize partitions into one or more diskgroups
  5. Database is spread across one or more diskgroups
  6. Database has one or more schemas
  7. Schema has one or more tables
  8. Tables can have one or more data partitions

And so on. It can keep going like this for quite a while. In terms of performance, if the partitions go to the same busy disks then there’s not a whole lot of point in sharding. In terms of design and planning, obviously I have to do step 1 before I can do step 5 and often undoing step 1 will cause the house of cards to tumble.

That’s not really a software thing like I said, ok. I can have multiple listeners in Oracle DB but I can’t have multiple listeners on the same port. I can have multiple reverse HTTP proxies on different boxes point back to the same web server on a single port though. I can have multiple IPs on a box but not multiple programs using a port on a single IP. I can have multiple virtualhosts against one IP in Apache but I can’t have name based virtualhosts against one IP with ssh. This hierarchy is especially important when load balancing webapps. Even the F5 has an internal hierarchy to make the configuration flexible.

Ok, ok. That’s still not software enough. I can have multiple sessions in a webapp but only one identity. The identity is in the persistence layer of the above SAN example and my webapp can scale out to the horizon with shared-nothing or whatever I want. I’m still really logging in only once even if I bounce between 500 boxes. The session could be just an instance of my identity.

Webapp vs webpage
Website is to webserver as webapp is to application server. HTML is a webpage/website. JSP/PHP/CF/ASP is a webapp. This is simplified but so many people don’t know the difference even when their job revolves around integrating or even developing ‘webpages’.

Plain-text
Many protocols, passwords and network traffic can be viewed in plaintext. There are some protocols you can interact with just by typing text into an open telnet session. You can do IRC with telnet. You can do SMTP with telnet. You can do an HTTP GET.
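
As a sketch of how little there is to it, this prints the exact text you would otherwise type into the telnet session for an HTTP GET. The function name http_get_request is made up for this example and example.com is a placeholder host.

```shell
#!/bin/sh
# http_get_request is a hypothetical helper: it only prints the text
# you would otherwise type into a telnet session by hand.
http_get_request() {
    host=$1
    path=${2:-/}
    # HTTP lines end in CRLF; a blank line ends the request headers.
    printf 'GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' "$path" "$host"
}

# Usage (needs network access and a netcat binary):
#   http_get_request example.com / | nc example.com 80
```

Depending on which netcat you have, you may need a flag to keep the connection open after stdin closes so the response has time to come back.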

Hands-On Security
Policy makers vs whitehats. Like actually checking SANS for an actual 0-day remote root-level ultra-bad exploit on foo software. Security is always a huge blanket statement with very few actual experts. I’m certainly not one but I have met very few that I thought were teaching me something. I want the guy who actually knows what an XSS attack or a buffer overflow is. This might just be a reflection on the projects I’ve been on.

Freedom Systems vs Safety Systems

Systems — squarism @ 3:49 pm

Code Craft had an extremely intuitive post about Freedom Languages vs Safety Languages. He covered what is popular vs what is fringe, where the party-lines are drawn and (imo) almost made an analogy for safety within the USA.

I’ll sum it up: You’re safe or free but rarely absolutely both.

So of course, me, the junior (mid-level if I’m around monkeys) programmer probably can’t offer the same level of insight into software. However, I can make an attempt to “port” Code Craft’s post into Freedom System vs Safety Systems.

Can’t we all just get along?

Windows (safety) vs Mac (freedom). Windows vs Linux. Linux vs Mac. A horrific never-ending battle and argument to plague the Internet and geek circles forever. But even within the Linux community, Ubuntu vs Gentoo. Debian vs Gentoo. Redhat (safety) vs Gentoo/Ubuntu/Debian (freedom). Advocates of safety usually point out the use of their choice OS in the business world thereby enabling them to get a job. Advocates of freedom platforms usually point out weaknesses and flaws of the safety platforms.

Switching a business to an all-Linux or all-Mac platform would be a giant risk. It’s the opposite of safe. However, getting into bed with Microsoft is not very freedom-enabling. Vendor lock-in, security problems and lack of diversity in tools get you into a unified headache.

Protect me from myself

So what to do? I don’t have time to worry about platforms. I want to move on to the fun stuff. User space. In user space, I get to write apps, play games, make money and do all the things that I’m supposed to do above and beyond building boxes all day. I’m not an OS loader so “[insert vendor here], help me get on with my day.”

Self-updating systems. Self-updating apps. Warning dialogs, system file permissions. They are all mechanisms that state “I don’t have time to know it all”. I don’t have to track 0day exploits full-time to keep my workflow running. I don’t have to stress that as a normal user I can’t delete a critical file. Almost all systems protect the users from harming themselves (or they should).

However, even in Windows Safe Mode, the user can still do damage. You can delete NTLDR in XP (and others) and the system won’t boot. You can delete most of the C: drive with non-administrator rights. In Linux, things are a bit harder but in common practice people actually getting things done usually have root or full sudo rights. The same is true in OSX: the productive user needs root level access to get things done. So where is the happy medium?

Well it’s a bit inverse of programming languages. Actually the safe thing to do in the Enterprise is to enable the user to screw up their own box. It’s more IT tickets but it’s usually less of a headache provided a small-enough organization. In my experience in software and integration, most of the time we get root anyway. The IT department will segment a section of the network off and it’s our own sandbox to wreak havoc. At the same time, we operate much faster than if we can’t even set the system time as it is on Windows with no admin rights.

Me too

Kevin Barnes compared a C-like for loop to a Ruby for loop as evidence that safety languages are safe because they are very similar. So do systems on a broader scale suffer from a copy-cat mentality. DOS became Windows and OSX borrowed some ideas from Windows. Minimizing all open windows is Windows Key+M on Windows and on OSX it’s Apple Key+M. But simple aesthetics are not all. The fact is, even files and folders are all copied concepts. Command line commands are very similar. For example, most OS’s have similar commands to change into a subfolder and view a file:

Linux

$ cd docs
$ more resume.txt

Windows

C:\> cd docs
C:\docs> more resume.txt

Mac

$ cd docs
$ more resume.txt

Of course where the Freedom Factor comes in is with the more complex and arcane things like building software. On RedHat and mainstream Linux distros, building an rpm is usually unnecessary and even complex build processes are point and click with Ubuntu. Freedom advocates of Gentoo or Debian would cite performance differences between self-built code and generic-built code.

“Me too” in platforms is very much a function of copying, standards and memes.

Ring the bell for freedom

So in conclusion, Kevin @ Code Craft has a much easier time comparing languages vs systems. Systems are generally not a science and higher-level which means they are harder to express in snippets. Systems are best described in diagrams and concepts versus strict syntax. So he is fortunate to be a blogger in a discrete subject such as software development.

Although I tried my best to copy his post, I don’t think it really turned out that way. I think the better approach would be to take gaping void’s sex vs cash theory and apply it to technology. It’s really what Kevin should have done to begin with. A Sexy technology is very much full of freedom (Ruby, OSX, OpenGL) whereas Cash technology is very much full of business and work (Java, Windows, DirectX).

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.
(c) 2013 SQUARISM