pry> File.open("/tmp/space.txt", "w") {|file| file.print "\n\n\n" }
$ /tmp > xxd space.txt
00000000: 0a0a 0a                                  ...

001f3f70: 3a20 4174 7461 6368 696e 6720 4950 7636  : Attaching IPv6
001f3f80: 206f 6e20 6177 646c 300a 4d6f 6e20 4a61   on awdl0.Mon Ja
001f3f90: 6e20 2034 2031 303a 3237 3a33 322e 3032  n  4 10:27:32.02
001f3fa0: 3920 496e 666f 3a20 3c61 6972 706f 7274  9 Info: <airport
001f3fb0: ...                                      ... PRIORITY 
001f3fc0: 4c4f 434b 2041 4444 4544 205b 636c 6965  LOCK ADDED [clie
001f3fd0: 6e74 3d61 6972 706f 7274 642c 2074 7970  nt=airportd, typ
001f3fe0: 653d 342c 2069 6e74 6572 6661 6365 3d65  e=4, interface=e
001f3ff0: 6e30 2c20 7072 696f 7269 7479 3d37 5d0a  n0, priority=7].
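
The same kind of dump can be produced without leaving Ruby. This is a minimal sketch, assuming the default xxd layout of 16 bytes per row; `hexdump` is a hypothetical helper name, not part of Ruby or xxd.

```ruby
BYTES_PER_ROW = 16

# Format a string's bytes roughly the way `xxd` does:
# offset, hex pairs grouped two bytes at a time, then printable ASCII.
def hexdump(data)
  data.bytes.each_slice(BYTES_PER_ROW).with_index.map do |row, index|
    hex  = row.each_slice(2).map { |pair| pair.map { |b| format("%02x", b) }.join }.join(" ")
    text = row.map { |b| (0x20..0x7e).cover?(b) ? b.chr : "." }.join
    # 39 columns = eight groups of four hex digits plus seven separating spaces
    format("%08x: %-39s  %s", index * BYTES_PER_ROW, hex, text)
  end
end

puts hexdump("\n\n\n")
puts hexdump("Hello World")
```

Non-printable bytes such as 0x0a show up as "." in the right-hand column, which is why three newlines read as "...".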
  
---

  
pry> File.open("/tmp/space.txt", "w:utf-8") {|file| file.print "\u0020" }

$ /tmp > xxd space.txt
00000000: 20

pry> " ".eql? "\u0020"
=> true
  
---

pry> File.open("/tmp/space.txt", "w:iso-8859-1") {|file| file.print "\u0020" }
$ /tmp > xxd space.txt
00000000: 20

pry> File.open("/tmp/space.txt", "w:us-ascii") {|file| file.print "\u0020" }
$ /tmp > xxd space.txt
00000000: 20
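
All three files hold the same byte because the space character sits in the ASCII range, which these encodings share. A small sketch that checks this in memory with String#encode; the é contrast case is my own addition, not from the slides.

```ruby
space = "\u0020"

# Encode the same character three ways and collect the resulting bytes.
bytes_by_encoding = %w[UTF-8 ISO-8859-1 US-ASCII].map do |name|
  [name, space.encode(name).bytes]
end.to_h

bytes_by_encoding.each { |name, bytes| puts "#{name}: #{bytes.inspect}" }

# Outside the ASCII range the encodings disagree (assumed example: U+00E9).
e_acute      = "\u00E9"
utf8_bytes   = e_acute.bytes                       # two bytes in UTF-8
latin1_bytes = e_acute.encode("ISO-8859-1").bytes  # one byte in ISO-8859-1

puts utf8_bytes.inspect
puts latin1_bytes.inspect
```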
  
---

Ruby string literal escapes come in these flavors:
\uNNNN Unicode codepoint U+NNNN
\xNN   Character with hexadecimal value NN
\NNN   Character with octal value NNN
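
A quick sketch of the three escape styles naming the same character. The choice of "H" (0x48, octal 110) is just an illustration.

```ruby
from_codepoint = "\u0048"  # Unicode codepoint U+0048
from_hex       = "\x48"    # hexadecimal byte value 48
from_octal     = "\110"    # octal byte value 110

puts from_codepoint                # H
puts from_codepoint == from_hex    # true
puts from_hex == from_octal        # true
puts from_octal.bytes.inspect      # [72]
```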
  


# So now we can do this:
pry> File.open("/tmp/space.txt", "w:us-ascii") {|file| file.print "\x20" }
$ /tmp > xxd space.txt
00000000: 20
  
---

pry> File.open("/tmp/hello.txt", "w:us-ascii") {|file| file.print "\x48\x65" }
$ /tmp > xxd hello.txt
00000000: 4865                                     He

And so we could continue with this until we got:
$ /tmp > xxd hello.txt
00000000: 4865 6c6c 6f20 576f 726c 64              Hello World
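
Rather than typing each escape by hand, the hex column of the dump can be packed straight into a string. A minimal sketch; Array#pack and String#unpack are core Ruby, the variable names are mine.

```ruby
# The hex column from the xxd output, one byte per entry.
hex_bytes = %w[48 65 6c 6c 6f 20 57 6f 72 6c 64]

# "C*" packs each Integer as one unsigned byte; the result is binary,
# so tag it as US-ASCII afterwards.
greeting = hex_bytes.map { |h| h.hex }.pack("C*").force_encoding("US-ASCII")
puts greeting

# And back again: "H*" unpacks the whole string as hex digits.
round_trip = greeting.unpack("H*").first
puts round_trip
```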

209.to_s(16)
=> "d1"
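
For moving between decimal, hex, and a raw byte, a few core conversions are enough. A small sketch:

```ruby
hex   = 209.to_s(16)  # Integer to hex string
value = "d1".hex      # hex string back to Integer
byte  = 209.chr       # one-byte string; tagged ASCII-8BIT because 209 > 127

puts hex
puts value
puts byte.bytes.inspect
puts byte.encoding == Encoding::ASCII_8BIT
```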
  
---

pry> File.open("/tmp/rus.txt", "w:koi8-r") {|file| file.print "\xD1" }
Encoding::InvalidByteSequenceError: incomplete "\xD1" on UTF-8

# We need to tell Ruby that this byte is KOI8-R, not UTF-8.
pry> File.open("/tmp/rus.txt", "w:koi8-r") {|f| f.print "\xD1".force_encoding("koi8-r") }
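
The same thing can be seen without touching a file. A sketch, assuming the source file is UTF-8 (Ruby's default): the lone byte 0xD1 is invalid as UTF-8, valid once tagged as KOI8-R, and transcodes to the Cyrillic letter U+044F.

```ruby
raw = "\xD1"                             # tagged UTF-8 by default
puts raw.valid_encoding?                 # false: 0xD1 starts a sequence that never finishes

koi = raw.dup.force_encoding("KOI8-R")   # same byte, new label
puts koi.valid_encoding?                 # true

utf8 = koi.encode("UTF-8")               # now actually convert the bytes
puts utf8.bytes.map { |b| b.to_s(16) }.inspect
puts utf8 == "\u044F"
```

force_encoding only changes the label; encode changes the bytes.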
  
---


class PantsController < ApplicationController
  skip_before_filter :verify_authenticity_token, :only => [:create]
  respond_to :json

  def index
    respond_to do |format|
      format.json { render json: Pant.all }
    end
  end

  def create
    # Report the encoding Ruby assigned to the raw request body.
    render text: request.body.read.encoding.to_s
    # => ASCII-8BIT
  end
end
  
---

 $ curl -X POST -H 'Content-Type:application/json' \
     -d '{"title":"Nice Pants", "size":38}' \
     http://localhost:3000/pants
   ASCII-8BIT

 $ curl -X POST -H "Content-Type:application/json;charset=UTF-8" \
     -H "Accept-Charset:UTF-8" -d '{"title":"Nice Pants", "size":38}' \
     http://localhost:3000/pants
   ASCII-8BIT
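
Either way the body arrives tagged as binary, so the charset header alone does not change what Ruby sees. A minimal sketch of re-tagging such a body by hand, assuming the client really did send UTF-8. The StringIO stands in for request.body, and this is not necessarily what Rails does internally.

```ruby
require "stringio"
require "json"

# Stand-in for the raw request body: a binary-tagged stream of bytes.
body_io  = StringIO.new('{"title":"Nice Pants", "size":38}'.b)
raw_body = body_io.read
puts raw_body.encoding == Encoding::ASCII_8BIT

# Re-label the bytes as UTF-8 and check that the label is believable
# before handing the text to a parser.
text = raw_body.dup.force_encoding(Encoding::UTF_8)
raise ArgumentError, "body is not valid UTF-8" unless text.valid_encoding?

params = JSON.parse(text)
puts params["title"]
```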
---

Data Corruption


# Let’s set up the failure scenario.
wizard = "마법사"
wizard.bytes
=> [235, 167, 136, 235, 178, 149, 236, 130, 172]

# Write: the UTF-8 bytes are mislabeled as ISO-8859-1, so each byte gets transcoded to UTF-8 again.
File.open("/tmp/mysql-backup.sql", "w:UTF-8") {|file| file.puts wizard.force_encoding('iso-8859-1') }
# Read: the file is UTF-8, but we claim it is ISO-8859-1.
import = File.open("/tmp/mysql-backup.sql", encoding: Encoding::ISO_8859_1).readlines.first
=> "\xC3\xAB\xC2\xA7\xC2\x88\xC3\xAB\xC2\xB2\xC2\x95\xC3\xAC\xC2\x82\xC2\xAC\n"
import.force_encoding('utf-8')
# nope: still mojibake, not "마법사"

import.force_encoding('utf-8').  # undo the wrong file read
  encode('iso-8859-1').          # undo the file write
  force_encoding('utf-8')        # undo the force in the file.puts block
=> "마법사\n"
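
The corruption and its repair can also be reproduced entirely in memory, which makes the byte counts easy to inspect. A sketch using \u escapes for the same three Hangul characters (U+B9C8, U+BC95, U+C0AC); the variable names are mine.

```ruby
wizard = "\uB9C8\uBC95\uC0AC"   # 3 characters, 9 bytes in UTF-8

# Corrupt: mislabel the UTF-8 bytes as ISO-8859-1, then transcode to UTF-8.
# Every original byte is >= 0x80, so each one becomes a two-byte sequence.
mangled = wizard.dup.force_encoding("ISO-8859-1").encode("UTF-8")
puts mangled.bytesize   # 18

# Repair: transcode back to ISO-8859-1 to recover the original 9 bytes,
# then put the correct UTF-8 label on them.
repaired = mangled.encode("ISO-8859-1").force_encoding("UTF-8")
puts repaired.bytesize  # 9
puts repaired == wizard # true
```

The repair only works because every step of the damage was lossless; if a step had replaced bytes with "?", there would be nothing to undo.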
---

UTF-8 No Silver Bullet


require 'base64'
encoded = Base64.encode64 'bacon is great'
=> "YmFjb24gaXMgZ3JlYXQ=\n"
decoded = Base64.decode64(encoded)
=> "bacon is great"
# Yay for ascii?

# Wait a minute ...
encoded = Base64.encode64 '마법사'
=> "66eI67KV7IKs\n"
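decoded = Base64.decode64(encoded)
=> "\xEB\xA7\x88\xEB\xB2\x95\xEC\x82\xAC"
decoded.force_encoding('utf-8')
=> "마법사"
# The bytes didn't change, so force_encoding is correct here
'마법사'.bytes
=> [235, 167, 136, 235, 178, 149, 236, 130, 172]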
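
Base64 carries bytes, not an encoding, so the label is lost on the way through. A sketch of the full round trip, again with \u escapes for the three Hangul characters; strict_encode64 is used here only to avoid the trailing newline.

```ruby
require "base64"

wizard  = "\uB9C8\uBC95\uC0AC"
encoded = Base64.strict_encode64(wizard)
puts encoded

decoded = Base64.decode64(encoded)
puts decoded.encoding == Encoding::ASCII_8BIT  # the label is gone
puts decoded.bytes == wizard.bytes             # the bytes are intact
puts decoded == wizard                         # false: same bytes, different labels

# The sender has to tell us the encoding out of band; here we know it was UTF-8.
restored = decoded.dup.force_encoding("UTF-8")
puts restored == wizard
```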
---

The End

Daniel Miessler's blog post "Encoding in Ruby and Everywhere"
---