The programmable programmer

20100513

A Pattern for Non-Local Returns in Javascript

A week or two ago, someone on reddit was complaining about JavaScript's lack of non-local returns, claiming the language was poor because it couldn't be extended to include them.

So, for fun, I wrote a quick implementation of non-local returns in JavaScript. Probably inefficient as hell, but functional nonetheless.

// javascript non-local return
<html><body><script language="javascript">

var WithNonLocalReturn = function( block ){
     var ex_t = function(){
         var hidden = null ;

         this.set = function( retval ){
             hidden = retval ;
         };

         this.get = function( retval ){
             return hidden ;
         };
     };

     var ex = new ex_t() ;

     var valid = true ;

     var leave = function( retval ){
         if( valid ){
             ex.set( retval ) ;
             throw ex ;
         } else {
             throw ( ""
               + "It is invalid to call the non-local return "
               + "outside the stack of the creating WithNonLocalReturn" 
               ) ;
         }
     };

     try {
         return block( leave ) ;
     } catch( e ) {
         if( e instanceof ex_t ){
             return e.get() ;
         } else {
             throw e ;
         }
     } finally {
         valid = false ;
     }
};

var Each = function( iterable , fn ){
    for( var k in iterable ){
        fn( iterable[ k ] ) ;
    }
};

alert( WithNonLocalReturn( function( nonLocallyReturn ){
            Each( [ 1, 2, 3, 4, 5, 6, 7, 8, 9] ,
                  function( i ){
                      alert( 'Testing : ' + i ) ;
                      if( i == 6 ){
                          nonLocallyReturn( 'Success : ' + i ) ;
                      }
                  });

            return 'Failure' ;
        }));

</script>

For fun, they're nestable, and since you name the value-returning function, you can easily and readably select which nesting level to return to. Or, horrifically, pass a non-local return down the stack as a sort of continuation through an otherwise rational bit of logic.

alert( WithNonLocalReturn( function( dontDoThis ){
            return 'lolwut'.replace( /lol/g , dontDoThis ) ;
        }));

One should probably avoid doing that.
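
The saner use, picking which nesting level to return to, might look something like the sketch below. It restates WithNonLocalReturn in compact form so it runs on its own, and it swaps the instanceof check for an identity check on a per-call token object, which is a different technique from the listing above. The findInGrid function and its sorted-rows assumption are made up purely for illustration.

```javascript
// Compact restatement of WithNonLocalReturn so this sketch is self-contained.
// Uses an identity check on a per-call token instead of instanceof.
function WithNonLocalReturn(block) {
    var valid = true;
    var token = {};                       // unique marker for this invocation
    var leave = function (retval) {
        if (!valid) {
            throw new Error("non-local return called outside its WithNonLocalReturn");
        }
        token.value = retval;
        throw token;
    };
    try {
        return block(leave);
    } catch (e) {
        if (e === token) { return e.value; }
        throw e;                          // belongs to an outer level, pass it on
    } finally {
        valid = false;
    }
}

// Hypothetical example: search a grid whose rows are sorted ascending.
// returnFromSearch leaves the whole search, nextRow leaves only the current row.
function findInGrid(grid, wanted) {
    return WithNonLocalReturn(function (returnFromSearch) {
        grid.forEach(function (row, y) {
            WithNonLocalReturn(function (nextRow) {
                row.forEach(function (cell, x) {
                    if (cell === wanted) { returnFromSearch([x, y]); }
                    if (cell > wanted) { nextRow(); }   // rest of row is too big
                });
            });
        });
        return null;                      // fell through: nothing found
    });
}
```

Because each call creates its own token, the inner level rethrows anything that isn't its own, so a return aimed at the outer level passes straight through the inner one.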

20100308

Go Language is lovely

I've been playing with Google's Go language for the last week or so, and I have to say it's a lovely language. I generally grab the mailing list for anything new I get into, and today someone was looking for a way to wait for a set of goroutines to complete before continuing on in the main thread. Seeing as how common this sort of thing is, I decided to write up a quick library for it.

To use the library, first call dispatch.New() to acquire a new dispatch mechanism. Then attach any number ( technically up to uint ) of processes to it via dispatchInstance.Go( func(){ other_process_here() } ). Any number of other processes ( still technically uint ) can wait on the runners to finish before executing.

Anyway, here it is, and hope it's useful for someone.

In dispatch.go :

package dispatch

import "sync"

type Manager interface {
        Go( func() )
        Wait()
}

type manager struct {
        lock sync.Mutex
        running uint
        waiting uint
        wakeup chan bool
}

func New() *manager {
        m := new(manager)
        m.wakeup = make(chan bool)
        return m
}

func (m *manager) Go( fn func() ) {
        m.lock.Lock()
        m.running++
        m.lock.Unlock()

        go func(){
                fn()

                m.lock.Lock()
                m.running--
                if (m.running == 0) && (m.waiting > 0) {
                        oc := m.wakeup
                        nc := make(chan bool)
                        i := m.waiting
                        go func(){
                                for ; i > 0 ; i-- {
                                        oc <- true
                                }
                        }()
                        m.wakeup = nc
                        m.waiting = 0
                }
                m.lock.Unlock()
        }()
}

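func (m *manager) Wait() {
        var wakeup chan bool

        m.lock.Lock()
        if m.running > 0 {
                m.waiting++
                // grab the channel while holding the lock ; Go() swaps
                // m.wakeup for a fresh channel once the last runner ends,
                // and the wakeup is sent on the old one
                wakeup = m.wakeup
        }
        m.lock.Unlock()

        if wakeup != nil {
                <- wakeup
        }
}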

And some example usage in main.go :

package main

import "fmt"
import "rand"
import "time"

import "./dispatch"

func main () {
        w := dispatch.New()

        for i := 0 ; i < 100 ; i++ {
                c := i
                w.Go( func(){
                        time.Sleep( rand.Int63n( 1000000000 ) )
                        fmt.Print( c , "\n" )
                        w.Go( func(){
                                time.Sleep( rand.Int63n( 1000000000 ) )
                                fmt.Print( c , " - second effect\n")
                        })
                })
        }

        fmt.Print( "All Launched\n" )

        w2 := dispatch.New()

        for i := 0 ; i < 5 ; i++ {
                c := i
                w2.Go( func(){
                        w.Wait()
                        time.Sleep( rand.Int63n( 1000000000 ) )
                        fmt.Print("[ " , c , "] This should happen after the first set\n")
                })
        }

        fmt.Print( "Second set all launched\n" )

        w.Wait()

        for i := 10 ; i < 15 ; i++ {
                c := i
                w.Go( func(){
                        time.Sleep( rand.Int63n( 1000000000 ) )
                        fmt.Print("[ " , c , "] reusing first queue\n")
                })
        }

        fmt.Print( "Main thread past first set\n" )

        w2.Wait()

        fmt.Print( "Main thread past second set\n" )

        w.Wait()

        fmt.Print( "Main thread past reuse of first queue\n" )

}

I really like this language. It feels like somebody took many of the best facets of C, JavaScript and Erlang and tucked them into a single package. The static duck-typing is beautiful.

20091220

I'm pretty sure it was a joke.

HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA

      <?php
      
      // This is a source file, it ends in ?> ok?
      
      print "ha ha ha HAHAAHAHAHAHAHAHA" ;

    $ php test.php 
      ok?
      
      print "ha ha ha HAHAAHAHAHAHAHAHA" ;

HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA HA

This is obviously a practical joke left in the parser to amuse those of us who dare comment out a line containing a string that starts an XML document.

    //            echo '<?xml version="1.0" encoding="UTF-8" ?>'

Oh PHP, you wonderful bastard.

20090713

Garbage to UTF-8

I had a problem last year with a legacy database filled with a mix of UTF-8, Windows cp-1252, extended and regular ASCII. I needed a way to clean up the data without losing any of the information in it.

Being familiar with regular expressions, I looked up how UTF-8 was formatted, made a couple of assumptions about the malformations I would find therein, figured out which code points were the ones I should replace and the following function was born.

Or something like that. Anyway here is the code :

function garbage_to_utf8_character_replacement_function( $matches ) {
    // converts binary 10000000 -> 11111111 that do not appear
    // as part of a unicode character into a unicode character
    // under the assumption that a portion of them are windows
    // cp-1252 characters, and the rest are extended ascii
    // characters
    $o = ord( $matches[ 0 ] ) ;
    switch( $o ) {
        // check for windows code page 1252 characters
        case 130 : return "\xe2\x80\x9a" ; // Single Low-9 Quotation Mark
        case 131 : return "\xc6\x92" ; // Latin Small Letter F With Hook
        case 132 : return "\xe2\x80\x9e" ; // Double Low-9 Quotation Mark
        case 133 : return "\xe2\x80\xa6" ; // Horizontal Ellipsis
        case 134 : return "\xe2\x80\xa0" ; // Dagger
        case 135 : return "\xe2\x80\xa1" ; // Double Dagger
        case 136 : return "\xcb\x86" ; // Modifier Letter Circumflex Accent
        case 137 : return "\xe2\x80\xb0" ; // Per Mille Sign
        case 138 : return "\xc5\xa0" ; // Latin Capital Letter S With Caron
        case 139 : return "\xe2\x80\xb9" ; // Single Left-Pointing Angle Quotation Mark
        case 140 : return "\xc5\x92" ; // Latin Capital Ligature OE
        //gap
        case 145 : return "\xe2\x80\x98" ; // Left Single Quotation Mark
        case 146 : return "\xe2\x80\x99" ; // Right Single Quotation Mark
        case 147 : return "\xe2\x80\x9c" ; // Left Double Quotation Mark
        case 148 : return "\xe2\x80\x9d" ; // Right Double Quotation Mark
        case 149 : return "\xe2\x80\xa2" ; // Bullet
        case 150 : return "\xe2\x80\x93" ; // En Dash
        case 151 : return "\xe2\x80\x94" ; // Em Dash
        case 152 : return "\xcb\x9c" ; // Small Tilde
        case 153 : return "\xe2\x84\xa2" ; // Trade Mark Sign
        case 154 : return "\xc5\xa1" ; // Latin Small Letter S With Caron
        case 155 : return "\xe2\x80\xba" ; // Single Right-Pointing Angle Quotation Mark
        case 156 : return "\xc5\x93" ; // Latin Small Ligature OE
        //gap
        case 159 : return "\xc5\xb8" ; // Latin Capital Letter Y With Diaeresis
        default :
            // anything else is treated as extended ascii and encoded
            // as a two byte utf-8 sequence
            return chr( 192 | ( 3 & ( $o >> 6 ) ) ) . chr( $o & 191 ) ;
    }
}

function garbage_to_utf8( $text ) {
    // locate all bytes with 0x80 set that are not a proper
    // component of a unicode character. pass them to
    // garbage_to_utf8_character_replacement_function
    // to convert them to unicode under the assumptions they
    // are either windows characters or extended ascii
    $bad_replace = ''
        . '/('
        . '('
        // find 1111110x not followed by 5 10xxxxxx
        . '[\\xFC-\\xFD](?![\\x80-\\xBF][\\x80-\\xBF][\\x80-\\xBF][\\x80-\\xBF][\\x80-\\xBF])'
        . '|'
        // find 111110xx not followed by 4 10xxxxxx
        . '[\\xF8-\\xFB](?![\\x80-\\xBF][\\x80-\\xBF][\\x80-\\xBF][\\x80-\\xBF])'
        . '|'
        // find 11110xxx not followed by 3 10xxxxxx
        . '[\\xF0-\\xF7](?![\\x80-\\xBF][\\x80-\\xBF][\\x80-\\xBF])'
        . '|'
        // find 1110xxxx not followed by 2 10xxxxxx
        . '[\\xE0-\\xEF](?![\\x80-\\xBF][\\x80-\\xBF])'
        . '|'
        // find 110xxxxx not followed by 1 10xxxxxx
        . '[\\xC0-\\xDF](?![\\x80-\\xBF])'
        . '|'
        // find 10xxxxxx not part of code point
        . '(?<!'
        . '[\\xFC-\\xFD][\\x80-\\xBF][\\x80-\\xBF][\\x80-\\xBF][\\x80-\\xBF][\\x80-\\xBF]'
        . '|'
        . '[\\xFC-\\xFD][\\x80-\\xBF][\\x80-\\xBF][\\x80-\\xBF][\\x80-\\xBF]'
        . '|'
        . '[\\xFC-\\xFD][\\x80-\\xBF][\\x80-\\xBF][\\x80-\\xBF]'
        . '|'
        . '[\\xFC-\\xFD][\\x80-\\xBF][\\x80-\\xBF]'
        . '|'
        . '[\\xFC-\\xFD][\\x80-\\xBF]'
        . '|'
        . '[\\xFC-\\xFD]'
        . '|'
        . '[\\xF8-\\xFB][\\x80-\\xBF][\\x80-\\xBF][\\x80-\\xBF]'
        . '|'
        . '[\\xF8-\\xFB][\\x80-\\xBF][\\x80-\\xBF]'
        . '|'
        . '[\\xF8-\\xFB][\\x80-\\xBF]'
        . '|'
        . '[\\xF8-\\xFB]'
        . '|'
        . '[\\xF0-\\xF7][\\x80-\\xBF][\\x80-\\xBF]'
        . '|'
        . '[\\xF0-\\xF7][\\x80-\\xBF]'
        . '|'
        . '[\\xF0-\\xF7]'
        . '|'
        . '[\\xE0-\\xEF][\\x80-\\xBF]'
        . '|'
        . '[\\xE0-\\xEF]'
        . '|'
        . '[\\xC0-\\xDF]'
        . ')'
        . '[\\x80-\\xBF]'
        . ')'
        . ')/' ;

    return preg_replace_callback(
        $bad_replace ,
        'garbage_to_utf8_character_replacement_function' ,
        $text ) ;
}
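
The bit twiddling in the default branch is easy to misread, so here is the same arithmetic as a rough JavaScript sketch. The function name is made up for illustration and is not part of the PHP above. It assumes the stray byte is a Latin-1 / extended ASCII character in the range 0x80 to 0xFF and produces the two bytes of its UTF-8 encoding.

```javascript
// Hypothetical helper mirroring the PHP default branch: map one stray byte
// (0x80-0xFF), read as Latin-1, to its two-byte UTF-8 encoding.
function latin1ByteToUtf8Bytes(o) {
    var lead = 0xC0 | ((o >> 6) & 0x03);  // 110000xx : the top two bits of the byte
    var cont = o & 0xBF;                  // 10xxxxxx : clear bit 6, keep the low six bits
    return [lead, cont];
}

// 0xE9 is e-acute in Latin-1 ; its UTF-8 encoding is the bytes C3 A9.
var encoded = latin1ByteToUtf8Bytes(0xE9);
console.log(encoded.map(function (b) { return b.toString(16); }).join(" "));
```

Since the input always has bit 7 set, masking with 0xBF ( 191 ) both clears bit 6 and leaves the 10 prefix a continuation byte needs, which is why the PHP gets away with a single AND.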

A quick search shows I'm not the only one to have solved this using regular expressions.

FixLatin

Too bad he didn't post sooner; it would've saved me having to figure out the encoding transformation on my own. Ah well, at least I'm not the only one.

20090514

Human Readable Sort

I was using sort the other day at work and got annoyed that it wouldn't sort by human-readable units. I wrote a patch that night, signed up for the coreutils mailing list the next day and emailed it in.

I worked with one of the developers trading the patch back and forth the next two nights.

Now, `du -hs * | sort -h` is ready to be added to the coreutils. My first FSF contribution.

http://www.nabble.com/Human-readable-sort-td23223205.html

20080306

Hidden backup files with emacs

Ever since I started using emacs the directories full of `whatever~' files have annoyed me. No more! I put this into my .emacs ( along with a (require 'cl) ) and voila, backups for `whatever' end up in `.whatever~' in the same directory. .emacs? `..emacs~'. Thus when I run ls my directory listing is clean.

;; hidden backup files - i hate seeing them in listings ...
;; prefix with a dot as well as postfix with a tilde
(defun custom-make-backup-file-name ( file )
  (let ((d (file-name-directory file))
        (f (file-name-nondirectory file)))
    (concat d "." f "~")))
(setq make-backup-file-name-function 'custom-make-backup-file-name)

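(defun backup-file-name-p ( file )
  ;; string-to-list yields characters, so compare against ?. and ?~ ;
  ;; `last' returns the final cons cell, hence the car
  (let ((letters (string-to-list (file-name-nondirectory file))))
    (and (> (length letters) 2)
         (equal ?. (first letters))
         (equal ?~ (car (last letters))))))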

(defun file-name-sans-versions ( file )
  (if (not (backup-file-name-p file))
      file
    (let ((d (file-name-directory file))
          (f (file-name-nondirectory file)))
      (let ((letters (string-to-list f)))
        (concat d (subseq letters 1 (- (length f) 1)))))))

While I'm busy dumping from my .emacs file, I like the truncated lines when I use ( C-x 3 ) to divide the display vertically except when I'm running a shell. Then I want to see everything.


;; do not truncate lines in shell
(add-hook 'shell-mode-hook (lambda () (progn
                                        (make-local-variable 'truncate-partial-width-windows)
                                        (setq truncate-partial-width-windows nil))))

20080225

Batch-fu

I don't know how many avid Windows batchers are out there ( I was one some years ago ), but perhaps a few of you can use / be horrified by this little helper. Ever get annoyed because you can't easily reuse functions between scripts since they'll stomp all over each other's environment variables? Probably not. Just in case, here's how to create lexically scoped batch file functions.

::#
:ServerName_Service
setlocal

:: blah blah do anything to namespace blah

::now pass the full name / status back out of the setlocal
for /F "usebackq tokens=1,2 delims=~" %%a in (`echo.%ServiceName%~"%Status%"`) do (
  endlocal
  set CACHE~%%a=%%b
)
goto :eof

Tada!

Even better if you structure them such that the first argument is the name of the variable to receive the value from the function call and then write your exit similar to this :

::now pass the full name / status back out of the setlocal
for /F "usebackq tokens=1,2 delims=~" %%a in (`echo.%ServiceName%~"%Status%"`) do (
  endlocal
  set %1=%%b
)

BTW, I don't really recommend writing large programs in batch, but if draconian network policies make it all you've got, good luck.

Remember that batch processing is two-phase: first variable expansion happens, then execution occurs. Lines starting with `:' are treated as labels. Lines that start with `::' are label errors and are dropped ( making for better comments than rem, which executes and freaks out all to hell if special characters are in its argument list / comment area ). Lines starting with `%en_var%', where the environment variable `en_var' has the value `::', expand to `::' first and so are likewise label errors and dropped. ( This can be used to great effect. I can't take credit for this hack though. I found it on Rob van der Woude's scripting site. )