2014-01-07

How many lines in a file?

It's an easy question (though not a well-defined one; see the note on wc below), but still quite fun.

Tested under the Bash shell on Ubuntu 12.04 32-bit.

(1) wc

cat sample.txt | wc -l

or

wc sample.txt | awk '{print $1}'

awk is used to extract the 1st field.

Note: "wc -l" only counts the newline characters!  A last line without a trailing newline is not counted.  For example:

echo -ne "ab\ncd" | wc -l

would generate 1 (line), not 2.
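
The difference between "counting newlines" and "counting lines" can be sketched in Python 3.  This is not one of the tested methods, just a minimal illustration; the function names are made up here.

```python
def count_newlines(data: bytes) -> int:
    """Count newline bytes, which is what `wc -l` reports."""
    return data.count(b"\n")


def count_lines(data: bytes) -> int:
    """Count lines, treating a non-empty unterminated tail as one more line."""
    n = data.count(b"\n")
    if data and not data.endswith(b"\n"):
        n += 1
    return n


if __name__ == "__main__":
    sample = b"ab\ncd"              # same bytes as: echo -ne "ab\ncd"
    print(count_newlines(sample))   # 1
    print(count_lines(sample))      # 2
```

Which of the two numbers is "the" line count depends on the definition chosen, which is why the methods below do not always agree.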




(2) cat

cat sample.txt | cat -n | tail -1 | awk '{print $1}'

or just

cat -n sample.txt | tail -n1 | awk '{print $1}'

Option "-n" prefixes each line with its line number.



(3) sed

cat sample.txt | sed -n "$ ="

Option "-n" of "sed" is used to suppress (the default) print action of each line.

"$" will match the last line.

"=" will print the line number.


(4) awk

cat sample.txt | awk '{c++} END {print c}'

"{c++}" is used to increment the counter (c) each line.

"END {print c}" is used to print the report.

Note: In AWK, "$" is not used for accessing variables, but for "fields" ($1 for the first field, etc.  $0 for whole line).

or

cat sample.txt | awk 'END {print NR}'

"NR" (Number of Records) is a built-in variable of AWK.


(5) perl

cat sample.txt | perl -lne '$c++; END {print $c}'

Option "-n" makes perl loop over each input line, just as AWK does.

Option "-l" makes perl append "\n" to each print automatically.

Option "-e" makes perl execute the program given on the command line.


Note: In perl, "$" is used for accessing variables.  "$F[]" accesses fields (0-based index) when the "-a" (autosplit) option is given.

or

cat sample.txt | perl -ne '$c++; END {print "$c\n"}'

Without "-l" option, we have to print "\n" by ourselves.


(6) grep

cat sample.txt | grep -c ^

"-c" counts the matching lines.

"^" matches the beginning of a line.  "$" also works (it matches the end of a line).

(7) bash

cat sample.txt | (while read; do ((r++)); done; echo $r)

Note: The "(...)" is necessary.  Otherwise the while loop runs in its own subshell (because of the pipe), and the counter (r) is already gone when "echo $r" runs.

or

cat sample.txt | (mapfile rs; echo ${#rs[@]})

mapfile (or readarray) will read the entire file into an array.

"${#rs[@]}" will return the number of elements of an array (rs here).


(8) PHP

cat sample.txt | php -r '$c=0;while(fgets(STDIN)){$c++;} echo "$c\n";'

or

cat sample.txt | php -r 'for($c=0;fgets(STDIN);$c++); echo "$c\n";'


(9) C

cat sample.txt | (echo 'main() {int c,r=0; while((c=getchar())!=-1){if(c==10)r++;} printf("%d\n",r);}' | gcc -w -x c -; ./a.out)

Use "echo" to generate a little C snippet which just counts the newline characters (just as "wc -l").  The conflict of quotation marks between Bash and C is quite annoying.
"-1" is EOF
"10" is '\n'

"-w" of gcc would disable warnings.
"-x c" informs gcc the language of the source code is C.

(10) common lisp

cat sample.txt | sbcl --noinform --eval '(let ((nr 0)) (loop for r = (read-line *standard-input* nil) while r do (incf nr)) (prin1 nr))'

"--noinform" of sbcl disables welcome greetings.

The "nil" in "read-line" would disable eof error exception.


(11) tcl

cat sample.txt | (echo -e 'set nr 0\n while {[gets stdin line] >= 0} {incr nr}\nputs $nr' > tmp.tcl; tclsh tmp.tcl)

Note 1: "set nr 0" cannot be omitted, otherwise "puts $nr" would complain "no such variable" when the input file is empty.

Note 2: There must be at least one space between the test block and body block of "while".


(12) tr & uniq

cat sample.txt | tr -cd '\n' | uniq -c | awk '{print $1}'


"-c" of tr is to complement the selected character (i.e. any character except "\n").

"-d" of tr deletes the selected characters.

So we use tr to delete any character except "\n".

"-c" of uniq is used to print the count of repeated lines.

So we use uniq to count the number of newlines.

"awk" here is used to extract the 1st field.

Note 1: This method only counts the newlines (just as "wc -l").

Note 2: Once the input is empty, there is no output at all (because tr, uniq and awk would do nothing.)


(13) m4

cat sample.txt | echo "len(patsubst(\`$(cat)',\`.+'))"  | m4

Use "patsubst()" of m4 to remove all characters except newlines, just as "tr -cd '\n'".

Use "len()" of m4 to count the number of newlines, just as "uniq -c".

Note 1: Because $(cat) (command substitution) deletes all trailing newlines, trailing blank lines are not counted, and the last non-blank line is not counted either, whether or not it ends with a newline.

Note 2: The left quotation mark of m4 ("`", backtick) must be escaped ("\`"), otherwise it would conflict with bash.
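
The effect of the command substitution can be mimicked in Python 3.  This is only a sketch of the counting behaviour, not of m4 itself, and the function name is made up here: strip all trailing newlines first, then count the newlines that remain.

```python
def count_like_m4_pipeline(text: str) -> int:
    """Mimic $(cat) followed by counting the remaining newlines."""
    stripped = text.rstrip("\n")   # command substitution drops all trailing newlines
    return stripped.count("\n")    # what len(patsubst(...)) then counts


if __name__ == "__main__":
    print(count_like_m4_pipeline("a\nb\n"))      # 1
    print(count_like_m4_pipeline("a\nb\n\n\n"))  # 1
```

So a file with N non-trailing lines yields N - 1 with this method.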


(14) nl (number lines)

cat sample.txt | nl -ba | tail -1 | awk '{print $1}'

Option "-ba" means to number Body All.  The default behavior of nl is to number only nonempty lines.

"nl -ba" works just like "cat -n".

nl has versatile numbering options.  Quite interesting.


(15) Ruby

cat sample.txt | ruby -e 'nr=0; while gets; nr+=1; end; puts nr'

Option "-e" makes ruby execute the code given on the command line.

This codelet is quite clear and easy.  In fact, Ruby on the command line can also act like AWK or Perl.


(16) Python

cat sample.txt | python -c $'import sys\nc=0\nfor r in sys.stdin: c+=1\nprint c'

Option "-c" makes Python execute the given commands directly.

Note 1: "Suites" in Python (compound statements, such as if, while, for, def, class) can not be terminated with ";".  So we have to insert newlines for the "for" statement.

Note 2: Bash "ANSI-C Quoting" ($'...') is used here to insert newlines.
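
A generator expression is an expression, not a suite, so it avoids the problem in Note 1.  The following is a minimal Python 3 sketch (the test above used the Python 2 "print c" syntax; the function name here is made up for illustration).

```python
import io


def count_lines(stream) -> int:
    """Count lines by consuming any iterable of lines (e.g. sys.stdin)."""
    return sum(1 for _ in stream)


if __name__ == "__main__":
    # io.StringIO stands in for sys.stdin here; an unterminated last line is counted.
    print(count_lines(io.StringIO("ab\ncd\nef")))
```

Assuming a python3 binary is installed, the same idea fits on one line without ANSI-C quoting:

cat sample.txt | python3 -c 'import sys; print(sum(1 for _ in sys.stdin))'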


(17) Haskell

cat sample.txt | ghc -e 'do {rs <- getContents; print $ length $ lines rs}'

Option "-e" makes ghc evaluate the expression.

"getContents" to get the input as a string, "lines" to split a string into a list of lines by newline characters, and "length" to count the list elements.


(18) JavaScript

Using Node.js:

cat sample.txt | js -e 'console.log(require("fs").readFileSync("/dev/stdin").toString().split("\n").length)'

Note: The answer is always the "newline count + 1" because it uses "\n" to split.
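
The off-by-one of the split approach, and one way to compensate for it, can be sketched in Python 3 (str.split behaves like the JavaScript split for this purpose; the function names are made up here).

```python
def count_by_split(text: str) -> int:
    """Mimic the snippet above: number of pieces after splitting on newline."""
    return len(text.split("\n"))


def count_lines(text: str) -> int:
    """Drop the empty last piece left by a trailing newline (or by empty input)."""
    parts = text.split("\n")
    if parts[-1] == "":
        parts.pop()
    return len(parts)


if __name__ == "__main__":
    print(count_by_split("a\nb\n"))  # 3 (newline count + 1)
    print(count_lines("a\nb\n"))     # 2
```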


(19) BASIC

Using sdlBasic:

cat sample.txt | (cat >tmp.txt; echo 'open "tmp.txt" for input as #9: nr=0: while not eof(9): file input #9,r$: nr=nr+1: wend: print nr' > tmp.bas && sdlBrt tmp.bas)

"sdlBrt" is the runtime of "sdlBasic".

Note: sdlBasic has trouble reading from "/dev/stdin" directly.


(20) Java

cat sample.txt | (echo 'import java.io.*; public class Tmp { public static void main(String[] args) throws Exception { int nr = 0; BufferedReader br = new BufferedReader(new InputStreamReader(System.in)); while (br.readLine() != null) nr++; System.out.println(nr); }}' > Tmp.java; javac Tmp.java && java Tmp)

Quite clumsy.


(21) C++

cat sample.txt | ((echo '#include <iostream>'; echo '#include <string>'; echo 'main() {int nr = 0; std::string r; while (getline(std::cin, r)) nr++; std::cout << nr << "\n";}') | g++ -x c++ - && ./a.out)

Option "-x c++" informs g++ the source code is C++.

Note: The first 2 "echo"s are used to generate "#include ..." with newlines.


(22) Pascal

Using fpc (Free Pascal):

cat sample.txt | (echo 'var r:string; nr:integer; begin nr:= 0; while not eof() do begin readln(r); nr:= nr+1 end; writeln(nr) end.' > tmp.pas; fpc -Fe/dev/null tmp.pas && ./tmp)

Option "-Fe/dev/null" informs fpc not to display messages.  (But there is still one message coming from "ld" which is called by fpc.)


(23) D

Using gdc (GNU D Compiler, D Front End for GCC):

cat sample.txt | (echo 'import std.stdio; void main() {int nr=0; while (stdin.readln() !is null) nr++; writeln(nr);}' > tmp.d; gdc tmp.d && ./a.out)


(24) Logo

Using UCBLogo (aka Berkeley Logo):

cat sample.txt | (echo 'make "nr 0 while [not eof?] [make "r readrawline make "nr :nr + 1] show :nr bye' > tmp.logo; logo tmp.logo)


(25) PostScript

Using gs (Ghostscript):

cat sample.txt | (echo '/r 1024 string def  /si (%stdin) (r) file def  0  {si r readline {pop 1 add} {length 0 ne {1 add} if exit} ifelse} loop  =  quit' > tmp.ps; gs -q tmp.ps)

Option "-q" informs gs to run quietly.

Note 1: The counter is pushed on stack with initial value 0, add one (1 add) when necessary, and print out (=) finally.

Note 2: The "readline" would return with an EOF mark, different from other languages slightly.  So we have to check whether there are data at the last line (length 0 ne).

Note 3: "quit" to end gs directly (without entering the interactive mode).

Note 4: The buffer length is hard coded with 1024 bytes long, not correct in logic, but working in practice.


(26) Assembly

Using AT&T 386 assembler syntax to count newlines.
as -o nlcnt.o nlcnt.s
ld -s -o nlcnt nlcnt.o
cat sample.txt | ./nlcnt
"-s" of ld is used to strip debug information.  The final executable file is 432 bytes.
Using gdb to debug.  Some useful commands:
gdb nlcnt
break "some label"
info registers
step
x (some address)
display /3i $pc


    .text
    .global    _start
_start:
    xorl    %esi,%esi
rdloop:
    movl    $1,%edx
    movl    $r,%ecx
    movl    $0,%ebx        #stdin
    movl    $3,%eax         #sys_read
    int     $0x80
    testl    %eax,%eax
    jz    eof
    movb    r,%al
    cmpb    $0xa,%al
    jne    rdloop
    incl    %esi
    jmp    rdloop
eof:
   
    movl    %esi,%eax
itoa:
    xorl    %edi,%edi
    movl    $10,%ebx
itoa0:
    cdq
    idiv    %ebx
    addb    $0x30,%dl    #'0'+%dl
    movb    %dl,buf(%edi)
    incl    %edi
    testl    %eax,%eax
    jnz    itoa0

    movl    %edi,%edx    #strlen
strrev:
    movl    $buf,%esi
    lea    buf(%edi),%edi
nxtpos:
    decl    %edi
    cmpl    %esi,%edi
    jle    endrev
    movb    (%edi),%al
    movb    (%esi),%bl
    movb    %al,(%esi)
    movb    %bl,(%edi)
    incl    %esi
    jmp    nxtpos
endrev:

    movb    $0x0a,buf(%edx)    #append '\n'
    incl    %edx
    movl    $buf,%ecx
    movl    $1,%ebx        #stdout
    movl    $4,%eax         #sys_write
    int     $0x80

    movl    $0,%ebx
    movl    $1,%eax        #sys_exit
    int    $0x80

    .bss
    .lcomm    r,4
    .lcomm    buf,16
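
The itoa and strrev parts of the listing above follow a common pattern: produce the decimal digits least significant first, then reverse the buffer in place.  A rough Python 3 equivalent, only as a reading aid for the assembly (the function name is made up here):

```python
def itoa(value: int) -> bytes:
    """Convert a non-negative integer to ASCII decimal digits."""
    buf = bytearray()
    # itoa0 loop: divide by 10 and store '0' + remainder; runs at least once,
    # so 0 becomes "0".
    while True:
        value, digit = divmod(value, 10)
        buf.append(0x30 + digit)
        if value == 0:
            break
    # strrev: swap bytes from both ends until the two positions meet.
    lo, hi = 0, len(buf) - 1
    while lo < hi:
        buf[lo], buf[hi] = buf[hi], buf[lo]
        lo += 1
        hi -= 1
    return bytes(buf)


if __name__ == "__main__":
    print(itoa(1234))  # b'1234'
```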


(27) PL/I

Using Iron Spring PL/I Compiler 0.9.4.
plic -ew -o nr.o nr.pli
ld -z muldefs -e main -o nr nr.o -lprf
cat sample.txt | ./nr

Option "-ew" informs plic to disable some warnings.

"-z muldefs" lets ld allow multiple definitions.
"-e main" lets ld use "main" as the entry point.
"-lprf" lets ld link the PL/I library "libprf.a".

The result is always one larger, why?
If the last line of input data has no newline, it would raise ERROR condition.

nrx: procedure options (main);
   dcl r    char (120) varying;
   dcl (nr, eof) fixed bin;

   eof = 0;
   on endfile(SYSIN) eof = 1;

   nr = 0;
   get edit (r) (L);
   do while (eof = 0);
      nr = nr + 1;
      get edit (r) (L);
   end;
   put skip list (nr);
end nrx;


(28) COBOL

Using open-cobol (OpenCOBOL, GNU Cobol).

cobc -x -o nr nr.cob
cat sample.txt | ./nr

Option "-x" informs cobc to generate the "main" function.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. nrx.

       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT in-stream ASSIGN TO KEYBOARD
               ORGANIZATION LINE SEQUENTIAL
               FILE STATUS in-stream-status.

       DATA DIVISION.
       FILE SECTION.
       FD  in-stream.
       01  stream-line                 PIC X(80).

       WORKING-STORAGE SECTION.
       01  in-stream-status            PIC 99.
           88  end-of-stream           VALUE 10.
       01  line-count                  PIC 9(6).

       PROCEDURE DIVISION.
           OPEN INPUT in-stream

           PERFORM UNTIL end-of-stream
               READ in-stream
                   AT END
                       SET end-of-stream TO TRUE
                   NOT AT END
                       ADD 1 to line-count
               END-READ
           END-PERFORM

           DISPLAY line-count
           CLOSE in-stream
           .
       END PROGRAM nrx.


(29) FORTRAN

Using gfortran (GNU FORTRAN compiler):

cat sample.txt | (echo 'program nrx; character*1024 :: r; n=0; do; read(*, "(A)", iostat=k) r; if (k /= 0) exit; n = n + 1; end do; print *, n; end program nrx' | gfortran -ffree-form -x f95 - && ./a.out)

Option "-ffree-form" informs gfortran to use "free-form" but not traditional "fixed-form".

Option "-x f95" informs gfortran the source code syntax is "FORTRAN 95".

Note 1: The result of "print *, n" is not left justified.  It would take more code to left-justify it.

Note 2: The variable type of "n" and "k" is "integer" by default.


(30) Lua

cat sample.txt | lua -e 'n=0; for r in io.lines() do n=n+1; end; print(n)'

Option "-e" informs lua to execute the statement.

Note: Some ";" (semicolons) are not necessary, because Lua can use "whitespace" as "separator".  But for the sake of clarity, some ";" are reserved here.


(31) Scala

cat sample.txt | scala -e 'var n=0; for (r <- io.Source.stdin.getLines()) n+=1; println(n)'

Option "-e" informs scala to execute the expression(s).

It seems much neater than Java.


(32) Go

Using go 1.2: (for Linux x86 32-bit system)

cat sample.txt | (echo 'package main; import ("bufio"; "os"); func main() {si := bufio.NewReader(os.Stdin); n:=0; for {_, _, k := si.ReadLine(); if k!=nil {break}; n+=1}; println(n)}' > tmp.go; go run tmp.go)

Note 1: "ReadLine" has 3 return values, line, isPrefix, and err.

Note 2: The internal buffer of "ReadLine" has 4K bytes.  It should be large enough for most cases.  In case the line is too long, the 2nd return value "isPrefix" will be set.
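
Since the Go snippet ignores "isPrefix", a line longer than the buffer would be returned in several pieces and counted several times.  The effect can be modelled in Python 3.  This is a hypothetical model of a fixed-buffer line reader, not the Go implementation; all names are made up here.

```python
def read_line_chunks(data: bytes, bufsize: int):
    """Hypothetical model of a fixed-buffer line reader.

    Yields (chunk, is_prefix); is_prefix is True when the buffer filled up
    before a newline was seen, so the rest of the line follows later.
    """
    pos = 0
    while pos < len(data):
        end = data.find(b"\n", pos, pos + bufsize)
        if end != -1:                       # newline found within the buffer
            yield data[pos:end], False
            pos = end + 1
        elif len(data) - pos <= bufsize:    # unterminated tail that fits
            yield data[pos:], False
            pos = len(data)
        else:                               # buffer full, line continues
            yield data[pos:pos + bufsize], True
            pos += bufsize


def count_calls(data: bytes, bufsize: int) -> int:
    """Count every returned chunk, as the Go snippet above does."""
    return sum(1 for _ in read_line_chunks(data, bufsize))


def count_lines(data: bytes, bufsize: int) -> int:
    """Count only the chunks that complete a line (is_prefix is False)."""
    return sum(1 for _, is_prefix in read_line_chunks(data, bufsize)
               if not is_prefix)


if __name__ == "__main__":
    data = b"x" * 10 + b"\nshort\n"   # two lines, the first longer than the buffer
    print(count_calls(data, 8))        # 3
    print(count_lines(data, 8))        # 2
```

In Go, the corresponding fix would be to increment the counter only when isPrefix is false.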


(33) Standard ML

Using MLton:

Recursive version:

cat sample.txt | (echo 'fun nr(f) = if (TextIO.inputLine(f) <> NONE) then 1 + nr(f) else 0; val n = nr(TextIO.stdIn); print (Int.toString(n) ^ "\n");' > tmp.sml; mlton tmp.sml && ./tmp)

Operator "^" is used to concatenate strings.

The compiling speed seems a little slow.

or

While loop (with variable reference) version:

cat sample.txt | (echo 'val n = ref 0; while (TextIO.inputLine(TextIO.stdIn) <> NONE) do n := !n + 1; print (Int.toString(!n) ^ "\n");' > tmp.sml; mlton tmp.sml && ./tmp)

"ref" is used to create a "reference", and "!" is used to "dereference" it.


(34) Ada

Using gnat (GNU GNAT) compiler:

cat sample.txt | (echo 'with Ada.Text_Io; use Ada.Text_Io; with Ada.Integer_Text_IO; use Ada.Integer_Text_IO; procedure nr is n: Natural :=0; r: String(1..1024); c: Natural; begin while not End_Of_File loop Get_Line(r, c); n:=n+1; end loop; Put(n,1); end nr;' > nr.adb; gnatmake -q nr && ./nr)

Option "-q" informs gnatmake to be quiet.

The "1" in "Put(n, 1)" means 1 position is enough, so the result will be left justified.

"Natural" is a subrange type of Integer.  It begins from 0.

Note 1: "Put(n, 1)" is in the "Ada.Integer_Text_IO" package.  If we use "Put_Line(Natural'Image(n))" (of the Ada.Text_Io package), the ' (single quote) would be annoying, and the output result would be right justified.

Note 2: The compiler requires the file name to be consistent with the "unit name".  So there are 5 places with the same name: procedure XXX, end XXX, > XXX.adb, gnatmake -q XXX, and the executable file ./XXX.

Note 3: The last empty line would not be counted.  Why?  Is it an implementation problem or due to the language specification?


(35) R

Using r (littler, which means "Little R"):

cat sample.txt | r -e 'rs<-readLines(con="stdin",warn=FALSE); cat(length(rs),"\n")'

Option "-e" informs r to execute the script.

"warn=FALSE" of readLines() disables the warning about a missing final EOL.

or

Using R (GNU R):

cat sample.txt | R --slave -e 'rs<-readLines(con="stdin",warn=FALSE); cat(length(rs),"\n")'

Option "--slave" informs R to run as quiet as possible.  Just option "-q" is not quiet enough.


(36) Prolog

Using swipl (SWI-Prolog):

cat sample.txt | (echo 'nr([H|T],N) :- read_line_to_codes(user_input, H), H \= end_of_file, !, nr(T,M), N is M+1. nr([],0).' > tmp.pl; swipl -q -s tmp -g "prompt(_,''), nr(X,N), write(N),nl,halt")

Option "-q" to inform swipl to run quietly.

Option "-s" to load the script file.  The extension name (.pl) can be omitted.

Option "-g" to inform swipl to set the "goals" to resolve.

"!" in Prolog is used to "cut" the (unnecessary) backtracking.  It is used here because we want the whole standard input, but not the partial lines.

prompt(_,'') to set the prompt to be empty, but not the default ':- ', which will be displayed when reading from the standard input.  The string in Prolog is delimited by '...' (single quotes).

nl to output the newline.

halt to quit the execution, not to enter the interactive mode.


(37) Scheme

Another Lisp language, using guile (GNU Ubiquitous Intelligent Language for Extensions):

cat sample.txt | guile -c '(use-modules (ice-9 rdelim)) (let ((n 0)) (while (not (eof-object? (read-line))) (set! n (+ n 1))) (display n)(newline))'

Option "-c" informs guile to evaluate the scheme expression.

In order to use "read-line", it is necessary to "use-modules (ice-9 rdelim)".  Without any argument, read-line will read from current-input-port (the standard input).

"eof-object? x" will check whether its argument is an end-of-file object.

"set!" acts like "setf" in Common Lisp.

"display x" to output its argument to the current-output-port (the standard output).

"newline" to output a newline obviously, just like (display "\n") does.

It would be neater if "read-line" were built-in.


(38) Emacs Lisp

Another Lisp language, using emacs (GNU Emacs):

cat sample.txt | emacs -batch -Q -eval '(let ((n 0)) (condition-case nil (while (read-string "") (setq n (+ n 1))) (error (prin1 n)(terpri))))'

Option "-batch" informs emacs to enter Batch Mode.  In batch mode, reading from the "minibuffer" (read-string) is redirected to read from the standard input.

Option "-Q" informs emacs to start quickly, without welcome messages, etc.

Option "-eval" asks emacs to evaluate the expression.

The argument in (read-string "") is to set the prompt empty.

(terpri), which means "terminate print", is used to output a newline just as (princ "\n") does, but shorter.

On eof, read-string does not return nil but raises an error "Error reading from stdin", sadly.  So we need to use (condition-case nil (...) (error ...)) to catch and handle the error.


(39) APL

Using GNU APL:

cat sample.txt | (cat; echo; echo EOF) | (echo $'∇N←NR' > tmp.apl; echo 'N←0' >> tmp.apl; echo "LOOP:R←⍞ ⋄ →(R≡'EOF')/0 ⋄ N←N+1 ⋄ →LOOP" >> tmp.apl; echo '∇' >> tmp.apl; echo 'NR' >> tmp.apl; echo ')OFF' >> tmp.apl; apl -s -f tmp.apl) | tail -2 | head -1

Option "-s" informs apl to enter the "script mode".

Option "-f" informs apl to load some file.

Note 1: Without knowing how to check for eof, we append a sentinel text (EOF) to the input stream, and the APL program checks for it (R≡'EOF').  Just a workaround.

Note 2: Even in the script mode, the input strings (R←⍞) are still echoed to stdout, so some utilities (tail -2 | head -1) are used to extract the final result (the second-to-last line).

Note 3: In the script mode, we must use ")OFF" to quit apl.


(40) Erlang

Using erlc and erl (Erlang Compiler and Erlang Emulator):

cat sample.txt | (echo '-module(nr). -export([start/0]). nr() -> case io:get_line("") of eof -> 0; R -> 1 + nr() end. start() -> N = nr(), io:format("~w~n",[N]), init:stop().' > nr.erl; erlc -W0 nr.erl && erl -noshell -s nr)

Option "-W0" informs erlc to disable warning message (variable 'R' is unused).

Option "-noshell" to start erl runtime with no shell, suitable for pipes.

Option "-s" informs erl to run some function (default to "start()") of some module ("nr" here).

Note 1:  The module name used in "-module(nr)." must be the same as the base filename.

Note 2: The argument of io:get_line("") is the prompt (not used here).

Note 3: The "~w" in io:format("~w~n",[N]) is the normal format for output, and "~n" for newline.

Note 4: "init:stop()." informs erl to stop instead of entering the erl shell.  It could also be put on the command line of erl, such as "erl -noshell -s nr -s init stop".

or

Using escript (Erlang scripting):

cat sample.txt | (echo $'#\nnr() -> case io:get_line("") of eof -> 0; _Else -> 1 + nr() end. main(_) -> io:format("~w~n",[nr()]).\n' > nrs; escript nrs)


ANSI-C Quoting ($'...') is used here to generate newlines.

Note 1: The 1st line of the script must be reserved (for the shebang usage, such as: #!/usr/bin/escript).  And escript will just ignore it.

Note 2: The entry point of Erlang scripting is "main(_)".  The "_" (underscore) means matching any argument.

Note 3: The last newline is added, otherwise escript would complain ("Premature end of file reached").


(41) Factor


cat sample.txt | factor -e='USING: io math kernel prettyprint ; 0 [ readln ] [ 1 + ] while .'

Option "-e" informs factor to evaluate the codelet.

Note 1: while is in the kernel vocabulary (module), readln is in io, + is in math, and . is in prettyprint.

Note 2: Line counter is put on the stack, initial to be 0, and increment 1 (1 +) while readln get some string, and print out (with '.') at last.

Note 3:  [ ... ] , named quotation, means a piece of codelet (an anonymous function).

Note 4:  There is another 'factor' program (factoring a number into primes).  The PATH must be set correctly.


(42) Groovy

cat sample.txt | groovy -e 'n=0; System.in.readLines().each{n+=1}; println n'

Option "-e" informs groovy to execute a command line script.

The code is quite neat.  But it runs a little slow (because the code is compiled into bytecodes first, and runs on JVM).


(43) Icon

Using icont (Icon Interpreter)

cat sample.txt | (echo 'procedure main(); n:=0; while read() do n:=n+1; write(n); end' > tmp.icn; icont -s tmp -x)

Option "-s" informs Icon to suppress messages.

Option "-x" informs Icon to execute.

Note 1: The extension name (.icn) of Icon files can be omitted on the command line.

Note 2: read() will get the next line or fail on eof.  Once fail, the while loop will stop.

Note 3: Icon expressions either fail or produce a value (on success), so expressions can be combined together.  Fun!


(44) Io

cat sample.txt | (echo 'writeln(File standardInput readLines size)' > tmp.io; io tmp.io)

"File standardInput" will get the stdin stream.

"readLines" will get a list.

"size" of a list will get its number of items.  (While the size of a File will get file size in bytes.)


(45) J

Using ijconsole (J console) to count the newlines:

cat sample.txt | ijconsole -js 'exit echo +/ LF = (1!:1) 3'

Option "-js" informs ijconsole to execute the following script expressions.

"(1!:1) 3" to read (1!:1) the stdin (3) as a string.

"LF = ..." to compare each character with LF (LineFeed, the newline character), and generate a boolean (1 or 0) list.

"+/ ..." to sumup.

"echo ..." to format and output with a newline.

"exit ..." to end the script.

Note: The original name "jconsole" conflicts with the Java console.  It is installed as "ijconsole" on Ubuntu.
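
The "compare, then sum" idea of the J expression reads almost the same in Python 3.  A minimal sketch, with a made-up function name:

```python
def count_newlines(text: str) -> int:
    """Build a 0/1 mask of newline positions, then add it up."""
    mask = [1 if ch == "\n" else 0 for ch in text]   # like: LF = text
    return sum(mask)                                  # like: +/ mask


if __name__ == "__main__":
    print(count_newlines("ab\ncd"))  # 1
```

Like the J version, it counts newline characters, so an unterminated last line is not counted.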


(46) Forth

Using Gforth 0.7.0:

cat sample.txt | gforth -e '1024 constant m  create r m allot  : nr 0 begin r m stdin read-line throw while drop 1+ repeat drop ; nr . cr bye'

Option "-e" informs gforth to execute the following codes.

constant ( w "name" – ) to define a symbol (name) with a fixed value (w).

create ( "name" – ) to create a symbol (name).  It can create an array with a following allot.

allot ( n –  ) can allocate some unit (n) storage.

: ( "name" – ) to begin defining a new word (name).  The definition is ended with ";".

0 is pushed onto stack as the line counter here.

begin code1 flag while code2 repeat is a quite excellent loop control structure with initialization (code1), condition (flag), and loop body (code2).

stdin ( – wfileid ) to get the standard input file id.

read-line ( c_addr u1 wfileid – u2 flag wior ) to read a line into a buffer (c_addr) with maximum length (u1) from some file (wfileid).  It returns the number of characters read (u2), a flag telling whether a line was read (false at end of file), and the io status (wior).

throw ( y1 .. ym nerror – y1 .. ym / z1 .. zn error ) to check the io status, if it is ok then just skip, otherwise the flow will transfer to the exception handler.

drop ( w – ) to just drop the top item on the stack.  It is the length of read data which is not used here.  The 'drop' after the 'repeat' is the same reason.

1+ ( n1 – n2 ) to add 1 to the top item on the stack.  It is one more line read here, so we increase the counter.

. ( n – ) to output the number on the stack top.

cr ( – ) to output a newline (cr means Carriage Return).

bye ( - ) to quit directly (not to enter the interactive mode).


(47) Mercury

Using mmc (Melbourne Mercury Compiler):

cat sample.txt | (echo ':- module nr. :- interface. :- import_module io. :- pred main(io::di, io::uo) is det. :- pred nr(int::out, io::di, io::uo) is det. :- implementation. :- import_module int. nr(N, !IO) :- io.read_line_as_string(Result, !IO), (if Result = eof then N = 0 else nr(M, !IO), N = M+1). main(!IO) :- nr(N, !IO), io.write_int(N, !IO), io.nl(!IO).' > nr.m; mmc nr && ./nr)

The basename must be the same with the module name (it is "nr" here).

Note: It took very long time to install Mercury (to run "sudo make install" on Ubuntu).


(48) OCaml

The functional manner:

cat sample.txt | (echo 'let rec nr n = try input_line stdin; nr(n + 1) with End_of_file -> print_int n; print_newline();; nr 0' > nrf.ml; ocaml -w -10 nrf.ml)

Option "-w -10" informs OCaml to disable warning message no.10 (Expression on the left-hand side of a sequence that doesn't have type unit), because the left-hand side of "let r = input_line stdin" is omitted, which is not used here.

"let rec" is used to define a recursive function.

"try ... with ..." is used to capture exceptions ("End_of_file" exception here).

"input_line stdin" is used to read a line from the standard input channel.  The complete expression is "let r = input_line stdin in ...".

";" (semicolon) is used to separate sub-expressions.

"nr(n+1)" is a recursive call with the line number counter increased.

"print_int n" and "print_newline()" are used to output an integer and a newline.

";;" (double semicolon) is used to separate expressions.

"nr 0" is the first place to execute with the counter initialized to 0.


Or the imperative way:

cat sample.txt | (echo 'let n = ref 0;; try while true do input_line stdin; n := !n + 1 done with End_of_file -> print_int !n; print_newline()' > nr.ml; ocaml -w -10 nr.ml)

Option "-w -10" informs OCaml to disable warning message no.10 (Expression on the left-hand side of a sequence that doesn't have type unit), because the left-hand side of "let r = input_line stdin" is omitted, which is not used here.

"let n = ref 0" to declare a reference.  "!n" to dereference it.  "n := !n + 1" to increment it.

";;" (double semicolon) is used to separate expressions.

"try ... with ..." is used to capture exceptions ("End_of_file" exception here).

"while ... do ... done" is the while loop structure.

"input_line stdin" is used to read a line from the standard input channel.  The complete expression is "let r = input_line stdin in ...".

";" (semicolon) is used to separate sub-expressions.

"print_int !n" and "print_newline()" are used to output an integer and a newline.


(49) Pure

cat sample.txt | (echo 'using system; let s = fget stdin; let n = if null s then 0 else # split "\n" s; printf "%d\n" n;' > nr.pure; pure nr.pure)

"using system" declares the system interface for basic I/O operations here, including fget, stdin, and printf.

";" (semicolon) is used to end a rule.

"let ... = ...;" is used to bind some expression to a variable.

"fget ..." can read the entire file into a string.

"stdin" is the standard input stream.

"if ... then ... else ..." is the conditional expression.

"null ..." is a condition test.

"# ..." is used to count the elements in a list.

"split delim s" is used to split a string into a list with some delimiter (newline, "\n", here).

"printf format args" is used to output some args according to the format. 

Note: The result would be 1 more if the last character is a newline because "split" is used here.


(50) ALGOL

Using a68g (Algol 68 Genie)

cat sample.txt | a68g -e 'on logical file end(stand in, (REF FILE f)BOOL: done); INT n := 0; DO STRING r; read(r); n +:= 1; read(new line) OD; done: printf(($%d$, n))'

Option "-e" informs a68g to execute the following script.

"on logical file end" declares the event handler when "logical file end" happens.

"stand in" means the standard input stream.

"(REF FILE f)BOOL: done" is the event handler (or label).

";" (semicolon) is used to end an expression.
"INT n := 0;" declares an INT variable with an initial value.

"DO ... OD;" is an infinite loop structure.  It will end when the "file end" event happens.

"STRING r;" declares a STRING variable.

"read(r);" will read a string from "stand in" without the ending newline.

"n +:= 1;" increments the line counter by 1.

"read(new line)" will read (skip) the "new line" (character).

"done:" is the event handler declared before.

"printf(($%d$, n))" will output the line counter (n).  "$...$" is the format string.  "%d" is the decimal integer format of the C style.  If we use "print(n)" it will output some spaces and a "+" sign by default, such as "         +3".

Note: The identifiers of ALGOL can have spaces within, such as "logical file end", "stand in", "new line", etc...


(51) Clojure

Clojure is somewhat like a functional Scheme.  It is famous for its immutable variables.

cat sample.txt | (echo '(let [n (atom 0)] (while (read-line) (swap! n inc)) (println @n))' > nr.clj; clojure nr.clj)

"(let [...] ...)" will bind variables and do some work in a lexical scope.

"[n (atom 0)]" is a vector, used to bind something (atom 0 here) to a variable (n here) in let.

"(atom 0)" to make an atom.

"(while (...) ...)" is the loop structure, test some condition, and do some works when the condition is true (non nil or non FALSE).

"(read-line)" will read a line from the *in* stream.  On eof, it will return nil.

"(swap! n inc)" will apply some function (inc here) to the atom (n here), and swap the atom value to this new value.  Interesting.

"(inc n)" will return its argument plus 1 (n + 1 here).

"(println @n)" will print its argument with a newline.

"@n" is the same as "(deref n)".  It will dereference its argument.

Note: Although clojure supports the "-e ..." option to execute a script, it does not work with this short codelet (always get 0).  Why?


(52) GAP

GAP (Groups, Algorithms and Programming) is a computer algebra system with a built-in programming language.

cat sample.txt | (echo 'stdin := InputTextUser(); n := 0; while ReadLine(stdin) <> fail do n := n + 1; od; Display(n); QUIT;' > nr.g; gap -q nr.g)

Option "-q" informs gap to be quiet, not to show welcome messages, etc.

"var := expr;" is the assignment statement.

"InputTextUser()" will return the standard input stream.

"while bool-expr do statements od;" is the while loop structure.

"ReadLine(stream)" will return a line string or fail.

"... <> fail" is an inequality comparison.  It is used to check whether eof happened here.

"Display(n)" is used to print formatted output with a newline.

"QUIT" is used to leave gap interactive environment.  It must be capitalized.  Small case "quit" will not work.

Note: GAP identifiers are case-sensitive.


(53) Rexx

Rexx (Restructured Extended Executor) appeared in 1979, and is used for scripting or being a macro language. Using regina 3.5 (Regina Rexx Interpreter) here:

cat sample.txt | (echo 'n=0; do while lines()>0; r=linein(); n=n+1; end; say n' > nr.rex; rexx nr)

";" (semicolon) is not always necessary if there is a newline.

"do while bool-expr; ... end" is the while loop structure.

"lines()" will return the number of lines which have not been read yet.  But actually it returns 0 or 1 in this little test.  Stdin will be used as the input stream if there is no parameter.

"linein()" will read 1 line.  Stdin will be used as the input stream if there is no parameter.

"say" will output with newline.

Note: The result will be 1 even if the input is empty.  And the result will be 1 more if the last character is a newline.  This implementation seems to have some minor bugs, although their effect is not important.


(54) GNU Octave

GNU Octave is mainly used for numerical computation.

cat sample.txt | octave -q --eval 'n=0; while ~feof(stdin) r=fgetl(stdin); n++; end; printf("%d\n",n)'

Option "-q" informs octave to be quiet and not print welcome messages.

Option "--eval" informs octave to evaluate the following code.

";" (semicolon) is used as a separator or terminator.

"while bool-exp ... end;" is the while loop structure.  "while (cond) ... endwhile" also works.

"~" is the boolean not operator.  "!" also works.

"feof(fid)" will return 1 once the EOF condition has occurred.

"stdin" or "stdin()" will return the standard input stream id.

"fgetl(fid)" will read 1 line without the newline attached.

"printf(format, ...)" supports C-style formatted output.  If we use "disp(n)" instead, there would be extra spaces.

Note: feof() always returns 0 at first, even if the file is empty.  It returns 1 only after the EOF condition has actually occurred (that is, after a read operation).  So the result will be 1 if the input file is empty.  Sadly.
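
The usual way around this kind of off-by-one is to let the read itself decide whether the loop continues, instead of testing for EOF before reading.  The sketch below shows that pattern in bash rather than Octave, so it is an analogy and not a fix for the Octave one-liner.  The function name "count_lines" is made up for illustration.

```shell
#!/bin/bash
# count_lines is a hypothetical helper that counts the lines on stdin.
# The loop condition is the read itself, so empty input gives 0.
count_lines() {
    local n=0 line
    # The "|| [ -n ... ]" part also counts a last line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
    done
    echo "$n"
}

printf ''         | count_lines   # empty input
printf 'ab\ncd'   | count_lines   # last line has no newline
printf 'ab\ncd\n' | count_lines   # last line ends with a newline
```

The three calls print 0, 2 and 2.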


(55) Oz

Oz (the Mozart Programming System) is basically a logic programming language.  "ozc" is the Oz compiler (and executor).

cat sample.txt | (echo 'local  class TextFile from Open.file Open.text end  In = {New TextFile init(name:stdin)}  fun {Nr In} if {In getS($)} == false then 0 else 1 + {Nr In} end end  in {System.show {Nr In}} end' > nr.oz; ozc -l System,Open nr)

Option "-l" informs ozc which modules are to be used (System and Open here).

"local ... in ... end" is the overall structure.

"class ... from ... end" is the class structure.  Class "TextFile" is defined here, which inherits Open.file (for init) and Open.text (a mixin, for getS).

"In = {New TextFile init(name:stdin)}" declares an immutable variable (In) of class TextFile which will read data from the standard input stream.

"fun {funname par} ... end"  is the function definition structure.

"if bool-expr then ... else ... end" is the if structure.

"{...}" is used to call a function (or procedure). 

"getS()" will return the text string without the ending newline, or false on eof.

"{System.show ...}" will output with a newline.


(56) PARI/GP

PARI/GP is basically a computer algebra system, focused on number-theoretic computation.  PARI is the library.  GP is the name of its scripting language, and also of its command-line interface (gp).

cat sample.txt | (echo 'print(#readstr("/dev/stdin"))' > nr.gp; gp -q nr)

Option "-q" informs gp to run quietly without extra messages.

"print(...)" will print a string with an ending newline.

"#x" or "length(x)" will return the number of elements in x.

"readstr(filename)" will return a vector of the lines of the whole file.  This function was added in version 2.5.5.

This solution is quite neat!


(57) PicoLisp

PicoLisp is a succinct dialect of Lisp.

cat sample.txt | picolisp -'prinl (lines "/dev/stdin")' -bye

"(prinl 'any ..)" will print out with an ending newline.  The last character in "prinl" is the letter "l" not the number "1".

"(lines 'filename ..)" will return the sum of the number of lines in the file(s).  But actually it is counting the number of newlines.

"(bye)" will do some housekeeping work and then exit picolisp.

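Since "lines" counts newlines (just like "wc -l"), its result is one short whenever the last line is not newline-terminated.  A quick way to check for that case from the shell is to look at the last byte of the input.  This is a minimal sketch; the function name "ends_with_newline" is made up for illustration.

```shell
#!/bin/bash
# ends_with_newline is a hypothetical helper: it succeeds (exit status 0)
# when the last byte of stdin is a newline, and fails otherwise.
ends_with_newline() {
    # tail -c 1 keeps only the last byte; wc -l counts the newlines in it.
    [ "$(tail -c 1 | wc -l)" -eq 1 ]
}

printf 'ab\ncd\n' | ends_with_newline && echo "newline count = line count"
printf 'ab\ncd'   | ends_with_newline || echo "newline count is one short"
```

Empty input is reported as not ending with a newline, although both counts are 0 in that case.
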

(58) Pike

Pike is a little like C with OO extensions, etc.

cat sample.txt | pike -e 'int n; while (Stdio.stdin->gets()) n++; write(n+"\n")'

Option "-e" informs pike to execute the following expression (and exit).

"int n" declares the variable n (with initial value 0 by default).

"while (...) ...;" is the while loop structure.

"Stdio.stdin->gets()" will return one line, or 0 if there is no more data.

"n++" increments n by 1.

"write(n+"\n")" outputs the data.  "\n" is the newline.

Note: To use a script file instead, we have to wrap this command-line script in "void main() {...;}" (do not forget the last ";").  Save it as xxx.pike and run it with, for example, "cat sample.txt | pike xxx".


(59) Racket

Racket is a "programmable programming language" of the Lisp family.

cat sample.txt | racket -e '(define (nr) (if (eof-object? (read-line)) 0 (+ (nr) 1))) (displayln (nr))'

Option "-e" informs racket to evaluate the following expression (and print the results).

"(define (fn arg) body)" is used to define a function.

"(if test true-expr false-expr)" is the simple conditional expression.

"(read-line)" will read a line from the "(current-input-port)" (which defaults to stdin) when there is no argument.

"(eof-object? a)" will return #t (true) if its argument a is eof.

"(displayln datum)" will output the datum with a newline.









