Let Emacs be your file-processing engine, and use the shell to drive it for batch processing!

Emacs has many wonderful text-processing commands that are hard to find elsewhere, such as align-regexp. That makes it attractive to write an Elisp script once and have the shell run Emacs in batch mode to apply that script to many files.

Here is an example of this approach.

In ~/my_elisp.el:

;; my_elisp.el starts here

;; define the function for text processing
(defun format-rpt ()
  "Format the OCR-processed report for later import into Excel."
  (interactive)
  ;; align the table of data, then make the numbers comma-separated
  (goto-char (point-min))
  ;; find the landmark in the file where the alignment should start
  (re-search-forward "land-mark-regexp-pattern" nil t)
  (beginning-of-line)
  (let ((beg (point)))
    ;; align the first time
    (align-regexp beg (point-max) "\\(\\s-*\\) " 1 1 nil)
    ;; go back to the landmark line before inserting commas
    (goto-char beg)
    ;; put a comma in front of each field that follows a run of two or
    ;; more spaces, so Excel can import the file as comma-separated
    (while (re-search-forward " \\{2,\\}\\b" nil t)
      (insert ","))
    ;; align the second time, on the inserted commas
    (align-regexp beg (point-max) "\\(\\s-*\\)," 1 4 nil))
  ;; save the result as save_to.txt next to the visited file
  (write-file "./save_to.txt" nil))

(format-rpt) ;; call the function for text processing

;; my_elisp.el ends here
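
If you want to preview what the comma-insertion step of format-rpt does to a line before running Emacs, the sketch below approximates it with sed and POSIX extended regexps. This is a swapped-in technique, not the Emacs code itself: the [[:alnum:]] class stands in for Emacs's \b word boundary, so results can differ at the very end of a buffer or for characters whose Emacs syntax class differs. The function name add_commas and the sample row are made up for illustration.

```shell
#!/bin/sh
# Rough preview of format-rpt's comma step, using sed instead of Emacs.
# Assumption: [[:alnum:]] approximates Emacs's \b right after a run of spaces.
add_commas() {
  # Insert "," between a run of two or more spaces and the alphanumeric
  # character after it, mimicking (re-search-forward " \\{2,\\}\\b") + (insert ",")
  sed -E 's/( {2,})([[:alnum:]])/\1,\2/g'
}

# Hypothetical sample row from an OCR report
printf 'Total  1200   345 6\n' | add_commas
```

A single space between fields (as between 345 and 6 above) gets no comma, which matches the " \\{2,\\}" requirement in the Elisp version.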

Then, in bash (I used Cygwin), use find to drive Emacs:

find . -name "ocr.txt" -printf '%h\n' | while IFS= read -r dir; do (cd "$dir" && emacs --batch --no-site-file ocr.txt -l ~/my_elisp.el); done

Piping each file's directory from find -printf '%h\n' into a while read loop, and quoting "$dir", keeps Windows paths that contain spaces intact, which makes this pattern convenient under Cygwin.
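
To see the find | while read pattern cope with spaces, here is a small self-contained sketch that builds a throwaway directory tree with spaces in its names and runs the same loop. It assumes GNU find (for -printf, which Cygwin's find provides); process_one is a hypothetical stand-in for the emacs --batch call, and the directory names are invented.

```shell
#!/bin/bash
# Demo of the find | while-read loop on directories whose names contain spaces.
# Assumptions: GNU find (for -printf); process_one is a hypothetical stand-in
# for "emacs --batch --no-site-file ocr.txt -l ~/my_elisp.el".
set -eu

process_one() {
  # Stand-in for Emacs: write save_to.txt next to ocr.txt
  cp ocr.txt save_to.txt
}

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/My Reports/Jan 2020" "$root/plain"
echo "jan data" > "$root/My Reports/Jan 2020/ocr.txt"
echo "plain data" > "$root/plain/ocr.txt"

cd "$root"
# %h prints the directory of each match; IFS= and -r keep spaces and backslashes intact
find . -name "ocr.txt" -printf '%h\n' | while IFS= read -r dir; do
  (cd "$dir" && process_one)
done

# List the outputs produced by the loop
find . -name "save_to.txt" | sort
```

Running each step in a ( ... ) subshell means the cd never leaks into the next iteration, and && skips the command if a directory cannot be entered.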

mintty parameters

When starting Mintty, it’s better to include the following parameters:
-e /bin/bash --login

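Otherwise bash does not start as a login shell, so it never reads /etc/profile, which is where Cygwin puts /usr/local/bin and /usr/bin at the front of PATH. The shell then keeps the PATH inherited from Windows (system PATH first, user PATH second, Cygwin directories last), so Windows commands that share a name with Cygwin commands, such as FIND, run instead of the Cygwin versions.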
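
The underlying rule is simply that the first PATH directory containing a command wins. The sketch below demonstrates it with two throwaway directories standing in for the Windows system directory and Cygwin's /usr/bin, each holding a fake find script; the directory names are assumptions for illustration, and the second PATH assignment only mimics roughly what Cygwin's /etc/profile does for a login shell.

```shell
#!/bin/bash
# Shows that the first directory on PATH containing a command is the one that runs.
# Assumption: $tmp/windows and $tmp/cygwin are stand-ins for the real
# Windows system directory and Cygwin's /usr/bin.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/windows" "$tmp/cygwin"
for d in windows cygwin; do
  # Each fake "find" just reports which directory it came from
  printf '#!/bin/sh\necho %s find\n' "$d" > "$tmp/$d/find"
  chmod +x "$tmp/$d/find"
done

# Windows-style order: the Windows directory is searched first
PATH="$tmp/windows:$tmp/cygwin:$PATH"
find            # runs the "windows" fake
type -a find    # lists every "find" on PATH, in lookup order

# Login-shell style: Cygwin's directories are prepended
PATH="$tmp/cygwin:$PATH"
find            # now runs the "cygwin" fake
```

In a real Cygwin window, type -a find is a quick way to check which find comes first; if a path under /cygdrive/c/Windows is listed before /usr/bin/find, the shell is not using the login-shell PATH.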

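Another tip: etc/profile under the Cygwin installation directory (/etc/profile inside Cygwin) and ~/.bashrc are the two files most commonly used to control the behavior of bash shells. /etc/profile is read only by login shells, which is why the --login parameter above matters.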