Emacs has many wonderful text processing functions that are not available elsewhere, such as align-regexp. That makes it very attractive to write an Elisp script and have the shell call Emacs to apply that script to many files.
Here is an example of this approach.
In ~/my_elisp.el:
;; my_elisp.el starts here
;; define the function for text processing
(defun format-rpt ()
  "A function to format the OCR-processed reports for further Excel import."
  (interactive)
  ;; align the table of data, then make the numbers comma-separated
  (goto-char (point-min))
  (re-search-forward "land-mark-regexp-pattern" nil t) ;; find the landmark in the file to start the alignment
  (beginning-of-line)
  (let ((beg (point))) (align-regexp beg (point-max) "\\(\\s-*\\) " 1 1 nil)) ;; align the first time
  (while (re-search-forward " \\{2,\\}\\b" nil t) ;; add commas so Excel can import the result as a comma-separated file
    (insert ","))
  (goto-char (point-min))
  (re-search-forward "land-mark-regexp-pattern" nil t) ;; find the landmark again to restart the alignment
  (beginning-of-line)
  (let ((beg (point))) (align-regexp beg (point-max) "\\(\\s-*\\)," 1 4 nil)) ;; align the second time
  (write-file "./save_to.txt" nil)) ;; save file
(format-rpt) ;; call the function for text processing
;; my_elisp.el ends here
Then in bash (I used Cygwin), use find to drive Emacs:
find . -name "ocr.txt" -printf '%h\n' | while read dir; do (cd "$dir" && emacs --no-site-file -nw --batch ocr.txt -l ~/my_elisp.el); done
This find pattern prints the directory of each match and lets the loop cd into it, so it copes with the spaces that are common in Windows paths and works comfortably under Cygwin.
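If directory names might also contain newlines or backslashes, a NUL-delimited variant of the same loop is a little more robust. The following is only a sketch under some assumptions: it needs GNU find (for -printf with \0) and bash (for read -d ''), and the function name for_each_ocr_dir is made up for this example, not something Emacs or find provides.

```shell
#!/usr/bin/env bash
# Sketch only: for_each_ocr_dir is a hypothetical helper name.
# Usage: for_each_ocr_dir ROOT COMMAND [ARGS...]
# Runs COMMAND once inside every directory under ROOT that contains ocr.txt.
for_each_ocr_dir() {
  local root=$1
  shift
  # %h prints the directory of each match; \0 ends each record with a NUL
  # byte, so spaces and newlines in directory names stay intact.
  find "$root" -name 'ocr.txt' -printf '%h\0' |
    while IFS= read -r -d '' dir; do
      # The subshell keeps the cd local to one iteration. </dev/null stops
      # the command from consuming the directory list on the pipe.
      ( cd "$dir" && "$@" ) </dev/null
    done
}
```

With this defined, the batch run above could be written as: for_each_ocr_dir . emacs --no-site-file --batch ocr.txt -l ~/my_elisp.el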
Why not put a sesquicolon line at the top of ~/my_elisp.el?
http://www.emacswiki.org/emacs/EmacsScripts
Can you explain more about the sesquicolon line? What’s the meaning of it?