Friday, September 2, 2011

Programming Praxis: Two String Exercises

Today's exercise has two parts. First, replace multiple occurrences of a character in a string with just the first occurrence. So, aaabbb becomes ab and abcbd becomes abcd.

The second part is to replace multiple consecutive spaces with one space. In this case a b stays unchanged while a   b becomes a b.

These kinds of problems are rather trivial in Clojure due to the rich set of functions you get out of the box.

;; Remove duplicate characters and multiple consecutive spaces
(def s1 "aaabbb")
(def s2 "abcbd")
(def s3 "abcd")
(def s4 "a b")
(def s5 "a  bb    x b")

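;; Keep only the first occurrence of each character, in order.
(defn remove-dup-chars [s]
  (apply str (distinct s)))

;; Collapse each run of consecutive spaces into a single space.
(defn remove-consecutive-spaces [s]
  (apply str (mapcat (fn [l] (if (= (first l) \space)
                               (distinct l) ; a run of spaces becomes one space
                               l))          ; other runs pass through unchanged
                     (partition-by
                      (partial = \space)
                      s))))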

In the first case, we use distinct to remove the duplicates, keeping the first occurrence of each character in order, and then apply str to turn the result back into a string.
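
As a rough illustration of those two steps, here is the same pipeline split into intermediate values. The names chars-seen and result are only for this sketch and are not part of the original solution.

```clojure
;; distinct returns a lazy sequence of characters, keeping the first
;; occurrence of each one in its original order.
(def chars-seen (distinct "abcbd"))   ; => (\a \b \c \d)

;; apply spreads the characters out as separate arguments,
;; so this call is equivalent to (str \a \b \c \d).
(def result (apply str chars-seen))   ; => "abcd"
```

Calling str directly on the lazy sequence would not join the characters, which is why apply is needed here.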

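In the second case, we partition the string with partition-by, breaking it wherever the characters switch between spaces and non-spaces. Then we mapcat over the resulting sublists with a function that collapses each run of spaces down to a single space and leaves the other sublists alone. Since mapcat concatenates the sublists into one list, we can just use apply str on the result to turn it back into a string.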
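
To make the intermediate values visible, the steps can be traced on a small input. This is only a sketch: groups, squeeze, and squeezed are illustrative names that do not appear in the original code, and squeeze is the anonymous function from above pulled out and given a name.

```clojure
;; Step 1: partition-by starts a new group each time the result of
;; (= \space ch) changes from one character to the next.
(def groups (partition-by (partial = \space) "a  b"))
;; groups => ((\a) (\space \space) (\b))

;; Step 2: a group of spaces collapses to one space; any other group
;; is returned unchanged.
(defn squeeze [group]
  (if (= (first group) \space)
    (distinct group)
    group))

;; Step 3: mapcat applies squeeze to each group and concatenates the results.
(def squeezed (mapcat squeeze groups))
;; squeezed => (\a \space \b)

(apply str squeezed) ; => "a b"
```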


4 comments:

  1. I don't know the syntax offhand, but I'm pretty sure you can split on whitespace and then join with a single space.

  2. You can also use a set for the first problem. Distinct is quite a bit slower (but it is lazy).

    strings=> (time (apply str (into #{} "aaabbbbcccc")))
    "Elapsed time: 0.173819 msecs"
    "abc"

    strings=> (time (apply str (distinct "aaabbbbcccc")))
    "Elapsed time: 2.528396 msecs"
    "abc"

  3. Upon further testing, it appears set doesn't always maintain the order of the string. So distinct is the way to go if you need to preserve order.

  4. Using Clojure String utilities:

    (clojure.string/replace s5 #"\s+" " ")

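
The split-and-join idea from the first comment could look something like the sketch below, shown next to a variant of the regex approach from the last comment. The helper name squeeze-by-split is made up for illustration, and the input is the same string as s5 in the post.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper, not from the original post: split on runs of
;; whitespace, then rejoin the pieces with a single space.
(defn squeeze-by-split [s]
  (string/join " " (string/split s #"\s+")))

(squeeze-by-split "a  bb    x b") ; => "a bb x b"

;; #"\s+" also matches tabs and newlines. To collapse only literal
;; spaces, as the exercise asks, the pattern can be narrowed to #" +".
(string/replace "a  bb    x b" #" +" " ") ; => "a bb x b"
```

The two approaches agree on this input, but they may treat trailing whitespace differently, so the regex replacement is probably the closer match to the exercise as stated.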