Swift Tip: An NSScanner Alternative
For one reason or another, we find ourselves writing small scanners and parsers quite often. Sometimes we parse a specific file format, or a small expression language, or just a file name that conforms to a certain naming scheme.
One approach is to use the Scanner class from Foundation (it used to be called NSScanner). A Scanner instance stores the scanned String and a scan location, which is the position in the string. For example, scanning a single character just returns the character at the current scan location and increases the scan location.
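To make the idea of a scan location concrete, here is a minimal sketch using Foundation's Scanner. It assumes a platform where the optional-returning methods (scanString(_:), scanCharacter(), scanInt()) are available, and it disables whitespace skipping so that every character counts:

```swift
import Foundation

let scanner = Scanner(string: "value: 123")
// By default a Scanner skips whitespace; turn that off for this sketch.
scanner.charactersToBeSkipped = nil

// Each successful scan returns a value and moves the scan location forward.
let label = scanner.scanString("value: ")
let firstChar = scanner.scanCharacter()
let rest = scanner.scanInt()
```

After these three calls, the scanner has consumed the label, then the single character "1", then the remaining digits as the integer 23.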
In pure Swift, there's another type that stores a String and an offset into that String: Substring. Instead of using a scanner, we could write mutating methods on Substring. As an illustration, here are three such methods. The first scans a single character that satisfies a given condition, the second scans exactly count characters, and the last scans a specific prefix:
extension Substring {
    // Scans the first character if it satisfies the condition.
    mutating func scan(_ condition: (Element) -> Bool) -> Element? {
        guard let f = first, condition(f) else { return nil }
        return removeFirst()
    }

    // Scans exactly `count` characters, or nothing if fewer remain.
    mutating func scan(count: Int) -> Substring? {
        let result = prefix(count)
        guard result.count == count else { return nil }
        removeFirst(count)
        return result
    }

    // Scans the given prefix if the substring starts with it.
    mutating func scan<C>(prefix: C) -> Bool where C: Collection, C.Element == Character {
        guard starts(with: prefix) else { return false }
        removeFirst(prefix.count)
        return true
    }
}
To use this with strings, we first need to make a mutable Substring out of a String, and then we can call the scan methods:
var remainder = "value: 123"[...]
if remainder.scan(prefix: "value: "),
    let firstDigit = remainder.scan({ "0123456789".contains($0) }) {
    print(firstDigit)
}
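Larger scanners can be layered on top of these small ones. As a sketch, here is one way to scan a run of ASCII digits into an Int. The name scanInt is our own (it is not part of the standard library), and scan(_:) is repeated so that the snippet compiles on its own:

```swift
extension Substring {
    // Same method as above, repeated to keep this snippet self-contained.
    mutating func scan(_ condition: (Element) -> Bool) -> Element? {
        guard let f = first, condition(f) else { return nil }
        return removeFirst()
    }

    // Hypothetical helper: scans as many ASCII digits as possible.
    // Works on a copy, so `self` is left untouched if scanning fails.
    mutating func scanInt() -> Int? {
        var copy = self
        var digits = ""
        while let d = copy.scan({ $0.isASCII && $0.isNumber }) {
            digits.append(d)
        }
        // Int(_:) returns nil for an empty string or on overflow.
        guard let value = Int(digits) else { return nil }
        self = copy
        return value
    }
}

var text = "123 apples"[...]
let number = text.scanInt()
```

Scanning into a copy and assigning it back only on success is a cheap way to get backtracking: copying a Substring does not copy the underlying string storage.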
You can write a whole bunch of these scanning methods without needing an extra Scanner type. You can even write "higher-order" scanners, like this:
extension Substring {
    mutating func many<A>(until end: Character, _ f: (inout Substring) throws -> A, separator: (inout Substring) throws -> Bool) throws -> [A] {
        // ... left as an exercise
    }
}
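If you want to compare notes on the exercise, here is one possible solution, offered only as a sketch. It rests on assumptions the signature doesn't spell out: scanning stops in front of the end character without consuming it, and it also stops as soon as separator returns false. ScanError and the closures in the usage are hypothetical names for illustration:

```swift
// Hypothetical error type for the usage example below.
enum ScanError: Error { case expectedDigit }

extension Substring {
    mutating func many<A>(until end: Character, _ f: (inout Substring) throws -> A, separator: (inout Substring) throws -> Bool) throws -> [A] {
        var result: [A] = []
        // Assumption: stop in front of `end` and leave it unconsumed.
        while let c = first, c != end {
            let item = try f(&self)
            result.append(item)
            // Assumption: a missing separator ends the list.
            guard try separator(&self) else { break }
        }
        return result
    }
}

var list = "1,2,3]rest"[...]
let numbers = try? list.many(until: "]", { (s: inout Substring) throws -> Int in
    // Scan a single digit and turn it into an Int.
    guard let c = s.first, let value = c.wholeNumberValue else {
        throw ScanError.expectedDigit
    }
    s.removeFirst()
    return value
}, separator: { s in
    // Scan a comma if there is one.
    guard s.first == "," else { return false }
    s.removeFirst()
    return true
})
```

Note that this version loops forever if f consumes nothing and separator keeps returning true without consuming input; a more defensive implementation would check that each iteration makes progress.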
So far, we could have done similar things with a Scanner. However, one of the fun things about Swift is that the code we write is actually far more generic! Instead of defining it on Substring, we can define it on any Collection that supports removeFirst. Reviewing the method's definition, we learn that it exists on any collection that has itself as a SubSequence. This means we only have to change the extension's constraints and the method signatures, but not the method bodies:
extension Collection where SubSequence == Self {
    mutating func scan(_ condition: (Element) -> Bool) -> Element? {
        guard let f = first, condition(f) else { return nil }
        return removeFirst()
    }

    mutating func scan(count: Int) -> Self? {
        let result = prefix(count)
        guard result.count == count else { return nil }
        removeFirst(count)
        return result
    }
}

extension Collection where SubSequence == Self, Element: Equatable {
    mutating func scan<C>(prefix: C) -> Bool where C: Collection, C.Element == Element {
        guard starts(with: prefix) else { return false }
        removeFirst(prefix.count)
        return true
    }
}
Now we can use our scan methods on many other types as well, most notably ArraySlice and Data. For example, we can use them to parse the beginning of a GIF header:
import Foundation

var t = try! Data(contentsOf: URL(string: "https://media.giphy.com/media/gw3IWyGkC0rsazTi/giphy.gif")!)[...]
guard t.scan(prefix: [71,73,70]), // GIF
    let version = t.scan(count: 3), // 87a or 89a
    let width = t.scan(count: 2),
    let height = t.scan(count: 2)
else {
    fatalError()
}
print(version, width, height)
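The version, width, and height values above are still raw bytes. As a sketch of how they could be decoded, the following example runs the same scanners over an ArraySlice<UInt8> holding a hand-written header instead of a downloaded file. GIF stores width and height as little-endian 16-bit integers; the helper littleEndianUInt16 and the sample bytes are our own illustration, and the two scan methods are repeated so that the snippet compiles on its own:

```swift
extension Collection where SubSequence == Self {
    mutating func scan(count: Int) -> Self? {
        let result = prefix(count)
        guard result.count == count else { return nil }
        removeFirst(count)
        return result
    }
}

extension Collection where SubSequence == Self, Element: Equatable {
    mutating func scan<C>(prefix: C) -> Bool where C: Collection, C.Element == Element {
        guard starts(with: prefix) else { return false }
        removeFirst(prefix.count)
        return true
    }
}

// Hypothetical helper: combines exactly two bytes, least significant first.
func littleEndianUInt16<C: Collection>(_ bytes: C) -> UInt16? where C.Element == UInt8 {
    guard bytes.count == 2 else { return nil }
    return bytes.reversed().reduce(UInt16(0)) { ($0 << 8) | UInt16($1) }
}

// Sample data: "GIF", "89a", width 320 (0x0140), height 240 (0x00F0).
let header: [UInt8] = [71, 73, 70, 56, 57, 97, 0x40, 0x01, 0xF0, 0x00]
var bytes = header[...]

var parsed: (version: String, width: UInt16, height: UInt16)? = nil
if bytes.scan(prefix: "GIF".utf8),
    let versionBytes = bytes.scan(count: 3),
    let widthBytes = bytes.scan(count: 2),
    let width = littleEndianUInt16(widthBytes),
    let heightBytes = bytes.scan(count: 2),
    let height = littleEndianUInt16(heightBytes) {
    parsed = (String(decoding: versionBytes, as: UTF8.self), width, height)
}
```

Because the scanners are defined on Collection, nothing had to change to move from Data to ArraySlice, and the prefix can be any collection of bytes, such as the UTF-8 view of a string.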
For further inspiration, see this gist by Michael Ilseman.
In Swift Talk Episode 78 (a public episode), we show how to work with Swift's String and Substring types by writing a simple CSV parser.