
Using Regular Expressions in VB.NET

by 호야호잇 2018. 9. 17.

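Using regular expressions in VB.NET 2008

Regex class: System.Text.RegularExpressions.Regex (bring it into scope with Imports System.Text.RegularExpressions)

Example 1 - Getting a list of matching patterns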

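Imports System.Text.RegularExpressions   ' at the top of the file

Dim regexPattern As String
Dim rx As Regex
Dim regexMatches As MatchCollection
Dim strSource As String

strSource = "~~~~~source string~~~~~"
' Matches 'lbl_xxx' or "lbl_xxx" ("lbl" in any case) and captures it in the named group "lbl"
regexPattern = "(?<lbl>([\x27]([l]|[L])([b]|[B])([l]|[L])_\w*[\x27])|([\x22]([l]|[L])([b]|[B])([l]|[L])_\w*[\x22]))"
rx = New Regex(regexPattern)
regexMatches = rx.Matches(strSource)

For j As Integer = 0 To regexMatches.Count - 1
    Dim found As String = regexMatches(j).Groups("lbl").Value
    ' Do the actual work with each matched string here
Next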
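The Example 1 pattern lists every upper/lower-case combination of "lbl" and repeats the whole expression once for each quote style. The helper below is a minimal sketch that assumes the same tokens can be matched more compactly. It uses RegexOptions.IgnoreCase for the letter case, and a backreference (\1) to require the closing quote to be the same as the opening one. The module and function names (LabelFinder, FindLabels) are hypothetical.

```vb
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Public Module LabelFinder
    ' Assumed simplification of the Example 1 pattern:
    ' (['"]) captures the opening quote as group 1 (unnamed groups are numbered first),
    ' IgnoreCase lets "lbl_" match in any case, and \1 demands the same closing quote.
    Private ReadOnly LabelPattern As New Regex("(?<lbl>(['""])lbl_\w*\1)", RegexOptions.IgnoreCase)

    ' Returns every quoted lbl_ token in the source text, quotes included.
    Public Function FindLabels(ByVal source As String) As List(Of String)
        Dim results As New List(Of String)()
        For Each m As Match In LabelPattern.Matches(source)
            results.Add(m.Groups("lbl").Value)
        Next
        Return results
    End Function
End Module
```

A token with mismatched quotes, such as 'lbl_bad", is not matched, because \1 accepts only the quote that opened the token.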

Example 2 - Replacing matched patterns

The program creates a Regex object, passing its constructor a regular expression pattern that identifies the text to replace. It then calls the object's Replace method, passing it the input text and the replacement pattern.

Private Sub btnGo_Click(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles btnGo.Click
    Try
        ' Build the regex from the pattern typed by the user
        Dim reg_exp As New Regex(txtPattern.Text)
        ' Replace every match; $1, $2, ... in the replacement refer to captured groups
        lblResult.Text = reg_exp.Replace(txtInput.Text, _
            txtReplacementPattern.Text)
    Catch ex As Exception
        ' An invalid pattern throws an ArgumentException
        MessageBox.Show(ex.Message)
    End Try
End Sub
The trick in this example lies in the search and replacement patterns. In this example, the search pattern is "(?m)^([^,]*), (.*)$". The pieces of this expression have the following meanings:

(?m)       An inline option that turns on Multiline mode. In this mode ^ and $ match at the beginning and end of each line, not only at the beginning and end of the whole string.
^          Match the beginning of a line.
([^,]*)    Match any character other than a comma, any number of times. The parentheses make this the first capture group.
", "       Match a comma followed by a space.
(.*)       Match any character, any number of times. The parentheses make this the second capture group.
$          Match the end of the line.

The replacement pattern is "$2 $1". It replaces each match with the text captured by the second group, then a space, then the text captured by the first group.

This example takes this text:

Archer, Ann
Baker, Bob
Carter, Cindy
Deevers, Dan

And converts it into this:

Ann Archer
Bob Baker
Cindy Carter
Dan Deevers
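The button handler above depends on form controls. The module below is a minimal, form-free sketch of the same replacement using the shared Regex.Replace(input, pattern, replacement) method. It is an assumption-labelled variant, not the original pattern. .NET documents that in Multiline mode $ matches before \n but not before \r\n. With Windows CR/LF text, the original (.*) would therefore capture the trailing \r into group 2. The variant excludes \r and \n from both groups and checks the line end with a lookahead, so line endings are left untouched. NameSwapper and SwapNames are hypothetical names.

```vb
Imports System.Text.RegularExpressions

Public Module NameSwapper
    ' Variant of the Example 2 search pattern (an assumption, not the original):
    ' [^,\r\n]* keeps group 1 on one line, [^\r\n]* keeps a CR out of group 2,
    ' and the lookahead (?=\r?$) accepts LF or CR/LF endings without consuming them.
    Private Const SearchPattern As String = "(?m)^([^,\r\n]*), ([^\r\n]*)(?=\r?$)"
    Private Const ReplacementPattern As String = "$2 $1"

    ' Turns "Last, First" lines into "First Last" lines.
    Public Function SwapNames(ByVal input As String) As String
        Return Regex.Replace(input, SearchPattern, ReplacementPattern)
    End Function
End Module
```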

Regular expression syntax

Metacharacters Defined
Char      Definition
^         Start of a string (or of a line, with the Multiline option).
$         End of a string (or of a line, with the Multiline option).
.         Any character except \n (newline).
|         Alternation.
{...}     Explicit quantifier notation, e.g. {3} or {2,5}.
[...]     Explicit set of characters to match.
(...)     Logical grouping of part of an expression.
*         0 or more of the previous expression.
+         1 or more of the previous expression.
?         0 or 1 of the previous expression. Placed after another quantifier (*?, +?, ??, {n,m}?), it makes that quantifier lazy, so it matches as few characters as possible.
\         Before one of the characters above, makes it a literal instead of a special character. Before certain other characters, it forms a special escape; see below.
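The "minimal matching" behaviour of ? is easiest to see by running the same pattern greedily and lazily. The sketch below is illustrative only. The module and function names are hypothetical, and the HTML-like input is just a convenient test string, not a recommendation to parse HTML with regular expressions. It also shows alternation inside a group.

```vb
Imports System.Text.RegularExpressions

Public Module QuantifierDemo
    ' Greedy: .+ takes as much as it can, then backtracks to the LAST ">".
    Public Function GreedyTag(ByVal html As String) As String
        Return Regex.Match(html, "<.+>").Value
    End Function

    ' Lazy: the trailing ? makes .+ stop at the FIRST ">".
    Public Function LazyTag(ByVal html As String) As String
        Return Regex.Match(html, "<.+?>").Value
    End Function

    ' Alternation inside a group, anchored at both ends.
    Public Function IsYesNo(ByVal s As String) As Boolean
        Return Regex.IsMatch(s, "^(yes|no)$")
    End Function
End Module
```

With the input "<b>bold</b>", GreedyTag returns the whole string and LazyTag returns only "<b>".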
Character Escapes http://tinyurl.com/5wm3wl
Escaped char          Description
ordinary characters   Characters other than . $ ^ { [ ( | ) ] } * + ? \ match themselves.
\a                    Matches a bell (alarm) \u0007.
\b                    Matches a backspace \u0008 inside []; otherwise matches a word boundary (between \w and \W characters).
\t                    Matches a tab \u0009.
\r                    Matches a carriage return \u000D.
\v                    Matches a vertical tab \u000B.
\f                    Matches a form feed \u000C.
\n                    Matches a new line \u000A.
\e                    Matches an escape \u001B.
\040                  Matches an ASCII character in octal (up to three digits). Numbers with no leading zero are backreferences if they have only one digit or if they correspond to a capturing group number (see Backreferences). For example, \040 represents a space.
\x20                  Matches an ASCII character in hexadecimal (exactly two digits).
\cC                   Matches an ASCII control character; for example, \cC is Ctrl+C.
\u0020                Matches a Unicode character in hexadecimal (exactly four digits).
\*                    When followed by a character that is not recognized as an escape, matches that character. For example, \* is the same as \x2A.
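The hex, Unicode, and octal escapes in the table are alternative spellings of the same character. The sketch below checks that \x20, \u0020, and \040 all match a space. It also shows Regex.Escape, a real .NET method that prefixes metacharacters with a backslash so that arbitrary text is matched literally. The module and function names are hypothetical. Explicit _ line continuations are used so the code also compiles under VB 2008.

```vb
Imports System.Text.RegularExpressions

Public Module EscapeDemo
    ' \x20 (hex), \u0020 (Unicode) and \040 (octal) are three spellings of a space.
    Public Function SpaceEscapesAgree(ByVal input As String) As Boolean
        Return Regex.IsMatch(input, "^a\x20b$") AndAlso _
               Regex.IsMatch(input, "^a\u0020b$") AndAlso _
               Regex.IsMatch(input, "^a\040b$")
    End Function

    ' Regex.Escape turns "1+1" into "1\+1", so "+" is matched literally.
    Public Function ContainsLiteral(ByVal input As String, ByVal literal As String) As Boolean
        Return Regex.IsMatch(input, Regex.Escape(literal))
    End Function
End Module
```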

Character Classes http://tinyurl.com/5ck4ll
Char class     Description
.              Matches any character except \n. With the Singleline option, a period matches any character. For more information, see Regular Expression Options.
[aeiou]        Matches any single character in the specified set.
[^aeiou]       Matches any single character not in the specified set.
[0-9a-fA-F]    A hyphen (-) specifies a contiguous range of characters.
\p{name}       Matches any character in the named character class {name}. Supported names are Unicode general categories and block ranges, for example Ll, Nd, Z, IsGreek, IsBoxDrawing.
\P{name}       Matches any character not in the categories or block ranges named by {name}.
\w             Matches any word character. Equivalent to the Unicode categories [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]. With the ECMAScript option, \w is equivalent to [a-zA-Z_0-9].
\W             Matches any nonword character. Equivalent to [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]. With the ECMAScript option, \W is equivalent to [^a-zA-Z_0-9].
\s             Matches any white-space character. Equivalent to [\f\n\r\t\v\x85\p{Z}]. With the ECMAScript option, \s is equivalent to [ \f\n\r\t\v].
\S             Matches any non-white-space character. Equivalent to [^\f\n\r\t\v\x85\p{Z}]. With the ECMAScript option, \S is equivalent to [^ \f\n\r\t\v].
\d             Matches any decimal digit. Equivalent to \p{Nd} by default and to [0-9] with the ECMAScript option.
\D             Matches any nondigit. Equivalent to \P{Nd} by default and to [^0-9] with the ECMAScript option.
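The difference between Unicode and ECMAScript behaviour for \d can be checked directly. The sketch below assumes U+0663 (ARABIC-INDIC DIGIT THREE, category Nd) as a non-ASCII digit: \d accepts it by default but rejects it with RegexOptions.ECMAScript. It also counts characters in the IsGreek named block. Char.ConvertFromUtf32 is used so that the source file does not depend on a particular text encoding. The module and function names are hypothetical.

```vb
Imports System.Text.RegularExpressions

Public Module ClassDemo
    ' True when the whole input is a single character that \d accepts.
    Public Function IsSingleDigit(ByVal input As String, ByVal useEcmaScript As Boolean) As Boolean
        Dim options As RegexOptions = If(useEcmaScript, RegexOptions.ECMAScript, RegexOptions.None)
        Return Regex.IsMatch(input, "^\d$", options)
    End Function

    ' Counts characters from the Greek block using the named class \p{IsGreek}.
    Public Function CountGreek(ByVal input As String) As Integer
        Return Regex.Matches(input, "\p{IsGreek}").Count
    End Function
End Module
```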

Special characters used in regular expressions

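\          Treats the next special character as a literal character (e.g. \. matches a period).
^          Matches at the beginning of the input (or of each line in Multiline mode).
$          Matches at the end of the input (or of each line in Multiline mode).
*          Matches the preceding item 0 or more times; same as {0,}.
+          Matches the preceding item 1 or more times; same as {1,}.
?          Matches the preceding item 0 or 1 time; same as {0,1}.
.          Matches any single character except a newline.
()         Groups the enclosed pattern and remembers (captures) the text it matched.
|          OR (alternation).
{n}        Exactly n occurrences of the preceding item.
{n,}       n or more occurrences of the preceding item.
{n,m}      At least n and at most m occurrences of the preceding item.
[xyz]      Character set: matches any one of the enclosed characters.
[^xyz]     Negated character set: matches any one character that is not enclosed.
[\b]       Matches a backspace.
\b         Matches the empty string at the start or end of a word (a word boundary).
\B         Matches the empty string at any position that is not a word boundary.
\cX        Matches the control character Ctrl+X.
\d         Matches a decimal digit; same as [0-9] (in .NET, \d also matches other Unicode decimal digits unless the ECMAScript option is set).
\f         Matches a form feed.
\n         Matches a line feed.
\r         Matches a carriage return.
\s         Matches a white-space character; same as [ \t\n\r\f\v].
\S         Matches any character that \s does not match; same as [^ \t\n\r\f\v].
\t         Matches a tab.
\v         Matches a vertical tab.
\w         Matches a word character: a letter, digit, or underscore ([A-Za-z0-9_] in ECMAScript mode).
\W         Matches a non-word character, i.e. anything \w does not match, such as %.
\1 ... \9  Backreference: matches the text most recently captured by group n, where n is an integer from 1 to 9.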
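Backreferences, word boundaries, and {n,m} quantifiers often appear together. A common illustration is finding a doubled word such as "is is". The sketch below is illustrative only. The module and function names are hypothetical, and IgnoreCase is an assumed choice so that "The the" also counts as a doubled word.

```vb
Imports System.Text.RegularExpressions

Public Module BackrefDemo
    ' \b = word boundary, (\w+) = group 1, \s+ = white space, \1 = the same text again.
    Public Function FindDoubledWord(ByVal text As String) As String
        Dim m As Match = Regex.Match(text, "\b(\w+)\s+\1\b", RegexOptions.IgnoreCase)
        If m.Success Then Return m.Groups(1).Value
        Return Nothing
    End Function

    ' {2,4}: between 2 and 4 digits and nothing else.
    Public Function IsTwoToFourDigits(ByVal s As String) As Boolean
        Return Regex.IsMatch(s, "^\d{2,4}$")
    End Function
End Module
```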

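Functions used with regular expressions (these are JavaScript RegExp and String methods; in .NET the corresponding members are Regex.Match, Regex.IsMatch, Regex.Matches, Regex.Replace, and Regex.Split)

exec      RegExp method that searches a string for a match and returns the match details.
test      RegExp method that tests whether a string contains a match (returns true or false).
match     String method that searches a string for a match.
search    String method that tests for a match and returns the index of the first match, or -1.
replace   String method that searches for a match and replaces the matched substring.
split     String method that splits a string at each match and returns the pieces as an array.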
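In VB.NET the same jobs are done by shared methods of the Regex class. The sketch below maps three of the JavaScript methods onto real .NET calls: Regex.IsMatch for test, Regex.Match with Match.Index for search, and Regex.Split for split. The mapping is approximate, and the module and function names are hypothetical.

```vb
Imports System.Text.RegularExpressions

Public Module MethodDemo
    ' Roughly like JavaScript test(): does the input contain a digit?
    Public Function HasDigit(ByVal input As String) As Boolean
        Return Regex.IsMatch(input, "\d")
    End Function

    ' Roughly like JavaScript search(): index of the first match, or -1.
    Public Function SearchIndex(ByVal input As String, ByVal pattern As String) As Integer
        Dim m As Match = Regex.Match(input, pattern)
        Return If(m.Success, m.Index, -1)
    End Function

    ' Roughly like JavaScript split() with a regex separator:
    ' a comma with optional white space on either side.
    Public Function SplitOnCommas(ByVal input As String) As String()
        Return Regex.Split(input, "\s*,\s*")
    End Function
End Module
```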

Sample Expressions

Pattern                          Description
^\d{5}$                          5 numeric digits, such as a US ZIP code.
^(\d{5}|\d{5}-\d{4})$            5 numeric digits, or 5 digits, a dash, and 4 digits. This matches the US ZIP or ZIP+4 format.
^(\d{5})(-\d{4})?$               Same as the previous pattern, but more efficient. It uses ? to make the -4 digits portion optional instead of comparing two separate patterns through alternation.
^[+-]?\d+(\.\d+)?$               Matches any real number with an optional sign.
^[+-]?\d*\.?\d*$                 Same as above, but also matches the empty string and strings such as "." or "-".
^(20|21|22|23|[01]\d)[0-5]\d$    Matches any 24-hour time written as HHMM without a colon, e.g. 0930 or 2359.
/\*.*\*/                         Matches the contents of a C-style comment /* … */ on a single line. Because .* is greedy, it runs to the last */ on the line; /\*.*?\*/ stops at the first one.
--.*                             Matches an SQL comment, from -- to the end of the line.
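A quick way to check the sample expressions is to wrap a few of them in small validators. The sketch below uses the patterns from the table unchanged. The module and function names are hypothetical. It also counts C-style comments with the greedy and lazy forms to show why the lazy form is safer when two comments share a line.

```vb
Imports System.Text.RegularExpressions

Public Module SampleValidators
    ' US ZIP or ZIP+4, using the optional-group form from the table.
    Public Function IsZip(ByVal s As String) As Boolean
        Return Regex.IsMatch(s, "^(\d{5})(-\d{4})?$")
    End Function

    ' Real number with an optional sign.
    Public Function IsRealNumber(ByVal s As String) As Boolean
        Return Regex.IsMatch(s, "^[+-]?\d+(\.\d+)?$")
    End Function

    ' 24-hour time written as HHMM (no colon).
    Public Function IsHhmm(ByVal s As String) As Boolean
        Return Regex.IsMatch(s, "^(20|21|22|23|[01]\d)[0-5]\d$")
    End Function

    ' Counts C-style comments; the greedy form merges two comments on one line into one match.
    Public Function CountComments(ByVal code As String, ByVal lazy As Boolean) As Integer
        Dim pattern As String = If(lazy, "/\*.*?\*/", "/\*.*\*/")
        Return Regex.Matches(code, pattern).Count
    End Function
End Module
```

In .NET, $ also matches just before a final \n. For example, IsZip would accept "12345" followed by a newline. Use \z instead of $ if that matters.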