Module v.token stdlib

v.token
Version:
0.3.3
License:
MIT
Dependencies from vmod:
0
Imports:
0
Imported by:
17
Repository:
OS-specific
Show selected OS-specific symbols.
Backend-specific
Show selected Backend-specific symbols.

Dependencies defined in v.mod

This section is empty.

Imports

This section is empty.

Overview

v.token is a module providing the basic building blocks of the V syntax - the tokens, as well as utilities for working with them.

KeywordsMatcherTrie

KeywordsMatcherTrie provides a faster way of determining whether a given name is a reserved word (belongs to a given set of previously known words R). It works by exploiting the fact, that the set of reserved words is small, and the words short.

KeywordsMatcherTrie uses an ordered set of tries, one per each word length, that was added, so that rejecting that something is a reserved word, can be done in constant time for words smaller or larger in length than all the reserved ones.

After a word w, is confirmed by this initial check by length n, that it could belong to a trie Tn, responsible for all known reserved words of that length, then Tn is used to further verify or reject the word quickly. In order to do so, Tn prepares in advance an array of all possible continuations (letters), at each index of the words R, after any given prefix, belonging to R.

For example, if we have added the word asm to the trie T3, its tree (its nodes) may look like this (note that the 0 pointers in children, mean that there was no word in R, that had that corresponding letter at that specific index):

TrieNode 0:  a b c d e f g h i j k l m n o p q r s t u v w x y z ... |
| children:  1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... | children[`a`] = 1 -> TrieNode 1
|   prefix so far: ''    | value: 0                                  |
|
TrieNode 1:  a b c d e f g h i j k l m n o p q r s t u v w x y z ... |
| children:  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 ... | children[`s`] = 2 -> TrieNode 2
|   prefix so far: 'a'   | value: 0                                  |
|
TrieNode 2:  a b c d e f g h i j k l m n o p q r s t u v w x y z ... |
| children:  0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 ... | children[`m`] = 3 -> TrieNode 3
|   prefix so far: 'as'  | value: 0                                  | Note: `as` is a keyword with length 2,
|                                                                      but we are searching in T3 trie.
|
TrieNode 3:  a b c d e f g h i j k l m n o p q r s t u v w x y z ... |
| children:  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... | all of children are 0
|   prefix so far: 'asm' | value: int(token.Kind.asm)                |

Matching any given word in the trie, after you have prepared it, is then simple: just read each character of the word, and follow the corresponding pointer from the children array (indexed by character). When the pointer is nil, there was NO match, and the word is rejected, which happens very often, and early for most words that are not in the set of the previously added reserved words. One significant benefit compared to just comparing the checked word against a linear list of all known words, is that once you have found that a word is not a match at any given level/trie node, then you know that it is not a match to any of them.

Note: benchmarking shows that it is ~300% to 400% faster, compared to just using token.keywords[name] on average, when there is a match, but it can be 17x faster in the case, where there is a length mismatch. After changes to KeywordsMatcherTrie, please do v -prod run vlib/v/tests/bench/bench_compare_tokens.v to verify, that there is no performance regression.

Aliases

This section is empty.

Constants

#constant assign_tokens

assign_tokens   = [Kind.assign, .plus_assign, .minus_assign, .mult_assign, .div_assign,
	.xor_assign, .mod_assign, .or_assign, .and_assign, .right_shift_assign, .left_shift_assign,
	.unsigned_right_shift_assign]

#constant valid_at_tokens

valid_at_tokens = ['@VROOT', '@VMODROOT', '@VEXEROOT', '@FN', '@METHOD', '@MOD', '@STRUCT',
	'@VEXE', '@FILE', '@LINE', '@COLUMN', '@VHASH', '@VMOD_FILE', '@FILE_LINE']

#constant token_str

token_str       = build_token_str()

#constant keywords

keywords        = build_keys()

#constant scanner_matcher

pub const scanner_matcher = new_keywords_matcher_trie[Kind](keywords)

Sum types

This section is empty.

Functions

#fn assign_op_to_infix_op

fn assign_op_to_infix_op(op Kind) Kind

#fn build_precedences

fn build_precedences() []Precedence

#fn is_decl

inline
fn is_decl(t Kind) bool

#fn is_key

inline
fn is_key(key string) bool

#fn kind_from_string

fn kind_from_string(s string) !Kind

#fn kind_to_string

fn kind_to_string(k Kind) string

#fn new_keywords_matcher_from_array_trie

fn new_keywords_matcher_from_array_trie(names []string) KeywordsMatcherTrie

new_keywords_matcher_from_array_trie creates a new KeywordsMatcherTrie instance from a given array of strings. The values for the strings, that find will return, will be the indexes in that array.

Structs

#struct KeywordsMatcherTrie

heap
pub struct KeywordsMatcherTrie {
pub mut:
	nodes   []&TrieNode
	min_len int = 999999
	max_len int
}

KeywordsMatcherTrie provides a faster way of determinining whether a given name is a reserved word (belongs to a given set of previously known words R).

See the module description for more details.

#fn new_keywords_matcher_trie[T]

fn new_keywords_matcher_trie[T](kw_map map[string]T) KeywordsMatcherTrie

new_keywords_matcher_trie creates a new KeywordsMatcherTrie instance from a given map with string keys, and integer or enum values.

#fn (&KeywordsMatcherTrie) find

direct_array_access
fn (km &KeywordsMatcherTrie) find(word string) int

find tries to find the given word in the set of all previously added words to the KeywordsMatcherTrie instance. It returns -1 if the word was NOT found there at all. If the word was found, find will return the value (value => 0), associated with the word, when it was added.

#fn (&KeywordsMatcherTrie) matches

inline
fn (km &KeywordsMatcherTrie) matches(word string) bool

#fn (&KeywordsMatcherTrie) add_word

direct_array_access
fn (mut km &KeywordsMatcherTrie) add_word(word string, value int)

add_word adds the given word to the KeywordsMatcherTrie instance. It associates a non negative integer value to it, so later find could return the value, when it succeeds.

#struct TrieNode

pub struct TrieNode {
pub mut:
	children [123]&TrieNode
	value    int = -1 // when != -1, it is a leaf node representing a match
}

TrieNode is a single node from a trie, used by KeywordsMatcherTrie

#fn new_trie_node

fn new_trie_node() &TrieNode

new_trie_node creates a new TrieNode instance

#fn (&TrieNode) show

fn (node &TrieNode) show(level int)

show displays the information in node, in a more compact/readable format (recursively)

#fn (&TrieNode) add_word

fn (mut node &TrieNode) add_word(word string, value int, word_idx int)

add_word adds another word and value pair into the trie, starting from node (recursively).

word_idx is jsut used as an accumulator, and starts from 0 at the root of the tree.

#fn (&TrieNode) find

direct_array_access
fn (root &TrieNode) find(word string) int

find tries to find a match for word to the trie (the set of all previously added words).

It returns -1 if there is no match, or the value associated with the previously added matching word by add_word.

#struct Pos

pub struct Pos {
pub:
	len     int // length of the literal in the source
	line_nr int // the line number in the source where the token occured
	pos     int // the position of the token in scanner text
	col     int // the column in the source where the token occured
pub mut:
	last_line int // the line number where the ast object ends (used by vfmt)
}

#fn (&Pos) free

unsafe
fn (mut p &Pos) free()

#fn (Pos) line_str

fn (p Pos) line_str() string

#fn (Pos) extend

fn (pos Pos) extend(end Pos) Pos

#fn (Pos) extend_with_last_line

fn (pos Pos) extend_with_last_line(end Pos, last_line int) Pos

#fn (&Pos) update_last_line

fn (mut pos &Pos) update_last_line(last_line int)

#struct Token

minify
pub struct Token {
pub:
	kind    Kind   // the token number/enum; for quick comparisons
	lit     string // literal representation of the token
	line_nr int    // the line number in the source where the token occurred
	col     int    // the column in the source where the token occurred
	// name_idx int // name table index for O(1) lookup
	pos  int // the position of the token in scanner text
	len  int // length of the literal
	tidx int // the index of the token
}

#fn (&Token) pos

inline
fn (tok &Token) pos() Pos

#fn (Token) str

fn (t Token) str() string

#fn (Token) debug

fn (t Token) debug() string

#fn (Token) precedence

direct_array_accessinline
fn (tok Token) precedence() int

precedence returns a tokens precedence if defined, otherwise 0

#fn (Token) is_scalar

inline
fn (tok Token) is_scalar() bool

is_scalar returns true if the token is a scalar

#fn (Token) is_unary

inline
fn (tok Token) is_unary() bool

is_unary returns true if the token can be in a unary expression

Interfaces

This section is empty.

Enums

#enum Kind

pub enum Kind {
	unknown
	eof
	name // user
	number // 123
	string // 'foo'
	str_inter // 'name=$user.name'
	chartoken // `A` - rune
	plus // +
	minus // -
	mul // *
	div // /
	mod // %
	xor // ^
	pipe // |
	inc // ++
	dec // --
	and // &&
	logical_or // ||
	not // !
	bit_not // ~
	question // ?
	comma // ,
	semicolon // ;
	colon // :
	arrow // <-
	amp // &
	hash // #
	dollar // $
	at // @
	str_dollar
	left_shift // <<
	right_shift // >>
	unsigned_right_shift // >>>
	not_in // !in
	not_is // !is
	assign // =
	decl_assign // :=
	plus_assign // +=
	minus_assign // -=
	div_assign // /=
	mult_assign // *=
	xor_assign // ^=
	mod_assign // %=
	or_assign // |=
	and_assign // &=
	right_shift_assign // <<=
	left_shift_assign // >>=
	unsigned_right_shift_assign // >>>=
	lcbr // {
	rcbr // }
	lpar // (
	rpar // )
	lsbr // [
	nilsbr // #[
	rsbr // ]
	eq // ==
	ne // !=
	gt // >
	lt // <
	ge // >=
	le // <=
	comment
	nl
	dot // .
	dotdot // ..
	ellipsis // ...
	keyword_beg
	key_as
	key_asm
	key_assert
	key_atomic
	key_break
	key_const
	key_continue
	key_defer
	key_else
	key_enum
	key_false
	key_for
	key_fn
	key_global
	key_go
	key_goto
	key_if
	key_import
	key_in
	key_interface
	key_is
	key_match
	key_module
	key_mut
	key_nil
	key_shared
	key_lock
	key_rlock
	key_none
	key_return
	key_select
	key_like
	key_sizeof
	key_isreftype
	key_likely
	key_unlikely
	key_offsetof
	key_struct
	key_true
	key_type
	key_typeof
	key_dump
	key_orelse
	key_union
	key_pub
	key_static
	key_volatile
	key_unsafe
	key_spawn
	keyword_end
	_end_
}

#fn (Kind) is_assign

inline
fn (t Kind) is_assign() bool

#fn (Kind) str

inline
fn (t Kind) str() string

note: used for some code generation, so no quoting

#fn (Kind) precedence

direct_array_accessinline
fn (kind Kind) precedence() int

precedence returns the precedence of the given token kind if defined, otherwise 0

#fn (Kind) is_relational

inline
fn (tok Kind) is_relational() bool

#fn (Kind) is_start_of_type

inline
fn (k Kind) is_start_of_type() bool

#fn (Kind) is_prefix

inline
fn (kind Kind) is_prefix() bool

#fn (Kind) is_infix

inline
fn (kind Kind) is_infix() bool

#fn (Kind) is_postfix

inline
fn (kind Kind) is_postfix() bool

#enum AtKind

pub enum AtKind {
	unknown
	fn_name
	method_name
	mod_name
	struct_name
	vexe_path
	file_path
	line_nr
	column_nr
	vhash
	vmod_file
	vmodroot_path
	vroot_path // obsolete
	vexeroot_path
	file_path_line_nr
}

@FN => will be substituted with the name of the current V function @METHOD => will be substituted with ReceiverType.MethodName @MOD => will be substituted with the name of the current V module @STRUCT => will be substituted with the name of the current V struct @FILE => will be substituted with the path of the V source file @LINE => will be substituted with the V line number where it appears (as a string).

@COLUMN => will be substituted with the column where it appears (as a string).

@VHASH => will be substituted with the shortened commit hash of the V compiler (as a string).

@VMOD_FILE => will be substituted with the contents of the nearest v.mod file (as a string).

@VMODROOT => will be substituted with the folder where the nearest v.mod file is (as a string).

@VEXE => will be substituted with the path to the V compiler @VEXEROOT => will be substituted with the folder where the V executable is (as a string).

@VROOT => the old name for @VMODROOT; sometimes it was used as @VEXEROOT; Note: @VROOT is now deprecated, use either @VMODROOT or @VEXEROOT instead.

Note: @VEXEROOT & @VMODROOT are used for compilation options like this:

#include "@VMODROOT/include/abc.h" #flag -L@VEXEROOT/thirdparty/libgc

The @XYZ tokens allow for code like this:

println( 'file: ' + @FILE + ' | line: ' + @LINE + ' | fn: ' + @MOD + '.' + @FN) ... which is useful while debugging/tracing.

@ is allowed for keyword variable names. E.g. 'type'

#enum Precedence

pub enum Precedence {
	lowest
	cond // OR or AND
	in_as
	assign // =
	eq // == or !=
	// less_greater // > or <
	sum // + - | ^
	product // * / << >> >>> &
	// mod // %
	prefix // -X or !X; TODO: seems unused
	postfix // ++ or --
	call // func(X) or foo.method(X)
	index // array[index], map[key]
	highest
}

Representation of highest and lowest precedence