U***A 发帖数: 849 | 1 就是没有看懂题目。
Given a byte array, which is an encoding of characters. Here is the rule:
a. If the first bit of a byte is 0, that byte stands for a one-byte
character
b. If the first bit of a byte is 1, that byte and its following byte
together stand for a two-byte character
Now implement a function to decide if the last character is a one-byte
character or a two-byte character
Constraint: You must scan the byte array from the end to the start.
Otherwise it will be very trivial.
能否请高人给个例子。谢谢 | w****a 发帖数: 710 | | U***A 发帖数: 849 | | U***A 发帖数: 849 | | l**n 发帖数: 43 | 5 二进制表示的byte串
01101111 10101010 01010101
第一个byte 01101111 组成了第一个character,因为第一个byte以0开头
第二三个byte 10101010 01010101组成了第二个character,因为第二个byte以1开头
从尾向前扫描一遍,就可以判断最后一个byte是属于一个单字节的字符还是双字节的字符
【在 U***A 的大作中提到】 : 就是啊,我没有看懂题目,太弱了。
| U***A 发帖数: 849 | | b******i 发帖数: 914 | 7 Hi, not so hard using recursion:
Explanation:
For every index, I maintain three states: ONEBYTE (single byte),
TWOBYTEFIRST (1st element of a two-byte), TWOBYTELAST (2nd element of a two-
byte). If you now the state of i-1'th element, you can deduce the state of i
'th.
One minor case is that if the last element is identified as the first byte
of a two-byte, the below logic still returns one-byte. But this can be
modified.
------------------------
class Solution {
public:
int last_byte_class(string bytes) {
// return 1: one-byte, 2: two-byte
int n = bytes.length();
int st = helper(bytes, n-1);
if(st == TWOBYTELAST)
return 2;
else
return 1;
}
int helper(string bytes, int last_index) {
char c = bytes[last_index];
int first_bit = find_first_bit(c), curr_state;
if(last_index == 0) {
curr_state = (first_bit == 0) ? ONEBYTE : TWOBYTEFIRST;
} else {
int prev_state = helper(bytes, last_index-1);
if(prev_state == ONEBYTE || prev_state == TWOBYTELAST)
curr_state = (first_bit == 0) ? ONEBYTE: TWOBYTEFIRST;
else
curr_state = TWOBYTELAST;
}
return curr_state;
}
int find_first_bit(char c) {
return (c >> 7) & 1;
}
private:
enum State {ONEBYTE, TWOBYTELAST, TWOBYTEFIRST};
};
【在 U***A 的大作中提到】 : 就是没有看懂题目。 : Given a byte array, which is an encoding of characters. Here is the rule: : a. If the first bit of a byte is 0, that byte stands for a one-byte : character : b. If the first bit of a byte is 1, that byte and its following byte : together stand for a two-byte character : Now implement a function to decide if the last character is a one-byte : character or a two-byte character : Constraint: You must scan the byte array from the end to the start. : Otherwise it will be very trivial.
| n*n 发帖数: 157 | 8 是不就是当年 UC-DOS 这类系统实现汉字编码的原理嘛。
标准 ascii 码定义了127个符号。编号为1-127,转换为二进制就是 0000 0001 - 0111
1111 这个段。编号128-255的字符是扩展字符,在不同的语言中代表不同的符号。
但是汉字的数量远大于128,一个字节的编码是不够的,于是就需要两个字节来代表一
个汉字。在解码的是时候需要判断第一个字节的最高位是否为0,如果为0,那这个字节
代表一个ascii字符,如果是1,那么这个字节和下一个字节组成一个双字节的汉字字符
。 | s********x 发帖数: 81 | 9 不会出现nest的情况吧?
1aaa 1bbb 0ccc ... |
|