This week's exercise is to write a C program that determines whether a bytestream contains non-ASCII data. Please name this file "filter-ascii.c".
Remember that ASCII is a 7-bit encoding; a byte that has the high bit set is not ASCII. That is, any byte with a value greater than 127 is considered to be non-ASCII.
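The high-bit test itself can be sketched in a few lines of C. This is only an illustration, not an excerpt from the skeleton; the helper name is_ascii_byte is hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: true when the high bit (0x80) is clear,
   i.e. the byte value lies in the range 0..127. */
static bool is_ascii_byte(uint8_t b)
{
    return (b & 0x80u) == 0;
}
```

Using uint8_t rather than plain char matters here: on platforms where char is signed, a byte such as 0xc3 is stored as a negative value, so a comparison like "c > 127" would never be true.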
So your program "filter-ascii" should read from stdin (file descriptor 0); if all of the bytes that come across stdin are less than 128, then the data can be considered ASCII. If that's the case, then at the end of the bytestream, print a message "ASCII data":
bash-4.2$ ./filter-ascii < hosts
ASCII data.
If you look at a hex dump for the file "hosts", it looks like this:
bash-4.2$ xxd < hosts
0000000: 3132 372e 302e 302e 3120 2020 6c6f 6361  127.0.0.1   loca
0000010: 6c68 6f73 7420 6c6f 6361 6c68 6f73 742e  lhost localhost.
0000020: 6c6f 6361 6c64 6f6d 6169 6e20 6c6f 6361  localdomain loca
0000030: 6c68 6f73 7434 206c 6f63 616c 686f 7374  lhost4 localhost
0000040: 342e 6c6f 6361 6c64 6f6d 6169 6e34 0a3a  4.localdomain4.:
0000050: 3a31 2020 2020 2020 2020 206c 6f63 616c  :1         local
0000060: 686f 7374 206c 6f63 616c 686f 7374 2e6c  host localhost.l
0000070: 6f63 616c 646f 6d61 696e 206c 6f63 616c  ocaldomain local
0000080: 686f 7374 3620 6c6f 6361 6c68 6f73 7436  host6 localhost6
0000090: 2e6c 6f63 616c 646f 6d61 696e 360a       .localdomain6.
bash-4.2$
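For the all-ASCII case described above, the scan over a file descriptor might look roughly like the following. This is a sketch under assumptions, not my solution and not part of the skeleton: the function name fd_is_all_ascii and the 4096-byte buffer size are arbitrary choices, and a read error is simply treated as "not ASCII" to keep the example short.

```c
#include <unistd.h>
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: read fd until end-of-file and report whether
   every byte seen was below 128. */
static bool fd_is_all_ascii(int fd)
{
    uint8_t buf[4096];
    ssize_t n;

    while ((n = read(fd, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; i++) {
            if (buf[i] & 0x80u)
                return false;   /* high bit set: stop scanning */
        }
    }
    return n == 0;              /* 0 is end of stream, -1 is an error */
}
```

Calling fd_is_all_ascii(0) would scan stdin; on a true result the program would then write its "ASCII data" message.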
However, if any byte in the input stream is greater than 127, then your program should stop, report the value and position of the first non-ASCII byte, and print that byte together with any non-ASCII bytes that immediately follow it:
bash-4.2$ ./filter-ascii < i3status.conf
Not an ASCII file, found a 195 value at byte 175: ß
(The final character there is a "sharp s" (or "eszett", if you prefer the German version)).
A hex dump of the first bytes in "i3status.conf" confirms that bytes 175 and 176 hold two non-ASCII values, 0xc3 and 0x9f (the first appears at offset 0xae in the dump, since the hexdump starts counting at 0, not 1):
bash-4.2$ xxd < i3status.conf | head -20
0000000: 2320 6933 7374 6174 7573 2063 6f6e 6669  # i3status confi
0000010: 6775 7261 7469 6f6e 2066 696c 652e 0a23  guration file..#
0000020: 2073 6565 2022 6d61 6e20 6933 7374 6174   see "man i3stat
0000030: 7573 2220 666f 7220 646f 6375 6d65 6e74  us" for document
0000040: 6174 696f 6e2e 0a0a 2320 4974 2069 7320  ation...# It is 
0000050: 696d 706f 7274 616e 7420 7468 6174 2074  important that t
0000060: 6869 7320 6669 6c65 2069 7320 6564 6974  his file is edit
0000070: 6564 2061 7320 5554 462d 382e 0a23 2054  ed as UTF-8..# T
0000080: 6865 2066 6f6c 6c6f 7769 6e67 206c 696e  he following lin
0000090: 6520 7368 6f75 6c64 2063 6f6e 7461 696e  e should contain
00000a0: 2061 2073 6861 7270 2073 3a0a 2320 c39f   a sharp s:.# ..
00000b0: 0a23 2049 6620 7468 6520 6162 6f76 6520  .# If the abo
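The relationship between the 0-based offset in the dump and the 1-based byte number in the report, plus the run of following non-ASCII bytes, can be sketched on an in-memory buffer. The struct and function names here are hypothetical and are not taken from the skeleton; the sketch also assumes the whole run sits in one buffer, so a run that straddles two read(2) calls would need extra handling.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical result record for this sketch. */
struct non_ascii_run {
    bool     found;     /* false when the buffer is all ASCII */
    uint64_t position;  /* 1-based byte number of the first hit */
    uint8_t  value;     /* value of that first byte */
    uint64_t length;    /* count of consecutive non-ASCII bytes */
};

/* Find the first non-ASCII byte in buf and measure the run of
   non-ASCII bytes that starts there. */
static struct non_ascii_run find_non_ascii(const uint8_t *buf, uint64_t len)
{
    struct non_ascii_run r = { false, 0, 0, 0 };

    for (uint64_t i = 0; i < len; i++) {
        if (buf[i] & 0x80u) {
            r.found = true;
            r.position = i + 1;   /* offset 0xae (174) becomes byte 175 */
            r.value = buf[i];
            while (i + r.length < len && (buf[i + r.length] & 0x80u))
                r.length++;
            break;
        }
    }
    return r;
}
```

For the dump above, the first hit is at offset 174, so position would be 175, value would be 195 (0xc3), and length would be 2.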
You can find the test files "hosts" and "i3status.conf" at
https://www.cs.fsu.edu/~langley/CIS4930/Assignments/BinaryFormats/Assignment01-Files/hosts
https://www.cs.fsu.edu/~langley/CIS4930/Assignments/BinaryFormats/Assignment01-Files/i3status.conf
You can also find my compiled version of a solution at
https://www.cs.fsu.edu/~langley/CIS4930/Assignments/BinaryFormats/Assignment01-Files/filter-ascii
My filter-ascii.c file ended up with 95 lines (1741 characters) in total. The header files that I used were:
#include <unistd.h>
#include <stdint.h>
#include <stdbool.h>
Please don't use any other header files in your code. I have also excerpted some of the bits from my code into a "skeleton" that you should use to write your code; it includes routines that substitute for routines that you would otherwise have accessed using "stdio.h" (such as printf(3)) and "string.h" (such as strlen(3)):
https://www.cs.fsu.edu/~langley/CIS4930/Assignments/BinaryFormats/Assignment01-Files/filter-ascii--skeleton.c
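To give a feel for what substitutes for strlen(3) and printf(3) can look like when only the three permitted headers are available, here is a rough sketch built on write(2). These are not the routines from the skeleton; the names str_length, u64_to_decimal, and put_string are hypothetical, and the skeleton's own routines may be organized quite differently.

```c
#include <unistd.h>
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical stand-in for strlen(3). */
static uint64_t str_length(const char *s)
{
    uint64_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Hypothetical helper: render v in decimal into out, which needs room
   for 21 bytes (up to 20 digits plus the terminating NUL).
   Returns the number of digits written. */
static uint64_t u64_to_decimal(uint64_t v, char *out)
{
    char tmp[20];
    uint64_t n = 0;

    do {                            /* least significant digit first */
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);

    for (uint64_t i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];    /* reverse into the output buffer */
    out[n] = '\0';
    return n;
}

/* Hypothetical stand-in for printing a plain string: write all of s
   to fd, retrying after partial writes. Returns false on error. */
static bool put_string(int fd, const char *s)
{
    uint64_t left = str_length(s);

    while (left > 0) {
        ssize_t w = write(fd, s, left);
        if (w < 0)
            return false;
        s += w;
        left -= (uint64_t)w;
    }
    return true;
}
```

With helpers along these lines, put_string(1, "ASCII data\n") would write the message to stdout, and u64_to_decimal could produce the digits for a value such as 195 or a position such as 175.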
Please submit your C file "filter-ascii.c" on Canvas by 11:59pm on Wednesday, May 28.