When I try to compile C code that uses the gets() function with GCC, I get a warning:

(.text+0x34): warning: the `gets' function is dangerous and should not be used.

I remember this has something to do with stack protection and security, but I'm not sure exactly why.

Can someone help me with removing this warning and explain why there is such a warning?

If gets() is so "dangerous" then why can't we remove it?


In order to use gets safely, you have to know exactly how many characters you will be reading, so that you can make your buffer large enough. You will only know that if you know exactly what data you will be reading.

Instead of using gets, you want to use fgets, which has the signature

char* fgets(char *string, int length, FILE * stream);

(fgets, if it reads an entire line, will leave the '\n' in the string; you'll have to deal with that.)
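
One common way to deal with that newline is strcspn. A minimal sketch, where read_line is a hypothetical helper name of my own and not a standard function:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line from fp into buf (capacity size),
 * dropping the trailing '\n' if fgets stored one.
 * Returns buf, or NULL on end-of-file or read error. */
char *read_line(char *buf, int size, FILE *fp)
{
    if (fgets(buf, size, fp) == NULL)
        return NULL;
    /* strcspn returns the index of the first '\n', or the string length
     * if there is none, so this is safe even for an empty string. */
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

Calling read_line(buf, sizeof buf, stdin) then behaves much like gets(buf) for lines that fit, except that it can never write past the end of buf.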

gets remained an official part of the language up to the 1999 ISO C standard, but it was officially removed by the 2011 standard. Most C implementations still support it, but at least gcc issues a warning for any code that uses it.

It's actually not gcc itself that warns; glibc attaches a warning to gets() that makes the toolchain emit a message when the function is used (the "(.text+0x34)" prefix suggests it comes from the link step). – fuz

You can't remove API functions without breaking the API. If you did, many applications would no longer compile or run at all.

This is the reason that one reference gives:

Reading a line that overflows the array pointed to by s results in undefined behavior. The use of fgets() is recommended.

Because gets doesn't do any kind of check while getting bytes from stdin and putting them somewhere. A simple example:

char array1[] = "12345";
char array2[] = "67890";

gets(array1);

First of all, you are allowed to input as many characters as you want; gets won't care. Secondly, the bytes beyond the size of the array you are writing into (in this case array1, which holds 6 bytes including the terminating null) will overwrite whatever follows it in memory, because gets just keeps writing. In the previous example, this means that if you input "abcdefghijklmnopqrts", it may, unpredictably, also overwrite array2 or anything else.
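
For contrast, here is a sketch of the same example with a bounded read. fill_array1 is a hypothetical helper name, and it takes a FILE * only so that it can be tried with any stream (pass stdin to mirror the gets call):

```c
#include <stdio.h>

/* Same arrays as in the example above: 5 characters plus '\0' each. */
char array1[] = "12345";
char array2[] = "67890";

/* Hypothetical helper: bounded replacement for gets(array1). */
char *fill_array1(FILE *fp)
{
    /* fgets stores at most sizeof array1 - 1 = 5 characters plus '\0',
     * so it cannot write outside array1, whatever the input length. */
    return fgets(array1, sizeof array1, fp);
}
```

With the input "abcdefghijklmnopqrts", array1 ends up holding "abcde" and array2 is left alone; the rest of the line stays in the stream for the next read.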

The function is unsafe because it assumes consistent input. NEVER USE IT!

What makes gets outright unusable is that it doesn't have an array length/count parameter that it takes; had it been there, it'd just be another ordinary C standard function. – legends2k
@legends2k: I'm curious what the intended usage for gets was, and why no standard fgets variant was made as convenient for use cases where the newline is not desired as part of the input? – supercat
@supercat gets was, as the name suggests, designed to get a string from stdin, however the rationale for not having a size parameter may have been from the spirit of C: Trust the programmer. This function was removed in C11 and the replacement given gets_s takes in the size of the input buffer. I've no clue about the fgets part though. – legends2k
@legends2k: The only context I can see in which gets might be excusable would be if one was using a hardware-line-buffered I/O system which was physically incapable of submitting a line over a certain length, and the intended lifetime of the program was shorter than the lifetime of the hardware. In that case, if hardware is incapable of submitting lines over 127 bytes long it might be justifiable to gets into a 128-byte buffer, though I would think the advantages of being able to specify a shorter buffer when expecting smaller input would more than justify the cost. – supercat
@legends2k: Actually, what might have been ideal would have been to have a "string pointer" identify a byte that would select among a few different string/buffer/buffer-info formats, with one value of prefix byte indicating a struct that contained the prefix byte [plus padding], plus the buffer size, used size, and address of the actual text. Such a pattern would make it possible for code to pass an arbitrary substring (not just the tail) of another string without having to copy anything, and would allow methods like gets and strcat to safely accept as much as will fit. – supercat
@legends2k: A buffer holding strings up to 63 bytes would only need one byte of overhead (use 63 prefix-byte values to indicate a "full" buffer, and 63 to indicate a buffer whose last byte will indicate the length of the string therein). Longer strings would have more overhead. Code which could receive an arbitrary string would call a standard-library function to generate a space+length+data structure; code that needed to resize a string could call a standard-library function to do that [if a prefix-byte value were used for "indirect reference to heap-allocated string, ... – supercat
...such a resize method could even perform realloc on heap-allocated strings, making it possible for a gets which accepted just a string parameter to accept either a direct reference to a small array on a local stack, or a reference to an "auto-sizing heap string" descriptor (allowing heap storage to be allocated as needed). – supercat
Communication with a forked-off child process that runs the same executable as the parent might be a perfectly safe use of gets because then you control both the reader and the writer. – PSkocik

I read recently, in a USENET post to comp.lang.c, that gets() is getting removed from the Standard. WOOHOO

You'll be happy to know that the committee just voted (unanimously, as it turns out) to remove gets() from the draft as well.

It is excellent that it is being removed from the standard. However, most implementations will provide it as a 'now non-standard extension' for at least the next 20 years, because of backwards compatibility. – Jonathan Leffler
Yes, right, but when you compile with gcc -std=c2012 -pedantic ... gets() will not get through. (I just made up the -std parameter) – pmg

fgets.

To read from the stdin:

char string[512];

fgets(string, sizeof(string), stdin); /* no buffer overflows here, you're safe! */
Even better: fgets(string, sizeof string, stdin); – John Bode

Why is gets() dangerous

The first internet worm (the Morris Internet Worm) escaped 27 years ago (1988-11-02), and it used gets() and a buffer overflow as one of its methods of propagating from system to system. The basic problem is that the function doesn't know how big the buffer is, so it continues reading until it finds a newline or encounters EOF, and may overflow the bounds of the buffer it was given.

You should forget you ever heard that gets() existed.

The C11 standard ISO/IEC 9899:2011 eliminated gets() as a standard function, which is A Good Thing™. Sadly, it will remain in libraries for many years (meaning 'decades') for reasons of backwards compatibility. If it were up to me, the implementation of gets() would become:

char *gets(char *buffer)
{
    assert(buffer != 0);
    abort();
    return 0;
}

Given that your code will crash anyway, sooner or later, it is better to head the trouble off sooner rather than later. I'd be prepared to add an error message:

fputs("obsolete and dangerous function gets() called\n", stderr);

Modern versions of the Linux compilation system generate warnings if you link gets(), and also for some other functions that have security problems (mktemp(), …).

Alternatives to gets()

fgets()

As everyone else said, the canonical alternative to gets() is fgets() specifying stdin as the file stream.

char buffer[BUFSIZ];

while (fgets(buffer, sizeof(buffer), stdin) != 0)
{
    /* ...process line of data... */
}

What no-one else yet mentioned is that gets() does not include the newline but fgets() does. So, you might need to use a wrapper around fgets() that deletes the newline:

char *fgets_wrapper(char *buffer, size_t buflen, FILE *fp)
{
    if (fgets(buffer, buflen, fp) != 0)
    {
        size_t len = strlen(buffer);
        if (len > 0 && buffer[len-1] == '\n')
            buffer[len-1] = '\0';
        return buffer;
    }
    return 0;
}

Also, as caf points out in a comment and paxdiablo shows in his answer, with fgets() you might have data left over on a line. My wrapper code leaves that data to be read next time; you can readily modify it to gobble the rest of the line of data if you prefer:

        if (len > 0 && buffer[len-1] == '\n')
            buffer[len-1] = '\0';
        else
        {
             int ch;
             while ((ch = getc(fp)) != EOF && ch != '\n')
                 ;
        }

The residual problem is how to report the three different result states — EOF or error, line read and not truncated, and partial line read but data was truncated.

This problem doesn't arise with gets() because it doesn't know where your buffer ends and merrily tramples beyond the end, wreaking havoc on your beautifully tended memory layout, often messing up the return stack (a Stack Overflow) if the buffer is allocated on the stack, or trampling over the control information if the buffer is dynamically allocated, or copying data over other precious global (or module) variables if the buffer is statically allocated. None of these is a good idea; they epitomize the phrase 'undefined behaviour'.
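
One possible way to report the three result states is an enum return value. This is only a sketch: the enum, its constants and read_line_status are hypothetical names, and the design (discard the excess, then report truncation) is just one of several reasonable choices:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes for the three states described above. */
enum line_status { LINE_EOF, LINE_OK, LINE_TRUNCATED };

/* Hypothetical helper: read one line into buffer (capacity buflen),
 * strip the newline, and discard any excess characters on the line. */
enum line_status read_line_status(char *buffer, int buflen, FILE *fp)
{
    if (fgets(buffer, buflen, fp) == NULL)
        return LINE_EOF;               /* EOF or error: nothing was read */

    size_t len = strlen(buffer);
    if (len > 0 && buffer[len - 1] == '\n') {
        buffer[len - 1] = '\0';
        return LINE_OK;                /* the whole line fitted */
    }

    /* No newline stored: the line filled the buffer, or the input ended
     * without a final newline. Discard up to the end of the line and
     * report truncation only if something was actually thrown away. */
    int ch;
    int discarded = 0;
    while ((ch = getc(fp)) != EOF && ch != '\n')
        discarded = 1;
    return discarded ? LINE_TRUNCATED : LINE_OK;
}
```

A caller can then switch on the result and decide whether a truncated line is an error or merely worth a warning.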


There is also the TR 24731-1 (Technical Report from the C Standard Committee) which provides safer alternatives to a variety of functions, including gets():

§6.5.4.1 The gets_s function

Synopsis

#define __STDC_WANT_LIB_EXT1__ 1
#include <stdio.h>
char *gets_s(char *s, rsize_t n);

Runtime-constraints

s shall not be a null pointer. n shall neither be equal to zero nor be greater than RSIZE_MAX. A new-line character, end-of-file, or read error shall occur within reading n-1 characters from stdin.25)

3 If there is a runtime-constraint violation, s[0] is set to the null character, and characters are read and discarded from stdin until a new-line character is read, or end-of-file or a read error occurs.

Description

4 The gets_s function reads at most one less than the number of characters specified by n from the stream pointed to by stdin, into the array pointed to by s. No additional characters are read after a new-line character (which is discarded) or after end-of-file. The discarded new-line character does not count towards number of characters read. A null character is written immediately after the last character read into the array.

5 If end-of-file is encountered and no characters have been read into the array, or if a read error occurs during the operation, then s[0] is set to the null character, and the other elements of s take unspecified values.

Recommended practice

6 The fgets function allows properly-written programs to safely process input lines too long to store in the result array. In general this requires that callers of fgets pay attention to the presence or absence of a new-line character in the result array. Consider using fgets (along with any needed processing based on new-line characters) instead of gets_s.

25) The gets_s function, unlike gets, makes it a runtime-constraint violation for a line of input to overflow the buffer to store it. Unlike fgets, gets_s maintains a one-to-one relationship between input lines and successful calls to gets_s. Programs that use gets expect such a relationship.

The Microsoft Visual Studio compilers implement an approximation to the TR 24731-1 standard, but there are differences between the signatures implemented by Microsoft and those in the TR.

The C11 standard, ISO/IEC 9899:2011, includes TR 24731-1 in Annex K as an optional part of the library. Unfortunately, it is seldom implemented on Unix-like systems.
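
Because Annex K is optional, code that wants gets_s has to check for it. A sketch, assuming the implementation defines __STDC_LIB_EXT1__ when Annex K is available (as C11 specifies); read_stdin_line is a hypothetical name, and the fgets fallback is my own approximation, not an exact emulation of gets_s:

```c
#define __STDC_WANT_LIB_EXT1__ 1
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper: use gets_s where the implementation advertises
 * Annex K, otherwise fall back to fgets on stdin. */
char *read_stdin_line(char *buf, size_t size)
{
#ifdef __STDC_LIB_EXT1__
    return gets_s(buf, size);
#else
    /* Fallback only: unlike gets_s, an over-long line is not treated as
     * a constraint violation; its tail stays in stdin for the next read. */
    if (size == 0 || size > INT_MAX || fgets(buf, (int)size, stdin) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';   /* gets_s discards the newline too */
    return buf;
#endif
}
```

On most Unix-like systems the #else branch is the one that gets compiled, which rather supports the recommendation to use fgets directly.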


getline() — POSIX

POSIX 2008 also provides a safe alternative to gets() called getline(). It allocates space for the line dynamically, which removes the limitation on line length, but you end up needing to free the buffer. It also returns the length of the data that was read, or -1 (and not EOF!), which means that null bytes in the input can be handled reliably. There is also a 'choose your own single-character delimiter' variation called getdelim(); this can be useful if you are dealing with the output from find -print0, for example, where the ends of the file names are marked with an ASCII NUL '\0' character.
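
A sketch of the usual getline() loop, assuming a POSIX 2008 system; longest_line is a hypothetical helper used only to have something to compute:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: length in bytes of the longest line in fp,
 * not counting the newline. */
size_t longest_line(FILE *fp)
{
    char *line = NULL;   /* getline allocates and grows this buffer */
    size_t cap = 0;      /* current capacity of line, managed by getline */
    ssize_t len;
    size_t longest = 0;

    while ((len = getline(&line, &cap, fp)) != -1) {
        if (len > 0 && line[len - 1] == '\n')
            len--;       /* getline keeps the newline, like fgets */
        if ((size_t)len > longest)
            longest = (size_t)len;
    }
    free(line);          /* one free covers every iteration */
    return longest;
}
```

Note that the length comes from getline's return value rather than strlen, which is what makes lines containing null bytes measurable.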

It's also worth pointing out that fgets() and your fgets_wrapper() version will leave the trailing portion of an over-long line in the input buffer, to be read by the next input function. In many cases, you will want to read-and-discard these characters. – caf
I wonder why they didn't add an fgets() alternative that allows one to use its functionality without having to make a silly strlen call. For example, an fgets variant which returned the number of bytes read into the string would make it easy for code to see if the last byte read was a newline. If the behavior of passing a null pointer for the buffer was defined as "read and discard up to n-1 bytes until the next newline", that would allow code to easily discard the tail of over-length lines. – supercat
@supercat: Yes, I agree -- it is a pity. The nearest approach to that is probably POSIX getline() and its relative getdelim(), which do return the length of the 'line' read by the commands, allocating space as required to be able to store the whole line. Even that can cause problems if you end up with a single-line JSON file that is multiple gigabytes in size; can you afford all that memory? (And while we're at it, can we have strcpy() and strcat() variants that return a pointer to the null byte at the end? Etc.) – Jonathan Leffler
@JonathanLeffler: What makes the fgets() situation particularly annoying is that on many systems there's no way to write a user function which behaves like fgets() but is anywhere near as fast. The same may be true of a few strcpy() implementations, but most will be comparable in speed to either strlen+memmove or to a simple character-by-character copy loop. – supercat
@supercat: the other problem with fgets() is that if the file contains a null byte, you can't tell how much data there is after the null byte up to the end of line (or EOF). strlen() can only report up to the null byte in the data; after that, it is guesswork and therefore almost certainly wrong. – Jonathan Leffler
@JonathanLeffler: Indeed that's a problem with fgets(). My point was that the lack of a decent string-concatenation function in the standard library doesn't prevent a programmer from writing one, but on many systems there's no portable way to write an fgets() replacement that will yield decent performance. – supercat
Seems to me you abandoned any regret that Unix-like systems abstain from Annex K here... – Deduplicator
@Deduplicator: Mostly, yes. Until I found that Microsoft's implementation didn't match the Annex K interface (even approximately; the number of arguments varies between the two for some functions), I had hopes that using Annex K (TR 24731-1) functions would allow for source code portability between Unix and Windows without needing conditional compilation and without requiring compiler-specific pragmata to suppress warnings, etc. Because they're different, they don't help after all. Sad, but so — IMNSHO of course. – Jonathan Leffler
"forget you ever heard that gets() existed." When I do this I run into it again and come back here. Are you hacking stackoverflow to get upvotes? – CandiedOrange
Considering worms, evil users and user input: the only ways to get a null character before a '\n' are 1) the user enters a null character as part of the input, 2) the buffer is full, or 3) the buffer size is pathologically small, like 1, 0, or negative. Consider the input <null character> <Enter>. This answer will enter the \n-consuming while loop, attempting to eat the next \n. fgets() is much better than gets(), but retains corner weaknesses. – chux
getline() is OK, but does not limit user input. This allows the evil user to consume/control memory usage. A robust user input function would limit the user input to some sane upper bound like fgets() and report the length read, like getline(). Should the buffer be pre-allocated or adjusted on the fly is of secondary concern. All-in-all I like your answer though. – chux
@chux: Typically, when you get '\0' before '\n', you're reading some sort of binary file (eg an executable) instead of the text file you expected — line-based input doesn't make sense on binary files, usually. But fgets() doesn't allow you to detect reliably how much data was read if the input contains any null bytes. If you're worried about memory over-consumption, then you have to write your own variant on POSIX getline() — it isn't very hard to write getline() (I've done it), and after you've designed an appropriate interface specifying the upper bound, neither is the alternative. – Jonathan Leffler
As you introduced the Morris Internet Worm, code considerations here are beyond typical. Also note that many text-based files that use UTF-16 encoding contain many null characters, and using fgets() to inadvertently read one of those is a real possibility. Yes, writing one's own wrapper is likely the only way to defend against evil or errant user input. – chux
Chuck Falconer's public domain ggets function is a good alternative for systems where getline is unavailable. – jamesdlin

You should not use gets since it has no way to stop a buffer overflow. If the user types in more data than can fit in your buffer, you will most likely end up with corruption or worse.

In fact, ISO have actually taken the step of removing gets from the C standard (as of C11, though it was deprecated in C99) which, given how highly they rate backward compatibility, should be an indication of how bad that function was.

The correct thing to do is to use the fgets function with the stdin file handle since you can limit the characters read from the user.

But this also has its problems such as:

  • extra characters entered by the user will be picked up the next time around.
  • there's no quick notification that the user entered too much data.

To that end, almost every C coder at some point in their career will write a more useful wrapper around fgets as well. Here's mine:

#include <stdio.h>
#include <string.h>

#define OK       0
#define NO_INPUT 1
#define TOO_LONG 2
static int getLine (char *prmpt, char *buff, size_t sz) {
    int ch, extra;

    // Get line with buffer overrun protection.
    if (prmpt != NULL) {
        printf ("%s", prmpt);
        fflush (stdout);
    }
    if (fgets (buff, sz, stdin) == NULL)
        return NO_INPUT;

    // If it was too long, there'll be no newline. In that case, we flush
    // to end of line so that excess doesn't affect the next call.
    if (buff[strlen(buff)-1] != '\n') {
        extra = 0;
        while (((ch = getchar()) != '\n') && (ch != EOF))
            extra = 1;
        return (extra == 1) ? TOO_LONG : OK;
    }

    // Otherwise remove newline and give string back to caller.
    buff[strlen(buff)-1] = '\0';
    return OK;
}

with some test code:

// Test program for getLine().

int main (void) {
    int rc;
    char buff[10];

    rc = getLine ("Enter string> ", buff, sizeof(buff));
    if (rc == NO_INPUT) {
        printf ("No input\n");
        return 1;
    }

    if (rc == TOO_LONG) {
        printf ("Input too long\n");
        return 1;
    }

    printf ("OK [%s]\n", buff);

    return 0;
}

It provides the same protections as fgets in that it prevents buffer overflows but it also notifies the caller as to what happened and clears out the excess characters so that they do not affect your next input operation.

Feel free to use it as you wish, I hereby release it under the "do what you damn well want to" licence :-)

Actually, the original C99 standard did not explicitly deprecate gets() either in section 7.19.7.7 where it is defined or in section 7.26.9 Future library directions and the sub-section for <stdio.h>. There isn't even a footnote on it being dangerous. (Having said that, I see "It's deprecated in ISO/IEC 9899:1999/Cor.3:2007(E))" in the answer by Yu Hao.) But C11 did remove it from the standard — and not before time! – Jonathan Leffler
int getLine (char *prmpt, char *buff, size_t sz) { ... if (fgets (buff, sz, stdin) == NULL) hides the size_t to int conversion of sz. sz > INT_MAX || sz < 2 would catch strange values of sz. – chux
if (buff[strlen(buff)-1] != '\n') { is a hacker exploit as the evil user's first character entered could be an embedded null character rendering buff[strlen(buff)-1] UB. while (((ch = getchar())... has troubles should a user enter a null character. – chux
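
A sketch of how the two points raised in these comments might be addressed. getLineChecked is a hypothetical variant, not the answer author's code; it reads from a FILE * rather than stdin and drops the prompt to keep the example short:

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

#define OK       0
#define NO_INPUT 1
#define TOO_LONG 2

/* Hypothetical variant of getLine() that rejects odd sizes and never
 * indexes buff[-1]. */
int getLineChecked(FILE *fp, char *buff, size_t sz)
{
    /* fgets takes an int; reject sizes that cannot hold one character
     * plus '\0', or that would not survive the conversion. */
    if (sz < 2 || sz > INT_MAX)
        return NO_INPUT;
    if (fgets(buff, (int)sz, fp) == NULL)
        return NO_INPUT;

    size_t len = strlen(buff);
    if (len > 0 && buff[len - 1] == '\n') {
        buff[len - 1] = '\0';
        return OK;
    }

    /* No newline at the end of the string: flush to end of line.
     * An embedded null byte can still confuse this check (a limitation
     * of fgets), but it no longer causes an out-of-bounds read. */
    int ch, extra = 0;
    while ((ch = getc(fp)) != '\n' && ch != EOF)
        extra = 1;
    return extra ? TOO_LONG : OK;
}
```

Treating a bad size as NO_INPUT is an arbitrary choice here; a separate error code would arguably be cleaner.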

In C11 (ISO/IEC 9899:2011), gets() has been removed. (It was deprecated in ISO/IEC 9899:1999/Cor.3:2007(E).)

In addition to fgets(), C11 introduces a new safe alternative gets_s():

C11 K.3.5.4.1 The gets_s function

#define __STDC_WANT_LIB_EXT1__ 1
#include <stdio.h>
char *gets_s(char *s, rsize_t n);

However, in the Recommended practice section, fgets() is still preferred.

The fgets function allows properly-written programs to safely process input lines too long to store in the result array. In general this requires that callers of fgets pay attention to the presence or absence of a new-line character in the result array. Consider using fgets (along with any needed processing based on new-line characters) instead of gets_s.

I would like to extend an earnest invitation to any C library maintainers out there who are still including gets in their libraries "just in case anyone is still depending on it": Please replace your implementation with the equivalent of

char *gets(char *str)
{
    strcpy(str, "Never use gets!");
    return str;
}

This will help make sure nobody is still depending on it. Thank you.

The C gets function is dangerous and has been a very costly mistake. Tony Hoare singles it out for specific mention in his talk "Null References: The Billion Dollar Mistake":

http://www.infoq.com/presentations/Null-References-The-Billion-Dollar-Mistake-Tony-Hoare

The whole hour is worth watching but for his comments view from 30 minutes on with the specific gets criticism around 39 minutes.

Hopefully this whets your appetite for the whole talk, which draws attention to how we need more formal correctness proofs in languages and how language designers should be blamed for the mistakes in their languages, not the programmer. This seems to have been the whole dubious reason for designers of bad languages to push the blame to programmers in the guise of 'programmer freedom'.

gets() is dangerous because it is possible for the user to crash the program by typing too much into the prompt. It can't detect the end of the buffer it was given, so if you allocate an amount of memory too small for the input, it can cause a seg fault and crash. Sometimes it seems very unlikely that a user will type 1000 letters into a prompt meant for a person's name, but as programmers, we need to make our programs bulletproof. (It may also be a security risk if a user can crash a system program by sending too much data.)

fgets() allows you to specify how many characters are taken out of the standard input buffer, so they don't overrun the variable.

Note that the real danger is not in being able to crash your program, but in being able to make it run arbitrary code. (In general, exploiting undefined behavior.) – Tanz87
