Defining a word is actually a somewhat tricky proposition. It depends partly on your theory of language, and partly on the particular constraints of the language in question. Take English as an example: the verb “facebook” has entered many speakers’ vocabularies, meaning something like “to look up on the social media site Facebook.” But when people started saying, “I’ll facebook you,” they also started saying things like, “She facebooked me.” Are “facebook” and “facebooked” two words, or one? What if you added a prefix to get something like “unfacebook” or “refacebook”? Should those all count as the same word, or as different ones? There’s no easy answer.
For the most part, linguists describe a word as a meaningful form found in a speaker’s lexicon, which is something like your own mental dictionary of the language you speak. That definition sidesteps some of the issues above. It does seem to imply that prefixes and suffixes should count as words in their own right, but it’s a good enough working definition. It implies, for one thing, that the utterance “anonymous message” is two words rather than one, because “anonymous” and “message” are each meaningful on their own, and each presumably has its own entry in a speaker’s lexicon.
So, to address your question: no, not every utterance is a word. But any whole form that an individual has stored in their lexicon is absolutely a word. It’s highly unlikely (some might even say impossible) for any two speakers to have exactly identical lexicons, simply because it’s so implausible that they’ve had identical experiences with the language and learned exactly the same vocabulary. In any given conversation, there’s almost always going to be at least one word that only one of the participants knows. If I know the meaning of the word “spanghew,” and you don’t, our lexicons are slightly different. But that doesn’t make “spanghew” any less of a word, even if I’m the last speaker of English who knows what it means. By the same logic, if I make up a new word and I know what it means, it’s a valid word in my own lexicon, even if I’m the first and only speaker who knows it.
When we talk about languages like English or Spanish or Japanese, we’re really talking about abstractions of linguistic behavior over a population of speakers. So could you argue that “facebook” isn’t a real English word, on the basis that the majority of English speakers don’t know or use it? Sure, knock yourself out. That’s essentially what we’re doing when we say English doesn’t allow multiple modals (often called “double modals,” as in “he might could do that”), because that construction is ungrammatical for most English speakers.
The key, though, is to recognize that a feature of someone’s speech isn’t illegitimate just because a majority of speakers don’t use it. It may not be representative of the larger abstracted language, but it’s still a valid example of language! There are sizable groups of English speakers who talk about facebooking, and sizable groups whose variety of English has grammatical rules for constructions like “might could.” Even if those groups shrank to a size of one, they would still represent valid and regular uses of language, and the words would still be words.