In practice, even in quite rich foundational systems, let alone in pure set theory, representations of mathematical objects become unwieldy. For example, [bourbaki-sets] estimates that even the number 1, if fully expanded, would occupy thousands of symbols in their system. So some kind of definitional mechanism is needed, at least for human consumption, and possibly to stop overflow even for a machine! The fundamental property of definitions is that it should be possible in principle to do without them, though we need to say carefully precisely what this means.
The simplest kinds of definitions are just shorthands for more complicated expressions, for example 2 rather than {0,1}, (x,y) rather than {{x},{x,y}}, and ∃!x. P[x] rather than ∃x. P[x] ∧ ∀y. P[y] ⇒ y = x. We can understand such definitions in two ways: either as metalogical abbreviations used to make formal texts more palatable to the reader, or as a method of sanctioning the addition (in the object logic) of new equational axioms. Generally, the latter is more efficient, since it keeps the underlying terms small, but the former is more flexible: the defined notion need not correspond to any kind of logical object, nor respect any underlying type system. In fact the Nuprl system allows both kinds of definition.
Assuming that the object language has facilities for variable abstraction (something like lambda-calculus), any definitional constructs of this kind can be reduced to new abbreviations or axioms of the form c = t, where c is a new constant name and t any closed term. For example, instead of writing f(x) = E[x], we use f = λx. E[x]. In any reasonable logical system the addition of such axioms makes no difference: in principle we could do without them (replacing c by its expansion everywhere), and we do not derive any new facts not mentioning c. They thus fulfil all the criteria we could demand of definitions.
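The 'replace c by its expansion everywhere' step can be sketched concretely. Below is a minimal illustration (the term representation and the function name `expand` are my own, not from any particular system) where terms are nested tuples and an equational definition c = t is eliminated by substitution:

```python
# Sketch: eliminating an equational definition c = t by replacing the
# constant name c with its expansion t everywhere in a term tree.
# Terms are represented as strings (constants/variables) or tuples
# (an operator applied to subterms); this encoding is illustrative only.

def expand(term, c, t):
    """Replace every occurrence of the constant name c by the term t."""
    if term == c:
        return t
    if isinstance(term, tuple):
        return tuple(expand(sub, c, t) for sub in term)
    return term

# Example: with the definition 2 = {0,1}, the term (2,2) expands so
# that no occurrence of the defined constant remains.
term = ("pair", "2", "2")
print(expand(term, "2", ("set", "0", "1")))
# -> ('pair', ('set', '0', '1'), ('set', '0', '1'))
```

Iterating this over all definitions (in dependency order) is exactly the sense in which such definitions are eliminable in principle, even though the expanded terms may be enormous.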
There is a more general means of extending the logic with new constant symbols: using names for things which have been proved to exist. If we can prove ∃x. P[x], then adding a new constant c and an axiom P[c] is a conservative extension of most logical systems, in the sense that anything not involving c could equally well be deduced without using c or its axiom. (Simply because something like the natural deduction ∃-elimination rule is usually valid.) It is not quite conservative in the sense that c can be written away from formulas, though any formula Q[c] can be read as (∃x. P[x]) ∧ ∀x. P[x] ⇒ Q[x], or something like that. Given a theorem ∀x. ∃!y. P[x,y], it is usually conservative to add a new function symbol f and an axiom ∀x. P[x,f(x)]. However, if unique existence is weakened to mere existence, this can smuggle in the Axiom of Choice. The situation for constructive logics is even more delicate.
A useful feature in a practical logic is a descriptor. For example, we may allow a new term constructor ιx. P[x], read 'the (unique) x such that P[x]', together with an axiom (∃!x. P[x]) ⇒ P[ιx. P[x]], or εx. P[x], read 'some x such that P[x]', together with an axiom (∃x. P[x]) ⇒ P[εx. P[x]]. These are called definite and indefinite descriptors, respectively. Note that the latter is something like the Axiom of Global Choice if we allow P[x] to contain other free variables, and even the former is weakly nonconstructive, because it effectively allows us to introduce a total function even though ∃x. P[x] may be conditional on some other property. A descriptor is very useful for rendering informal notions of 'the x such that ...' (it is much used by Bourbaki), and can even be used to implement lambda-binding, set abstraction and so on, e.g. defining λx. E[x] to be εf. ∀x. f(x) = E[x].
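The behaviour of an indefinite descriptor can be illustrated over a finite domain, where 'some x such that P[x]' is computable by search. The sketch below (function name `epsilon` and the search strategy are my own illustration, not any system's implementation) satisfies the ε axiom: whenever ∃x. P[x] holds in the domain, P holds of the chosen value; and, like the logical ε, it is total, returning some arbitrary value even when P is unsatisfiable:

```python
# Finite-domain sketch of an indefinite descriptor epsilon(P, dom):
# returns some element of dom satisfying P, so the axiom
# (exists x. P[x]) ==> P[epsilon x. P[x]] holds over dom. The operator
# is total: it still returns *some* value when no witness exists,
# mirroring the weakly nonconstructive character noted in the text.

def epsilon(P, dom):
    """Return some x in dom with P(x), or an arbitrary element if none."""
    dom = list(dom)
    for x in dom:
        if P(x):
            return x
    return dom[0]  # arbitrary default: the descriptor must be total

# 'Some positive even number below 10': the search finds 2.
print(epsilon(lambda n: n % 2 == 0 and n > 0, range(10)))   # -> 2
# Unsatisfiable predicate: still returns a (meaningless) value.
print(epsilon(lambda n: False, range(10)))                  # -> 0
```

Over an infinite domain no such search procedure exists in general, which is precisely why the ε operator is an axiom rather than a definition.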
If we can prove ∃x. P[x] and we have an indefinite descriptor, then defining c = εx. P[x] gives us P[c]. Conversely, for any term t we can prove ∃x. x = t, so the existential kind of definition always sanctions the equational kind. There is thus a close relationship between the two kinds of definition, given a descriptor.
There are more complex definitional mechanisms which cannot be expanded away in such a simpleminded manner, but can nevertheless be understood as inessential abbreviations which could in principle be done without. These are usually referred to as contextual definitions. A good example is the use of proper classes in ZF set theory, where statements about 'set' membership are really an encoding of a logical fact. Another example: cardinal numbers can in principle be done without, since all statements about arithmetic and equality of cardinals may be regarded as statements about equinumerosity etc. of sets. For example, instead of |X| = |Y|²·|Z| we can write X ∼ (Y × Y) × Z. The most sophisticated example of all is using category-theoretic language like 'the category of all sets' purely as syntactic sugar on top of quite different statements. One of the interesting questions in formalized mathematics is the extent to which this is possible.
In any case, we shouldn't lay too much stress on the theoretical eliminability of definitions. It's mainly a way of assuring ourselves that we aren't 'really' extending the logic. Life without definitions would be unbearably complicated, and anyway the whole significance of a theorem may subsist in the definitions it depends on.
There are other forms of definition, for example definition by recursion and definition of inductive sets or relations. However, given a mathematical theorem justifying the existence of such a function, it can then be defined using one of the mechanisms given above. In programmable systems, the tedious matter of deriving the required instance of this theorem can be automated completely.
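For instance, primitive recursion guarantees, for any base value a and step function g, the existence of a unique f with f(0) = a and f(n+1) = g(n, f(n)); the descriptor or existential mechanisms above then let us name that f. The sketch below (the name `primrec` and the iterative construction are my own illustration, not a particular proof system's) builds the witness directly:

```python
# Sketch: 'definition by recursion' backed by an existence theorem.
# Primitive recursion asserts a unique f with f(0) = a and
# f(n+1) = g(n, f(n)); here we construct that witness by iteration,
# playing the role the existence theorem plays in a formal system.

def primrec(a, g):
    """Return the unique f with f(0) = a, f(n+1) = g(n, f(n))."""
    def f(n):
        acc = a
        for i in range(n):
            acc = g(i, acc)
        return acc
    return f

# Factorial defined by recursion: 0! = 1, (n+1)! = (n+1) * n!
fact = primrec(1, lambda n, acc: (n + 1) * acc)
print(fact(5))   # -> 120
```

In a formal system one would not compute f this way, but rather derive the corresponding instance of the recursion theorem and introduce f by one of the definitional mechanisms above; this derivation is what a programmable system can automate.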