167 lines
6.4 KiB
Markdown
167 lines
6.4 KiB
Markdown
|
# HTTP Cookies
|
||
|
|
||
|
## Cookie overview
|
||
|
|
||
|
Cookies are `name=contents` pairs that an HTTP server tells the client to
|
||
|
hold and then the client sends back those to the server on subsequent
|
||
|
requests to the same domains and paths for which the cookies were set.
|
||
|
|
||
|
Cookies are either "session cookies" which typically are forgotten when the
|
||
|
session is over which is often translated to equal when browser quits, or
|
||
|
the cookies are not session cookies they have expiration dates after which
|
||
|
the client will throw them away.
|
||
|
|
||
|
Cookies are set to the client with the Set-Cookie: header and are sent to
|
||
|
servers with the Cookie: header.
|
||
|
|
||
|
For a long time, the only spec explaining how to use cookies was the
|
||
|
original [Netscape spec from 1994](https://curl.se/rfc/cookie_spec.html).
|
||
|
|
||
|
In 2011, [RFC 6265](https://www.ietf.org/rfc/rfc6265.txt) was finally
|
||
|
published and details how cookies work within HTTP. In 2016, an update which
|
||
|
added support for prefixes was
|
||
|
[proposed](https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-cookie-prefixes-00),
|
||
|
and in 2017, another update was
|
||
|
[drafted](https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-cookie-alone-01)
|
||
|
to deprecate modification of 'secure' cookies from non-secure origins. Both
|
||
|
of these drafts have been incorporated into a proposal to
|
||
|
[replace](https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-rfc6265bis-11)
|
||
|
RFC 6265. Cookie prefixes and secure cookie modification protection has been
|
||
|
implemented by curl.
|
||
|
|
||
|
curl considers `http://localhost` to be a *secure context*, meaning that it
|
||
|
will allow and use cookies marked with the `secure` keyword even when done
|
||
|
over plain HTTP for this host. curl does this to match how popular browsers
|
||
|
work with secure cookies.
|
||
|
|
||
|
## Super cookies
|
||
|
|
||
|
A single cookie can be set for a domain that matches multiple hosts. Like if
|
||
|
set for `example.com` it gets sent to both `aa.example.com` as well as
|
||
|
`bb.example.com`.
|
||
|
|
||
|
A challenge with this concept is that there are certain domains for which
|
||
|
cookies should not be allowed at all, because they are *Public
|
||
|
Suffixes*. Similarly, a client never accepts cookies set directly for the
|
||
|
top-level domain like for example `.com`. Cookies set for *too broad*
|
||
|
domains are generally referred to as *super cookies*.
|
||
|
|
||
|
If curl is built with PSL (**Public Suffix List**) support, it detects and
|
||
|
discards cookies that are specified for such suffix domains that should not
|
||
|
be allowed to have cookies.
|
||
|
|
||
|
if curl is *not* built with PSL support, it has no ability to stop super
|
||
|
cookies.
|
||
|
|
||
|
## Cookies saved to disk
|
||
|
|
||
|
Netscape once created a file format for storing cookies on disk so that they
|
||
|
would survive browser restarts. curl adopted that file format to allow
|
||
|
sharing the cookies with browsers, only to see browsers move away from that
|
||
|
format. Modern browsers no longer use it, while curl still does.
|
||
|
|
||
|
The Netscape cookie file format stores one cookie per physical line in the
|
||
|
file with a bunch of associated meta data, each field separated with
|
||
|
TAB. That file is called the cookie jar in curl terminology.
|
||
|
|
||
|
When libcurl saves a cookie jar, it creates a file header of its own in
|
||
|
which there is a URL mention that will link to the web version of this
|
||
|
document.
|
||
|
|
||
|
## Cookie file format
|
||
|
|
||
|
The cookie file format is text based and stores one cookie per line. Lines
|
||
|
that start with `#` are treated as comments. An exception is lines that
|
||
|
start with `#HttpOnly_`, which is a prefix for cookies that have the
|
||
|
`HttpOnly` attribute set.
|
||
|
|
||
|
Each line that specifies a single cookie consists of seven text fields
|
||
|
separated with TAB characters. A valid line must end with a newline
|
||
|
character.
|
||
|
|
||
|
### Fields in the file
|
||
|
|
||
|
Field number, what type and example data and the meaning of it:
|
||
|
|
||
|
0. string `example.com` - the domain name
|
||
|
1. boolean `FALSE` - include subdomains
|
||
|
2. string `/foobar/` - path
|
||
|
3. boolean `TRUE` - send/receive over HTTPS only
|
||
|
4. number `1462299217` - expires at - seconds since Jan 1st 1970, or 0
|
||
|
5. string `person` - name of the cookie
|
||
|
6. string `daniel` - value of the cookie
|
||
|
|
||
|
## Cookies with curl the command line tool
|
||
|
|
||
|
curl has a full cookie "engine" built in. If you just activate it, you can
|
||
|
have curl receive and send cookies exactly as mandated in the specs.
|
||
|
|
||
|
Command line options:
|
||
|
|
||
|
`-b, --cookie`
|
||
|
|
||
|
tell curl a file to read cookies from and start the cookie engine, or if it
|
||
|
is not a file it will pass on the given string. `-b name=var` works and so
|
||
|
does `-b cookiefile`.
|
||
|
|
||
|
`-j, --junk-session-cookies`
|
||
|
|
||
|
when used in combination with -b, it will skip all "session cookies" on load
|
||
|
so as to appear to start a new cookie session.
|
||
|
|
||
|
`-c, --cookie-jar`
|
||
|
|
||
|
tell curl to start the cookie engine and write cookies to the given file
|
||
|
after the request(s)
|
||
|
|
||
|
## Cookies with libcurl
|
||
|
|
||
|
libcurl offers several ways to enable and interface the cookie engine. These
|
||
|
options are the ones provided by the native API. libcurl bindings may offer
|
||
|
access to them using other means.
|
||
|
|
||
|
`CURLOPT_COOKIE`
|
||
|
|
||
|
Is used when you want to specify the exact contents of a cookie header to
|
||
|
send to the server.
|
||
|
|
||
|
`CURLOPT_COOKIEFILE`
|
||
|
|
||
|
Tell libcurl to activate the cookie engine, and to read the initial set of
|
||
|
cookies from the given file. Read-only.
|
||
|
|
||
|
`CURLOPT_COOKIEJAR`
|
||
|
|
||
|
Tell libcurl to activate the cookie engine, and when the easy handle is
|
||
|
closed save all known cookies to the given cookie jar file. Write-only.
|
||
|
|
||
|
`CURLOPT_COOKIELIST`
|
||
|
|
||
|
Provide detailed information about a single cookie to add to the internal
|
||
|
storage of cookies. Pass in the cookie as an HTTP header with all the
|
||
|
details set, or pass in a line from a Netscape cookie file. This option can
|
||
|
also be used to flush the cookies etc.
|
||
|
|
||
|
`CURLOPT_COOKIESESSION`
|
||
|
|
||
|
Tell libcurl to ignore all cookies it is about to load that are session
|
||
|
cookies.
|
||
|
|
||
|
`CURLINFO_COOKIELIST`
|
||
|
|
||
|
Extract cookie information from the internal cookie storage as a linked
|
||
|
list.
|
||
|
|
||
|
## Cookies with JavaScript
|
||
|
|
||
|
These days a lot of the web is built up by JavaScript. The web browser loads
|
||
|
complete programs that render the page you see. These JavaScript programs
|
||
|
can also set and access cookies.
|
||
|
|
||
|
Since curl and libcurl are plain HTTP clients without any knowledge of or
|
||
|
capability to handle JavaScript, such cookies will not be detected or used.
|
||
|
|
||
|
Often, if you want to mimic what a browser does on such websites, you can
|
||
|
record web browser HTTP traffic when using such a site and then repeat the
|
||
|
cookie operations using curl or libcurl.
|