CVE and NVD are two separate programs.
- The CVE List was launched by MITRE in 1999.
- NVD was launched by NIST (National Institute of Standards and Technology) in 2005.
Relationship – The CVE List feeds NVD, which then builds upon the information included in CVE Records to provide enhanced information for each record such as fix information, severity scores, and impact ratings. As part of its enhanced information, NVD also provides advanced searching features such as by OS; by vendor name, product name, and/or version number; and by vulnerability type, severity, related exploit range, and impact.
NVD provides:
- CVE API
- CVE Data Feeds: sourced from MITRE; scheduled to be deprecated around December 2023 (the actual date may change).
- CPE API
- CPE Data Feeds: sourced from MITRE; scheduled to be deprecated around December 2023 (the actual date may change).
NVD officially recommends using the API instead of the Data Feeds. However, the API has a strict rate limit, so it is not suitable for high-volume queries.
To work around the NVD API limit, we use the lastModStartDate and lastModEndDate parameters of the APIs to fetch the CPEs and CVEs that were modified within a given time range, and periodically upsert them to the database. For data earlier than 2008, we download the CVE Data Feeds and CPE Data Feeds and convert the format from mitre to nvd (some differences remain, since nvd usually contains more information).
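As a rough illustration of the time-range query, the sketch below builds a CVE API request URL from two dates. It is not the nvdetl implementation: the function name buildQuery, the UTC midnight boundaries, and the timestamp layout are assumptions made for the example, and the endpoint is the NVD CVE API 2.0 URL as I understand it.

```go
package main

import (
	"fmt"
	"net/url"
	"time"
)

// nvdCVEEndpoint is the NVD CVE API 2.0 base URL (assumed for this sketch).
const nvdCVEEndpoint = "https://services.nvd.nist.gov/rest/json/cves/2.0"

// nvdTimeLayout renders ISO-8601 timestamps with milliseconds and a UTC designator.
const nvdTimeLayout = "2006-01-02T15:04:05.000Z07:00"

// buildQuery (hypothetical helper) turns two YYYY-MM-DD dates, like the values
// passed to -sdate and -edate, into a URL carrying lastModStartDate and
// lastModEndDate in the query string.
func buildQuery(base, sdate, edate string) (string, error) {
	start, err := time.Parse("2006-01-02", sdate)
	if err != nil {
		return "", fmt.Errorf("parse start date: %w", err)
	}
	end, err := time.Parse("2006-01-02", edate)
	if err != nil {
		return "", fmt.Errorf("parse end date: %w", err)
	}
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("lastModStartDate", start.UTC().Format(nvdTimeLayout))
	q.Set("lastModEndDate", end.UTC().Format(nvdTimeLayout))
	u.RawQuery = q.Encode() // keys are sorted and values are percent-encoded
	return u.String(), nil
}

func main() {
	u, err := buildQuery(nvdCVEEndpoint, "2022-12-01", "2023-01-01")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u)
}
```

Both parameters have to be sent together, which is why the helper takes a start and an end date in one call.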
- To build the nvdetl executable:
$ go build -o bin/ ./nvd/...
Before dumping data from NVD, run initdb to initialize the environment; see Database for details.
- To download CVE Data Feeds and upsert them to the database:
  - -data: path of the data to import. Supports a local path or a web path that starts with http. If the path is not local, nvdetl downloads and extracts the file; the temporary file is stored in /tmp/.
./bin/nvdetl -type cve -db-user <user> -db-pwd <pwd> -db-endpoint <dbendpoint> -data https://nvd.nist.gov/feeds/json/cve/1.1/nvdcve-1.1-2022.json.gz
- To get CVE data from the API and upsert it to the database:
  - -sdate: converted to the value of the lastModStartDate key in the query string
  - -edate: converted to the value of the lastModEndDate key in the query string
  - -wait (default: 5s): time to sleep between requests to NVD, to avoid the rate limit
  - -apikey: API key for NVD (optional)
  - -batch (default: 1): batch size for upserting data to the database
  - -timeout (default: 1m): timeout for each API request to NVD
./bin/nvdetl -type cve -db-user <user> -db-pwd <pwd> -db-endpoint <dbendpoint> -sdate 2022-12-01 -edate 2023-01-01 [-wait 10s -batch 10]
- To download CPE Data Feeds and upsert them to the database:
  - -data: path of the data to import. Supports a local path or a web path that starts with http. If the path is not local, nvdetl downloads and extracts the file; the temporary file is stored in /tmp/.
./bin/nvdetl -type cpe -db-user <user> -db-pwd <pwd> -db-endpoint <dbendpoint> -data https://nvd.nist.gov/feeds/xml/cpe/dictionary/official-cpe-dictionary_v2.3.xml.gz
- To get CPE data from the API and upsert it to the database:
  - -sdate: converted to the value of the lastModStartDate key in the query string
  - -edate: converted to the value of the lastModEndDate key in the query string
  - -wait (default: 5s): time to sleep between requests to NVD, to avoid the rate limit
  - -apikey: API key for NVD (optional)
  - -batch (default: 1): batch size for upserting data to the database
  - -timeout (default: 1m): timeout for each API request to NVD
./bin/nvdetl -type cpe -db-user <user> -db-pwd <pwd> -db-endpoint <dbendpoint> -sdate 2022-12-01 -edate 2023-01-01 [-wait 10s -batch 10]
- Mongo
  - Database: nvd
  - Collection: cve
    - Index: cveId (unique). E.g., CVE-2002-0392
    - Index: cpeName. E.g., cpe:2.3:a:apache:http_server:*:*:*:*:*:*:*:*
    - Index: cpeNameProductPrefix. E.g., cpe:2.3:a:apache:http_server
    - Index (text): keyword, generated from descriptions.value
  - Collection: cpe
    - Index: cpeName. E.g., cpe:2.3:a:apache:http_server:*:*:*:*:*:*:*:*
    - Index (text): keyword, generated from titles.title and refs.ref
The script that generates the indexes is mongo-init.js. Running initdb with -db-type=mongo also creates the collections and indexes.
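Judging from the two index examples above, cpeNameProductPrefix looks like the first five colon-separated components of cpeName. The sketch below derives it that way. This is an assumption drawn from the examples, not a description of how nvdetl computes the field, and productPrefix is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// productPrefix (hypothetical helper) keeps the first five components of a
// CPE 2.3 formatted string: "cpe", "2.3", part, vendor, product.
// Caveat: a plain split does not handle colons escaped as "\:" inside a
// component, so this is only suitable as an illustration.
func productPrefix(cpeName string) (string, error) {
	fields := strings.Split(cpeName, ":")
	if len(fields) < 5 || fields[0] != "cpe" || fields[1] != "2.3" {
		return "", fmt.Errorf("not a CPE 2.3 formatted string: %q", cpeName)
	}
	return strings.Join(fields[:5], ":"), nil
}

func main() {
	prefix, err := productPrefix("cpe:2.3:a:apache:http_server:*:*:*:*:*:*:*:*")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(prefix)
}
```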
- Each record in the cve collection is the cve field of an item in the vulnerabilities field of NVD's API response or Data Feed.
- Each record in the cpe collection is the cpe field of an item in the products field of NVD's API response or Data Feed.
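To make the mapping from response to record concrete, here is a minimal sketch that pulls the cve objects out of a vulnerabilities array. The sample JSON is a trimmed, illustrative payload rather than real NVD output, and the type and function names (cveResponse, cveHeader, extractCVEs, header) are hypothetical. The id and lastModified field names follow the NVD API 2.0 schema as I understand it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sample is an illustrative, heavily trimmed response body.
const sample = `{
  "vulnerabilities": [
    {"cve": {"id": "CVE-2002-0392", "lastModified": "2023-01-01T00:20:20.168"}}
  ]
}`

// cveResponse models only the part of the response needed here.
type cveResponse struct {
	Vulnerabilities []struct {
		CVE json.RawMessage `json:"cve"`
	} `json:"vulnerabilities"`
}

// cveHeader holds the two fields the upsert step cares about.
type cveHeader struct {
	ID           string `json:"id"`
	LastModified string `json:"lastModified"`
}

// extractCVEs returns the raw cve object of every item, untouched, so it can
// be stored as one database record per CVE.
func extractCVEs(body []byte) ([]json.RawMessage, error) {
	var resp cveResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	out := make([]json.RawMessage, 0, len(resp.Vulnerabilities))
	for _, v := range resp.Vulnerabilities {
		out = append(out, v.CVE)
	}
	return out, nil
}

// header decodes just the identifier and modification time of one record.
func header(raw json.RawMessage) (cveHeader, error) {
	var h cveHeader
	err := json.Unmarshal(raw, &h)
	return h, err
}

func main() {
	records, err := extractCVEs([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range records {
		h, err := header(r)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(h.ID, h.LastModified)
	}
}
```

Keeping each record as json.RawMessage avoids modeling the full schema; the same pattern would apply to the products and cpe fields.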
Since cveId is a unique index, every record fetched from the source is checked against the collection for an existing record with the same cveId.
- If no record with the same cveId exists, insert it.
- If a record with the same cveId exists, compare lastModified:
  - If the incoming record has a later or equal lastModified, replace the existing record.
  - If the existing record has a later lastModified, skip the incoming record and log it.
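The decision rules above can be sketched as follows, with an in-memory map standing in for the database. This is an illustration under stated assumptions, not the nvdetl code: the record type, the action names, and the upsert and decide helpers are all hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// record carries only the fields needed for the decision.
type record struct {
	CVEID        string
	LastModified string
}

type action string

const (
	actionInsert  action = "insert"
	actionReplace action = "replace"
	actionSkip    action = "skip"
)

// lastModLayout has no fractional part; Go's time.Parse still accepts an
// optional fractional second after the seconds field in the input.
const lastModLayout = "2006-01-02T15:04:05"

// decide applies the rules: insert when absent, replace when the incoming
// record is newer or equally new, skip when the stored record is newer.
func decide(existing *record, incoming record) (action, error) {
	if existing == nil {
		return actionInsert, nil
	}
	old, err := time.Parse(lastModLayout, existing.LastModified)
	if err != nil {
		return "", err
	}
	cur, err := time.Parse(lastModLayout, incoming.LastModified)
	if err != nil {
		return "", err
	}
	if cur.Before(old) {
		return actionSkip, nil
	}
	return actionReplace, nil
}

// upsert runs decide against the map and applies the result.
func upsert(store map[string]record, incoming record) (action, error) {
	var existing *record
	if r, ok := store[incoming.CVEID]; ok {
		existing = &r
	}
	act, err := decide(existing, incoming)
	if err != nil {
		return "", err
	}
	if act == actionSkip {
		fmt.Printf("skip %s: stored record is newer\n", incoming.CVEID)
		return act, nil
	}
	store[incoming.CVEID] = incoming
	return act, nil
}

func main() {
	store := map[string]record{}
	for _, ts := range []string{"2023-01-01T00:20:20.168", "2023-01-01T00:20:20.000"} {
		act, err := upsert(store, record{CVEID: "CVE-2002-0392", LastModified: ts})
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(ts, act)
	}
}
```

Comparing parsed timestamps rather than raw strings keeps the rule correct even when one value lacks the millisecond suffix.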
Note: The mitre time format for lastModified is 2023-01-01T00:20:20, while the nvd time format is 2023-01-01T00:20:20.168. When converting data from mitre to nvd, we can only pad the mitre value with .000, so the lastModified written to the database differs even though both values represent the same record.
- mitre: 2023-01-01T00:20:20.000
- nvd: 2023-01-01T00:20:20.168
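The padding described in the note can be sketched like this. The helper name toNVDTime is hypothetical, and the two layouts are taken from the example timestamps in the note rather than from the nvdetl source.

```go
package main

import (
	"fmt"
	"time"
)

// toNVDTime (hypothetical helper) rewrites a timestamp into the
// millisecond-precision form used by nvd. A mitre value without a fractional
// part gains ".000"; a value that already has milliseconds is left unchanged,
// because time.Parse accepts an optional fraction after the seconds field.
func toNVDTime(ts string) (string, error) {
	t, err := time.Parse("2006-01-02T15:04:05", ts)
	if err != nil {
		return "", err
	}
	return t.Format("2006-01-02T15:04:05.000"), nil
}

func main() {
	for _, ts := range []string{"2023-01-01T00:20:20", "2023-01-01T00:20:20.168"} {
		out, err := toNVDTime(ts)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(ts, "->", out)
	}
}
```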
Data from nvd takes first priority. Since lastModified in nvd is always >= lastModified in mitre when both represent the same CVE modification event, the upsert logic above (always keep the latest lastModified) applies automatically without differentiating between the sources.
The only exception is the rare case where both sources contain the same lastModified (when the nvd timestamp ends with .000): if data from mitre is dumped after nvd, the mitre record replaces the existing nvd record. However, dumping from mitre is expected to run only once, the first time, to fill in stale CVEs that nvd does not provide. All data should come from nvd afterwards.
To dump all the CVEs to the database:
1. [One time] Dump from mitre (2002 - 2008 or later)
2. [One time] Dump from nvd (2008 - )
3. [Schedule] Keep dumping from nvd to catch up with the latest information
The sample script below downloads all CVEs from 1999-2022 (steps 1 and 2).
- Modify the ending year if needed (currently 2022).
- Fill in the values of -db-endpoint, -db-user, and -db-pwd.
- Increase -db-timeout if you get upsert cve err: context deadline exceeded.
  - Note: Importing the Mitre data feeds for 2021 takes a fair amount of resources.
- interval is the number of months covered by each NVD API query. According to the NVD API documentation, the maximum allowable range is 120 consecutive days, so interval should be less than 4.
  - A 4-month range may exceed 120 days and get a 404 response from NVD; we recommend setting this value to 3 or less.
- -wait is the time between NVD API requests, since NVD has a strict rate limit (they recommend one query every 6s).
  - sleep 10 is also used to avoid the NVD rate limit.
- -batch is the batch size used when upserting data to the database.
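To see why 3 months is safe and 4 is not, the sketch below computes the query windows for one year the same way the sample script does (12 / interval windows, each starting on the first day of a month) and reports the longest window in days. The window type and the windows and maxDays helpers are hypothetical names used only for this illustration.

```go
package main

import (
	"fmt"
	"time"
)

// window is one query range; End is exclusive of nothing here, it is simply
// the start date shifted by interval months, as in the sample script.
type window struct {
	Start string
	End   string
	Days  int
}

// windows splits a year into 12/interval ranges of interval months each.
// If interval does not divide 12 evenly, the tail of the year is not covered.
func windows(year, interval int) []window {
	if interval < 1 || interval > 12 {
		return nil
	}
	var out []window
	for j := 0; j < 12/interval; j++ {
		s := time.Date(year, time.Month(j*interval+1), 1, 0, 0, 0, 0, time.UTC)
		e := s.AddDate(0, interval, 0)
		out = append(out, window{
			Start: s.Format("2006-01-02"),
			End:   e.Format("2006-01-02"),
			Days:  int(e.Sub(s).Hours() / 24),
		})
	}
	return out
}

// maxDays returns the length of the longest window.
func maxDays(ws []window) int {
	longest := 0
	for _, w := range ws {
		if w.Days > longest {
			longest = w.Days
		}
	}
	return longest
}

func main() {
	for _, interval := range []int{3, 4} {
		ws := windows(2022, interval)
		fmt.Printf("interval=%d windows=%d longest=%d days\n", interval, len(ws), maxDays(ws))
	}
}
```

For 2022, the longest 3-month window is 92 days, while the longest 4-month window (May through August) is 123 days, which is over the 120-day limit.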
CVE ETL script
- Change the path of the nvdetl executable if needed.
#!/usr/bin/env bash
# Requires bash (brace expansion, arithmetic loops) and GNU date (-d option).
set -e

# 1. Import from the mitre data feed; run this before dumping from the NVD Vulnerability CVE API
# source: https://nvd.nist.gov/vuln/data-feeds
for i in {2002..2022}; do
  url=https://nvd.nist.gov/feeds/json/cve/1.1/nvdcve-1.1-$i.json.gz
  ./nvdetl -type cve -db-endpoint <db endpoint> -db-user <db user> -db-pwd <db password> -data "$url" -db-timeout 5m -batch 100
done

# 2. Import from the NVD Vulnerability CVE API
# source: https://nvd.nist.gov/developers/vulnerabilities
# interval is the number of months queried from NVD each time; modify if needed.
# It should divide 12 evenly (1, 2, or 3), otherwise the end of each year is skipped.
# E.g., interval=3 queries NVD 3 months at a time:
# 1. sdate: 2008-01-01 -> edate: 2008-04-01
# 2. sdate: 2008-04-01 -> edate: 2008-07-01
# 3. ...
interval=3
# Number of nvdetl runs needed per year.
# E.g., interval=3 needs 12 / 3 = 4 runs to dump one year of data.
times=$((12 / interval))
for i in {2008..2022}; do
  for ((j = 0; j < times; j++)); do
    m=$(printf %02d $((j * interval + 1)))
    sdate=$i-$m-01
    edate=$(date -d "$sdate +$interval month" +%Y-%m-%d)
    ./nvdetl -type cve -db-endpoint <db endpoint> -db-user <db user> -db-pwd <db password> -sdate "$sdate" -edate "$edate" -wait 10s -batch 100
    sleep 10
  done
done
CPE ETL script
- Change the path of the nvdetl executable if needed.
#!/usr/bin/env bash
# Requires bash (brace expansion, arithmetic loops) and GNU date (-d option).
set -e

# 1. Import from the mitre data feed; run this before dumping from the NVD CPE API
# source: https://nvd.nist.gov/products/cpe
url=https://nvd.nist.gov/feeds/xml/cpe/dictionary/official-cpe-dictionary_v2.3.xml.gz
./nvdetl -type cpe -db-endpoint <db endpoint> -db-user <db user> -db-pwd <db password> -data "$url" -db-timeout 5m -batch 100

# 2. Import from the NVD CPE API
# source: https://nvd.nist.gov/developers/products
# interval is the number of months queried from NVD each time; modify if needed.
# It should divide 12 evenly (1, 2, or 3), otherwise the end of each year is skipped.
# E.g., interval=3 queries NVD 3 months at a time:
# 1. sdate: 2008-01-01 -> edate: 2008-04-01
# 2. sdate: 2008-04-01 -> edate: 2008-07-01
# 3. ...
interval=3
# Number of nvdetl runs needed per year.
# E.g., interval=3 needs 12 / 3 = 4 runs to dump one year of data.
times=$((12 / interval))
for i in {2008..2022}; do
  for ((j = 0; j < times; j++)); do
    m=$(printf %02d $((j * interval + 1)))
    sdate=$i-$m-01
    edate=$(date -d "$sdate +$interval month" +%Y-%m-%d)
    ./nvdetl -type cpe -db-endpoint <db endpoint> -db-user <db user> -db-pwd <db password> -sdate "$sdate" -edate "$edate" -wait 10s -batch 100
    sleep 10
  done
done